# Spanner

[Spanner](https://cloud.google.com/spanner) is the world’s first fully managed relational database service to offer both strong consistency and horizontal scalability for mission-critical online transaction processing (OLTP) applications.

## Setting up

To run this notebook, you will need a [Google Cloud Project](https://developers.google.com/workspace/guides/create-project), a [Spanner instance](https://cloud.google.com/spanner/docs/create-query-database-console#create-instance) with a [database](https://cloud.google.com/spanner/docs/create-query-database-console#create-database), and [Google credentials](https://developers.google.com/workspace/guides/create-credentials).

In [None]:
%pip install langchain-google-spanner

## Querying for Documents from Spanner

For more details on connecting to a Spanner database, see the [Python SDK documentation](https://cloud.google.com/python/docs/reference/spanner/latest).

In [None]:
from langchain_google_spanner import SpannerLoader

instance_id = "my_instance"
database_id = "my_database"
table_name = "my_table"
query = f"SELECT * from {table_name}"

### Create the loader

In [None]:
loader = SpannerLoader(
    instance_id,
    database_id,
    query,
)

### Load from table

You can fetch the documents by calling the `lazy_load` method, which returns an iterator of documents.

In [None]:
for doc in loader.lazy_load():
    print(doc)
    break
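Because `lazy_load` returns an iterator, documents are produced one at a time and iteration can stop early. The sketch below illustrates that pattern without a Spanner connection; `fake_lazy_load` and `take` are hypothetical stand-ins and not part of `langchain_google_spanner`.

```python
from itertools import islice
from typing import Iterator


def fake_lazy_load(rows: list[str]) -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(): yields one item at a time."""
    for row in rows:
        yield row


def take(iterator: Iterator[str], n: int) -> list[str]:
    """Consume at most n items, leaving the rest of the iterator untouched."""
    return list(islice(iterator, n))


docs = fake_lazy_load(["row-1", "row-2", "row-3"])
first_two = take(docs, 2)  # only two items are pulled from the iterator
remaining = list(docs)     # the third item is produced only now
```

With a real loader, the same `take(loader.lazy_load(), n)` call would cap how many documents are consumed.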

### Custom client

If no client is provided, the loader creates a default one. To set `credentials` and `project` explicitly, pass a custom client to the constructor.

In [None]:
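import google.auth
from google.cloud import spanner

# Assumption: Application Default Credentials are used here;
# substitute your own credentials object if needed.
creds, _ = google.auth.default()
custom_client = spanner.Client(project="my-project", credentials=creds)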
loader = SpannerLoader(
    instance_id,
    database_id,
    query,
    client=custom_client,
)

### Customize Document Page Content & Metadata

The loader returns a list of Documents whose page content comes from specific data columns. All other data columns are added to the metadata. Each row becomes one Document.

#### Customize page content format

The SpannerLoader assumes there is a column called `page_content`. This default can be changed like so:

In [None]:
custom_content_loader = SpannerLoader(
    instance_id, database_id, query, content_columns=["custom_content"]
)

If multiple columns are specified, the page content's string format defaults to `text` (space-separated string concatenation). Users can specify one of the following formats: `text`, `JSON`, `YAML`, `CSV`.
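To make the formats concrete, here is a minimal sketch of how several content columns could be combined into one page content string. `render_page_content` is a hypothetical helper written for illustration; it is not the loader's implementation, and `YAML` is omitted because it would need a third-party package.

```python
import csv
import io
import json


def render_page_content(row: dict, content_columns: list[str], fmt: str = "text") -> str:
    """Hypothetical helper: join the selected columns of one row into a string."""
    values = {col: row[col] for col in content_columns}
    if fmt == "text":
        # Space-separated concatenation of the column values.
        return " ".join(str(v) for v in values.values())
    if fmt == "JSON":
        return json.dumps(values)
    if fmt == "CSV":
        buffer = io.StringIO()
        csv.writer(buffer).writerow(values.values())
        return buffer.getvalue().strip()
    raise ValueError(f"Unsupported format: {fmt}")


row = {"title": "Widget", "body": "A small tool", "price": 9.99}
as_text = render_page_content(row, ["title", "body"])
as_json = render_page_content(row, ["title", "body"], fmt="JSON")
as_csv = render_page_content(row, ["title", "body"], fmt="CSV")
```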

#### Customize metadata format

The SpannerLoader assumes there is a metadata column called `langchain_metadata` that stores JSON data. The metadata column is used as the base dictionary. By default, data from all other columns is added and may overwrite the original values. These defaults can be changed like so:

In [None]:
custom_metadata_loader = SpannerLoader(
    instance_id, database_id, query, metadata_columns=["column1", "column2"]
)
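The merge behavior described above (a JSON base dictionary, with other columns layered on top) can be sketched as follows. `build_metadata` is a hypothetical helper for illustration only and makes no claim about how the loader is implemented internally.

```python
def build_metadata(
    row: dict,
    metadata_columns: list[str],
    metadata_json_column: str = "langchain_metadata",
) -> dict:
    """Hypothetical helper: start from the JSON column, then add other columns."""
    # Copy the base dictionary so the input row is not mutated.
    metadata = dict(row.get(metadata_json_column) or {})
    for col in metadata_columns:
        if col in row:
            # A column value overwrites a same-named key from the JSON base.
            metadata[col] = row[col]
    return metadata


row = {
    "page_content": "some text",
    "langchain_metadata": {"source": "a", "column1": "old"},
    "column1": "new",
    "column2": 2,
}
merged = build_metadata(row, ["column1", "column2"])
```

Here `column1` appears both in the JSON base and as its own column, so the column value wins.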

#### Customize JSON metadata column name

By default, the loader uses `langchain_metadata` as the base dictionary. This can be customized to select a different JSON column to use as the base dictionary for the Document's metadata.

In [None]:
custom_metadata_json_loader = SpannerLoader(
    instance_id, database_id, query, metadata_json_column="another-json-column"
)

### Custom staleness

The default [staleness](https://cloud.google.com/python/docs/reference/spanner/latest/snapshot-usage#beginning-a-snapshot) is 15s. This can be customized by specifying a weaker bound, which can be either to perform all reads as of a given timestamp or as of a given duration in the past.

In [None]:
import datetime

timestamp = datetime.datetime.utcnow()
custom_timestamp_loader = SpannerLoader(
    instance_id,
    database_id,
    query,
    staleness=timestamp,
)

In [None]:
duration = 20.0
custom_duration_loader = SpannerLoader(
    instance_id,
    database_id,
    query,
    staleness=duration,
)
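The two forms of staleness differ in that a timestamp is an absolute point in time, while a duration is relative to the moment of the read. The sketch below shows that relationship with plain `datetime` arithmetic; `as_read_timestamp` is a hypothetical helper and does not reflect how the loader or Spanner resolves the bound.

```python
import datetime
from typing import Union


def as_read_timestamp(
    staleness: Union[datetime.datetime, float],
    now: datetime.datetime,
) -> datetime.datetime:
    """Hypothetical helper: resolve a staleness bound to an absolute timestamp."""
    if isinstance(staleness, datetime.datetime):
        # A timestamp is already absolute: read as of exactly that time.
        return staleness
    # A duration (in seconds) is measured backwards from "now".
    return now - datetime.timedelta(seconds=staleness)


now = datetime.datetime(2024, 1, 1, 12, 0, 0)
from_duration = as_read_timestamp(20.0, now)
from_timestamp = as_read_timestamp(datetime.datetime(2024, 1, 1, 11, 59, 0), now)
```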

### Turn on data boost

By default, the loader does not use [data boost](https://cloud.google.com/spanner/docs/databoost/databoost-overview), since it incurs additional costs and requires additional IAM permissions. However, users can choose to turn it on.

In [None]:
custom_databoost_loader = SpannerLoader(
    instance_id,
    database_id,
    query,
    databoost=True,
)

## Save Documents to table

Documents can be saved into Spanner using the SpannerDocumentSaver. Its constructor is very similar to the SpannerLoader's. To use the document saver, you need an existing [table](https://cloud.google.com/spanner/docs/create-query-database-console#create-schema).

In [None]:
from langchain_google_spanner import SpannerDocumentSaver

instance_id = "my_instance"
database_id = "my_database"
table_name = "my_table"

saver = SpannerDocumentSaver(
    instance_id,
    database_id,
    table_name,
)

### Add documents

The SpannerDocumentSaver can be used to add pre-processed documents into Spanner.

In [None]:
from langchain_core.documents import Document

doc = Document(page_content="my-doc", metadata={"foo": "bar", "foo2": "bar2"})
saver.add_documents([doc])

### Delete documents

The SpannerDocumentSaver can be used to delete all instances of a document from the table by matching the entire Document object.

In [None]:
doc = Document(page_content="my-doc", metadata={"foo": "bar", "foo2": "bar2"})
saver.delete([doc])
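Matching the entire Document means that both the page content and the metadata must be equal for a row to be removed. The following sketch illustrates that rule on an in-memory list; `Doc` and `delete_matching` are hypothetical stand-ins, not the saver's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; equality compares all fields."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def delete_matching(stored: list[Doc], to_delete: list[Doc]) -> list[Doc]:
    """Keep only the stored documents that match none of the documents to delete."""
    return [doc for doc in stored if doc not in to_delete]


stored = [
    Doc("my-doc", {"foo": "bar", "foo2": "bar2"}),
    Doc("my-doc", {"foo": "bar", "foo2": "bar2"}),  # duplicate, also removed
    Doc("my-doc", {"foo": "other"}),                # different metadata, kept
]
kept = delete_matching(stored, [Doc("my-doc", {"foo": "bar", "foo2": "bar2"})])
```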

### Custom client

If no client is provided, the saver creates a default one. To set `credentials` and `project` explicitly, pass a custom client to the constructor.

In [None]:
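import google.auth
from google.cloud import spanner

# Assumption: Application Default Credentials are used here;
# substitute your own credentials object if needed.
creds, _ = google.auth.default()
custom_client = spanner.Client(project="my-project", credentials=creds)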
saver = SpannerDocumentSaver(
    instance_id,
    database_id,
    table_name,
    client=custom_client,
)

### Custom initialization

The SpannerDocumentSaver allows custom initialization, which lets users specify how the Document is saved into the table.

`content_column`: The column name for the Document's page content. Defaults to `page_content`.

`metadata_columns`: Metadata keys to save into dedicated columns, if the key exists in the Document's metadata.

`metadata_json_column`: The column name for the special JSON column. Defaults to `langchain_metadata`.

In [None]:
custom_saver = SpannerDocumentSaver(
    instance_id,
    database_id,
    table_name,
    content_column="my-content",
    metadata_columns=["foo"],
    metadata_json_column="my-special-json-column",
)
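As a rough illustration of these three parameters, the sketch below maps a document's content and metadata onto a row. `document_to_row` is a hypothetical helper, and it assumes that metadata keys without a dedicated column end up in the JSON column; the saver's real behavior may differ.

```python
import json


def document_to_row(
    page_content: str,
    metadata: dict,
    content_column: str = "page_content",
    metadata_columns: tuple = (),
    metadata_json_column: str = "langchain_metadata",
) -> dict:
    """Hypothetical helper: lay out one document as a column-to-value mapping."""
    row = {content_column: page_content}
    leftover = dict(metadata)  # copy so the caller's metadata is not mutated
    for key in metadata_columns:
        if key in leftover:
            # Keys with a dedicated column are stored there.
            row[key] = leftover.pop(key)
    # Assumption: everything else is serialized into the JSON column.
    row[metadata_json_column] = json.dumps(leftover)
    return row


example_row = document_to_row(
    "my-doc",
    {"foo": "bar", "foo2": "bar2"},
    content_column="my-content",
    metadata_columns=("foo",),
    metadata_json_column="my-special-json-column",
)
```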

### Initialize custom Spanner table

The SpannerDocumentSaver has an `init_document_table` method that creates a new table with a custom schema for storing documents.

In [None]:
from langchain_google_spanner import Column

new_table_name = "my_new_table"

SpannerDocumentSaver.init_document_table(
    instance_id,
    database_id,
    new_table_name,
    content_column="my-page-content",
    metadata_columns=[
        Column("category", "STRING(36)", True),
        Column("price", "FLOAT64", False),
    ],
)