# BigtableVectorStore

This guide covers how to use Google Cloud Bigtable as a vector store.

[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/vector_store.ipynb)

## Overview

`BigtableVectorStore` uses Google Cloud Bigtable to store documents and their vector embeddings for similarity search and retrieval. It supports metadata filtering on row keys, metadata keys, and metadata values to refine search results.

### Integration details
| Class | Package | Local | JS support | Package downloads | Package latest |
| :--- | :--- | :---: | :---: | :---: | :---: |
| [BigtableVectorStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |

## Setup

### Prerequisites

To get started, you will need a Google Cloud project with an active Bigtable instance.
* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)
* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)

### Installation

The integration lives in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai`, which this guide uses as the embedding service.

In [None]:
%pip install -qU langchain-google-bigtable langchain-google-vertexai

### ☁ Set Your Google Cloud Project
Set your Google Cloud project to use its resources within this notebook.

If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @markdown Please fill in your project, instance, and a new table name.
PROJECT_ID = "your-gcp-project-id"  # @param {type:"string"}
INSTANCE_ID = "your-instance-id"  # @param {type:"string"}
TABLE_ID = "your-vector-store-table"  # @param {type:"string"}

!gcloud config set project {PROJECT_ID}

### 🔐 Authentication
Authenticate to Google Cloud to access your project resources.
- For **Colab**, use the cell below.
- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth

auth.authenticate_user()

## Instantiation

To use `BigtableVectorStore`, we first need an embedding service, then we ensure a table exists, and finally we initialize the store.

### Embedding Service
We'll use Vertex AI embeddings for this example.

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings

embeddings = VertexAIEmbeddings(
    project=PROJECT_ID, model_name="textembedding-gecko@003"
)

### Initialize Table
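The `init_vector_store_table` helper function creates a table with the necessary column families. If the table already exists, it does nothing. The cell below also catches any `ValueError` raised by the helper and prints it, so the notebook can continue.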

In [None]:
from langchain_google_bigtable.vector_store import init_vector_store_table

DATA_COLUMN_FAMILY = "doc_data"

try:
    init_vector_store_table(
        project_id=PROJECT_ID,
        instance_id=INSTANCE_ID,
        table_id=TABLE_ID,
        column_families=[DATA_COLUMN_FAMILY],
    )
except ValueError as e:
    print(e)

### BigtableVectorStore
Now we can create the vector store instance. We configure it with `metadata_mappings` for filtering and the optional `metadata_as_json_column` for efficient retrieval of all metadata.

In [None]:
from langchain_google_bigtable import (
    BigtableVectorStore,
    BigtableEngine,
    ColumnConfig,
    VectorMetadataMapping,
    Encoding,
)

# A BigtableEngine is recommended for managing clients and async operations.
engine = await BigtableEngine.async_initialize(project_id=PROJECT_ID)

# Define mappings for metadata fields you want to filter on.
metadata_mappings = [
    VectorMetadataMapping(metadata_key="author", encoding=Encoding.UTF8),
    VectorMetadataMapping(metadata_key="year", encoding=Encoding.INT_BIG_ENDIAN),
    VectorMetadataMapping(metadata_key="category", encoding=Encoding.UTF8),
    VectorMetadataMapping(metadata_key="rating", encoding=Encoding.FLOAT),
]

# Define the optional column for storing all metadata as a single JSON string.
metadata_as_json_column = ColumnConfig(
    column_family=DATA_COLUMN_FAMILY, column_qualifier="metadata_json"
)

vector_store = await BigtableVectorStore.create(
    engine=engine,
    instance_id=INSTANCE_ID,
    table_id=TABLE_ID,
    embedding_service=embeddings,
    collection="my_docs",
    content_column=DATA_COLUMN_FAMILY,
    embedding_column=DATA_COLUMN_FAMILY,
    metadata_mappings=metadata_mappings,
    metadata_as_json_column=metadata_as_json_column,
)
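
Each `VectorMetadataMapping` pairs a metadata key with an encoding, because Bigtable stores cell values as raw bytes. The sketch below uses only the standard `struct` module to illustrate why the choice of encoding matters for comparison filters. The byte widths and the helper names (`encode_utf8`, `encode_int_big_endian`, `encode_float`) are assumptions made for illustration, not the library's actual implementation.

```python
import struct


def encode_utf8(value: str) -> bytes:
    """Encode a string as UTF-8 bytes."""
    return value.encode("utf-8")


def encode_int_big_endian(value: int) -> bytes:
    """Assumption: 8-byte signed integer, most significant byte first."""
    return struct.pack(">q", value)


def encode_float(value: float) -> bytes:
    """Assumption: 4-byte IEEE 754 float, big-endian."""
    return struct.pack(">f", value)


year_1954 = encode_int_big_endian(1954)
year_1977 = encode_int_big_endian(1977)

# For non-negative integers, big-endian bytes compare in the same
# order as the numbers themselves.
print(year_1954 < year_1977)

# Little-endian bytes do not preserve numeric order: here 255 sorts after 256.
print(struct.pack("<q", 255) > struct.pack("<q", 256))

# Decoding must use the same format that was used for encoding.
decoded_rating = struct.unpack(">f", encode_float(4.8))[0]
print(encode_utf8("sci-fi"), year_1977.hex(), round(decoded_rating, 2))
```

A mismatch between the declared encoding and the value type stored under a key would make byte-level comparisons meaningless, which is why the mappings are declared once, up front.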

## Usage

The store supports both sync and async methods. This guide uses the async versions.
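
The cells in this guide use top-level `await`, which works in notebooks. In a plain Python script, the same calls would need to run inside an event loop. The sketch below shows that pattern with `asyncio.run`; `fake_asimilarity_search` is a hypothetical stand-in that does naive keyword matching, not part of the library.

```python
import asyncio
from typing import List


# Hypothetical stand-in for an async store method such as `asimilarity_search`.
async def fake_asimilarity_search(query: str, k: int = 1) -> List[str]:
    corpus = [
        "A young farm boy, Luke Skywalker, is thrust into a galactic conflict.",
        "A hobbit named Frodo Baggins must destroy a powerful ring.",
    ]
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    hits = [text for text in corpus if query.lower() in text.lower()]
    return hits[:k]


async def main() -> List[str]:
    # Inside a coroutine, `await` works just as it does in a notebook cell.
    return await fake_asimilarity_search("powerful ring", k=1)


# In a script, asyncio.run() starts an event loop and runs main() to completion.
results = asyncio.run(main())
print(results[0])
```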

### Add Documents
You can add documents with pre-defined IDs. If a `Document` is added without an `id` attribute, the vector store will automatically generate a `uuid4` string for it.

In [None]:
from langchain_core.documents import Document

docs = [
    Document(
        page_content="A young farm boy, Luke Skywalker, is thrust into a galactic conflict.",
        id="doc_1",
        metadata={
            "author": "George Lucas",
            "year": 1977,
            "category": "sci-fi",
            "rating": 4.8,
        },
    ),
    Document(
        page_content="A hobbit named Frodo Baggins must destroy a powerful ring.",
        id="doc_2",
        metadata={
            "author": "J.R.R. Tolkien",
            "year": 1954,
            "category": "fantasy",
            "rating": 4.9,
        },
    ),
    # Document without a pre-defined ID
    Document(
        page_content="A group of children confront an evil entity emerging from the sewers.",
        metadata={"author": "Stephen King", "year": 1986, "category": "horror"},
    ),
]

added_ids = await vector_store.aadd_documents(docs)
print(f"Added documents with IDs: {added_ids}")
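
The ID handling described above can be sketched in a few lines. This is an illustration of the described behavior, not the library's code: `SimpleDoc` and `resolve_ids` are hypothetical names, and `SimpleDoc` stands in for `Document` so the snippet runs without extra packages.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    id: Optional[str] = None


def resolve_ids(docs: List[SimpleDoc]) -> List[str]:
    """Keep pre-defined IDs and generate a uuid4 string for the rest."""
    return [doc.id if doc.id is not None else str(uuid.uuid4()) for doc in docs]


sample_docs = [
    SimpleDoc("A hobbit named Frodo Baggins must destroy a powerful ring.", id="doc_2"),
    SimpleDoc("A group of children confront an evil entity emerging from the sewers."),
]
ids = resolve_ids(sample_docs)
print(ids)
```

Generated IDs are random, so a document added without an `id` gets a different key on every run. Keep the returned list if you need to delete or update those documents later.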

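### Delete Documents
Delete documents by passing their IDs to `adelete`. After this cell runs, `doc_1` is no longer returned by the searches below.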

In [None]:
await vector_store.adelete(ids=["doc_1"])

### Similarity Search
This is the most common operation: finding documents similar to a query.

In [None]:
results = await vector_store.asimilarity_search("a powerful ring", k=1)
print(results[0].page_content)

#### Search with Score
You can also retrieve the distance score along with the documents.

In [None]:
results_with_scores = await vector_store.asimilarity_search_with_score(
    "an evil entity", k=1
)
for doc, score in results_with_scores:
    print(f"[SCORE: {score:.4f}] {doc.page_content}")
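
To make the idea of a distance score concrete, here is a minimal sketch of cosine distance in plain Python. Which metric the store actually uses is not stated in this guide, so treat cosine distance as an assumption; `cosine_distance` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
candidates = {
    "close": [0.9, 0.1],  # nearly the same direction as the query
    "far": [0.0, 1.0],    # orthogonal to the query
}

# Sort candidates by ascending distance: the smallest distance is the best match.
ranked = sorted(
    ((name, cosine_distance(query, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
)
for name, score in ranked:
    print(f"[SCORE: {score:.4f}] {name}")
```

With a distance metric, lower scores indicate closer matches, which is the opposite of a similarity score.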

#### Search with Filters
Bigtable supports pre-filtering, which narrows the search space before similarity is computed. All filter logic is passed through a `QueryParameters` object. The table below lists the supported filters.

| Filter Category | Key / Operator | Meaning |
|---|---|---|
| **Row Key** | `RowKeyFilter` | Narrows search to document IDs with a specific prefix. |
| **Metadata Key** | `ColumnQualifiers` | Checks for the presence of one or more exact metadata keys. |
| | `ColumnQualifierPrefix` | Checks if a metadata key starts with a given prefix. |
| | `ColumnQualifierRegex` | Checks if a metadata key matches a regular expression. |
| **Metadata Value** | `ColumnValueFilter` | Container for all value-based conditions. |
| | `==`, `!=`, `>`, `<`, `>=`, `<=` | Standard comparison operators. |
| | `in`, `nin` | List membership checks. |
| | `contains`, `like` | String matching (substring and regex). |
| **Logical** | `ColumnValueChainFilter` | Logical AND for combining value conditions. |
| | `ColumnValueUnionFilter` | Logical OR for combining value conditions. |

This example combines multiple filters: it matches documents whose ID starts with `doc_`, whose metadata contains a `rating` key, and that are either in the `fantasy` category or written by `George Lucas`.

In [None]:
from langchain_google_bigtable.vector_store import QueryParameters

complex_filter = {
    "RowKeyFilter": "doc_",
    "ColumnQualifiers": ["rating"],
    "ColumnValueFilter": {
        "ColumnValueUnionFilter": {  # OR
            "category": {"==": "fantasy"},
            "author": {"==": "George Lucas"},
        }
    },
}

query_params_complex = QueryParameters(filters=complex_filter)

complex_results = await vector_store.asimilarity_search(
    "a story about a hero's journey", k=5, query_parameters=query_params_complex
)

for doc in complex_results:
    print(f"- ID: {doc.id}, Metadata: {doc.metadata}")
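
To see how the pieces of such a filter combine, the sketch below evaluates the same dictionary shape against in-memory metadata. It is a hypothetical evaluator (`OPS`, `value_matches`, and `matches` are illustrative names), not how Bigtable or the library applies filters. It assumes that every key listed in `ColumnQualifiers` must be present, and it runs on sample records independent of the table's state.

```python
import operator
import re
from typing import Any, Dict

# Operators from the filter table, mapped to plain Python functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "nin": lambda value, options: value not in options,
    "contains": lambda value, needle: needle in value,
    "like": lambda value, pattern: re.search(pattern, value) is not None,
}


def value_matches(metadata: Dict[str, Any], conditions: Dict[str, Any]) -> bool:
    """Evaluate a ColumnValueFilter-style dict; sibling entries are ANDed."""
    results = []
    for key, cond in conditions.items():
        if key == "ColumnValueUnionFilter":  # OR over the nested conditions
            results.append(
                any(value_matches(metadata, {k: v}) for k, v in cond.items())
            )
        elif key == "ColumnValueChainFilter":  # AND over the nested conditions
            results.append(value_matches(metadata, cond))
        else:  # a metadata key with one or more operator conditions
            results.append(
                key in metadata
                and all(OPS[op](metadata[key], want) for op, want in cond.items())
            )
    return all(results)


def matches(doc_id: str, metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Apply the row key prefix, required keys, then the value conditions."""
    if not doc_id.startswith(filters.get("RowKeyFilter", "")):
        return False
    if any(key not in metadata for key in filters.get("ColumnQualifiers", [])):
        return False
    return value_matches(metadata, filters.get("ColumnValueFilter", {}))


sample_filter = {
    "RowKeyFilter": "doc_",
    "ColumnQualifiers": ["rating"],
    "ColumnValueFilter": {
        "ColumnValueUnionFilter": {
            "category": {"==": "fantasy"},
            "author": {"==": "George Lucas"},
        }
    },
}
records = {
    "doc_1": {"author": "George Lucas", "year": 1977, "category": "sci-fi", "rating": 4.8},
    "doc_2": {"author": "J.R.R. Tolkien", "year": 1954, "category": "fantasy", "rating": 4.9},
    "generated-id": {"author": "Stephen King", "year": 1986, "category": "horror"},
}
matched = [doc_id for doc_id, md in records.items() if matches(doc_id, md, sample_filter)]
print(matched)
```

The third record is rejected twice over: its ID lacks the `doc_` prefix and its metadata has no `rating` key.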

### As a Retriever
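Call `as_retriever` to use the vector store as a retriever, for example in RAG applications. Because `doc_1` was deleted above, the query below returns the closest remaining document.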

In [None]:
retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retrieved_docs = await retriever.ainvoke("a farm boy")
print(retrieved_docs[0].page_content)
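
In a RAG application, the retrieved documents are typically placed into the prompt sent to a language model. The sketch below shows one simple way to do that. `RetrievedDoc` and `build_prompt` are hypothetical names, the prompt wording is only an example, and `RetrievedDoc` stands in for `Document` so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved LangChain Document."""

    page_content: str


def build_prompt(question: str, docs: List[RetrievedDoc]) -> str:
    """Join retrieved passages into a context block ahead of the question."""
    context = "\n".join(f"- {doc.page_content}" for doc in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


retrieved = [
    RetrievedDoc("A hobbit named Frodo Baggins must destroy a powerful ring."),
]
prompt = build_prompt("Who must destroy the ring?", retrieved)
print(prompt)
```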

## API reference

For full details on the `BigtableVectorStore` class, see the API reference documentation.