## Vector Search

In [None]:
!pip install langchain-google-vertexai langchain-google-community

In [None]:
import os

## Embeddings

The `Embeddings` class provides a standardized interface for embedding models.
Its key methods are `embed_query`, which embeds a single string, and `embed_documents`, which embeds a list of strings.

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings

model_name = "<embedding_model_name>"
embedding_model = VertexAIEmbeddings(model_name=model_name)

single_embedding = embedding_model.embed_query("User query")
multiple_embeddings = embedding_model.embed_documents([
	"Sample text 1",
	"Sample text 2",
	"Sample text 3",
])

The `model_name` argument specifies which version of the Vertex AI embedding model to use.

For the most up-to-date information on available versions and their capabilities, please
refer to the official Vertex AI embedding model [documentation page](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
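To make the two-method contract concrete without calling Vertex AI, the sketch below implements a toy, hypothetical `HashingEmbeddings` class that exposes `embed_query` and `embed_documents` using feature hashing (a hashed bag of words). It only illustrates the shape of the interface. It is not how Vertex AI computes embeddings, and it deliberately does not subclass LangChain's base class, so it runs with the standard library alone.

```python
import hashlib
import math
from typing import List


class HashingEmbeddings:
    """Hypothetical toy embedder exposing the same two methods as LangChain embeddings."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Stable hash (unlike built-in hash(), which is randomized per process).
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        # L2-normalize so that a dot product equals cosine similarity.
        return [x / norm for x in vec]

    def embed_query(self, text: str) -> List[float]:
        # One string in, one vector out.
        return self._embed(text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # A batch of strings in, one vector per string out (same order).
        return [self._embed(t) for t in texts]


emb = HashingEmbeddings(dim=16)
single = emb.embed_query("User query")
many = emb.embed_documents(["Sample text 1", "Sample text 2", "Sample text 3"])
print(len(single), len(many))  # 16 3
```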

## VectorStores

The `VectorStore` class abstracts the entire vector search process at query time,
providing a `similarity_search` method that accepts a query and returns a list of
the most similar documents in the index. Optionally, you can add the parameter `k` to
specify how many similar documents you wish to retrieve.

In [None]:
from langchain_google_vertexai.vectorstores import VectorSearchVectorStore

embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@latest")

vector_store = VectorSearchVectorStore.from_components(
        project_id=os.environ["PROJECT_ID"],
        region=os.environ["REGION"],
        gcs_bucket_name=os.environ["GCS_BUCKET_NAME"],
        index_id=os.environ["INDEX_ID"],
        endpoint_id=os.environ["ENDPOINT_ID"],
        embedding=embeddings
)

documents = vector_store.similarity_search("user_query", k=2)
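Conceptually, `similarity_search(query, k)` embeds the query and returns the `k` stored documents whose vectors score highest against it. The sketch below does this with an exhaustive cosine-similarity scan over an in-memory list. The `brute_force_search` helper and the hand-made 2-D vectors are hypothetical; Vector Search itself relies on approximate nearest-neighbor indexes rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def brute_force_search(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 4,  # LangChain's similarity_search also defaults to k=4
) -> List[str]:
    """Hypothetical helper: return the texts of the k most similar vectors."""
    scored = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


corpus = [
    ("doc about cats", [1.0, 0.0]),
    ("doc about dogs", [0.0, 1.0]),
    ("doc about pets", [0.7, 0.7]),
]
print(brute_force_search([0.9, 0.1], corpus, k=2))  # ['doc about cats', 'doc about pets']
```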

`VectorStore` instances can be converted into a standard retriever by invoking the
`as_retriever` method, enabling the use of the basic retriever interface.

In [None]:
# Fix k when building the retriever; invoke() replaces the deprecated get_relevant_documents()
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
documents = retriever.invoke("user_query")
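A retriever is essentially a thin adapter that fixes the search parameters and exposes a single query-in, documents-out call. The hypothetical `SimpleRetriever` below mimics that pattern in plain Python: its `search_kwargs` dictionary plays the same role as the one accepted by `as_retriever`, and `invoke` forwards to a search function. This is a sketch of the idea, not LangChain's `VectorStoreRetriever`; `fake_similarity_search` is a stand-in for a real store.

```python
from typing import Any, Callable, Dict, List, Optional


class SimpleRetriever:
    """Hypothetical adapter: binds search kwargs to a similarity-search function."""

    def __init__(
        self,
        search_fn: Callable[..., List[str]],
        search_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.search_fn = search_fn
        self.search_kwargs = dict(search_kwargs or {})

    def invoke(self, query: str) -> List[str]:
        # Every call reuses the kwargs fixed at construction time (e.g. k).
        return self.search_fn(query, **self.search_kwargs)


def fake_similarity_search(query: str, k: int = 4) -> List[str]:
    # Stand-in for vector_store.similarity_search: returns k placeholder hits.
    hits = [f"hit {i} for {query!r}" for i in range(10)]
    return hits[:k]


toy_retriever = SimpleRetriever(fake_similarity_search, search_kwargs={"k": 2})
print(toy_retriever.invoke("user_query"))
```

Keeping `k` in the retriever rather than in each call is what lets a retriever be passed around as a generic component: callers only ever supply the query.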

## VertexVectorSearch

LangChain offers two distinct `VectorStores` for integration with Vector Search, each
differentiated by the underlying Document Store they utilize.

- `VectorSearchVectorStore` uses Google Cloud Storage
- `VectorSearchVectorStoreDatastore` uses Datastore

In [None]:
from langchain_google_vertexai import (
    VectorSearchVectorStore, # GCS Document Store
    VectorSearchVectorStoreDatastore, # DataStore Document store
)
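Both classes divide the work the same way: the Vector Search index returns the IDs of the nearest vectors, and the document store (Google Cloud Storage or Datastore) maps each ID back to its text and metadata. The hypothetical `SplitStore` below sketches that division with two in-memory dictionaries and an exhaustive dot-product scan, using a made-up keyword "embedding". It illustrates the architecture only and is not how either LangChain class is implemented.

```python
import uuid
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class SplitStore:
    """Hypothetical sketch: a vector index plus a separate document store keyed by ID."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self.embed_fn = embed_fn
        self.index: Dict[str, List[float]] = {}          # ID -> vector (Vector Search role)
        self.docstore: Dict[str, Tuple[str, dict]] = {}  # ID -> (text, metadata) (GCS/Datastore role)

    def add_texts(
        self, texts: Sequence[str], metadatas: Optional[Sequence[dict]] = None
    ) -> List[str]:
        metadatas = metadatas if metadatas is not None else [{} for _ in texts]
        if len(metadatas) != len(texts):
            raise ValueError("texts and metadatas must have the same length")
        ids = []
        for text, meta in zip(texts, metadatas):
            doc_id = str(uuid.uuid4())
            self.index[doc_id] = self.embed_fn(text)
            self.docstore[doc_id] = (text, meta)
            ids.append(doc_id)
        return ids

    def similarity_search(self, query: str, k: int = 4) -> List[Tuple[str, dict]]:
        q = self.embed_fn(query)

        def score(doc_id: str) -> float:
            return sum(a * b for a, b in zip(q, self.index[doc_id]))

        # Step 1: the index alone yields the IDs of the best-scoring vectors.
        top_ids = sorted(self.index, key=score, reverse=True)[:k]
        # Step 2: the document store resolves those IDs to text and metadata.
        return [self.docstore[doc_id] for doc_id in top_ids]


def toy_embed(text: str) -> List[float]:
    # Hypothetical 3-D "embedding": one dimension per keyword.
    words = text.lower().split()
    return [float(w in words) for w in ("first", "second", "third")]


toy_store = SplitStore(toy_embed)
toy_store.add_texts(
    ["This is the first document", "This is the second document", "This is the third document"],
    metadatas=[{"page_number": 1}, {"page_number": 2}, {"page_number": 3}],
)
print(toy_store.similarity_search("first", k=1))
```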

We can construct a `VectorSearchVectorStoreDatastore` instance using the
following snippet:

In [None]:
vector_store = VectorSearchVectorStoreDatastore.from_components(
    project_id="my-project-id",
    region="my-region",
    index_id="my-index-name",
    endpoint_id="my-endpoint-name",
    embedding=embedding_model,
    stream_update=True,
)

Regardless of your chosen storage backend (Datastore or Google Cloud Storage), the 
methods for interacting with the vector store remain consistent.

In [None]:
texts = [
    "This is the first document",
    "This is the second document",
    "This is the third document"
]
vector_store.add_texts(texts=texts)

Optionally, if you anticipate using filtering later, you can enrich your documents with
metadata during the addition process. This metadata will be stored both within Vector
Search for efficient retrieval and in your chosen Document Store (either Datastore or
Google Cloud Storage) for further processing or analysis.

In [None]:
texts = [
    "This is the first document",
    "This is the second document",
    "This is the third document"
]

metadatas = [
    {"page_number": 1, "length": 10},
    {"page_number": 2, "length": 20},
    {"page_number": 3, "length": 5}
]

vector_store.add_texts(texts=texts, metadatas=metadatas)

We can use the `VectorStore` interface to perform similarity searches:

In [None]:
documents = vector_store.similarity_search("first", k=1)

The following code snippet shows how to perform a similarity search
using both numerical and string comparison filters:

In [None]:
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    Namespace,
    NumericNamespace,
)

filters = [Namespace(name="season", allow_tokens=["spring"])]
numeric_filters = [NumericNamespace(name="price", value_float=40.0, op="LESS")]

# Keep only documents tagged season=spring whose price is below 40.0
# (assumes the documents were indexed with these restricts)
vector_store.similarity_search(
    "shirt", k=5, filter=filters, numeric_filter=numeric_filters
)
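Conceptually, each `Namespace` acts as a token allowlist and each `NumericNamespace` as a comparison. As we understand the Vector Search filtering rules, any allowed token within a namespace is enough, and all filters must hold together. The plain-Python sketch below reproduces that matching logic for illustration only: the `passes_filters` helper and the sample datapoints are hypothetical, the real evaluation happens inside the service (which also supports deny lists, not shown), and treating a datapoint that lacks a filtered field as a non-match is an assumption.

```python
import operator
from typing import Dict, List, Tuple

# Comparison names as used by the `op` argument of NumericNamespace.
NUMERIC_OPS = {
    "LESS": operator.lt,
    "LESS_EQUAL": operator.le,
    "EQUAL": operator.eq,
    "GREATER_EQUAL": operator.ge,
    "GREATER": operator.gt,
    "NOT_EQUAL": operator.ne,
}


def passes_filters(
    point: Dict,
    token_filters: Dict[str, List[str]],
    numeric_filters: List[Tuple[str, str, float]],
) -> bool:
    """Hypothetical re-implementation of allowlist and numeric restrict matching."""
    for name, allow in token_filters.items():
        # Within one namespace, any allowed token is enough (OR).
        # Assumption: a point lacking the namespace does not match.
        if not set(point["tokens"].get(name, [])) & set(allow):
            return False
    for name, op, value in numeric_filters:
        # Assumption: a point lacking the numeric field does not match.
        if name not in point["numbers"] or not NUMERIC_OPS[op](point["numbers"][name], value):
            return False
    # Every filter held, i.e. filters are combined with AND.
    return True


points = [
    {"id": "a", "tokens": {"season": ["spring"]}, "numbers": {"price": 30.0}},
    {"id": "b", "tokens": {"season": ["spring", "summer"]}, "numbers": {"price": 35.0}},
    {"id": "c", "tokens": {"season": ["winter"]}, "numbers": {"price": 20.0}},
    {"id": "d", "tokens": {"season": ["spring"]}, "numbers": {"price": 55.0}},
]
season_filter = {"season": ["spring"]}
price_filter = [("price", "LESS", 40.0)]
print([p["id"] for p in points if passes_filters(p, season_filter, price_filter)])  # ['a', 'b']
```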

## CloudSQL

To leverage the LangChain integration with Cloud SQL, you'll need to install an
additional library alongside `langchain-google-vertexai`:

In [None]:
!pip install --upgrade langchain-google-cloud-sql-pg

The initial step involves constructing the engine object. This object defines the Google
Cloud project, region, instance, and database that the `VectorStore` will interact with.

In [None]:
from langchain_google_cloud_sql_pg import PostgresEngine

engine = PostgresEngine.from_instance(
    project_id="my-project-id", 
    region="my-region", 
    instance="my-instance-name", 
    database="my-database-name"
)


Using the engine object we just created, we can call the
`init_vectorstore_table` method to create the table that the `VectorStore`
needs in the database.

In [None]:
from langchain_google_cloud_sql_pg import Column

engine.init_vectorstore_table(
    table_name="my-table-name",
    vector_size=768,      
    metadata_columns=[Column("PRICE", "FLOAT")],
)


Once the engine and the table are in place, you can proceed to initialize the
`VectorStore`. As with other integrations, you will also need to build an embedding
model.

In [None]:
from langchain_google_cloud_sql_pg import PostgresVectorStore
from langchain_google_vertexai import VertexAIEmbeddings


embedding = VertexAIEmbeddings(
  model_name="textembedding-gecko@latest", 
  project="my-project-id"
)

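store = PostgresVectorStore.create_sync(
    engine=engine,
    table_name="my-table-name",
    embedding_service=embedding,
    # Map the "PRICE" metadata key to the PRICE column created above
    metadata_columns=["PRICE"],
)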

You can add documents using the `add_documents` or `add_texts` methods.
If you've created metadata columns, make sure to include their
values when adding documents.

In [None]:
import uuid

all_texts = ["Blue T-shirt", "Spring dress", "Black sunglasses"]
metadatas = [{"PRICE": 21.0}, {"PRICE": 23.0}, {"PRICE": 33.1}]
ids = [str(uuid.uuid4()) for _ in all_texts]

store.add_texts(all_texts, metadatas=metadatas, ids=ids)


To execute a similarity search, you can utilize the `similarity_search` method.

In [None]:
query = "I want glasses"
# PRICE was created as a quoted (case-sensitive) column, so quote it in the filter
docs = store.similarity_search(query, k=2, filter='"PRICE" <= 10')
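In this integration the `filter` string acts as a SQL condition over the table's columns, which is why values you want to filter on belong in dedicated metadata columns such as `PRICE`. The sketch below imitates that layout with Python's built-in `sqlite3` standing in for Cloud SQL for PostgreSQL and pgvector: it applies the filter as a SQL `WHERE` clause, then ranks the surviving rows by cosine similarity in Python. The `items` table, the toy 2-D vectors, and the `filtered_search` helper are hypothetical.

```python
import json
import math
import sqlite3
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


toy_conn = sqlite3.connect(":memory:")
# Hypothetical layout: content, embedding (JSON text), and one metadata column.
toy_conn.execute(
    'CREATE TABLE items (id TEXT PRIMARY KEY, content TEXT, embedding TEXT, "PRICE" REAL)'
)
toy_items = [
    ("1", "Blue T-shirt", [1.0, 0.0], 21.0),
    ("2", "Spring dress", [0.0, 1.0], 23.0),
    ("3", "Black sunglasses", [0.6, 0.8], 33.1),
]
toy_conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(i, c, json.dumps(e), p) for i, c, e, p in toy_items],
)


def filtered_search(query_vec: Sequence[float], k: int, where: str) -> List[str]:
    # The filter string is pasted into the WHERE clause (trusted input only:
    # it is not parameterized).
    cur = toy_conn.execute(f"SELECT content, embedding FROM items WHERE {where}")
    candidates = [(content, json.loads(emb)) for content, emb in cur]
    # Rank only the rows that survived the SQL filter.
    candidates.sort(key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [content for content, _ in candidates[:k]]


print(filtered_search([0.5, 0.9], k=2, where='"PRICE" <= 25'))  # ['Spring dress', 'Blue T-shirt']
```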


As a final note, to optimize query performance, this `VectorStore` also provides a method
for creating and applying a vector index to the table, as illustrated below:

In [None]:
from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex

idx = IVFFlatIndex()
store.apply_vector_index(idx)
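An IVFFlat index speeds up queries by partitioning vectors into lists around centroids and, at query time, scanning only the few lists whose centroids are closest to the query (pgvector's `probes` setting controls how many). The sketch below shows that idea in plain Python. To stay short it takes hand-picked centroids instead of training them with k-means as pgvector does, and the `IVFFlat` class is hypothetical, unrelated to the options of the library's `IVFFlatIndex`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class IVFFlat:
    """Hypothetical inverted-file index with exact ("flat") search inside each list."""

    def __init__(self, centroids: List[Vector]) -> None:
        self.centroids = centroids
        self.lists: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}

    def _nearest_lists(self, vec: Vector, n: int) -> List[int]:
        # Rank lists by the Euclidean distance from vec to their centroid.
        order = sorted(range(len(self.centroids)), key=lambda i: math.dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, item_id: str, vec: Vector) -> None:
        # Each vector lives in exactly one list: the one with the closest centroid.
        self.lists[self._nearest_lists(vec, 1)[0]].append((item_id, vec))

    def search(self, query: Vector, k: int, probes: int = 1) -> List[str]:
        # Only the `probes` closest lists are scanned; the others are skipped.
        candidates = [item for i in self._nearest_lists(query, probes) for item in self.lists[i]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [item_id for item_id, _ in candidates[:k]]


toy_index = IVFFlat(centroids=[[0.0, 0.0], [10.0, 10.0]])
for name, vec in {"a": [0.5, 0.0], "b": [1.0, 1.0], "c": [9.0, 9.5], "d": [4.8, 4.8]}.items():
    toy_index.add(name, vec)
print(toy_index.search([6.0, 6.0], k=1, probes=1))  # ['c']: the true nearest, d, sits in the unprobed list
print(toy_index.search([6.0, 6.0], k=1, probes=2))  # ['d']: probing every list gives the exact answer
```

The two calls show the trade-off behind the index: fewer probes scan less data but can miss the true nearest neighbor.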


## BigQuery

In [None]:
PROJECT = "my-project-id"
LOCATION = "europe-west1"
DATASET = "lcbook_dataset"
TABLE_NAME = "my-table-name"

First, we create a dataset to store the data, using the `google.cloud.bigquery` client.

In [None]:
from google.cloud import bigquery

client = bigquery.Client(project=PROJECT, location=LOCATION)
client.create_dataset(dataset=DATASET, exists_ok=True)

We must then initialize the specialized `VectorStore` class to use vector search in BigQuery.
It is available in the `langchain_google_community` library under the name `BigQueryVectorStore`.

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_google_community import BigQueryVectorStore

embedding = VertexAIEmbeddings(
  model_name="textembedding-gecko@latest", 
  project=PROJECT
)

store = BigQueryVectorStore(
    project_id=PROJECT,
    dataset_name=DATASET,
    table_name=TABLE_NAME,
    location=LOCATION,
    embedding=embedding,
)

The interface for adding texts to `BigQueryVectorStore` is consistent with other
`VectorStore` subclasses. However, it's important to note that metadata, unlike in the
Cloud SQL integration, can only be stored within the designated `metadata_field` as
a JSON object. This means you'll need to structure your metadata accordingly to
use it for filtered searches within BigQuery.

In [None]:
all_texts = [
    "Blue T-Shirt", 
    "Spring Dress", 
    "Black sunglasses", 
]
metadatas = [{"len": len(t)} for t in all_texts]

store.add_texts(all_texts, metadatas=metadatas)


While basic filtering is supported in this way, the `BigQueryVectorStore`
integration does not yet allow full SQL statement filtering. More complex filtering
operations may require the use of raw SQL.

In [None]:
docs = store.similarity_search("I want a dress", filter={"len": 12})
print(docs)
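Because metadata lives in a single JSON field, a dictionary filter such as `{"len": 12}` can be read as an equality check on keys inside that JSON object. The sketch below emulates this in plain Python with rows whose metadata is stored as a JSON string. The row layout and the `matches_filter` helper are hypothetical and say nothing about the SQL that `BigQueryVectorStore` actually generates; equality-per-key matching is an assumption of the sketch.

```python
import json
from typing import Dict, List

toy_texts = ["Blue T-Shirt", "Spring Dress", "Black sunglasses"]

# Each row keeps all of its metadata in one JSON-encoded field.
toy_rows = [{"content": t, "metadata": json.dumps({"len": len(t)})} for t in toy_texts]


def matches_filter(metadata_json: str, filter_dict: Dict) -> bool:
    """Hypothetical: every filter key must exist in the JSON and equal the given value."""
    metadata = json.loads(metadata_json)
    return all(key in metadata and metadata[key] == value for key, value in filter_dict.items())


def filter_rows(filter_dict: Dict) -> List[str]:
    return [row["content"] for row in toy_rows if matches_filter(row["metadata"], filter_dict)]


print(filter_rows({"len": 12}))  # ['Blue T-Shirt', 'Spring Dress']
```

With this data, "Blue T-Shirt" and "Spring Dress" are both 12 characters long, so the `{"len": 12}` filter keeps both of them, and it is left to the similarity ranking to order them.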