# CloudSQLVectorStore
> **CloudSQLVectorStore** lets you create vector stores in a Cloud SQL for PostgreSQL database. It also supports semantic search, using vector indexes for fast approximate results or brute force for exact results.


This tutorial shows how to build an end-to-end data and embedding management system in LangChain and how to run scalable semantic search in Cloud SQL for PostgreSQL.

### Pre-requisites

### Install the library

In [None]:
! pip install langchain langchain-community google-cloud google-cloud-aiplatform asyncio asyncpg --upgrade --user
! pip install "cloud-sql-python-connector[asyncpg]"

Collecting langchain
  Downloading langchain-0.1.5-py3-none-any.whl (806 kB)
Collecting langchain-community
  Downloading langchain_community-0.0.18-py3-none-any.whl (1.6 MB)
Collecting google-cloud
  Downloading google_cloud-0.34.0-py2.py3-none-any.whl (1.8 kB)
Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.40.0-py2.py3-none-any.whl (3.4 MB)
Collecting asyncio
  Downloading asyncio-3.4.3-py3-none-any.whl (101 kB)
Collecting asyncpg
  Downloading asyncpg-0.29.

Collecting cloud-sql-python-connector[asyncpg]
  Downloading cloud_sql_python_connector-1.6.0-py2.py3-none-any.whl (35 kB)
Installing collected packages: cloud-sql-python-connector
Successfully installed cloud-sql-python-connector-1.6.0


**Colab only:** Uncomment the following cell to restart the kernel, or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

### Note

If you do not have a GCP project, follow the link below to create one.

[Create a Google Cloud project](https://developers.google.com/workspace/guides/create-project)


#### Set your project ID

If you don't know your project ID, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @title Project { display-mode: "form" }
PROJECT_ID = "gcp_project_id"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

Updated property [core/project].


#### Set the region

You can also change the `REGION` variable used by CloudSQL Postgres. Learn more about [CloudSQL Postgres regions](https://cloud.google.com/sql/docs/postgres/locations).

In [None]:
# @title Region { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}  # Cloud SQL instances live in a specific region, e.g. us-central1

#### Set the instance, database, and table names

Together they identify your Cloud SQL for PostgreSQL vector store.

In [1]:
# @title Instance, Database, and Table { display-mode: "form" }
INSTANCE = "my_cloudsql_instance" # @param {type: "string"}
DATABASE = "my_langchain_database"  # @param {type: "string"}
TABLE = "doc_and_vectors"  # @param {type: "string"}

### Pre-requisites for connecting to the Cloud SQL instance

To connect to the PostgreSQL instance, set up the Cloud SQL Auth Proxy and make sure the IAM users who need access have been added to the list of users allowed to connect to the instance.

See the [Cloud SQL Auth Proxy repository](https://github.com/GoogleCloudPlatform/cloud-sql-proxy) to set up the proxy.

See the [Cloud SQL users documentation](https://cloud.google.com/sql/docs/postgres/users?_ga=2.165429503.-1722697531.1694071937) to add users to the instance.

### Authenticating your notebook environment

- If you are using **Colab** to run this notebook, run the cell below and continue.
- If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth as google_auth

google_auth.authenticate_user()

## Demo: CloudSQL Postgres VectorSearch

### Create an embedding class instance

---



You may need to enable Vertex AI API in your project by running
`gcloud services enable aiplatform.googleapis.com --project {PROJECT_ID}`
(replace `{PROJECT_ID}` with the ID of your project).

You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).

In [None]:
# Import the vector store, engine, and index classes used in this demo
from langchain_community.vectorstores.cloudSQL import (
    CloudSQLEngine,
    CloudSQLVectorStore,
    HNSWIndex,
)

In [None]:
from langchain_community.embeddings import VertexAIEmbeddings

embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

### Create CloudSQLEngine to connect to the database

In [None]:
# CloudSQLVectorStore requires an engine created using the CloudSQLEngine class.
# REGION, INSTANCE, and DATABASE are the values set earlier in this notebook.
engine = CloudSQLEngine.from_instance(
    region=REGION,
    instance=INSTANCE,
    database=DATABASE,
)
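
Cloud SQL connectors and the Auth Proxy usually identify an instance by its instance connection name, which has the form `project:region:instance`. The following is a minimal sketch of how the values set earlier combine into that name. The helper `instance_connection_name` is hypothetical and not part of any library, and whether `CloudSQLEngine.from_instance` builds the name this way internally is an assumption.

```python
def instance_connection_name(project_id: str, region: str, instance: str) -> str:
    """Build a Cloud SQL instance connection name: project:region:instance.

    Hypothetical helper for illustration only.
    """
    parts = [project_id, region, instance]
    if not all(parts):
        raise ValueError("project_id, region, and instance must all be non-empty")
    return ":".join(parts)


# Placeholder values; substitute your own project, region, and instance.
print(instance_connection_name("my-project", "us-central1", "my-instance"))
```

Validating the three parts up front gives a clearer error than a failed connection attempt later.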

### Create CloudSQLVectorStore to create a table

In [None]:
# Create a basic CloudSQLVectorStore object (TABLE is the table name set earlier)
db = CloudSQLVectorStore(
    engine=engine,
    table_name=TABLE,
    embedding_service=embedding,
)

# Alternatively, create a non-default vector store object by setting any of the following args:
# vector_size - Defaults to 768. Set it to the dimension of your embedding vectors.
# content_column - Defaults to 'content'. Can be set to any column name.
# embedding_column - Defaults to 'embedding'. Can be set to any column name.
# metadata_columns - Defaults to 'metadata'. Can be set to a column name or a list of column names.
# ignore_metadata_columns - Defaults to None. Can be set to a column name or a list of column names.
# index_query_options - Defaults to None. Can be set using HNSWIndex.QueryOptions() or IVFFlatIndex.QueryOptions().
# index - Defaults to an HNSWIndex object. Can be set to an IVFFlatIndex object or a BruteForce object.
# distance_strategy - Defaults to 'L2'. Can be set to 'INNER PRODUCT' or 'COSINE'.
# overwrite_existing - Defaults to False. Set it to True if the table needs to be overwritten.
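
To make the three `distance_strategy` options concrete, here is a minimal pure-Python sketch of what each measure computes. It illustrates the math only and is not how the store evaluates distances (that happens inside the database); the function names are made up for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner (dot) product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity: smaller means more similar."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Two orthogonal unit vectors as a toy example.
a, b = [1.0, 0.0], [0.0, 1.0]
print(l2_distance(a, b))      # sqrt(2), about 1.414
print(inner_product(a, b))    # 0.0
print(cosine_distance(a, b))  # 1.0
```

Note that L2 and cosine are distances (lower is better), while the inner product is a similarity (higher is better), so rankings have to be sorted in opposite directions.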

### Add texts
Add texts, along with optional per-text metadata, to the table.

In [None]:
texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in texts]
# aadd_texts is the async counterpart of add_texts in LangChain's VectorStore interface
await db.aadd_texts(texts=texts, metadatas=metadatas)

### Search for documents
By default, similar documents are queried using the L2 distance strategy.

In [None]:
query = "I'd like a fruit."
docs = await db.asimilarity_search(query)
print(docs)

### Search for documents by vector
Search for similar documents by passing an embedding vector directly instead of a query string.



In [None]:
query_vector = embedding.embed_query(query)
docs = await db.asimilarity_search_by_vector(query_vector, k=2)
print(docs)
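
For intuition about what an exact (brute-force) top-`k` search does, the sketch below compares the query vector against every stored vector and keeps the `k` closest by L2 distance. It is a toy in-memory illustration with made-up 2-dimensional vectors, not the store's implementation; `brute_force_top_k` is a hypothetical name.

```python
import math


def brute_force_top_k(query_vec, vectors, k=2):
    """Return indices of the k vectors closest to query_vec by L2 distance."""
    distances = [
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(vectors)
    ]
    distances.sort()  # exact: every vector is compared against the query
    return [idx for _, idx in distances[:k]]


# Toy 2-dimensional "embeddings" (made-up values for illustration only).
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(brute_force_top_k([1.0, 1.0], toy_vectors, k=2))  # [1, 3]
```

The cost grows linearly with the number of stored vectors, which is why approximate indexes are used for large tables.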

### Search for documents with metadata filter
Pass a `filter` to restrict the results to documents whose metadata matches the given values.

In [None]:
# This should return only the "Banana" document, the only text of length 6.
docs = await db.asimilarity_search_by_vector(query_vector, filter={"len": 6})
print(docs)
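
The effect of the `{"len": 6}` filter can be checked locally against the sample data. This sketch assumes exact-match semantics (every key in the filter must equal the document's metadata value); the store applies its filter in the database, so `matches` is only a hypothetical stand-in.

```python
texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in texts]


def matches(metadata: dict, flt: dict) -> bool:
    """True if every key/value pair in the filter equals the metadata value."""
    return all(metadata.get(key) == value for key, value in flt.items())


filtered = [t for t, m in zip(texts, metadatas) if matches(m, {"len": 6})]
print(filtered)  # ['Banana']
```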

### Maximal marginal relevance search (MMR)
Maximal marginal relevance optimizes for both similarity to the query and diversity among the selected documents.



In [None]:
# This should return the top 4 documents, balancing relevance to the query against diversity.
docs = await db.amax_marginal_relevance_search(query)
print(docs)
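
The trade-off between relevance and diversity can be sketched as a greedy selection loop. The version below is a generic textbook-style MMR over cosine similarity, written for illustration; the parameter name `lambda_mult` and the scoring formula are assumptions here and may differ from what the store does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, candidates, k=4, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns indices into candidates.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for idx in remaining:
            relevance = cosine_similarity(query_vec, candidates[idx])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine_similarity(candidates[idx], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
cands = [[1.0, 0.1], [1.0, 0.11], [0.6, 0.8]]
print(mmr([1.0, 0.0], cands, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr([1.0, 0.0], cands, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult=1.0` the near-duplicate is picked second; lowering it to `0.3` penalizes redundancy enough that the more distinct candidate wins.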

### Indexing
Set a custom index or rebuild an existing one.

In [None]:
# Returns None once the index has been created or rebuilt.
index = HNSWIndex()
await db.areindex(index)
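
For readers curious about what an HNSW index looks like at the SQL level, the sketch below builds a `CREATE INDEX` statement in the style of the pgvector extension. This assumes the store is backed by pgvector, which the notebook does not state, and it does not claim to be the SQL that `areindex` runs. `hnsw_index_sql` is a hypothetical helper, and the `m` and `ef_construction` values shown are pgvector's documented defaults.

```python
# Operator classes defined by the pgvector extension for each distance type.
PGVECTOR_OPS = {
    "L2": "vector_l2_ops",
    "INNER PRODUCT": "vector_ip_ops",
    "COSINE": "vector_cosine_ops",
}


def hnsw_index_sql(table, column="embedding", distance="L2", m=16, ef_construction=64):
    """Build a pgvector-style CREATE INDEX statement (hypothetical helper)."""
    ops = PGVECTOR_OPS[distance]  # raises KeyError for an unknown distance
    return (
        f'CREATE INDEX ON "{table}" USING hnsw ("{column}" {ops}) '
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


print(hnsw_index_sql("doc_and_vectors"))
```

The operator class must match the distance used at query time, otherwise the planner cannot use the index for that query.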