# Cloud SQL for PostgreSQL

> [Cloud SQL](https://cloud.google.com/sql/docs/postgres) is a fully managed relational database service for PostgreSQL. This frees you from database administration tasks so that you have more time to manage your data.

This tutorial shows how to build an end-to-end data and embedding management system in LangChain and run scalable semantic search in Cloud SQL for PostgreSQL.

## Getting started


### Install the library

In [None]:
%pip install --upgrade --quiet  langchain-google-cloud-sql-pg langchain langchain-google-vertexai

**Colab only:** Uncomment the following cell to restart the kernel, or use the restart button. In Vertex AI Workbench, you can restart the terminal using the button at the top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

* [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/apis)
* [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy#create-instance)
* [Add an IAM user to the database.](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users#creating-a-database-user)

#### Set your project ID

If you don't know your project ID, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @title Project { display-mode: "form" }
PROJECT_ID = ""  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Set database values

Find your database values on the [Cloud SQL Instances page](https://console.cloud.google.com/sql).

In [None]:
# @title Database and Table { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
INSTANCE_ID = "langchain-instance"  # @param {type: "string"}
DATABASE_ID = "vectorstore"  # @param {type: "string"}
TABLE_NAME = "doc_and_vectors"  # @param {type: "string"}
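
Cloud SQL identifies an instance by its project, region, and instance ID, usually written as a single connection name of the form `PROJECT:REGION:INSTANCE`. The sketch below shows how the values above fit together. The helper `instance_connection_name` is hypothetical (it is not part of `langchain-google-cloud-sql-pg`), and the example values are placeholders.

```python
# Hypothetical helper for illustration; not part of langchain-google-cloud-sql-pg.
def instance_connection_name(project_id: str, region: str, instance_id: str) -> str:
    """Build the PROJECT:REGION:INSTANCE name that identifies a Cloud SQL instance."""
    parts = {"project_id": project_id, "region": region, "instance_id": instance_id}
    # Catch empty form fields early instead of failing later at connection time.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing values: {', '.join(missing)}")
    return f"{project_id}:{region}:{instance_id}"


# Placeholder values; substitute your own project, region, and instance.
name = instance_connection_name("my-project", "us-central1", "langchain-instance")
print(name)
```

Checking for empty values here is a cheap way to notice a form field that was left blank before attempting to connect.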

### Authenticating your notebook environment

- If you are using **Colab** to run this notebook, run the cell below and continue.
- If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth as google_auth

google_auth.authenticate_user()

## Demo: CloudSQLVectorStore

### Create an embedding class instance

You may need to enable the Vertex AI API in your project by running
`gcloud services enable aiplatform.googleapis.com --project {PROJECT_ID}`
(replace `{PROJECT_ID}` with your project ID).

You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).

In [None]:
from langchain_google_vertexai import VertexAIEmbeddings

embedding = VertexAIEmbeddings(
    model_name="textembedding-gecko@latest", project=PROJECT_ID
)

### Create PostgreSQLEngine to connect to the database

In [None]:
from langchain_google_cloud_sql_pg import PostgreSQLEngine

# CloudSQLVectorStore requires an engine created using the PostgreSQLEngine class
engine = PostgreSQLEngine.from_instance(
    project_id=PROJECT_ID,
    region=REGION,
    instance=INSTANCE_ID,
    database=DATABASE_ID,
)

### Option A. Create table and initialize CloudSQLVectorStore

In [None]:
from langchain_google_cloud_sql_pg import CloudSQLVectorStore

# Create a table
engine.init_vectorstore_table(
    table_name=TABLE_NAME,
    vector_size=768, # VertexAI model: textembedding-gecko@latest
)
# Init vectorstore
store = CloudSQLVectorStore(
    engine=engine, table_name=TABLE_NAME, embedding_service=embedding
)
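
The `vector_size` of the table has to match the length of the vectors the embedding model produces (768 for the model used here). A minimal sketch of a check you could run before inserting data follows. The helper `check_embedding_size` is hypothetical, and the sample vector is a stand-in for a real embedding such as the result of `embedding.embed_query(...)`.

```python
EXPECTED_VECTOR_SIZE = 768  # same value as vector_size passed to init_vectorstore_table


# Hypothetical helper for illustration; not part of the library.
def check_embedding_size(vector, expected=EXPECTED_VECTOR_SIZE):
    """Raise ValueError if the vector length differs from the table's vector size."""
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, expected {expected}"
        )
    return vector


# Stand-in vector; in the notebook this would come from the embedding model.
sample = [0.0] * 768
checked = check_embedding_size(sample)
print(len(checked))
```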

### Option B. Create a custom table

In [None]:
from langchain_google_cloud_sql_pg import Column

engine.init_vectorstore_table(
    table_name=TABLE_NAME,
    vector_size=768, # VertexAI model: textembedding-gecko@latest
    id_column="uuid",
    content_column="documents",
    embedding_column="vectors",
    metadata_columns=[
        Column("page", "INTEGER"),
        Column("source", "TEXT")
    ],
)

### Option C. Reconnect to a Vector Store

In [None]:
from langchain_google_cloud_sql_pg import CloudSQLVectorStore

# Reconnect to the custom table created in Option B;
# the column names must match the ones used when the table was created
store = CloudSQLVectorStore(
    engine=engine,
    table_name=TABLE_NAME,
    embedding_service=embedding,
    id_column="uuid",
    content_column="documents",
    embedding_column="vectors",
    metadata_columns=["page", "source"],  # Add custom metadata columns
)

### Add texts

In [None]:
import uuid

all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in all_texts]
ids = [str(uuid.uuid4()) for _ in all_texts]

store.add_texts(all_texts, metadatas=metadatas, ids=ids)
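
The three lists passed to `add_texts` are parallel: position `i` in each list describes the same document. The sketch below uses only the standard library and the sample data from the cell above to make that alignment explicit. It does not touch the database.

```python
import uuid

all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in all_texts]
ids = [str(uuid.uuid4()) for _ in all_texts]

# Pair each ID with its text and metadata to show how the lists line up.
rows = list(zip(ids, all_texts, metadatas))

for doc_id, text, metadata in rows:
    print(doc_id, text, metadata)
```

Keeping the generated `ids` around is what makes the later delete step possible, since documents are removed by ID.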

### Delete texts

In [None]:
store.delete([ids[1]])

### Search for documents

In [None]:
query = "I'd like a fruit."
docs = store.similarity_search(query)
print(docs)

### Search for documents by vector

In [None]:
query_vector = embedding.embed_query(query)
docs = store.similarity_search_by_vector(query_vector, k=2)
print(docs)
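
Conceptually, a search by vector ranks the stored embeddings by their similarity to the query vector and returns the top `k`. The following is a minimal in-memory sketch of that idea, assuming cosine similarity as the measure. The three-dimensional toy vectors and the helper names are invented for illustration, and the store runs its search inside PostgreSQL rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Toy vectors, invented for this example (real embeddings have 768 dimensions here).
toy_vectors = {
    "Pineapple": [0.9, 0.1, 0.0],
    "Banana": [0.8, 0.2, 0.1],
    "Train": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```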

### Search for documents with metadata filter

In [None]:
# Restrict results to documents whose `len` metadata is at least 6; this excludes "Train" (5 characters).
docs = store.similarity_search_by_vector(query_vector, filter="len >= 6")
print(docs)
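
To see which of the sample texts can satisfy the `len >= 6` condition, the check can be reproduced in plain Python. This sketch only evaluates the condition on the sample data, taking into account the document deleted earlier. It says nothing about how the results are ranked or how many are returned.

```python
all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]

# "Cars and airplanes" was removed earlier with store.delete([ids[1]]).
deleted = {all_texts[1]}
remaining = [t for t in all_texts if t not in deleted]

# Same condition as the filter string, applied to the stored "len" metadata.
matches = [t for t in remaining if len(t) >= 6]
print(matches)
```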

## Add an index

In [None]:
from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex

index = IVFFlatIndex()
store.apply_vector_index(index)
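
An IVFFlat index speeds up search by grouping vectors into lists around centroids and scanning only the lists closest to the query. The sketch below illustrates that idea in plain Python. It is a conceptual toy, not the index implementation used by the database: the centroids are fixed by hand (a real index learns them from the data), and all names and vectors are invented for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vector, centroids[i]))


def build_lists(vectors, centroids):
    """Assign every vector to the list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for name, vec in vectors.items():
        lists[nearest_index(vec, centroids)].append(name)
    return lists


def search(query, vectors, centroids, lists, probes=1, k=1):
    """Scan only the `probes` closest lists, then rank their members exactly."""
    order = sorted(range(len(centroids)), key=lambda i: squared_distance(query, centroids[i]))
    candidates = [name for i in order[:probes] for name in lists[i]]
    candidates.sort(key=lambda name: squared_distance(query, vectors[name]))
    return candidates[:k]


# Toy data, invented for this example.
toy_vectors = {
    "Pineapple": [0.9, 0.1],
    "Banana": [0.8, 0.2],
    "Train": [0.1, 0.9],
    "Cars": [0.2, 0.8],
}
centroids = [[0.85, 0.15], [0.15, 0.85]]  # fixed by hand for the sketch

lists = build_lists(toy_vectors, centroids)
result = search([1.0, 0.0], toy_vectors, centroids, lists, probes=1, k=2)
print(lists)
print(result)
```

Scanning fewer lists is faster but can miss neighbors that fall into an unscanned list, which is why this kind of index trades some recall for speed.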

### Re-index

In [None]:
store.reindex() # Re-index using default index name

### Remove an index

In [None]:
store.drop_index()