# Google AlloyDB for PostgreSQL

> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed, PostgreSQL-compatible database service for your most demanding enterprise workloads.
AlloyDB combines the best of Google with PostgreSQL for superior performance, scale, and availability. Extend your database application to build AI-powered
experiences leveraging AlloyDB's LangChain integrations.

This notebook goes over how to use AlloyDB Model Endpoint Management with LangChain through the `AlloyDBModelManager` and `AlloyDBEmbeddings` classes.

Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-alloydb-pg-python/).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-alloydb-pg-python/blob/main/docs/chat_message_history.ipynb)

## Before You Begin

To run this notebook, you will need to do the following:

 * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)
 * [Enable the AlloyDB API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)
 * [Create an AlloyDB instance](https://cloud.google.com/alloydb/docs/instance-primary-create)
 * [Create an AlloyDB database](https://cloud.google.com/alloydb/docs/database-create)
 * [Add an IAM database user to the database](https://cloud.google.com/alloydb/docs/manage-iam-authn) (Optional)
 * [Set up extension and authentication for Vertex AI](https://cloud.google.com/alloydb/docs/ai/model-endpoint-register-model)

If your database user is not a superuser, run the following command in psql or AlloyDB Studio, replacing `USER_NAME` with that user:

In [None]:
GRANT EXECUTE ON FUNCTION embedding TO USER_NAME;

### 🦜🔗 Library Installation
Install the integration library, `langchain-google-alloydb-pg`.

In [None]:
%pip install --upgrade --quiet langchain-google-alloydb-pg langchain-google-vertexai

**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel.
For Vertex AI Workbench you can restart the terminal using the button on top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

### 🔐 Authentication
Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.

* If you are using Colab to run this notebook, use the cell below and continue.
* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
from google.colab import auth

auth.authenticate_user()

### ☁ Set Your Google Cloud Project
Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.

If you don't know your project ID, try the following:

* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113).

In [None]:
# @title Project { display-mode: "form" }
PROJECT_ID = "gcp_project_id"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

## Basic Usage

### Set AlloyDB database values
Find your database values, in the [AlloyDB cluster page](https://console.cloud.google.com/alloydb?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687).

In [None]:
# @title Set Your Values Here { display-mode: "form" }
REGION = "us-central1"  # @param {type: "string"}
CLUSTER = "my-alloydb-cluster"  # @param {type: "string"}
INSTANCE = "my-alloydb-instance"  # @param {type: "string"}
DATABASE = "my-database"  # @param {type: "string"}
TABLE_NAME = "message_store"  # @param {type: "string"}

### AlloyDBEngine Connection Pool

AlloyDB Model Endpoint Management requires an `AlloyDBEngine` object as an argument. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.

To create an `AlloyDBEngine` using `AlloyDBEngine.from_instance()`, you need to provide only five things:

1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.
1. `region` : Region where the AlloyDB instance is located.
1. `cluster`: The name of the AlloyDB cluster.
1. `instance` : The name of the AlloyDB instance.
1. `database` : The name of the database to connect to on the AlloyDB instance.

By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.

Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:

* `user` : Database user to use for built-in database authentication and login.
* `password` : Database password to use for built-in database authentication and login.
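
The choice between the two authentication modes can be captured in a small helper. The sketch below is hypothetical (`build_engine_kwargs` is not part of the library): it assembles the keyword arguments listed above and adds `user` and `password` only when both are supplied, so IAM authentication remains the default.

```python
from typing import Dict, Optional


def build_engine_kwargs(
    project_id: str,
    region: str,
    cluster: str,
    instance: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Collect arguments for AlloyDBEngine.from_instance() / afrom_instance().

    Hypothetical helper, not part of langchain-google-alloydb-pg.
    """
    kwargs = {
        "project_id": project_id,
        "region": region,
        "cluster": cluster,
        "instance": instance,
        "database": database,
    }
    # Built-in authentication needs both values; supplying only one is
    # treated as a mistake here (an assumption of this sketch).
    if (user is None) != (password is None):
        raise ValueError("Provide both `user` and `password`, or neither.")
    if user is not None:
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


# Usage in the notebook (requires a reachable AlloyDB instance):
# engine = await AlloyDBEngine.afrom_instance(**build_engine_kwargs(...))
```

Leaving out `user` and `password` keeps the returned dictionary limited to the five required arguments.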


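**Note**: This tutorial demonstrates the async interface, so the cell below calls `AlloyDBEngine.afrom_instance()`, the async counterpart of `from_instance()`, with the same arguments.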

In [None]:
from langchain_google_alloydb_pg import AlloyDBEngine

engine = await AlloyDBEngine.afrom_instance(
    project_id=PROJECT_ID,
    region=REGION,
    cluster=CLUSTER,
    instance=INSTANCE,
    database=DATABASE,
)

## Create an AlloyDBModelManager instance
The `AlloyDBModelManager` class lets you create a model, get a specific model, list all models, and drop a model. Once a model has been created this way, the `AlloyDBEmbeddings` class can call it to embed data before it is added to the vector store.

In [None]:
from langchain_google_alloydb_pg import AlloyDBModelManager

model_manager = AlloyDBModelManager(engine)

Creating the `AlloyDBModelManager` object runs a prerequisite check.
If it raises an error, make sure that:
* The extension is up to date: the `google_ml_integration` extension is installed and its version is greater than 1.3.
* The database flag is set: `google_ml_integration.enable_model_support` is set to `on`.
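
If the prerequisite check fails, the two conditions can be inspected manually. The sketch below holds two standard PostgreSQL statements (run them in psql or AlloyDB Studio) and two hypothetical helpers, `version_tuple` and `meets_prerequisites`, for comparing the reported values. None of this is part of the library, the library's own check may differ, and the helpers assume a purely numeric dotted version string.

```python
from typing import Tuple

# Standard PostgreSQL catalog query: installed version of the extension.
EXTENSION_VERSION_SQL = (
    "SELECT extversion FROM pg_extension "
    "WHERE extname = 'google_ml_integration';"
)
# Standard PostgreSQL command: current value of the database flag.
MODEL_SUPPORT_FLAG_SQL = "SHOW google_ml_integration.enable_model_support;"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.4.2' into (1, 4, 2) for comparison."""
    return tuple(int(part) for part in version.split("."))


def meets_prerequisites(ext_version: str, flag_value: str) -> bool:
    """Return True when the version is greater than 1.3 and the flag is on."""
    version_ok = version_tuple(ext_version) > (1, 3)
    flag_ok = flag_value.strip().lower() == "on"
    return version_ok and flag_ok
```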

#### To list all models available
This list includes the two pre-built models:
* `textembedding-gecko`
* `textembedding-gecko@001`

It also includes any other models you have created.

The `alist_models` function returns `List[AlloyDBModel]`.

In [None]:
print(await model_manager.alist_models())

#### To create a custom text embedding model
The `acreate_model` function has three required parameters:
* `model_id`: A unique ID for the model endpoint that you define.
* `model_provider`: The provider of the model endpoint (`google` for Vertex AI and `custom` for custom-hosted models).
* `model_type`: The model type (set this value to `text_embedding` for text embedding model endpoints or `generic` for all other model endpoints).

All other arguments are optional. They are documented in the [model endpoint reference](https://cloud.google.com/alloydb/docs/reference/model-endpoint-reference#google_mlcreate_model).

This function returns `None`.

If the model has been created successfully, it appears in the output of `alist_models`.

In [None]:
await model_manager.acreate_model(
    model_id="textembedding-gecko@003",
    model_provider="google",
    model_qualified_name="textembedding-gecko@003",
    model_type="text_embedding",
)
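
Because `acreate_model` returns nothing, one option is to check for the model first and confirm it afterwards. The sketch below is a hypothetical helper (`ensure_model` is not part of the library) that relies only on the `aget_model` and `acreate_model` calls described in this notebook. How the library behaves when the ID already exists is not covered here, so the helper sidesteps the question by checking first.

```python
from typing import Any


async def ensure_model(manager: Any, model_id: str, **create_kwargs: Any) -> bool:
    """Create the model endpoint only if `model_id` is not registered yet.

    `manager` is expected to be an AlloyDBModelManager. Returns True when a
    new model was created and False when it already existed.
    """
    if await manager.aget_model(model_id=model_id) is not None:
        return False
    await manager.acreate_model(model_id=model_id, **create_kwargs)
    # acreate_model returns None, so confirm the result explicitly.
    if await manager.aget_model(model_id=model_id) is None:
        raise RuntimeError(f"Model {model_id!r} was not found after creation.")
    return True


# Usage in the notebook:
# created = await ensure_model(
#     model_manager,
#     "textembedding-gecko@003",
#     model_provider="google",
#     model_qualified_name="textembedding-gecko@003",
#     model_type="text_embedding",
# )
```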

#### To get a specific model
The `aget_model` function has one required parameter:
* `model_id`: The unique ID of a model endpoint that has been defined.


This function returns `Optional[AlloyDBModel]`.

If a model with the specified `model_id` exists, its `AlloyDBModel` dataclass is returned.
Otherwise, `None` is returned.

In [None]:
await model_manager.aget_model(model_id="textembedding-gecko")

#### To drop a specific model
The `adrop_model` function has one required parameter:
* `model_id`: The unique ID of a model endpoint that has been defined.

The function returns `None` on successful execution.

You can confirm that the model is gone by calling `alist_models`.


In [None]:
await model_manager.adrop_model(model_id="textembedding-gecko@003")
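
The confirmation step can be folded into one call. The helper below is a hypothetical sketch (`drop_and_confirm` is not a library function) that uses only `adrop_model`, `alist_models`, and the `model_id` field of the returned entries.

```python
from typing import Any


async def drop_and_confirm(manager: Any, model_id: str) -> bool:
    """Drop a model endpoint, then report whether it is gone.

    `manager` is expected to be an AlloyDBModelManager.
    """
    await manager.adrop_model(model_id=model_id)
    remaining = await manager.alist_models()
    # Each listed entry is an AlloyDBModel, which exposes `model_id`.
    return all(model.model_id != model_id for model in remaining)


# Usage in the notebook:
# assert await drop_and_confirm(model_manager, "textembedding-gecko@003")
```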

## AlloyDBModel Dataclass
The `AlloyDBModel` dataclass is used to return any model(s) being fetched using the `AlloyDBModelManager`.

The member variables of this class are:
* model_id (str) : A unique ID for the model endpoint that you define.
* model_request_url (Optional[str]) : The model-specific endpoint when adding other text embedding and generic model endpoints.
* model_provider (str) : The provider of the model endpoint. Set to google for Vertex AI model endpoints and custom for custom-hosted model endpoints.
* model_type (str) : The model type. You can set this value to text_embedding for text embedding model endpoints or generic for all other model endpoints.
* model_qualified_name (Optional[str]) : The fully qualified name, used when the model endpoint has multiple versions or when the model endpoint defines it.
* model_auth_type (Optional[str]) : The authentication type used by the model endpoint. You can set it to either alloydb_service_agent_iam for Vertex AI models or secret_manager for other providers.
* model_auth_id (Optional[str]) : The secret ID that you set and is subsequently used when registering a model endpoint.
* input_transform_fn (Optional[str]) : The SQL function name to transform input of the corresponding prediction function to the model-specific input.
* output_transform_fn (Optional[str]) : The SQL function name to transform model specific output to the prediction function output.
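
To make the shape of this record concrete, the sketch below mirrors the fields listed above as a plain Python dataclass. It is an illustration named `AlloyDBModelSketch`, not the library's own definition, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlloyDBModelSketch:
    """Illustrative mirror of the fields listed above (not the library class)."""

    model_id: str
    model_request_url: Optional[str]
    model_provider: str
    model_type: str
    model_qualified_name: Optional[str]
    model_auth_type: Optional[str]
    model_auth_id: Optional[str]
    input_transform_fn: Optional[str]
    output_transform_fn: Optional[str]


# Example values for a Vertex AI text embedding endpoint (illustrative only).
example = AlloyDBModelSketch(
    model_id="textembedding-gecko@003",
    model_request_url=None,
    model_provider="google",
    model_type="text_embedding",
    model_qualified_name="textembedding-gecko@003",
    model_auth_type="alloydb_service_agent_iam",
    model_auth_id=None,
    input_transform_fn=None,
    output_transform_fn=None,
)
print(example.model_id, example.model_type)
```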

## Create an AlloyDBEmbeddings instance
The `AlloyDBEmbeddings` class allows users to utilize the AlloyDB Embeddings available via Model Endpoint Management.

In [None]:
from langchain_google_alloydb_pg import AlloyDBEmbeddings
model_id = 'textembedding-gecko'
embedding_service = AlloyDBEmbeddings(engine=engine, model_id=model_id)

When an `AlloyDBEmbeddings` instance is created, it uses an `AlloyDBModelManager` object to check that the given `model_id` belongs to an existing model.
If no model exists with that `model_id`, the class raises a `ValueError`.

#### To embed a query

The `aembed_query` function (`embed_query` is the synchronous equivalent) uses Model Endpoint Management to generate embeddings for a given query.

The function returns `List[float]`, which is the embedding of the supplied query.

In [None]:
embedding_list = await embedding_service.aembed_query("test query")
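
Since the result is a plain list of floats, it is easy to sanity-check before it is used with a vector store table, whose `vector_size` has to match the embedding length. The helper below is a hypothetical sketch (not a library function). The 768 in the usage comment is the dimension this notebook uses for `textembedding-gecko`.

```python
from typing import Sequence


def check_embedding(embedding: Sequence[float], expected_size: int) -> None:
    """Raise ValueError if the embedding has the wrong length or non-numeric items."""
    if len(embedding) != expected_size:
        raise ValueError(
            f"Expected {expected_size} dimensions, got {len(embedding)}."
        )
    if not all(isinstance(x, (int, float)) for x in embedding):
        raise ValueError("Embedding must contain only numbers.")


# Usage in the notebook:
# check_embedding(embedding_list, 768)
```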

The `AlloyDBEmbeddings` class can be used as the embedding service while creating the `AlloyDBVectorStore`.

A detailed sample notebook for `AlloyDBVectorStore` is available [here](vector_store.ipynb).

In [None]:
from langchain_google_alloydb_pg import AlloyDBVectorStore
import uuid
from langchain_core.documents import Document

texts = ["foo", "bar", "baz", "boo"]
VECTOR_SIZE = 768  # For the textembedding-gecko model
ids = [str(uuid.uuid4()) for _ in range(len(texts))]
metadatas = [{"page": str(i), "source": "google.com"} for i in range(len(texts))]
docs = [
    Document(page_content=texts[i], metadata=metadatas[i]) for i in range(len(texts))
]

await engine.ainit_vectorstore_table(
    table_name="vector_store_table",
    vector_size=VECTOR_SIZE,
    overwrite_existing=True,
)

vs = await AlloyDBVectorStore.create(
    engine,
    embedding_service=embedding_service,
    table_name="vector_store_table",
)

await vs.aadd_documents(docs, ids=ids)

#### To use `AlloyDBVectorStore` functionalities

To use `asimilarity_search_by_vector`, `amax_marginal_relevance_search_by_vector`, or `amax_marginal_relevance_search_with_score_by_vector`, generate the embedding for a query with the `AlloyDBEmbeddings` class and pass it to the vector store.

In [None]:
search_embedding = await embedding_service.aembed_query("foo")
results = await vs.asimilarity_search_by_vector(search_embedding)

You can also use other functions such as:
* `asimilarity_search`
* `asimilarity_search_with_score`
* `amax_marginal_relevance_search`

These and the other `AlloyDBVectorStore` methods that take a text query use the provided `AlloyDBEmbeddings` instance implicitly.

In [None]:
results = await vs.asimilarity_search("foo", k=1)
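
The search calls return LangChain `Document` objects, each carrying `page_content` and `metadata`. A minimal, hypothetical helper for turning such results into printable rows might look like this. It relies only on those two attributes and on the `source` metadata key used earlier in this notebook.

```python
from typing import Any, Iterable, List, Tuple


def summarize_results(results: Iterable[Any]) -> List[Tuple[str, str]]:
    """Return (page_content, source) pairs; source falls back to 'unknown'."""
    rows = []
    for doc in results:
        source = doc.metadata.get("source", "unknown")
        rows.append((doc.page_content, source))
    return rows


# Usage in the notebook:
# for content, source in summarize_results(results):
#     print(f"{content} (from {source})")
```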