# PGVectorStore

`PGVectorStore` is an implementation of the LangChain vectorstore abstraction that uses `postgres` as the backend.

## Requirements

You'll need a PostgreSQL database with the `pgvector` extension enabled.


For local development, you can use the following docker command to spin up the database:

```shell
docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
```
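
The container can take a few seconds before it accepts connections. The helper below is a small standard-library sketch that polls the mapped host port until a TCP connection succeeds. `wait_for_port` is a hypothetical name and is not part of `langchain-postgres`. An open port is only a rough signal, because Postgres may still be finishing initialization. Running `docker exec pgvector-container pg_isready -U langchain` is a more precise check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


# 6024 is the host port mapped by the docker command above, e.g.:
# wait_for_port("localhost", 6024)
```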

## Install

Install the `langchain-postgres` package.

```python
%pip install --upgrade --quiet langchain-postgres
```

## Create an engine

The first step is to create a `PGEngine` instance, which does the following:

1. Allows you to create tables for storing documents and embeddings.
2. Maintains a connection pool that manages connections to the database. This allows sharing of the connection pool and helps to reduce latency for database calls.

```python
from langchain_postgres import PGEngine

# See docker command above to launch a Postgres instance with pgvector enabled.
# Replace these values with your own configuration.
POSTGRES_USER = "langchain"
POSTGRES_PASSWORD = "langchain"
POSTGRES_HOST = "localhost"
POSTGRES_PORT = "6024"
POSTGRES_DB = "langchain"

CONNECTION_STRING = (
    f"postgresql+asyncpg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}"
    f":{POSTGRES_PORT}/{POSTGRES_DB}"
)

pg_engine = PGEngine.from_connection_string(url=CONNECTION_STRING)
```

To use the psycopg3 driver instead, change the scheme of the connection string to `postgresql+psycopg://`.
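
The f-string above inserts the password verbatim. A password containing characters such as `@`, `:` or `/` therefore produces a URL that cannot be parsed correctly. Here is a minimal sketch, assuming you build the URL yourself. `make_connection_string` is a hypothetical helper. It percent-encodes the credentials with `urllib.parse.quote_plus`, which the SQLAlchemy documentation recommends for passwords with special characters. The `driver` argument switches between the `asyncpg` and `psycopg` schemes mentioned above.

```python
from __future__ import annotations

from urllib.parse import quote_plus


def make_connection_string(
    user: str,
    password: str,
    host: str,
    port: int | str,
    db: str,
    driver: str = "asyncpg",
) -> str:
    """Build a SQLAlchemy-style Postgres URL with percent-encoded credentials."""
    # quote_plus escapes '@', ':' and '/', which would otherwise be read
    # as URL delimiters.
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )


print(make_connection_string("langchain", "p@ss:w/rd", "localhost", "6024", "langchain"))
# postgresql+asyncpg://langchain:p%40ss%3Aw%2Frd@localhost:6024/langchain
```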

## Create a document collection

Use the `PGEngine.ainit_vectorstore_table()` method to create a database table that stores the documents and their embeddings. The table is created with the schema that `PGVectorStore` expects.

```python
TABLE_NAME = "vectorstore"

# The vector size (also called embedding size) is determined by the embedding model you use.
# 1536 matches the output size of OpenAI's text-embedding-3-small, which is used below.
VECTOR_SIZE = 1536
```
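
If `VECTOR_SIZE` does not match the model's output, pgvector rejects the insert with a dimension-mismatch error. One way to avoid hard-coding the value is to embed a short probe string and measure the result. The sketch below assumes that your embedding object exposes an `embed_query` method, which LangChain embeddings do. `infer_vector_size` and `fake_embed_query` are hypothetical helpers for illustration.

```python
from typing import Callable, List, Sequence


def infer_vector_size(embed_query: Callable[[str], Sequence[float]]) -> int:
    """Return the embedding dimension by embedding a short probe string."""
    return len(embed_query("vector size probe"))


# Hypothetical stand-in for a real model so this runs offline:
# it returns 1536 zeros, the output size of text-embedding-3-small.
def fake_embed_query(text: str) -> List[float]:
    return [0.0] * 1536


print(infer_vector_size(fake_embed_query))  # 1536
```

Once the embedding model is configured in the next section, `infer_vector_size(embedding.embed_query)` would cost a single embedding call.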

Use the `Column` class to customize the table schema. A `Column` is defined by a name and a data type. Any Postgres [data type](https://www.postgresql.org/docs/current/datatype.html) can be used.

```python
from sqlalchemy.exc import ProgrammingError

from langchain_postgres import Column

try:
    await pg_engine.ainit_vectorstore_table(
        table_name=TABLE_NAME,
        vector_size=VECTOR_SIZE,
        metadata_columns=[
            Column("likes", "INTEGER"),
            Column("location", "TEXT"),
            Column("topic", "TEXT"),
        ],
    )
except ProgrammingError:
    # Typically raised when the table already exists, e.g. when this cell is re-run.
    print("Table already exists. Skipping creation.")
```
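
Each `Column` pairs a metadata key with a Postgres type, so the types should match the Python values you store. The hypothetical helper below is not part of `langchain-postgres`. It sketches one way to derive `(name, type)` pairs from a sample metadata dict. The Python-to-Postgres mapping is an assumption you should adjust for your data. For example, integers beyond the 32-bit `INTEGER` range need `BIGINT`.

```python
from typing import Any, Dict, List, Tuple

# Assumed Python -> Postgres type mapping. bool is listed before int because
# bool is a subclass of int in Python.
TYPE_MAP = [
    (bool, "BOOLEAN"),
    (int, "INTEGER"),
    (float, "DOUBLE PRECISION"),
    (str, "TEXT"),
]


def infer_column_types(metadata: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (column name, Postgres type) pairs for a sample metadata dict."""
    columns = []
    for name, value in metadata.items():
        for py_type, pg_type in TYPE_MAP:
            if isinstance(value, py_type):
                columns.append((name, pg_type))
                break
        else:
            raise TypeError(f"no Postgres type mapping for {name!r} ({type(value).__name__})")
    return columns


sample = {"likes": 1, "location": "pond", "topic": "animals"}
print(infer_column_types(sample))
# [('likes', 'INTEGER'), ('location', 'TEXT'), ('topic', 'TEXT')]
```

Passing each pair to `Column(name, pg_type)` yields the same three columns that the cell above declares by hand.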

### Configure an embeddings model

You need to configure a vectorstore with an embedding model. The embedding model will be used automatically when adding documents and when searching.

We'll use `langchain-openai` as the embedding model here, but you can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).

```python
%pip install --upgrade --quiet langchain-openai
```

```python
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-small")
```
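
To experiment without an OpenAI key, you need another object that produces fixed-size vectors. LangChain embeddings implement two methods:

* `embed_documents`, which maps a list of texts to a list of vectors.
* `embed_query`, which maps one text to one vector.

The toy class below mirrors those two method names using only the standard library. It is a hypothetical illustration, not a real model: it hashes words into buckets, so it captures word overlap rather than meaning. To pass it to `PGVectorStore`, it would need to subclass `langchain_core.embeddings.Embeddings`, which also provides default async variants (`aembed_documents`, `aembed_query`). For offline tests, `langchain_core` also ships `DeterministicFakeEmbedding`.

```python
import hashlib
import math
from typing import List


class HashingEmbeddings:
    """Toy offline embedder mirroring the embed_documents / embed_query method names.

    Not a real model: each lower-cased word is hashed into one of `size` buckets,
    so similarity reflects shared words only, not meaning.
    """

    def __init__(self, size: int = 1536) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        vector = [0.0] * self.size
        for token in text.lower().split():
            # sha256 is stable across processes, unlike the salted built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[int.from_bytes(digest[:8], "big") % self.size] += 1.0
        # L2-normalize so vectors can be compared with cosine similarity.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else vector

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]


toy = HashingEmbeddings(size=8)
print(len(toy.embed_query("fresh apples are available at the market")))  # 8
```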

## Initialize the vectorstore

Once the schema for the document collection exists, you can initialize a vectorstore that uses the schema.

You can use the vectorstore to perform basic operations, including:

1. Add documents
2. Delete documents
3. Search through the documents

```python
from langchain_postgres import PGVectorStore

vectorstore = await PGVectorStore.create(
    engine=pg_engine,
    table_name=TABLE_NAME,
    embedding_service=embedding,
    metadata_columns=["location", "topic"],
)
```

## Add documents


You can add documents using the `aadd_documents` method.

* Assign unique IDs to documents to avoid duplicated content in your database.
* Adding a document by ID has `upsert` semantics (i.e., the document is created if its ID does not exist yet, and updated if it does).
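
Random `uuid4` IDs give every run new IDs, so re-running the cell that adds documents inserts them again. If you want re-runs to upsert instead, one option is to derive each ID deterministically from the content with `uuid.uuid5`. This is a sketch, not something `PGVectorStore` does for you. The namespace string below is an arbitrary placeholder, and `content_id` is a hypothetical helper.

```python
import uuid

# Arbitrary, fixed namespace for this example; any constant UUID works.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:langchain-pgvectorstore-docs")


def content_id(page_content: str) -> str:
    """Derive a stable document ID from its text: same text, same ID."""
    return str(uuid.uuid5(DOC_NAMESPACE, page_content))


print(content_id("there are cats in the pond") == content_id("there are cats in the pond"))  # True
```

Usage would look like `Document(id=content_id(text), page_content=text, metadata=...)`. The trade-off is that editing a document's text changes its ID, so the old version is not overwritten. Keying on a stable source identifier, such as a file path or URL, avoids that.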

```python
import uuid

from langchain_core.documents import Document

# Document.id is typed as a string, so convert each UUID with str().
docs = [
    Document(
        id=str(uuid.uuid4()),
        page_content="there are cats in the pond",
        metadata={"likes": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        id=str(uuid.uuid4()),
        page_content="ducks are also found in the pond",
        metadata={"likes": 30, "location": "pond", "topic": "animals"},
    ),
    Document(
        id=str(uuid.uuid4()),
        page_content="fresh apples are available at the market",
        metadata={"likes": 20, "location": "market", "topic": "food"},
    ),
    Document(
        id=str(uuid.uuid4()),
        page_content="the market also sells fresh oranges",
        metadata={"likes": 5, "location": "market", "topic": "food"},
    ),
]

await vectorstore.aadd_documents(documents=docs)
```

## Delete documents

Documents can be deleted by ID.

```python
# We'll use the ID of the first doc to delete it
ids = [docs[0].id]
await vectorstore.adelete(ids)
```

## Search

Search for similar documents using a natural language query.

```python
query = "I'd like a fruit."
docs = await vectorstore.asimilarity_search(query)
for doc in docs:
    print(repr(doc))
```

### Search by vector

```python
query_vector = embedding.embed_query(query)
docs = await vectorstore.asimilarity_search_by_vector(query_vector, k=2)
print(docs)
```
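
Without a vector index, pgvector answers a nearest-neighbor query by comparing the query vector with every stored vector. This is exact, but the cost grows linearly with the table size. The pure-Python sketch below shows that idea using cosine distance, which is what pgvector's `<=>` operator computes. The cosine metric is an assumption here, because the distance strategy a `PGVectorStore` uses is configurable. `exact_top_k` is a hypothetical helper.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: the value pgvector's <=> operator returns."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def exact_top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 4
) -> List[Tuple[str, float]]:
    """Compare the query with every stored vector; return the k closest (id, distance)."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in exact_top_k([1.0, 0.0], stored, k=2)])  # ['a', 'c']
```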

## Filtering

To filter search results, you must declare the columns you want to filter on when you create the table. Filters are then applied to these metadata fields of the documents.

`PGVectorStore` currently supports the following operators.

| Operator  | Meaning/Category        |
|-----------|-------------------------|
| \$eq       | Equality (==)           |
| \$ne       | Inequality (!=)         |
| \$lt       | Less than (<)           |
| \$lte      | Less than or equal (<=) |
| \$gt       | Greater than (>)        |
| \$gte      | Greater than or equal (>=) |
| \$in       | Special Cased (in)      |
| \$nin      | Special Cased (not in)  |
| \$between  | Special Cased (between) |
| \$exists   | Special Cased (is null) |
| \$contains_any | Special Cased (contains any of values) |
| \$contains_none| Special Cased (contains none of values) |
| \$like     | Text (like)             |
| \$ilike    | Text (case-insensitive like) |
| \$and      | Logical (and)           |
| \$or       | Logical (or)            |


```python
await vectorstore.asimilarity_search(
    "birds", filter={"$or": [{"topic": "animals"}, {"location": "market"}]}
)
```

```python
await vectorstore.asimilarity_search("apple", filter={"topic": "food"})
```

```python
await vectorstore.asimilarity_search(
    "apple", filter={"topic": {"$in": ["food", "animals"]}}
)
```

```python
await vectorstore.asimilarity_search(
    "sales of fruit", filter={"topic": {"$ne": "animals"}}
)
```
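
The filter dictionaries above compose recursively:

* A bare value such as `{"topic": "food"}` is shorthand for `$eq`.
* An operator dict applies to one field.
* `$and` and `$or` take lists of sub-filters.

The sketch below evaluates a subset of the operators in plain Python to make those semantics concrete. It is an illustration only: `PGVectorStore` translates filters into SQL instead. Two behaviors are assumptions of this sketch: several top-level keys are treated as an implicit AND, and a document whose field is missing never matches (mirroring comparisons against SQL `NULL`). `matches` is a hypothetical helper.

```python
from typing import Any, Callable, Dict

# Subset of the comparison operators from the table above, as Python predicates.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$lt": lambda value, arg: value < arg,
    "$lte": lambda value, arg: value <= arg,
    "$gt": lambda value, arg: value > arg,
    "$gte": lambda value, arg: value >= arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    # Inclusive on both ends, like SQL BETWEEN.
    "$between": lambda value, arg: arg[0] <= value <= arg[1],
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if a metadata dict satisfies a filter dict (illustrative sketch)."""
    for key, cond in flt.items():  # several top-level keys are ANDed together
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        elif key not in metadata:
            # A missing field never matches, like a comparison against SQL NULL.
            return False
        elif isinstance(cond, dict):
            # Operator form, e.g. {"topic": {"$in": [...]}}; all operators must hold.
            if not all(COMPARATORS[op](metadata[key], arg) for op, arg in cond.items()):
                return False
        elif metadata[key] != cond:
            # Bare value, e.g. {"topic": "food"}, is shorthand for {"$eq": ...}.
            return False
    return True


doc_metadata = {"likes": 20, "location": "market", "topic": "food"}
print(matches(doc_metadata, {"$or": [{"topic": "animals"}, {"location": "market"}]}))  # True
print(matches(doc_metadata, {"topic": {"$ne": "food"}}))  # False
```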

## Optimization

Speed up vector search queries by adding appropriate indexes. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes).

### Add an Index

```python
from langchain_postgres.v2.indexes import IVFFlatIndex

index = IVFFlatIndex()  # Add an index using a default index name
await vectorstore.aapply_vector_index(index)
```

### Re-index

Rebuild an index from the data currently stored in the index's table, replacing the old copy of the index. Some index types, such as IVFFlat, may need re-indexing after a considerable amount of new data is added.

```python
await vectorstore.areindex()  # Re-index using default index name
```
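
Why would an index need rebuilding? An IVFFlat index partitions vectors into `lists` around centroids that are computed when the index is built. A query then scans only the `probes` lists whose centroids are nearest to it. Rows inserted later are added to the nearest existing list, but the centroids are not recomputed until the index is rebuilt. The toy sketch below shows the build and probe steps, and why scanning fewer lists can miss true neighbors. For brevity it uses Euclidean distance and samples centroids from the data rather than running k-means. All names in it are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_ivf(
    vectors: Dict[str, Vector], n_lists: int, seed: int = 0
) -> Tuple[List[Vector], Dict[int, List[str]]]:
    """Toy IVF build: choose centroids, then put each vector in its nearest list.

    Real IVFFlat runs k-means to place centroids; sampling n_lists existing
    vectors keeps this sketch short.
    """
    rng = random.Random(seed)
    centroids = rng.sample(list(vectors.values()), n_lists)
    lists: Dict[int, List[str]] = {i: [] for i in range(n_lists)}
    for doc_id, vec in vectors.items():
        nearest = min(range(n_lists), key=lambda i: math.dist(vec, centroids[i]))
        lists[nearest].append(doc_id)
    return centroids, lists


def search_ivf(
    query: Vector,
    vectors: Dict[str, Vector],
    centroids: List[Vector],
    lists: Dict[int, List[str]],
    k: int = 2,
    probes: int = 1,
) -> List[str]:
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [doc_id for i in order[:probes] for doc_id in lists[i]]
    return sorted(candidates, key=lambda doc_id: math.dist(query, vectors[doc_id]))[:k]


vectors = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]}
centroids, lists = build_ivf(vectors, n_lists=2)
# Probing every list is exact; probing fewer lists is faster but may miss neighbors.
print(search_ivf([10.0, 10.0], vectors, centroids, lists, k=2, probes=2))  # ['c', 'd']
print(search_ivf([10.0, 10.0], vectors, centroids, lists, k=2, probes=1))
```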

### Drop an index

Drop a vector index when you no longer need it.

```python
await vectorstore.adrop_vector_index()  # Drop index using default name
```

## Clean up

**⚠️ WARNING: this cannot be undone**

Drop the vector store table.

```python
await pg_engine.adrop_table(TABLE_NAME)
```