# PGVector

> An implementation of the LangChain vectorstore abstraction that uses `postgres` as the backend together with the `pgvector` extension.

The code lives in an integration package called [langchain_postgres](https://github.com/langchain-ai/langchain-postgres/).

## Status

This code has been ported over from `langchain_community` into a dedicated package called `langchain-postgres`. The following changes have been made:

* `langchain_postgres` works only with psycopg3. Update your connection strings from `postgresql+psycopg2://...` to `postgresql+psycopg://langchain:langchain@...` (yes, the driver name is `psycopg`, not `psycopg3`, but it uses psycopg3).
* The schema of the embedding store and collection has been changed so that `add_documents` works correctly with user-specified IDs.
* One has to pass an explicit connection object now.
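The connection-string change described above can be sketched as a small helper. `to_psycopg3_url` is a hypothetical name used only for illustration and is not part of `langchain_postgres`. The sketch assumes the URL names the driver in its scheme. It also rewrites a bare `postgresql://` scheme, because SQLAlchemy falls back to psycopg2 when no driver is given.

```python
def to_psycopg3_url(url: str) -> str:
    """Rewrite a SQLAlchemy PostgreSQL URL so that it selects the psycopg (v3) driver.

    Hypothetical helper for illustration; not part of langchain_postgres.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a connection URL: {url!r}")
    # A bare "postgresql" scheme makes SQLAlchemy default to psycopg2,
    # so it is rewritten as well.
    if scheme in ("postgresql", "postgresql+psycopg2"):
        scheme = "postgresql+psycopg"
    return f"{scheme}://{rest}"


old_url = "postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"
print(to_psycopg3_url(old_url))
```

URLs that already use `postgresql+psycopg://` pass through unchanged, so the helper can be applied to existing configuration without checking it first.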


Currently, there is **no mechanism** that supports easy data migration on schema changes. So any schema changes in the vectorstore will require the user to recreate the tables and re-add the documents.
If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case.

## Setup

First download the partner package:

In [1]:
!pip install -qU langchain_postgres langchain_openai SQLAlchemy



You can run the following command to spin up a postgres container with the `pgvector` extension:

In [None]:
!docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
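Once the container is started, the values passed to `docker run` (user, password, database, and the published port 6024) determine the connection URL. The following is a minimal sketch of how those values map to a URL, plus a check that the published port accepts TCP connections. `build_connection_url` and `port_is_open` are hypothetical helper names. An open port only shows that something is listening; it does not prove that PostgreSQL has finished initializing.

```python
import socket
from urllib.parse import quote


def build_connection_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy URL for the psycopg (v3) driver.

    User and password are percent-encoded so characters such as "@" or "/"
    do not break URL parsing. Hypothetical helper for illustration.
    """
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Values taken from the docker command above.
connection = build_connection_url("langchain", "langchain", "localhost", 6024, "langchain")
print(connection)
print("port reachable:", port_is_open("localhost", 6024))
```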

### Credentials

To run this notebook, you can either set your OpenAI API key in the environment to access real embeddings, or simply use the FakeEmbeddings class for quick local testing. Just make sure the `langchain_postgres` package is installed and the PostgreSQL container is running properly.

If you want best-in-class automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:

In [4]:
import getpass  # needed if you uncomment the LangSmith lines below
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
os.environ["OPENAI_API_KEY"] = "YOUR-API-KEY"  # replace with your own key
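As an alternative to writing the key into the notebook, the environment variable can be filled in interactively only when it is missing. This is a minimal sketch; `ensure_env` is a hypothetical helper name, not part of LangChain.

```python
import getpass
import os
from typing import Optional


def ensure_env(name: str, prompt: Optional[str] = None) -> str:
    """Return os.environ[name], prompting for it (without echo) only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        value = getpass.getpass(prompt or f"Enter {name}: ")
        os.environ[name] = value
    return value


# Example (prompts only when the variable is not already set):
# ensure_env("OPENAI_API_KEY", "Enter your OpenAI API key: ")
```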

## Instantiation

import EmbeddingTabs from "@theme/EmbeddingTabs";

<EmbeddingTabs/>


In [5]:
# | output: false
# | echo: false
from langchain_openai import OpenAIEmbeddings

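# text-embedding-3-large returns 3072-dimensional vectors by default;
# request 768 dimensions so the vectors fit the table created below.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=768)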

In [8]:
from langchain_postgres import PGVectorStore, PGEngine, Column
from langchain_core.embeddings import FakeEmbeddings

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!

In [None]:
VECTOR_SIZE = 768  # must equal the length of the vectors your embedding model returns
collection_name = "my_docs"

engine = PGEngine.from_connection_string(url=connection)
engine.init_vectorstore_table(
    table_name=collection_name,
    vector_size=VECTOR_SIZE,
    metadata_columns=[
        # Declare each document metadata field as a column
        Column("location", "TEXT"),
        Column("topic", "TEXT"),
    ],
)

vector_store = PGVectorStore.create_sync(
    engine,
    embedding_service=embeddings,
    # shorter fake embeddings for quick testing
    # embedding_service=FakeEmbeddings(size=VECTOR_SIZE),
    table_name=collection_name,
    metadata_columns=["location", "topic"],
)
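Because the table is created with a fixed `vector_size`, the embedding model has to return vectors of exactly that length. A quick probe before adding documents can catch a mismatch early. The sketch below relies only on the `embed_query` method of the LangChain embeddings interface. `check_embedding_dimension` and `StubEmbeddings` are hypothetical names, and the stub stands in for a real model so the example runs without network access.

```python
from typing import List


class StubEmbeddings:
    """Hypothetical stand-in for an embedding model; returns zero vectors of a fixed size."""

    def __init__(self, size: int) -> None:
        self.size = size

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * self.size


def check_embedding_dimension(embedding_service, expected_size: int) -> int:
    """Embed a probe string and verify the vector length matches the table's vector size."""
    actual = len(embedding_service.embed_query("dimension probe"))
    if actual != expected_size:
        raise ValueError(
            f"embedding model returns {actual}-dimensional vectors, "
            f"but the table expects {expected_size}"
        )
    return actual


VECTOR_SIZE = 768
print(check_embedding_dimension(StubEmbeddings(VECTOR_SIZE), VECTOR_SIZE))
```

With a real model, pass the `embeddings` object instead of the stub. Note that this issues one embedding request.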


## Manage vector store

### Add items to vector store

Note that adding documents by ID will overwrite any existing documents that match that ID.

In this notebook, IDs are handled using UUIDs for each document.

In [33]:

import uuid
from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"location": "community center", "topic": "classes"},
    ),
]

doc_ids = [str(uuid.uuid4()) for _ in docs]

vector_store.add_documents(docs, ids=doc_ids)



['806e3a72-1f0a-4757-9c48-8afec040837b',
 '2830481c-7a45-4190-a215-90f097614fad',
 '544d63f8-b257-402b-b0b8-879f31904dcc',
 '63d5e51c-b8dd-4895-9a76-979bac983908',
 '173de82b-d8a7-4bf8-b64f-aded9fc21fc2',
 'd53f33bd-af70-4d6b-8651-3355bcc7ccb0',
 '63c57f5c-c9b3-432d-833e-79d48cc73752',
 'a2a34ef3-ff07-4eb5-925b-e60d272be47b',
 '62091398-4015-4300-8b36-dad152afe7c4',
 'c8e140d3-2d21-4977-9cbd-3087830eecf2']
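Since adding a document under an existing ID overwrites it, random `uuid4` IDs mean that re-running the cell above adds a second copy of every document. One possible alternative, sketched here under assumptions, is to derive the ID from the document's content with `uuid.uuid5`, so the same document always maps to the same ID. `stable_doc_id` and `DOC_NAMESPACE` are hypothetical names, and the namespace URL is a placeholder.

```python
import json
import uuid

# Hypothetical namespace for this collection; any fixed UUID works.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my_docs")


def stable_doc_id(page_content: str, metadata: dict) -> str:
    """Derive a deterministic UUID string from a document's text and metadata."""
    # sort_keys makes the serialized form independent of dict insertion order.
    key = json.dumps({"page_content": page_content, "metadata": metadata}, sort_keys=True)
    return str(uuid.uuid5(DOC_NAMESPACE, key))


first = stable_doc_id("there are cats in the pond", {"location": "pond", "topic": "animals"})
second = stable_doc_id("there are cats in the pond", {"topic": "animals", "location": "pond"})
print(first == second)
```

The resulting strings can be passed as `ids=` in the same way as the random UUIDs above.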

### Delete items from vector store

In [34]:
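def delete_by_metadata(vector_store, **filters):
    """Delete up to 1000 documents whose metadata matches all of the given filters."""
    # The query text only affects ranking; the metadata filter selects the documents.
    docs = vector_store.similarity_search(
        query="",
        k=1000,  # upper bound on how many matches are deleted per call
        filter=filters,
    )
    ids = [doc.id for doc in docs]
    if ids:
        vector_store.delete(ids=ids)
        print(f"Deleted {len(ids)} documents")
    else:
        print("No documents found")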


In [35]:
delete_by_metadata(vector_store, location="market")

Deleted 2 documents


## Query vector store

Once your vector store has been created and the relevant documents have been added, you will typically query it while running your chain or agent.

### Filtering Support

The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.

| Operator   | Meaning/Category             |
|------------|------------------------------|
| `$eq`      | Equality (==)                |
| `$ne`      | Inequality (!=)              |
| `$lt`      | Less than (<)                |
| `$lte`     | Less than or equal (<=)      |
| `$gt`      | Greater than (>)             |
| `$gte`     | Greater than or equal (>=)   |
| `$in`      | Special Cased (in)           |
| `$nin`     | Special Cased (not in)       |
| `$between` | Special Cased (between)      |
| `$like`    | Text (like)                  |
| `$ilike`   | Text (case-insensitive like) |
| `$and`     | Logical (and)                |
| `$or`      | Logical (or)                 |
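To make the operator meanings concrete, here is an illustrative in-memory interpretation of the table above, applied to plain metadata dicts. This is not how `langchain_postgres` evaluates filters (the database does that work). `matches`, `_like`, and `COMPARATORS` are hypothetical names. The sketch assumes `$between` is inclusive on both ends, as SQL `BETWEEN` is, that `$like` uses SQL wildcards (`%`, `_`), and that a missing field never matches.

```python
import re
from typing import Any, Mapping


def _like(value: Any, pattern: str, ignore_case: bool = False) -> bool:
    """SQL LIKE semantics: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return isinstance(value, str) and re.fullmatch(regex, value, flags) is not None


COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],
    "$like": lambda v, arg: _like(v, arg),
    "$ilike": lambda v, arg: _like(v, arg, ignore_case=True),
}


def matches(metadata: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if the metadata satisfies every top-level entry of the filter."""
    for key, condition in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key not in metadata:
            ok = False  # assumption: a missing field never matches
        elif isinstance(condition, Mapping):
            ok = all(COMPARATORS[op](metadata[key], arg) for op, arg in condition.items())
        else:
            ok = metadata[key] == condition  # bare value means equality
        if not ok:
            return False
    return True


docs_metadata = [
    {"location": "pond", "topic": "animals"},
    {"location": "library", "topic": "reading"},
    {"location": "Main Street", "topic": "food"},
]
flt = {"$or": [{"location": {"$in": ["pond", "library"]}}, {"location": {"$ilike": "main%"}}]}
print([m["location"] for m in docs_metadata if matches(m, flt)])
```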

### Query directly

Performing a simple similarity search can be done as follows:

In [27]:
def display_search_results(docs: list[Document]):
    for i, doc in enumerate(docs, 1):
        print(f"{i}. {doc.page_content} [{doc.metadata}]")

In [28]:
# Search filtering only by location
results = vector_store.similarity_search(
    "kitty",
    k=10,
    filter={"location": {"$in": ["pond", "library"]}},
)

display_search_results(results)

1. there are cats in the pond [{'location': 'pond', 'topic': 'animals'}]
2. the book club meets at the library [{'location': 'library', 'topic': 'reading'}]
3. the library hosts a weekly story time for kids [{'location': 'library', 'topic': 'reading'}]
4. ducks are also found in the pond [{'location': 'pond', 'topic': 'animals'}]


If you provide a dict with multiple fields but no logical operator, the top level is interpreted as a logical **AND** filter:

In [29]:
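results = vector_store.similarity_search(
    "ducks",
    k=10,
    # Two top-level fields and no logical operator: both conditions must hold
    filter={
        "location": {"$in": ["pond", "market"]},
        "topic": {"$in": ["animals"]},
    },
)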

display_search_results(results)

1. there are cats in the pond [{'location': 'pond', 'topic': 'animals'}]
2. ducks are also found in the pond [{'location': 'pond', 'topic': 'animals'}]


In [30]:
results = vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"location": {"$in": ["pond", "market"]}},
            {"topic": {"$in": ["animals"]}},
        ]
    },
)
display_search_results(results)

1. there are cats in the pond [{'location': 'pond', 'topic': 'animals'}]
2. ducks are also found in the pond [{'location': 'pond', 'topic': 'animals'}]
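The relationship between the implicit multi-field form and the explicit `$and` form can be written down as a small rewrite. This is only a sketch of the equivalence described in this section. `to_explicit_and` is a hypothetical helper and does not call the library.

```python
def to_explicit_and(flt: dict) -> dict:
    """Rewrite a multi-field filter as an explicit $and of single-field filters."""
    # Leave single conditions and filters that already use a logical operator unchanged.
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt
    return {"$and": [{field: condition} for field, condition in flt.items()]}


implicit = {
    "location": {"$in": ["pond", "market"]},
    "topic": {"$in": ["animals"]},
}
print(to_explicit_and(implicit))
```

Applied to `implicit`, the helper produces the same filter dict that the explicit `$and` query above passes to `similarity_search`.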


If you want to execute a similarity search and receive the corresponding scores you can run:

In [31]:
results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:.6f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.924413] a sculpture exhibit is also at the museum [{'location': 'museum', 'topic': 'art'}]


### Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

In [32]:
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(id='f655856a-4cf9-43fd-a0a4-ed739e8fca65', metadata={'location': 'library', 'topic': 'reading'}, page_content='the library hosts a weekly story time for kids')]

## Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

- [Tutorials](/docs/tutorials/)
- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)
- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)
- [LangChain Postgres](https://github.com/langchain-ai/langchain-postgres)

## API reference

For a full list of the different searches you can execute on a `PGVectorStore` vector store, please refer to the API reference: https://python.langchain.com/api_reference/postgres/v2/langchain_postgres.v2.vectorstores.PGVectorStore.html