# Vectorstores: LangChain + PGVector

This notebook demonstrates how to use the `PGVector` vectorstore from the `langchain_postgres` package. `PGVector` is an implementation of LangChain's vectorstore abstraction using PostgreSQL as the backend and utilizing the `pgvector` extension.

## Import required classes

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_postgres import PGVector

## Identify current user

In [None]:
import os

user = os.getenv('LOGNAME')
print(f'Hello, {user}')

Hello, mbklein


## Connect and initialize the `PGVector` vectorstore

For this example, we'll use an embedding model from [HuggingFace](https://huggingface.co).

In [11]:
connection = f'postgresql+psycopg://{user}:{user}@localhost:5432/{user}'
collection_name = "code4lib2024"
embeddings = HuggingFaceEmbeddings(model_name='nomic-ai/nomic-embed-text-v1.5', model_kwargs={'trust_remote_code':True})

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)

<All keys matched successfully>
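The connection string above interpolates the user name directly, which works as long as the credentials contain no URL-reserved characters. As a minimal sketch, assuming you want to handle arbitrary passwords, the URL can be assembled with percent-encoding. The helper name `build_connection_url` and the sample password are hypothetical and not part of `langchain_postgres`.

```python
from urllib.parse import quote


def build_connection_url(user, password, host="localhost", port=5432, database=None):
    """Build a postgresql+psycopg URL with percent-encoded credentials.

    Hypothetical helper for illustration only.
    """
    # The notebook uses the user name as the database name, so default to that.
    database = database or user
    # safe="" makes quote() also encode "/" and "@", which would otherwise
    # be read as URL delimiters.
    return (
        f"postgresql+psycopg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_connection_url("mbklein", "p@ss/word"))
```

The resulting string can be passed as the `connection` argument in place of the f-string used above.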


## Add Documents to vectorstore

Create a list of `Document` objects and add them to the vectorstore using `add_documents()`. You can specify the document IDs using the `ids` parameter.

In [12]:
docs = [
    Document(
        page_content="Interlibrary loan requests can be made online or at the service desk",
        metadata={"id": 1, "location": "library", "topic": "borrowing"},
    ),
    Document(
        page_content="Course reserves are available for checkout at the circulation desk",
        metadata={"id": 2, "location": "library", "topic": "borrowing"},
    ),
    Document(
        page_content="Study rooms can be reserved up to two weeks in advance",
        metadata={"id": 3, "location": "library", "topic": "reservations"},
    ),
    Document(
        page_content="Library workshops on database research are held monthly",
        metadata={"id": 4, "location": "library", "topic": "workshops"},
    ),
    Document(
        page_content="Access to digital archives is available through the library portal",
        metadata={"id": 5, "location": "library", "topic": "online resources"},
    ),
    Document(
        page_content="Renew your borrowed items online or at any library kiosk",
        metadata={"id": 6, "location": "library", "topic": "borrowing"},
    ),
    Document(
        page_content="Special collections can be accessed in the reading room",
        metadata={"id": 7, "location": "library", "topic": "borrowing"},
    ),
    Document(
        page_content="Library orientation tours are available for new users",
        metadata={"id": 8, "location": "library", "topic": "facilities"},
    ),
    Document(
        page_content="The library offers free Wi-Fi to all visitors",
        metadata={"id": 9, "location": "library", "topic": "facilities"},
    ),
    Document(
        page_content="Photocopying and printing services are available on the ground floor",
        metadata={"id": 10, "location": "library", "topic": "printing services"},
    ),
]

vectorstore.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
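Because the IDs are taken from each document's metadata, it can be worth checking them before calling `add_documents()`. The following is a minimal sketch, not part of the library: `extract_ids` is a hypothetical helper, and plain dictionaries stand in for `Document.metadata`. It converts IDs to strings on the assumption that string IDs are the more portable choice; the cell above passes integers directly.

```python
from collections import Counter


def extract_ids(metadatas):
    """Return one string ID per metadata dict, rejecting duplicates.

    Hypothetical helper for illustration only.
    """
    ids = [str(meta["id"]) for meta in metadatas]
    # Count occurrences so that repeated IDs are reported before anything is stored.
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate document ids: {duplicates}")
    return ids


sample_metadata = [
    {"id": 1, "location": "library", "topic": "borrowing"},
    {"id": 2, "location": "library", "topic": "borrowing"},
]
print(extract_ids(sample_metadata))
```

With real documents, the input would be `[doc.metadata for doc in docs]`.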

## Similarity Search

Perform a similarity search using the `similarity_search()` method. You can specify the number of results to return using the `k` parameter.

In [13]:
vectorstore.similarity_search("guided viewing of library spaces", k=10)

[Document(page_content='Library orientation tours are available for new users', metadata={'id': 8, 'topic': 'facilities', 'location': 'library'}),
 Document(page_content='Special collections can be accessed in the reading room', metadata={'id': 7, 'topic': 'borrowing', 'location': 'library'}),
 Document(page_content='The library offers free Wi-Fi to all visitors', metadata={'id': 9, 'topic': 'facilities', 'location': 'library'}),
 Document(page_content='Access to digital archives is available through the library portal', metadata={'id': 5, 'topic': 'online resources', 'location': 'library'}),
 Document(page_content='Study rooms can be reserved up to two weeks in advance', metadata={'id': 3, 'topic': 'reservations', 'location': 'library'}),
 Document(page_content='Photocopying and printing services are available on the ground floor', metadata={'id': 10, 'topic': 'printing services', 'location': 'library'}),
 Document(page_content='Renew your borrowed items online or at any library kiosk

Perform a similarity search using the `similarity_search_with_score()` method.
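- This method returns each document together with its distance from the query.
- With the default `distance_strategy` (cosine), the score is the cosine distance between the query and document embeddings, i.e. 1 minus their cosine similarity. `L2 distance` (the length between two points in Euclidean space) is used only if the vectorstore is created with a Euclidean `distance_strategy`.
- Cosine distance ranges from 0 (same direction) to 2 (opposite direction); it is not rescaled to a value between 0 and 1.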
- A *LOWER* score is better (i.e., more similar).

In [14]:
vectorstore.similarity_search_with_score("guided viewing of library spaces", k=5)

[(Document(page_content='Library orientation tours are available for new users', metadata={'id': 8, 'topic': 'facilities', 'location': 'library'}),
  0.3159548116291834),
 (Document(page_content='Special collections can be accessed in the reading room', metadata={'id': 7, 'topic': 'borrowing', 'location': 'library'}),
  0.3535018871512495),
 (Document(page_content='The library offers free Wi-Fi to all visitors', metadata={'id': 9, 'topic': 'facilities', 'location': 'library'}),
  0.4055845962079343),
 (Document(page_content='Access to digital archives is available through the library portal', metadata={'id': 5, 'topic': 'online resources', 'location': 'library'}),
  0.44532148610557076),
 (Document(page_content='Study rooms can be reserved up to two weeks in advance', metadata={'id': 3, 'topic': 'reservations', 'location': 'library'}),
  0.4672278332641374)]

In [15]:
vectorstore.similarity_search_with_score("tours", k=5)

[(Document(page_content='Library orientation tours are available for new users', metadata={'id': 8, 'topic': 'facilities', 'location': 'library'}),
  0.38080471606419075),
 (Document(page_content='Course reserves are available for checkout at the circulation desk', metadata={'id': 2, 'topic': 'borrowing', 'location': 'library'}),
  0.5290579315142592),
 (Document(page_content='Study rooms can be reserved up to two weeks in advance', metadata={'id': 3, 'topic': 'reservations', 'location': 'library'}),
  0.5365441239680665),
 (Document(page_content='The library offers free Wi-Fi to all visitors', metadata={'id': 9, 'topic': 'facilities', 'location': 'library'}),
  0.5424525929123898),
 (Document(page_content='Library workshops on database research are held monthly', metadata={'id': 4, 'topic': 'workshops', 'location': 'library'}),
  0.5688333491128945)]
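To make the idea of a distance score concrete, here is a minimal sketch that computes Euclidean (L2) and cosine distance by hand and ranks candidates from nearest to farthest. The two-dimensional vectors are made up for illustration and are not real embeddings; in the vectorstore, the equivalent computation happens inside PostgreSQL.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: the straight-line length between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity: 0 for same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors (assumed for illustration only).
query = [1.0, 0.0]
candidates = {
    "opposite": [-1.0, 0.0],
    "far": [0.0, 1.0],
    "close": [0.8, 0.6],
}

# A lower distance means a closer match, so sort in ascending order.
ranked = sorted(candidates, key=lambda name: cosine_distance(query, candidates[name]))
for name in ranked:
    vec = candidates[name]
    print(name, round(cosine_distance(query, vec), 3), round(l2_distance(query, vec), 3))
```

For vectors of unit length the two measures always produce the same ordering, which is why the choice between them matters mainly when embeddings are not normalized.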

## Filtering Support

`PGVector` supports filtering documents based on their metadata fields. You can use various operators to define the filters. If you provide a dictionary with multiple fields but no operators, the top level will be interpreted as a logical AND filter.

| Operator | Meaning/Category        |
|----------|-------------------------|
| $eq      | Equality (==)           |
| $ne      | Inequality (!=)         |
| $lt      | Less than (<)           |
| $lte     | Less than or equal (<=) |
| $gt      | Greater than (>)        |
| $gte     | Greater than or equal (>=) |
| $in      | Special Cased (in)      |
| $nin     | Special Cased (not in)  |
| $between | Special Cased (between) |
| $like    | Text (like)             |
| $ilike   | Text (case-insensitive like) |
| $and     | Logical (and)           |
| $or      | Logical (or)            |
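The semantics of these operators can be illustrated with a small pure-Python evaluator that tests a filter against a single metadata dictionary. This is only a sketch of the meaning of each operator, not how `PGVector` works internally (the vectorstore translates filters into SQL). The function names are hypothetical, and the inclusive bounds for `$between` are an assumption based on SQL's `BETWEEN`.

```python
import re


def _like(value, pattern, ignore_case=False):
    """Match SQL LIKE wildcards: % is any run of characters, _ is one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],  # assumed inclusive
    "$like": lambda v, arg: _like(v, arg),
    "$ilike": lambda v, arg: _like(v, arg, ignore_case=True),
}


def matches(metadata, flt):
    """Return True if the metadata dict satisfies the filter dict."""
    results = []
    for key, cond in flt.items():
        if key == "$and":
            results.append(all(matches(metadata, sub) for sub in cond))
        elif key == "$or":
            results.append(any(matches(metadata, sub) for sub in cond))
        elif isinstance(cond, dict):
            value = metadata.get(key)
            results.append(all(COMPARATORS[op](value, arg) for op, arg in cond.items()))
        else:
            # A bare value is shorthand for equality.
            results.append(metadata.get(key) == cond)
    # Multiple top-level fields are combined with a logical AND.
    return all(results)


meta = {"id": 2, "location": "library", "topic": "borrowing"}
print(matches(meta, {"id": {"$in": [1, 5, 2, 9]}, "topic": "borrowing"}))
print(matches(meta, {"$or": [{"id": {"$gt": 5}}, {"topic": {"$like": "borrow%"}}]}))
```

Both calls print `True`: the first because the two top-level fields are ANDed and both hold, the second because the `$like` branch of the `$or` matches.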

In [16]:
vectorstore.similarity_search_with_score("borrowing books for a course", k=10, filter={"id": {"$in": [1, 5, 2, 9]}})

[(Document(page_content='Course reserves are available for checkout at the circulation desk', metadata={'id': 2, 'topic': 'borrowing', 'location': 'library'}),
  0.35905132491138936),
 (Document(page_content='Interlibrary loan requests can be made online or at the service desk', metadata={'id': 1, 'topic': 'borrowing', 'location': 'library'}),
  0.4483810948900985),
 (Document(page_content='The library offers free Wi-Fi to all visitors', metadata={'id': 9, 'topic': 'facilities', 'location': 'library'}),
  0.4942349200752565),
 (Document(page_content='Access to digital archives is available through the library portal', metadata={'id': 5, 'topic': 'online resources', 'location': 'library'}),
  0.5078700533152649)]

In [17]:
vectorstore.similarity_search(
    "ILL requests",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "topic": {"$in": ["borrowing"]}},
)

[Document(page_content='Interlibrary loan requests can be made online or at the service desk', metadata={'id': 1, 'topic': 'borrowing', 'location': 'library'}),
 Document(page_content='Course reserves are available for checkout at the circulation desk', metadata={'id': 2, 'topic': 'borrowing', 'location': 'library'})]

In [18]:
# You can also use the `$and` operator explicitly.
vectorstore.similarity_search(
    "books",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"topic": {"$in": ["borrowing", "online resources"]}},
        ]
    },
)

[Document(page_content='Course reserves are available for checkout at the circulation desk', metadata={'id': 2, 'topic': 'borrowing', 'location': 'library'}),
 Document(page_content='Access to digital archives is available through the library portal', metadata={'id': 5, 'topic': 'online resources', 'location': 'library'}),
 Document(page_content='Interlibrary loan requests can be made online or at the service desk', metadata={'id': 1, 'topic': 'borrowing', 'location': 'library'})]

In [19]:
# Other operators like `$ne` can be used as well.
vectorstore.similarity_search("reserves", k=10, filter={"topic": {"$ne": "borrowing"}})

[Document(page_content='Study rooms can be reserved up to two weeks in advance', metadata={'id': 3, 'topic': 'reservations', 'location': 'library'}),
 Document(page_content='Access to digital archives is available through the library portal', metadata={'id': 5, 'topic': 'online resources', 'location': 'library'}),
 Document(page_content='Photocopying and printing services are available on the ground floor', metadata={'id': 10, 'topic': 'printing services', 'location': 'library'}),
 Document(page_content='Library orientation tours are available for new users', metadata={'id': 8, 'topic': 'facilities', 'location': 'library'}),
 Document(page_content='Library workshops on database research are held monthly', metadata={'id': 4, 'topic': 'workshops', 'location': 'library'}),
 Document(page_content='The library offers free Wi-Fi to all visitors', metadata={'id': 9, 'topic': 'facilities', 'location': 'library'})]