## Querying a PgVector index

A simple example of how to query content from a PostgreSQL + pgvector vector store.

Requirements:
- A PostgreSQL cluster with the pgvector extension installed (https://github.com/pgvector/pgvector)
- A database created in the cluster with the extension enabled (in this example, the database is named `vectordb`). To enable it, run the following command in the database as a superuser:
`CREATE EXTENSION vector;`
- The connection details for the database (host, port, user, password, and database name)

### Needed packages

In [1]:
!pip install -q pgvector 


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


### Base parameters: the PostgreSQL connection info

In [2]:
CONNECTION_STRING = "postgresql+psycopg://vectordb:vectordb@postgresql.llama-2-7b-chat-hf-fine-tuned.svc.cluster.local:5432/vectordb"
COLLECTION_NAME = "documents_test"
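
If the user name or password contains characters such as `@` or `/`, they have to be percent-encoded before they go into the URL. A minimal sketch, assuming the same `postgresql+psycopg` scheme as above; the helper name `build_connection_string` and the example values are hypothetical and not part of this notebook:

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database,
                            driver="psycopg"):
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    # Percent-encode the credentials so reserved characters such as
    # '@', ':' or '/' cannot be mistaken for URL delimiters.
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example values are placeholders, not real credentials.
example = build_connection_string(
    "vectordb", "p@ss/word", "localhost", 5432, "vectordb"
)
print(example)
```

The result can be assigned to `CONNECTION_STRING` in place of the hard-coded value.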

### Imports

In [3]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores.pgvector import PGVector

### Initialize the connection

In [4]:
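# Uses the default HuggingFaceEmbeddings model; queries must be embedded with
# the same model that was used when the collection was populated.
embeddings = HuggingFaceEmbeddings()
# Open the collection that holds the indexed documents.
store = PGVector(
    connection_string=CONNECTION_STRING,
    collection_name=COLLECTION_NAME,
    embedding_function=embeddings,
)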



### Query the index to verify sources

In [5]:
query = "How do you create a Data Science Project?"
results = store.similarity_search(query, k=4, return_metadata=True)
for result in results:
    print(result.metadata['source'])

rhods-doc/red_hat_openshift_data_science_self-managed-1.32-getting_started_with_red_hat_openshift_data_science_self-managed-en-us.pdf
rhods-doc/red_hat_openshift_data_science_self-managed-1.32-getting_started_with_red_hat_openshift_data_science_self-managed-en-us.pdf
rhods-doc/red_hat_openshift_data_science_self-managed-1.32-getting_started_with_red_hat_openshift_data_science_self-managed-en-us.pdf
rhods-doc/red_hat_openshift_data_science_self-managed-1.32-getting_started_with_red_hat_openshift_data_science_self-managed-en-us.pdf
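
All four hits above point to the same PDF. When a query returns more varied results, a tally per source can be easier to read than the raw list. A small sketch, using a hypothetical `sources` list as a stand-in for `[result.metadata['source'] for result in results]`:

```python
from collections import Counter

# Hypothetical stand-in for [result.metadata["source"] for result in results].
sources = [
    "docs/getting_started.pdf",
    "docs/getting_started.pdf",
    "docs/admin_guide.pdf",
    "docs/getting_started.pdf",
]

# Count how many of the returned chunks come from each source file.
source_counts = Counter(sources)
for source, count in source_counts.most_common():
    print(f"{count} chunk(s) from {source}")
```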


### Work with a retriever

In [6]:
retriever = store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 4, "score_threshold": 0.2},
)
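
With `search_type="similarity_score_threshold"`, the retriever returns at most `k` documents and drops those whose relevance score falls below `score_threshold`. The idea can be sketched in plain Python. This is a conceptual illustration with made-up scores and a hypothetical `filter_by_threshold` helper, not LangChain's implementation, and it assumes scores are normalized so that higher means more relevant:

```python
def filter_by_threshold(scored_docs, k, score_threshold):
    """Keep the top-k (doc, score) pairs whose score meets the threshold.

    Hypothetical helper: assumes higher scores mean more relevant.
    """
    # Sort by score, best first, then keep only the k best candidates.
    top_k = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]
    # Drop candidates scoring below the threshold.
    return [(doc, score) for doc, score in top_k if score >= score_threshold]


# Made-up scores for illustration only.
candidates = [
    ("chunk A", 0.61),
    ("chunk B", 0.15),
    ("chunk C", 0.34),
    ("chunk D", 0.22),
    ("chunk E", 0.05),
]
kept = filter_by_threshold(candidates, k=4, score_threshold=0.2)
print(kept)
```

Under these assumptions, a query can return fewer than `k` documents when some of the nearest neighbours score below the threshold.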

In [7]:
docs = retriever.get_relevant_documents(query)
docs

[Document(page_content='-\n, and must start and end with an alphanumeric\ncharacter.\n5\n. \nEnter a \ndescription\n for your data science project.\n6\n. \nClick \nCreate\n.\nA project details page opens. From here, you can create workbenches, add cluster storage, and\nadd data connections to your project.\nVerification\nThe data science project that you created is displayed on the \nData science projects\n page.\nCHAPTER 4. CREATING A DATA SCIENCE PROJECT\n9', metadata={'source': 'rhods-doc/red_hat_openshift_data_science_self-managed-1.32-getting_started_with_red_hat_openshift_data_science_self-managed-en-us.pdf', 'page': 12}),
 Document(page_content='-\n, and must start and end with an alphanumeric\ncharacter.\n5\n. \nEnter a \ndescription\n for your data science project.\n6\n. \nClick \nCreate\n.\nA project details page opens. From here, you can create workbenches, add cluster storage, and\nadd data connections to your project.\nVerification\nThe data science project that you create