# Vector Store

One of the most common ways to store and search unstructured data is to embed it and store the resulting embedding vectors. At query time, the query is embedded in the same way, and the vectors that are 'most similar' to the embedded query are retrieved. A vector store takes care of storing the embedded data and performing the vector search for you.
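
To make 'most similar' concrete, here is a minimal sketch that compares vectors by cosine similarity. The three-dimensional vectors, the sample texts, and the `cosine_similarity` helper are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hand-written for illustration, not produced by a model).
stored = {
    "cats and dogs": [0.9, 0.1, 0.0],
    "stock markets": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Pick the stored text whose vector points in the direction closest to the query.
best_match = max(
    stored, key=lambda text: cosine_similarity(stored[text], query_vector)
)
print(best_match)
```

A vector store performs the same comparison, but over many vectors and usually with an index that avoids scanning every entry.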

This notebook uses the [Chroma](https://www.trychroma.com/) vector database. I prefer it because it is open source and easy to run locally. Alternatives include [FAISS](https://ai.meta.com/tools/faiss/), [LanceDB](https://lancedb.com/), and [Pinecone](https://www.pinecone.io/).

In [1]:
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

## Load the document

In [2]:
raw_documents = TextLoader("./data/vector-essay.txt").load()

## Split the document into chunks

In [3]:
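embeddings_model = OpenAIEmbeddings()
# Note: OpenAIEmbeddings.chunk_size is the number of texts sent per embedding
# request, not a character count; it is reused here only as a convenient size.
text_splitter = CharacterTextSplitter(
    chunk_size=embeddings_model.chunk_size, chunk_overlap=0
)
documents = text_splitter.split_documents(raw_documents)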
documents[0]

Document(page_content='Title: Harnessing Vectors and Language Models for Knowledge Base Retrieval: A Comprehensive Exploration\n\nIntroduction:\nAs advancements in artificial intelligence continue to push the boundaries of natural language processing, the power of large language models and their applications is increasingly evident. In this essay, we delve into the world of vectors, text embedding, vector databases, and vector search, highlighting their significance in developing end-to-end web applications for knowledge base retrieval.\n\nI. Understanding Vectors:\nVectors serve as a cornerstone in a diverse range of scientific disciplines, including computer science and mathematics. In the domain of AI, a vector represents a mathematical construct that enables the representation of numerical information. Vectors are widely used for encoding and organizing data, allowing for efficient processing and analysis.', metadata={'source': './data/vector-essay.txt'})
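
For intuition about what the splitting step does, the sketch below greedily packs paragraph-sized pieces into chunks up to a character limit. It is a simplified approximation and not LangChain's implementation; the `split_into_chunks` helper, the sample text, and the limit of 40 characters are assumptions chosen for illustration.

```python
def split_into_chunks(text, chunk_size, separator="\n\n"):
    """Pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits (or it starts a new chunk): keep accumulating.
            current = candidate
        else:
            # Adding the piece would exceed the limit: close the current chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Title\n\nIntroduction paragraph.\n\nI. Understanding Vectors"
chunks = split_into_chunks(sample, chunk_size=40)
print(chunks)
```

The first two pieces fit together within the limit, so they share a chunk; the third would overflow it and starts a new one.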

## Store the documents in a vector database

In [4]:
db = Chroma.from_documents(documents, embeddings_model)
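
Conceptually, a vector store keeps each text next to its vector and ranks the stored entries against a query vector. The sketch below is a hypothetical in-memory stand-in, not how Chroma works internally: `ToyVectorStore`, `normalize`, and the two-dimensional vectors are invented for illustration. It normalizes vectors when they are added so that a plain dot product at query time equals cosine similarity.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


class ToyVectorStore:
    """Hypothetical store that scans every entry on each query (no index)."""

    def __init__(self):
        self._entries = []  # list of (unit_vector, text) pairs

    def add(self, vector, text):
        self._entries.append((normalize(vector), text))

    def search_by_vector(self, query_vector, k=4):
        """Return up to k (text, score) pairs, most similar first."""
        q = normalize(query_vector)
        scored = [
            (text, sum(a * b for a, b in zip(vec, q)))
            for vec, text in self._entries
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = ToyVectorStore()
store.add([1.0, 0.0], "about vectors")
store.add([0.0, 1.0], "about cooking")
store.add([0.7, 0.7], "about both")

results = store.search_by_vector([0.9, 0.1], k=2)
print([text for text, _ in results])
```

A full scan like this is fine for a handful of entries; dedicated vector databases typically add persistence and approximate nearest-neighbor indexes to scale further.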

## Search for relevant documents using a query

In [5]:
query = "What is vector search?"
docs = db.similarity_search(query)
print(docs[0].page_content)

IV. Vector Search using Cosine Similarity:
Cosine similarity serves as a fundamental metric in vector search. It measures the angular similarity between two vectors, thereby offering a measure of their semantic closeness. In the context of knowledge base retrieval, querying a vector database using cosine similarity allows us to find relevant documents based on their similarity to the input query.

V. Applications in Knowledge Base Retrieval:
The integration of large language models and vector search techniques has paved the way for developing end-to-end web applications focused on knowledge base retrieval. These applications enable users to search vast repositories of precomputed vector representations, ensuring quick and accurate retrieval of relevant information.


In [6]:
docs = db.similarity_search_with_relevance_scores(query)
for doc, score in docs[:3]:
    print(f"relevance={score}\n\n{doc.page_content}\n")

relevance=0.7788229274228644

IV. Vector Search using Cosine Similarity:
Cosine similarity serves as a fundamental metric in vector search. It measures the angular similarity between two vectors, thereby offering a measure of their semantic closeness. In the context of knowledge base retrieval, querying a vector database using cosine similarity allows us to find relevant documents based on their similarity to the input query.

V. Applications in Knowledge Base Retrieval:
The integration of large language models and vector search techniques has paved the way for developing end-to-end web applications focused on knowledge base retrieval. These applications enable users to search vast repositories of precomputed vector representations, ensuring quick and accurate retrieval of relevant information.

relevance=0.7443720091563626

Title: Harnessing Vectors and Language Models for Knowledge Base Retrieval: A Comprehensive Exploration

Introduction:
As advancements in artificial intelligence c
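
Relevance scores are handy for discarding weak matches. The sketch below filters `(text, score)` pairs shaped like the output above; the scores are copied from that output, the texts are shortened placeholders, and the cutoff of 0.75 is an arbitrary assumption that would need tuning for real data.

```python
# (text, relevance) pairs; scores taken from the output above, texts abbreviated.
scored_docs = [
    ("IV. Vector Search using Cosine Similarity: ...", 0.7788229274228644),
    ("Title: Harnessing Vectors and Language Models ...", 0.7443720091563626),
]

SCORE_THRESHOLD = 0.75  # assumed cutoff, chosen only for this example

# Keep only the results whose relevance meets the threshold.
relevant = [(text, score) for text, score in scored_docs if score >= SCORE_THRESHOLD]
for text, score in relevant:
    print(f"relevance={score:.3f}  {text}")
```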

## Search for relevant documents using a vector

In [7]:
embedding_vector = embeddings_model.embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

IV. Vector Search using Cosine Similarity:
Cosine similarity serves as a fundamental metric in vector search. It measures the angular similarity between two vectors, thereby offering a measure of their semantic closeness. In the context of knowledge base retrieval, querying a vector database using cosine similarity allows us to find relevant documents based on their similarity to the input query.

V. Applications in Knowledge Base Retrieval:
The integration of large language models and vector search techniques has paved the way for developing end-to-end web applications focused on knowledge base retrieval. These applications enable users to search vast repositories of precomputed vector representations, ensuring quick and accurate retrieval of relevant information.
