# Vector Store
- A vector store keeps embedding vectors in a database so they can be searched by similarity.

## 1. FAISS
- Facebook AI Similarity Search (FAISS)
- FAISS is a library for efficient similarity search and clustering of dense vectors.
- It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM.
- It also contains supporting code for evaluation and parameter tuning.
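
To make "similarity search" concrete, here is a minimal pure-Python sketch of exact (brute-force) nearest-neighbour search. It is not how FAISS is implemented; the function names and the toy 2-D vectors are made up for illustration. FAISS exists precisely because this linear scan becomes too slow on large collections.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(vectors, query, k=2):
    """Return the k (index, distance) pairs closest to the query, nearest first."""
    scored = [(i, l2_distance(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 2-D "embeddings" (hypothetical data); real embeddings have far more dimensions.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = brute_force_search(vectors, query=[1.0, 1.0], k=2)
print(nearest)  # vector 1 is an exact match, vector 3 is the next closest
```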

In [2]:
from langchain_community.document_loaders import TextLoader 
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [5]:
## 1. Data Load
text_loader = TextLoader("transformer.txt")
text_docs = text_loader.load()
text_docs

[Document(metadata={'source': 'transformer.txt'}, page_content='The Transformer model, introduced in the paper “Attention is All You Need” by Vaswani et al. in 2017, marked a significant departure from previous deep learning architectures used for natural language processing (NLP).\n\nPrior to Transformers, models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) were the go-to methods for handling sequential data. \n\nHowever, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.\n\nAt the heart of the Transformer architecture lies the self-attention mechanism. Self-attention allows each token in a sequence to

In [7]:
## 2. Data Splitting
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=30)
chunks_docs = text_splitter.split_documents(text_docs)
chunks_docs

[Document(metadata={'source': 'transformer.txt'}, page_content='The Transformer model, introduced in the paper “Attention is All You Need” by Vaswani et al. in 2017, marked a significant departure from previous deep learning architectures used for natural language processing (NLP).\n\nPrior to Transformers, models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) were the go-to methods for handling sequential data.'),
 Document(metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(metadata={'source': 'transformer.txt'}, page_content='At
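
The splitting step above can be approximated by the sketch below: split the text on a separator, then greedily merge neighbouring pieces while the result stays within `chunk_size`. This is a simplified illustration, not LangChain's actual implementation; `split_into_chunks` is a hypothetical helper and the overlap logic is left out for brevity.

```python
def split_into_chunks(text, chunk_size=500, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # note: a single oversized piece is kept whole
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = "aaaa\n\nbbbb\n\ncccc"
chunks = split_into_chunks(sample, chunk_size=10)
print(chunks)
```

With `chunk_size=10`, the first two pieces fit together (10 characters including the separator) and the third starts a new chunk.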

In [10]:
## 3. Embeddings
embedding = OllamaEmbeddings(model="gemma2:2b") ## Default model is llama2
embedding

OllamaEmbeddings(base_url='http://localhost:11434', model='gemma2:2b', embed_instruction='passage: ', query_instruction='query: ', mirostat=None, mirostat_eta=None, mirostat_tau=None, num_ctx=None, num_gpu=None, num_thread=None, repeat_last_n=None, repeat_penalty=None, temperature=None, stop=None, tfs_z=None, top_k=None, top_p=None, show_progress=False, headers=None, model_kwargs=None)

In [11]:
## 4. FAISS DB
faiss_db = FAISS.from_documents(chunks_docs, embedding)
faiss_db

<langchain_community.vectorstores.faiss.FAISS at 0x121efa3f0>

In [14]:
## 5. Query
query = "What is the text all about?"
query_result = faiss_db.similarity_search(query)
query_result

[Document(id='50ac313f-d335-431b-a4d6-ba251605aec7', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='4b4be241-dd81-436d-9093-b0b48b378111', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre

In [16]:
query_result[0].page_content

'However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'

#### As a Retriever
- We can also convert the vector store into a Retriever class.
- This allows us to easily use it in other LangChain methods, which largely work with retrievers.
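
Conceptually, a retriever is a thin wrapper that exposes one `invoke(query)` method on top of the store's search. The sketch below illustrates that idea with hypothetical classes (`ToyVectorStore`, `ToyRetriever`) and ranks by word overlap instead of embeddings; it is not LangChain's implementation.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words, not embeddings."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,  # most shared words first
        )
        return ranked[:k]


class ToyRetriever:
    """Wraps a store behind a single invoke() method with a fixed k."""

    def __init__(self, store, k=4):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyVectorStore([
    "self-attention processes sequences in parallel",
    "RNNs process data step by step",
    "the decoder generates the output",
])
toy_retriever = ToyRetriever(store, k=1)
top = toy_retriever.invoke("how does the decoder work")
print(top)
```

In LangChain itself, the number of returned documents can be set when creating the retriever, for example `faiss_db.as_retriever(search_kwargs={"k": 2})`.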

In [17]:
retriever = faiss_db.as_retriever()
retriever.invoke(query)

[Document(id='50ac313f-d335-431b-a4d6-ba251605aec7', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='4b4be241-dd81-436d-9093-b0b48b378111', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre

### Similarity Search with Score
- `similarity_search_with_score` is one of the FAISS vector store's search methods.
- It returns not only the documents but also the distance score between the query and each of them.
- The returned distance score is the L2 (Euclidean) distance, not the Manhattan distance, which is L1.
- A lower score is better.
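
The difference between the L2 (Euclidean) and L1 (Manhattan) distances is easy to check by hand. The sketch below uses made-up 2-D vectors. Note that FAISS's exact L2 index reports the squared Euclidean distance (no square root), so it may be worth checking which form your scores are in.

```python
import math

def l1_distance(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [0.0, 0.0], [3.0, 4.0]
print(l1_distance(a, b))       # 7.0
print(l2_distance(a, b))       # 5.0
print(l2_distance(a, b) ** 2)  # 25.0 (squared L2)
```

Whichever form is used, the ranking is the same: a smaller distance means a closer match.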

In [18]:
faiss_db.similarity_search_with_score(query)

[(Document(id='50ac313f-d335-431b-a4d6-ba251605aec7', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
  np.float32(3918.3716)),
 (Document(id='4b4be241-dd81-436d-9093-b0b48b378111', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both 

In [19]:
embedding_vector = embedding.embed_query(query)
embedding_vector

[-1.1227810382843018,
 0.6096981763839722,
 -1.1328717470169067,
 0.8620540499687195,
 0.19997353851795197,
 2.0818614959716797,
 -1.1502734422683716,
 -0.23491665720939636,
 1.5814882516860962,
 0.1362071931362152,
 -1.6779192686080933,
 -0.3024980127811432,
 -1.2110776901245117,
 -0.7754701972007751,
 -1.066559076309204,
 -0.6645272374153137,
 0.5272212028503418,
 -1.0698517560958862,
 -3.9099199771881104,
 0.6610438227653503,
 1.159730315208435,
 -0.3196346163749695,
 -1.0488115549087524,
 0.6563618183135986,
 0.37776869535446167,
 1.4670379161834717,
 0.8872778415679932,
 -0.6913021206855774,
 -1.5543091297149658,
 0.6368270516395569,
 -1.9224759340286255,
 -0.8413493037223816,
 0.19763147830963135,
 1.2060595750808716,
 0.7809198498725891,
 1.320129156112671,
 -0.032588522881269455,
 -0.3710813820362091,
 0.6621020436286926,
 -1.6674728393554688,
 0.2564912438392639,
 1.7931692600250244,
 0.4910845458507538,
 -0.36302450299263,
 -0.5952747464179993,
 -1.8950824737548828,
 1.131178

In [20]:
faiss_db.similarity_search_by_vector(embedding_vector)

[Document(id='50ac313f-d335-431b-a4d6-ba251605aec7', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='4b4be241-dd81-436d-9093-b0b48b378111', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre
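
The two cells above return the same documents because a text search is, in effect, "embed the query, then search by vector". A minimal sketch of that relationship, assuming a made-up `toy_embed` function in place of a real embedding model:

```python
def toy_embed(text):
    """Hypothetical 3-dimensional 'embedding': character, word, and vowel counts."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(len(text.split())), float(vowels)]

def search_by_vector(index, vector, k=1):
    """Rank stored (text, vector) pairs by squared L2 distance to the given vector."""
    ranked = sorted(
        index,
        key=lambda item: sum((x - y) ** 2 for x, y in zip(item[1], vector)),
    )
    return [text for text, _ in ranked[:k]]

def search(index, query, k=1):
    """Text search = embed the query, then delegate to the vector search."""
    return search_by_vector(index, toy_embed(query), k)

texts = ["short", "a much longer sentence about transformers"]
index = [(text, toy_embed(text)) for text in texts]
by_text = search(index, "tiny")
by_vector = search_by_vector(index, toy_embed("tiny"))
print(by_text == by_vector)
```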

#### Saving and loading FAISS DB

In [21]:
## saving
faiss_db.save_local("faiss_index")

In [25]:
## loading
new_db = FAISS.load_local("faiss_index", embedding, allow_dangerous_deserialization=True)
new_db

<langchain_community.vectorstores.faiss.FAISS at 0x122f10380>
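
The `allow_dangerous_deserialization=True` flag is required because `save_local` stores the documents and ID mapping as a pickle file next to the FAISS index, and unpickling a file from an untrusted source can execute arbitrary code. Only set it for index folders you created yourself. The sketch below shows a plain pickle round trip using the standard library; the file name and dictionary contents are illustrative assumptions, not the exact on-disk layout.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the stored documents (ID -> text).
docstore = {"doc-0": "first chunk of text", "doc-1": "second chunk of text"}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "index.pkl"  # illustrative file name
    path.write_bytes(pickle.dumps(docstore))    # save
    restored = pickle.loads(path.read_bytes())  # load: only safe for trusted files

print(restored == docstore)
```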

## 2. ChromaDB
- Chroma is an AI-native open-source vector database focused on developer productivity and happiness.


In [26]:
from langchain_chroma import Chroma

In [27]:
chroma_db = Chroma.from_documents(chunks_docs, embedding)
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x12613bce0>

In [28]:
## query it 
query = "What is the text all about?"
query_result = chroma_db.similarity_search(query)
query_result

[Document(id='7e00e49c-3f20-4279-bfc7-e17639d1dc8f', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='d85f6615-08ce-49d6-a6d4-81e9f4ba0eba', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre

In [32]:
## saving
vectordb = Chroma.from_documents(documents=chunks_docs, embedding=embedding, persist_directory="./chroma_db")

In [33]:
## loading from disk
choma_db2 = Chroma(persist_directory="./chroma_db", embedding_function=embedding)
choma_db2

<langchain_chroma.vectorstores.Chroma at 0x12638d280>

In [34]:
query_result = choma_db2.similarity_search(query)
query_result

[Document(id='c5e1930b-1c07-40a3-b0b1-f676040d3834', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='3a176a52-2e86-4113-bcf0-79d0302f078a', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre

In [35]:
query_result = choma_db2.similarity_search_with_score(query)
query_result

[(Document(id='c5e1930b-1c07-40a3-b0b1-f676040d3834', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
  3918.3715654670036),
 (Document(id='3a176a52-2e86-4113-bcf0-79d0302f078a', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the

#### Retriever

In [36]:
## Retriever
retriever = choma_db2.as_retriever()
retriever.invoke(query)

[Document(id='c5e1930b-1c07-40a3-b0b1-f676040d3834', metadata={'source': 'transformer.txt'}, page_content='However, these models processed data step by step, leading to slower training times and limitations in capturing long-range dependencies. \n\nThe Transformer, in contrast, uses a mechanism called self-attention that enables it to process entire sequences of data in parallel, making it much faster and more effective at capturing complex relationships between words, regardless of their distance in the sequence.'),
 Document(id='3a176a52-2e86-4113-bcf0-79d0302f078a', metadata={'source': 'transformer.txt'}, page_content='The decoder, on the other hand, takes this encoded information and generates the output sequence. Both the encoder and decoder consist of multiple layers of self-attention and feed-forward neural networks. \n\nThe encoder’s layers work to capture the relationships in the input, while the decoder generates the output while attending to both the encoder’s output and pre