## FAISS DB

- Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
- Install the CPU build before running the cells below: pip install faiss-cpu
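To make "similarity search over dense vectors" concrete, here is a minimal pure-Python sketch of exact (brute-force) nearest-neighbour search. It is not how Faiss is implemented internally, and it does not use the Faiss API. The names `squared_l2` and `brute_force_search` and the toy vectors are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    vectors: List[Sequence[float]], query: Sequence[float], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k stored vectors closest to the query."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    # A lower distance means a closer match, so sort ascending.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D "embeddings" (hypothetical data, for illustration only).
stored = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
nearest = brute_force_search(stored, query=(1.0, 2.0), k=2)
print(nearest)  # [(1, 1.0), (0, 5.0)]
```

Comparing the query against every stored vector is exact but scales linearly with the collection size, which is the cost that a dedicated similarity-search library is meant to reduce.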

In [21]:
## Loading and splitting documents

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("2.speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=100, chunk_overlap=30)
docs = text_splitter.split_documents(documents=documents)



In [22]:
docs

[Document(metadata={'source': '2.speech.txt'}, page_content='"The moonlight danced across the rippling water, casting silver shadows."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"A forgotten book lay dust-covered on the attic shelf."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"In the quiet of the forest, every sound seemed magnified."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"She wore a smile like a secret, hinting at mysteries untold."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"The scent of fresh bread filled the air as the bakery opened."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"He found solace in the rhythmic ticking of the old grandfather clock."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"The city streets buzzed with the hum of a thousand stories waiting to be told."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"Among the stars, dre

In [23]:
## Embedding / converting data to vectors

embeddings = OllamaEmbeddings()
db = FAISS.from_documents(docs,embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x2211e5f39b0>

In [25]:
## Querying the vector DB 

query = "what happened with photograph?"

docs_response = db.similarity_search(query)
docs_response[0].page_content

'"Among the stars, dreams seemed to shimmer just a little brighter."'

### As a retriever 

We can also convert the vector store into a retriever. This makes it easy to use in other LangChain methods, which largely work with retrievers.



In [28]:
retriever = db.as_retriever()
docs_ret = retriever.invoke(query)
docs_ret[0].page_content

'"Among the stars, dreams seemed to shimmer just a little brighter."'
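The retriever idea can be sketched without LangChain: a retriever is an object with one uniform method that takes a query string and returns documents, hiding which store performs the search. `ToyStore` and `ToyRetriever` below are hypothetical stand-ins, not LangChain classes, and the word-overlap ranking is a simplification used in place of vector similarity.

```python
from typing import List


class ToyStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        query_words = set(query.lower().split())
        # Score each text by how many query words it contains, best first.
        ranked = sorted(
            self.texts,
            key=lambda t: len(query_words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical wrapper that exposes a single invoke(query) method."""

    def __init__(self, store: ToyStore, k: int = 1) -> None:
        self.store = store
        self.k = k

    def invoke(self, query: str) -> List[str]:
        # Delegate to the underlying store; callers never touch the store directly.
        return self.store.similarity_search(query, k=self.k)


store = ToyStore(["an old photograph in a journal", "fresh bread at the bakery"])
toy_retriever = ToyRetriever(store, k=1)
result = toy_retriever.invoke("old photograph")
print(result)  # ['an old photograph in a journal']
```

Because callers depend only on `invoke`, the store behind the retriever can be swapped without changing the code that consumes the results.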

### Similarity Search with score 

- There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance score between the query and each document. The returned distance score is L2 distance, so a lower score is better.

- L2 distance is also called Euclidean distance. It should not be confused with Manhattan distance, which is the L1 distance.
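Since Euclidean (L2) and Manhattan (L1) distance are easy to mix up, the short sketch below computes both for the same pair of points. The helper names `l2_distance` and `l1_distance` and the sample points are hypothetical and independent of FAISS.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

print(l2_distance(point_a, point_b))  # 5.0
print(l1_distance(point_a, point_b))  # 7.0
```

For both metrics a smaller value means the two vectors are closer, which is why results are ranked in ascending order of score.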

In [29]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(metadata={'source': '2.speech.txt'}, page_content='"Among the stars, dreams seemed to shimmer just a little brighter."'),
  9332.267),
 (Document(metadata={'source': '2.speech.txt'}, page_content='"A forgotten book lay dust-covered on the attic shelf."'),
  10191.865),
 (Document(metadata={'source': '2.speech.txt'}, page_content='"An old photograph slipped from the pages of a long-abandoned journal."'),
  10340.957),
 (Document(metadata={'source': '2.speech.txt'}, page_content='"The city streets buzzed with the hum of a thousand stories waiting to be told."'),
  10467.08)]

### Passing the query as a vector

In [30]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[1.475339651107788,
 -0.6727593541145325,
 1.6882517337799072,
 -0.6002629399299622,
 -2.6975953578948975,
 0.07971423864364624,
 0.460440456867218,
 -0.49726298451423645,
 0.18014821410179138,
 -0.08178604394197464,
 0.18558485805988312,
 -1.031967043876648,
 0.053031887859106064,
 3.238417625427246,
 -0.6686111688613892,
 -1.5349575281143188,
 0.6768632531166077,
 0.6126464605331421,
 1.1526137590408325,
 -1.91748046875,
 1.6094001531600952,
 -3.991102933883667,
 0.8291155695915222,
 -2.0779192447662354,
 -0.3683927357196808,
 -1.9921252727508545,
 0.6795099377632141,
 -1.3943018913269043,
 0.08851487934589386,
 -0.13482511043548584,
 1.9809216260910034,
 -1.1782643795013428,
 -2.063157081604004,
 2.8522448539733887,
 2.356053352355957,
 -4.187309741973877,
 -0.4365040361881256,
 -0.027942167595028877,
 0.35000112652778625,
 -0.5143059492111206,
 0.14770497381687164,
 -5.54640531539917,
 -1.0007174015045166,
 -0.01584043726325035,
 0.9877885580062866,
 -0.39463990926742554,
 -0.66508

In [31]:
## Similarity search by query vector 

docs_score = db.similarity_search_by_vector(embedding_vector)
docs_score

[Document(metadata={'source': '2.speech.txt'}, page_content='"Among the stars, dreams seemed to shimmer just a little brighter."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"A forgotten book lay dust-covered on the attic shelf."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"An old photograph slipped from the pages of a long-abandoned journal."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"The city streets buzzed with the hum of a thousand stories waiting to be told."')]

### Saving and loading 

In [32]:
## Save the index from the DB to a local folder named "faiss_index"

db.save_local("faiss_index")

In [37]:
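## Loading the DB back from the folder
## allow_dangerous_deserialization=True is needed because the saved store is read with pickle; only load files you trust

new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)
docs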


[Document(metadata={'source': '2.speech.txt'}, page_content='"Among the stars, dreams seemed to shimmer just a little brighter."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"A forgotten book lay dust-covered on the attic shelf."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"An old photograph slipped from the pages of a long-abandoned journal."'),
 Document(metadata={'source': '2.speech.txt'}, page_content='"The city streets buzzed with the hum of a thousand stories waiting to be told."')]