FAISS - Facebook AI Similarity Search

In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
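from langchain_openai import OpenAIEmbeddings  # needs the langchain-openai package and OPENAI_API_KEY

# Load the speech as a single Document
text_docs = TextLoader("../Data/speech.txt").load()
# Split it into ~1000-character chunks; consecutive chunks share up to 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(text_docs)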

In [6]:
split_docs

[Document(metadata={'source': '../Data/speech.txt'}, page_content='The world must be made safe for democracy. Its peace must be planted upon the tested foundations of political liberty. We have no selfish ends to serve. We desire no conquest, no dominion. We seek no indemnities for ourselves, no material compensation for the sacrifices we shall freely make. We are but one of the champions of the rights of mankind. We shall be satisfied when those rights have been made as secure as the faith and the freedom of nations can make them.\n\nJust because we fight without rancor and without selfish object, seeking nothing for ourselves but what we shall wish to share with all free peoples, we shall, I feel confident, conduct our operations as belligerents without passion and ourselves observe with proud punctilio the principles of right and of fair play we profess to be fighting for.\n\n…'),
 Document(metadata={'source': '../Data/speech.txt'}, page_content='…\n\nIt will be all the easier for u
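RecursiveCharacterTextSplitter tries to break on paragraph boundaries first, then on line breaks, then on spaces, so that chunks stay close to chunk_size without cutting words where possible. The overlap idea is easiest to see with a deliberately naive fixed-window chunker. The sketch below uses a hypothetical helper (chunk_text is not a LangChain function) that ignores separators entirely and only illustrates how chunk_size and chunk_overlap interact:

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive fixed-window chunker (hypothetical helper, not LangChain's algorithm)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last chunk_overlap characters of the previous one, so text near a boundary appears in both neighbouring chunks instead of being split across them.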

In [7]:
# Embed every chunk and build an in-memory FAISS index from the vectors
embedding_model = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(split_docs, embedding_model)
vectorstore

<langchain_community.vectorstores.faiss.FAISS at 0x10c80a9e0>
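By default, FAISS.from_documents embeds each chunk and adds the vectors to an exact (brute-force) L2 index, keeping a separate mapping from index positions to the stored Documents. The sketch below mimics that idea in plain Python so it runs without FAISS or an API key. FlatL2Index and the 2-dimensional "embeddings" are hypothetical stand-ins; real FAISS performs the same comparison with optimized vector code.

```python
class FlatL2Index:
    """Hypothetical brute-force index: compares the query with every stored vector."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.docs: list[str] = []  # position i in vectors <-> docs[i]

    def add(self, vector: list[float], doc: str) -> None:
        self.vectors.append(vector)
        self.docs.append(doc)

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        # Squared Euclidean distance, the quantity faiss.IndexFlatL2 reports
        scored = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), doc)
            for vec, doc in zip(self.vectors, self.docs)
        ]
        scored.sort(key=lambda pair: pair[0])  # smallest distance first
        return [(doc, dist) for dist, doc in scored[:k]]


index = FlatL2Index()
index.add([0.0, 1.0], "democracy")  # toy 2-d vectors, not real embeddings
index.add([1.0, 0.0], "war")
index.add([0.6, 0.8], "peace")
for doc, dist in index.search([0.0, 1.0], k=2):
    print(doc, round(dist, 2))
# democracy 0.0
# peace 0.4
```

Because every query is compared against every stored vector, search cost grows linearly with the number of chunks. That is fine for a single speech; FAISS also offers approximate index types for much larger collections.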

In [None]:
# Let's query the vector store
query = "What is the main topic of the speech?"
results = vectorstore.similarity_search(query, k=3)  # return the 3 most similar chunks
for result in results:
    print(result.page_content)
    print("-----")


It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.
-----
To such a task we can dedicate our lives and our fortunes, everything that we are and everything that we have, with the pride of those who know that the day has come when America is privileged to spend he

We can also convert the vector store into a retriever. This lets us use it in other LangChain components wherever a retriever is expected.

In [10]:
# Convert the vector store into a retriever: an abstraction over the vector store
# that exposes the standard retriever interface for querying.
retriever = vectorstore.as_retriever()
# A retriever is a Runnable, so we call .invoke() with a single query
# (use .batch() to run several queries in parallel).
retrieved_docs = retriever.invoke(query)
retrieved_docs[0].page_content  # content of the top-ranked document returned by the retriever

'It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'
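A retriever takes one query string and returns a list of Documents: invoke() handles a single query, while batch() runs invoke() over a list of queries, potentially concurrently. The toy class below is hypothetical and ranks documents by shared words instead of embeddings, purely to show that input/output contract. With the real retriever, as_retriever(search_kwargs={"k": 3}) would match the k=3 used earlier, since FAISS.similarity_search returns 4 documents by default.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRetriever:
    """Hypothetical retriever: one query string in, a ranked list of docs out."""

    def __init__(self, docs: list[str], k: int = 4) -> None:
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[str]:
        # Toy relevance: number of lowercase words shared with the query
        words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,  # sort is stable, so ties keep their original order
        )
        return ranked[: self.k]

    def batch(self, queries: list[str]) -> list[list[str]]:
        # One invoke() per query, run in worker threads; results keep input order
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.invoke, queries))


retriever = ToyRetriever(
    ["the world must be made safe for democracy",
     "we desire no conquest",
     "peace and safety to all nations"],
    k=1,
)
print(retriever.invoke("safe for democracy"))
print(retriever.batch(["safe for democracy", "no conquest"]))
```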

FAISS also provides some FAISS-specific methods.

One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each of them.

With the default index the score is an L2 (Euclidean) distance; strictly, FAISS's flat L2 index reports the squared L2 distance. Either way, a lower score means a closer match.
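To see why a lower score is better, and how the score relates to cosine similarity, here is a small worked check in plain Python. It assumes the score is the squared L2 distance (what faiss.IndexFlatL2 reports) and that the vectors have unit length; OpenAI documents its embeddings as normalized to length 1, but treat that as an assumption for other models. For unit vectors, ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(theta), so ranking by ascending distance gives the same order as ranking by descending cosine similarity. The vectors below are made-up examples.

```python
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query = normalize([1.0, 0.0])
candidates = {"close": normalize([3.0, 1.0]), "far": normalize([1.0, 3.0])}

for name, vec in candidates.items():
    d = squared_l2(query, vec)
    c = cosine(query, vec)
    # For unit vectors the identity d == 2 - 2c should hold (up to rounding)
    print(name, round(d, 4), round(c, 4), math.isclose(d, 2 - 2 * c))
```

Under these assumptions, the top score of about 0.401 returned in the next cell would correspond to a cosine similarity of roughly 1 - 0.401 / 2, or about 0.80.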

In [11]:
docs_and_score = vectorstore.similarity_search_with_score(query, k=3)  # returns both documents and their scores
docs_and_score

[(Document(id='1758b9e0-0f36-4712-b5b6-19079f5b1a11', metadata={'source': '../Data/speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
  np.float32(0.40106493)),
 (Document(id='1f3ef80b-2ff1-4bf7-9818-9a34c4a9005a', metadata={'sour

In [12]:
# Save the vector store to a local directory
vectorstore.save_local("faiss_index")

In [14]:
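# Load the vector store back from the local directory.
# The docstore is stored with pickle, so loading must be explicitly allowed;
# only do this for files you created yourself or otherwise trust.
loaded_vectorstore = FAISS.load_local("faiss_index", embedding_model, allow_dangerous_deserialization=True)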
loaded_vectorstore

<langchain_community.vectorstores.faiss.FAISS at 0x10f185b10>
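save_local writes two files into the folder: the FAISS index itself and a pickle of the docstore plus the index-to-document id mapping (index.faiss and index.pkl with the default index_name). Unpickling can execute arbitrary code, which is why load_local refuses to read the .pkl file unless allow_dangerous_deserialization=True is passed. The harmless demo below uses only the standard library to show a pickled object running a function at load time; the Payload class is purely illustrative.

```python
import pickle


class Payload:
    # pickle calls __reduce__ to decide how to rebuild this object.
    # Returning (callable, args) means "call callable(*args) at load time".
    def __reduce__(self):
        return (eval, ("sum(range(10))",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval runs here, during deserialization
print(result)  # 45
```

A malicious file could just as easily name os.system as the callable, so only load indexes you created yourself or obtained from a trusted source.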

In [15]:
loaded_vectorstore.similarity_search(query, k=3)  # Perform a similarity search on the loaded vector store

[Document(id='1758b9e0-0f36-4712-b5b6-19079f5b1a11', metadata={'source': '../Data/speech.txt'}, page_content='It is a distressing and oppressive duty, gentlemen of the Congress, which I have performed in thus addressing you. There are, it may be, many months of fiery trial and sacrifice ahead of us. It is a fearful thing to lead this great peaceful people into war, into the most terrible and disastrous of all wars, civilization itself seeming to be in the balance. But the right is more precious than peace, and we shall fight for the things which we have always carried nearest our hearts—for democracy, for the right of those who submit to authority to have a voice in their own governments, for the rights and liberties of small nations, for a universal dominion of right by such a concert of free peoples as shall bring peace and safety to all nations and make the world itself at last free.'),
 Document(id='1f3ef80b-2ff1-4bf7-9818-9a34c4a9005a', metadata={'source': '../Data/speech.txt'}, p