### FAISS
FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors.

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("speech.txt")
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30)
docs = text_splitter.split_documents(documents)


Created a chunk of size 1225, which is longer than the specified 200
Created a chunk of size 204, which is longer than the specified 200
Created a chunk of size 1609, which is longer than the specified 200
Created a chunk of size 374, which is longer than the specified 200
Created a chunk of size 359, which is longer than the specified 200
Created a chunk of size 557, which is longer than the specified 200
Created a chunk of size 361, which is longer than the specified 200
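
The warnings above appear because CharacterTextSplitter splits on a single separator (a blank line, "\n\n", by default) and then merges the pieces up to chunk_size. A paragraph that is longer than chunk_size on its own is not cut any further, so it is kept whole and reported. The sketch below is a simplified stand-in for that behavior, not LangChain's actual code: `split_on_separator` is a hypothetical helper, and it ignores chunk_overlap.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters; a piece that is too long on its own is kept whole."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
            if len(piece) > chunk_size:
                # Mirrors the wording of the warning shown in the notebook output.
                print(f"Created a chunk of size {len(piece)}, "
                      f"which is longer than the specified {chunk_size}")
    if current:
        chunks.append(current)
    return chunks


sample = "short one\n\n" + "x" * 50 + "\n\nshort two\n\nshort three"
chunks = split_on_separator(sample, chunk_size=25)
print([len(c) for c in chunks])
```

If a hard upper bound on chunk length matters, RecursiveCharacterTextSplitter is the usual alternative, since it falls back to finer separators when a piece is still too long.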


In [2]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Heisenberg enjoyed classical music and was an accomplished pianist, and playing for others was a large part of his social life.[3] During the late 1920s and early 1930s he would often play music and dance at the Berlin home of his aristocratic student Carl Friedrich von Weizsäcker, during which time he carried on a courtship with Carl\'s high-school-age sister Adelheid, which scandalized her parents and led to him being unwelcome at their home for a time.[26] Years later his interest in music also led to meeting his future wife. In January 1937, Heisenberg met Elisabeth Schumacher (1914–1998) at a private music recital. Elisabeth was the daughter of a well-known Berlin economics professor, and her brother was the economist E. F. Schumacher, author of Small Is Beautiful. Heisenberg married her on 29 April. Fraternal twins Maria and Wolfgang were born in January 1938, whereupon Wolfgang Pauli congratulated Heisenberg on his "pair

In [3]:
embeddings = OllamaEmbeddings(model="gemma:2b", base_url="http://localhost:11434")
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x11250b0e0>

In [8]:
query = "Who is Werner Heisenberg?"
docs = db.similarity_search(query)
docs[0].page_content

'At 25 years old, Heisenberg gained the title of the youngest full-time professor in Germany and professorial chair[40] of the Institute for Theoretical Physics at the University of Leipzig. He gave lectures that were attended by physicists like Edward Teller and Robert Oppenheimer,[40] who would later work on the Manhattan Project[41] for the United States.'

We can also convert the vector store into a retriever. This makes it easy to use with other LangChain components, most of which work with retrievers.

In [9]:
retriever = db.as_retriever()
retriever.invoke(query)

[Document(id='4642a75c-5ce4-40f7-ad27-44f41c2125bc', metadata={'source': 'speech.txt'}, page_content='At 25 years old, Heisenberg gained the title of the youngest full-time professor in Germany and professorial chair[40] of the Institute for Theoretical Physics at the University of Leipzig. He gave lectures that were attended by physicists like Edward Teller and Robert Oppenheimer,[40] who would later work on the Manhattan Project[41] for the United States.'),
 Document(id='b5d726a0-c070-4117-bdea-c02f3e5fd84c', metadata={'source': 'speech.txt'}, page_content="During Heisenberg's tenure at Leipzig, the high quality of the doctoral students and post-graduate and research associates who studied and worked with him is clear from the acclaim that many later earned. They included Erich Bagge, Felix Bloch, Ugo Fano, Siegfried Flügge, William Vermillion Houston, Friedrich Hund, Robert S. Mulliken, Rudolf Peierls, George Placzek, Isidor Isaac Rabi, Fritz Sauter, John C. Slater, Edward Teller, 
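
Conceptually, a retriever is a thin wrapper that exposes the store through one uniform method, invoke(query). The toy sketch below illustrates the idea under stated assumptions: `ToyStore` and `ToyRetriever` are hypothetical classes, and word overlap stands in for embedding similarity. It is not how LangChain implements retrievers.

```python
class ToyStore:
    """Stand-in for a vector store: ranks texts by shared words with the query."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        # Word overlap plays the role that embedding distance plays in FAISS.
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Exposes the store through a single invoke(query) method."""

    def __init__(self, store, k=4):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


texts = [
    "Heisenberg was a professor at Leipzig",
    "Pauli congratulated Heisenberg",
    "Small Is Beautiful is a book on economics",
]
toy_retriever = ToyRetriever(ToyStore(texts), k=2)
hits = toy_retriever.invoke("professor at Leipzig")
print(hits[0])
```

With the real vector store, the number of returned documents can be set the same way through db.as_retriever(search_kwargs={"k": 2}).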

### Similarity Search with score 

There are some FAISS-specific methods. One of them is similarity search with score, which returns not only the documents but also the distance between the query and each document. The returned score is an L2 distance, so a lower score means a closer match.
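
To make "lower is better" concrete, here is a minimal sketch of L2 (Euclidean) distance on hand-written 2-D vectors. The vectors and the `l2_distance` helper are illustrative assumptions, not real embeddings or FAISS internals. As far as I know, FAISS's flat L2 index reports the squared distance, so the raw numbers may be the square of what this helper returns; the ranking is the same either way.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [1.0, 0.0]
candidates = {
    "near": [0.9, 0.1],   # points almost the same way as the query
    "far": [-1.0, 0.0],   # points the opposite way
}

# Sort ascending: the smallest distance is the best match.
scored = sorted(
    ((name, l2_distance(query_vec, vec)) for name, vec in candidates.items()),
    key=lambda pair: pair[1],
)
for name, dist in scored:
    print(name, round(dist, 4))
```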

In [11]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score  # (Document, score) pairs; the score is L2 distance by default

[(Document(id='4642a75c-5ce4-40f7-ad27-44f41c2125bc', metadata={'source': 'speech.txt'}, page_content='At 25 years old, Heisenberg gained the title of the youngest full-time professor in Germany and professorial chair[40] of the Institute for Theoretical Physics at the University of Leipzig. He gave lectures that were attended by physicists like Edward Teller and Robert Oppenheimer,[40] who would later work on the Manhattan Project[41] for the United States.'),
  np.float32(0.30578256)),
 (Document(id='b5d726a0-c070-4117-bdea-c02f3e5fd84c', metadata={'source': 'speech.txt'}, page_content="During Heisenberg's tenure at Leipzig, the high quality of the doctoral students and post-graduate and research associates who studied and worked with him is clear from the acclaim that many later earned. They included Erich Bagge, Felix Bloch, Ugo Fano, Siegfried Flügge, William Vermillion Houston, Friedrich Hund, Robert S. Mulliken, Rudolf Peierls, George Placzek, Isidor Isaac Rabi, Fritz Sauter, Jo

In [18]:
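# Embed the query once, then search with the raw vector instead of the text.
embeddings_vector = embeddings.embed_query(query)
len(embeddings_vector)  # embedding dimensionality (not displayed: not the last expression)

db.similarity_search_by_vector(embeddings_vector)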

[Document(id='4642a75c-5ce4-40f7-ad27-44f41c2125bc', metadata={'source': 'speech.txt'}, page_content='At 25 years old, Heisenberg gained the title of the youngest full-time professor in Germany and professorial chair[40] of the Institute for Theoretical Physics at the University of Leipzig. He gave lectures that were attended by physicists like Edward Teller and Robert Oppenheimer,[40] who would later work on the Manhattan Project[41] for the United States.'),
 Document(id='b5d726a0-c070-4117-bdea-c02f3e5fd84c', metadata={'source': 'speech.txt'}, page_content="During Heisenberg's tenure at Leipzig, the high quality of the doctoral students and post-graduate and research associates who studied and worked with him is clear from the acclaim that many later earned. They included Erich Bagge, Felix Bloch, Ugo Fano, Siegfried Flügge, William Vermillion Houston, Friedrich Hund, Robert S. Mulliken, Rudolf Peierls, George Placzek, Isidor Isaac Rabi, Fritz Sauter, John C. Slater, Edward Teller, 

### Saving and loading

In [19]:
db.save_local("faiss_index")

In [21]:
# Loading unpickles the saved docstore, so only load an index you created or trust.
new_db = FAISS.load_local("faiss_index", embeddings=embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)

In [22]:
docs

[Document(id='4642a75c-5ce4-40f7-ad27-44f41c2125bc', metadata={'source': 'speech.txt'}, page_content='At 25 years old, Heisenberg gained the title of the youngest full-time professor in Germany and professorial chair[40] of the Institute for Theoretical Physics at the University of Leipzig. He gave lectures that were attended by physicists like Edward Teller and Robert Oppenheimer,[40] who would later work on the Manhattan Project[41] for the United States.'),
 Document(id='b5d726a0-c070-4117-bdea-c02f3e5fd84c', metadata={'source': 'speech.txt'}, page_content="During Heisenberg's tenure at Leipzig, the high quality of the doctoral students and post-graduate and research associates who studied and worked with him is clear from the acclaim that many later earned. They included Erich Bagge, Felix Bloch, Ugo Fano, Siegfried Flügge, William Vermillion Houston, Friedrich Hund, Robert S. Mulliken, Rudolf Peierls, George Placzek, Isidor Isaac Rabi, Fritz Sauter, John C. Slater, Edward Teller, 
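
The allow_dangerous_deserialization flag exists because the saved folder contains a pickle file alongside the FAISS index, and unpickling a file can execute arbitrary code embedded in it. The sketch below shows the same round trip with the standard library only; `toy_docstore` is a made-up stand-in for the saved documents, not the format LangChain writes.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the document store saved next to the index.
toy_docstore = {"doc-1": "first chunk", "doc-2": "second chunk"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.pkl"
    path.write_bytes(pickle.dumps(toy_docstore))

    # pickle.loads can run code stored in the file, which is why loading
    # should be limited to files you wrote yourself or otherwise trust.
    restored = pickle.loads(path.read_bytes())

print(restored == toy_docstore)
```

The practical rule is the one the flag name implies: set it to True only for an index you saved yourself or obtained from a trusted source.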