## FAISS (Facebook AI Similarity Search)
FAISS is not a full-fledged database with APIs, but a high-performance library built purely for speed and scalability of similarity search. Unlike many vector DBs that wrap storage + APIs, FAISS focuses only on the core engine and is optimized in C++/CUDA, making it one of the fastest libraries for both CPU and GPU. It can handle billions of vectors with advanced indexing techniques (like IVF, PQ, HNSW), something that many other DBs struggle with at that scale. That’s why big companies often use FAISS as the core search engine inside their own custom vector databases.

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("speech.txt")
document = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=10)
docs = text_splitter.split_documents(document)

Created a chunk of size 141, which is longer than the specified 50
Created a chunk of size 257, which is longer than the specified 50
Created a chunk of size 161, which is longer than the specified 50


In [3]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Good morning everyone,  \nI am very happy to be here today.  \nFirst of all, I want to thank you all for giving me this opportunity to speak.'),
 Document(metadata={'source': 'speech.txt'}, page_content='Today, I want to share a few thoughts about the importance of learning and growth.  \nLearning is a lifelong journey. It does not stop after school or college.  \nEvery experience in life teaches us something new, and every challenge helps us grow stronger.'),
 Document(metadata={'source': 'speech.txt'}, page_content='We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'),
 Document(metadata={'source': 'speech.txt'}, page_content='So let us stay curious, stay motivated, and never stop improving ourselves.  \nThank you.')]

In [7]:
embeddings = OllamaEmbeddings(model="gemma:2b")
db = FAISS.from_documents(docs,embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x250d2120050>

In [9]:
query = "Find speeches or texts about the importance of continuous learning, personal growth, and motivation."
docs = db.similarity_search(query)
docs[0].page_content

'We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'

## retriever 
we can also covert the vectorstore into a retreiver class . which is useful to use in other langchain methods 

In [10]:
ret = db.as_retriever()
ret.invoke(query)

[Document(id='ab39ef7e-3938-4061-8b97-e2254dade1ab', metadata={'source': 'speech.txt'}, page_content='We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'),
 Document(id='0d187e69-757a-4e83-8662-029a1268b89a', metadata={'source': 'speech.txt'}, page_content='So let us stay curious, stay motivated, and never stop improving ourselves.  \nThank you.'),
 Document(id='cf3b91d4-548b-4899-b2cd-6fcca851c8d0', metadata={'source': 'speech.txt'}, page_content='Today, I want to share a few thoughts about the importance of learning and growth.  \nLearning is a lifelong journey. It does not stop after school or college.  \nEvery experience in life teaches us something new, and every challenge helps us grow stronger.'),
 Document(id='27db95f0-8245-45be-815f-e5e2472698e3', metadata={'source': 'speech.txt'}, page_content='Good morning everyone,  \nI am very happy to be here today.  \nFirst of al

## similarity search with score 
similarity search gives you the top-k vectors with scores that show how close each result is to the query.
These scores act as a measure of relevance — lower distance (like L2) or higher score (like cosine similarity) means the result is more similar.
This is important because it doesn’t just fetch any vectors, but ranks them by how well they match your query.
That ranking is what makes semantic search, recommendations, or LLM retrieval effective.
Without scores, you wouldn’t know which result is the most relevant and why it was retrieved.

In [12]:
docs_and_scores= db.similarity_search_with_score(query)

In [13]:
docs_and_scores

[(Document(id='ab39ef7e-3938-4061-8b97-e2254dade1ab', metadata={'source': 'speech.txt'}, page_content='We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'),
  np.float32(2450.278)),
 (Document(id='0d187e69-757a-4e83-8662-029a1268b89a', metadata={'source': 'speech.txt'}, page_content='So let us stay curious, stay motivated, and never stop improving ourselves.  \nThank you.'),
  np.float32(2456.8696)),
 (Document(id='cf3b91d4-548b-4899-b2cd-6fcca851c8d0', metadata={'source': 'speech.txt'}, page_content='Today, I want to share a few thoughts about the importance of learning and growth.  \nLearning is a lifelong journey. It does not stop after school or college.  \nEvery experience in life teaches us something new, and every challenge helps us grow stronger.'),
  np.float32(2699.21)),
 (Document(id='27db95f0-8245-45be-815f-e5e2472698e3', metadata={'source': 'speech.txt'}, page_cont

In [14]:
embedding_vector=embeddings.embed_query(query)
embedding_vector

[-1.0881606340408325,
 -0.68260258436203,
 0.16935515403747559,
 1.2059342861175537,
 1.2786662578582764,
 2.0842018127441406,
 0.44867268204689026,
 -0.6552714109420776,
 0.12628363072872162,
 -1.1395387649536133,
 0.3968193233013153,
 0.5340678095817566,
 0.7874913215637207,
 0.9614617228507996,
 -0.3321280777454376,
 -1.6797581911087036,
 2.196765422821045,
 1.8455936908721924,
 -0.3165280520915985,
 0.22168360650539398,
 0.6926795244216919,
 0.5281621217727661,
 0.21002362668514252,
 -0.11935561895370483,
 -1.359663724899292,
 0.03896982967853546,
 -0.4592416286468506,
 -0.7820960283279419,
 -0.10022790729999542,
 -2.164337396621704,
 0.7622060775756836,
 -0.3474885821342468,
 -1.044465184211731,
 -0.5446993112564087,
 -1.0048549175262451,
 0.5651193261146545,
 0.16383187472820282,
 0.7134586572647095,
 -0.03154255449771881,
 -2.5089030265808105,
 -0.812071681022644,
 -0.9438115954399109,
 1.0264837741851807,
 0.7376441955566406,
 2.575143337249756,
 -1.160934329032898,
 1.98176097

In [15]:
docs_scores=db.similarity_search_by_vector(embedding_vector)
docs_scores

[Document(id='ab39ef7e-3938-4061-8b97-e2254dade1ab', metadata={'source': 'speech.txt'}, page_content='We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'),
 Document(id='0d187e69-757a-4e83-8662-029a1268b89a', metadata={'source': 'speech.txt'}, page_content='So let us stay curious, stay motivated, and never stop improving ourselves.  \nThank you.'),
 Document(id='cf3b91d4-548b-4899-b2cd-6fcca851c8d0', metadata={'source': 'speech.txt'}, page_content='Today, I want to share a few thoughts about the importance of learning and growth.  \nLearning is a lifelong journey. It does not stop after school or college.  \nEvery experience in life teaches us something new, and every challenge helps us grow stronger.'),
 Document(id='27db95f0-8245-45be-815f-e5e2472698e3', metadata={'source': 'speech.txt'}, page_content='Good morning everyone,  \nI am very happy to be here today.  \nFirst of al

save and load

In [16]:
db.save_local("faiss_index")

In [18]:
new_db=FAISS.load_local("faiss_index",embeddings, allow_dangerous_deserialization=True)

In [19]:
docs=new_db.similarity_search(embedding_vector)
docs

[Document(id='0d187e69-757a-4e83-8662-029a1268b89a', metadata={'source': 'speech.txt'}, page_content='So let us stay curious, stay motivated, and never stop improving ourselves.  \nThank you.'),
 Document(id='27db95f0-8245-45be-815f-e5e2472698e3', metadata={'source': 'speech.txt'}, page_content='Good morning everyone,  \nI am very happy to be here today.  \nFirst of all, I want to thank you all for giving me this opportunity to speak.'),
 Document(id='ab39ef7e-3938-4061-8b97-e2254dade1ab', metadata={'source': 'speech.txt'}, page_content='We should never be afraid of making mistakes, because mistakes are proof that we are trying.  \nWhat matters is that we learn from them and keep moving forward.'),
 Document(id='cf3b91d4-548b-4899-b2cd-6fcca851c8d0', metadata={'source': 'speech.txt'}, page_content='Today, I want to share a few thoughts about the importance of learning and growth.  \nLearning is a lifelong journey. It does not stop after school or college.  \nEvery experience in life te