Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the text file and split it into chunks
loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=20)
docs = text_splitter.split_documents(documents)

Created a chunk of size 278, which is longer than the specified 100
Created a chunk of size 250, which is longer than the specified 100
Created a chunk of size 256, which is longer than the specified 100
Created a chunk of size 327, which is longer than the specified 100
Created a chunk of size 255, which is longer than the specified 100
Created a chunk of size 254, which is longer than the specified 100
Created a chunk of size 297, which is longer than the specified 100
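
The warnings above appear because CharacterTextSplitter splits only on its separator (by default a blank line, "\n\n") and does not cut inside a piece, so a paragraph longer than chunk_size is emitted whole. The following is a simplified sketch of that behaviour in plain Python, not LangChain's actual implementation; `split_on_separator` is a hypothetical helper and it ignores chunk_overlap.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedy separator-based splitting (simplified; ignores chunk_overlap)."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep merging it into the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A piece is never cut internally, so it may exceed chunk_size on its own.
        current = piece
    if current:
        chunks.append(current)
    return chunks


# One 150-character paragraph between two short ones (made-up sample text).
sample = "short one\n\n" + "x" * 150 + "\n\nshort two"
chunks = split_on_separator(sample, chunk_size=100)
oversized = [len(c) for c in chunks if len(c) > 100]
print([len(c) for c in chunks], oversized)
```

If chunks must stay under the limit, a splitter that falls back to finer separators, such as RecursiveCharacterTextSplitter, is the usual choice.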


In [4]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='LangChain is a powerful framework designed to simplify the development of applications powered by large language models (LLMs). It helps developers create chains of components, such as prompt templates, memory, LLMs, and agents, to build context-aware, intelligent applications.'),
 Document(metadata={'source': 'speech.txt'}, page_content='One of the core advantages of LangChain is its modularity. Developers can start with basic chains and progressively add more complex functionality such as custom tools, retrieval augmentation using vector stores, or multi-agent collaboration systems.'),
 Document(metadata={'source': 'speech.txt'}, page_content='LangChain supports multiple LLM providers such as OpenAI, Anthropic, Cohere, and Hugging Face. It also integrates with various vector stores like FAISS, Pinecone, Weaviate, and Chroma to enable efficient document retrieval and semantic search capabilities.'),
 Document(metadata={'source

In [5]:
embeddings=OllamaEmbeddings(model="gemma:2b")


In [6]:
db=FAISS.from_documents(docs,embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x1e8dd0ff820>

In [7]:
## querying
query = "For data ingestion, LangChain provides several document loaders. These include loaders for plain text,"
results = db.similarity_search(query)

In [8]:
results[0].page_content

'To process and manage long documents, LangChain includes powerful text splitters like `CharacterTextSplitter`, `RecursiveCharacterTextSplitter`, and `MarkdownHeaderTextSplitter`. These tools break documents into manageable chunks while preserving context.'

## Retriever

We can also convert the vector store into a Retriever class. This allows us to easily use it in other LangChain methods, which largely work with retrievers.

In [9]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

'To process and manage long documents, LangChain includes powerful text splitters like `CharacterTextSplitter`, `RecursiveCharacterTextSplitter`, and `MarkdownHeaderTextSplitter`. These tools break documents into manageable chunks while preserving context.'

In [10]:
type(docs)

list
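
To illustrate what the retriever wrapper adds, here is a minimal sketch in plain Python. `ToyStore` and `ToyRetriever` are hypothetical stand-ins, not LangChain classes, and the word-overlap scoring is only a placeholder for real embedding similarity. The point is that the retriever exposes a single `invoke(query)` entry point that returns a plain list, which is what other components can rely on.

```python
class ToyStore:
    """Hypothetical stand-in for a vector store (not a LangChain class)."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        # Placeholder scoring: count words shared with the query.
        query_words = set(query.lower().split())

        def overlap(text):
            return len(query_words & set(text.lower().split()))

        # sorted() is stable, so ties keep their original order.
        return sorted(self.texts, key=overlap, reverse=True)[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Exposes one uniform entry point, invoke(query), over the store."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyStore([
    "text splitters break long documents into chunks",
    "vector stores enable semantic search",
    "document loaders read plain text files",
])
retriever = store.as_retriever(k=2)
hits = retriever.invoke("document loaders for plain text")
print(type(hits).__name__, hits[0])
```

In this toy version `k` is passed directly to `as_retriever`; that signature is an assumption made for the sketch and should not be read as the LangChain API.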

### Similarity Search with Scores
There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance score of the query to each of them. The returned score is an L2 distance, so a lower score is better.
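
A small sketch of how such scores behave, written in plain Python rather than through Faiss. The vectors and texts are made up for illustration, and `search_with_score` is a hypothetical helper that returns (text, score) pairs with the smallest distance first. It uses the squared distance, on the understanding that Faiss's flat L2 index reports squared values; the ranking is the same either way.

```python
def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query_vec, items, k=2):
    """Return the k closest (text, score) pairs, smallest distance first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in items]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 2-D vectors standing in for real embeddings.
items = [
    ("far", [4.0, 3.0]),
    ("near", [1.0, 0.0]),
    ("middle", [2.0, 2.0]),
]
results = search_with_score([0.0, 0.0], items)
print(results)
```

Because the score is a distance rather than a similarity, the best match is the first pair in the list, and any cutoff should keep results below a threshold rather than above it.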

In [11]:
docs_scores=db.similarity_search_with_score(query)
docs_scores

[(Document(id='084d39a1-e530-4f40-b0ed-6caccc0009fc', metadata={'source': 'speech.txt'}, page_content='To process and manage long documents, LangChain includes powerful text splitters like `CharacterTextSplitter`, `RecursiveCharacterTextSplitter`, and `MarkdownHeaderTextSplitter`. These tools break documents into manageable chunks while preserving context.'),
  4932.908),
 (Document(id='067cb4b2-fa32-469d-9c04-ea82b0c5de3a', metadata={'source': 'speech.txt'}, page_content='LangChain supports multiple LLM providers such as OpenAI, Anthropic, Cohere, and Hugging Face. It also integrates with various vector stores like FAISS, Pinecone, Weaviate, and Chroma to enable efficient document retrieval and semantic search capabilities.'),
  5288.799),
 (Document(id='e780117b-09e4-49b4-959b-d3cd323f09a5', metadata={'source': 'speech.txt'}, page_content='LangChain is a powerful framework designed to simplify the development of applications powered by large language models (LLMs). It helps developer

In [14]:
embedding_vector = embeddings.embed_query(query)

In [15]:
# Search with a precomputed query embedding instead of the raw text
docs = db.similarity_search_by_vector(embedding_vector)
docs

[Document(id='084d39a1-e530-4f40-b0ed-6caccc0009fc', metadata={'source': 'speech.txt'}, page_content='To process and manage long documents, LangChain includes powerful text splitters like `CharacterTextSplitter`, `RecursiveCharacterTextSplitter`, and `MarkdownHeaderTextSplitter`. These tools break documents into manageable chunks while preserving context.'),
 Document(id='067cb4b2-fa32-469d-9c04-ea82b0c5de3a', metadata={'source': 'speech.txt'}, page_content='LangChain supports multiple LLM providers such as OpenAI, Anthropic, Cohere, and Hugging Face. It also integrates with various vector stores like FAISS, Pinecone, Weaviate, and Chroma to enable efficient document retrieval and semantic search capabilities.'),
 Document(id='e780117b-09e4-49b4-959b-d3cd323f09a5', metadata={'source': 'speech.txt'}, page_content='LangChain is a powerful framework designed to simplify the development of applications powered by large language models (LLMs). It helps developers create chains of components

In [16]:
### saving and loading the vector store
db.save_local("faiss_store")

In [20]:
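# load_local unpickles the saved docstore, so enable this flag only for files you created yourself
new_db = FAISS.load_local("faiss_store", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)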

In [21]:
docs

[Document(id='084d39a1-e530-4f40-b0ed-6caccc0009fc', metadata={'source': 'speech.txt'}, page_content='To process and manage long documents, LangChain includes powerful text splitters like `CharacterTextSplitter`, `RecursiveCharacterTextSplitter`, and `MarkdownHeaderTextSplitter`. These tools break documents into manageable chunks while preserving context.'),
 Document(id='067cb4b2-fa32-469d-9c04-ea82b0c5de3a', metadata={'source': 'speech.txt'}, page_content='LangChain supports multiple LLM providers such as OpenAI, Anthropic, Cohere, and Hugging Face. It also integrates with various vector stores like FAISS, Pinecone, Weaviate, and Chroma to enable efficient document retrieval and semantic search capabilities.'),
 Document(id='e780117b-09e4-49b4-959b-d3cd323f09a5', metadata={'source': 'speech.txt'}, page_content='LangChain is a powerful framework designed to simplify the development of applications powered by large language models (LLMs). It helps developers create chains of components