# FAISS (Facebook AI Similarity Search)

FAISS (Facebook AI Similarity Search) is an open-source library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors. It is widely used for building vector stores, which are databases optimized for storing and searching high-dimensional vector embeddings, such as those generated by machine learning models for text, images, or audio.

## Key Points about FAISS Vector Store

- **FAISS enables fast nearest neighbor search** in large datasets of vectors, making it ideal for applications like semantic search, recommendation systems, and information retrieval.
- **Supports both CPU and GPU acceleration**, allowing for scalable performance on large collections.
- **Provides various index types** (e.g., flat for exact search, IVF and HNSW for approximate search) to trade off search speed, memory usage, and accuracy.
- **Commonly used in NLP and AI workflows** to store and retrieve document or sentence embeddings for tasks like document similarity, question answering, and more.

**In summary:**  
FAISS is a powerful tool for managing and searching vector embeddings efficiently in AI-driven applications.
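
To make the idea of a flat index more concrete, here is a minimal pure-Python sketch of exact nearest-neighbor search: every stored vector is compared against the query and the `k` closest are returned. This is only an illustration of the concept, not how FAISS is implemented (FAISS relies on optimized native code). The function names `squared_l2` and `flat_search` and the toy vectors are assumptions made for this example.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Exact k-nearest-neighbor search by scanning every stored vector.

    Returns a list of (distance, index) pairs, closest first.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy 2-dimensional "embeddings" (hypothetical data).
stored = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = flat_search(stored, query=(1.0, 1.0), k=2)
print(neighbors)  # the two closest vectors are at positions 1 and 3
```

A full scan like this is exact but grows linearly with the number of vectors. Approximate index types such as IVF and HNSW avoid scanning everything, at the cost of occasionally missing a true nearest neighbor.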

### Crawl the website

In [None]:
from langchain_text_splitters import HTMLHeaderTextSplitter

url = "https://www.chandanys.in/"

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
    ("h4", "Header 4"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on)
crawled_docs = html_splitter.split_text_from_url(url)
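
For intuition about what header-based splitting does, the sketch below groups page text under the most recent `h1` to `h4` header using only the standard library. It is a simplified stand-in, not LangChain's implementation: the class name `ToyHeaderSplitter` and the sample HTML are hypothetical, and this version records only the latest header in each chunk's metadata.

```python
from html.parser import HTMLParser


class ToyHeaderSplitter(HTMLParser):
    """Toy splitter: groups page text under the most recent h1-h4 header."""

    HEADER_TAGS = {"h1": "Header 1", "h2": "Header 2", "h3": "Header 3", "h4": "Header 4"}

    def __init__(self):
        super().__init__()
        self.chunks = []            # list of {"metadata": ..., "text": ...}
        self._current_header = {}   # metadata for the section being read
        self._in_header = None      # tag name while inside a header element
        self._buffer = []           # text pieces collected for the section

    def _flush(self):
        """Emit the buffered text as one chunk, if there is any."""
        text = " ".join(self._buffer).strip()
        if text:
            self.chunks.append({"metadata": dict(self._current_header), "text": text})
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADER_TAGS:
            self._flush()           # a new header closes the previous section
            self._in_header = tag

    def handle_endtag(self, tag):
        if tag == self._in_header:
            self._in_header = None

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self._in_header:
            self._current_header = {self.HEADER_TAGS[self._in_header]: data}
        else:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()               # emit the final section


sample_html = "<h1>About</h1><p>Intro text.</p><h2>Skills</h2><p>Python</p><p>FAISS</p>"
splitter = ToyHeaderSplitter()
splitter.feed(sample_html)
splitter.close()
for chunk in splitter.chunks:
    print(chunk)
```

Each resulting chunk carries the header it appeared under as metadata, which is what later lets a search result be traced back to a section of the page.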

### Init embedding model

In [None]:
from langchain_openai import AzureOpenAIEmbeddings
from dotenv import load_dotenv

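load_dotenv()  # loads .env; AzureOpenAIEmbeddings reads AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT from the environment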

azOpenAIembeddings = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    api_version="2023-05-15",
)

### FAISS Vector DB

FAISS is an in-memory vector store, so the index does not persist across sessions unless it is explicitly saved to disk.

In [None]:
from langchain_community.vectorstores import FAISS  # needs the faiss-cpu (or faiss-gpu) package

vectorstore = FAISS.from_documents(
    documents=crawled_docs,
    embedding=azOpenAIembeddings
)

### Semantic Search

In [None]:
vectorstore.similarity_search("Chandan's technical stack", k=3)

### Search with Score

In [None]:
vectorstore.similarity_search_with_score(query="technical stack", k=3)
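
When reading these scores, it helps to know whether they are distances or similarities. Assuming the default Euclidean (L2) distance strategy of the LangChain FAISS wrapper, the score is a distance, so a lower score means a closer match. The sketch below contrasts the two conventions on toy vectors; the helper functions and data are hypothetical and use only the standard library.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: 0 for identical vectors, larger means less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between vectors: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 2-dimensional embeddings for a query and two documents.
query = (1.0, 0.0)
docs = {"close": (0.9, 0.1), "far": (0.0, 1.0)}

# Distance scores are ranked ascending, similarity scores descending.
by_distance = sorted(docs, key=lambda name: l2_distance(docs[name], query))
by_cosine = sorted(docs, key=lambda name: cosine_similarity(docs[name], query), reverse=True)

print(by_distance)
print(by_cosine)
```

Both rankings put the same document first here, but the raw numbers point in opposite directions, so a score threshold has to be written with the right comparison for the metric in use.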

### Search by Vector

In [None]:
user_query = "Chandan's technical stack"
user_query_vector = azOpenAIembeddings.embed_query(user_query)
vectorstore.similarity_search_with_score_by_vector(embedding=user_query_vector, k=3)

### Vector Store as Retriever

The vector store can also be wrapped in a retriever with `as_retriever()`. Many LangChain components accept a retriever rather than a vector store, so this gives the store a standard interface: pass in a query string and get back a list of documents.

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
retriever.invoke("technical stack")
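
The retriever pattern itself is small: search settings are fixed once, and callers only ever see a single `invoke(query)` method. The sketch below shows that shape with toy classes. `ToyVectorStore` and `ToyRetriever` are hypothetical names invented for this example, and ranking by word overlap stands in for real embedding search.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store; ranks texts by word overlap."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Fixes the search settings once and exposes a single invoke(query) method."""

    def __init__(self, store, search_kwargs=None):
        self.store = store
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query):
        return self.store.similarity_search(query, **self.search_kwargs)


toy_store = ToyVectorStore([
    "Python and FAISS make up the technical stack",
    "Contact details and social links",
    "Blog posts about cooking",
])
toy_retriever = ToyRetriever(toy_store, search_kwargs={"k": 1})
results = toy_retriever.invoke("technical stack")
print(results)
```

Because downstream code depends only on `invoke`, the backing store can be swapped without touching the code that consumes the results.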

### Saving FAISS Vector DB Locally

In [None]:
vectorstore.save_local("./faiss_vector_db")

### Load FAISS Vector DB from Local

In [None]:
new_db = FAISS.load_local("./faiss_vector_db", azOpenAIembeddings, allow_dangerous_deserialization=True)  # unpickles data: only load files you created or trust
new_db.similarity_search(query="technical stack", k=3)