### FAISS
Facebook AI Similarity Search (Faiss) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [10]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import OpenAIEmbeddings, OllamaEmbeddings

# Load the source file as a list of Document objects
loader = TextLoader("speech.md")
document = loader.load()
# Split into chunks of up to 1000 characters, with 10 characters of overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
docs = text_splitter.split_documents(document)

In [11]:
from dotenv import load_dotenv
import os
load_dotenv()
api_key2 = os.getenv("OPENAI_API_KEY")

embeddings = OpenAIEmbeddings(api_key=api_key2,model="text-embedding-3-large")




In [15]:
db = FAISS.from_documents(docs,embeddings)


In [16]:
# Query the index for the chunks most similar to the query text
query = "what transformed the way we live"
answer = db.similarity_search(query)
answer


[Document(id='f035e869-5591-4fb7-a923-e61214398696', metadata={'source': 'speech.md'}, page_content="Ladies and gentlemen,\n\nToday, we stand at the crossroads of innovation and tradition. Technology has transformed the way we live, work, and communicate. It is our responsibility to harness this power for the greater good.\n\nTogether, let's build a future that is inclusive, sustainable, and driven by knowledge.\n\nThank you.")]

### Retriever 
We can also convert the vector store into a retriever. This lets us use it in other LangChain methods to retrieve the content related to a query.
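
As a rough illustration of the idea, the sketch below wraps a toy in-memory store in a retriever object that exposes a single `invoke(query)` method. Everything here is hypothetical and written for illustration only: `ToyDocument`, `toy_embed`, `ToyVectorStore`, and `ToyRetriever` are not LangChain classes, and the letter-count "embedding" stands in for a real embedding model. Real LangChain retrievers do considerably more than this.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text):
    """Hypothetical embedding: counts of the letters a-z (not a real model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class ToyVectorStore:
    """Stores (vector, document) pairs and searches them by L2 distance."""

    def __init__(self, docs):
        self._entries = [(toy_embed(d.page_content), d) for d in docs]

    def similarity_search(self, query, k=4):
        q = toy_embed(query)
        ranked = sorted(self._entries, key=lambda entry: math.dist(q, entry[0]))
        return [doc for _, doc in ranked[:k]]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Uniform interface: callers only need invoke(query)."""

    def __init__(self, store, k):
        self._store = store
        self._k = k

    def invoke(self, query):
        return self._store.similarity_search(query, k=self._k)


store = ToyVectorStore([
    ToyDocument("technology transformed the way we live"),
    ToyDocument("zzzz qqqq"),
])
retriever = store.as_retriever(k=1)
results = retriever.invoke("technology transformed the way we live")
print(results[0].page_content)
```

The point of the wrapper is that downstream code depends only on `invoke`, so the store behind it can be swapped without changing the caller.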

In [18]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs[0].page_content

"Ladies and gentlemen,\n\nToday, we stand at the crossroads of innovation and tradition. Technology has transformed the way we live, work, and communicate. It is our responsibility to harness this power for the greater good.\n\nTogether, let's build a future that is inclusive, sustainable, and driven by knowledge.\n\nThank you."

### Similarity search with scores
There are some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns not only the documents but also the distance score between the query and each of them. The returned score is an L2 distance, so a lower score means a closer match.

In [19]:
docs_n_score = db.similarity_search_with_score(query)
docs_n_score


[(Document(id='f035e869-5591-4fb7-a923-e61214398696', metadata={'source': 'speech.md'}, page_content="Ladies and gentlemen,\n\nToday, we stand at the crossroads of innovation and tradition. Technology has transformed the way we live, work, and communicate. It is our responsibility to harness this power for the greater good.\n\nTogether, let's build a future that is inclusive, sustainable, and driven by knowledge.\n\nThank you."),
  1.3583719)]
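
To make the "lower is better" rule concrete, here is a minimal sketch that computes L2 distances by hand and sorts them in ascending order. The 2-D vectors and chunk names are made up for illustration, and this is not how FAISS computes distances internally. Some indexes report the squared distance instead, which leaves the ordering unchanged.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2-D "embeddings" (hypothetical); real embedding vectors have far more dimensions.
query_vec = [1.0, 0.0]
chunks = {
    "close chunk": [0.9, 0.1],
    "far chunk": [-1.0, 0.0],
    "middling chunk": [0.0, 1.0],
}

# Score every chunk, then sort ascending: the smallest distance is the best match.
scored = sorted(
    ((name, l2_distance(query_vec, vec)) for name, vec in chunks.items()),
    key=lambda pair: pair[1],
)
for name, score in scored:
    print(f"{name}: {score:.4f}")
```

The first entry of `scored` plays the same role as the first `(Document, score)` tuple in the output above.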

In [20]:
embedding_vector = embeddings.embed_query(query)
embedding_vector


[-0.012057526806079898,
 -0.007271517435906977,
 -0.007243364486177971,
 0.004428064391003206,
 -0.048905781743106,
 0.002435234342510611,
 -0.005879954821426262,
 -0.009821374768406419,
 -0.013553658317830548,
 0.008461987485800319,
 -0.008687211083632372,
 -0.03764457949976182,
 -0.00842176832095705,
 0.005071561715269878,
 0.018709677950964884,
 -0.012282751335234513,
 -0.032930966023290625,
 0.02038277099405823,
 -0.009716806802459045,
 0.01726181036834872,
 0.02150889084586362,
 -0.02350373161976774,
 -0.027509500833513282,
 -0.01212992037147522,
 -0.0005504916697198158,
 0.019272738808190144,
 0.012974510725990545,
 -0.005968435587097609,
 -0.028410397087486623,
 0.01833966722234219,
 0.00385897182320466,
 0.03410534887290794,
 0.017358334501327437,
 -0.03684021110405745,
 -0.016385044681958776,
 0.004039955271031681,
 0.010778576921837773,
 0.007645550081013999,
 -0.0055702719087459066,
 0.025321611793651725,
 0.0504179999894714,
 0.03164397137475729,
 -0.001861114262691336,
 0.

In [24]:
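# Caution: this passes the literal string "embedding_vector" as the query text,
# not the vector computed above, so the score below differs from the earlier one.
# To search with the vector itself, use
# db.similarity_search_with_score_by_vector(embedding_vector).
docs_score = db.similarity_search_with_score("embedding_vector")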
docs_score


[(Document(id='f035e869-5591-4fb7-a923-e61214398696', metadata={'source': 'speech.md'}, page_content="Ladies and gentlemen,\n\nToday, we stand at the crossroads of innovation and tradition. Technology has transformed the way we live, work, and communicate. It is our responsibility to harness this power for the greater good.\n\nTogether, let's build a future that is inclusive, sustainable, and driven by knowledge.\n\nThank you."),
  1.8233352)]

In [25]:
# Save the index to a local folder

db.save_local("faiss_index")

In [28]:
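# Loading unpickles the stored docstore, so allow_dangerous_deserialization=True
# must be set explicitly; only load index folders that you created or trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)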

In [29]:
new_db

<langchain_community.vectorstores.faiss.FAISS at 0x128449210>