# Facebook AI Similarity Search (FAISS)
- FAISS (Facebook AI Similarity Search) is an open-source similarity-search library developed by Facebook AI Research (now part of Meta); LangChain wraps it as a vector store
- FAISS provides efficient algorithms and methods for searching large collections of dense vectors
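
To make "searching a collection of dense vectors" concrete, here is a minimal brute-force sketch in plain Python. It is not how FAISS is implemented (FAISS relies on optimized index structures); the toy vectors and the helper names `squared_l2` and `top_k` are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, vectors, k=2):
    """Return the indices of the k vectors closest to the query."""
    scored = [(squared_l2(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [i for _, i in scored[:k]]

# Hypothetical 3-dimensional "embeddings"
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```

An exhaustive scan like this compares the query against every stored vector, which is the cost that dedicated similarity-search libraries aim to reduce on large collections.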

In [1]:
# load document
from langchain_community.document_loaders import TextLoader
loader = TextLoader('state_of_the_union.txt', encoding= 'utf-8')
documents = loader.load()
documents

[Document(page_content='So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.\n\nFirst, beat the opioid epidemic.\nThere is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.\nGet rid of outdated rules that stop doctors from prescribing treatments. And stop the flow of illicit drugs by working with state and local law enforcement to go after traffickers.\nIf you’re suffering from addiction, know you are not alone. I believe in recovery, and I celebrate the 23 million Americans in recovery.\n\nSecond, let’s take on mental health. Especially among our children, whose lives and education have been turned upside down.\nThe American Rescue Plan gave schools money to hire teachers and help students make up for lost learning.\nI urge every parent to make sure your school does just that. And we can all play a part—sign up to be a tutor or a mentor.\nChildren were also struggling before the pandemic. Bullying, violen

In [2]:
# create chunks (reuses `documents` loaded in the previous cell)
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

Created a chunk of size 2046, which is longer than the specified 1000
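
The warning above most likely appears because `CharacterTextSplitter` only cuts at its separator (a blank line, `"\n\n"`, by default), so a paragraph longer than `chunk_size` cannot be broken up. The following is a simplified sketch of that behavior, not the library's actual code; `split_on_separator` is a hypothetical helper.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece that is already longer than chunk_size is kept whole,
    which is how an oversized chunk can arise.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)   # close the current chunk
            current = piece              # start a new chunk, even if oversized
    if current:
        chunks.append(current)
    return chunks

sample = "short paragraph\n\n" + "x" * 30 + "\n\nanother short one"
chunks = split_on_separator(sample, chunk_size=20)
oversized = [len(c) for c in chunks if len(c) > 20]
print(oversized)
```

If a hard size limit matters, `RecursiveCharacterTextSplitter` from the same package is one option, since it falls back to progressively smaller separators.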


In [3]:
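# create embeddings (no model name is given, so the library's default
# sentence-transformers model is downloaded on first use)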
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

In [4]:
# vector store: embed each chunk and index the vectors with FAISS
from langchain_community.vectorstores import FAISS
db = FAISS.from_documents(texts, embeddings)

In [5]:
# create retriever
retriever = db.as_retriever()

In [6]:
# `invoke` replaces the deprecated `get_relevant_documents`
docs = retriever.invoke("what is about veterans")

In [7]:
print(docs[0].page_content)

Third, support our veterans.
Veterans are the best of us.
I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
Our troops in Iraq and Afghanistan faced many dangers.
One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.
When they came home, many of the world’s fittest and best trained warriors were never the same.
Headaches. Numbness. Dizziness.
A cancer that would put them in a flag-draped coffin.
I know.
One of those soldiers was my son Major Beau Biden.
We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.
But I’m committed to finding out everything we can.
Committed to military families like Danielle Ro

In [8]:
# Maximal marginal relevance (MMR) retrieval
# By default, the vector store retriever uses plain similarity search. If the underlying vector store supports
# MMR search, which balances relevance to the query against diversity among the results, you can select it
# as the search type.
retriever = db.as_retriever(search_type="mmr")
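
As a rough illustration of what MMR does, the sketch below greedily picks the candidate that scores best on "relevance to the query minus similarity to what was already picked". It is a toy version under stated assumptions, not LangChain's or FAISS's implementation: the function `mmr`, the helper `unit`, and the 2-D vectors are hypothetical, and the parameter name `lambda_mult` simply mirrors the option LangChain accepts in `search_kwargs`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, candidates, k=2, lambda_mult=0.5):
    """Greedy MMR: lambda_mult=1.0 is pure relevance, lower values favor diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def unit(deg):
    """Unit vector at the given angle, used to build toy embeddings."""
    r = math.radians(deg)
    return [math.cos(r), math.sin(r)]

query = unit(45)
toy = [unit(40), unit(38), unit(60)]  # items 0 and 1 are near-duplicates
picked = mmr(query, toy, k=2, lambda_mult=0.5)
print(picked)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct item, whereas `lambda_mult=1.0` reduces to ordinary similarity ranking.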

In [9]:
docs = retriever.invoke("what is about veterans")
print(docs[0].page_content)

Third, support our veterans.
Veterans are the best of us.
I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
Our troops in Iraq and Afghanistan faced many dangers.
One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.
When they came home, many of the world’s fittest and best trained warriors were never the same.
Headaches. Numbness. Dizziness.
A cancer that would put them in a flag-draped coffin.
I know.
One of those soldiers was my son Major Beau Biden.
We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.
But I’m committed to finding out everything we can.
Committed to military families like Danielle Ro

In [10]:
# return only the single best match (k=1)
retriever = db.as_retriever(search_kwargs={"k": 1})
docs = retriever.invoke("what did he say about ketanji brown jackson")
len(docs)

1

In [11]:
print(docs[0].page_content)

Third, support our veterans.
Veterans are the best of us.
I’ve always believed that we have a sacred obligation to equip all those we send to war and care for them and their families when they come home.
My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free.
Our troops in Iraq and Afghanistan faced many dangers.
One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more.
When they came home, many of the world’s fittest and best trained warriors were never the same.
Headaches. Numbness. Dizziness.
A cancer that would put them in a flag-draped coffin.
I know.
One of those soldiers was my son Major Beau Biden.
We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops.
But I’m committed to finding out everything we can.
Committed to military families like Danielle Ro