# FAISS

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search over dense vectors. Each document is converted into an embedding vector, and FAISS finds the stored vectors closest to a query vector, so you can retrieve the documents whose content is most similar to the query.
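
To make the vector-space idea concrete, here is a minimal pure-Python sketch of what a similarity search does conceptually: every document is a vector, and the query vector is compared against each of them. The toy 3-dimensional vectors and the `l2_distance` and `nearest` helpers are illustrative assumptions, not part of FAISS, which uses optimized index structures over much higher-dimensional embeddings.

```python
import math

# Hypothetical toy "embeddings"; in practice these come from an embedding model.
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query_vector, vectors, k=2):
    """Return the k (name, distance) pairs closest to the query, nearest first."""
    scored = [(name, l2_distance(query_vector, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


query_vector = [1.0, 0.05, 0.0]
top = nearest(query_vector, doc_vectors, k=2)
print(top)  # doc_a is nearest, followed by doc_c
```

This brute-force scan compares the query with every stored vector, which is fine for a handful of documents; an index library exists to make the same lookup fast at scale.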

In [5]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain.text_splitter import CharacterTextSplitter

In [8]:
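# Load the speech, then split it into chunks of roughly 1000 characters with a 20-character overlap
text = TextLoader("speech.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(text)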

In [9]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Today we are launching a campaign called HeForShe. I am reaching out to you because we need your help. We want to end gender inequality, and to do this, we need everyone involved. This is the first campaign of its kind at the UN. We want to try to mobilize as many men and boys as possible to be advocates for change. And, we don’t just want to talk about it. We want to try and make sure that it’s tangible. \n\nI was appointed as Goodwill Ambassador for UN Women six months ago. And, the more I spoke about feminism, the more I realized that fighting for women’s rights has too often become synonymous with man-hating. If there is one thing I know for certain, it is that this has to stop. \n\nFor the record, feminism by definition is the belief that men and women should have equal rights and opportunities. It is the theory of political, economic and social equality of the sexes.'),
 Document(metadata={'source': 'speech.txt'}, page_co

In [10]:
embeddings = OllamaEmbeddings(model="mxbai-embed-large:latest")
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x7c6887d8e350>

In [13]:
## querying

query = "When her male friends were unable to express their feelings ?"
result = db.similarity_search(query)
result

[Document(id='e0060c51-d327-4aad-bf49-f5a2318d3fa5', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
 Document(id='c54fb947-663b-40a6-aff6-2395aa204ab4', metadata={'source': 'speech.txt'},

In [14]:
result[0].page_content

'I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'

### As a Retriever
We can also convert the vector store into a retriever. This makes it easy to use in other LangChain components, most of which work with retrievers.
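
The idea behind a retriever can be sketched without LangChain: it is a thin wrapper that fixes the search settings once and exposes a single "query in, documents out" call. The `ToyStore` and `ToyRetriever` classes below are hypothetical stand-ins written for illustration (the store ranks texts by shared words rather than by embeddings); they are not LangChain's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a vector store; ranks texts by shared words."""
    texts: list

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class ToyRetriever:
    """Hypothetical wrapper: stores the search settings and exposes only invoke()."""
    store: ToyStore
    search_kwargs: dict = field(default_factory=dict)

    def invoke(self, query):
        # Callers pass just the query; k and other settings were fixed at construction.
        return self.store.similarity_search(query, **self.search_kwargs)


toy_store = ToyStore([
    "male friends express feelings",
    "equal rights and opportunities",
    "launching a campaign",
])
toy_retriever = ToyRetriever(toy_store, search_kwargs={"k": 1})
hits = toy_retriever.invoke("unable to express their feelings")
print(hits)
```

Because downstream code only depends on `invoke`, the same pipeline can be pointed at a different store or different search settings without changes.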

In [18]:
# Retriever: return only the single most similar chunk (k=1)
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 1})
retrieved_docs = retriever.invoke(query)
retrieved_docs


[Document(id='e0060c51-d327-4aad-bf49-f5a2318d3fa5', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.')]

In [19]:
docs[0].page_content

'Today we are launching a campaign called HeForShe. I am reaching out to you because we need your help. We want to end gender inequality, and to do this, we need everyone involved. This is the first campaign of its kind at the UN. We want to try to mobilize as many men and boys as possible to be advocates for change. And, we don’t just want to talk about it. We want to try and make sure that it’s tangible. \n\nI was appointed as Goodwill Ambassador for UN Women six months ago. And, the more I spoke about feminism, the more I realized that fighting for women’s rights has too often become synonymous with man-hating. If there is one thing I know for certain, it is that this has to stop. \n\nFor the record, feminism by definition is the belief that men and women should have equal rights and opportunities. It is the theory of political, economic and social equality of the sexes.'

### Similarity Search with Score
There are some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns not only the documents but also the distance score between the query and each document. The returned score is an L2 distance, so a lower score means a closer match.
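
A small worked example shows why lower is better for a distance score. The 2-dimensional vectors and the `squared_l2` helper are made up for illustration. As far as I know, a flat L2 index in FAISS reports the squared distance (it skips the square root); treat that as an assumption to verify against the FAISS documentation. Either way the ranking is the same, because squaring preserves the order of non-negative distances.

```python
import math


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical toy vectors standing in for a query and two document embeddings.
query_vec = [0.0, 0.0]
candidates = {"close_doc": [3.0, 4.0], "far_doc": [6.0, 8.0]}

squared_scores = {name: squared_l2(query_vec, vec) for name, vec in candidates.items()}
plain_scores = {name: math.sqrt(score) for name, score in squared_scores.items()}

# Sort ascending: the smallest distance is the best match.
ranked_by_squared = sorted(squared_scores, key=squared_scores.get)
ranked_by_plain = sorted(plain_scores, key=plain_scores.get)

print(squared_scores)
print(plain_scores)
print(ranked_by_squared == ranked_by_plain)
```

Note that distance scores are not bounded to a fixed range, so their magnitude depends on the embedding model and should be compared only within the same index.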

In [20]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='e0060c51-d327-4aad-bf49-f5a2318d3fa5', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
  254.55301),
 (Document(id='c54fb947-663b-40a6-aff6-2395aa204ab4', metadata={'source'

In [21]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[0.7332888245582581,
 -0.12768308818340302,
 0.5107711553573608,
 0.8551225662231445,
 -0.7619158029556274,
 -1.307618498802185,
 -0.28656715154647827,
 0.7955676317214966,
 -0.14623241126537323,
 -0.3268262445926666,
 -0.6644721627235413,
 0.018557166680693626,
 -0.11915278434753418,
 -0.4113273620605469,
 -0.733522355556488,
 0.6028421521186829,
 -0.3661428391933441,
 -0.31077927350997925,
 -0.4197673201560974,
 0.045012108981609344,
 0.7596293091773987,
 0.05149848014116287,
 -0.26236557960510254,
 0.36713406443595886,
 -0.26761728525161743,
 0.21540567278862,
 0.24351844191551208,
 -0.10656388103961945,
 0.9815841913223267,
 0.2267509549856186,
 0.06295008957386017,
 0.5979960560798645,
 -0.4598802626132965,
 -0.8301398158073425,
 0.49974000453948975,
 -0.5955793857574463,
 -0.16179659962654114,
 -0.8628184199333191,
 -0.7828373312950134,
 -0.2904733121395111,
 0.5068140625953674,
 0.1925809383392334,
 0.4364457428455353,
 -0.7136217355728149,
 0.005076095461845398,
 0.080618962645

In [22]:
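# Search with a precomputed embedding instead of a text query (returns documents only, no scores)
docs_by_vector = db.similarity_search_by_vector(embedding_vector)
docs_by_vector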

[Document(id='e0060c51-d327-4aad-bf49-f5a2318d3fa5', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
 Document(id='c54fb947-663b-40a6-aff6-2395aa204ab4', metadata={'source': 'speech.txt'},

In [23]:
### saving and loading the database
db.save_local("faiss_index")

In [26]:

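# allow_dangerous_deserialization=True is needed because part of the saved index is stored with pickle;
# only enable it for index files you created yourself or otherwise trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
new_db.similarity_search(query)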

[Document(id='e0060c51-d327-4aad-bf49-f5a2318d3fa5', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
 Document(id='c54fb947-663b-40a6-aff6-2395aa204ab4', metadata={'source': 'speech.txt'},