### FAISS

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It
contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains
supporting code for evaluation and parameter tuning.
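
To make "similarity search over dense vectors" concrete, here is a minimal pure-Python sketch of an exhaustive nearest-neighbour search by L2 distance. It is not how FAISS is implemented (FAISS relies on optimized and approximate index structures); the names `l2_distance`, `brute_force_search`, and `toy_vectors` are hypothetical and exist only for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, k=2):
    """Return the k (index, distance) pairs closest to the query, nearest first."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embeddings are much longer.
toy_vectors = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [5.0, 5.0, 5.0],
]

nearest = brute_force_search([0.9, 0.1, 0.0], toy_vectors, k=2)
print(nearest)  # index 1 is nearest, then index 0
```

An exhaustive scan like this compares the query against every stored vector, which is the cost that a dedicated library is meant to reduce on large collections.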

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

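# Load the raw text file into LangChain Document objects
loader = TextLoader("bctech2011.txt")
documents = loader.load()
# Split into chunks with a target size of 50 characters and a 20-character overlap
text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=20)
docs = text_splitter.split_documents(documents)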

Created a chunk of size 127, which is longer than the specified 50
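
The warning above most likely comes from the way CharacterTextSplitter works: by default it splits only on a separator (a blank line, `"\n\n"`) and then merges the pieces up to `chunk_size`, so a piece that contains no separator stays whole even when it is longer than the limit. The sketch below is a simplified imitation of that behaviour, not LangChain's code: `split_on_separator` is a hypothetical helper, and overlap handling is left out.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split on the separator, then greedily merge pieces up to chunk_size.

    A piece that is already longer than chunk_size is kept whole, which is
    why a chunk can exceed the requested size. Overlap is not modelled.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Made-up sample: the middle paragraph is 127 characters with no blank line inside it.
sample = "short title\n\n" + "x" * 127 + "\n\nend"
chunks = split_on_separator(sample, chunk_size=50)
oversized = [len(c) for c in chunks if len(c) > 50]
print(oversized)  # lengths of the chunks that exceed the limit
```

If chunks closer to the requested size are needed, a splitter that falls back to smaller separators, such as RecursiveCharacterTextSplitter, may be a better fit.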


In [2]:
docs

[Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
 Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may

In [5]:
# Assumes a local Ollama server with the gemma2:2b model available
embeddings = OllamaEmbeddings(model="gemma2:2b")

In [7]:
db = FAISS.from_documents(docs, embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x22db02e21d0>

In [9]:
### Querying 
query = "what is ml ?"
docs = db.similarity_search(query)
docs

### The Full Response Returned By The Query

[Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
 Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may

In [13]:
print(len(docs))
print(docs[0].page_content)


2
Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights


In [14]:
print(docs[1].page_content)

Home
Our Success Stories
ML and AI-based insurance premium model to predict premium to be charged...
Our Success Stories
Banking, Financials, Securities, and Insurance
ML and AI-based insurance premium model to predict premium to be charged by the insurance company
By
Ajay Bidyarthy
-
January 7, 2024
3648
Client Background
Client:
A leading insurance firm worldwide
Industry Type:
BFSI
Products & Services:
Insurance
Organization Size:
10000+
The Problem
The insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may lack precision, and there is a growing need for more sophisticated and data-driven approaches. The integration of Artificial Intelligence (AI) and Machine Learning (ML) models in predicting insurance premiums for this specialized coverage is essential to enhance accuracy, fair

### As A Retriever

We can also convert the vector store into a Retriever. This allows us to use it easily in other LangChain methods,
which largely work with retrievers.
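
The value of the retriever interface is that downstream code only needs one entry point, `invoke(query)`, and does not care which store sits behind it. The sketch below imitates that idea in plain Python. `ToyVectorStore`, `ToyRetriever`, and `answer_with_context` are hypothetical stand-ins, and ranking by word overlap replaces real embeddings, so none of this is LangChain's implementation.

```python
class ToyVectorStore:
    """Hypothetical stand-in for a vector store: ranks texts by word overlap."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.texts,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Hypothetical retriever: a single uniform method, invoke(query)."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


def answer_with_context(retriever, question):
    """Downstream code depends only on invoke(), not on the store behind it."""
    context = retriever.invoke(question)
    return f"{question} | context: {context[0]}"


store = ToyVectorStore(["ML predicts insurance premiums", "Home page navigation links"])
toy_retriever = store.as_retriever(k=1)
print(answer_with_context(toy_retriever, "what is ml"))
```

With the real store, the number of returned documents can be set in a similar spirit through `db.as_retriever(search_kwargs={"k": 1})`.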

In [15]:
retriever = db.as_retriever()
retriever.invoke(query)

[Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
 Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may

### Similarity Search With Scores

There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents
but also the distance score between the query and each of them. The returned score is an L2 distance, so a lower score is better.
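
Because the result is a list of (document, score) pairs, a common follow-up is to sort by score and discard weak matches. The sketch below uses made-up pairs with plain strings in place of Document objects. `rank_and_filter` is a hypothetical helper and the cutoff of 20.0 is an assumption: the scale of the scores depends on the embedding model, so any threshold has to be tuned per dataset.

```python
# Hypothetical (text, distance) pairs, shaped like similarity_search_with_score output.
results = [
    ("chunk about pricing", 12.5),
    ("chunk about ML models", 3.2),
    ("navigation menu", 40.0),
]


def rank_and_filter(pairs, max_distance):
    """Sort ascending by distance (lower is better) and drop pairs beyond the cutoff."""
    ranked = sorted(pairs, key=lambda pair: pair[1])
    return [(text, dist) for text, dist in ranked if dist <= max_distance]


kept = rank_and_filter(results, max_distance=20.0)  # 20.0 is an assumed cutoff
for text, dist in kept:
    print(f"{dist:6.1f}  {text}")
```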

In [16]:
docs_and_scores = db.similarity_search_with_score(query)

In [17]:
docs_and_scores

[(Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
  9280.707),
 (Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium 

In [18]:
embedding_vectors = embeddings.embed_query(query)
embedding_vectors

[-1.5583081245422363,
 1.984365463256836,
 -2.370910406112671,
 -0.029823865741491318,
 -0.6936257481575012,
 0.882943332195282,
 -1.0666611194610596,
 -0.08940568566322327,
 -0.8183586001396179,
 0.40861767530441284,
 -1.5469861030578613,
 -0.7263386249542236,
 -0.5875989198684692,
 -0.5163737535476685,
 1.9676717519760132,
 -1.4269853830337524,
 -2.517606019973755,
 -1.7854528427124023,
 -2.658860206604004,
 1.0552557706832886,
 0.7995272278785706,
 -1.942483901977539,
 -0.08745452016592026,
 0.8684563636779785,
 0.048345740884542465,
 0.730903685092926,
 -0.26417845487594604,
 -0.4029219150543213,
 0.6540932059288025,
 -2.0219428539276123,
 0.14380860328674316,
 -2.4839906692504883,
 -3.0779101848602295,
 0.3229588568210602,
 0.18659192323684692,
 -0.10088672488927841,
 -0.38595426082611084,
 -0.36380094289779663,
 2.7111117839813232,
 -2.8395018577575684,
 0.24131031334400177,
 0.6655551791191101,
 0.6928049325942993,
 -0.7849926948547363,
 -0.6952816843986511,
 -0.3768268823623657

In [19]:
# Search with a precomputed embedding; this returns documents only, without scores
docs_by_vector = db.similarity_search_by_vector(embedding_vectors)
docs_by_vector

[Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
 Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may

In [20]:
### Saving And Loading 

db.save_local("faiss_index")

In [22]:
# The saved index includes a pickle file, so enable this flag only for files you created or trust
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)


In [23]:
docs = new_db.similarity_search(query)

In [24]:
docs

[Document(metadata={'source': 'bctech2011.txt'}, page_content='Title: ML and AI-based insurance premium model to predict premium to be charged by the insurance company - Blackcoffer Insights'),
 Document(metadata={'source': 'bctech2011.txt'}, page_content='Home\nOur Success Stories\nML and AI-based insurance premium model to predict premium to be charged...\nOur Success Stories\nBanking, Financials, Securities, and Insurance\nML and AI-based insurance premium model to predict premium to be charged by the insurance company\nBy\nAjay Bidyarthy\n-\nJanuary 7, 2024\n3648\nClient Background\nClient:\nA leading insurance firm worldwide\nIndustry Type:\nBFSI\nProducts & Services:\nInsurance\nOrganization Size:\n10000+\nThe Problem\nThe insurance industry, particularly in the context of providing coverage to Public Company Directors against Insider Trading public lawsuits, faces a significant challenge in accurately determining insurance premiums. Traditional methods of premium calculation may