Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [6]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader('esports.txt', encoding='utf-8')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(documents)


Created a chunk of size 1558, which is longer than the specified 1000
Created a chunk of size 1893, which is longer than the specified 1000
Created a chunk of size 1222, which is longer than the specified 1000
Created a chunk of size 1687, which is longer than the specified 1000
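
The warnings above appear because `CharacterTextSplitter` splits on a single separator (a blank line, `"\n\n"`, by default) and then merges the pieces up to `chunk_size`. A paragraph that is longer than `chunk_size` on its own is kept whole. The pure-Python sketch below imitates that behaviour in simplified form, without overlap handling. `split_on_separator` is a hypothetical helper written for illustration, not LangChain's implementation.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size is emitted unchanged, because this
    strategy never cuts inside a piece.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)   # close the current chunk
            current = piece              # start a new chunk, even if it is oversized
    if current:
        chunks.append(current)
    return chunks


# Three paragraphs; the middle one (50 characters) exceeds chunk_size=40.
sample = "short paragraph" + "\n\n" + "x" * 50 + "\n\n" + "another short one"
chunks = split_on_separator(sample, chunk_size=40)
oversized = [len(c) for c in chunks if len(c) > 40]
print([len(c) for c in chunks], oversized)
```

If oversized chunks are a problem, `RecursiveCharacterTextSplitter` from `langchain_text_splitters` is one option, since it falls back to finer separators when a piece is too long.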


In [7]:
docs

[Document(metadata={'source': 'esports.txt'}, page_content='Esports (/ˈiːspɔːrts/ ⓘ), short for electronic sports, is a form of competition using video games.[3] Esports often takes the form of organized, multiplayer video game competitions, particularly between professional players, played individually or as teams.[4][5][6]\n\nMultiplayer competitions were long a part of video game culture, but were largely between amateurs until the late 2000s when the advent of online streaming media platforms, particularly YouTube and Twitch, enabled a surge in participation by professional gamers and spectators.[7][8] By the 2010s, esports was a major part of the video game industry, with many game developers designing for and funding tournaments and other events.'),
 Document(metadata={'source': 'esports.txt'}, page_content='Esports first became popular in East Asia, particularly in China and South Korea (which first licensed professional players in 2000) but less so in Japan, whose broad anti-ga

In [11]:
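# Embed each chunk with a local Ollama model and index the vectors in FAISS.
# OllamaEmbeddings and FAISS were imported in the first cell; this needs a running Ollama server with the "llama2" model available.
embeddings = OllamaEmbeddings(model="llama2")
db = FAISS.from_documents(docs, embeddings)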

In [12]:
db

<langchain_community.vectorstores.faiss.FAISS at 0x1ec520aa900>

In [15]:
# Query the vector store; similarity_search returns the 4 closest chunks by default.
query = "Who held an international Track & Field arcade game competition?"
docs = db.similarity_search(query)
docs

[Document(id='36dc7e27-9cc3-45a4-a64e-e846821caf56', metadata={'source': 'esports.txt'}, page_content='In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'),
 Document(id='7388313a-568a-4f70-bc50-abf55e4b2fee', metadata={'source': 'esports.txt'}, page_content='In April 2006, the G7 teams federation were formed by seven prominent Counter-Strike teams. The goal of the organization was to increase stability in the esports world, particularly in standardizing player transfers and working with leagues and organizations. The founding members were 4Kings, Fnatic, Made in Brazil, Mousesports, NiP, SK-Gaming, and Team

In [16]:
docs[0].page_content

'In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'

## As a Retriever

We can also convert the vector store into a retriever. This makes it easy to use with other LangChain components, which largely work with retrievers.
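
To make the idea concrete, here is a minimal sketch of what a retriever adds: a fixed search configuration behind a single `invoke(query)` call. `ToyStore` and `ToyRetriever` are hypothetical classes for illustration only. They rank texts by shared words rather than by embeddings and are not LangChain's implementation.

```python
from dataclasses import dataclass


class ToyStore:
    """Hypothetical stand-in for a vector store: ranks texts by shared words."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4):
        words = set(query.lower().split())
        # More shared words means "more similar" in this toy model.
        ranked = sorted(
            self.texts,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class ToyRetriever:
    """Hypothetical retriever: wraps a store and a fixed k behind invoke()."""

    store: ToyStore
    k: int = 4

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyStore([
    "Konami held an arcade game competition",
    "Seven Counter-Strike teams formed a federation",
    "Twitch enabled online streaming",
])
toy_retriever = ToyRetriever(store, k=1)
result = toy_retriever.invoke("Who held an arcade competition?")
print(result)
```

With the real vector store, search settings can be passed in the same spirit, for example `db.as_retriever(search_kwargs={"k": 2})`.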

In [None]:
retriever = db.as_retriever()

In [19]:
docs = retriever.invoke(query)
docs

[Document(id='36dc7e27-9cc3-45a4-a64e-e846821caf56', metadata={'source': 'esports.txt'}, page_content='In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'),
 Document(id='7388313a-568a-4f70-bc50-abf55e4b2fee', metadata={'source': 'esports.txt'}, page_content='In April 2006, the G7 teams federation were formed by seven prominent Counter-Strike teams. The goal of the organization was to increase stability in the esports world, particularly in standardizing player transfers and working with leagues and organizations. The founding members were 4Kings, Fnatic, Made in Brazil, Mousesports, NiP, SK-Gaming, and Team

In [20]:
docs[0].page_content

'In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'

## Similarity Search with Score

There are some FAISS-specific methods. One of them is `similarity_search_with_score`, which returns not only the documents but also the distance between the query and each document. The returned score is an L2 distance, so a lower score means a closer match.
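
The following pure-Python sketch shows why a lower score is better: the score measures how far apart two vectors are, so results are sorted in ascending order. The vectors and names are made up for illustration. The sketch uses the squared L2 distance because, to our knowledge, FAISS's flat L2 index reports the squared value (no square root); the ranking is the same either way.

```python
def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 2-dimensional "embeddings"; real ones have many more dimensions.
query_vec = [1.0, 0.0]
doc_vecs = {"near": [0.9, 0.1], "far": [-1.0, 2.0]}

# Sort ascending: the smallest distance is the best match.
scored = sorted(
    ((name, squared_l2(query_vec, vec)) for name, vec in doc_vecs.items()),
    key=lambda pair: pair[1],
)
print(scored)
```

Because the score is a distance rather than a similarity, its scale depends on the embedding model, so scores should not be read as if they were confined to a fixed range.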

In [21]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(id='36dc7e27-9cc3-45a4-a64e-e846821caf56', metadata={'source': 'esports.txt'}, page_content='In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'),
  np.float32(8933.879)),
 (Document(id='7388313a-568a-4f70-bc50-abf55e4b2fee', metadata={'source': 'esports.txt'}, page_content='In April 2006, the G7 teams federation were formed by seven prominent Counter-Strike teams. The goal of the organization was to increase stability in the esports world, particularly in standardizing player transfers and working with leagues and organizations. The founding members were 4Kings, Fnatic, Made in Brazil, Mousesport

In [22]:
# Embed the query text ourselves so we can search with the raw vector.
embeddings_vector = embeddings.embed_query(query)
embeddings_vector

[0.6956684589385986,
 -0.021117670461535454,
 2.4177358150482178,
 -1.2461029291152954,
 -1.7946999073028564,
 0.13234010338783264,
 0.08609988540410995,
 -0.2686516344547272,
 0.754967987537384,
 -0.9893657565116882,
 1.1619278192520142,
 -0.5679734349250793,
 -1.8908501863479614,
 1.4104551076889038,
 -1.9609028100967407,
 -1.1547333002090454,
 1.56021249294281,
 0.35567909479141235,
 1.1301827430725098,
 -3.2312142848968506,
 -0.7243523597717285,
 -0.2218884378671646,
 1.2058007717132568,
 -1.71611487865448,
 -0.5881711840629578,
 -0.8638231754302979,
 1.5743311643600464,
 0.23956145346164703,
 -1.4970157146453857,
 -0.47607192397117615,
 3.2452051639556885,
 -2.1886978149414062,
 -2.81087327003479,
 4.594067096710205,
 -0.17442677915096283,
 -4.36420202255249,
 -0.15275782346725464,
 0.697551429271698,
 1.2772345542907715,
 0.17341549694538116,
 0.6341754198074341,
 -2.8329575061798096,
 -1.4808661937713623,
 -0.2124076634645462,
 0.2728390693664551,
 0.2438860535621643,
 -0.464113

In [23]:
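# Search with a precomputed vector instead of a query string (returns documents only, no scores).
docs_by_vector = db.similarity_search_by_vector(embeddings_vector)
docs_by_vector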

[Document(id='36dc7e27-9cc3-45a4-a64e-e846821caf56', metadata={'source': 'esports.txt'}, page_content='In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'),
 Document(id='7388313a-568a-4f70-bc50-abf55e4b2fee', metadata={'source': 'esports.txt'}, page_content='In April 2006, the G7 teams federation were formed by seven prominent Counter-Strike teams. The goal of the organization was to increase stability in the esports world, particularly in standardizing player transfers and working with leagues and organizations. The founding members were 4Kings, Fnatic, Made in Brazil, Mousesports, NiP, SK-Gaming, and Team

## Saving and Loading

In [24]:
# Write the index and its document store to the "faiss_index" folder.
db.save_local("faiss_index")

In [26]:
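# Loading unpickles the saved document store, so only set allow_dangerous_deserialization=True for index files you trust.
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)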

In [27]:
new_db

<langchain_community.vectorstores.faiss.FAISS at 0x1ec524902d0>

In [28]:
docs = new_db.similarity_search(query)
docs

[Document(id='36dc7e27-9cc3-45a4-a64e-e846821caf56', metadata={'source': 'esports.txt'}, page_content='In 1984, Konami and Centuri jointly held an international Track & Field arcade game competition that drew more than a million players from across Japan and North America. Play Meter in 1984 called it "the coin-op event of the year" and an "event on a scale never before achieved in the industry".[31] As of 2016, it holds the record for the largest organized video game competition of all time, according to Guinness World Records.[32]'),
 Document(id='7388313a-568a-4f70-bc50-abf55e4b2fee', metadata={'source': 'esports.txt'}, page_content='In April 2006, the G7 teams federation were formed by seven prominent Counter-Strike teams. The goal of the organization was to increase stability in the esports world, particularly in standardizing player transfers and working with leagues and organizations. The founding members were 4Kings, Fnatic, Made in Brazil, Mousesports, NiP, SK-Gaming, and Team