# FAISS (Facebook AI Similarity Search)
FAISS is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("speech.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=30)
docs = text_splitter.split_documents(documents)

In [4]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content="A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative AI.[1][2] The concept was initially developed by Ian Goodfellow and his colleagues in June 2014.[3] In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.\n\nGiven a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning,[4] fully supervised learning,[5] and reinforcement learning.[6]"),
 Document(metadata={'source': 'speech.txt'}, page_content='The core idea of 

In [6]:
embeddings = OllamaEmbeddings(model="gemma:2b")
db = FAISS.from_documents(docs, embedding=embeddings)
db

<langchain_community.vectorstores.faiss.FAISS at 0x109c0c790>

In [8]:
# querying
query = "What is the core idea of GAN"
docs = db.similarity_search(query=query)
docs[1].page_content

'The core idea of a GAN is based on the "indirect" training through the discriminator, another neural network that can tell how "realistic" the input seems, which itself is also being updated dynamically.[7] This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.\n\nGANs are similar to mimicry in evolutionary biology, with an evolutionary arms race between both networks.'

## As a retriever
We can also convert the vector store into a retriever. This makes it easy to use with other LangChain methods, which largely work with retrievers.

In [9]:
retriever = db.as_retriever()
retriever.invoke(query)

[Document(metadata={'source': 'speech.txt'}, page_content="A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative AI.[1][2] The concept was initially developed by Ian Goodfellow and his colleagues in June 2014.[3] In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.\n\nGiven a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning,[4] fully supervised learning,[5] and reinforcement learning.[6]"),
 Document(metadata={'source': 'speech.txt'}, page_content='The core idea of 
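
To make the retriever idea concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `TinyRetriever`, `keyword_search`, and the three-sentence corpus are hypothetical names and data invented for illustration. It only shows the pattern, which is that a retriever hides a particular search backend behind a single `invoke(query)` call.

```python
class TinyRetriever:
    """Hypothetical stand-in for a retriever: wraps any search callable behind invoke()."""

    def __init__(self, search_fn, k=4):
        self.search_fn = search_fn  # any function taking (query, k) and returning a list
        self.k = k                  # number of results to return per query

    def invoke(self, query):
        return self.search_fn(query, self.k)


# Toy corpus, made up for this sketch.
CORPUS = [
    "GANs pit a generator against a discriminator",
    "The core idea of a GAN is indirect training",
    "Chroma is a vector database",
]


def keyword_search(query, k):
    """Rank corpus entries by how many query words they share (a crude stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda text: -len(words & set(text.lower().split())))
    return ranked[:k]


retriever = TinyRetriever(keyword_search, k=2)
hits = retriever.invoke("core idea of GAN")
print(hits[0])
```

Code that consumes the retriever only ever calls `invoke`, so the backend can be swapped without touching the caller. With the real vector store, the number of results can be set in a similar spirit through `db.as_retriever(search_kwargs={"k": 2})`.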

## Similarity search with score
There are some FAISS-specific methods. One of them is similarity_search_with_score, which returns not only the documents but also the distance between the query and each of them. The returned score is an L2 distance, so a lower score means a closer match.

In [10]:
docs_and_score = db.similarity_search_with_score(query)
docs_and_score

[(Document(metadata={'source': 'speech.txt'}, page_content="A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative AI.[1][2] The concept was initially developed by Ian Goodfellow and his colleagues in June 2014.[3] In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.\n\nGiven a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning,[4] fully supervised learning,[5] and reinforcement learning.[6]"),
  5181.476),
 (Document(metadata={'source': 'speech.txt'}, page_content='Th
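
The "lower is better" rule can be checked by hand with a small sketch. This is plain Python, not FAISS: the three-dimensional vectors, the chunk labels, and the helper names `squared_l2` and `search_with_score` are assumptions made up for illustration. The sketch uses the squared Euclidean distance, which, to the best of my knowledge, is what FAISS's flat L2 index reports (it skips the square root).

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query_vec, stored, k=2):
    """Return the k (text, distance) pairs closest to query_vec, smallest distance first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1])  # lower distance = better match
    return scored[:k]


# Toy "embeddings", invented for this sketch.
stored = [
    ("chunk about GAN training", [1.0, 0.0, 0.0]),
    ("chunk about discriminators", [0.0, 1.0, 0.0]),
    ("chunk about biology", [0.0, 0.0, 3.0]),
]

query_vec = [0.9, 0.1, 0.0]
results = search_with_score(query_vec, stored, k=2)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

Because the score is a distance rather than a similarity, its magnitude depends on the embedding model: large values such as the ones in the output above do not by themselves indicate a poor match, only the relative order matters.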

In [11]:
embedding_vector = embeddings.embed_query(query)
embedding_vector

[0.9256177544593811,
 -1.0974011421203613,
 1.276012659072876,
 2.0323569774627686,
 0.31248992681503296,
 1.919869303703308,
 -0.5657263398170471,
 -0.5735724568367004,
 -1.8245986700057983,
 -0.5964614152908325,
 2.6181633472442627,
 0.752016007900238,
 0.7414097785949707,
 -0.5096820592880249,
 1.2657777070999146,
 -0.212593212723732,
 4.085391044616699,
 3.216667413711548,
 1.7819242477416992,
 -1.540102481842041,
 1.1149333715438843,
 -1.197122573852539,
 0.4974328875541687,
 0.6927240490913391,
 -0.5599958896636963,
 0.9644643664360046,
 0.9824890494346619,
 -1.2717164754867554,
 -0.4128232002258301,
 -0.9229079484939575,
 -0.6440960168838501,
 -1.4435441493988037,
 -0.7338292002677917,
 -1.0935347080230713,
 -0.8462935090065002,
 -2.144460916519165,
 1.3638195991516113,
 -0.0614805705845356,
 0.7500710487365723,
 -1.8072205781936646,
 0.4716472327709198,
 0.21512989699840546,
 -0.05225863307714462,
 0.7475586533546448,
 0.6191796660423279,
 -0.06523691862821579,
 0.0467688292264

In [13]:
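# similarity_search_by_vector takes an embedding instead of a query string
# and returns documents only (no scores)
docs_by_vector = db.similarity_search_by_vector(embedding_vector)
docs_by_vector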

[Document(metadata={'source': 'speech.txt'}, page_content="A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative AI.[1][2] The concept was initially developed by Ian Goodfellow and his colleagues in June 2014.[3] In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.\n\nGiven a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning,[4] fully supervised learning,[5] and reinforcement learning.[6]"),
 Document(metadata={'source': 'speech.txt'}, page_content='The core idea of 

In [14]:
# Saving and loading
db.save_local("faiss_index")

In [16]:
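# load_local unpickles the saved docstore, so only set
# allow_dangerous_deserialization=True for index files you created or trust
new_db = FAISS.load_local("faiss_index", embeddings=embeddings, allow_dangerous_deserialization=True)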

In [19]:
docs = new_db.similarity_search(query)
docs

[Document(metadata={'source': 'speech.txt'}, page_content="A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative AI.[1][2] The concept was initially developed by Ian Goodfellow and his colleagues in June 2014.[3] In a GAN, two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.\n\nGiven a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning,[4] fully supervised learning,[5] and reinforcement learning.[6]"),
 Document(metadata={'source': 'speech.txt'}, page_content='The core idea of 

# Chroma
Chroma is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.