## FAISS

FAISS (Facebook AI Similarity Search) is a C++ library with Python wrappers for high-performance nearest neighbor search and clustering of dense vectors. It is often used in machine learning, NLP and RAG applications. FAISS speeds up retrieval by implementing fast nearest neighbor search algorithms and index structures.

Before learning about Fast Nearest Neighbor Search, there are a few more concepts that we need to be aware of.

### Embedding Space
When we generate embeddings (say using OpenAI, Sentence Transformers, or CLIP), each piece of text, image or item is mapped to a vector in a high-dimensional space (for example 384, 768 or 1536 dimensions). Think of it as a cloud of points floating in space. The distance between points reflects semantic similarity (close = similar, far = different).

### Vector Normalization
Vector normalization means scaling a vector so that its length (magnitude) becomes 1. After normalization, all vectors lie on the same unit sphere in the embedding space. This ensures that no vector dominates purely because it has a larger magnitude. Some embedding models produce vectors of different lengths even for similar meanings. Normalization ensures that only direction (semantic meaning) matters. Let's look at this in detail.

A vector is just a list of numbers representing some object (like a piece of text or an image). Example:

 - Vector for "doberman" -> [3,4]
 - Vector for "pug" -> [6,8]

Both of these vectors represent dogs. But the vector for pug is double the vector for doberman, which may give the impression that they are not related to each other. The two vectors point in the same direction and differ only in magnitude; vector normalization removes this difference.

Vector normalization scales the vectors such that their length (magnitude) becomes 1 without changing its direction. 

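The formula for normalizing a vector $v = (v_1, \dots, v_n)$ divides it by its length, which is the square root of the sum of the squares of its components:

$$\hat{v} = \frac{v}{\lVert v \rVert_2} = \frac{v}{\sqrt{v_1^2 + v_2^2 + \dots + v_n^2}}$$

For the above example, the normalized vector for doberman is $(3/\sqrt{3^2 + 4^2},\ 4/\sqrt{3^2 + 4^2}) = (3/5,\ 4/5)$, which is [0.6, 0.8].

If we apply the same formula to pug, the length is $\sqrt{6^2 + 8^2} = 10$, so the normalized vector is also [0.6, 0.8].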
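The arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch; the helper name `l2_normalize` is made up for illustration and is not part of FAISS.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; its direction is unchanged."""
    norm = np.linalg.norm(v)  # square root of the sum of squared components
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return v / norm

doberman = np.array([3.0, 4.0])
pug = np.array([6.0, 8.0])

doberman_unit = l2_normalize(doberman)
pug_unit = l2_normalize(pug)

# Both vectors end up at the same point on the unit circle.
print(doberman_unit, pug_unit)
```

After normalization the two vectors are identical, which is why only direction influences the similarity score.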

### Nearest Neighbor Search (NNS)
Nearest Neighbor Search can be used to retrieve documents relevant to a query. It looks for the vectors closest to the query vector and returns the documents they represent. Closeness is measured with a distance metric, most commonly one of the following.

1. Cosine similarity (angle between vectors)
2. Euclidean distance (L2 norm)

There are two types of Nearest Neighbor Search

- **Exact Nearest Neighbor Search (ENN)**

    This is a brute force search. It compares the query vector with every vector in the database. While this is 100% accurate, it is too slow for millions of embeddings. 

- **Approximate Nearest Neighbor Search (ANN)**

    ANN uses clever data structures like trees, graphs or clustering to speed things up. It trades a tiny bit of accuracy for huge performance gains. ANN is widely used in RAG applications. 
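The brute force (exact) variant can be sketched directly in NumPy. This is an illustration under simple assumptions (L2 distance, everything in memory); the function name `exact_knn` is hypothetical.

```python
import numpy as np

def exact_knn(database: np.ndarray, query: np.ndarray, k: int):
    """Compare the query with every row and return the k closest rows."""
    dists = np.linalg.norm(database - query, axis=1)  # one distance per stored vector
    order = np.argsort(dists)[:k]                     # indices of the k smallest distances
    return order, dists[order]

database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])
query = np.array([1.0, 1.0])

idx, d = exact_knn(database, query, k=2)
print(idx, d)
```

The cost grows linearly with the number of stored vectors, which is the part that approximate methods try to avoid.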


### Distance Metrics 
Distance metrics are the different ways of measuring how far apart two vectors are in Nearest Neighbor Search. The distance between vectors determines how similar they are: the closer the vectors, the more similar they are. The following metrics are commonly used.

1. **Euclidean Distance (L2 Norm)**

    - Formulae

        ![alt text](euclidean-distance-formulae.jpg "Euclidean Distance")
    
    - Measures the straight line geometric distance between two vectors. 
    - Intuition: "How apart are the points in space?"
    - Use case: Works well when both the magnitude and the direction of the vectors matter. (Ex: Clustering images or sensor data)

2. **Manhattan Distance (L1 Norm)**

    - Formulae

        ![alt text](manhattan-distance-formulae.jpg "Manhattan Distance")
    
    - Measures distance as if we can only move along the grid lines. Like city blocks in Manhattan.
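    - Intuition: "How far apart are the points if we add up the absolute difference along each dimension?" It is less sensitive to outliers than Euclidean distance because the differences are not squared, so one large difference in a single dimension contributes linearly rather than quadratically.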
    - Use case: Sometimes used in sparse embeddings or table data

3. **Cosine Similarity / Cosine Distance**

    - Formula

        ![alt text](cosine-similarity-formulae.jpg "Cosine Similarity")

        Cosine Distance = 1 - sim(x,y)

    - Focuses on the angle between the vectors
    - Intuition: Two vectors pointing in the same direction are similar even though they differ in length. 
    - Use case: Very common in semantic embeddings (text embeddings, document search, RAG memory) since meaning is captured by direction and not length.

4. **Dot Product / Inner Product**

    - Formulae

        ![alt text](dot-product-formula.jpg "Dot Product")

    - Dot Product is similar to cosine similarity. We can consider cosine similarity to be a normalized form of dot product. 
    - Dot Product considers magnitude in the score while cosine similarity removes it. 
    - We can consider cosine similarity as pure directional closeness. Dot Product is directional closeness weighted by size. 
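The four metrics above can be compared on the doberman and pug vectors from the normalization example. This is a small NumPy sketch; the function names are chosen here for illustration.

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    return float(np.sum(np.abs(x - y)))

def dot_product(x, y):
    return float(np.dot(x, y))

def cosine_similarity(x, y):
    # Dot product divided by both lengths, so magnitude cancels out.
    return dot_product(x, y) / float(np.linalg.norm(x) * np.linalg.norm(y))

def cosine_distance(x, y):
    return 1.0 - cosine_similarity(x, y)

x = np.array([3.0, 4.0])  # doberman
y = np.array([6.0, 8.0])  # pug

print("euclidean:", euclidean(x, y))
print("manhattan:", manhattan(x, y))
print("dot product:", dot_product(x, y))
print("cosine similarity:", cosine_similarity(x, y))
print("cosine distance:", cosine_distance(x, y))
```

The two vectors are 5 units apart by Euclidean distance, yet their cosine similarity is 1 because they point in the same direction.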

### k-means clustering
Clustering is the grouping of similar data points together. k-means is an unsupervised machine learning algorithm that partitions a dataset into **k clusters**. Here k is the number of clusters, which we choose in advance. In FAISS IVF indexes, this number is called **nlist**.

Each cluster has a **centroid** and data-points that are closest to that centroid compared to other centroids. 

#### How k-means clustering works?
For a given value of k, the algorithm would

1. **Initialize centroids** randomly. It picks k random points as the initial cluster centers. 
2. **Assign points to clusters**. Each data point goes to the cluster whose centroid is nearest (using a distance metric, usually Euclidean). Can a point be equidistant to two or more centroids? This is theoretically possible but rare in practice, because the distance is computed as a floating point number. If it does happen, the result depends on the implementation; a common choice is to assign the point to the first of the tied centroids (the one with the lowest index). 
3. **Recompute the centroids**. After all the data points are assigned to their nearest centroids, the algorithm **computes the mean (average) position** of the points in each cluster, dimension by dimension. That average becomes the new centroid of the cluster. This is repeated for every cluster. 
4. **Repeat** steps 2 and 3 until a stopping condition is met. 

#### When is this algorithm stopped?
1. **Centroids don't change anymore (Convergence)**
    If the new centroids are the same (or almost the same) as the old ones, the algorithm has converged. This means the clusters are stable.

2. **Point assignments don't change**
    Even if the centroids move a little, if every point remains assigned to the same cluster as the previous iteration, the process can stop. 

3. **Maximum number of iterations reached**
    To avoid infinite loops, a maximum limit like 100 or 300 iterations is set. If the algorithm hasn't converged by then, it stops. 

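Many implementations combine two of these criteria (scikit-learn's `KMeans`, for example, exposes both):
- **Tolerance**: If the centroid movement is smaller than a small threshold (say 10^-4), stop. 
- **Max iterations**: Even if it has not fully converged, stop after N iterations. 

FAISS's own k-means simply runs for a fixed number of iterations (the `niter` parameter).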
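The loop and the stopping rules described above can be put together in a short NumPy sketch. It is a teaching version, not the FAISS implementation; the function name `kmeans` and the parameters `max_iter`, `tol` and `seed` are assumptions made for this example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = None
    for iteration in range(1, max_iter + 1):
        # Step 2: assign every point to its nearest centroid.
        # argmin resolves ties in favor of the lowest centroid index.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[new_labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stopping checks: centroid movement below tol, or assignments unchanged.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        stable = labels is not None and np.array_equal(labels, new_labels)
        centroids, labels = new_centroids, new_labels
        if shift < tol or stable:
            break
    return centroids, labels, iteration

# Two well separated blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])

centroids, labels, n_iter = kmeans(points, k=2)
print(centroids, n_iter)
```

The `max_iter` cap is the safety net; on data like this the loop normally exits through one of the two convergence checks after a handful of iterations.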

### Compressing Vectors

#### Product Quantization (PQ)
In similarity search, storing and comparing raw vectors is expensive. If every vector dimension is stored as a 4-byte float, 1 billion vectors with 128 dimensions take 1,000,000,000 x 128 x 4 bytes = 512 GB of RAM. We need a way to compress vectors so that we can still compare them approximately, but with much less memory and faster distance computation. That's where Product Quantization comes in.

##### The idea of Product Quantization
Instead of representing a vector by its raw float value, PQ **splits it into smaller chunks** and quantizes each chunk separately. 

- **Step 1: Split the vector**
    - Suppose we have a vector of dimensions _D_ = 128
    - Split it into _M_ sub-vectors. If we consider _M_ to be 8, each sub-vector will have _D/M_ = 16 dimensions.

- **Step 2: Quantize each sub-vector**
    - For each sub-vector space, run **k-means** to create a code-book (dictionary) of centroids. If you choose 256 centroids (k=256), then each sub-vector can be approximated by just 1 byte. (0-255 index)

- **Step 3: Encode the vector**
    - Instead of storing the raw 128 floats, we store the **index of the closest centroid** for each sub-vector. For 8 sub-vectors, that is 8 bytes in total. 

So the whole 128-D float vector (~512 bytes) is compressed to just 8 bytes.

##### Distance Computation with PQ
How do we search efficiently if we don't have raw vectors?

1. When a **query vector** comes in, we split it into sub-vectors too. 
2. For each sub-vector, compute its distance to all centroids in the code-book of that subspace and store the results in a lookup table. 
3. For each database vector, look up the table entry for its stored centroid index in each subspace and sum up the distances. 

This way, distance computation is super fast. Just table lookups + additions, instead of full vector distance calculations. 
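The training, encoding and lookup-based distance steps can be sketched end to end in NumPy. This is a simplified illustration, not how FAISS implements PQ: it uses a tiny hand-written k-means, smaller sizes than the text (D = 32, M = 4, k = 16) to keep it quick, and hypothetical function names.

```python
import numpy as np

def train_codebooks(train, M, k, n_iter=10, seed=0):
    """Run a small k-means in each of the M subspaces. Returns shape (M, k, D/M)."""
    rng = np.random.default_rng(seed)
    n, D = train.shape
    ds = D // M
    codebooks = np.empty((M, k, ds), dtype=np.float32)
    for m in range(M):
        sub = train[:, m * ds:(m + 1) * ds]
        cent = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(n_iter):
            d2 = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for j in range(k):
                members = sub[assign == j]
                if len(members):
                    cent[j] = members.mean(axis=0)
        codebooks[m] = cent
    return codebooks

def encode(vectors, codebooks):
    """Replace each sub-vector by the index of its closest centroid (1 byte each)."""
    M, k, ds = codebooks.shape
    codes = np.empty((len(vectors), M), dtype=np.uint8)
    for m in range(M):
        sub = vectors[:, m * ds:(m + 1) * ds]
        d2 = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(axis=2)
        codes[:, m] = d2.argmin(axis=1)
    return codes

def lookup_distances(query, codes, codebooks):
    """Approximate squared L2 distances using table lookups and additions only."""
    M, k, ds = codebooks.shape
    # table[m, j] = squared distance from the query's m-th sub-vector to centroid j
    table = np.stack([
        ((codebooks[m] - query[m * ds:(m + 1) * ds]) ** 2).sum(axis=1)
        for m in range(M)
    ])
    return table[np.arange(M), codes].sum(axis=1)

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 32)).astype(np.float32)

codebooks = train_codebooks(data, M=4, k=16)
codes = encode(data, codebooks)
approx = lookup_distances(data[0], codes, codebooks)

print("bytes per raw vector:", data[0].nbytes, "bytes per code:", codes[0].nbytes)
```

Each stored vector shrinks from 128 bytes to 4 bytes in this setup, and the per-query work on each database vector is M table lookups plus M - 1 additions.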

##### Problem with Product Quantization
Product Quantization assumes that simply slicing the vector dimensions into contiguous blocks (for example, the first 8 dims and then the next 8 dims) is fine. However, **the variance of and the correlation between dimensions** can make this division suboptimal. 

The dimensions of our data can be correlated. This means two or more dimensions **vary together**. For example, the height and weight of a person are correlated: taller people tend to weigh more. PQ quantizes each sub-vector independently, so when correlated dimensions land in different sub-vectors, the same information is encoded more than once and code capacity is wasted. 

The dimensions of our data can also be unevenly scaled. If some dimensions have much larger numerical ranges than others, they dominate the distance computation and therefore the centroids of their sub-vector, and the sub-vectors that contain them end up with a much larger quantization error than the rest. 

These issues are solved by **Optimized Product Quantization**

#### Optimized Product Quantization (OPQ)
Optimized Product Quantization (OPQ) transforms the vectors before applying Product Quantization. It does this by learning an orthogonal rotation (a linear transform) that is applied to every vector before PQ. Because the rotation is orthogonal, the distances between vectors are unchanged. The idea is that, by rotating the space, we can distribute the variance of the data more evenly across the subspaces and reduce the dependence between them, so that PQ can quantize it more effectively. 
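The sketch below illustrates why a rotation is a safe preprocessing step. It does not learn the OPQ rotation; a random orthogonal matrix stands in for the learned one, and the data, scales and helper name `subspace_variance` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data whose variance is concentrated in the first two dimensions.
scales = np.array([10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
data = rng.normal(size=(1000, 8)) * scales

# Assumption: a random orthogonal matrix (from a QR decomposition) stands in
# for the rotation that OPQ would learn from the data.
R, _ = np.linalg.qr(rng.normal(size=(8, 8)))
rotated = data @ R

def subspace_variance(x, M):
    """Total variance that falls into each of the M contiguous sub-vectors."""
    return x.var(axis=0).reshape(M, -1).sum(axis=1)

print("before rotation:", subspace_variance(data, M=4))
print("after rotation: ", subspace_variance(rotated, M=4))

# Distances between vectors are preserved by the rotation.
print(np.linalg.norm(data[0] - data[1]), np.linalg.norm(rotated[0] - rotated[1]))
```

Before the rotation almost all of the variance sits in the first sub-vector. The rotation redistributes it while leaving pairwise distances and the total variance untouched; OPQ chooses the rotation so that the redistribution actually helps the quantizer.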

#### Scalar Quantization (SQ)
Scalar Quantization (SQ) does not store a full-precision float for each vector dimension. Instead, it approximates each component (dimension) by a discrete value, a "quantized" scalar. 

Each dimension is treated independently, unlike in PQ, which splits vectors into sub-vectors. 

##### How it works?
- **Step 1: Choose the number of bits per dimension**
    An 8 Bit SQ can take 256 discrete values. A 4 bit SQ can take 16 discrete values.

- **Step 2: Find min/max values**
    For each dimension across all vectors, determine the minimum (Xmin) and maximum values (Xmax)

- **Step 3: Map each value to a discrete level**
    - Each dimension range (Xmin , Xmax) is divided into 2^b levels (where b is the number of bits)
    - Each value is replaced by the index of its nearest level. 

    **Example**
    - Suppose the values of a dimension range from 0 to 1 (Xmin = 0, Xmax = 1)
    - 8-bit &rarr; 256 evenly spaced levels with a step of 1/255: 0.0, 0.0039, 0.0078, ..., 1.0
    - A value of 0.45 &rarr; nearest level 0.4510 (115/255) &rarr; store index 115 instead of the float. 

- **Step 4: Store compressed vectors**
    - Each dimension is replaced by its quantization index. 
    - 128-D vector with 8-bit SQ &rarr; 128 bytes instead of 512 bytes (128 floats x 4 bytes).
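The four steps can be sketched with NumPy as follows. This is a minimal illustration that maps [Xmin, Xmax] onto evenly spaced levels and stores the level index; the function names are hypothetical, and the `uint8` storage means the sketch only covers up to 8 bits.

```python
import numpy as np

def sq_train(vectors):
    """Step 2: per-dimension minimum and maximum."""
    return vectors.min(axis=0), vectors.max(axis=0)

def sq_encode(vectors, vmin, vmax, bits=8):
    """Step 3: replace each value by the index of its nearest level."""
    top = 2 ** bits - 1                             # highest level index (255 for 8 bits)
    span = np.where(vmax > vmin, vmax - vmin, 1.0)  # avoid division by zero
    scaled = np.clip((vectors - vmin) / span, 0.0, 1.0)
    return np.rint(scaled * top).astype(np.uint8)

def sq_decode(codes, vmin, vmax, bits=8):
    """Map level indices back to approximate float values."""
    top = 2 ** bits - 1
    return vmin + codes.astype(np.float32) / top * (vmax - vmin)

# The single-value example from the text: range [0, 1], value 0.45.
example_code = sq_encode(np.array([[0.45]]), np.array([0.0]), np.array([1.0]))
print(example_code)

# Step 4: compress a batch of 128-D float32 vectors.
rng = np.random.default_rng(0)
vectors = rng.random((100, 128)).astype(np.float32)
vmin, vmax = sq_train(vectors)
codes = sq_encode(vectors, vmin, vmax)
restored = sq_decode(codes, vmin, vmax)

print(vectors.nbytes, "->", codes.nbytes, "bytes")
print("largest reconstruction error:", float(np.abs(restored - vectors).max()))
```

The reconstruction error per dimension is bounded by half of one quantization step, which is the price paid for the 4x reduction in memory.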

#### Key differences in Quantization methods

| Feature | Product Quantization (PQ) | Scalar Quantization (SQ) |
|---------|---------------------------|--------------------------|
| Compression Method | Vectors split into sub-vectors, each replaced by a centroid index | Each dimension approximated by a discrete scalar level |
| Complexity | Compute intensive | Simple - No k means per sub-vector |
| Memory Efficiency | Very high compression possible | Less efficient than PQ at same precision |
| Accuracy | Higher accuracy with the same code size | Less precise than PQ |
| Speed | Comparatively slower but still fast. Slower due to the table lookups involved | Very fast |

### Indexes in FAISS
An index in FAISS decides how vectors are inserted, stored and retrieved. Every index works with a similarity metric (L2 distance, inner product, etc.).


1. Flat (Brute Force) Index

    - It stores all vectors in memory and compares query against all vectors. It supports 
        - Euclidean distance ```IndexFlatL2```
        - Inner product ```IndexFlatIP``` (cosine similarity if the vectors are normalized)
    - Pros: Exact search, very simple
    - Cons: Slow for very large datasets (O(n) comparisons)
    - Use cases: Can be used when the dataset is small

2. Inverted File Index (IVF)

    Inverted File Index (IVF) divides the vector space into clusters using k-means. At search time, it only probes a subset of clusters instead of all vectors. It uses the below parameters to perform the search

    - nlist &rarr; number of clusters
    - nprobe &rarr; number of clusters searched per query 
    
    The different variants of Inverted File Index (IVF) are 

    - ```IndexIVFFlat```: Stores raw vectors in each cluster.
    - ```IndexIVFPQ```: Uses Product Quantization to compress the vectors in each cluster. 
    - ```IndexIVFScalarQuantizer```: Uses Scalar Quantization to compress the vectors in each cluster. 

    Trade-offs:

    - Pros: Much faster than brute force, scalable to billions of vectors
    - Cons: Approximate. A true nearest neighbor is missed if its cluster is not among the nprobe clusters that are searched
    - Use case: Large datasets where speed is important. 
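The role of nlist and nprobe can be illustrated with a NumPy sketch of the IVF idea (raw vectors stored per cluster, as in the flat variant). It is not the FAISS implementation; `build_ivf` and `ivf_search` are hypothetical names and the k-means is deliberately minimal.

```python
import numpy as np

def build_ivf(data, nlist, n_iter=10, seed=0):
    """Cluster the data with a small k-means and build one inverted list per cluster."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=nlist, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(nlist):
            members = data[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    inverted_lists = [np.flatnonzero(assign == j) for j in range(nlist)]
    return centroids, inverted_lists

def ivf_search(query, data, centroids, inverted_lists, nprobe, k):
    """Scan only the vectors stored in the nprobe clusters closest to the query."""
    centroid_d2 = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_d2)[:nprobe]
    candidates = np.concatenate([inverted_lists[j] for j in probe])
    d2 = ((data[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(d2)[:k]
    return candidates[order], d2[order]

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 8))
centroids, inverted_lists = build_ivf(data, nlist=16)

query = data[7]
ids_fast, _ = ivf_search(query, data, centroids, inverted_lists, nprobe=1, k=5)
ids_full, _ = ivf_search(query, data, centroids, inverted_lists, nprobe=16, k=5)
print(ids_fast, ids_full)
```

With nprobe equal to nlist every cluster is scanned and the result matches brute force; smaller nprobe values scan fewer vectors and may miss neighbors that live in clusters that were not probed.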

3. HNSW (Hierarchical Navigable Small World Graphs)

    Before learning about HNSW, we need to understand what a graph data structure is.

    **Graph Data structure**
    A graph data structure is a non-linear data structure made of vertices (nodes) and edges (links). It is used to model relationships between objects. Vertices are data-containing points, and edges connect the vertices to show the relationships between them. Unlike trees, graphs have no root or leaves. Nodes can be connected in any way, allowing complex, non-hierarchical relationships. 

    Instead of clustering the vectors, HNSW builds a **graph structure** to enable **fast approximate nearest neighbor (ANN) search**.

    It is widely used because it offers:

    - Very high recall (accuracy)
    - Low query latency
    - Works especially well for high dimensional vectors. 

    **Core Idea of HNSW**
    1. Graph-based indexing
        - Each vector is treated as a node in a graph
        - Edges connect each node to its nearest neighbors
    
    2. Small World Property
        - Graphs are built so that most nodes can be reached in just a few steps (like "6 degrees of separation").
        - This ensures search can navigate quickly

    3. Hierarchical layers
        - The graph is built in multiple layers. 
        - Top layer is sparse. It has fewer edges &rarr; allows long jumps across the dataset.
        - Lower layers are dense. It has more edges &rarr; allows precise local refinement. 

    **How Search works in HNSW?**
    1. Start at the **top layer** where there are very few nodes.
    2. From the entry point, **greedily walk** to the neighbor closest to the query. 
    3. Move down layer by layer, always refining the nearest candidate. 
    4. At the bottom layer (densest graph), expand the search locally to get the top k-nearest neighbors. 

    **Exponential Decay Function**
    1. In HNSW, each node is assigned a random maximum level drawn from an exponentially decaying distribution, and it is inserted into that level and every level below it. 
    2. The primary property of this distribution is that very few nodes reach the top layers, while every node is present in the bottom layer.
    3. The topmost layer usually contains only one node, or at most a handful of nodes. 
    4. Exponential decay function has the below formula.
            **_l=floor(-ln(u)*m)_**
            - **_u_** is a random number between 0 and 1
            - **_m_** is level multiplier. It controls the rate of decay. 
            - Larger  **_m_** &rarr; slower decay &rarr; more nodes appear in higher levels
            - Smaller **_m_** &rarr; faster decay &rarr; less nodes appear in higher levels.
            - **_floor_** is a round down function. The values are rounded down to the nearest integer. 
            - Level multiplier is derived from a user parameter **_M_** which is the maximum number of connections per node. M will typically be between 5 and 48. Given a value for **_M_**, level multiplier is calculated with the below formula.

                ![alt text](level-multiplier-formulae.jpg "Level Multiplier")
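    The effect of the formula can be seen by sampling many levels. This sketch assumes the level multiplier commonly used for HNSW, m = 1 / ln(M); the function names are made up for illustration.

    ```python
    import math
    import random
    from collections import Counter

    def level_multiplier(M: int) -> float:
        # Assumption: the commonly used HNSW normalization m = 1 / ln(M).
        return 1.0 / math.log(M)

    def sample_level(m: float, rng: random.Random) -> int:
        u = 1.0 - rng.random()               # uniform in (0, 1], so ln(u) is always defined
        return math.floor(-math.log(u) * m)  # l = floor(-ln(u) * m)

    rng = random.Random(42)
    m = level_multiplier(16)
    levels = Counter(sample_level(m, rng) for _ in range(100_000))

    for level in sorted(levels):
        print(level, levels[level])
    ```

    With this multiplier, the chance that a node reaches level 1 or higher is 1/M, so for M = 16 roughly 94% of the nodes have level 0 as their highest level and each higher level holds about 16 times fewer nodes than the one below it.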

    **How are HNSW graphs created?**
    1. Consider we have a dataset of vectors **_V = {v1, v2, v3, v4.....vn}_** in a d-dimensional space.
    2. We also have the below parameters
        - **_M_**: Maximum number of connections per node
        - **_efConstruction_**: efConstruction stands for **exploration factor during construction**. When we insert a new vector into the HNSW graph, we need to 
            1. Search for its nearest neighbors among the already inserted nodes.
            2. Connect the new node to some of those neighbors (up to **_M_** neighbors)

            efConstruction is the size of the candidate list kept during that search, so it controls how thoroughly we search for neighbors before deciding which ones to connect. 
    
    **Step 1: Initialization**
    1. Start with an empty graph
    2. Set the first inserted vector as the entry point. There is no particular logic in choosing the vector to be first inserted. It is simply the vector which is first inserted into the index. 
    3. Assign it to a maximum layer level. This is chosen randomly usually with an exponential distribution. 
        - A layer where a vector is inserted is chosen using exponential distribution formula. 
        - The key design is that the probability of being placed in higher layers decays exponentially. 
        - This means, most of the vectors on the vector space will reside on layer 0 and fewer and fewer vectors will reside on upper layers. 
        - This is calculated using the same sampling formula given under **Exponential Decay Function** above: **_l=floor(-ln(u)*m)_**, where **_u_** is a random number between 0 and 1 and **_m_** is the level multiplier. 
    4. The maximum level usually contains only one node. Because the level is drawn from an exponentially decaying distribution, the chance of several nodes landing on the highest level is small. The maximum level always has at least one node, which serves as the entry point. 


    **Step 2: For each new vector**
    1. Decide the layer height
        - Decide the layer height using the exponential distribution formula.
    2. Insert the vector in the graph
        - We insert the vector in all the layers from the layer height that is determined to all the layers below it.
    3. Once inserted, we need to connect the new node with the existing nodes in the graph. This is done in two phases. 
    4. In the levels above the new node's top level, we only look for a good entry point into the new node's top level. We do this using **greedy search**.
        - Start from the entry point in the top level of the graph. 
        - On each level, compute the distance between the new vector and all the nodes connected to the current node. 
        - Hop to the closest of these nodes, compare the distances of its connected nodes to the new vector, find the closest one and hop to it. 
        - Repeat this until none of the connected nodes is closer to the new vector than the current node. The current node is then the closest node found on this level. 
        - Drop down one level, using this node as the starting point, and repeat the process in the lower level. 
    5. Once we reach the highest level in which the new node is inserted, we use **best first search** instead of greedy search, on that level and on every level below it. 
        - In best first search, instead of following one node as we did in greedy search, we keep a priority queue of candidate nodes sorted by distance to the new vector. 
        - We start from the entry node that was found using greedy search. 
        - We put it in the priority queue. 
        - We pop the closest node from the priority queue and explore its neighbors.
        - For each neighbor
            - If not visited, we compute its distance to the new vector 
            - Add it to the priority queue. 
        - Continue until the closest remaining candidate is farther away than the worst of the **_efConstruction_** best nodes found so far.
        - Connect the new node to the closest nodes found (up to **_M_** of them).
        - The closest node found becomes the entry node for the best first search in the level below. 
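The layer-by-layer descent and the best first search can be sketched on a tiny hand-built hierarchy. This is an illustration only: the graph is written by hand instead of being constructed, the names `search_layer` and `hnsw_search` and the parameter `ef` (size of the result list) are assumptions for this example, and the linking of a new node to its neighbors is left out.

```python
import heapq
import numpy as np

def search_layer(query, entry, neighbors, vectors, ef):
    """Best first search on one layer. Returns up to ef (distance, node) pairs, closest first."""
    def dist(i):
        return float(np.linalg.norm(vectors[i] - query))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap (negated): worst kept result first
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # nothing left that could improve the results
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-d, n) for d, n in results)

def hnsw_search(query, entry, layers, vectors, ef, k):
    """layers[0] is the top layer and layers[-1] the bottom layer."""
    current = entry
    for neighbors in layers[:-1]:
        # With ef=1 the search keeps a single best node, i.e. it is greedy.
        current = search_layer(query, current, neighbors, vectors, ef=1)[0][1]
    found = search_layer(query, current, layers[-1], vectors, ef=max(ef, k))
    return [node for _, node in found[:k]]

# Ten points on a line at x = 0..9; upper layers contain fewer nodes.
vectors = np.array([[float(i), 0.0] for i in range(10)])
top = {0: [6], 6: [0]}
middle = {0: [3], 3: [0, 6], 6: [3, 9], 9: [6]}
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}

result = hnsw_search(np.array([7.3, 0.0]), 0, [top, middle, bottom], vectors, ef=3, k=2)
print(result)
```

The same `search_layer` routine, called with the new vector as the query and efConstruction as `ef`, is the kind of search used during insertion to collect the candidates from which up to M neighbors are chosen.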
#### Flat Indexes (Exact Search)
Flat indexes just encode the vectors into codes of a fixed size and store them in an array of ntotal * code_size bytes. At search time, all the indexed vectors are decoded sequentially and compared to the query vectors. 

### Fast Nearest Neighbor Search
Nearest Neighbor Search looks for vectors closest to the query vector. In large RAG applications, where we may have millions of document embeddings, a naive search would compute the distance from the query to every single vector, which would be very slow. 

Fast Nearest Neighbor Search uses optimized data-structures and algorithms. Fast Nearest Neighbor Search is a general term for any optimization that makes nearest neighbor retrieval quicker than brute force. The below optimization techniques are generally employed in Fast Nearest Neighbor Search

1. Hardware Level optimizations
2. Indexing Structures for Exact NNS
3. Approximate Nearest Neighbor (ANN) Structures

### FAISS

FAISS stands for Facebook AI Similarity Search. It is a library for fast nearest neighbor search in high-dimensional vector spaces. FAISS provides specialized data structures and algorithms to make Nearest Neighbor Search (NNS) faster. 

#### Core Features of FAISS


### Difference between FAISS and Chroma
| Metric | FAISS | Chroma |
| ------ | ----- | ------ |
| In a nutshell | A vector similarity search library (C++ library by Meta) | A **full fledged** vector database (written in python with persistence, metadata and query APIs) |
| Architecture | Focuses purely on efficient similarity search between vectors. It is not a database. There is no schema, metadata, filtering or query language. We have to handle persistence manually (saving / loading .faiss and .pkl files in LangChain). | It is an **open source** vector database for LLM applications. |
| Speed | Extremely fast (C++ core. Optimized for large-scale vector math) | Slower (Python + DB Overhead) |
| Scale | Millions - Billions of vectors | Usually up to a few million effectively |
| Index Types | Flat, IVF, HNSW, PQ, OPQ | HNSW |
| Approximate Search | Supported | Supported (HNSW) |
| GPU Acceleration | Supported | Not Supported |
| Filtering by Metadata | Manual | Built in |
| Persistence | Manual | Automatic |
| Distributed Support | Via custom setup | Limited (Mostly local) |
| Ideal for | Large scale vector search | Lightweight RAG apps |

### We need to install the below Python libraries to use FAISS
- faiss-cpu
- langchain and langchain-community (for the LangChain wrappers used in the examples below)

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Load the documents
loader = TextLoader('sampletext.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=5)

texts = text_splitter.split_documents(documents)

Created a chunk of size 555, which is longer than the specified 50
Created a chunk of size 624, which is longer than the specified 50


In [2]:
texts

[Document(metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention. The independent systems automatically respond to conditions, with procedural, algorithmic, and human-like creative steps, to produce process results. The field is closely linked to agentic automation, also known as agent-based process management systems, when applied to process automation. Applications include software development, customer support, cybersecurity and business intelligence.'),
 Document(metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems c

In [3]:
# Create the embeddings
embeddings = OllamaEmbeddings(model="embeddinggemma:latest")
# Create the vector store
db = FAISS.from_documents(texts, embeddings)



In [None]:
### Querying
### Similarity search by default performs Exact Nearest Neighbor (ENN) search with the L2 distance metric and returns the top 4 (k=4) most similar documents.
query = "What is the core concept of agentic ai?"
docs = db.similarity_search(query)
docs

[Document(id='73c54611-4541-4df7-bf27-50de154685ba', metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI techniques, such as natural language processing, machine learning (ML), and computer vision, depending on the environment.[1]')]

### Vector Store Retriever
A retriever is a component in LangChain that is responsible for fetching relevant pieces of information from a data source, usually based on the user's query. It provides an abstraction (a common interface) to fetch documents from any backend. A vector store retriever does the following:

1. It converts question into embeddings
2. Searches for similar embeddings in a vector store. 
3. It returns the most relevant chunks of texts or documents. 

In [20]:
retriever = db.as_retriever()
docs = retriever.invoke(query)
docs

[Document(id='73c54611-4541-4df7-bf27-50de154685ba', metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI techniques, such as natural language processing, machine learning (ML), and computer vision, depending on the environment.[1]'),
 Document(id='f8718a40-b012-4fed-a587-c73a010fb1c2', metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention

### Similarity Search with Score

```similarity_search_with_score``` is a method of the LangChain FAISS vector store that returns not only the documents but also the distance score between the query and each of them. The returned score is the L2 distance. The lower the score, the more similar the document. 

In [8]:
docs_with_score = db.similarity_search_with_score(query)
docs_with_score

[(Document(id='73c54611-4541-4df7-bf27-50de154685ba', metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI techniques, such as natural language processing, machine learning (ML), and computer vision, depending on the environment.[1]'),
  np.float32(0.7763521)),
 (Document(id='f8718a40-b012-4fed-a587-c73a010fb1c2', metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with o

In [None]:
### Similarity Search with Vectors
query_vector = embeddings.embed_query(query)
docs_with_vectors = db.similarity_search_by_vector(query_vector)
docs_with_vectors


[Document(id='73c54611-4541-4df7-bf27-50de154685ba', metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI techniques, such as natural language processing, machine learning (ML), and computer vision, depending on the environment.[1]'),
 Document(id='f8718a40-b012-4fed-a587-c73a010fb1c2', metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention

In [None]:
### Save the vector store db to disk
db.save_local("faiss_index")

### The above method will store the vector store db in a folder named faiss_index. It stores it as two files: index.faiss and index.pkl
### index.faiss contains the actual FAISS index. The vectors are stored in a binary format optimized for fast similarity search. 
### index.pkl contains the metadata and other information about the vector store, including the mapping between the vectors and the original documents.

In [15]:
### Load the vector store db from disk
new_db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = new_db.similarity_search(query)
docs

[Document(id='73c54611-4541-4df7-bf27-50de154685ba', metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human intervention, often making decisions based on continuous learning and external data. [3] Functioning agents can require various AI techniques, such as natural language processing, machine learning (ML), and computer vision, depending on the environment.[1]'),
 Document(id='f8718a40-b012-4fed-a587-c73a010fb1c2', metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention