# Encode and Index Sentences in FAISS

## 1. Import Libraries and Sample Data

In [1]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

In [2]:
# Load pretrained model
model = SentenceTransformer('all-MiniLM-L6-v2')  # Fast and accurate

# Sample sentences (these could be resumes, FAQs, JD lines, etc.)
sentences = [
    "Looking for a backend engineer with Python experience.",
    "Proficient in data analysis using SQL and Python.",
    "Seeking an expert in AWS cloud deployment.",
    "Experience with Docker and CI/CD pipelines is required.",
    "Strong knowledge of machine learning and deep learning.",
]

## 2. Generate Sentence Embeddings

In [3]:
# Generate embeddings for each sentence (384-dim vectors for all-MiniLM-L6-v2)
embeddings = model.encode(sentences)

# Convert to NumPy float32 (required by FAISS)
embeddings = np.array(embeddings).astype('float32')
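FAISS stores and compares float32 vectors, one vector per row. Converting explicitly, as the cell above does, avoids depending on how a particular FAISS version treats other dtypes or memory layouts. The sketch below wraps that conversion in a small helper. `to_faiss_input` is a hypothetical name, not part of FAISS. It also turns a single 1-D vector into a batch of one, which is the shape `index.add` and `index.search` work with.

```python
import numpy as np

def to_faiss_input(vectors):
    """Return a 2-D, C-contiguous float32 array (one vector per row).

    Hypothetical helper, not a FAISS API: it just normalises dtype, memory
    layout and shape before vectors are passed to index.add / index.search.
    """
    arr = np.ascontiguousarray(vectors, dtype=np.float32)
    if arr.ndim == 1:
        # A single vector becomes a batch of one: shape (d,) -> (1, d)
        arr = arr.reshape(1, -1)
    if arr.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D input, got shape {arr.shape}")
    return arr

x = np.arange(12, dtype=np.float64).reshape(3, 4)  # float64 by default
print(to_faiss_input(x).dtype)                         # float32
print(to_faiss_input(x[:, ::2]).flags["C_CONTIGUOUS"]) # True (sliced view is copied)
print(to_faiss_input(x[0]).shape)                      # (1, 4)
```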

In [5]:
embeddings

array([[-0.07723466, -0.03381972, -0.00117026, ...,  0.0772387 ,
         0.07817636,  0.02331655],
       [-0.01174889, -0.00998492, -0.01694515, ...,  0.05859416,
         0.08929224,  0.011912  ],
       [ 0.04048735, -0.00954833, -0.00259527, ..., -0.00608465,
        -0.01668681,  0.04723663],
       [-0.00978079,  0.03192974,  0.02150432, ..., -0.00645307,
         0.06357714,  0.01552729],
       [-0.09907927, -0.03590719,  0.0935325 , ...,  0.00611919,
        -0.02670863,  0.01894625]], dtype=float32)

## 3. Create FAISS Index and Add Vectors

In [4]:
# Choose index type: Flat = brute-force (exhaustive) but exact
index = faiss.IndexFlatL2(embeddings.shape[1])  # 384-dim for all-MiniLM-L6-v2
# shape[1] is the embedding dimensionality; FAISS needs it to store and compare vectors correctly.

# Add sentence embeddings to index
index.add(embeddings)

# Print number of vectors in the index
print('Total vectors indexed:', index.ntotal)

Total vectors indexed: 5


What is index?
	•	index is your FAISS vector search engine.
	•	It's a flat L2 index → it compares every new query against all stored vectors using squared L2 (Euclidean) distance, with no approximation.
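To make "compares every new query against all stored vectors" concrete, here is a NumPy sketch of exhaustive squared-L2 search. It reproduces the behaviour described for a flat L2 index. It is not FAISS's own implementation, and `flat_l2_search` is a hypothetical name. The toy 2-D vectors stand in for the 384-dim sentence embeddings.

```python
import numpy as np

def flat_l2_search(database, queries, k):
    """Brute-force k-nearest-neighbour search with squared L2 distance.

    NumPy sketch of what a flat L2 index does, not FAISS's code.
    database: (n, d) array, queries: (nq, d) array.
    Returns (distances, indices), each of shape (nq, k), nearest first.
    """
    # Squared distance between every query and every stored vector: (nq, n)
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Sort each row ascending (smaller distance = closer) and keep the first k
    order = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(sq_dists, order, axis=1), order

db = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
q = np.array([[0.9, 0.1]], dtype=np.float32)
D, I = flat_l2_search(db, q, k=2)
print(I)  # [[1 0]]
print(D)  # approximately [[0.02 0.82]]
```

Building the full (nq, n, d) difference array is fine for a handful of sentences. FAISS performs the same exhaustive comparison far more efficiently, which is why the notebook uses it instead.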

With this, the backend of your own semantic search engine is in place.

# Query Your FAISS Index

We'll take a new sentence (a query), convert it to an embedding, and find the most similar vector(s) in the indexed dataset.

## 1. Define and Encode a Query Sentence

In [8]:
# New query sentence (can be anything job-related, etc.)
query = 'skilled in ml and dl'

# Convert to embedding
query_embedding = model.encode([query]).astype('float32') 

## 2. Perform the Search

In [9]:
# k = number of top results to return
k = 2
distances, indices = index.search(query_embedding, k)

# Show the top results
print("\n🔍 Query:", query)
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. Match: {sentences[idx]}  (Distance: {distances[0][i]:.4f})")


🔍 Query: skilled in ml and dl
1. Match: Strong knowledge of machine learning and deep learning.  (Distance: 1.1124)
2. Match: Proficient in data analysis using SQL and Python.  (Distance: 1.2626)
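The printing loop above indexes `sentences[idx]` directly. FAISS pads its result arrays with the index -1 when it cannot return k neighbours, for example when k is larger than `index.ntotal`. In Python, `sentences[-1]` would then silently print the last sentence as a fake match. The hypothetical helper below sketches a safer version of the loop. The arrays are simulated rather than real FAISS output, and the distance placed in the padded slot is an arbitrary placeholder.

```python
import numpy as np

def format_results(distances, indices, texts):
    """Turn one query's row of FAISS-style search output into readable lines.

    Hypothetical helper. Slots with index -1 are padding (fewer than k
    results were available) and are skipped instead of wrapping to texts[-1].
    """
    lines = []
    for rank, (idx, dist) in enumerate(zip(indices, distances), start=1):
        if idx < 0:
            continue  # padding, not a real match
        lines.append(f"{rank}. Match: {texts[idx]}  (Distance: {dist:.4f})")
    return lines

texts = ["alpha", "beta"]
# Simulated output of a k=3 search on an index holding only 2 vectors;
# the last distance is an arbitrary placeholder for the padded slot.
D = np.array([[0.5, 1.25, np.inf]], dtype=np.float32)
I = np.array([[1, 0, -1]])
for line in format_results(D[0], I[0], texts):
    print(line)
# 1. Match: beta  (Distance: 0.5000)
# 2. Match: alpha  (Distance: 1.2500)
```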


🧠 What's Happening?
	•	FAISS computes the squared L2 (Euclidean) distance between the query vector and every indexed vector.
	•	It returns the k closest sentences, ordered from smallest to largest distance (lower distance = better match).
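IndexFlatL2 reports squared L2 distances, and these are harder to read than a similarity score. If the embeddings have unit length, the two are directly related: ||a − b||² = 2 − 2·cos(a, b). sentence-transformers can enforce unit length with `model.encode(..., normalize_embeddings=True)`. The sketch below checks that identity on random unit vectors. `squared_l2_to_cosine` is a hypothetical helper, and the conversion is only valid under the unit-length assumption.

```python
import numpy as np

def squared_l2_to_cosine(sq_dist):
    """Convert squared L2 distance to cosine similarity.

    Assumes both vectors have unit length, because then
    ||a - b||^2 = 2 - 2 * cos(a, b)  =>  cos(a, b) = 1 - ||a - b||^2 / 2.
    """
    return 1.0 - np.asarray(sq_dist) / 2.0

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 384))
a /= np.linalg.norm(a)  # normalise to unit length
b /= np.linalg.norm(b)

sq_dist = np.sum((a - b) ** 2)  # the kind of value IndexFlatL2 reports
cosine = float(a @ b)
print(np.isclose(squared_l2_to_cosine(sq_dist), cosine))  # True
```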

Example (illustrative values):
indices = [[2, 4]]
distances = [[0.19, 0.47]]
•	Best match = sentence at index 2 (distance 0.19)
•	Second best = sentence at index 4 (distance 0.47)
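Both arrays are 2-D, with one row per query and k columns. This is why the loops in this notebook read `indices[0]` and `distances[0]`: row 0 holds the results for the first, and here only, query. A minimal sketch using the illustrative values from the example:

```python
import numpy as np

# Illustrative values from the example: shape (number_of_queries, k)
indices = np.array([[2, 4]])
distances = np.array([[0.19, 0.47]])

# Row 0 = results for the first query, best match first
ranked = [
    (rank, int(idx), float(dist))
    for rank, (idx, dist) in enumerate(zip(indices[0], distances[0]), start=1)
]
print(ranked)  # [(1, 2, 0.19), (2, 4, 0.47)]
```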

# Dynamic Query Search with User Input

In [11]:
# Take input from user
user_query = input("üîç Enter your search query: ")

# Encode the input into vector
query_vector = model.encode([user_query]).astype("float32")

# Search in FAISS index
k = 2  # Number of top results
distances, indices = index.search(query_vector, k)

# Display results
print(f"\nüîç Top {k} results for your query: \"{user_query}\"\n")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}. Match: {sentences[idx]}  (Distance: {distances[0][i]:.4f})")

🔍 Enter your search query:  could u give me people skilled in python scripting

🔍 Top 2 results for your query: "could u give me people skilled in python scripting"

1. Match: Looking for a backend engineer with Python experience.  (Distance: 0.8381)
2. Match: Proficient in data analysis using SQL and Python.  (Distance: 0.9466)
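To reuse the dynamic search without re-running the cell, the encode-and-search steps can be wrapped in a function. The sketch below is one hypothetical way to do that. `run_query` trims the input, rejects empty queries, and caps k at `index.ntotal` so that no -1 padding appears. `encode_fn` stands in for `model.encode`, and `index` can be any object with `.ntotal` and `.search(vectors, k)`. `ToyFlatL2` and `toy_encode` are made-up stand-ins so the wrapper can be tried without downloading a model.

```python
import numpy as np

def run_query(user_query, encode_fn, index, texts, k=2):
    """Encode a free-text query and return (text, distance) pairs, best first.

    Hypothetical wrapper: encode_fn plays the role of model.encode, and index
    is anything with .ntotal and .search(vectors, k) -> (distances, indices).
    """
    query = user_query.strip()
    if not query:
        raise ValueError("empty query")
    k = min(k, index.ntotal)  # avoid -1 padding when k exceeds the index size
    vec = np.asarray(encode_fn([query]), dtype=np.float32)
    distances, indices = index.search(vec, k)
    return [(texts[i], float(d)) for i, d in zip(indices[0], distances[0])]

class ToyFlatL2:
    """Tiny made-up stand-in for faiss.IndexFlatL2 (squared L2, exhaustive)."""
    def __init__(self, vectors):
        self.vectors = np.asarray(vectors, dtype=np.float32)
        self.ntotal = len(self.vectors)

    def search(self, queries, k):
        sq = ((queries[:, None, :] - self.vectors[None, :, :]) ** 2).sum(-1)
        idx = np.argsort(sq, axis=1)[:, :k]
        return np.take_along_axis(sq, idx, axis=1), idx

texts = ["python backend", "aws cloud"]
# Made-up encoder: maps any query mentioning "python" to [1, 0], else [0, 1]
toy_encode = lambda batch: [[1.0, 0.0] if "python" in s else [0.0, 1.0] for s in batch]
toy_index = ToyFlatL2([[1.0, 0.0], [0.0, 1.0]])
print(run_query("  need python skills ", toy_encode, toy_index, texts, k=5))
# [('python backend', 0.0), ('aws cloud', 2.0)]
```

With the notebook's real objects, the call would be `run_query(input("🔍 Enter your search query: "), model.encode, index, sentences)`.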
