# Demo Notebook: Embeddings & Vector Search (Task 2)
This notebook demonstrates dataset loading, chunking, embedding, storing, and similarity search.

In [24]:
import faiss
import numpy as np
import re
from sentence_transformers import SentenceTransformer



## Step 1: Load Dataset

In [25]:
try:
    dataset_text = """
Retrieval-Augmented Generation (RAG) significantly improves the accuracy and reliability of language models by grounding their answers in external knowledge sources.
Traditional language models often hallucinate or confidently produce incorrect information because they cannot verify facts or access new knowledge.
RAG solves this by connecting retrieval systems with generative models.
First, relevant documents are retrieved using similarity search techniques.
Then, the retrieved text is inserted into the prompt, allowing the model to generate answers based on real context instead of guessing.
This approach is used in systems like support chatbots, academic research assistants, knowledge search tools, and customer service AI.
It enables models to answer domain-specific questions such as university procedures, medical guidelines, or technical documentation.
With embeddings and vector search, we can find semantically similar text even if exact wording differs.
Therefore, chunking, embedding, storage, and cosine similarity are essential building blocks for a working RAG pipeline.

When writing queries for RAG systems, it is important to:
- Be clear and concise
- Use domain-specific keywords
- Include context when possible
- Avoid vague pronouns
Effective query design improves retrieval quality and reduces the chance of irrelevant results.

The pipeline often involves:
1. Preprocessing datasets
2. Chunking text into meaningful pieces
3. Creating embeddings for each chunk
4. Storing embeddings in a vector database
5. Computing similarity between query and chunks
6. Retrieving top-K chunks for LLM input

RAG reduces hallucination by grounding LLM responses in retrieved context.
Using retrieval-augmented generation, language models are less likely to hallucinate because they base answers on real text.
"""

    print("Step 1: Dataset loaded successfully.\n")

except Exception as e:
    print("Step 1 failed:", e)
    


Step 1: Dataset loaded successfully.



## Step 2: Chunk the text

In [26]:
try:
    # Split into lines, remove empty
    lines = [line.strip() for line in dataset_text.splitlines() if line.strip()]
    
    # Merge about 2 sentences per chunk; keep numbered list items as separate chunks
    chunks = []
    current_chunk = ""

    for line in lines:
        # Handle numbered list items as separate chunks
        if line.strip().startswith(tuple(f"{i}." for i in range(1, 20))):
            if current_chunk:
                chunks.append(current_chunk.strip())
                current_chunk = ""
            chunks.append(line.strip())
        else:
            # Split line into sentences
            sentences = re.split(r'(?<=[.!?]) +', line)
            for s in sentences:
                if current_chunk:
                    current_chunk += " " + s
                else:
                    current_chunk = s
                if current_chunk.count('.') >= 2:  # close the chunk after ~2 sentences (counted by periods)
                    chunks.append(current_chunk.strip())
                    current_chunk = ""
    # Append leftover
    if current_chunk:
        chunks.append(current_chunk.strip())

    print(f"Step 2: Created {len(chunks)} meaningful chunks.\n")
    for i, c in enumerate(chunks):
        print(f"Chunk {i+1}:\n{c}\n")

except Exception as e:
    print("Step 2 failed:", e)
    

Step 2: Created 13 meaningful chunks.

Chunk 1:
Retrieval-Augmented Generation (RAG) significantly improves the accuracy and reliability of language models by grounding their answers in external knowledge sources. Traditional language models often hallucinate or confidently produce incorrect information because they cannot verify facts or access new knowledge.

Chunk 2:
RAG solves this by connecting retrieval systems with generative models. First, relevant documents are retrieved using similarity search techniques.

Chunk 3:
Then, the retrieved text is inserted into the prompt, allowing the model to generate answers based on real context instead of guessing. This approach is used in systems like support chatbots, academic research assistants, knowledge search tools, and customer service AI.

Chunk 4:
It enables models to answer domain-specific questions such as university procedures, medical guidelines, or technical documentation. With embeddings and vector search, we can find semantic
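
The cell above closes a chunk once it contains two periods, which works for this dataset but would end a chunk early on text with decimals or abbreviations. As a minimal sketch of an alternative, assuming a hypothetical helper named `chunk_sentences` that is not part of the original notebook, sentences can be split first and then grouped by count:

```python
import re

def chunk_sentences(text, sentences_per_chunk=2):
    """Group sentences into fixed-size chunks (hypothetical helper, not from the notebook)."""
    # Split only where ., ! or ? is followed by whitespace, so "2.0" stays intact.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Join consecutive sentences into chunks of the requested size.
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

sample = "RAG uses retrieval. Version 2.0 was released. It reduces errors."
print(chunk_sentences(sample))
```

Here the second sentence contains two periods, yet it is still treated as one sentence because the split requires whitespace after the punctuation. This sketch does not reproduce the numbered-list handling of the cell above.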

## Step 3: Generate embeddings (requires sentence-transformers package)

In [27]:
try:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    print("Step 3: Model loaded successfully.")
except Exception as e:
    print("Step 3 failed:", e)
    raise

try:
    chunk_embeddings = model.encode(chunks, normalize_embeddings=True)
    chunk_embeddings = np.array(chunk_embeddings).astype("float32")
    print("Step 3: Embedded chunks successfully.")
except Exception as e:
    print("Embedding step failed:", e)
    raise


Step 3: Model loaded successfully.
Step 3: Embedded chunks successfully.
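
Because the embeddings are produced with `normalize_embeddings=True`, the inner product of two embeddings should equal their cosine similarity. The following is a small standard-library sketch of that relationship on toy 2-D vectors; the helper names `l2_normalize`, `dot`, and `cosine` are illustrative and do not come from the notebook:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]
print(cosine(a, b))                           # cosine on the raw vectors
print(dot(l2_normalize(a), l2_normalize(b)))  # inner product after normalization
```

The two printed values agree up to floating-point rounding, which is why an inner-product index can serve as a cosine-similarity index once the vectors are unit length.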


## Step 4: FAISS Index

In [28]:
dimension = chunk_embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)

try:
    index.add(chunk_embeddings)
    print("Step 4: FAISS index created and populated successfully.")
except Exception as e:
    print("FAISS insertion failed:", e)
    raise


Step 4: FAISS index created and populated successfully.
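
`faiss.IndexFlatIP` performs an exact, exhaustive inner-product search over every stored vector. As a rough illustration of that idea only, and not of how FAISS is implemented, the sketch below scores a query against plain Python lists; `flat_ip_search` is a hypothetical name introduced here:

```python
def flat_ip_search(vectors, query, k):
    """Exhaustive inner-product search over plain lists (illustrative only)."""
    # Score every stored vector against the query.
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    # Rank vector ids by descending score and keep the best k.
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
    return [(scores[i], i) for i in order]

vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.0, 1.0]
print(flat_ip_search(vectors, query, k=2))  # [(1.0, 1), (0.8, 2)]
```

A flat index trades speed for exactness: every query touches every vector, which is fine for the 13 chunks in this notebook but scales linearly with the collection size.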


## Step 5: Query

## Step 6: Text retrieved

In [29]:
top_k = 3
threshold = 0.2  # keep only relevant results

print("\nType your queries below. Type 'exit' to stop.\n")

while True:
    query = input("Query: ").strip()
    
    if query.lower() == "exit":
        print("Stopping query loop.")
        break
    
    if not query:
        print("Empty query. Try again.")
        continue
    
    query_embedding = model.encode([query], normalize_embeddings=True)
    query_embedding = np.array(query_embedding).astype("float32")

    similarities, indices = index.search(query_embedding, top_k)

    print("\nRetrieved Top-K Chunks:\n")
    
    found_any = False
    for score, idx in zip(similarities[0], indices[0]):
        if score >= threshold:
            found_any = True
            print(f"Similarity={score:.4f}")
            print("Chunk:", chunks[idx], "\n")

    if not found_any:
        print("No relevant chunks found above similarity threshold.\n")



Type your queries below. Type 'exit' to stop.


Retrieved Top-K Chunks:

No relevant chunks found above similarity threshold.


Retrieved Top-K Chunks:

Similarity=0.4678
Chunk: Therefore, chunking, embedding, storage, and cosine similarity are essential building blocks for a working RAG pipeline. When writing queries for RAG systems, it is important to: - Be clear and concise - Use domain-specific keywords - Include context when possible - Avoid vague pronouns Effective query design improves retrieval quality and reduces the chance of irrelevant results. 

Similarity=0.4184
Chunk: RAG reduces hallucination by grounding LLM responses in retrieved context. Using retrieval-augmented generation, language models are less likely to hallucinate because they base answers on real text. 

Similarity=0.3504
Chunk: RAG solves this by connecting retrieval systems with generative models. First, relevant documents are retrieved using similarity search techniques. 

Stopping query loop.
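
The threshold filter inside the query loop can also be factored into a small function, which makes it testable without `input()`. This is a sketch under stated assumptions: `filter_hits` is a hypothetical helper, and the skip of negative ids reflects FAISS padding results with `-1` when fewer than `top_k` vectors are available, which cannot happen with the 13 chunks indexed here but may with smaller collections.

```python
def filter_hits(scores, indices, chunks, threshold=0.2):
    """Keep (score, chunk) pairs at or above the threshold (hypothetical helper)."""
    hits = []
    for score, idx in zip(scores, indices):
        if idx < 0:
            # Placeholder id: no vector was returned for this slot.
            continue
        if score >= threshold:
            hits.append((float(score), chunks[idx]))
    return hits

demo_chunks = ["chunk A", "chunk B", "chunk C"]
print(filter_hits([0.47, 0.15, -3.4e38], [2, 0, -1], demo_chunks))  # [(0.47, 'chunk C')]
```

In the notebook's loop this would be called as `filter_hits(similarities[0], indices[0], chunks, threshold)`, and an empty result corresponds to the "No relevant chunks found" message.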
