<a href="https://colab.research.google.com/github/somendrew/LLMs/blob/main/RAG_From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Set-Up


In [6]:
!pip install -q transformers sentence-transformers faiss-cpu langchain


## Create a file and add content

In [2]:
!touch my_knowledge.txt


In [4]:
%%bash
# Overwrite (>) rather than append (>>) so re-running this cell does not duplicate the content
cat << EOF > my_knowledge.txt
Company Policy Manual:
- WFH Policy: All employees are eligible for a hybrid WFH schedule. Employees must be in the office on Tuesdays, Wednesdays, and Thursdays. Mondays and Fridays are optional remote days.
- PTO Policy: Full-time employees receive 20 days of Paid Time Off (PTO) per year. PTO accrues monthly.
- Tech Stack: The official backend language is Python, and the official frontend framework is React. For mobile development, we use React Native.
EOF

## Chunking

In [12]:
!pip install -q langchain_text_splitters
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the raw knowledge base as a single string
with open('my_knowledge.txt', encoding='utf-8') as f:
    knowledge_text = f.read()

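text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=150,       # maximum chunk length, measured by length_function
    chunk_overlap=20,     # characters shared between neighbouring chunks
    length_function=len,  # count characters, not tokens
    separators=["\n\n", "\n", ".", " ", ""],  # tried in order, coarsest first
)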

chunks = text_splitter.split_text(knowledge_text)

print(f'We have {len(chunks)} chunks')
for i, chunk in enumerate(chunks):
  print(f'Chunk {i+1}: {chunk}')

We have 5 chunks
Chunk 1: Company Policy Manual:
Chunk 2: - WFH Policy: All employees are eligible for a hybrid WFH schedule. Employees must be in the office on Tuesdays, Wednesdays, and Thursdays
Chunk 3: . Mondays and Fridays are optional remote days.
Chunk 4: - PTO Policy: Full-time employees receive 20 days of Paid Time Off (PTO) per year. PTO accrues monthly.
Chunk 5: - Tech Stack: The official backend language is Python, and the official frontend framework is React. For mobile development, we use React Native.
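
Chunk 3 above begins with a stray period, because the `"."` separator ended up at the start of the next piece and the office-day sentence lost its full stop. One possible alternative is to pack whole sentences into chunks. The following is a minimal standard-library sketch, not the splitter used in this notebook; the helper name `sentence_chunks` is hypothetical, and it applies no overlap.

```python
import re

def sentence_chunks(text, chunk_size=150):
    """Greedily pack whole sentences from each line into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size is kept whole as its own oversized chunk.
    """
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split after sentence-ending punctuation so the period stays with its sentence
        sentences = re.split(r"(?<=[.!?])\s+", line)
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > chunk_size:
                chunks.append(current)  # current chunk is full; start a new one
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks

sample = (
    "- WFH Policy: All employees are eligible for a hybrid WFH schedule. "
    "Employees must be in the office on Tuesdays, Wednesdays, and Thursdays. "
    "Mondays and Fridays are optional remote days."
)
for i, chunk in enumerate(sentence_chunks(sample), start=1):
    print(f"Chunk {i}: {chunk}")
```

On the WFH line this yields two chunks: the first two sentences together, and the sentence about Mondays and Fridays on its own, with no chunk starting on a period.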


## Embeddings

In [None]:
from sentence_transformers import SentenceTransformer

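# Small general-purpose sentence-embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode every chunk; the result is a NumPy array with one row per chunk
chunk_embeddings = model.encode(chunks)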

print(f"Shape of our embeddings: {chunk_embeddings.shape}")
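
`all-MiniLM-L6-v2` maps each text to a 384-dimensional vector, so with the five chunks above the printed shape should be `(5, 384)`. The point of the embeddings is that texts with similar meaning land close together. The sketch below illustrates that idea with cosine similarity on made-up 3-dimensional vectors; the vectors and the names `wfh_chunk`, `pto_chunk`, and `wfh_query` are hypothetical stand-ins, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (real ones have 384 dimensions)
wfh_chunk = [0.9, 0.1, 0.0]
pto_chunk = [0.1, 0.9, 0.1]
wfh_query = [0.8, 0.2, 0.1]

sim_wfh = cosine_similarity(wfh_query, wfh_chunk)
sim_pto = cosine_similarity(wfh_query, pto_chunk)
print(f"query vs WFH chunk: {sim_wfh:.3f}")
print(f"query vs PTO chunk: {sim_pto:.3f}")
```

The query vector points in nearly the same direction as the WFH chunk vector, so its similarity score is the higher of the two.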

## Vector Store with FAISS

In [18]:
import faiss
import numpy as np

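# Dimensionality of each embedding vector
d = chunk_embeddings.shape[1]

# Create a flat (exhaustive-search) FAISS index that uses L2 distance
index = faiss.IndexFlatL2(d)

# Add the chunk embeddings to the index (FAISS expects float32)
index.add(np.array(chunk_embeddings).astype('float32'))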

print(f"FAISS index created with {index.ntotal} vectors.")



FAISS index created with 5 vectors.
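
`IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector and reports squared Euclidean distances, smallest first. The sketch below reproduces that behaviour in plain Python on toy 2-dimensional vectors, purely to show what `index.search` computes; the function name `flat_l2_search` and the sample vectors are hypothetical.

```python
def flat_l2_search(vectors, query, k):
    """Return (squared L2 distances, indices) of the k stored vectors nearest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and stored vector i
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, i))
    scored.sort()  # nearest first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy stored vectors and query (illustrative values only)
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
distances, indices = flat_l2_search(vectors, [0.9, 0.1], k=2)
print(indices)
print(distances)
```

A smaller distance means a closer match, which is why the retrieval step later takes the first `k` indices as the most relevant chunks.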


## Retrieve, Augment, Generate

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the generator model and its tokenizer (the "brain" that writes the answer)
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
generator_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# ------------------- RAG PIPELINE FUNCTION -------------------------
def answer_question(query):
    # ______ 1. RETRIEVE ______
    query_embedding = model.encode([query]).astype('float32')
    k = 2
    distances, indices = index.search(query_embedding, k)

    retrieved_chunks = [chunks[i] for i in indices[0]]
    context = "\n\n".join(retrieved_chunks)

    # ______ 2. AUGMENT ______
    # FLAN-T5 is instruction-tuned, so label the context and the question explicitly
    prompt_template = f"""Answer the following question using only the provided context.
If the answer is not in the context, say "I don't have that information."

Context:
{context}

Question:
{query}

Answer:"""

    # ______ 3. GENERATE ______
    # Tokenize the prompt into token IDs, truncating anything beyond 512 tokens
    inputs = tokenizer(prompt_template, return_tensors="pt", truncation=True, max_length=512)

    # Generate the response
    outputs = generator_model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False  # Greedy decoding: deterministic output, no sampling
    )

    # Decode the generated token IDs back into text
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(f"--- CONTEXT ---\n{context}\n")
    return answer


In [29]:
query_2 = "What days are the office days?"
print(f"Query: {query_2}")
print(f"Answer: {answer_question(query_2)}\n")

Query: What days are the office days?
--- CONTEXT ---
. Mondays and Fridays are optional remote days.

- WFH Policy: All employees are eligible for a hybrid WFH schedule. Employees must be in the office on Tuesdays, Wednesdays, and Thursdays

Answer: Mondays and Fridays
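
The answer above contradicts the retrieved context, which names Tuesdays, Wednesdays, and Thursdays as the office days. The chunks are joined in order of distance, so the fragment about remote days comes first and the policy reads out of sequence. One option worth trying is to restore document order before building the prompt. The sketch below shows that reordering only; the helper name `context_in_document_order` is hypothetical, and whether it changes what `flan-t5-small` answers is untested here.

```python
def context_in_document_order(chunks, indices):
    """Join the retrieved chunks in their original document order instead of by distance."""
    # Drop duplicates and any negative placeholder indices, then sort by position
    ordered = sorted({i for i in indices if i >= 0})
    return "\n\n".join(chunks[i] for i in ordered)

# The five chunks printed in the Chunking section
chunks = [
    "Company Policy Manual:",
    "- WFH Policy: All employees are eligible for a hybrid WFH schedule. "
    "Employees must be in the office on Tuesdays, Wednesdays, and Thursdays",
    ". Mondays and Fridays are optional remote days.",
    "- PTO Policy: Full-time employees receive 20 days of Paid Time Off (PTO) per year. "
    "PTO accrues monthly.",
    "- Tech Stack: The official backend language is Python, and the official frontend "
    "framework is React. For mobile development, we use React Native.",
]

# Retrieval order seen in the output above: chunk 3 first, then chunk 2
context = context_in_document_order(chunks, [2, 1])
print(context)
```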



In [30]:
# --- Execution ---
query_1 = "What is the WFH policy?"
print(f"Query: {query_1}")
print(f"Answer: {answer_question(query_1)}\n")

Query: What is the WFH policy?
--- CONTEXT ---
- WFH Policy: All employees are eligible for a hybrid WFH schedule. Employees must be in the office on Tuesdays, Wednesdays, and Thursdays

Company Policy Manual:

Answer: All employees are eligible for a hybrid WFH schedule
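
With `k = 2` fixed, the second retrieved chunk here is just the heading "Company Policy Manual:", which adds nothing to the context. A common remedy is to drop results whose distance exceeds a cut-off. The sketch below assumes the `distances` and `indices` arrays returned by `index.search`; the helper name `select_chunks`, the `max_distance` value, and the sample distances are all hypothetical and would need tuning against real queries.

```python
def select_chunks(chunks, distances, indices, max_distance):
    """Keep only the retrieved chunks whose distance is within max_distance."""
    selected = []
    for dist, idx in zip(distances, indices):
        if idx < 0:
            # FAISS pads with -1 when fewer than k results are available
            continue
        if dist <= max_distance:
            selected.append(chunks[idx])
    return selected

# Illustrative values only, not measured from the notebook
sample_chunks = ["Company Policy Manual:", "- WFH Policy: hybrid schedule"]
sample_distances = [0.6, 1.5]   # hypothetical squared L2 distances, nearest first
sample_indices = [1, 0]

kept = select_chunks(sample_chunks, sample_distances, sample_indices, max_distance=1.0)
print(kept)
```

Inside `answer_question`, this would be called with `distances[0]` and `indices[0]`, since `index.search` returns one row per query.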

