# From Guessing to Grounded: Comparing a Base LLM with RAG

In this notebook, you’ll explore how adding a retrieval step can transform an AI model’s answers.

You will:
1. Ask a lightweight, open-source LLM a question with no extra context (the **base** approach).
2. Give the same model access to a small local knowledge base using **Retrieval-Augmented Generation (RAG)**.
3. Compare the two approaches side by side on the same query.

By the end, you’ll see how even a small, fast model can deliver more accurate and relevant results when grounded in the right information.


## Environment setup

### Subtask:
Install the required libraries: `transformers`, `torch`, `faiss-cpu`, `datasets`, and `sentence-transformers` (used later to embed the knowledge base).


**Reasoning**:
Install the required libraries using pip in a code cell.



In [1]:
!pip install transformers torch faiss-cpu datasets sentence-transformers


## Load base LLM

### Subtask:
Load a lightweight open-source LLM from Hugging Face.


**Reasoning**:
Import the necessary classes and load a lightweight LLM and its tokenizer from Hugging Face.



In [29]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use a small causal language model so the notebook runs quickly, even on CPU
model_name = "gpt2"  # "distilgpt2" also works; neither model is instruction-tuned

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

# GPT-2 ships without a pad token; reuse the EOS token, which is already
# in the vocabulary, so the embedding matrix does not need resizing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

print(f"Model '{model_name}' loaded on {device}. pad_token_id={tokenizer.pad_token_id}")


Model 'gpt2' loaded on cpu. pad_token_id=50256


## Create knowledge base

### Subtask:
Define a small local knowledge base as a list of strings on a chosen topic.


**Reasoning**:
Define and print the knowledge base as a list of strings according to the instructions.



In [43]:
# Knowledge base: crystalline vs. amorphous materials (one fact per string)
knowledge_base = [
    "Crystalline materials have a long-range ordered atomic lattice.",
    "Amorphous materials lack long-range order; atoms are arranged randomly.",
    "Crystalline materials typically have sharp, well-defined melting points.",
    "Amorphous materials soften gradually over a range of temperatures.",
    "Examples of crystalline materials include many metals, salts, and ice.",
    "Examples of amorphous materials include glass, rubber, and many plastics."
]

print("Knowledge Base on Crystalline vs. Amorphous Materials:")
for i, line in enumerate(knowledge_base, 1):
    print(f"[{i}] {line}")

Knowledge Base on Crystalline vs. Amorphous Materials:
[1] Crystalline materials have a long-range ordered atomic lattice.
[2] Amorphous materials lack long-range order; atoms are arranged randomly.
[3] Crystalline materials typically have sharp, well-defined melting points.
[4] Amorphous materials soften gradually over a range of temperatures.
[5] Examples of crystalline materials include many metals, salts, and ice.
[6] Examples of amorphous materials include glass, rubber, and many plastics.


## Vectorize knowledge base


Use a pre-trained sentence transformer model to create embeddings for the knowledge base documents and build a FAISS index for efficient retrieval.


**Reasoning**:
Load a sentence transformer model and embed each knowledge base document. Read the embedding dimensionality from the result, build a FAISS `IndexFlatL2` index of that size, add the embeddings to it, and print a confirmation.



In [44]:
from sentence_transformers import SentenceTransformer
from faiss import IndexFlatL2
import numpy as np

# Load a pre-trained sentence transformer model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate one embedding per document (2-D array: documents x dimensions)
document_embeddings = embedding_model.encode(knowledge_base)

# Determine the dimensionality of the embeddings
embedding_dim = document_embeddings.shape[1]
print(f"Embedding dimensionality: {embedding_dim}")

# Create a FAISS index that does exact search by L2 (Euclidean) distance
index = IndexFlatL2(embedding_dim)

# Add the generated embeddings to the FAISS index
index.add(document_embeddings)

# Print a confirmation
print(f"Embeddings created and added to FAISS index. Total vectors in index: {index.ntotal}")

Embedding dimensionality: 384
Embeddings created and added to FAISS index. Total vectors in index: 6
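
`IndexFlatL2` performs an exact search: it compares the query against every stored vector and ranks the results by squared Euclidean distance. The following is a minimal pure-Python sketch of that idea on toy 2-D vectors. The function name `flat_l2_search` and the vectors are hypothetical, chosen for illustration only; this is not FAISS code, and the real index is far faster.

```python
# Brute-force nearest-neighbour search by squared L2 distance.
# `flat_l2_search` and the toy data are illustrative assumptions, not FAISS APIs.

def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors: list[list[float]], query: list[float], k: int):
    """Return (distances, indices) of the k stored vectors closest to the query."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # [1, 0]
```

With only six documents, an exhaustive scan like this is cheap, which is why a flat index is a reasonable choice for this notebook.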


## Implement retrieval


Create a function to perform a similarity search on the FAISS index given a query and retrieve the most relevant documents.


**Reasoning**:
Define the `retrieve_documents` function to perform the similarity search using the FAISS index.



In [45]:
def retrieve_documents(query: str, k: int) -> list[str]:
    """
    Perform a similarity search on the FAISS index to retrieve relevant documents.

    Args:
        query: The query string.
        k: The number of documents to retrieve.

    Returns:
        A list of the retrieved documents, nearest first.
    """
    # Generate embedding for the query (shape: 1 x embedding_dim)
    query_embedding = embedding_model.encode([query])

    # FAISS pads results with index -1 when k exceeds the number of stored
    # vectors, so cap k at the index size
    k = min(k, index.ntotal)

    # Perform similarity search; a smaller L2 distance means a closer match
    distances, indices = index.search(query_embedding, k)

    # Retrieve the documents based on indices
    retrieved_docs = [knowledge_base[i] for i in indices[0]]

    return retrieved_docs


## Implement RAG

Create a function that combines the retrieved documents with the original query to create a prompt for the LLM.


**Reasoning**:
Define the function `generate_rag_prompt` that takes the query and retrieved documents, formats them into a prompt string, and returns the prompt.



In [46]:
def generate_rag_prompt(query: str, retrieved_documents: list[str]) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieved_documents)
    return (
        "You are a concise scientific assistant.\n"
        "Answer ONLY using facts from the context. "
        "Do NOT invent numbers. If the context is insufficient, reply: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


In [47]:
def ensure_core_facts(docs: list[str], kb: list[str]) -> list[str]:
    """Append the two defining facts from the knowledge base if retrieval missed them."""
    # Key phrases that identify the crystalline and amorphous definitions
    core_phrases = [
        "long-range ordered atomic lattice",
        "lack long-range order",
    ]
    result = list(docs)  # copy so the caller's list is not mutated
    for phrase in core_phrases:
        if not any(phrase in d for d in result):
            # If missing, pull the first matching entry from the knowledge base
            match = next((d for d in kb if phrase in d), None)
            if match is not None:
                result.append(match)
    return result


## Compare outputs


Pose a question relevant to the knowledge base and generate responses using both the base LLM and the RAG-enhanced LLM. Print and compare the outputs.


**Reasoning**:
Define the question, generate responses using the base LLM and the RAG-enhanced LLM, and then print and compare the results.
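
The `generate_text` function in the next cell cleans up each raw generation: it keeps only the text after the last `Answer:` marker and then trims to the first sentence. A minimal standalone sketch of that clean-up step follows, using a hypothetical helper name `extract_answer` and a made-up sample string.

```python
# Standalone sketch of the clean-up applied to decoded model output.
# `extract_answer` and `sample` are illustrative assumptions.

def extract_answer(decoded: str) -> str:
    """Keep the completion after the last 'Answer:' and trim to one sentence."""
    text = decoded
    if "Answer:" in text:
        text = text.split("Answer:")[-1].strip()
    if "." in text:
        # Naive sentence split: it also cuts at decimals such as "3.5"
        text = text.split(".")[0].strip() + "."
    return text

sample = "Question: What is ice?\nAnswer: Ice is crystalline. It melts at a sharp point."
print(extract_answer(sample))  # Ice is crystalline.
```

Splitting on the first period keeps the printed comparison short, but it would truncate an answer containing a decimal number or an abbreviation, so treat it as a display convenience rather than a general sentence splitter.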



In [49]:
import torch

def generate_text(prompt: str, max_new_tokens: int = 48) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,            # deterministic
            num_beams=4,                # focused
            no_repeat_ngram_size=3,     # curb loops
            repetition_penalty=1.15,
            early_stopping=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the completion after the last "Answer:"
    if "Answer:" in text:
        text = text.split("Answer:")[-1].strip()
    # Trim to first sentence for cleanliness
    if "." in text:
        text = text.split(".")[0].strip() + "."
    return text

question = "What is the main difference between a crystalline and an amorphous material?"
print(f"Original Question: {question}\n")

# --- Base (no context) ---
base_prompt = (
    "You are a concise scientific assistant.\n"
    f"Question: {question}\n"
    "Answer:"
)
print("--- Base LLM Response ---")
base_response = generate_text(base_prompt, max_new_tokens=40)
print(base_response)
print("-" * 25 + "\n")

# --- Retrieve & ensure core facts ---
k = 2
retrieved_docs = retrieve_documents(question, k)
retrieved_docs = ensure_core_facts(retrieved_docs, knowledge_base)

print(f"--- Retrieved Documents (k~{k}, after ensuring core facts) ---")
for i, doc in enumerate(retrieved_docs, 1):
    print(f"[{i}] {doc}")
print("-" * 25 + "\n")

# --- RAG prompt & answer ---
rag_prompt = generate_rag_prompt(question, retrieved_docs)
print("--- Generated RAG Prompt ---")
print(rag_prompt)
print("-" * 25 + "\n")

print("--- RAG-enhanced LLM Response ---")
rag_response = generate_text(rag_prompt, max_new_tokens=48)
print(rag_response)
print("-" * 25 + "\n")

print("--- Comparison ---")
print("Base: may be generic or partly wrong (no grounding).")
print("RAG: should contrast 'long-range ordered lattice' (crystalline) vs 'no long-range order' (amorphous), using only context facts.")
print("-" * 16)


Original Question: What is the main difference between a crystalline and an amorphous material?

--- Base LLM Response ---
A crystalline material is a material that is composed of two or more atoms.
-------------------------

--- Retrieved Documents (k~2, after ensuring core facts) ---
[1] Examples of amorphous materials include glass, rubber, and many plastics.
[2] Amorphous materials soften gradually over a range of temperatures.
[3] Crystalline materials have a long-range ordered atomic lattice.
[4] Amorphous materials lack long-range order; atoms are arranged randomly.
-------------------------

--- Generated RAG Prompt ---
You are a concise scientific assistant.
Answer ONLY using facts from the context. Do NOT invent numbers. If the context is insufficient, reply: I don't know.

Context:
- Examples of amorphous materials include glass, rubber, and many plastics.
- Amorphous materials soften gradually over a range of temperatures.
- Crystalline materials have a long-range ordered a