# From LLMs to the Breakdown of RAG

## What is a Large Language Model (LLM)?

A **Large Language Model (LLM)** is a type of artificial intelligence that understands and generates human language.

It learns statistical patterns from massive amounts of text (books, articles, code, etc.) and uses those patterns to predict the **most likely next token** (a word or word fragment) in a sequence.

---

## How an LLM Works

1. You give the model some text (a **prompt**)
2. The model processes the text using learned patterns
3. It predicts the next word, then the next, and so on
4. The result is a coherent **response**
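
The loop in the steps above can be sketched with a toy model. This is only an illustration under strong assumptions: a real LLM uses a neural network over subword tokens, whereas this sketch counts which word follows which in a tiny made-up corpus. The names `train_bigrams` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_words=5):
    """Greedy decoding: repeatedly append the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text; a real model sees billions of tokens.
corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = train_bigrams(corpus)
print(generate(model, "the", max_new_words=3))  # the cat sat on
```

Each pass through the loop corresponds to step 3: predict one more word, append it, and repeat.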

### Key Terms

- **Prompt**: Input text given to the model
- **Completion**: Text generated by the model
- **Tokens**: Pieces of text (words or word fragments)
- **Context Window**: The maximum number of tokens (prompt plus generated text) the model can process at once
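
To make tokens and the context window concrete, here is a minimal sketch. It assumes a simple word-and-punctuation split, which is not how production tokenizers work (they use learned subword vocabularies), and the helpers `toy_tokenize` and `fit_context` are hypothetical.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens (real LLMs use subword tokenizers)."""
    return re.findall(r"\w+|[^\w\s]", text)

def fit_context(tokens, context_window):
    """Keep only the most recent tokens that fit in the window."""
    return tokens[-context_window:] if context_window > 0 else []

prompt = "What is machine learning?"
tokens = toy_tokenize(prompt)
print(tokens)                  # ['What', 'is', 'machine', 'learning', '?']
print(fit_context(tokens, 3))  # ['machine', 'learning', '?']
```

Anything that falls outside the window is simply not visible to the model for that request.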


<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/0*O5X15ycTtapwnzgc.png" alt="LLM timeline" width="800">


Set API Key

In [None]:
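# 1. Install the Groq client library
!pip install -q groq

# 2. Import necessary libraries
import os
from groq import Groq

# 3. Read the API key from Colab secrets, falling back to the environment
try:
    from google.colab import userdata
    api_key = userdata.get('GROQ_API_KEY')
except Exception:
    api_key = os.environ.get("GROQ_API_KEY", "gsk_...")

# 4. Create the client and send a single chat request
client = Groq(api_key=api_key)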

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {
            "role": "user",
            "content": "What is machine learning?"
        }
    ],
    temperature=0.5,
    max_tokens=1024,
    top_p=1,
    stream=False,
    stop=None,
)

# 5. Print the result
print(completion.choices[0].message.content)
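
The request above sets `temperature=0.5`. As a rough sketch of what temperature does during sampling in general (an assumption about the standard technique, not a description of Groq's internals), the model's raw scores are divided by the temperature before being turned into probabilities. The scores below are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower values concentrate probability on the top candidate, which makes output more repeatable; higher values spread it out.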

## RAG System


![Alt text](https://github.com/suthekshan/Agentic-Ai-Foundations/blob/main/04_LLM_RAG/images/rag1.png?raw=1)



This section builds a RAG pipeline from the following components:

- PDF document ingestion
- Text chunking
- Hugging Face embeddings
- ChromaDB vector store
- LLM for domain-specific, grounded question answering


<img src="https://github.com/suthekshan/Agentic-Ai-Foundations/blob/main/04_LLM_RAG/images/RAG.jpg?raw=1" alt="RAG pipeline" width="800">



In [None]:
!pip install -q \
  langchain \
  langchain-core \
  langchain-community \
  langchain-chroma \
  langchain-text-splitters \
  langchain-groq \
  langchain-huggingface \
  sentence-transformers \
  pypdf \
  python-dotenv


## 🔐 Environment Setup

Create a `.env` file with:

        GROQ_API_KEY=your_groq_api_key_here


In [1]:
from dotenv import load_dotenv
load_dotenv()


False
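
The `False` above suggests that no variables were loaded (as far as I know, `load_dotenv()` returns `True` only when it sets at least one variable). A small check like the following can fail early when the key is missing. This is a sketch using only the standard library; `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value

# Usage with a stand-in mapping instead of the real environment:
fake_env = {"GROQ_API_KEY": "your_groq_api_key_here"}
print(require_env("GROQ_API_KEY", env=fake_env))
```

Calling `require_env("GROQ_API_KEY")` without the `env` argument would check the real process environment.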

## Data ingestion

<img src="https://github.com/suthekshan/Agentic-Ai-Foundations/blob/main/04_LLM_RAG/images/data_ingestion.png?raw=1" alt="Data ingestion" width="800">


## 📄 Load PDF Document

We load the PDF using `PyPDFLoader`.  
Each page becomes a `Document` object with metadata.


In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
from langchain_community.document_loaders import PyPDFLoader

# Local copy of the PDF; change if needed.
# If the file lives on the mounted Drive, use instead:
# pdf_path = "/content/drive/MyDrive/Agentic-AIOT-Workshop/pdf1.pdf"
pdf_path = "pdf1.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()

len(documents)


271

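## ✂️ Text Chunking

The document is split into overlapping chunks so that each retrievable piece is small enough to embed and to match against a query. Note that `CharacterTextSplitter` splits only on its separator (a blank line, `"\n\n"`, by default), so a page without blank lines stays a single chunk even if it exceeds `chunk_size`.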
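
To show what "overlapping" means, here is a minimal sliding-window sketch. It is deliberately simpler than LangChain's splitter (it cuts at fixed character offsets and ignores separators), and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

With `chunk_size=10` and `chunk_overlap=4`, each chunk starts 6 characters after the previous one, so a sentence cut at a chunk boundary still appears whole in one of the two neighbours.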


In [None]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

texts = text_splitter.split_documents(documents)
len(texts)


265

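## 🧠 Embedding Model (Hugging Face)

An embedding model maps each text chunk to a numeric vector so that chunks with similar meaning end up close together. Here we use the `all-MiniLM-L6-v2` sentence-transformers model.

The embeddings are stored in ChromaDB.

💡 Always use the same embedding model for a given Chroma directory; vectors produced by different models are not comparable.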
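
As a rough picture of how vectors let us compare texts, the sketch below uses a toy count-based "embedding" over a tiny hand-picked vocabulary and compares vectors with cosine similarity. Real embedding models produce dense learned vectors instead; `toy_embed` and the vocabulary are assumptions made for illustration.

```python
import math
from collections import Counter

def toy_embed(text, vocabulary):
    """Count-based vector: one dimension per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vocabulary = ["history", "india", "company", "weather"]
query = toy_embed("history of india", vocabulary)
chunk_a = toy_embed("a history of india course", vocabulary)
chunk_b = toy_embed("the weather today", vocabulary)
print(cosine_similarity(query, chunk_a))  # close to 1: same vocabulary words
print(cosine_similarity(query, chunk_b))  # 0.0: no shared vocabulary words
```

The vector store performs this kind of comparison between the query vector and every stored chunk vector.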

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)


## 💾 ChromaDB Vector Store

ChromaDB persists vectors on disk so embeddings are reused across runs.


In [None]:
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="pdf_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_pdf_db"
)

# Add documents only once
if vector_store._collection.count() == 0:
    vector_store.add_documents(texts)

## 🔍 Retriever

The retriever fetches the most relevant chunks for a given query.


![Alt text](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6gsuxui6aagm90wfnmr.png)

In [None]:
retriever = vector_store.as_retriever(search_kwargs={"k": 3})


## `k` in Retriever

- `k` = number of most similar document chunks retrieved
- Larger `k` → more context, but more noise
- Smaller `k` → more precise, but may miss information
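
The trade-off in the list above comes down to a top-`k` selection over similarity scores. A minimal sketch, with made-up scores and hypothetical chunk names:

```python
import heapq

def top_k(scored_chunks, k):
    """Return the k chunks with the highest similarity score, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[1])

# Made-up similarity scores for four hypothetical chunks.
scored = [("title page", 0.82), ("table of contents", 0.61),
          ("reading list", 0.64), ("index", 0.20)]

for k in (1, 3):
    print(k, [name for name, _ in top_k(scored, k)])
```

With `k=1` only the best match is passed on; with `k=3` lower-scoring chunks come along too, which adds context but may also add noise.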

Changing `k` changes the number of retrieved documents:

In [None]:
# Test query
test_query = "What is the main topic of this document?"

# Try different k values
k_values = [1, 3, 5]

for k in k_values:
    # Create retriever with specific k
    retriever_k = vector_store.as_retriever(search_kwargs={"k": k})

    # Retrieve documents
    retrieved_docs = retriever_k.invoke(test_query)

    print(f" Retrieved with k={k}:")
    print(f"Number of documents retrieved: {len(retrieved_docs)}")

    for i, doc in enumerate(retrieved_docs, 1):
        print(f"\n--- Document {i} ---")
        print(f"Content preview: {doc.page_content[:150]}...")


 Retrieved with k=1:
Number of documents retrieved: 1

--- Document 1 ---
Content preview: History
Topic Civilization & Culture
Subtopic
A History of India
Professor Michael H. Fisher
Oberlin College
Course Guidebook...
 Retrieved with k=3:
Number of documents retrieved: 3

--- Document 1 ---
Content preview: History
Topic Civilization & Culture
Subtopic
A History of India
Professor Michael H. Fisher
Oberlin College
Course Guidebook...

--- Document 2 ---
Content preview: leCTure 29—naTionalisTs aMbedkar, bose, and JinnaH
247
SuggeSted Reading
Bose, His Majesty’s Opponent.
Jaffrelot, Dr. Ambedkar and Untouchability.
Ja...

--- Document 3 ---
Content preview: v
Table of ConTenTs 
LECTURE 24
The British East India Company ............................... 194
LECTURE 25
The Issues and Events of 1857 .............
 Retrieved with k=5:
Number of documents retrieved: 5

--- Document 1 ---
Content preview: History
Topic Civilization & Culture
Subtopic
A History of India
Professor Michael 

## ⚡ Groq LLM

We use an LLM hosted by Groq.  
The model runs remotely on Groq's servers and is not stored locally.


In [None]:
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0
)


## 📝 Prompt Template

The LLM is instructed to answer strictly from retrieved context.


![Alt text](https://xaviercollantes.dev/_next/image?url=%2Fassets%2Fimages%2Frag-langchain%2Fdoge.webp&w=3840&q=75)

In [None]:
from langchain_core.prompts import PromptTemplate

prompt_template = """Use ONLY the context below to answer the question.
If the answer is not present in the context, say:
"I do not know based on the provided document."

Context:
{context}

Question:
{query}

Answer:
"""

prompt = PromptTemplate.from_template(prompt_template)
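
To see what the template looks like once its placeholders are filled, here is a plain-Python sketch using `str.format` instead of LangChain. The shortened template, the sample context, and the helper `fill_prompt` are assumptions for illustration only.

```python
# Hypothetical, shortened template with the same two placeholders.
TEMPLATE = (
    "Use ONLY the context below to answer the question.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{query}\n\n"
    "Answer:\n"
)

def fill_prompt(context, query):
    """Substitute the retrieved context and the user question into the template."""
    return TEMPLATE.format(context=context, query=query)

filled = fill_prompt("Page 0:\nA History of India", "Who wrote the guidebook?")
print(filled)
```

The filled string is what the model actually receives, so the retrieved chunks and the question reach it as ordinary text.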


## 🔗 RAG Chain

The pipeline:
Retriever → Prompt → LLM → Output
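
The same flow can be sketched with plain functions before wiring it up in LangChain. Everything here is a stand-in: `fake_retriever` matches on shared words rather than vectors, and `fake_llm` only checks whether any context was supplied. In the real chain the retriever always returns `k` chunks, and the refusal comes from the model following the prompt.

```python
def fake_retriever(query):
    """Stand-in retriever: returns chunks that share a word with the query."""
    chunks = ["A History of India, course guidebook", "The British East India Company"]
    words = set(query.lower().split())
    return [c for c in chunks if words & set(c.lower().replace(",", "").split())]

def build_prompt(context, query):
    """Place the context and the question into a fixed layout."""
    return f"Context:\n{context}\n\nQuestion:\n{query}\n\nAnswer:\n"

def fake_llm(prompt):
    """Stand-in model: refuses when the context section is empty."""
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    if not context.strip():
        return "I do not know based on the provided document."
    return f"(answer grounded in {len(context)} characters of context)"

def rag_pipeline(query):
    """Retriever -> Prompt -> LLM -> Output, as plain function calls."""
    docs = fake_retriever(query)
    prompt = build_prompt("\n\n".join(docs), query)
    return fake_llm(prompt)

print(rag_pipeline("Tell me about India"))
print(rag_pipeline("Tell me about Ukraine"))
```

Each function's output is the next function's input, which is the same chaining that the `|` operator expresses in LangChain.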


In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    return "\n\n".join(
        f"Page {doc.metadata.get('page', 'N/A')}:\n{doc.page_content}"
        for doc in docs
    )

rag_chain = (
    {"context": retriever | format_docs, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)


## ❓ Ask Questions


In [None]:
rag_chain.invoke(
    "What does the document say about Ukraine?"
)


'I do not know based on the provided document.'

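## 🚫 Out-of-Context Question

The PDF is a course guidebook on the history of India, so a question about Ukraine cannot be answered from the retrieved context. The prompt instructs the model to return the fallback sentence in that case.
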
In [None]:
rag_chain.invoke(
    "What does the document say about Ukraine?"
)


'I do not know based on the provided document.'