# Beginner-Friendly RAG Document Question-Answering System

This notebook creates a Retrieval-Augmented Generation (RAG) system to answer questions based on documents (PDFs or text files). It’s designed to run on a CPU (no GPU required) and uses lightweight models to fit within 32GB RAM.

## What You'll Learn
- How to set up a Python environment for machine learning.
- How to load and process documents (PDFs or text files).
- How to create a vector store for document retrieval.
- How to use a language model to answer questions based on documents.

## Hardware
- Your system: Intel i7-12700, 32GB RAM, no NVIDIA GPU (CPU-only).

## Prerequisites
- Basic Python knowledge.
- A few sample PDF or text files to use as input.

Let’s get started!

## Step 1: Install Required Libraries

Run the cell below to install the necessary Python packages. This may take a few minutes.

**Note**: If you’re running this on your local machine, you may want to create a virtual environment first:
```bash
python -m venv rag_env
source rag_env/bin/activate  # On Windows: rag_env\Scripts\activate
```
Then run the pip install command below.

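```python
# In a notebook, %pip installs the packages into the environment the kernel is running in.
# langchain is pinned below 1.0 because RetrievalQA (used in Step 6) is imported from langchain.chains,
# which the 1.x releases no longer provide.
%pip install torch transformers sentence-transformers "langchain<1.0" langchain-community langchain-huggingface chromadb pypdf
```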

## Step 2: Import Libraries and Set Up the Environment

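We’ll import the required libraries and make sure everything runs on your CPU. Because you don’t have an NVIDIA GPU, the model is loaded in `torch.float32`. Half-precision formats would use less memory, but many CPUs run them slowly or don’t fully support them. Even at 4 bytes per parameter, a 1.1B-parameter model fits comfortably in 32GB of RAM.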
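
To estimate how much RAM the model needs, multiply the parameter count by the number of bytes each parameter takes. The sketch below does that arithmetic. The 1.1 billion figure comes from the model name. The estimate ignores activations, the tokenizer and Python overhead, so read it as a lower bound rather than an exact measurement.

```python
# Rough RAM needed just to hold the model weights (ignores activations and runtime overhead)
NUM_PARAMS = 1.1e9  # TinyLlama-1.1B, per the model name

BYTES_PER_PARAM = {
    "float32": 4,   # the dtype this notebook uses on CPU
    "bfloat16": 2,  # half the memory, but often slower on CPUs without native bfloat16 support
}

def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weights_gb(NUM_PARAMS, nbytes):.1f} GB")
```

At roughly 4.4 GB for float32 weights, the 32GB machine has plenty of headroom. That is why this notebook prefers the CPU-friendly dtype over the smaller one.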

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_huggingface import HuggingFacePipeline, HuggingFaceEmbeddings

# Run everything on the CPU (no NVIDIA GPU available)
device = "cpu"
print(f"Using device: {device}")

# Model names on the Hugging Face Hub
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Small chat model that runs on CPU
EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # Lightweight embedding model
```

## Step 3: Load the Language Model

We’ll load the TinyLlama model, which is small enough to run on your system. In a Streamlit app you would typically decorate a function like this with `@st.cache_resource` so that the model loads only once. In a Jupyter Notebook, running the cell once has the same effect, so Streamlit isn’t needed here.

**What’s happening here?**
- **Tokenizer**: Converts text to numbers the model understands.
- **Model**: The TinyLlama language model generates text.
- **Pipeline**: A tool to generate text easily with controlled parameters (e.g., max tokens, temperature).
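
The `temperature`, `top_k` and `top_p` settings in the pipeline control how the next token is picked. Below is a simplified, pure-Python sketch of that process on a made-up five-word vocabulary with made-up scores (logits). It follows the usual order (temperature, then top-k, then top-p). It is not the actual `transformers` implementation, and `candidate_tokens` and `sample_next_token` are hypothetical helper names.

```python
import math
import random

def _softmax(scores):
    """Turn {token: score} into {token: probability}, subtracting the max for stability."""
    max_score = max(scores.values())
    exps = {tok: math.exp(s - max_score) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def candidate_tokens(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return [(token, probability)] that survive temperature, top-k and top-p, best first."""
    # 1. Temperature: dividing the scores by a value < 1 sharpens the distribution
    probs = _softmax({tok: logit / temperature for tok, logit in logits.items()})

    # 2. Top-k: keep only the k most likely tokens, then renormalize
    ranked = sorted(probs.items(), key=lambda pair: pair[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # 3. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 4. Renormalize the survivors so their probabilities sum to 1
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]

def sample_next_token(logits, rng=random, **kwargs):
    """Draw one token at random, weighted by the filtered probabilities."""
    candidates = candidate_tokens(logits, **kwargs)
    return rng.choices([tok for tok, _ in candidates],
                       weights=[p for _, p in candidates], k=1)[0]

# Made-up scores for the word after "The capital of France is"
toy_logits = {"Paris": 5.0, "London": 3.0, "Rome": 2.5, "banana": 0.5, "the": 0.1}
print([tok for tok, _ in candidate_tokens(toy_logits, temperature=0.7)])  # → ['Paris', 'London']
print([tok for tok, _ in candidate_tokens(toy_logits, temperature=1.0)])  # → ['Paris', 'London', 'Rome']
print(sample_next_token(toy_logits))  # usually 'Paris', occasionally 'London'
```

Raising the temperature flattens the distribution, so more tokens make it past the top-p cut. This is why higher temperatures produce more varied (and sometimes less accurate) answers.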

**Note**: This step may take a few minutes to download the model (about 2GB).

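```python
def load_llm():
    # Load the tokenizer and model weights (downloaded on first run, then cached locally)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float32,  # float32 is the best-supported dtype on CPU
    )
    # Without a device_map the model stays on the CPU, so the `accelerate` package is not needed,
    # and TinyLlama uses the standard Llama architecture, so trust_remote_code is not needed either.

    # Create a text generation pipeline
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=256,       # Limit response length
        do_sample=True,           # Sample instead of always picking the most likely token
        temperature=0.7,          # Lower = more focused, higher = more random
        top_k=50,                 # Consider only the 50 most likely next tokens
        top_p=0.95,               # ...and only the smallest set covering 95% of the probability
        repetition_penalty=1.15,  # Discourage repetitive text
        return_full_text=False,   # Return only the generated answer, not the prompt
    )

    # Wrap the pipeline so LangChain can use it as an LLM
    return HuggingFacePipeline(pipeline=pipe)

# Load the model (run this once)
llm = load_llm()
print("Language model loaded successfully!")
```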

## Step 4: Process Documents

This function loads and processes your documents (PDFs or text files). You’ll need to specify the file paths.

**What’s happening?**
- **Loaders**: `PyPDFLoader` for PDFs, `TextLoader` for text files.
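- **Text Splitter**: Breaks documents into chunks of up to 1000 characters, with up to 200 characters of overlap between neighboring chunks so that text near a boundary keeps some of its context. Small chunks keep each retrieved passage focused, which makes retrieval more precise.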
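
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter. `sliding_window_chunks` is a hypothetical helper, and this is not how `RecursiveCharacterTextSplitter` works internally. That class first tries to split on paragraph breaks, then newlines, then spaces, so its chunks usually end at natural boundaries. The sketch only shows the idea: each chunk is at most `chunk_size` characters and repeats the last `chunk_overlap` characters of the previous chunk.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end of the text
            break
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
for chunk in sliding_window_chunks(text, chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the one before it. That repeated text is what the 200-character overlap in this notebook provides at a larger scale.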

**Action**: Replace `file_paths` with the paths to your PDF or text files (e.g., `["./document.pdf", "./notes.txt"]`).

```python
def process_documents(file_paths):
    documents = []
    for file_path in file_paths:
        # Pick a loader based on the file extension (case-insensitive)
        if file_path.lower().endswith(".pdf"):
            loader = PyPDFLoader(file_path)  # one Document per PDF page
        elif file_path.lower().endswith(".txt"):
            loader = TextLoader(file_path, encoding="utf-8")
        else:
            print(f"Unsupported file type: {file_path}")
            continue

        documents.extend(loader.load())

    # Split documents into overlapping chunks of up to 1000 characters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )

    chunks = text_splitter.split_documents(documents)
    if not chunks:
        raise ValueError("No text was loaded; check your file paths and file types.")
    print(f"Split {len(documents)} document pages/files into {len(chunks)} chunks")
    return chunks

# Specify your document paths here
file_paths = ["./Junior Python Developer (AI-focused).pdf"]  # Replace with your file paths
chunks = process_documents(file_paths)
```

## Step 5: Create a Vector Store

We’ll convert document chunks into numerical embeddings and store them in a vector database (Chroma) for fast retrieval.

**What’s happening?**
- **Embeddings**: Convert text chunks into vectors using a lightweight model (`all-MiniLM-L6-v2`).
- **Vector Store**: Stores embeddings for similarity-based retrieval.
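
Similarity-based retrieval comes down to comparing the query's vector with each chunk's vector and keeping the closest ones. The sketch below does this by brute force with cosine similarity, a common choice. Chroma's own distance metric and indexing may differ. The 3-dimensional vectors and chunk labels are made up for illustration; `all-MiniLM-L6-v2` actually produces 384-dimensional vectors. `top_k_chunks` is a hypothetical helper.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k=3):
    """Return the k (chunk, score) pairs most similar to the query, best first."""
    scored = [(chunk, cosine_similarity(query_vec, vec)) for chunk, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up 3-D "embeddings" for four chunks (real ones have 384 dimensions)
chunk_vecs = {
    "Python skills required": [0.9, 0.1, 0.0],
    "Experience with LLMs":   [0.7, 0.6, 0.1],
    "Office location":        [0.0, 0.1, 0.9],
    "Salary and benefits":    [0.1, 0.0, 0.8],
}
query_vec = [0.8, 0.4, 0.0]  # pretend this encodes "What technical skills are needed?"

for chunk, score in top_k_chunks(query_vec, chunk_vecs, k=2):
    print(f"{score:.3f}  {chunk}")
```

The two "skills" chunks point in nearly the same direction as the query, so they are returned. The unrelated chunks score close to zero.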

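This step creates a `chroma_db` folder in your working directory to persist the vector store. If you rerun the cell, the same chunks are added to that folder again and searches can return duplicates. Delete `chroma_db` first when you want to rebuild from scratch.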

```python
def create_vector_store(chunks):
    # Initialize the embedding model on the CPU
    embeddings = HuggingFaceEmbeddings(
        model_name=EMBEDDING_MODEL_NAME,
        model_kwargs={"device": device}
    )

    # Embed every chunk and save the vectors to ./chroma_db
    vector_store = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )

    return vector_store

# Create the vector store
vector_store = create_vector_store(chunks)
print("Vector store created successfully!")
```

## Step 6: Create the RAG Chain

This combines the language model and vector store to answer questions based on your documents.

**What’s happening?**
- **Retriever**: Finds the top 3 most relevant document chunks for a query.
- **QA Chain**: Uses the language model to generate answers based on retrieved chunks.
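
With `chain_type="stuff"`, the retrieved chunks are "stuffed" into a single prompt together with your question, and the language model answers from that one prompt. The sketch below builds such a prompt by hand. The template wording is illustrative and only approximates LangChain's built-in prompt. `build_stuff_prompt` is a hypothetical helper, not part of LangChain.

```python
# Illustrative template; LangChain's default RetrievalQA prompt is similar but not guaranteed identical
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

def build_stuff_prompt(retrieved_chunks: list[str], question: str) -> str:
    """Hypothetical helper: join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

retrieved = ["First retrieved chunk.", "Second retrieved chunk.", "Third retrieved chunk."]
print(build_stuff_prompt(retrieved, "What does the document say about Python?"))
```

Because everything goes into one prompt, the number of retrieved chunks (`k`) times the chunk size largely determines how long the prompt sent to the model is.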

```python
def create_rag_chain(vector_store, llm):
    # Retriever: return the 3 chunks most similar to the question
    retriever = vector_store.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 3}
    )

    # QA chain: "stuff" all retrieved chunks into one prompt for the LLM
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True
    )

    return qa_chain

# Create the RAG chain
qa_chain = create_rag_chain(vector_store, llm)
print("RAG chain created successfully!")
```

## Step 7: Ask Questions

Now you can ask questions about your documents! The system will retrieve relevant chunks and generate an answer.

**Example**: If your document is about space exploration, you could ask, "What is the purpose of the Mars Rover?"

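Replace the value of `query` with your own question, then run the cell below. The chain only sees the 3 retrieved chunks. A broad request such as "Summarize the document" will therefore summarize those parts, not the whole file.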

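```python
# Ask a question
query = "Summarize the document."  # Replace with your question
result = qa_chain.invoke({"query": query})  # .invoke() replaces the deprecated qa_chain(...) call

# Print the answer and the chunks it was based on
print("Answer:", result["result"])
print("\nSource Documents:")
for doc in result["source_documents"]:
    location = doc.metadata.get("source", "unknown")
    page = doc.metadata.get("page")  # set by PyPDFLoader (0-based); absent for .txt files
    if page is not None:
        location += f" (page {page + 1})"
    print(f"- {location}: {doc.page_content[:200]}...")
```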

## Step 8: What’s Next?

- **Experiment**: Try different questions or add more documents.
- **Optimize**: Adjust `chunk_size`, `chunk_overlap`, or `max_new_tokens` to improve results.
- **Learn More**: Check out the [LangChain documentation](https://python.langchain.com/docs/get_started/introduction) or [Hugging Face tutorials](https://huggingface.co/docs/transformers/index).

If you run into memory issues, try reducing `max_new_tokens` or using smaller document chunks.
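
A related limit is TinyLlama's context window of 2,048 tokens. The stuffed prompt and the generated answer must fit inside it together. For this notebook, the prompt holds 3 chunks of up to 1,000 characters plus the template and question, and the answer can run to 256 new tokens. The quick estimate below assumes roughly 4 characters per token, a common rule of thumb for English text rather than an exact count. `PROMPT_OVERHEAD_CHARS` is a guessed allowance for the template and question, and `estimated_tokens_needed` is a hypothetical helper. For a precise number, count tokens with the model's tokenizer instead.

```python
CONTEXT_WINDOW_TOKENS = 2048   # TinyLlama's maximum sequence length
CHARS_PER_TOKEN = 4            # rough rule of thumb for English text (assumption)
PROMPT_OVERHEAD_CHARS = 400    # guessed allowance for the template and the question (assumption)

def estimated_tokens_needed(k: int, chunk_size: int, max_new_tokens: int) -> int:
    """Estimate prompt tokens (k stuffed chunks + overhead) plus generated tokens."""
    prompt_chars = k * chunk_size + PROMPT_OVERHEAD_CHARS
    prompt_tokens = -(-prompt_chars // CHARS_PER_TOKEN)  # ceiling division
    return prompt_tokens + max_new_tokens

for k, chunk_size in [(3, 1000), (8, 1000), (3, 500)]:
    needed = estimated_tokens_needed(k, chunk_size, max_new_tokens=256)
    verdict = "fits" if needed <= CONTEXT_WINDOW_TOKENS else "may NOT fit"
    print(f"k={k}, chunk_size={chunk_size}: ~{needed} tokens -> {verdict}")
```

If you raise `k` or `chunk_size`, rerun an estimate like this. Prompts that overflow the context window can cause generation errors or noticeably worse answers.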