## Simple RAG Demo: Question Answering with a PDF

This notebook demonstrates a basic Retrieval-Augmented Generation (RAG) pipeline. We will:
1. Load a PDF document.
2. Split the document into manageable chunks.
3. Create vector embeddings for these chunks using a sentence transformer model.
4. Store these embeddings in a FAISS vector store for efficient similarity search.
5. Use the Groq API with a Llama 3 model as the Large Language Model (LLM).
6. Create a `RetrievalQA` chain from LangChain to ask questions about the PDF content.

### 1. Setup: Install Libraries and Import Modules

In [1]:
!pip install -q langchain langchain-community langchain-groq pypdf faiss-cpu sentence-transformers

In [2]:
import os
import getpass
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

### 2. Configure Groq API Key

You'll need a Groq API key. You can get one for free from [https://console.groq.com/](https://console.groq.com/).

In [3]:
os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")

Enter your Groq API Key: ··········
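Prompting on every run can get tedious when the notebook is re-executed. As a minimal sketch, assuming a hypothetical helper named `get_api_key` (not part of any library), the key can be read from the environment first and requested interactively only when it is missing:

```python
import getpass
import os


def get_api_key(env_var="GROQ_API_KEY", prompt_fn=getpass.getpass):
    """Return the key stored in env_var, prompting only if it is not set.

    `get_api_key` and `prompt_fn` are illustrative names, not a library API.
    """
    key = os.environ.get(env_var)
    if not key:
        # Hidden input, same as the cell above
        key = prompt_fn(f"Enter your {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `get_api_key()` in place of the cell above would leave `os.environ["GROQ_API_KEY"]` populated either way, which is where `ChatGroq` looks for the key in this notebook.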


### 3. Prepare PDF Document

The next cell creates a folder named `pdfs` in the working directory (`/content/` on Colab) and downloads the CS229 lecture notes into it as `main_notes.pdf`. No manual upload is needed.

If the download fails, you can fetch the PDF yourself from [https://cs229.stanford.edu/main_notes.pdf](https://cs229.stanford.edu/main_notes.pdf) and place it at `pdfs/main_notes.pdf`.

In [4]:
os.makedirs("pdfs", exist_ok=True)

# Download the PDF using requests
import requests

url = "https://cs229.stanford.edu/main_notes.pdf"
pdf_path = "pdfs/main_notes.pdf"

response = requests.get(url, timeout=60)
response.raise_for_status()  # Fail loudly instead of saving an error page as a PDF
with open(pdf_path, "wb") as f:
    f.write(response.content)

print(f"PDF downloaded to: {pdf_path}")

PDF downloaded to: pdfs/main_notes.pdf
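Even when the request succeeds, the response body is not guaranteed to be a PDF (a redirect to an HTML page is one example). A minimal sanity check, assuming a hypothetical helper named `looks_like_pdf`, relies on the fact that PDF files normally begin with the `%PDF-` marker:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the usual PDF header marker.

    `looks_like_pdf` is an illustrative helper, not part of requests or pypdf.
    """
    return data.startswith(b"%PDF-")


def file_looks_like_pdf(path: str) -> bool:
    """Read only the first few bytes of a file and check the header."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(5))
```

Running `file_looks_like_pdf(pdf_path)` before loading would catch a bad download early. It is only a header check and does not validate the rest of the file.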


### 4. Load and Chunk the PDF

In [5]:
if os.path.exists(pdf_path):
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    print(f"Loaded {len(documents)} pages from the PDF.")

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = text_splitter.split_documents(documents)
    print(f"Split the document into {len(chunks)} chunks.")
else:
    chunks = []
    print("Skipping chunking as PDF is not available.")

Loaded 227 pages from the PDF.
Split the document into 514 chunks.
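To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to break on separators such as paragraph and line boundaries); the function name `sliding_window_chunks` is hypothetical and the sketch cuts purely by character position:

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=150):
    """Split text into fixed-size character windows that overlap.

    A simplified illustration only; it ignores separators entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.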


### 5. Create Embeddings and Vector Store

We'll embed the chunks with the open-source `sentence-transformers/all-MiniLM-L6-v2` model, loaded through LangChain's `HuggingFaceEmbeddings` wrapper, and store the vectors in a local FAISS index.

In [6]:
if chunks:
    # Using a common, effective embedding model
    embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name)

    # Create FAISS vector store from chunks
    print("Creating FAISS vector store... This might take a few minutes for a large PDF.")
    vector_store = FAISS.from_documents(chunks, embeddings)
    print("FAISS vector store created.")

    # Save the vector store locally (optional, but good for reuse)
    # vector_store.save_local("faiss_index_cs229")
    # To load: vector_store = FAISS.load_local("faiss_index_cs229", embeddings, allow_dangerous_deserialization=True)
else:
    vector_store = None
    print("Skipping vector store creation as there are no chunks.")

Creating FAISS vector store... This might take a few minutes for a large PDF.
FAISS vector store created.
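Conceptually, similarity search means finding the stored vectors closest to the query vector. The sketch below does this by brute force with squared Euclidean distance in pure Python. It assumes the index compares vectors by Euclidean distance (to my knowledge the default for LangChain's FAISS wrapper, but worth verifying for your version), and the helper names are hypothetical:

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_nearest(query_vec, chunk_vecs, k=3):
    """Return the indices of the k vectors closest to query_vec, nearest first.

    Brute-force illustration; a real FAISS index is far more efficient.
    """
    scored = ((squared_l2(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Toy 2-dimensional "embeddings" for four chunks
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]]
print(top_k_nearest([0.0, 0.0], vectors, k=2))
# → [0, 3]
```

The real embeddings from `all-MiniLM-L6-v2` have 384 dimensions rather than 2, but the ranking idea is the same.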


### 6. Initialize the LLM (Groq)

In [7]:
llm = ChatGroq(model_name="llama3-8b-8192", temperature=0.1)

### 7. Create and Run the RAG Chain (RetrievalQA)

In [10]:
if vector_store:
    retriever = vector_store.as_retriever(search_kwargs={'k': 3}) # Retrieve top 3 chunks

    # You can customize the prompt if needed
    prompt_template = """
    Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Keep the answer concise and relevant to the context provided.

    Context: {context}

    Question: {question}

    Helpful Answer:"""
    prompt_template2 = """
    Use the following pieces of context along with your inherent knowledge to answer the question at the end.
    If the answer is not found in the provided chunks, say so.
    Tailor the answer to an engineering grad student who is interested in learning more, and provide follow-up questions.

    Context: {context}

    Question: {question}

    Helpful Answer:"""

    QA_CHAIN_PROMPT = PromptTemplate.from_template(prompt_template2)

    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
        return_source_documents=True # Set to True to see which chunks were retrieved
    )

    print("RAG chain created. You can now ask questions.")
else:
    qa_chain = None
    print("RAG chain not created as vector store is unavailable.")

RAG chain created. You can now ask questions.
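`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places the text of all retrieved chunks into the `{context}` slot of the prompt. The following sketch imitates that step with plain string formatting; the function name `build_stuffed_prompt` and the shortened template are illustrative assumptions, not LangChain code:

```python
# Shortened stand-in for the notebook's prompt template (assumption for illustration)
TEMPLATE = "Context: {context}\n\nQuestion: {question}\n\nHelpful Answer:"


def build_stuffed_prompt(chunks, question, template=TEMPLATE, separator="\n\n"):
    """Join the retrieved chunk texts and fill them into the prompt template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


prompt = build_stuffed_prompt(
    ["Chunk about the sigmoid function.", "Chunk about the log likelihood."],
    "What is logistic regression?",
)
print(prompt)
```

Because every retrieved chunk is pasted into a single prompt, `k` and `chunk_size` together determine how much of the model's context window the retrieved text consumes.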


### 8. Ask Questions!

In [11]:
if qa_chain:
    question1 = "What is logistic regression?"
    print(f"\nQuestion: {question1}")
    result1 = qa_chain.invoke({"query": question1})
    print(f"Answer: {result1['result']}")
    # print(f"Source Documents: {result1['source_documents']}") # Uncomment to see sources

    question2 = "Explain the concept of a support vector machine."
    print(f"\nQuestion: {question2}")
    result2 = qa_chain.invoke({"query": question2})
    print(f"Answer: {result2['result']}")

    question3 = "What are the main topics covered in the chapter on unsupervised learning?"
    print(f"\nQuestion: {question3}")
    result3 = qa_chain.invoke({"query": question3})
    print(f"Answer: {result3['result']}")
else:
    print("Cannot ask questions as RAG chain is not available.")


Question: What is logistic regression?
Answer: Logistic regression is a type of classification model that is used to predict the probability of an event occurring based on one or more predictor variables. In the context of the provided text, logistic regression is used to predict the probability of a binary outcome (0 or 1) given a set of input features (x).

In the text, it is mentioned that the logistic regression model can be written as:

g(z) = (1 + e^(-z))^(-1)

where g(z) is the predicted probability, z is a linear combination of the input features and the parameters θ, and e is the base of the natural logarithm.

The goal of logistic regression is to find the values of the parameters θ that maximize the likelihood of observing the training data. This is done by maximizing the log likelihood function ℓ(θ), which is given by:

ℓ(θ) = ∑[y(i) \* log(g(z(i))) + (1 - y(i)) \* log(1 - g(z(i)))]


where y(i) is the true label for the i-th training example, and g(z(i)) is the predicted 

### 9. Conclusion

This notebook demonstrated a simple RAG pipeline. We loaded a PDF, chunked it, created embeddings, stored them in FAISS, and used a Groq LLM with a `RetrievalQA` chain to answer questions based on the document's content. This approach lets an LLM draw on information from specific documents at query time, which can make its answers better grounded in that material than answers produced from its training data alone.