## Simple RAG Demo: Contextual Summarization from a PDF

This notebook demonstrates how retrieval-augmented generation (RAG) can supply context for summarization tasks. Instead of asking the LLM to summarize an entire document (which may be too large for the context window or may yield a generic summary), we first retrieve the sections relevant to a topic or query and then ask the LLM to summarize only that context.

We will:
1. Load a PDF document and create a vector store (similar to the QA demo).
2. Define a topic or query for summarization.
3. Retrieve relevant chunks from the PDF related to this topic.
4. Create a custom prompt to ask the LLM to summarize the retrieved context.
5. Use the Groq API with a Llama 3 model for summarization.
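
The retrieve-then-summarize flow in steps 2 to 4 can be illustrated with a minimal, dependency-free sketch. It is not the notebook's implementation: the `embed`, `cosine`, and `retrieve` helpers and the `toy_chunks` list are hypothetical stand-ins, with word counts in place of the sentence-transformer embeddings and a sorted list in place of the FAISS index.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, standing in for pieces of the PDF.
toy_chunks = [
    "K-means clustering assigns each point to the nearest centroid.",
    "Gaussian discriminant analysis is a generative learning algorithm.",
    "Logistic regression models p(y|x) directly.",
    "The k-means algorithm alternates assignment and centroid updates.",
]

top = retrieve("k-means clustering algorithm", toy_chunks, k=2)
for chunk in top:
    print(chunk)
```

Only the chunks returned by `retrieve` would be passed to the LLM, which is what keeps the summary tied to the chosen topic rather than to the whole document.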

### 1. Setup: Install Libraries and Import Modules

In [1]:
!pip install -q langchain langchain-groq langchain-community pypdf faiss-cpu sentence-transformers


In [2]:
import os
import getpass
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain

### 2. Configure Groq API Key

In [3]:
os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API Key: ")

Enter your Groq API Key: ··········


### 3. Prepare PDF Document

The cell below downloads the CS229 main notes from `https://cs229.stanford.edu/main_notes.pdf` and saves them as `pdfs/main_notes.pdf`.

In [4]:
os.makedirs("pdfs", exist_ok=True)

# Step 3: Download the PDF using requests
import requests

url = "https://cs229.stanford.edu/main_notes.pdf"
pdf_path = "pdfs/main_notes.pdf"

response = requests.get(url, timeout=60)
response.raise_for_status()  # Fail early instead of saving an HTTP error page as a PDF
with open(pdf_path, "wb") as f:
    f.write(response.content)

print(f"PDF downloaded to: {pdf_path}")

PDF downloaded to: pdfs/main_notes.pdf


### 4. Load, Chunk, and Create Vector Store (Same as QA Demo)

In [5]:
chunks = []
vector_store = None

if os.path.exists(pdf_path):
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = text_splitter.split_documents(documents)
    print(f"Split into {len(chunks)} chunks.")

    if chunks:
        embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
        embeddings = HuggingFaceEmbeddings(model_name=embedding_model_name)
        print("Creating FAISS vector store...")
        vector_store = FAISS.from_documents(chunks, embeddings)
        print("FAISS vector store created.")
else:
    print("PDF not found, skipping processing.")

Split into 514 chunks.



Creating FAISS vector store...
FAISS vector store created.
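
To make the `chunk_size=1000` and `chunk_overlap=150` settings concrete, here is a simplified sketch of overlapping chunking. It assumes a plain fixed-width sliding window over characters; `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph and line boundaries, so its chunks will generally differ. The `chunk_text` helper and the `sample` string are hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=150):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 2,500-character input made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)

print([len(p) for p in pieces])
print(pieces[0][-150:] == pieces[1][:150])  # the shared overlap region
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.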


### 5. Initialize the LLM (Groq)

In [6]:
llm = ChatGroq(model_name="llama3-8b-8192", temperature=0.2)

### 6. Custom RAG for Summarization

In [7]:
def summarize_topic(topic_query, vector_store, llm_model, num_chunks=3):
    if not vector_store:
        return "Vector store not available."

    print(f"Retrieving relevant context for: '{topic_query}'")
    retriever = vector_store.as_retriever(search_kwargs={'k': num_chunks})
    # invoke() replaces the deprecated get_relevant_documents()
    relevant_docs = retriever.invoke(topic_query)

    if not relevant_docs:
        return f"No relevant context found for '{topic_query}'."

    context_text = "\n\n---\n\n".join([doc.page_content for doc in relevant_docs])
    # print(f"\nRetrieved Context:\n{context_text[:1000]}...") # Print first 1000 chars of context

    summarization_prompt_template = """
    Based on the following context, please provide a concise summary about '{user_query}'.
    Focus on the key points and main ideas presented in the text regarding the query.

    Context:
    {provided_context}

    Concise Summary about '{user_query}':
    """

    prompt = PromptTemplate(
        input_variables=["user_query", "provided_context"],
        template=summarization_prompt_template
    )

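    # Pipe the prompt into the chat model; LLMChain is deprecated in recent LangChain releases
    summarization_chain = prompt | llm_model

    print("\nGenerating summary...")
    summary = summarization_chain.invoke({"user_query": topic_query, "provided_context": context_text})
    return summary.content  # The chat model returns a message object; .content holds the text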
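
The context assembly inside `summarize_topic` can also be looked at in isolation. The sketch below joins chunk texts with the same `---` separator and fills the two template variables using plain `str.format`. The `build_context` and `build_prompt` helpers and the `max_chars` budget are assumptions added for illustration; the notebook itself places no cap on the context and relies on `num_chunks` to keep it small.

```python
SEPARATOR = "\n\n---\n\n"

TEMPLATE = (
    "Based on the following context, please provide a concise summary "
    "about '{user_query}'.\n"
    "Focus on the key points and main ideas presented in the text "
    "regarding the query.\n\n"
    "Context:\n{provided_context}\n\n"
    "Concise Summary about '{user_query}':\n"
)


def build_context(chunk_texts, max_chars=6000):
    """Join chunks in order, stopping before the character budget is exceeded."""
    selected = []
    used = 0
    for text in chunk_texts:
        extra = len(text) + (len(SEPARATOR) if selected else 0)
        if used + extra > max_chars:
            break  # keep the best-ranked chunks, drop the rest
        selected.append(text)
        used += extra
    return SEPARATOR.join(selected)


def build_prompt(topic, chunk_texts, max_chars=6000):
    """Fill the summarization template with the topic and the joined context."""
    context = build_context(chunk_texts, max_chars)
    return TEMPLATE.format(user_query=topic, provided_context=context)


# Hypothetical chunk texts; the third one does not fit the 40-character budget.
demo_prompt = build_prompt(
    "K-means", ["chunk one", "chunk two", "x" * 50], max_chars=40
)
print(demo_prompt)
```

Because retrievers return chunks ordered by relevance, truncating from the end drops the least relevant text first.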

### 7. Generate Summaries

In [8]:
if vector_store:
    topic1 = "Generative Learning Algorithms"
    summary1 = summarize_topic(topic1, vector_store, llm)
    print(f"\n--- Summary for: {topic1} ---")
    print(summary1)

    topic2 = "The K-means clustering algorithm"
    summary2 = summarize_topic(topic2, vector_store, llm, num_chunks=4)
    print(f"\n--- Summary for: {topic2} ---")
    print(summary2)
else:
    print("Cannot generate summaries as vector store is not available.")

Retrieving relevant context for: 'Generative Learning Algorithms'

Generating summary...





--- Summary for: Generative Learning Algorithms ---
Here is a concise summary of "Generative Learning Algorithms":

Generative learning algorithms model the joint distribution of input data (x) and output labels (y), i.e., p(x|y) and p(y). Unlike traditional learning algorithms that focus on modeling p(y|x), generative algorithms aim to learn the underlying distribution of data and generate new samples that resemble the training data. This approach is useful for tasks such as modeling the distribution of dogs' and elephants' features, and generating new images or text that are similar to existing ones. Generative algorithms can be complex models parameterized by neural networks, such as variational auto-encoders, which extend traditional EM algorithms to high-dimensional continuous latent variables.
Retrieving relevant context for: 'The K-means clustering algorithm'

Generating summary...

--- Summary for: The K-means clustering algorithm ---
Here is a concise summary of the K-means c

### 8. Conclusion

This notebook showed how to use RAG for contextual summarization. Retrieving the relevant chunks first and then prompting the LLM to summarize only that context keeps the summary focused on the chosen topic, which is especially helpful for large documents or broad subjects. The same pattern is useful for quickly extracting key information on a specific question.