# RAG with Gemini + Pinecone + LangChain (Python)

> A tiny, beginner‑friendly notebook to learn Retrieval‑Augmented Generation (RAG) step by step — with very little code.

**What you’ll learn**  
1. What RAG is and why it’s useful  
2. How to index your docs in Pinecone using Gemini embeddings  
3. How to retrieve the most relevant chunks for a user’s question  
4. How to let Gemini read those chunks and answer — with sources

**Stack**  
- **LLM:** Google Gemini 
- **Vector DB:** Pinecone  
- **Framework:** LangChain  
- **Language:** Python

> You’ll only need two API keys: **Google Generative AI** and **Pinecone**.

```text
                ┌────────────────────┐
                │      Client        │
                │ (User Question)    │
                └────────┬───────────┘
                         │
                         ▼
                ┌──────────────────────┐
                │    Framework         │
                │ (Python + LangChain) │
                └────────┬─────────────┘
                         │
          (1) Semantic Search from Question Embedding
                         │
                         ▼
                ┌────────────────────┐
                │   Vector DB        │
                │   (Pinecone)       │
                │ Stores document    │
                │ embeddings         │
                └────────┬───────────┘
                         │
          (2) Retrieve Top Relevant Chunks
                         │
                         ▼
                ┌────────────────────┐
                │  Gemini (LLM)      │
                │  Reads context +   │
                │  question → Answer │
                └────────┬───────────┘
                         │
          (3) Return Final Response
                         │
                         ▼
                ┌────────────────────┐
                │     Client         │
                │ (Answer Displayed) │
                └────────────────────┘
```


## 0) Prerequisites

- Create accounts and get API keys:
  - **Google Generative AI**: https://ai.google.dev/
  - **Pinecone**: https://www.pinecone.io/
- Save the keys in a `.env` file containing `GOOGLE_API_KEY` and `PINECONE_API_KEY`.
- Make sure you’re on Python 3.10+.


## 1) Install libraries 

Run this once per environment.


In [1]:
# Install the dependencies (adjust the path if requirements.txt lives elsewhere).
!pip -q install -r ../requirements.txt


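## 2) Read your API keys from the environment

- We read the keys from environment variables to keep them out of source control.
- `os.getenv` does not read `.env` by itself, so export the variables or load the file before running this cell.
- If a key is not set, you’ll be prompted to paste it.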
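
A minimal, stdlib-only sketch of loading a `.env` file into the process environment before the keys are read. The `load_env_file` helper is hypothetical and handles only simple `KEY=value` lines; if the `python-dotenv` package is installed, its `load_dotenv()` is the more robust choice.

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict:
    """Copy simple KEY=value lines from a .env file into os.environ.

    Hypothetical helper for illustration: it strips surrounding quotes
    but does no variable expansion or multi-line handling.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded  # nothing to load; the keys can still be typed in
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        os.environ.setdefault(key, value)  # variables that already exist win
        loaded[key] = value
    return loaded
```

Calling `load_env_file()` once at the top of the notebook would make `os.getenv('GOOGLE_API_KEY')` find the value without a prompt.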


In [4]:
import os, getpass

os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY') or getpass.getpass('Enter GOOGLE_API_KEY: ')
os.environ['PINECONE_API_KEY'] = os.getenv('PINECONE_API_KEY') or getpass.getpass('Enter PINECONE_API_KEY: ')

print('Keys set in this session. ✅')

Keys set in this session. ✅


## 3) What is RAG? (plain English)

**Problem:** LLMs don’t know your private data and can hallucinate.  
**RAG idea:** 1) **Retrieve** the most relevant passages from your knowledge base, then 2) **Generate** an answer that **cites** those passages.  
This keeps answers grounded in your data, is usually cheaper than fine‑tuning, and picks up new documents as soon as they are indexed.
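
To make the two steps concrete, here is a toy sketch that needs no API keys. It ranks three made-up passages by simple word overlap (a stand-in for embedding similarity, not what Pinecone does) and then builds the prompt an LLM would receive. All names and passages are illustrative assumptions.

```python
def score(question: str, passage: str) -> int:
    """Count distinct question words that also appear in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words)

def retrieve(question: str, passages: dict, k: int = 2) -> list:
    """Return the ids of the k passages with the highest word overlap."""
    ranked = sorted(passages, key=lambda pid: score(question, passages[pid]), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: dict, ids: list) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up passages, purely for illustration
passages = {
    "doc-1": "A resume should be tailored to each employer",
    "doc-2": "Pinecone stores vectors for similarity search",
    "doc-3": "A cover letter should match the tone of the company",
}
question = "should my resume be tailored"
top_ids = retrieve(question, passages, k=1)
print(build_prompt(question, passages, top_ids))
```

The rest of the notebook follows the same shape, with Gemini embeddings and Pinecone replacing the word-overlap ranking.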


## 4) A small PDF to play with

To keep things simple, we’ll use a single sample PDF (`ResumeBook.pdf`) as the knowledge base.


In [5]:
# Extract text from PDF
import pypdf

def extract_pdf_text(pdf_path):
    """Extract text from PDF file"""
    text = ""
    with open(pdf_path, 'rb') as file:
        pdf_reader = pypdf.PdfReader(file)
        for page in pdf_reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            text += (page.extract_text() or "") + "\n"
    return text

# Extract text from your ResumeBook.pdf
pdf_text = extract_pdf_text("ResumeBook.pdf")

# Create documents from PDF content
docs = [
    {
        "id": "resume-book",
        "text": pdf_text
    }
]

print(f"Extracted {len(pdf_text)} characters from PDF")
print(f"Number of documents: {len(docs)}")
print(f"First 200 characters of PDF content:")
print(pdf_text[:200] + "...")



Extracted 44389 characters from PDF
Number of documents: 1
First 200 characters of PDF content:

Introduction
Whether you have recently graduated, been laid off, or are simply 
window shopping for new opportunities - if you are reading this book 
you are in a period of transition. An exciting ti...


## 5) Chunk the text

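RAG works best when long docs are split into **chunks**. We’ll use a very simple split here: about 400 words per chunk, with 40 words of overlap so that text near a boundary appears in both neighboring chunks.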


In [6]:
from typing import List, Dict

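def chunk_text(text: str, chunk_size: int = 400, overlap: int = 40) -> List[str]:
    """Split text into overlapping chunks; chunk_size and overlap are counted in words."""
    if not 0 <= overlap < chunk_size:
        # Otherwise the window would never advance and the loop would not end
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(len(words), start + chunk_size)
        chunks.append(' '.join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back so consecutive chunks share `overlap` words
    return chunks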

# Process the PDF document into chunks
chunks: List[Dict] = []
for d in docs:
    print(f"Processing document: {d['id']}")
    print(f"Text length: {len(d['text'])} characters")
    
    # Create chunks from the PDF text
    text_chunks = chunk_text(d["text"])
    print(f"Created {len(text_chunks)} chunks")
    
    for i, ch in enumerate(text_chunks):
        chunks.append({
            "id": f'{d["id"]}-{i}',
            "text": ch,
            "source": d["id"]
        })

print(f"\nTotal chunks created: {len(chunks)}")
print(f"First chunk preview:")
print(f"ID: {chunks[0]['id']}")
print(f"Source: {chunks[0]['source']}")
print(f"Text: {chunks[0]['text'][:200]}...")

Processing document: resume-book
Text length: 44389 characters
Created 20 chunks

Total chunks created: 20
First chunk preview:
ID: resume-book-0
Source: resume-book
Text: Introduction Whether you have recently graduated, been laid off, or are simply window shopping for new opportunities - if you are reading this book you are in a period of transition. An exciting time,...
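
The chunk count follows from the window arithmetic: each chunk after the first advances by `chunk_size - overlap` words, so `N` words (with `N > chunk_size`) give `1 + ceil((N - chunk_size) / (chunk_size - overlap))` chunks. The sketch below checks this on a synthetic word list; `chunk_words` and `expected_chunks` are illustrative helpers that mirror the sliding window of `chunk_text`, not part of the notebook.

```python
import math

def chunk_words(words, chunk_size=400, overlap=40):
    """Same sliding-window idea as chunk_text, operating on a list of words."""
    step = chunk_size - overlap
    chunks, start = [], 0
    while start < len(words):
        end = min(len(words), start + chunk_size)
        chunks.append(words[start:end])
        if end == len(words):
            break
        start += step
    return chunks

def expected_chunks(n_words, chunk_size=400, overlap=40):
    """Closed-form chunk count for the sliding window above."""
    if n_words <= chunk_size:
        return 1 if n_words else 0
    return 1 + math.ceil((n_words - chunk_size) / (chunk_size - overlap))

words = [f"w{i}" for i in range(1000)]  # synthetic 1000-word document
demo_chunks = chunk_words(words)
print(len(demo_chunks), expected_chunks(1000))      # 3 3
print(demo_chunks[0][-40:] == demo_chunks[1][:40])  # True: 40 shared words
```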


## 6) Create a Pinecone index and store embeddings

We’ll:
1. Initialize Pinecone
2. Create or connect to an index
3. Embed each chunk with **Gemini embeddings**  
4. Upsert vectors into Pinecone


In [7]:
import os
from pinecone import Pinecone, ServerlessSpec
from langchain_google_genai import GoogleGenerativeAIEmbeddings

PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
pc = Pinecone(api_key=PINECONE_API_KEY)

index_name = "rag-demo-gemini"
# Create the index if it doesn't exist
if index_name not in [i.name for i in pc.list_indexes()]:
    pc.create_index(
        name=index_name,
        dimension=768,  # dimension of text-embedding-004
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)

# Embedding model: text-embedding-004 returns 768-dimensional vectors
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# Embed all chunks as documents (embed_query is meant for search queries)
chunk_vectors = embeddings.embed_documents([c["text"] for c in chunks])

# Build vectors for upsert
vectors = []
for c, vec in zip(chunks, chunk_vectors):
    vectors.append({
        "id": c["id"],
        "values": vec,
        "metadata": {"text": c["text"], "source": c["source"]}
    })

# Upsert to Pinecone
index.upsert(vectors=vectors)
len(vectors)



20
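
Sending every vector in a single `upsert` call is fine for 20 chunks. For larger collections, Pinecone's documentation recommends upserting in batches. Below is a minimal sketch of that pattern: the `batched` helper is hypothetical, the batch size of 100 is an assumption to tune, and `FakeIndex` is an offline stand-in so the example runs without an API key.

```python
from typing import Iterator, List

def batched(items: List[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

class FakeIndex:
    """Offline stand-in that records upsert calls (not a Pinecone class)."""
    def __init__(self):
        self.calls = []

    def upsert(self, vectors):
        self.calls.append(len(vectors))

# 250 dummy records with tiny 3-dim vectors, for illustration only
demo_vectors = [{"id": f"chunk-{i}", "values": [0.0] * 3} for i in range(250)]
demo_index = FakeIndex()
for batch in batched(demo_vectors, batch_size=100):
    demo_index.upsert(vectors=batch)  # with the real index: index.upsert(vectors=batch)
print(demo_index.calls)  # [100, 100, 50]
```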

## 7) Build the retriever and the RAG chain (in a few lines)

- **Retriever:** queries Pinecone for the top‑k similar chunks  
- **LLM:** Gemini reads the chunks and answers, with simple citations
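
Because the index was created with `metric="cosine"`, "similar" here means a small angle between the question vector and a chunk vector. A minimal sketch of that score on toy 3-dimensional vectors (the real ones have 768 dimensions; the vectors and ids below are made up):

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero-length vectors
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
stored = {
    "chunk-a": [2.0, 0.0, 2.0],  # same direction as the query
    "chunk-b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "chunk-c": [1.0, 1.0, 0.0],  # partially aligned
}
ranked = sorted(stored, key=lambda cid: cosine_similarity(query, stored[cid]), reverse=True)
print(ranked)  # ['chunk-a', 'chunk-c', 'chunk-b']
```

Note that `chunk-a` ranks first even though it is twice as long as the query: cosine similarity ignores magnitude and compares direction only.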


In [8]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import PromptTemplate
from pinecone import Pinecone

# Init Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = "rag-demo-gemini"
index = pc.Index(index_name)

# LangChain embeddings for querying (same model that indexed the chunks)
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# LLM for answering
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",  # cheap + fast; swap to gemini-2.5-pro for higher quality
    temperature=0.2,
)

# Prompt template
prompt = PromptTemplate(
    input_variables=["question", "context"],
    template=(
        """You are a helpful assistant that answers using only the provided context.

Context:
{context}

Question: {question}

Rules:
- Be concise and clear.
- If the answer is not in the context, say you don't know.
- Cite sources at the end as [source:id]."""
    ),
)

def retrieve_docs(question, k=3):
    """Retrieve relevant documents from Pinecone using direct query"""
    # Generate embedding for the question
    query_embedding = embeddings.embed_query(question)
    
    # Query Pinecone directly
    results = index.query(
        vector=query_embedding,
        top_k=k,
        include_metadata=True
    )
    
    # Format results for the prompt
    docs = []
    for i, match in enumerate(results.matches, 1):
        source = match.metadata.get("source", "unknown")
        text = match.metadata.get("text", "")
        docs.append(f"[{i}] ({source})\n{text}")
    
    return "\n\n".join(docs)

def ask_question(question):
    """Ask a question and get an answer using RAG"""
    # Retrieve relevant context
    context = retrieve_docs(question)
    print("Context:")
    print(context)
    # Format the prompt
    formatted_prompt = prompt.format(question=question, context=context)
    
    # Get answer from LLM
    response = llm.invoke(formatted_prompt)
    return response

print("Simplified RAG ready. ✅")

Simplified RAG ready. ✅




## 8) Ask questions

Now you can ask questions about your resume book! Try questions like:
- *Give me tips to prepare a resume*  
- *How many pages should my resume have?*  
- *Do I need a cover letter?*


In [9]:
question = "Give me tips to prepare a resume"
response = ask_question(question)
print("Response:")
print(response.content)

Context:
[1] (resume-book)
shows you’re a skilled communicator who is passionate about the company. Adapt your generic cover letter completely to ﬁt the companies’ needs and style. Here are 2 cover letters that got people hired at Crew from Mikael Cho’s article. Notice how the tone and the vibe matches the company. Sample Introductory email Short and sweet cover letter introduction 2. A custom resume Most people send the same generic resume to every employer. This is a huge miss. Every employer has different needs that you need to ﬁll. You need to think like a marketer — send the recruiter to a custom landing page targeted to meeting their needs. More details on customizing your resume can be found in the next section. 3. Bonus: Beef up your online brand Here are some additional things you can do to increase your marketplace value. LinkedIn. Update your LinkedIn with relevant work experience, your summary, and your brand. Add at least 50 people to your network. Twitter. Write a good de

## 9) How the pieces fit together

1. **Embed** your chunks with Gemini → numbers (vectors) that capture meaning  
2. **Store** vectors in **Pinecone** → fast similarity search  
3. **Query Pinecone directly** for the top‑k chunks using the question’s embedding  
4. **Generate** an answer with **Gemini**, using a prompt that **instructs the model to stay grounded** in the retrieved context  
5. **Cite** sources so users can verify

That’s RAG — retrieve first, then generate.
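
For step 5, one option is to collect the source ids alongside the context so they can be shown next to the answer. The sketch below uses plain dicts shaped like the `id`, `score`, and `metadata` fields of Pinecone query matches. The `select_context` helper, the `min_score` cutoff of 0.5, and the scores themselves are illustrative assumptions, not values from this notebook.

```python
def select_context(matches, min_score=0.5):
    """Keep matches at or above min_score; return (context text, unique sources)."""
    kept = [m for m in matches if m["score"] >= min_score]
    blocks, sources = [], []
    for i, m in enumerate(kept, 1):
        source = m["metadata"].get("source", "unknown")
        blocks.append(f"[{i}] ({source})\n{m['metadata'].get('text', '')}")
        if source not in sources:
            sources.append(source)  # preserve first-seen order
    return "\n\n".join(blocks), sources

# Made-up matches for illustration
demo_matches = [
    {"id": "resume-book-3", "score": 0.82, "metadata": {"source": "resume-book", "text": "first passage"}},
    {"id": "resume-book-7", "score": 0.61, "metadata": {"source": "resume-book", "text": "second passage"}},
    {"id": "resume-book-9", "score": 0.20, "metadata": {"source": "resume-book", "text": "weak match"}},
]
demo_context, demo_sources = select_context(demo_matches)
print(demo_sources)  # ['resume-book']
```

Dropping low-scoring matches keeps weakly related text out of the prompt; the right cutoff depends on the embedding model and the data, so treat 0.5 as a placeholder.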
