# Long Context & Efficient Prompting with Gemini 3 Pro

> **Created by [Build Fast with AI](https://www.buildfastwithai.com)**

This notebook demonstrates how to work with long contexts and create efficient prompts using Gemini 3 Pro's extended context window.

## What you'll learn:
- Understanding context windows
- Working with long documents
- Prompt optimization techniques
- Context management strategies
- Token counting and optimization
- Best practices for long-form content

## 1. Installation and Setup

In [None]:
!pip install -q google-generativeai

In [None]:
import os
import google.generativeai as genai
from IPython.display import Markdown, display
import time

In [None]:
# Configure API key
try:
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
except Exception:
    # Not running in Colab (or the secret is unavailable): fall back to the environment
    GOOGLE_API_KEY = os.environ.get('GOOGLE_API_KEY', 'your-api-key-here')

genai.configure(api_key=GOOGLE_API_KEY)

## 2. Understanding Context Windows

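Gemini 3 Pro has a large context window, so a single request can carry entire long documents or long conversation histories. Rather than hard-coding the limits, the cell below reads the input and output token limits from the API's model metadata.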

In [None]:
# Get model information
for model in genai.list_models():
    if 'gemini-3-pro' in model.name:
        print(f"Model: {model.name}")
        print(f"Display Name: {model.display_name}")
        print(f"Input Token Limit: {model.input_token_limit}")
        print(f"Output Token Limit: {model.output_token_limit}")
        print(f"Supported Methods: {model.supported_generation_methods}")
        print("\n" + "="*80 + "\n")

## 3. Token Counting

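Context limits are measured in tokens, not characters or words, so counting tokens is the only reliable way to know whether a prompt fits. The cell below compares a rough character-based estimate with the exact count returned by the API.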

In [None]:
def estimate_tokens(text: str) -> int:
    """Estimate token count (rough approximation)."""
    # Rough estimate: ~4 characters per token
    return len(text) // 4

def count_tokens_gemini(text: str) -> int:
    """Count tokens using Gemini's token counter."""
    model = genai.GenerativeModel('gemini-3-pro')
    return model.count_tokens(text).total_tokens

# Test token counting
sample_texts = [
    "Hello, world!",
    "The quick brown fox jumps over the lazy dog.",
    "Artificial intelligence is transforming the way we interact with technology. " * 10
]

print("Token Counting Comparison:\n")
for i, text in enumerate(sample_texts, 1):
    estimated = estimate_tokens(text)
    actual = count_tokens_gemini(text)
    
    print(f"Text {i}:")
    print(f"  Characters: {len(text)}")
    print(f"  Words: {len(text.split())}")
    print(f"  Estimated tokens: {estimated}")
    print(f"  Actual tokens: {actual}")
    print()

## 4. Processing Long Documents

In [None]:
# Generate a long document
def generate_long_document():
    """Generate a sample long document."""
    sections = [
        {
            "title": "Introduction to Machine Learning",
            "content": "Machine learning is a subset of artificial intelligence that enables " 
                      "computers to learn from data without being explicitly programmed. " * 20
        },
        {
            "title": "Supervised Learning",
            "content": "Supervised learning involves training models on labeled data where the " 
                      "correct output is known. Common algorithms include linear regression, " 
                      "decision trees, and neural networks. " * 20
        },
        {
            "title": "Unsupervised Learning",
            "content": "Unsupervised learning works with unlabeled data to discover hidden patterns. " 
                      "Clustering algorithms like K-means and dimensionality reduction techniques " 
                      "like PCA are popular methods. " * 20
        },
        {
            "title": "Deep Learning",
            "content": "Deep learning uses neural networks with multiple layers to process complex " 
                      "patterns in large datasets. It has revolutionized fields like computer vision, " 
                      "natural language processing, and speech recognition. " * 20
        },
        {
            "title": "Applications",
            "content": "Machine learning is applied in numerous domains including healthcare, finance, " 
                      "autonomous vehicles, recommendation systems, and fraud detection. " * 20
        }
    ]
    
    document = "\n\n".join([
        f"## {section['title']}\n\n{section['content']}"
        for section in sections
    ])
    
    return document

long_doc = generate_long_document()

print("Document Statistics:")
print(f"  Characters: {len(long_doc):,}")
print(f"  Words: {len(long_doc.split()):,}")
print(f"  Estimated tokens: {estimate_tokens(long_doc):,}")
print(f"  Actual tokens: {count_tokens_gemini(long_doc):,}")
print("\nFirst 500 characters:")
print(long_doc[:500] + "...")

In [None]:
# Process the long document
model = genai.GenerativeModel('gemini-3-pro')

# Various operations on long context
operations = [
    ("Summarize", "Provide a concise summary of this document in 3-4 sentences."),
    ("Key Points", "Extract the 5 most important key points from this document."),
    ("Questions", "Generate 3 questions that this document answers."),
    ("Critique", "What topics are missing that should be included?")
]

for operation_name, instruction in operations:
    print(f"\n{'='*80}")
    print(f"Operation: {operation_name}")
    print(f"{'='*80}\n")
    
    prompt = f"{instruction}\n\nDocument:\n{long_doc}"
    response = model.generate_content(prompt)
    
    display(Markdown(response.text))

## 5. Efficient Prompting Techniques

In [None]:
class EfficientPrompter:
    """Helper class for efficient prompt engineering."""
    
    def __init__(self):
        self.model = genai.GenerativeModel('gemini-3-pro')
    
    def structured_prompt(self, task: str, context: str, constraints: list = None,
                         examples: list = None, output_format: str = None) -> str:
        """Create a well-structured prompt."""
        parts = []
        
        # Task
        parts.append(f"**Task:** {task}\n")
        
        # Context
        if context:
            parts.append(f"**Context:**\n{context}\n")
        
        # Constraints
        if constraints:
            parts.append("**Constraints:**\n" + "\n".join(f"- {c}" for c in constraints) + "\n")
        
        # Examples
        if examples:
            parts.append("**Examples:**\n" + "\n".join(f"{i}. {ex}" for i, ex in enumerate(examples, 1)) + "\n")
        
        # Output format
        if output_format:
            parts.append(f"**Output Format:** {output_format}\n")
        
        return "\n".join(parts)
    
    def chain_of_thought(self, question: str, context: str = "") -> str:
        """Use chain-of-thought prompting."""
        prompt = f"""
        Question: {question}
        
        {f'Context: {context}' if context else ''}
        
        Let's approach this step-by-step:
        1. First, let's understand what we're being asked
        2. Then, let's identify the relevant information
        3. Next, let's reason through the problem
        4. Finally, let's arrive at our answer
        
        Please provide your reasoning for each step.
        """
        
        response = self.model.generate_content(prompt)
        return response.text
    
    def few_shot_prompt(self, task: str, examples: list, query: str) -> str:
        """Create a few-shot learning prompt."""
        prompt = f"Task: {task}\n\nExamples:\n"
        
        for i, example in enumerate(examples, 1):
            prompt += f"\nExample {i}:\n"
            prompt += f"Input: {example['input']}\n"
            prompt += f"Output: {example['output']}\n"
        
        prompt += f"\nNow, apply this to:\nInput: {query}\nOutput:"
        
        response = self.model.generate_content(prompt)
        return response.text

# Test efficient prompting
prompter = EfficientPrompter()

# Example 1: Structured prompt
print("Example 1: Structured Prompt\n")
structured = prompter.structured_prompt(
    task="Analyze the sentiment of customer reviews",
    context="Review: 'This product exceeded my expectations! Great quality and fast shipping.'",
    constraints=[
        "Classify as Positive, Negative, or Neutral",
        "Provide confidence score (0-1)",
        "Identify key phrases"
    ],
    output_format="JSON with fields: sentiment, confidence, key_phrases"
)

print(structured)
response = prompter.model.generate_content(structured)
display(Markdown(response.text))

In [None]:
# Example 2: Chain of Thought
print("\n" + "="*80)
print("Example 2: Chain of Thought Prompting\n")

cot_response = prompter.chain_of_thought(
    question="If a store has a 20% discount on an item that costs $150, and then applies an additional 10% off the discounted price, what is the final price?",
    context=""
)

display(Markdown(cot_response))

In [None]:
# Example 3: Few-Shot Learning
print("\n" + "="*80)
print("Example 3: Few-Shot Learning\n")

few_shot_response = prompter.few_shot_prompt(
    task="Convert technical jargon into simple language",
    examples=[
        {
            "input": "The API endpoint returns a JSON payload containing user metadata.",
            "output": "The web service sends back user information in a structured format."
        },
        {
            "input": "We need to optimize the database queries to reduce latency.",
            "output": "We need to make the database faster by improving how we ask for information."
        }
    ],
    query="The machine learning model achieved 94% accuracy on the validation set."
)

display(Markdown(few_shot_response))

## 6. Context Management Strategies

In [None]:
class ContextManager:
    """Manage long contexts efficiently."""
    
    def __init__(self, max_tokens: int = 10000):
        self.model = genai.GenerativeModel('gemini-3-pro')
        self.max_tokens = max_tokens
        self.context = []
    
    def add_to_context(self, text: str, role: str = "user"):
        """Add text to context with role."""
        self.context.append({
            "role": role,
            "content": text,
            "tokens": count_tokens_gemini(text)
        })
        self._manage_context_size()
    
    def _manage_context_size(self):
        """Ensure context doesn't exceed max tokens."""
        total_tokens = sum(item["tokens"] for item in self.context)
        
        while total_tokens > self.max_tokens:
            # Remove the oldest non-system message; keep system messages and the newest entry
            idx = next(
                (i for i, item in enumerate(self.context[:-1]) if item["role"] != "system"),
                None
            )
            if idx is None:
                break
            removed = self.context.pop(idx)
            total_tokens -= removed["tokens"]
            print(f"Removed old context (saved {removed['tokens']} tokens)")
    
    def get_context_string(self) -> str:
        """Get formatted context string."""
        return "\n\n".join([
            f"{item['role'].upper()}: {item['content']}"
            for item in self.context
        ])
    
    def query_with_context(self, question: str) -> str:
        """Query using managed context."""
        self.add_to_context(question, "user")
        
        full_prompt = self.get_context_string()
        response = self.model.generate_content(full_prompt)
        
        self.add_to_context(response.text, "assistant")
        
        return response.text
    
    def get_stats(self) -> dict:
        """Get context statistics."""
        total_tokens = sum(item["tokens"] for item in self.context)
        return {
            "messages": len(self.context),
            "total_tokens": total_tokens,
            "remaining_tokens": self.max_tokens - total_tokens,
            "utilization": f"{(total_tokens / self.max_tokens) * 100:.1f}%"
        }

# Test context management
manager = ContextManager(max_tokens=1000)

# Add context
manager.add_to_context("You are a helpful programming tutor.", "system")

# Have a conversation
questions = [
    "What is Python?",
    "How do I create a list?",
    "What's the difference between a list and a tuple?",
    "Can you show me an example?"
]

for q in questions:
    print(f"\nUser: {q}")
    response = manager.query_with_context(q)
    print(f"Assistant: {response[:200]}..." if len(response) > 200 else f"Assistant: {response}")
    print(f"\nContext Stats: {manager.get_stats()}")
    print("-" * 80)

## 7. Document Chunking Strategies

In [None]:
class DocumentChunker:
    """Intelligent document chunking."""
    
    def __init__(self, max_chunk_tokens: int = 2000):
        self.max_chunk_tokens = max_chunk_tokens
    
    def chunk_by_tokens(self, text: str, overlap: int = 100) -> list:
        """Chunk text by token count with overlap."""
        words = text.split()
        chunks = []
        
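        # Approximate: ~1.3 tokens per English word, so divide to convert tokens to words
        words_per_chunk = max(1, int(self.max_chunk_tokens / 1.3))
        overlap_words = int(overlap / 1.3)
        # Guard against a zero or negative step when overlap >= chunk size
        step = max(1, words_per_chunk - overlap_words)
        
        for i in range(0, len(words), step):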
            chunk = " ".join(words[i:i + words_per_chunk])
            chunks.append(chunk)
            
            if i + words_per_chunk >= len(words):
                break
        
        return chunks
    
    def chunk_by_paragraphs(self, text: str) -> list:
        """Chunk by paragraphs, combining small ones."""
        paragraphs = text.split('\n\n')
        chunks = []
        current_chunk = ""
        current_tokens = 0
        
        for para in paragraphs:
            para_tokens = estimate_tokens(para)
            
            if current_tokens + para_tokens > self.max_chunk_tokens:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = para
                current_tokens = para_tokens
            else:
                current_chunk += "\n\n" + para
                current_tokens += para_tokens
        
        if current_chunk:
            chunks.append(current_chunk.strip())
        
        return chunks
    
    def chunk_by_sentences(self, text: str) -> list:
        """Chunk by sentences."""
        import re
        sentences = re.split(r'(?<=[.!?])\s+', text)
        
        chunks = []
        current_chunk = ""
        current_tokens = 0
        
        for sentence in sentences:
            sentence_tokens = estimate_tokens(sentence)
            
            if current_tokens + sentence_tokens > self.max_chunk_tokens:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = sentence
                current_tokens = sentence_tokens
            else:
                current_chunk += " " + sentence
                current_tokens += sentence_tokens
        
        if current_chunk:
            chunks.append(current_chunk.strip())
        
        return chunks

# Test chunking strategies
chunker = DocumentChunker(max_chunk_tokens=500)

strategies = {
    "Token-based": chunker.chunk_by_tokens,
    "Paragraph-based": chunker.chunk_by_paragraphs,
    "Sentence-based": chunker.chunk_by_sentences
}

for strategy_name, strategy_func in strategies.items():
    chunks = strategy_func(long_doc)
    
    print(f"\n{strategy_name} Chunking:")
    print(f"  Total chunks: {len(chunks)}")
    print(f"  Avg tokens per chunk: {sum(estimate_tokens(c) for c in chunks) / len(chunks):.0f}")
    print(f"  First chunk preview: {chunks[0][:100]}...")
    print("-" * 80)

## 8. Summarization of Long Documents

In [None]:
class LongDocumentSummarizer:
    """Summarize documents that exceed the context window."""
    
    def __init__(self):
        self.model = genai.GenerativeModel('gemini-3-pro')
        self.chunker = DocumentChunker(max_chunk_tokens=3000)
    
    def summarize_chunks(self, text: str) -> str:
        """Summarize by processing chunks."""
        chunks = self.chunker.chunk_by_paragraphs(text)
        
        print(f"Processing {len(chunks)} chunks...\n")
        
        chunk_summaries = []
        for i, chunk in enumerate(chunks, 1):
            print(f"Summarizing chunk {i}/{len(chunks)}...")
            
            response = self.model.generate_content(
                f"Summarize this section concisely:\n\n{chunk}"
            )
            chunk_summaries.append(response.text)
        
        # Combine summaries
        combined = "\n\n".join(chunk_summaries)
        
        # Final summary
        print("\nCreating final summary...")
        final_response = self.model.generate_content(
            f"Create a comprehensive summary from these section summaries:\n\n{combined}"
        )
        
        return final_response.text
    
    def map_reduce_summarize(self, text: str) -> str:
        """Use map-reduce strategy for summarization."""
        chunks = self.chunker.chunk_by_paragraphs(text)
        
        # Map: Summarize each chunk
        summaries = []
        for chunk in chunks:
            response = self.model.generate_content(
                f"Extract key points from this text:\n\n{chunk}"
            )
            summaries.append(response.text)
        
        # Reduce: Combine summaries
        combined = "\n".join(summaries)
        
        final = self.model.generate_content(
            f"Synthesize these key points into a coherent summary:\n\n{combined}"
        )
        
        return final.text

# Test summarization
summarizer = LongDocumentSummarizer()

print("Summarizing long document...\n")
summary = summarizer.summarize_chunks(long_doc)

print("\n" + "="*80)
print("Final Summary:")
print("="*80)
display(Markdown(summary))

## 9. Question Answering Over Long Documents

In [None]:
class LongDocumentQA:
    """Answer questions about long documents."""
    
    def __init__(self):
        self.model = genai.GenerativeModel('gemini-3-pro')
        self.chunker = DocumentChunker(max_chunk_tokens=2000)
    
    def find_relevant_chunks(self, question: str, chunks: list, top_k: int = 3) -> list:
        """Find most relevant chunks for the question."""
        # Simple keyword-overlap scoring (in production, use embeddings).
        # Tokenize with a regex so trailing punctuation ("learning?") does not block matches.
        import re
        question_words = set(re.findall(r"\w+", question.lower()))
        
        scored_chunks = []
        for i, chunk in enumerate(chunks):
            chunk_words = set(re.findall(r"\w+", chunk.lower()))
            overlap = len(question_words & chunk_words)
            scored_chunks.append((i, chunk, overlap))
        
        # Sort by score and return top k
        scored_chunks.sort(key=lambda x: x[2], reverse=True)
        return [chunk for _, chunk, _ in scored_chunks[:top_k]]
    
    def answer_question(self, question: str, document: str) -> dict:
        """Answer a question about the document."""
        # Chunk the document
        chunks = self.chunker.chunk_by_paragraphs(document)
        
        # Find relevant chunks
        relevant_chunks = self.find_relevant_chunks(question, chunks, top_k=3)
        
        # Combine relevant context
        context = "\n\n".join(relevant_chunks)
        
        # Generate answer
        prompt = f"""
        Answer the following question based on the provided context.
        If the answer is not in the context, say so.
        
        Context:
        {context}
        
        Question: {question}
        
        Answer:
        """
        
        response = self.model.generate_content(prompt)
        
        return {
            "answer": response.text,
            "chunks_used": len(relevant_chunks),
            "context_length": len(context)
        }

# Test Q&A
qa_system = LongDocumentQA()

questions = [
    "What is supervised learning?",
    "What are some applications of machine learning?",
    "How does deep learning differ from traditional machine learning?",
    "What is quantum computing?"  # Not in document
]

print("Question Answering Test:\n")
for q in questions:
    print(f"Q: {q}")
    result = qa_system.answer_question(q, long_doc)
    print(f"A: {result['answer']}")
    print(f"   (Used {result['chunks_used']} chunks, {result['context_length']} chars)\n")
    print("-" * 80 + "\n")

## 10. Best Practices for Long Context Processing

### Context Window Management:

1. **Know Your Limits**: Understand the model's context window
2. **Token Counting**: Always count tokens before sending
3. **Graceful Degradation**: Have fallback strategies for oversized inputs
4. **Context Pruning**: Remove less relevant information when needed
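
The four points above can be combined into a small pre-flight check. The following is a minimal sketch rather than part of the notebook's own classes: `fit_to_budget` and `approx_tokens` are hypothetical names, passages are assumed to arrive ordered from most to least relevant, and the token counter is passed in as a callable so that either `estimate_tokens` or `count_tokens_gemini` from section 3 could be substituted.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough stand-in counter (~4 characters per token), as in section 3."""
    return len(text) // 4


def fit_to_budget(
    instruction: str,
    passages: List[str],
    max_tokens: int,
    count: Callable[[str], int] = approx_tokens,
) -> str:
    """Build a prompt that stays within max_tokens.

    Assumes passages are ordered from most to least relevant; the least
    relevant ones are pruned first when the budget is exceeded.
    """
    kept = list(passages)
    while kept:
        prompt = instruction + "\n\n" + "\n\n".join(kept)
        if count(prompt) <= max_tokens:
            return prompt
        kept.pop()  # prune the least relevant passage and retry
    # Fallback: no passage fits, so send the instruction alone
    return instruction


prompt = fit_to_budget(
    "Summarize the passages.",
    ["a" * 400, "b" * 400, "c" * 400],
    max_tokens=220,
)
print(approx_tokens(prompt))
```

With a 220-token budget, the third passage is dropped and the first two are kept. In practice the fallback branch might chunk the input or raise an error instead of silently sending the bare instruction.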

### Chunking Strategies:

1. **Semantic Boundaries**: Chunk at natural boundaries (paragraphs, sections)
2. **Overlap**: Use overlapping chunks to maintain context
3. **Adaptive Sizing**: Adjust chunk size based on content structure
4. **Metadata**: Preserve metadata about chunk position
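
Overlap and positional metadata can be handled together. The sketch below is illustrative only: the `Chunk` dataclass and `chunk_with_metadata` are hypothetical names, and it splits on whitespace-delimited words rather than real tokens to stay self-contained.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    index: int       # position of the chunk within the document
    start_word: int  # offset of the first word (inclusive)
    end_word: int    # offset one past the last word (exclusive)
    text: str


def chunk_with_metadata(
    text: str, chunk_words: int = 200, overlap_words: int = 20
) -> List[Chunk]:
    """Split text into overlapping word windows and record where each came from."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        end = min(start + chunk_words, len(words))
        chunks.append(Chunk(index, start, end, " ".join(words[start:end])))
        if end == len(words):
            break  # the final window already reaches the end of the text
    return chunks


demo = chunk_with_metadata(
    " ".join(f"w{i}" for i in range(50)), chunk_words=20, overlap_words=5
)
for c in demo:
    print(c.index, c.start_word, c.end_word)
```

Keeping the word offsets makes it possible to cite where an answer came from, or to pull in neighbouring chunks when one chunk alone lacks context.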

### Prompt Optimization:

1. **Front-Load Instructions**: Put key instructions early
2. **Clear Structure**: Use clear sections and formatting
3. **Concise Context**: Include only relevant information
4. **Example-Driven**: Use examples for complex tasks

### Performance:

1. **Parallel Processing**: Process independent chunks in parallel
2. **Caching**: Cache results for repeated operations
3. **Incremental Processing**: Process streams incrementally
4. **Batch Operations**: Group similar operations
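
Parallel processing and caching can be sketched with the standard library alone. In the example below, `process_chunks_parallel` and `cached_process` are hypothetical names, and `cached_process` is a placeholder that uppercases text where a real model call would go. A real deployment would also need to respect API rate limits, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, List


def process_chunks_parallel(
    chunks: List[str],
    process: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run process on every chunk concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, chunks))


@lru_cache(maxsize=256)
def cached_process(chunk: str) -> str:
    """Placeholder for a model call; repeated chunks are served from the cache."""
    return chunk.upper()


chunks = ["alpha", "beta", "gamma"]
first = process_chunks_parallel(chunks, cached_process)
second = process_chunks_parallel(chunks, cached_process)  # served from the cache
print(first)
print(cached_process.cache_info())
```

Threads suit this workload because model calls are I/O-bound. `lru_cache` keys on the exact chunk text, so it only helps when identical chunks recur.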

### Quality:

1. **Validation**: Validate outputs for consistency
2. **Cross-Referencing**: Verify answers across chunks
3. **Confidence Scoring**: Include confidence in responses
4. **Human Review**: Enable human oversight for critical tasks
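
Validation is the easiest of these to automate. As one possible sketch, the function below checks a reply against the JSON shape requested in the sentiment example of section 5 (`sentiment`, `confidence`, `key_phrases`). The name `parse_sentiment_reply` and the exact rules are assumptions for illustration, and the fence-stripping step assumes the reply may arrive wrapped in a Markdown code fence.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_SENTIMENTS = {"Positive", "Negative", "Neutral"}
FENCE = "`" * 3  # Markdown code-fence marker


def parse_sentiment_reply(reply: str) -> Optional[Dict[str, Any]]:
    """Return the parsed reply if it matches the expected schema, else None."""
    text = reply.strip()
    # Replies are sometimes wrapped in a Markdown code fence; unwrap it if so
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    sentiment = data.get("sentiment")
    confidence = data.get("confidence")
    phrases = data.get("key_phrases")

    valid = (
        isinstance(sentiment, str)
        and sentiment in ALLOWED_SENTIMENTS
        and isinstance(confidence, (int, float))
        and not isinstance(confidence, bool)  # bool is a subclass of int
        and 0 <= confidence <= 1
        and isinstance(phrases, list)
        and all(isinstance(p, str) for p in phrases)
    )
    return data if valid else None


good = '{"sentiment": "Positive", "confidence": 0.9, "key_phrases": ["great quality"]}'
print(parse_sentiment_reply(good))
print(parse_sentiment_reply("not json"))
```

A `None` result can trigger a retry or route the item to human review, which ties the validation step to the oversight step in the list above.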

## 11. Advanced Techniques

### Map-Reduce Pattern:
- Map: Process each chunk independently
- Reduce: Combine results into final output
- Use for: Summarization, extraction, analysis
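
Section 8 applies this pattern to summarization. To show that it generalizes, here is a small sketch applied to extraction instead. `map_reduce`, `extract_emails`, and `merge_unique` are hypothetical helpers, and a regular expression stands in for the per-chunk model call.

```python
import re
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_reduce(
    chunks: Iterable[str],
    map_fn: Callable[[str], T],
    reduce_fn: Callable[[List[T]], R],
) -> R:
    """Apply map_fn to each chunk independently, then merge with reduce_fn."""
    partials = [map_fn(chunk) for chunk in chunks]
    return reduce_fn(partials)


def extract_emails(chunk: str) -> List[str]:
    """Map step: pull email-like strings out of one chunk."""
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", chunk)


def merge_unique(partials: List[List[str]]) -> List[str]:
    """Reduce step: flatten the per-chunk lists and drop duplicates."""
    return sorted({email for part in partials for email in part})


emails = map_reduce(
    ["Contact ana@example.com or bo@example.org", "Reply to ana@example.com today"],
    extract_emails,
    merge_unique,
)
print(emails)  # ['ana@example.com', 'bo@example.org']
```

Because the map step treats each chunk independently, it can be parallelized without changing the reduce step.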

### Iterative Refinement:
- First pass: Quick extraction
- Second pass: Refinement and verification
- Use for: Complex analysis, fact-checking
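
One way to structure the two passes is a draft, critique, revise loop. The sketch below is an assumption about how that might look, not the notebook's method: `refine` is a hypothetical helper, and the three callables are plain-Python stand-ins where model calls would go. The "verification" here is a naive substring check.

```python
from typing import Callable, List

Claims = List[str]


def refine(
    source: str,
    draft_fn: Callable[[str], Claims],
    critique_fn: Callable[[str, Claims], Claims],
    revise_fn: Callable[[Claims, Claims], Claims],
    max_rounds: int = 2,
) -> Claims:
    """First pass drafts claims; later passes verify and revise them."""
    draft = draft_fn(source)
    for _ in range(max_rounds):
        issues = critique_fn(source, draft)
        if not issues:
            break  # every claim passed verification
        draft = revise_fn(draft, issues)
    return draft


source = "K-means is a clustering algorithm. PCA reduces dimensionality."


def quick_extract(text: str) -> Claims:
    # Stand-in first pass that includes one unsupported claim
    return ["K-means is a clustering algorithm", "PCA is a clustering algorithm"]


def find_unsupported(text: str, claims: Claims) -> Claims:
    # Stand-in verification: a claim is supported if it appears verbatim
    return [c for c in claims if c not in text]


def drop_flagged(claims: Claims, issues: Claims) -> Claims:
    return [c for c in claims if c not in issues]


verified = refine(source, quick_extract, find_unsupported, drop_flagged)
print(verified)  # ['K-means is a clustering algorithm']
```

Capping the loop with `max_rounds` bounds cost when the critique step keeps finding issues.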

### Hierarchical Processing:
- Level 1: Process small sections
- Level 2: Combine section results
- Level 3: Final synthesis
- Use for: Very long documents, books
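
The levels above amount to reducing in groups until a single result remains. A minimal sketch follows, assuming a hypothetical `hierarchical_reduce` helper and a `combine` callable that would wrap a model call in practice. String joining stands in for that call here.

```python
from typing import Callable, List


def hierarchical_reduce(
    items: List[str],
    combine: Callable[[List[str]], str],
    group_size: int = 4,
) -> str:
    """Combine items in groups, level by level, until one result remains."""
    if not items:
        raise ValueError("items must not be empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = list(items)
    while len(level) > 1:
        level = [
            combine(level[i:i + group_size])
            for i in range(0, len(level), group_size)
        ]
    return level[0]


group_sizes = []  # records how many items each combine call received


def join_group(group: List[str]) -> str:
    # Stand-in for "summarize these section summaries"
    group_sizes.append(len(group))
    return "+".join(group)


result = hierarchical_reduce(["s1", "s2", "s3", "s4", "s5"], join_group, group_size=2)
print(result)       # s1+s2+s3+s4+s5
print(group_sizes)  # [2, 2, 1, 2, 1, 2]
```

Unlike a single reduce step, no `combine` call ever receives more than `group_size` inputs, which keeps every intermediate prompt bounded even for book-length documents.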

### Selective Context:
- Identify key sections
- Focus processing on relevant parts
- Use for: Q&A, targeted extraction
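
For documents that already carry headings, key sections can be identified from the headings themselves. The sketch below assumes Markdown `## ` headings like those in the sample document from section 4. `split_sections` and `select_sections` are hypothetical helpers, and word overlap with the heading is only a crude relevance signal.

```python
import re
from typing import Dict, List, Set


def split_sections(document: str) -> Dict[str, str]:
    """Map each '## ' heading to the text beneath it."""
    sections: Dict[str, str] = {}
    title = None
    for line in document.splitlines():
        if line.startswith("## "):
            title = line[3:].strip()
            sections[title] = ""
        elif title is not None:
            sections[title] += line + "\n"
    return {t: body.strip() for t, body in sections.items()}


def words(text: str) -> Set[str]:
    """Lowercased word set with punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))


def select_sections(document: str, query: str, top_k: int = 1) -> List[str]:
    """Return the bodies of the top_k sections whose headings best match the query."""
    sections = split_sections(document)
    query_words = words(query)
    ranked = sorted(
        sections, key=lambda t: len(query_words & words(t)), reverse=True
    )
    # Skip sections with no overlap at all rather than padding the result
    return [sections[t] for t in ranked[:top_k] if query_words & words(t)]


doc = (
    "## Supervised Learning\n\nUses labeled data.\n\n"
    "## Deep Learning\n\nUses many layers."
)
print(select_sections(doc, "What is supervised learning?"))  # ['Uses labeled data.']
print(select_sections(doc, "quantum computing"))             # []
```

Only the selected section bodies are then sent to the model. Note that sections with identical headings would overwrite each other in this sketch.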

## 12. Real-World Applications

### Long Context Use Cases:

1. **Legal Document Analysis**:
   - Contract review
   - Compliance checking
   - Case law research

2. **Research Paper Processing**:
   - Literature review
   - Citation analysis
   - Methodology extraction

3. **Business Intelligence**:
   - Report analysis
   - Trend identification
   - Competitive analysis

4. **Content Creation**:
   - Book summarization
   - Long-form content generation
   - Editorial review

5. **Customer Support**:
   - Knowledge base Q&A
   - Ticket analysis
   - Documentation search

6. **Code Analysis**:
   - Large codebase review
   - Documentation generation
   - Bug detection

## Next Steps

Continue exploring advanced AI capabilities:
- Build production-grade RAG systems
- Implement multi-agent workflows
- Create domain-specific AI assistants
- Deploy at scale with monitoring

---

## Learn More

Master advanced AI techniques and build production systems with the **[Gen AI Crash Course](https://www.buildfastwithai.com/genai-course)** by Build Fast with AI!
