# 📚 Building Your First AI Chatbot - Interactive Tutorial

Welcome! This notebook will guide you through building an AI chatbot that can answer questions about PDF documents.

## 🎯 What You'll Learn
1. How LLMs (Large Language Models) work
2. Processing PDF documents
3. Context Augmented Generation (CAG)
4. Interacting with AI APIs
5. Building a complete chatbot

## 📋 Prerequisites
- Basic Python knowledge
- An OpenAI API key (a free, local alternative using Ollama is listed under Next Steps)
- A PDF file to test with

## Step 1: Setup and Installation

First, let's make sure we have all the required packages installed.

```python
# Install required packages (run this once)
# Uncomment the line below if the packages are not installed
# !pip install openai python-dotenv pypdf
```

```python
# Import libraries
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables (such as OPENAI_API_KEY) from a .env file, if present
load_dotenv()

print("✅ Imports successful!")
```
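Every later cell calls the OpenAI API, so it can help to fail early with a clear message when the key is missing. This is a minimal sketch, assuming the key lives in the `OPENAI_API_KEY` environment variable (which `load_dotenv()` fills from a `.env` file). `require_env` is a hypothetical helper for this tutorial, not part of `python-dotenv` or `openai`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a helpful error.

    Hypothetical helper for this tutorial; not part of python-dotenv or openai.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to a .env file or export it in your shell."
        )
    return value


# Usage (uncomment once your .env file is in place):
# api_key = require_env("OPENAI_API_KEY")
```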

## Step 2: Understanding PDFs and Text Extraction

Before we can chat about a PDF, we need to extract its text content.

```python
import pypdf

# Let's extract text from a PDF
pdf_path = "../data/sample.pdf"  # Change this to your PDF

# Check if the file exists
if Path(pdf_path).exists():
    with open(pdf_path, 'rb') as file:
        pdf_reader = pypdf.PdfReader(file)

        print("📄 PDF Info:")
        print(f"   Pages: {len(pdf_reader.pages)}")
        print(f"   Metadata: {pdf_reader.metadata}")

        # Extract the first page's text
        # (an empty string usually means a scanned, image-only page)
        first_page = pdf_reader.pages[0]
        text = first_page.extract_text()

        print("\n📝 First 500 characters:")
        print(text[:500])
else:
    print("❌ PDF file not found!")
    print("💡 Add a PDF to the data/ folder and update the path above")
```
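`extract_text()` reads only a PDF's text layer, so pages that are scanned images typically come back as an empty string. The stdlib sketch below joins per-page strings and reports which pages were empty, which tells you when a document needs OCR first. It takes a plain list of strings (for example `[p.extract_text() for p in pdf_reader.pages]`), so `join_pages` is a hypothetical helper that does not depend on pypdf itself.

```python
def join_pages(page_texts):
    """Join per-page text and return (full_text, empty_page_numbers).

    page_texts: list of strings, one per page (None is treated as empty).
    Page numbers in the result are 1-based, matching how readers count pages.
    """
    empty_pages = []
    parts = []
    for number, text in enumerate(page_texts, start=1):
        text = (text or "").strip()
        if not text:
            empty_pages.append(number)
            continue
        parts.append(text)
    return "\n\n".join(parts), empty_pages


full, empty = join_pages(["Intro text", "", "   ", "Conclusion"])
print(empty)  # [2, 3]
```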

## Step 3: Understanding LLMs (Large Language Models)

### What is an LLM?
- A neural network trained on massive amounts of text
- Predicts the next word/token based on previous context
- Examples: GPT-4, Claude, Llama, Gemini

### Key Concepts:
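- **Tokens**: Words or word pieces (for English text, 1 token ≈ 4 characters on average)
- **Context Window**: The maximum number of tokens (prompt plus response) the model can process at once
- **Temperature**: Controls randomness (low values such as 0 give focused, near-deterministic output; higher values give more varied, creative output)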
- **System Prompt**: Instructions for how the AI should behave
- **User Prompt**: Your actual question or request
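To see what temperature does, here is a toy, stdlib-only sketch of the usual mechanism: the model's raw scores (logits) for candidate next tokens are divided by the temperature before a softmax turns them into probabilities. The three logits are made-up numbers for illustration, not output from a real model. A temperature of 0 is usually implemented as "always pick the top-scoring token" rather than dividing by zero, which is why this function requires a value above 0.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures push almost all of the probability onto the top-scoring token; higher ones spread it across the alternatives, which is why responses become more varied.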

```python
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Simple example: asking the LLM a question
response = client.chat.completions.create(
    model="gpt-4o-mini",  # A smaller, lower-cost model
    messages=[
        {"role": "system", "content": "You are a helpful teacher."},
        {"role": "user", "content": "Explain what an AI agent is in one simple sentence."}
    ],
    temperature=0.7
)

answer = response.choices[0].message.content
print(f"🤖 AI Response:\n{answer}")
print(f"\n📊 Tokens used: {response.usage.total_tokens}")
```

## Step 4: Context Augmented Generation (CAG)

### The Magic of CAG:
1. Extract text from PDF
2. Send the text + question to the LLM
3. LLM answers based on the provided context
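The three steps above boil down to building one request that carries both the document and the question. Here is a minimal sketch of step 2 in plain Python, assuming the same chat `messages` format used in Step 3. `build_cag_messages` is a hypothetical name, and the `<document>` delimiters are just one common convention for marking where the context starts and ends.

```python
def build_cag_messages(document_text, question):
    """Build a chat message list that puts the whole document in context."""
    system_prompt = (
        "Answer questions using only the document between <document> tags. "
        "If the answer is not in the document, say so."
    )
    user_prompt = f"<document>\n{document_text}\n</document>\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# A toy one-line "document" for illustration
messages = build_cag_messages("The warranty period is 12 months.", "How long is the warranty?")
for m in messages:
    print(m["role"], "->", m["content"][:60])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create(...)`, just like the hand-written list in Step 3.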

### CAG vs RAG:
- **CAG**: Send the entire document as context (simple; works when the document fits in the context window)
- **RAG**: Search for the most relevant chunks and send only those (better for documents too large for the context window)

For learning, CAG is perfect!
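For contrast, the retrieval half of RAG can be sketched without any vector database. This toy version splits the text into fixed-size word chunks and ranks them by how many words they share with the question. Real RAG systems usually rank chunks by embedding similarity instead (for example with ChromaDB), so treat keyword overlap here as a simplified stand-in. `chunk_words`, `word_set`, and `top_chunks` are hypothetical helpers.

```python
import re


def chunk_words(text, chunk_size=50):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def word_set(text):
    """Lowercase set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question, chunks, k=2):
    """Return the k chunks that share the most words with the question."""
    q_words = word_set(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & word_set(c)), reverse=True)
    return ranked[:k]


document = (
    "Solar panels convert sunlight into electricity. "
    "Wind turbines generate power from air. "
    "Batteries store energy for later use."
)
chunks = chunk_words(document, chunk_size=6)  # each sentence is 6 words here
print(top_chunks("How do wind turbines work?", chunks, k=1))
# ['Wind turbines generate power from air.']
```

Only the winning chunks would then go into the prompt, which is what keeps RAG workable for documents larger than the context window.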

```python
# Let's extract the full PDF text
def extract_full_pdf(pdf_path):
    """Return the text of every page, separated by blank lines."""
    with open(pdf_path, 'rb') as file:
        pdf_reader = pypdf.PdfReader(file)
        text_parts = []

        for page in pdf_reader.pages:
            text_parts.append(page.extract_text())

        return "\n\n".join(text_parts)

# Try it (if you have a PDF)
if Path(pdf_path).exists():
    full_text = extract_full_pdf(pdf_path)

    print("📊 Extraction Stats:")
    print(f"   Characters: {len(full_text):,}")
    print(f"   Words: {len(full_text.split()):,}")
    print(f"   Est. Tokens: {len(full_text) // 4:,}")  # rough: ~4 characters per token

    # Store for the next step
    pdf_content = full_text
else:
    print("Add a PDF to test this!")
```
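The `len(full_text) // 4` estimate can also tell you whether CAG is an option for your PDF at all. This is a rough sketch that assumes a 128,000-token context window (the figure OpenAI lists for `gpt-4o-mini` at the time of writing; check the model documentation for your model) and reserves room for the prompts and the answer. `fits_in_context` and its default values are hypothetical.

```python
def estimate_tokens(text):
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return len(text) // 4


def fits_in_context(text, context_window=128_000, reserved=2_000):
    """Return True if the text likely fits, leaving `reserved` tokens for the
    system prompt, the question, and the answer (e.g. max_tokens=1000)."""
    return estimate_tokens(text) + reserved <= context_window


short_doc = "word " * 1_000      # 5,000 characters -> about 1,250 tokens
long_doc = "word " * 200_000     # 1,000,000 characters -> about 250,000 tokens
print(fits_in_context(short_doc))  # True
print(fits_in_context(long_doc))   # False
```

If it returns `False` for your PDF, that is the point where the RAG approach described in Step 4 becomes the better fit.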

## Step 5: Building the Chatbot!

Now let's put it all together and create our PDF chatbot.

```python
# Create the client once and reuse it for every question
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def ask_pdf_question(question, pdf_content):
    """
    Ask a question about the PDF content.
    This is CAG in action: the whole document is sent along with the question.
    Returns (answer, total_tokens_used).
    """
    # Build the messages
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant that answers questions based on "
                "the provided document. Use only information from the document. "
                "If the answer isn't in the document, say so clearly."
            )
        },
        {
            "role": "system",
            "content": f"Document content:\n\n{pdf_content}"
        },
        {
            "role": "user",
            "content": question
        }
    ]

    # Get the response; a low temperature keeps answers close to the document
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.2,
        max_tokens=1000  # upper limit on the length of the answer
    )

    answer = response.choices[0].message.content
    tokens = response.usage.total_tokens

    return answer, tokens
```

```python
# Test the chatbot!
if 'pdf_content' in locals():
    question = "What is this document about? Give me a brief summary."

    print(f"❓ Question: {question}\n")
    answer, tokens = ask_pdf_question(question, pdf_content)
    print(f"🤖 Answer: {answer}")
    print(f"\n📊 Tokens used: {tokens}")
else:
    print("Add a PDF first to test the chatbot!")
```

## Step 6: Try Your Own Questions!

Now you can ask any question about your PDF.

```python
# Interactive question answering
if 'pdf_content' in locals():
    # Try different questions:
    questions = [
        "What are the main topics covered?",
        "List any key concepts or terms",
        "What's the most important takeaway?"
    ]

    for q in questions:
        print(f"\n{'='*60}")
        print(f"❓ {q}")
        print('='*60)
        answer, _ = ask_pdf_question(q, pdf_content)
        print(f"🤖 {answer}")
else:
    print("Add a PDF to the data/ folder to start chatting!")
```
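To keep asking questions without editing the list each time, you could wrap `ask_pdf_question` in a small input loop. This sketch assumes `ask` is any function with the same `(question, content) -> (answer, tokens)` shape. `chat_loop` is a hypothetical helper, and `read`/`write` are parameters only so the loop can be tried without a real keyboard.

```python
def chat_loop(ask, content, read=input, write=print):
    """Ask questions repeatedly until the user types 'quit' (or 'exit')."""
    total_tokens = 0
    while True:
        question = read("❓ Your question (or 'quit'): ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        if not question:
            continue  # ignore empty input
        answer, tokens = ask(question, content)
        total_tokens += tokens
        write(f"🤖 {answer}")
    write(f"📊 Total tokens used: {total_tokens}")
    return total_tokens


# In the notebook: chat_loop(ask_pdf_question, pdf_content)
```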

## 🎓 What You've Learned!

Congratulations! You've now built a working AI chatbot. Here's what you learned:

1. **PDF Processing**: How to extract text from documents
2. **LLM Interaction**: How to send prompts and get responses
3. **CAG**: How to ground the AI's answers in a document by sending it as context
4. **Prompt Engineering**: Crafting system and user prompts
5. **Token Management**: Understanding costs and limits

## 🚀 Next Steps

Ready to level up? Here are ideas:

1. **Add RAG**: Learn vector databases (ChromaDB) for large documents
2. **Memory**: Make the chatbot remember conversation history
3. **Multi-PDF**: Handle multiple documents at once
4. **Local Models**: Use Ollama to run LLMs locally (free!)
5. **Web Interface**: Build a UI with Streamlit or Gradio
6. **Advanced Agents**: Add tools, function calling, chains
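The **Memory** idea mostly comes down to resending earlier turns with each request. This minimal sketch assumes the same `messages` format as above: the class keeps the system prompt fixed and drops the oldest question/answer pairs once the history exceeds a turn limit, so the prompt does not grow without bound. `ChatMemory` and `max_turns` are hypothetical names, and trimming by turn count is a simplification of trimming by token count.

```python
class ChatMemory:
    """Keep a system prompt plus the most recent user/assistant turns."""

    def __init__(self, system_prompt, max_turns=5):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.system = {"role": "system", "content": system_prompt}
        self.max_turns = max_turns
        self.turns = []  # list of (user_message, assistant_message) pairs

    def add_turn(self, question, answer):
        self.turns.append((
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ))
        # Drop the oldest turns once the limit is exceeded
        self.turns = self.turns[-self.max_turns:]

    def messages_for(self, new_question):
        """Build the full message list to send with the next question."""
        messages = [self.system]
        for user_msg, assistant_msg in self.turns:
            messages.extend([user_msg, assistant_msg])
        messages.append({"role": "user", "content": new_question})
        return messages


memory = ChatMemory("You are a helpful assistant.", max_turns=2)
memory.add_turn("Q1", "A1")
memory.add_turn("Q2", "A2")
memory.add_turn("Q3", "A3")
print(len(memory.messages_for("Q4")))  # 6: system + 2 turns x 2 messages + new question
```

After each API call you would record the exchange with `memory.add_turn(question, answer)` and pass `memory.messages_for(next_question)` as `messages=`.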

## 📚 Resources

- [OpenAI API Documentation](https://platform.openai.com/docs)
- [LangChain](https://python.langchain.com/) - Framework for LLM apps
- [Ollama](https://ollama.ai/) - Run local models
- [ChromaDB](https://www.trychroma.com/) - Vector database for RAG

## 💡 Exercises

Try these challenges:

1. Modify the temperature parameter - how does it change responses?
2. Try different system prompts - make it formal, casual, or expert-level
3. Count tokens in your PDF - will it fit in the context window?
4. Add error handling for when questions aren't answerable
5. Create a conversation history feature
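For exercise 4, one approach is to ask the model to reply with a fixed marker when the document lacks the answer, then check for that marker in code. This sketch assumes you add an instruction like `NOT_FOUND_INSTRUCTION` to the system prompt. The marker text and helper name are hypothetical, and models do not follow such instructions perfectly, so treat the check as a heuristic.

```python
NOT_FOUND_MARKER = "NOT_IN_DOCUMENT"
NOT_FOUND_INSTRUCTION = (
    f"If the document does not contain the answer, reply with exactly {NOT_FOUND_MARKER}."
)


def interpret_answer(raw_answer):
    """Return (found, text): found is False when the model used the marker."""
    cleaned = raw_answer.strip()
    if cleaned.startswith(NOT_FOUND_MARKER):
        return False, "Sorry, I couldn't find that in the document."
    return True, cleaned


print(interpret_answer("NOT_IN_DOCUMENT"))
print(interpret_answer("  The report covers Q3 sales. "))
```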