# RAG Chatbot - Simple Version

## What This Does

This notebook creates a chatbot that answers questions about your PDF documents.

**You Need:**
- PDF files in your Google Drive
- A free Google Gemini API key
- Internet connection

**Cost:** Free (the Gemini API free tier has usage limits)

---

**Need help?** See STUDENT_GUIDE.md for detailed instructions.

---
## Step 1: Install Libraries

This installs the required tools. Takes about 30 seconds.

In [None]:
# Install all required packages
!pip install -q chromadb gradio pypdf sentence-transformers google-generativeai vaderSentiment

print("✅ All libraries installed successfully!")
print("\nℹ️  Note: You may see dependency warnings about 'opentelemetry' packages.")
print("   These are non-critical and won't affect functionality. You can safely ignore them.")

---
## Step 2: Load Libraries

Load the tools we just installed.

In [None]:
import os
import time
import asyncio
import gradio as gr
import google.generativeai as genai
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
from google.colab import drive
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")

---
## Step 3: Connect Google Drive

**Steps:**
1. Click the link that appears
2. Choose your Google account
3. Click "Allow"

Your files will be at: `/content/drive/MyDrive/`

In [None]:
# Mount Google Drive
drive.mount('/content/drive')

print("✅ Google Drive mounted successfully!")
print("📁 Your files are available at: /content/drive/MyDrive/")

---
## Step 4: Configuration - CHANGE THESE! ✏️

**⚠️ IMPORTANT: You must edit the values below**

### Get Your FREE API Key:
1. Go to: https://aistudio.google.com/app/apikey
2. Click "Create API Key"
3. Copy the key
4. Paste it in the next cell where it says `YOUR_API_KEY_HERE`
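
Pasting the key directly into the notebook is the simplest option, but anyone you share the notebook with can then see it. As an optional alternative, the sketch below reads the key from an environment variable instead. The function name `load_api_key` and the variable name `DEMO_KEY` are made up for this illustration; they are not part of this notebook or of the Gemini library.

```python
import os

PLACEHOLDER = "YOUR_API_KEY_HERE"  # Same placeholder text as in the configuration cell


def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in an environment variable, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(f"Set the {env_var} environment variable to your real API key.")
    return key


# Demonstration with a dummy value under a made-up variable name (not a real key)
os.environ["DEMO_KEY"] = "dummy-key-for-illustration"
print(load_api_key("DEMO_KEY"))
```

If you store your real key in an environment variable named `GEMINI_API_KEY`, the configuration cell could use `GEMINI_API_KEY = load_api_key()` instead of a pasted string.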

### Add Your PDF Files:
- Upload PDFs to your Google Drive
- Update the `PDF_PATHS` list with your file paths
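
If you have many PDFs in one folder, typing every path by hand is error-prone. A minimal sketch, assuming all your files sit directly inside a single folder: the helper `find_pdfs` (a hypothetical name, not part of this notebook) collects every `.pdf` file in that folder. The folder path is the same placeholder used in this notebook, so replace it with your own.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the sorted paths of all .pdf files directly inside `folder`."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return []  # Folder missing (for example, Drive not mounted yet)
    return sorted(
        str(path)
        for path in folder_path.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Placeholder folder name: change it to your own folder
pdf_paths = find_pdfs("/content/drive/MyDrive/your_folder")
print(f"Found {len(pdf_paths)} PDF file(s)")
```

The result could then be assigned to `PDF_PATHS` in the configuration cell instead of a hand-typed list.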

In [None]:
# ============================================================================
# 1. API KEY - CHANGE THIS! ✏️
# ============================================================================
# Get your key from: https://aistudio.google.com/app/apikey

GEMINI_API_KEY = "YOUR_API_KEY_HERE"  # ← PASTE YOUR KEY HERE

# ============================================================================
# 2. PERSONA - CUSTOMIZE THIS! ✏️
# ============================================================================

PERSONA_NAME = "Your Persona Name"  # ✏️ CHANGE THIS - e.g., "Albert Einstein", "Oprah Winfrey"

# ✏️ CUSTOMIZE THIS: Describe your persona's speaking style and personality
PERSONA_DESCRIPTION = """
Replace this entire section with your persona's description.

Template:
You are [NAME], [brief description/title/role].
You speak in a [adjective] manner, using [characteristic words/phrases].
You emphasize [what they care about] and often [communication patterns].

Instructions for customization:
- Describe HOW they speak (tone, word choice, sentence structure)
- Include specific phrases or words they commonly use
- Mention what topics/themes they emphasize
- Note any unique speaking patterns or habits
- Keep it focused on communication style, not just biographical facts

Example:
"You are Marie Curie, pioneering scientist. You speak precisely and scientifically,
using terms like 'research,' 'experiment,' and 'discovery.' You emphasize evidence-based
reasoning and the importance of persistence in scientific work."
"""

# Tip: Describe how your person speaks and thinks

# ============================================================================
# 3. RESPONSE SETTINGS - OPTIONAL
# ============================================================================

TEMPERATURE = 0.7  # Creativity level (0.0 = focused, 1.0 = creative)

MAX_OUTPUT_TOKENS = 500  # Maximum response length (~375 words)

NUM_RETRIEVED_DOCS = 7  # How many document pieces to retrieve (Step 8 passes only the top 2 to the AI)

# ============================================================================
# 4. SOURCE CITATIONS - OPTIONAL
# ============================================================================

SHOW_SOURCES = True  # True = show which PDFs were used, False = hide

# ============================================================================
# 5. CHUNKING SETTINGS - OPTIONAL
# ============================================================================
# How to split your PDFs into searchable pieces

CHUNK_SIZE = 1000  # Characters per chunk (500-2000 recommended)
OVERLAP = 200      # Overlap between chunks (keeps context)

# 💡 EXAMPLES - When to adjust:
#
# Example 1: SHORT & PRECISE (for quick facts)
#   CHUNK_SIZE = 500
#   OVERLAP = 100
#   ✅ Best for: Short Q&A, specific facts, definitions
#   ✅ Pros: Fast, precise answers
#   ❌ Cons: May miss broader context
#
# Example 2: LONG & CONTEXTUAL (for complex topics)
#   CHUNK_SIZE = 1500
#   OVERLAP = 300
#   ✅ Best for: Detailed explanations, complex reasoning
#   ✅ Pros: Rich context, complete thoughts
#   ❌ Cons: Slower, may include irrelevant info
#
# 🎯 CURRENT (BALANCED): 1000 chars, 200 overlap
#   ✅ Works well for general conversation and empathy training

# ============================================================================
# 6. PDF FILES - CHANGE THIS! ✏️
# ============================================================================
# Format: "/content/drive/MyDrive/folder_name/file_name.pdf"

PDF_PATHS = [
    "/content/drive/MyDrive/your_folder/document1.pdf",  # ← CHANGE THESE
    "/content/drive/MyDrive/your_folder/document2.pdf",  # ← TO YOUR PATHS
    "/content/drive/MyDrive/your_folder/document3.pdf",
    # Add more files as needed
]

# ============================================================================
# SETUP (Don't change this part)
# ============================================================================
genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel('gemini-2.0-flash')  # Fast & free AI model

print("✅ Configuration complete!")
print(f"📋 Persona: {PERSONA_NAME}")
print(f"🤖 Model: gemini-2.0-flash")
print(f"📄 PDF files: {len(PDF_PATHS)}")
print(f"🌡️  Creativity: {TEMPERATURE}")
print(f"📊 Search pieces: {NUM_RETRIEVED_DOCS}")
print(f"📚 Show sources: {'ON ✅' if SHOW_SOURCES else 'OFF'}")
print(f"📏 Chunk size: {CHUNK_SIZE} chars (overlap: {OVERLAP})")

---
## 🧪 Step 5: Test API Connection

**Run this before going further!** This checks whether your API key works.

✅ If successful: Continue to next step  
❌ If failed: Check your API key and try again
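
A failed test does not always mean the key is wrong: free-tier rate limits can cause temporary failures that succeed a moment later. The sketch below shows one common way to handle that, retrying with a growing pause (exponential backoff). `call_with_retry` and `flaky_call` are hypothetical names invented for this example, and `flaky_call` only imitates an API call so the code runs without a key.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(); after a failure, wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as error:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the real error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay}s")
            sleep(delay)


# Stand-in for an API call: fails twice, then succeeds
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 rate limit")
    return "Hello! API is working!"


result = call_with_retry(flaky_call, base_delay=0.01)
print(result)
```

With a real key, the same wrapper could be given a small function that calls `model.generate_content(...)`; a longer `base_delay` is usually more appropriate for real rate limits.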

In [None]:
print("🧪 Testing API connection...")
print("=" * 60)

try:
    # Simple test prompt
    test_response = model.generate_content(
        "Say 'Hello! API is working!' in a friendly, enthusiastic style.",
        generation_config=genai.types.GenerationConfig(
            temperature=0.7,
            max_output_tokens=50,
        ),
    )
    
    print("✅ SUCCESS! API is working!")
    print(f"\nTest Response: {test_response.text}")
    print("\n" + "=" * 60)
    print("✅ You can proceed with the rest of the notebook!")
    
except Exception as e:
    print("❌ API TEST FAILED!")
    print(f"Error: {str(e)}")
    print("\n" + "=" * 60)
    print("⚠️  STOP! Fix this issue before proceeding:")
    print("  1. Check your API key is correct")
    print("  2. Check your internet connection")
    print("  3. Visit https://aistudio.google.com/app/apikey to verify your key")
    print("  4. Check API status at https://status.cloud.google.com/")

---
## Step 6: Read PDF Files

This reads your PDFs and splits them into small pieces for searching.

**Time:** 1-2 minutes depending on file size
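
To see what chunk size and overlap do, the small sketch below computes where each chunk starts and ends for a 2,500-character text with the default settings (1000 characters per chunk, 200 overlap). It follows the same stepping logic as `chunk_text` in the next cell but returns positions instead of text; `chunk_spans` is a name made up for this illustration.

```python
def chunk_spans(text_length, chunk_size, overlap):
    """Return the (start, end) character positions of each chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # How far the window moves each time
    spans = []
    start = 0
    while start < text_length:
        spans.append((start, min(start + chunk_size, text_length)))
        start += step
    return spans


spans = chunk_spans(text_length=2500, chunk_size=1000, overlap=200)
print(spans)
# [(0, 1000), (800, 1800), (1600, 2500), (2400, 2500)]
```

Each chunk starts 800 characters after the previous one, so neighboring chunks share 200 characters. Note the short final chunk: it is already fully contained in the chunk before it, which is harmless but explains why the piece count can be one higher than expected.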

In [None]:
def extract_text_from_pdf(pdf_path):
    """Get text from a PDF file."""
    try:
        reader = PdfReader(pdf_path)
        text = ""
        for page in reader.pages:
            text += (page.extract_text() or "") + "\n"  # Treat pages without text as empty
        return text
    except Exception as e:
        print(f"❌ Error reading {pdf_path}: {str(e)}")
        return ""

def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=OVERLAP):
    """Split text into small pieces (chunks) for better searching.
    
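    Uses settings from Step 4:
    - chunk_size: Characters per chunk
    - overlap: Characters shared between neighboring chunks (keeps context across chunk boundaries)
    """
    # Guard against settings that would make the loop below run forever
    if not 0 <= overlap < chunk_size:
        raise ValueError("OVERLAP must be at least 0 and smaller than CHUNK_SIZE")

    chunks = []
    start = 0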
    
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        
        if chunk.strip():
            chunks.append(chunk)
        
        start += chunk_size - overlap  # Move forward, keep overlap
    
    return chunks

# Process all PDFs
print("📖 Reading PDF files...\n")
all_chunks = []
metadata = []

for idx, pdf_path in enumerate(PDF_PATHS):
    print(f"Processing: {pdf_path}")
    
    if not os.path.exists(pdf_path):
        print(f"⚠️  File not found - {pdf_path}")
        continue
    
    text = extract_text_from_pdf(pdf_path)
    
    if text:
        chunks = chunk_text(text)  # Split into small pieces
        all_chunks.extend(chunks)
        
        # Save info about where each chunk came from
        for chunk_idx, chunk in enumerate(chunks):
            metadata.append({
                "source": os.path.basename(pdf_path),
                "chunk_id": chunk_idx,
                "total_chunks": len(chunks)
            })
        
        print(f"  ✅ Created {len(chunks)} pieces")
    else:
        print(f"  ⚠️  No text found")

print(f"\n✅ Done!")
print(f"📊 Total pieces: {len(all_chunks)}")
print(f"📏 Using chunk size: {CHUNK_SIZE} chars with {OVERLAP} char overlap")

if len(all_chunks) == 0:
    print("\n⚠️  WARNING: No text found in PDFs!")
    print("Check: File paths correct? PDFs not password-protected?")

---
## Step 7: Create Search Database

This creates a searchable database from your PDFs.

**Time:** 1-2 minutes
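
Searching by meaning works by comparing lists of numbers (embeddings). The sketch below illustrates the idea with cosine similarity on tiny hand-written vectors of three numbers each. The vectors and chunk names are invented for the example; real embeddings from `all-MiniLM-L6-v2` have 384 numbers, and ChromaDB may use a different distance measure by default, so treat this as an illustration of the concept rather than of ChromaDB's internals.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in chunk_vectors.items()
    ]
    scored.sort(reverse=True)  # Highest similarity first
    return [name for _, name in scored[:k]]


# Invented toy vectors standing in for real embeddings
toy_chunks = {
    "chunk_about_radium": [0.9, 0.1, 0.0],
    "chunk_about_travel": [0.0, 0.2, 0.9],
    "chunk_about_polonium": [0.8, 0.3, 0.1],
}
result = top_k([1.0, 0.2, 0.0], toy_chunks, k=2)
print(result)
# ['chunk_about_radium', 'chunk_about_polonium']
```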

In [None]:
print("🔧 Loading search model...")
# This model converts text to numbers for searching
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("✅ Model loaded!")

print("\n🗄️  Creating database...")
chroma_client = chromadb.Client(Settings(
    anonymized_telemetry=False,
    allow_reset=True
))

# Delete old database if it exists
try:
    chroma_client.delete_collection("documents")
except Exception:
    pass  # Nothing to delete on the first run

# Create new database
collection = chroma_client.create_collection(
    name="documents",
    metadata={"description": f"Documents for {PERSONA_NAME} chatbot"}
)
print("✅ Database created!")

# Add PDF pieces to database
if len(all_chunks) > 0:
    print(f"\n📥 Adding {len(all_chunks)} pieces to database...")
    
    # Convert text to searchable numbers
    embeddings = embedding_model.encode(all_chunks, show_progress_bar=True)
    
    # Store in database
    collection.add(
        embeddings=embeddings.tolist(),
        documents=all_chunks,
        metadatas=metadata,
        ids=[f"chunk_{i}" for i in range(len(all_chunks))]
    )
    
    print("✅ Database ready!")
    print(f"📊 Total pieces in database: {collection.count()}")
else:
    print("⚠️  No pieces to add!")

---
## Step 8: Setup Question Answering

This prepares the chatbot to answer your questions.
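
The prompt sent to Gemini combines three things: the persona description, the retrieved PDF pieces, and your question. Here is a minimal sketch of that assembly, following the same layout as `generate_response_sync` in the next cell (top 2 pieces, at most 1500 characters of context). `build_prompt` is a hypothetical helper name and the sample pieces are made up.

```python
def build_prompt(persona_description, context_docs, question,
                 max_docs=2, max_context_chars=1500):
    """Combine persona, retrieved pieces, and question into one prompt string."""
    if context_docs:
        # Keep only the best-ranked pieces and cap the total length
        context = "\n\n".join(context_docs[:max_docs])[:max_context_chars]
    else:
        context = "No relevant documents found."
    return (
        f"{persona_description}\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n\n"
        "Answer in your persona's style:"
    )


prompt = build_prompt(
    persona_description="You are Marie Curie, pioneering scientist.",
    context_docs=[
        "Chunk A about radium.",
        "Chunk B about polonium.",
        "Chunk C (ranked third, so not used).",
    ],
    question="What did you discover?",
)
print(prompt)
```

Because only the first two pieces reach the model, raising `NUM_RETRIEVED_DOCS` alone does not give the AI more context; the two limits shown here matter as well.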

In [None]:
# Store which PDFs were used for the last answer
last_sources_used = []

def retrieve_relevant_context(query, n_results=NUM_RETRIEVED_DOCS):
    """Find relevant pieces from your PDFs based on the question."""
    global last_sources_used
    try:
        # Convert question to searchable numbers
        query_embedding = embedding_model.encode([query])
        
        # Search database for matching pieces
        results = collection.query(
            query_embeddings=query_embedding.tolist(),
            n_results=min(n_results, collection.count())
        )
        
        documents = results['documents'][0] if results['documents'] else []
        metadatas = results['metadatas'][0] if results['metadatas'] else []
        
        # Track which PDFs were used
        last_sources_used = []
        if metadatas:
            seen_sources = set()
            for meta in metadatas[:2]:  # Only track the 2 pieces we actually use
                source_name = meta.get('source', 'Unknown')
                if source_name not in seen_sources:
                    last_sources_used.append(source_name)
                    seen_sources.add(source_name)
        
        return documents
    except Exception as e:
        print(f"Error searching: {str(e)}")
        last_sources_used = []
        return []

def generate_response_sync(question):
    """Get answer from AI using relevant PDF pieces."""
    # Find relevant pieces from PDFs
    context_docs = retrieve_relevant_context(question)
    
    # Use only top 2 pieces, max 1500 characters (faster responses)
    if context_docs:
        context_docs = context_docs[:2]
        context = "\n\n".join(context_docs)
        context = context[:1500]
    else:
        context = "No relevant documents found."
    
    # Create prompt for AI
    prompt = f"""{PERSONA_DESCRIPTION}

Context: {context}

Question: {question}

Answer in your persona's style:"""
    
    # Get answer from AI (Gemini)
    response = model.generate_content(
        prompt,
        generation_config=genai.types.GenerationConfig(
            temperature=TEMPERATURE,
            max_output_tokens=MAX_OUTPUT_TOKENS,  # Response length limit from Step 4
            top_p=0.95,
            top_k=40,
        ),
    )
    
    return response.text

async def generate_response_async(question, chat_history=None, timeout_seconds=30):
    """Wrapper with 30-second timeout to prevent hanging."""
    try:
        # Run AI call with timeout
        response_text = await asyncio.wait_for(
            asyncio.to_thread(generate_response_sync, question),
            timeout=timeout_seconds
        )
        return response_text
    
    except asyncio.TimeoutError:
        return f"⏱️ **Timeout** - Took too long (>{timeout_seconds} seconds). Try a simpler question."
    
    except Exception as e:
        error_str = str(e).lower()
        
        if "429" in str(e) or "quota" in error_str or "rate limit" in error_str:
            return "⚠️ **Rate Limit** - Wait 1-2 minutes and try again."
        elif "timeout" in error_str or "connection" in error_str:
            return "⚠️ **Connection Error** - Check your internet."
        elif "blocked" in error_str or "safety" in error_str:
            return "⚠️ **Content Blocked** - Try different wording."
        elif "api" in error_str or "key" in error_str:
            return "⚠️ **API Error** - Check your API key."
        else:
            return f"❌ **Error** - {str(e)[:100]}"

print("✅ Answer system ready!")
print("⏱️  Response time: 5-15 seconds")
if SHOW_SOURCES:
    print("📚 Source citations enabled")

---
## Step 8B: Initialize Empathy Analyzer

This sets up the empathy tracking system that will analyze your messages.
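
The analyzer counts marker words with a simple `word in message` check, which also matches inside longer words (for example, `how` inside `show`). The sketch below compares that check with a whole-word variant. `count_marker_words` is a hypothetical helper, not part of this notebook; the word list is copied from the analyzer.

```python
import re


def count_marker_words(message, marker_words):
    """Count how many marker words appear in the message as whole words."""
    words_in_message = set(re.findall(r"[a-z']+", message.lower()))
    return sum(1 for word in marker_words if word in words_in_message)


open_question_words = ['how', 'what', 'why', 'tell', 'describe', 'explain']

message = "Can you show me somewhat more?"
# Substring check: 'how' matches inside 'show', 'what' inside 'somewhat'
substring_hits = sum(1 for word in open_question_words if word in message.lower())
# Whole-word check: none of the marker words is actually present
whole_word_hits = count_marker_words(message, open_question_words)
print(substring_hits, whole_word_hits)
# 2 0
```

This variant only suits single words. Multi-word markers such as `tell me more` would still need a phrase search, so this is a possible refinement rather than a drop-in replacement.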

In [None]:
# ============================================================================
# EMPATHY ANALYZER - Tracks 5 dimensions of empathic communication
# ============================================================================

class EmpathyAnalyzer:
    """Analyzes user messages for empathy across 5 dimensions."""
    
    def __init__(self):
        self.vader = SentimentIntensityAnalyzer()
        self.user_messages = []
        self.empathy_scores = []
        self.conversation_history = []
        
        # Empathy linguistic markers
        self.open_question_words = ['how', 'what', 'why', 'tell', 'describe', 'explain']
        self.emotion_words = [
            'feel', 'feeling', 'felt', 'emotion', 'happy', 'sad', 'angry', 
            'frustrated', 'worried', 'anxious', 'excited', 'disappointed',
            'upset', 'hurt', 'joy', 'fear', 'surprise', 'disgust', 'content',
            'grateful', 'proud', 'ashamed', 'guilty', 'nervous', 'scared'
        ]
        self.perspective_phrases = [
            'you feel', 'you might', 'from your', 'in your', 'your perspective',
            'you seem', 'you appear', 'you sound', 'for you', 'to you',
            'you think', 'you believe', 'you experience', 'your view'
        ]
        self.active_listening_phrases = [
            'tell me more', 'i understand', 'i hear', 'i see', 'help me understand',
            'that makes sense', 'i appreciate', 'thank you for sharing',
            'go on', 'continue', 'interesting', 'i get it', 'i follow'
        ]
    
    def analyze_message(self, message):
        """Analyze a single message for empathy markers."""
        message_lower = message.lower()
        
        # 1. Sentiment/Warmth (0-20 points)
        sentiment = self.vader.polarity_scores(message)
        warmth_score = max(0, min(20, (sentiment['compound'] + 1) * 10))
        
        # 2. Open Questions (0-20 points)
        open_question_count = sum(1 for word in self.open_question_words if word in message_lower)
        has_question = '?' in message
        open_score = min(20, open_question_count * 10) if has_question else 0
        
        # 3. Emotion Words (0-20 points)
        emotion_count = sum(1 for word in self.emotion_words if word in message_lower)
        emotion_score = min(20, emotion_count * 7)
        
        # 4. Perspective-Taking (0-20 points)
        perspective_count = sum(1 for phrase in self.perspective_phrases if phrase in message_lower)
        perspective_score = min(20, perspective_count * 10)
        
        # 5. Active Listening (0-20 points)
        listening_count = sum(1 for phrase in self.active_listening_phrases if phrase in message_lower)
        listening_score = min(20, listening_count * 7)
        
        # Total score
        total_score = warmth_score + open_score + emotion_score + perspective_score + listening_score
        
        return {
            'message': message,
            'warmth': warmth_score,
            'open_questions': open_score,
            'emotion_words': emotion_score,
            'perspective_taking': perspective_score,
            'active_listening': listening_score,
            'total': total_score,
            'sentiment_raw': sentiment['compound']
        }
    
    def add_user_message(self, message, bot_response):
        """Track a user message and bot response."""
        analysis = self.analyze_message(message)
        self.user_messages.append(message)
        self.empathy_scores.append(analysis)
        self.conversation_history.append({
            'user': message,
            'bot': bot_response,
            'empathy': analysis
        })
    
    def get_average_scores(self):
        """Calculate average scores across all messages."""
        if not self.empathy_scores:
            return None
        
        n = len(self.empathy_scores)
        return {
            'warmth': sum(s['warmth'] for s in self.empathy_scores) / n,
            'open_questions': sum(s['open_questions'] for s in self.empathy_scores) / n,
            'emotion_words': sum(s['emotion_words'] for s in self.empathy_scores) / n,
            'perspective_taking': sum(s['perspective_taking'] for s in self.empathy_scores) / n,
            'active_listening': sum(s['active_listening'] for s in self.empathy_scores) / n,
            'total': sum(s['total'] for s in self.empathy_scores) / n,
            'message_count': n
        }
    
    def generate_report(self):
        """Generate comprehensive empathy report."""
        if len(self.empathy_scores) < 10:
            return None
        
        avg = self.get_average_scores()
        total_score = avg['total']
        
        # Interpretation
        if total_score >= 80:
            interpretation = "Excellent - Consistently demonstrates empathic responses"
        elif total_score >= 60:
            interpretation = "Good - Regular empathic responses with room to grow"
        elif total_score >= 40:
            interpretation = "Moderate - Awareness of emotions but inconsistent"
        elif total_score >= 20:
            interpretation = "Developing - Beginning to recognize emotions"
        else:
            interpretation = "Needs Practice - Focus on foundational skills"
        
        # Recommendations
        recommendations = []
        if avg['warmth'] < 15:
            recommendations.append("Use warmer, more supportive language")
        if avg['open_questions'] < 15:
            recommendations.append("Ask more open-ended questions (what/how/why)")
        if avg['emotion_words'] < 15:
            recommendations.append("Acknowledge emotions more explicitly")
        if avg['perspective_taking'] < 15:
            recommendations.append("Practice perspective-taking phrases")
        if avg['active_listening'] < 15:
            recommendations.append("Show more active listening markers")
        
        # Format report
        report = f"""
╔═══════════════════════════════════════════════════════════╗
║           EMPATHY TRAINING ANALYSIS REPORT                ║
╚═══════════════════════════════════════════════════════════╝

📊 OVERALL EMPATHY SCORE: {total_score:.1f}/100
   {interpretation}

📈 DIMENSION BREAKDOWN:
   • Sentiment/Warmth:      {avg['warmth']:.1f}/20 {'✅' if avg['warmth'] >= 15 else '⚠️'}
   • Open Questions:        {avg['open_questions']:.1f}/20 {'✅' if avg['open_questions'] >= 15 else '⚠️'}
   • Emotion Recognition:   {avg['emotion_words']:.1f}/20 {'✅' if avg['emotion_words'] >= 15 else '⚠️'}
   • Perspective-Taking:    {avg['perspective_taking']:.1f}/20 {'✅' if avg['perspective_taking'] >= 15 else '⚠️'}
   • Active Listening:      {avg['active_listening']:.1f}/20 {'✅' if avg['active_listening'] >= 15 else '⚠️'}

📉 CONVERSATION METRICS:
   • Total Messages Analyzed: {avg['message_count']}
   • Average Sentiment: {sum(s['sentiment_raw'] for s in self.empathy_scores) / len(self.empathy_scores):.2f} (-1 to +1)
   • Questions Asked: {sum(1 for s in self.empathy_scores if s['open_questions'] > 0)}
   • Emotion Words Used: {sum(1 for s in self.empathy_scores if s['emotion_words'] > 0)} messages

💡 RECOMMENDATIONS FOR IMPROVEMENT:
"""
        if recommendations:
            for rec in recommendations:
                report += f"   • {rec}\n"
        else:
            report += "   • Great work! Keep practicing to maintain your skills\n"
        
        report += "\n✅ Report complete - Keep practicing empathic communication!"
        
        return report
    
    def export_to_csv(self):
        """Export conversation data to CSV format."""
        import csv
        from io import StringIO
        
        output = StringIO()
        writer = csv.writer(output)
        
        # Header
        writer.writerow([
            'Message #', 'User Message', 'Bot Response', 
            'Warmth', 'Open Questions', 'Emotion Words', 
            'Perspective-Taking', 'Active Listening', 'Total Score'
        ])
        
        # Data
        for i, conv in enumerate(self.conversation_history, 1):
            emp = conv['empathy']
            writer.writerow([
                i,
                conv['user'],
                conv['bot'],
                f"{emp['warmth']:.1f}",
                f"{emp['open_questions']:.1f}",
                f"{emp['emotion_words']:.1f}",
                f"{emp['perspective_taking']:.1f}",
                f"{emp['active_listening']:.1f}",
                f"{emp['total']:.1f}"
            ])
        
        return output.getvalue()

# Initialize global empathy analyzer
empathy_analyzer = EmpathyAnalyzer()

print("✅ Empathy Analyzer ready!")
print("📊 Tracking 5 dimensions:")
print("   1. Sentiment/Warmth (positive emotional tone)")
print("   2. Open Questions (exploration)")
print("   3. Emotion Recognition (naming feelings)")
print("   4. Perspective-Taking (seeing their view)")
print("   5. Active Listening (engagement)")
print("\n📝 Report will generate after 10 messages")

In [None]:
async def chat_interface(message, history):
    """Handle chat messages with empathy tracking and source citations."""
    # Get answer from AI
    response = await generate_response_async(message, history)
    
    # Add source citations if enabled
    if SHOW_SOURCES and last_sources_used:
        response += "\n\n---\n**📚 Sources:**\n"
        for i, source in enumerate(last_sources_used, 1):
            response += f"{i}. {source}\n"
    
    # Track empathy (user message + bot response)
    empathy_analyzer.add_user_message(message, response)
    
    # Check if we've reached 10 messages - generate report
    message_count = len(empathy_analyzer.user_messages)
    if message_count == 10:
        report = empathy_analyzer.generate_report()
        if report:
            response += "\n\n" + "="*60 + "\n"
            response += report
            response += "\n" + "="*60
            response += "\n\n💾 **Want to save your data?** Run the export cell below to download as CSV."
    elif message_count < 10:
        # Show progress
        remaining = 10 - message_count
        response += f"\n\n_📊 Empathy tracking: {message_count}/10 messages ({remaining} more for report)_"
    
    return response

# ============================================================================
# STARTER QUESTIONS - OPTIONAL ✏️
# ============================================================================
# These appear as clickable examples when chat starts
# Change these to match your PDFs and persona

STARTER_QUESTIONS = [
    "What are your main beliefs or values?",
    "How did that experience make you feel?",
    "Tell me more about your perspective on this topic.",
    "You seem passionate about this - what drives that feeling?",
    "From your point of view, what are your greatest achievements?",
]

# Create chat interface
demo = gr.ChatInterface(
    fn=chat_interface,
    title=f"🤖 Chat with {PERSONA_NAME} - Empathy Training",
    description=f"""Practice empathic conversation with {PERSONA_NAME}.
    
    📊 **Empathy Assessment Enabled**
    - Your messages are analyzed for empathy markers
    - Report generated after 10 messages
    - Track: warmth, questions, emotions, perspective, listening
    
    {'📖 Source citations enabled - see which PDFs were used' if SHOW_SOURCES else ''}
    
    ⏱️ Response time: 5-15 seconds
    """,
    examples=STARTER_QUESTIONS,
)

# Launch chat
print("=" * 80)
print("🚀 LAUNCHING EMPATHY TRAINING CHAT")
print("=" * 80)
print("\n📊 EMPATHY ASSESSMENT ACTIVE")
print("   • Tracking 5 empathy dimensions")
print("   • Report after 10 messages")
print("   • CSV export available\n")
print("\n⚠️  IMPORTANT: Use the PUBLIC LINK below (not Colab interface)\n")
if SHOW_SOURCES:
    print("📚 Sources ON - answers show which PDFs were used\n")
print("👇 COPY THIS LINK AND OPEN IN NEW TAB:\n")

demo.launch(
    share=True,      # Create public link
    inline=False,    # Don't show in Colab (unstable)
    debug=True       # Show errors (this keeps the cell running until you stop it)
)

# With debug=True, the messages below print only after the cell is stopped
print("\n" + "=" * 80)
print("✅ Chat is live with empathy tracking!")
print("=" * 80)
print("\n📌 STEPS:")
print("   1. Find 'Running on public URL:' above")
print("   2. Copy the https://xxxxx.gradio.live link")
print("   3. Open in new browser tab")
print("   4. Start chatting empathically!")
print("   5. After 10 messages, view your empathy report")
if SHOW_SOURCES:
    print("   6. Check bottom of answers for sources")
print("\n💡 The public link is temporary and only works while this cell is running\n")

---
## 📥 Step 9: Export Conversation Data (Optional)

**Run this after completing your conversation** to download your empathy data as CSV.

This will create a file with:
- All your messages and bot responses
- Empathy scores for each dimension
- Total empathy score per message

You can open this in Excel or Google Sheets for further analysis.
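
As a starting point for that analysis, the sketch below reads CSV text with Python's built-in `csv` module and averages the `Total Score` column. The column names match the header written by `export_to_csv`; the two data rows are invented sample values, not real results, and `average_total_score` is a hypothetical helper name.

```python
import csv
from io import StringIO

# Invented sample data using the same header as the exported file
sample_csv = (
    "Message #,User Message,Bot Response,Warmth,Open Questions,Emotion Words,"
    "Perspective-Taking,Active Listening,Total Score\n"
    "1,How did that feel?,It was hard.,10.0,10.0,7.0,0.0,0.0,27.0\n"
    "2,Tell me more.,Gladly.,12.5,0.0,0.0,0.0,7.0,19.5\n"
)


def average_total_score(csv_text):
    """Return the mean of the 'Total Score' column, or None if there are no rows."""
    rows = list(csv.DictReader(StringIO(csv_text)))
    if not rows:
        return None
    return sum(float(row["Total Score"]) for row in rows) / len(rows)


print(average_total_score(sample_csv))
# 23.25
```

For a downloaded file, the same function could be given the file's contents, for example the result of `open(filename, encoding="utf-8").read()`.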

In [None]:
# Export conversation data to CSV
if len(empathy_analyzer.conversation_history) > 0:
    print("📥 Exporting conversation data...\n")
    
    csv_data = empathy_analyzer.export_to_csv()
    
    # Save to file
    from google.colab import files
    import datetime
    
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"empathy_conversation_{timestamp}.csv"
    
    with open(filename, 'w', encoding='utf-8', newline='') as f:
        f.write(csv_data)
    
    print(f"✅ Data exported to: {filename}")
    print(f"📊 Total messages: {len(empathy_analyzer.user_messages)}")
    
    avg_scores = empathy_analyzer.get_average_scores()
    if avg_scores:
        print(f"📈 Average empathy score: {avg_scores['total']:.1f}/100")
    
    print("\n📥 Downloading file...")
    files.download(filename)
    print("✅ Download complete!")
    print("\n💡 You can now open this CSV file in Excel or Google Sheets")
    
else:
    print("⚠️  No conversation data to export yet!")
    print("💬 Have a conversation first, then run this cell")

---
## 🔄 Step 10: Start New Conversation (Optional)

**Run this to practice empathy again** with a fresh conversation.

This will:
- Reset the empathy tracker (0/10 messages)
- Clear previous conversation history
- Launch a new chat interface

**💡 Tip:** Export your current conversation (Step 9) BEFORE running this!

In [None]:
# ============================================================================
# RESET & START NEW CONVERSATION
# ============================================================================

print("🔄 Resetting empathy tracker...\n")

# Reinitialize empathy analyzer (clears all previous data)
empathy_analyzer = EmpathyAnalyzer()

print("✅ Empathy tracker reset!")
print("   • Message counter: 0/10")
print("   • Previous conversation cleared")
print("   • Ready for fresh practice\n")

# Relaunch chat interface
print("=" * 80)
print("🚀 LAUNCHING NEW EMPATHY TRAINING CHAT")
print("=" * 80)
print("\n📊 EMPATHY ASSESSMENT ACTIVE")
print("   • Tracking 5 empathy dimensions")
print("   • Report after 10 messages")
print("   • CSV export available\n")
print("\n⚠️  IMPORTANT: Use the PUBLIC LINK below (not Colab interface)\n")
if SHOW_SOURCES:
    print("📚 Sources ON - answers show which PDFs were used\n")
print("👇 COPY THIS LINK AND OPEN IN NEW TAB:\n")

demo.launch(
    share=True,      # Create public link
    inline=False,    # Don't show in Colab (unstable)
    debug=True       # Show errors (this keeps the cell running until you stop it)
)

# With debug=True, the messages below print only after the cell is stopped
print("\n" + "=" * 80)
print("✅ New conversation started!")
print("=" * 80)
print("\n📌 STEPS:")
print("   1. Find 'Running on public URL:' above")
print("   2. Copy the https://xxxxx.gradio.live link")
print("   3. Open in new browser tab")
print("   4. Start your new empathy practice!")
print("\n💡 Remember: Export your previous conversation first if you haven't already\n")

---
## 🔧 Troubleshooting

### API Key Issues:
- **Error: "Invalid API key"**
  - Get a new key from: https://aistudio.google.com/app/apikey
  - Make sure you copied the entire key
  - Replace `YOUR_API_KEY_HERE` in Step 4

### PDF Issues:
- **"File not found" errors:**
  - Check that Google Drive is mounted (Step 3)
  - Verify PDF file paths are correct
  - Make sure paths start with `/content/drive/MyDrive/`
  
- **"No text extracted":**
  - PDF might be scanned images (not searchable text)
  - PDF might be password-protected
  - Try opening the PDF to verify it has selectable text

### Response Issues:
- **Responses don't match persona:**
  - Make `PERSONA_DESCRIPTION` more detailed and specific
  - Add more example phrases/words they use
  
- **Responses aren't relevant:**
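  - Give the AI more context: `NUM_RETRIEVED_DOCS` defaults to 7, but Step 8 sends only the top 2 pieces (up to 1500 characters) to the AI, so raise those limits too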
  - Make sure PDFs contain information about the topic
  - Ask more specific questions

### Performance Issues:
- **Colab disconnects or times out:**
  - This is normal for free Colab after ~12 hours
  - Save your work and restart
  - Keep the browser tab active

### Need More Help?
- Check Google Gemini API status: https://status.cloud.google.com/
- Verify free tier limits haven't been exceeded

---
## 🎓 Understanding the Technology

### What is RAG (Retrieval-Augmented Generation)?
RAG combines two technologies:
1. **Retrieval**: Searching documents for relevant information
2. **Generation**: Using AI to create natural responses

### How This Notebook Works:
1. **PDFs → Text**: Extract text from your PDF files
2. **Text → Chunks**: Split into smaller, searchable pieces
3. **Chunks → Embeddings**: Convert to numerical representations
4. **Store in Database**: Save in ChromaDB for fast searching
5. **User Asks Question**: You type a question
6. **Search Database**: Find most relevant chunks
7. **AI Generates Answer**: Gemini creates response using context
8. **Apply Persona**: The persona description is part of the prompt, so the answer comes back in the persona's style
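
The data flow of steps 5 to 7 can be sketched in a few lines. This toy version ranks chunks by shared words instead of embeddings and uses a placeholder instead of Gemini, so it runs without any installs. `retrieve` and `generate` are hypothetical names, and the sample sentences are made up.

```python
def retrieve(question, chunks, k=2):
    """Rank chunks by how many words they share with the question (toy search)."""
    question_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate(prompt):
    """Placeholder for the language-model call."""
    return f"(model reply to a {len(prompt)}-character prompt)"


chunks = [
    "radium glows faintly in the dark",
    "the laboratory was cold in winter",
    "polonium was named after poland",
]
question = "why does radium glow in the dark"

context = retrieve(question, chunks, k=1)                    # Step 6: search
prompt = f"Context: {context[0]}\n\nQuestion: {question}"    # Combine context and question
print(generate(prompt))                                      # Step 7: generate
```

Word overlap misses matches such as `glow` versus `glows`, which is one reason the notebook uses embeddings for the real search.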

### Why This Approach?
- ✅ **Accurate**: Responses based on your actual documents
- ✅ **Up-to-date**: Use any current information in PDFs
- ✅ **Customizable**: Change persona, style, and content easily
- ✅ **Free**: No paid services required
- ✅ **Educational**: Students learn modern AI techniques

### Technologies Used:
- **Google Gemini**: Free AI language model
- **ChromaDB**: Vector database for semantic search
- **Sentence Transformers**: Convert text to embeddings
- **Gradio**: Create chat interface
- **PyPDF**: Read PDF files

---
## 🚀 Next Steps & Future Features

### Ideas for Enhancement:
1. **Add more document types**: Support Word docs, web pages, etc.
2. **Conversation memory**: Remember previous questions in the chat
3. **Source citations**: Already included (see `SHOW_SOURCES` in Step 4); could be extended beyond the PDF file name
4. **Multiple personas**: Switch between different personalities
5. **Voice input/output**: Add speech recognition and text-to-speech
6. **Fact-checking mode**: Verify claims against documents
7. **Export conversations**: Already included as CSV (Step 9); could be extended to other formats
8. **Advanced search**: Filter by document, date, topic, etc.

### Learning Resources:
- Google Gemini API Docs: https://ai.google.dev/docs
- ChromaDB Documentation: https://docs.trychroma.com/
- Gradio Documentation: https://www.gradio.app/docs/
- RAG Overview: https://python.langchain.com/docs/use_cases/question_answering/

---

**Happy Chatting! 🎉**

*Created for educational purposes. Completely free and customizable.*