# CrewAI Agentic Email Classification with Kaggle Dataset

This notebook demonstrates how to use CrewAI agents with the Kaggle spam email classification dataset.

## Overview
- Download and prepare the Kaggle spam email dataset
- Train a machine learning model for baseline classification
- Set up CrewAI agents for intelligent email processing
- Compare ML predictions with AI agent analysis
- Demonstrate the complete email automation pipeline

## 1. Setup and Imports

In [None]:
# Import required libraries
import os
import sys
import pandas as pd
import numpy as np
from pathlib import Path

# Add the src directory to the path (assumes the notebook runs from the project root)
sys.path.append('src')

# Import our custom modules
from mail_agents.data import DataDownloader
from mail_agents.model import SpamClassifier
from mail_agents.agents import EmailAgents
from mail_agents.config import settings

print("✅ All imports successful!")
print(f"Working directory: {os.getcwd()}")

## 2. Download and Explore the Kaggle Dataset

First, let's download the spam email classification dataset from Kaggle using our built-in downloader.

In [None]:
# Initialize the data downloader
downloader = DataDownloader()

# Download the dataset (this will use kagglehub internally)
print("📥 Downloading Kaggle spam email dataset...")
try:
    data_path = downloader.download_dataset()
    print(f"✅ Dataset downloaded to: {data_path}")
except Exception as e:
    print(f"❌ Error downloading dataset: {e}")
    print("Note: Make sure you have kagglehub configured and authenticated.")
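
The error note above asks for kagglehub to be configured. As a rough pre-flight check, the sketch below looks for credentials in the two conventional places: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, and the `~/.kaggle/kaggle.json` token file. The helper name `kaggle_credentials_available` is hypothetical and is not part of `mail_agents` or kagglehub. It only checks that credentials are present, not that they are valid.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_available(
    home: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper: checks for the KAGGLE_USERNAME / KAGGLE_KEY
    environment variables or a ~/.kaggle/kaggle.json token file.
    """
    home = Path.home() if home is None else home
    env = os.environ if env is None else env
    has_env_vars = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))
    has_token_file = (home / ".kaggle" / "kaggle.json").is_file()
    return has_env_vars or has_token_file


if not kaggle_credentials_available():
    print("No Kaggle credentials found; authenticated downloads may fail.")
```

Passing `home` and `env` explicitly keeps the check easy to exercise without touching the real home directory.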

In [None]:
# Load and explore the dataset
print("📊 Loading and exploring the dataset...")
try:
    # Prepare the data using our utility function
    X, y = downloader.prepare_data()
    
    print(f"Dataset size: {len(X)} emails")
    print("Label distribution:")
    print(y.value_counts())
    
    # Show some sample emails
    print("\n📧 Sample emails:")
    for i in range(min(3, len(X))):  # guard against datasets with fewer than 3 emails
        label = "SPAM" if y.iloc[i] == 1 else "HAM"
        text = X.iloc[i]
        email_preview = text[:100] + "..." if len(text) > 100 else text
        print(f"\n[{label}] {email_preview}")
        
except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    # For demo purposes, let's create some sample data
    print("🔧 Creating sample data for demonstration...")
    sample_emails = [
        "Congratulations! You've won a $1000 gift card! Click here to claim now!",
        "Hi John, are you free for lunch tomorrow? Let me know!",
        "URGENT: Your account will be suspended unless you verify immediately!",
        "Meeting reminder: Team standup at 10am tomorrow in conference room A",
        "FREE MONEY! No strings attached! Limited time offer!",
        "Thanks for the document. I'll review it and get back to you by Friday."
    ]
    sample_labels = [1, 0, 1, 0, 1, 0]  # 1=spam, 0=ham
    
    X = pd.Series(sample_emails)
    y = pd.Series(sample_labels)
    print(f"✅ Sample dataset created with {len(X)} emails")
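
The exploration cell above counts labels and truncates long emails for display. The same two steps can be sketched with the standard library alone, which may help when pandas is not available. The function names `preview` and `label_distribution` are illustrative, and the 1=spam, 0=ham encoding follows the sample labels above.

```python
from collections import Counter
from typing import Dict, Iterable


def preview(text: str, limit: int = 100) -> str:
    """Return the text unchanged if short, else its first `limit` characters plus '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


def label_distribution(labels: Iterable[int]) -> Dict[str, int]:
    """Count labels and report them under readable names (1=spam, 0=ham)."""
    counts = Counter(labels)
    return {"spam": counts.get(1, 0), "ham": counts.get(0, 0)}


emails = [
    "FREE MONEY! No strings attached! Limited time offer!",
    "Thanks for the document. I'll review it and get back to you by Friday.",
]
labels = [1, 0]

print(label_distribution(labels))
for text, label in zip(emails, labels):
    print(f"[{'SPAM' if label == 1 else 'HAM'}] {preview(text, limit=40)}")
```

A strongly skewed distribution is worth noting before training, since accuracy alone can look good on imbalanced data.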

## 3. Train the Machine Learning Model

Let's train our spam classification model using the dataset.

In [None]:
# Initialize and train the spam classifier
print("🤖 Training the machine learning model...")

classifier = SpamClassifier(
    model_path=settings.model_path,
    vectorizer_path=settings.vectorizer_path
)

try:
    # Train the model
    metrics = classifier.train(X, y, test_size=0.2, random_state=42)
    
    print("✅ Model training completed!")
    print("📊 Model Performance:")
    for metric, value in metrics.items():
        if isinstance(value, float):
            print(f"  {metric}: {value:.4f}")
        else:
            print(f"  {metric}: {value}")
    
    # Save the model
    classifier.save()
    print("💾 Model saved successfully!")
    
except Exception as e:
    print(f"❌ Error training model: {e}")
    print("Note: This might happen with very small datasets. The model needs more diverse data.")
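
The contents of the `metrics` dictionary depend on `SpamClassifier.train`, which is not shown in this notebook. To illustrate what such metrics usually mean for spam detection, the sketch below computes accuracy, precision, recall, and F1 for the spam class in plain Python. The function name `binary_metrics` and the dictionary keys are assumptions and may not match what `train` returns.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the positive class (1 = spam)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham passed through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


example_metrics = binary_metrics([1, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1])
for name, value in example_metrics.items():
    print(f"  {name}: {value:.4f}")
```

With only a handful of emails, as in the fallback sample data, the held-out set contains one or two messages, so any such metric is too noisy to interpret.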

## 4. Set Up CrewAI Agents

Now let's initialize our CrewAI agents for intelligent email processing.

In [None]:
# Initialize the email agents
print("🤖 Setting up CrewAI agents...")

try:
    # Make sure we have the required environment variables
    if not os.getenv('OPENAI_API_KEY'):
        print("⚠️  Warning: OPENAI_API_KEY not found in environment variables.")
        print("   Please set it to use OpenAI-powered agents.")
        print("   For demo purposes, we'll continue with limited functionality.")
    
    # Initialize email agents
    email_agents = EmailAgents()
    print("✅ Email agents initialized successfully!")
    
    # Display available agent capabilities
    print("\n🎯 Available Agent Capabilities:")
    print("  1. Spam Classification - Intelligent email classification with context")
    print("  2. Information Extraction - Extract key details from emails")
    print("  3. Response Drafting - Generate appropriate email responses")
    print("  4. Complete Pipeline - End-to-end email processing")
    
except Exception as e:
    print(f"❌ Error initializing agents: {e}")
    email_agents = None

## 5. Test Individual Predictions

Let's test both the ML model and CrewAI agents on sample emails.

In [None]:
# Test emails for demonstration
test_emails = [
    "Congratulations! You have won $10,000! Click here to claim your prize now!",
    "Hi Sarah, can you please send me the quarterly report by end of day? Thanks!",
    "URGENT: Your PayPal account has been compromised. Verify immediately or lose access!",
    "Meeting reminder: Project review scheduled for tomorrow at 2 PM in Room 301"
]

print("🧪 Testing email classifications...\n")

for i, email in enumerate(test_emails, 1):
    print(f"📧 Test Email #{i}:")
    print(f"   Text: {email}")
    print("   " + "="*50)
    
    # ML Model Prediction
    try:
        if classifier.model is not None:
            ml_predictions = classifier.predict([email])
            ml_result = ml_predictions[0]
            print(f"   🤖 ML Model: {ml_result['prediction']} (confidence: {ml_result['confidence']:.3f})")
        else:
            print("   🤖 ML Model: Not available (training failed)")
    except Exception as e:
        print(f"   🤖 ML Model: Error - {e}")
    
    # CrewAI Agent Analysis (only if properly configured)
    try:
        if email_agents and os.getenv('OPENAI_API_KEY'):
            agent_result = email_agents.classify_email(email)
            # Convert to str in case the agent returns a result object rather than plain text
            print(f"   🎯 CrewAI Agent: {str(agent_result)[:100]}...")
        else:
            print("   🎯 CrewAI Agent: Requires OpenAI API key configuration")
    except Exception as e:
        print(f"   🎯 CrewAI Agent: Error - {e}")
    
    print("\n")
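
The loop above prints the two results side by side but leaves the comparison to the reader. One way to make it explicit is to reduce the agent's free-form text to a label and check it against the ML label. The sketch below is a keyword heuristic under two assumptions: the agent's answer contains a word such as "spam", "ham", or "legitimate", and the ML prediction is the string "spam" or "ham". Both function names are hypothetical.

```python
import re
from typing import Optional


def label_from_agent_text(text: str) -> Optional[str]:
    """Map free-form agent output to 'spam' or 'ham', or None if unclear.

    Negated and ham-like forms are checked first so that
    "This is not spam" is not misread as spam.
    """
    lowered = text.lower()
    if re.search(r"\b(not spam|ham|legitimate)\b", lowered):
        return "ham"
    if re.search(r"\bspam\b", lowered):
        return "spam"
    return None


def compare(ml_label: str, agent_text: str) -> str:
    """Describe whether the ML label and the agent's verdict agree."""
    agent_label = label_from_agent_text(agent_text)
    if agent_label is None:
        return "agent verdict unclear"
    return "agree" if agent_label == ml_label.lower() else "disagree"


print(compare("spam", "Classification: SPAM. The message promises a prize."))
print(compare("spam", "This is not spam; it is a routine meeting reminder."))
```

Disagreements are the interesting cases to review by hand. A more robust setup would ask the agent for a structured answer instead of parsing prose.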

## 6. Advanced Agent Features

Let's explore the advanced capabilities of our CrewAI agents.

In [None]:
# Demonstrate information extraction
business_email = """
Subject: Q4 Budget Review Meeting

Hi team,

I hope this email finds you well. I wanted to schedule our quarterly budget review meeting for next week.

Proposed details:
- Date: March 15th, 2024
- Time: 2:00 PM - 4:00 PM EST
- Location: Conference Room B (or Zoom if remote)
- Attendees: Finance team, Department heads

Please review the attached budget reports before the meeting and come prepared with your department's projections.

Let me know if this time works for everyone.

Best regards,
John Smith
Finance Director
"""

print("📊 Testing Information Extraction...\n")
print(f"📧 Business Email:\n{business_email}")
print("\n" + "="*60)

if email_agents and os.getenv('OPENAI_API_KEY'):
    try:
        extracted_info = email_agents.extract_information(business_email)
        print(f"\n🔍 Extracted Information:\n{extracted_info}")
    except Exception as e:
        print(f"❌ Error extracting information: {e}")
else:
    print("\n⚠️  Information extraction requires OpenAI API key configuration")
    print("\n🔍 Expected Information Types:")
    print("   - Meeting date and time")
    print("   - Location details")
    print("   - Attendee list")
    print("   - Action items")
    print("   - Sender information")
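
Because the agent's extraction is generated text, it can help to have a deterministic baseline to compare against. The sketch below pulls dates, times, and email addresses out of plain text with regular expressions. It is not how `extract_information` works, which is not shown here; the patterns cover only the formats used in the sample email, and the address in the example is made up.

```python
import re
from typing import Dict, List

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates such as "March 15th, 2024" or "March 15, 2024".
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)?, \d{{4}}\b")
# Matches 12-hour clock times such as "2:00 PM".
TIME_RE = re.compile(r"\b\d{1,2}:\d{2} ?(?:AM|PM)\b", re.IGNORECASE)
# A deliberately simple email pattern, not a full address validator.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_basics(text: str) -> Dict[str, List[str]]:
    """Pull dates, times and email addresses out of plain email text."""
    return {
        "dates": DATE_RE.findall(text),
        "times": TIME_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


sample = (
    "- Date: March 15th, 2024\n"
    "- Time: 2:00 PM - 4:00 PM EST\n"
    "Questions? Write to john.smith@example.com"  # hypothetical address
)
print(extract_basics(sample))
```

If the agent reports a date or time that this baseline does not find in the text, that is a cue to check the agent's output for invented details.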

In [None]:
# Demonstrate response drafting
customer_inquiry = """
Subject: Product Return Request

Hello,

I purchased a laptop from your store last week (Order #12345), but unfortunately it arrived with a cracked screen. 
I would like to return it and get a refund or replacement.

Please let me know the next steps.

Thank you,
Alice Johnson
alice.johnson@email.com
"""

print("✍️  Testing Response Drafting...\n")
print(f"📧 Customer Inquiry:\n{customer_inquiry}")
print("\n" + "="*60)

if email_agents and os.getenv('OPENAI_API_KEY'):
    try:
        drafted_response = email_agents.draft_response(
            customer_inquiry, 
            context="Customer service representative responding to product return request"
        )
        print(f"\n✍️  Drafted Response:\n{drafted_response}")
    except Exception as e:
        print(f"❌ Error drafting response: {e}")
else:
    print("\n⚠️  Response drafting requires OpenAI API key configuration")
    print("\n✍️  Expected Response Elements:")
    print("   - Professional greeting")
    print("   - Acknowledgment of issue")
    print("   - Clear next steps")
    print("   - Contact information")
    print("   - Professional closing")
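
A drafted reply should be reviewed before it is sent. As a minimal sketch of an automated first pass, the checks below confirm that a draft mentions the customer's order number and addresses the return. The function names and the specific checks are hypothetical and tailored to the return-request example above.

```python
import re
from typing import List, Optional


def find_order_number(text: str) -> Optional[str]:
    """Return the first order number written like 'Order #12345', if any."""
    match = re.search(r"Order #(\d+)", text, re.IGNORECASE)
    return match.group(1) if match else None


def review_draft(inquiry: str, draft: str) -> List[str]:
    """List basic problems with a drafted reply; an empty list means none found."""
    problems = []
    order = find_order_number(inquiry)
    if order and order not in draft:
        problems.append(f"draft does not mention order {order}")
    if not re.search(r"\b(refund|replace|replacement|return)\b", draft, re.IGNORECASE):
        problems.append("draft does not address the return request")
    return problems


inquiry = "I purchased a laptop (Order #12345) and it arrived with a cracked screen."
print(review_draft(inquiry, "We will send a replacement for order 12345."))
print(review_draft(inquiry, "Thanks for writing."))
```

Checks like these catch only obvious omissions; they do not replace a human read of the draft.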

## 7. Complete Pipeline Demonstration

Finally, let's run the complete email processing pipeline that combines all agents.

In [None]:
# Test the complete pipeline
complex_email = """
Subject: Partnership Proposal - Tech Solutions Inc.

Dear Business Development Team,

I hope this message finds you well. My name is Michael Chen, and I'm the VP of Partnerships at Tech Solutions Inc.

We've been following your company's impressive growth in the AI space and believe there's a strong opportunity for collaboration.

Our proposal:
- Joint development of AI-powered customer service tools
- Revenue sharing: 60/40 split
- Initial investment: $500,000
- Timeline: 6-month pilot program

I'd love to schedule a call next week to discuss this further. Are you available Tuesday or Wednesday afternoon?

Please find our company portfolio attached for your review.

Looking forward to hearing from you.

Best regards,
Michael Chen
VP of Partnerships
Tech Solutions Inc.
michael.chen@techsolutions.com
(555) 123-4567
"""

print("🔄 Testing Complete Email Processing Pipeline...\n")
print(f"📧 Complex Business Email:\n{complex_email}")
print("\n" + "="*70)

if email_agents and os.getenv('OPENAI_API_KEY'):
    try:
        pipeline_result = email_agents.process_pipeline(complex_email)
        print(f"\n🔄 Complete Pipeline Analysis:\n{pipeline_result}")
    except Exception as e:
        print(f"❌ Error in pipeline processing: {e}")
else:
    print("\n⚠️  Complete pipeline requires OpenAI API key configuration")
    print("\n🔄 Expected Pipeline Analysis:")
    print("   1. Email Classification (spam/ham with reasoning)")
    print("   2. Information Extraction (key details, contacts, proposals)")
    print("   3. Sentiment Analysis and Priority Assessment")
    print("   4. Recommended Actions and Response Strategy")
    print("   5. Draft Response or Escalation Recommendations")
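
The internals of `process_pipeline` are not shown in this notebook. As a sketch of how classification, extraction, and drafting could be chained, the example below takes each step as a plain function and skips the later steps for spam. The short-circuit on spam, the `PipelineResult` type, and the stand-in functions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    label: str
    extracted: Optional[str] = None
    draft: Optional[str] = None


def run_pipeline(
    email: str,
    classify: Callable[[str], str],
    extract: Callable[[str], str],
    draft: Callable[[str], str],
) -> PipelineResult:
    """Classify first; only extract and draft for non-spam email."""
    label = classify(email)
    if label == "spam":
        # Skip the slower extraction and drafting steps for spam.
        return PipelineResult(label=label)
    return PipelineResult(label=label, extracted=extract(email), draft=draft(email))


# Stand-in steps so the sketch runs without an API key.
def fake_classify(email: str) -> str:
    return "spam" if "free money" in email.lower() else "ham"


def fake_extract(email: str) -> str:
    return f"{len(email.split())} words"


def fake_draft(email: str) -> str:
    return "Thank you for your message. We will reply shortly."


print(run_pipeline("FREE MONEY! No strings attached!", fake_classify, fake_extract, fake_draft))
print(run_pipeline("Can we meet on Tuesday?", fake_classify, fake_extract, fake_draft))
```

Passing the steps in as functions makes it possible to swap the stand-ins for the ML classifier or the agent methods without changing the control flow.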

## 8. Using the CLI Interface

You can also use the command-line interface for batch processing and automation.

In [None]:
# Demonstrate CLI usage (these commands can be run in the terminal)
print("🖥️  Command Line Interface Usage:")
print("\n" + "="*50)

cli_commands = [
    "# Download the Kaggle dataset",
    "uv run python -m src.mail_agents.cli download",
    "",
    "# Train the ML model",
    "uv run python -m src.mail_agents.cli train",
    "",
    "# Evaluate the model",
    "uv run python -m src.mail_agents.cli eval",
    "",
    "# Classify a single email",
    'uv run python -m src.mail_agents.cli predict "Your email text here"',
    "",
    "# Start the API server",
    "uv run python -m src.mail_agents.cli run-api",
    "",
    "# Then you can use the API endpoints:",
    "# POST http://localhost:8000/classify",
    "# POST http://localhost:8000/extract",
    "# POST http://localhost:8000/draft",
    "# POST http://localhost:8000/pipeline"
]

for cmd in cli_commands:
    if cmd.startswith('#'):
        print(f"\n{cmd}")
    else:
        print(f"  {cmd}")
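
Once the API server is running, the endpoints listed above can be called from Python. The sketch below builds a POST request for `/classify` with the standard library. The request schema is not documented in this notebook, so the JSON body with a single `text` field is an assumption; the Swagger UI at `/docs` should show the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_classify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /classify endpoint."""
    # Assumption: the endpoint accepts a JSON object with a "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example_request = build_classify_request("Your email text here")
print(example_request.get_method(), example_request.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(example_request, timeout=10) as response:
#     print(json.loads(response.read()))
```

The send step is left commented out so the cell does not fail when no server is listening on port 8000.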

## 9. Setup Instructions

To get the most out of this system, follow these setup steps:

In [None]:
# Setup checklist
print("📋 Setup Checklist for Full Functionality:")
print("\n" + "="*50)

setup_steps = [
    "1. 🔑 Set up OpenAI API Key:",
    "   export OPENAI_API_KEY='your-api-key-here'",
    "   (Add to your .env file for persistence)",
    "",
    "2. 📦 Configure Kaggle API (optional, for automatic downloads):",
    "   - Download kaggle.json from your Kaggle account",
    "   - Place it in ~/.kaggle/kaggle.json",
    "   - chmod 600 ~/.kaggle/kaggle.json",
    "",
    "3. 🎯 Customize Agent Configurations:",
    "   - Edit config/agents.yaml for agent roles and behaviors",
    "   - Edit config/tasks.yaml for task descriptions",
    "",
    "4. 🚀 Run the Full Pipeline:",
    "   uv run python -m src.mail_agents.cli download",
    "   uv run python -m src.mail_agents.cli train",
    "   uv run python -m src.mail_agents.cli run-api",
    "",
    "5. 🌐 Access the API:",
    "   http://localhost:8000/docs (Swagger UI)",
    "   http://localhost:8000/health (Health check)"
]

for step in setup_steps:
    print(step)

# Check current environment status
print("\n" + "="*50)
print("🔍 Current Environment Status:")
print(f"   OpenAI API Key: {'✅ Set' if os.getenv('OPENAI_API_KEY') else '❌ Not set'}")
print(f"   Kaggle Config: {'✅ Available' if Path.home().joinpath('.kaggle', 'kaggle.json').exists() else '❌ Not found'}")
print(f"   Model Trained: {'✅ Yes' if settings.model_path.exists() else '❌ No'}")
print(f"   Config Files: {'✅ Found' if settings.config_dir.exists() else '❌ Missing'}")
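
The checklist suggests keeping the API key in a `.env` file. If `settings` does not already load that file, a library such as python-dotenv is the usual choice. For illustration only, the sketch below shows the idea with the standard library. Both function names are hypothetical, and the parser handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_file(content: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and matching-style quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: Path = Path(".env")) -> None:
    """Copy values from a .env file into os.environ without overriding existing ones."""
    if not path.is_file():
        return
    for key, value in parse_env_file(path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)


example = "# local secrets\nexport OPENAI_API_KEY='your-api-key-here'\n"
print(parse_env_file(example))
```

Calling `load_env_file()` before initializing the agents would make the key visible to `os.getenv('OPENAI_API_KEY')` in the cells above.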

## Conclusion

This notebook demonstrated how to:

1. **Download and prepare** the Kaggle spam email classification dataset
2. **Train a machine learning model** for baseline spam detection
3. **Set up CrewAI agents** for intelligent email processing
4. **Compare ML predictions** with AI agent analysis
5. **Use advanced features** like information extraction and response drafting
6. **Run the complete pipeline** for end-to-end email automation

The system pairs a fast ML baseline classifier with CrewAI agents that add context-aware analysis: the model screens email quickly, while the agents explain classifications, extract key details, and draft responses.

### Next Steps:
- Configure your OpenAI API key for full functionality
- Experiment with different email types and scenarios
- Customize agent roles and behaviors in the config files
- Deploy the API server for production use
- Integrate with email clients or automation systems