# StarCoder Multi-Agent System - Basic Tutorial

This notebook provides a hands-on introduction to the StarCoder Multi-Agent System. You'll learn how to:

1. Set up and start the system
2. Generate Python code using AI agents
3. Review and analyze generated code
4. Understand the workflow and results
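
Every result in this notebook has two parts, a `generation_result` and a `review_result`, plus an overall `workflow_time`. The toy sketch below mimics that generate-then-review flow with stub coroutines, so you can see its shape without the services running. `generate`, `review`, `WorkflowResult`, and the scoring rule are hypothetical stand-ins, not the orchestrator's actual implementation.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class WorkflowResult:
    # Hypothetical, flattened mirror of the fields this notebook reads from process_simple_task()
    success: bool
    generated_code: str = ""
    code_quality_score: float = 0.0
    issues: list[str] = field(default_factory=list)
    workflow_time: float = 0.0


async def generate(task: str) -> str:
    # Stub generator agent: a real one would prompt the StarCoder model
    await asyncio.sleep(0)
    return f"# code for: {task}\n"


async def review(code: str) -> tuple[float, list[str]]:
    # Stub reviewer agent: flags code without a docstring, 2 points per issue
    await asyncio.sleep(0)
    issues = [] if '"""' in code else ["Missing docstring"]
    return 10.0 - 2.0 * len(issues), issues


async def run_workflow(task: str) -> WorkflowResult:
    """Run generation, then review, and time the whole pipeline."""
    start = time.perf_counter()
    try:
        code = await generate(task)
        score, issues = await review(code)
    except Exception:
        return WorkflowResult(success=False, workflow_time=time.perf_counter() - start)
    return WorkflowResult(
        success=True,
        generated_code=code,
        code_quality_score=score,
        issues=issues,
        workflow_time=time.perf_counter() - start,
    )
```

In Jupyter, run it with `await run_workflow("Create a function")`; in a plain script, use `asyncio.run(...)`.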

## Prerequisites

Before running this notebook, make sure you have:

- Docker and Docker Compose installed
- NVIDIA GPU with CUDA support
- Python 3.11+ with the project dependencies installed (this notebook also uses `pandas` and `matplotlib`)
- StarCoder model running locally
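
The Python side of these prerequisites can be checked from inside the notebook, as sketched below. The package list is an assumption based on the third-party imports this notebook uses; extend it with the project's own requirements. Docker, the GPU, and the model service cannot be checked this way.

```python
import importlib.util
import sys
from collections.abc import Iterable

# Third-party packages this notebook imports directly (assumed list; extend as needed)
REQUIRED_PACKAGES = ("pandas", "matplotlib")


def check_prerequisites(
    packages: Iterable[str] = REQUIRED_PACKAGES,
    min_python: tuple[int, int] = (3, 11),
) -> dict[str, bool]:
    """Map each check to True (passed) or False (failed)."""
    checks = {f"python>={min_python[0]}.{min_python[1]}": sys.version_info >= min_python}
    for name in packages:
        # find_spec() returns None when a top-level package is not installed
        checks[name] = importlib.util.find_spec(name) is not None
    return checks


for check, ok in check_prerequisites().items():
    print(f"{'✅' if ok else '❌'} {check}")
```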

## Setup Instructions

1. **Start StarCoder Service:**
   ```bash
   cd ../local_models
   docker-compose up -d starcoder-chat
   ```

2. **Start Agent Services:**
   ```bash
   cd ../day_07
   docker-compose up -d
   ```

3. **Verify Services:**
   ```bash
   make health
   ```
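
If `make` is not available, a rough substitute is to poll each service's HTTP health endpoint from Python. The URLs below are placeholders: the real ports and paths depend on the two `docker-compose` files, so treat them as assumptions and replace them with what your services expose.

```python
import urllib.request

# Placeholder endpoints: replace with the ports/paths from your docker-compose files
SERVICES = {
    "starcoder-chat": "http://localhost:8000/health",
    "agents": "http://localhost:9000/health",
}


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), connection refused, and timeouts are all OSError subclasses
        return False


for name, url in SERVICES.items():
    print(f"{'✅' if is_healthy(url) else '❌'} {name}: {url}")
```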

Let's begin!


In [None]:
# Import required libraries
import asyncio
import sys
import json
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime

# Add project root to Python path
sys.path.insert(0, str(Path.cwd().parent))

from orchestrator import process_simple_task
from communication.message_schema import OrchestratorRequest

print("✅ Libraries imported successfully!")
print(f"📁 Working directory: {Path.cwd()}")
print(f"🐍 Python version: {sys.version}")


## 1. Basic Code Generation

Let's start with a simple example: generating a function that calculates the nth Fibonacci number.
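
It helps to know what a good answer looks like before reading the model's output. Below is one hand-written version that meets the two requirements passed in the next cell (type hints and edge-case handling). It is a reference for comparison, not what the agents will necessarily produce; `reference_fibonacci` is named so it cannot be confused with the generated function. It assumes the convention F(0) = 0, F(1) = 1 and iterates instead of recursing, so large `n` does not hit Python's recursion limit.

```python
def reference_fibonacci(n: int) -> int:
    """Return the nth Fibonacci number, with F(0) == 0 and F(1) == 1."""
    # bool is a subclass of int, so reject it explicitly
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError(f"n must be an int, got {type(n).__name__}")
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        # Invariant: after k iterations, a == F(k) and b == F(k + 1)
        a, b = b, a + b
    return a


print([reference_fibonacci(n) for n in (0, 1, 5, 10, 15)])  # [0, 1, 5, 55, 610]
```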


In [None]:
# Define our first task
task_description = "Create a function to calculate the nth Fibonacci number"

print(f"📝 Task: {task_description}")
print("⏳ Generating code...")

# Process the task (top-level await works in Jupyter/IPython; in a plain script, wrap the call in asyncio.run())
result = await process_simple_task(
    task_description=task_description,
    language="python",
    requirements=["Include type hints", "Handle edge cases"]
)

if result.success:
    print("✅ Code generated successfully!")
    print(f"⏱️  Workflow time: {result.workflow_time:.2f}s")
    print(f"📊 Code quality score: {result.review_result.code_quality_score}/10")
else:
    print(f"❌ Task failed: {result.error_message}")
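
Generation on a local GPU can take a while, and a stalled service would leave the cell above waiting indefinitely. One option is to bound the call with `asyncio.wait_for`. The helper below is a sketch: `with_timeout` is a hypothetical name, and the 120-second default is an arbitrary assumption to tune for your hardware.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_timeout(awaitable: Awaitable[T], seconds: float = 120.0) -> Optional[T]:
    """Await `awaitable`, returning None instead of hanging if it exceeds `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising
        print(f"⏰ Timed out after {seconds:.0f}s; check that the services are healthy")
        return None
```

In the notebook you would write `result = await with_timeout(process_simple_task(...))` and check for `None` before reading `result.success`.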


In [None]:
# Display the generated code
if result.success:
    print("📄 Generated Code:")
    print("=" * 50)
    print(result.generation_result.generated_code)
    
    print("\n🧪 Generated Tests:")
    print("=" * 50)
    print(result.generation_result.tests)
    
    print("\n📊 Metadata:")
    print("=" * 50)
    metadata = result.generation_result.metadata
    print(f"• Complexity: {metadata.complexity}")
    print(f"• Lines of code: {metadata.lines_of_code}")
    print(f"• Estimated time: {metadata.estimated_time}")
    print(f"• Dependencies: {', '.join(metadata.dependencies)}")


In [None]:
# Display review results
if result.success:
    print("🔍 Code Review Results:")
    print("=" * 50)
    print(f"Overall Quality Score: {result.review_result.code_quality_score}/10")
    
    print("\n📊 Detailed Metrics:")
    metrics = result.review_result.metrics
    for key, value in metrics.items():
        print(f"• {key}: {value}")
    
    print("\n⚠️  Issues Found:")
    for issue in result.review_result.issues:
        print(f"• {issue}")
    
    print("\n💡 Recommendations:")
    for rec in result.review_result.recommendations:
        print(f"• {rec}")
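
Rather than reading the review by eye each time, you can turn it into a pass/fail gate. The sketch below relies only on the fields printed above (`code_quality_score` on a 0 to 10 scale and the `issues` list); `quality_gate` and its thresholds are illustrative assumptions, not defaults of the review agent.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class GateDecision:
    passed: bool
    reasons: list[str]


def quality_gate(score: float, issues: Sequence[object], min_score: float = 7.0, max_issues: int = 3) -> GateDecision:
    """Decide whether reviewed code is good enough to keep (thresholds are illustrative)."""
    reasons = []
    if score < min_score:
        reasons.append(f"score {score:.1f} is below {min_score:.1f}")
    if len(issues) > max_issues:
        reasons.append(f"{len(issues)} issues exceed the limit of {max_issues}")
    # Passing means no rule produced a reason to reject
    return GateDecision(passed=not reasons, reasons=reasons)
```

Usage: `quality_gate(result.review_result.code_quality_score, result.review_result.issues)`.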


## 2. Testing Generated Code

Let's run the generated code on a few inputs to check that it executes and returns sensible values.
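
`exec()` runs model output inside the notebook process, with full access to your files and variables. A somewhat safer habit is to run it in a separate Python process with a timeout, as sketched below. This isolates crashes and infinite loops but is **not** a security sandbox; use a container for code you do not trust. `run_snippet` is a hypothetical helper name.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter and capture its output as text."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
```

For example, if the model named its function `fibonacci`, `run_snippet(code + "\nprint(fibonacci(10))").stdout` contains the result.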


In [None]:
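# Execute the generated code and exercise the resulting function
# ⚠️ exec() runs model-generated code inside this notebook process: read it first
import inspect

if result.success:
    # Run the code in its own namespace so it cannot overwrite notebook variables
    namespace: dict = {}
    exec(result.generation_result.generated_code, namespace)

    # The model chooses the function name: prefer "fibonacci", else any function whose name contains "fib"
    fib_func = namespace.get("fibonacci") or next(
        (obj for name, obj in namespace.items() if inspect.isfunction(obj) and "fib" in name.lower()),
        None,
    )

    if fib_func is None:
        print("❌ No Fibonacci function found in the generated code")
    else:
        print(f"🧪 Testing {fib_func.__name__}():")
        print("=" * 50)

        # Test cases
        test_cases = [0, 1, 5, 10, 15]
        errors = 0

        for n in test_cases:
            try:
                fib_result = fib_func(n)
                print(f"{fib_func.__name__}({n}) = {fib_result}")
            except Exception as e:
                errors += 1
                print(f"{fib_func.__name__}({n}) = Error: {e}")

        # Report honestly instead of always claiming success
        if errors:
            print(f"\n⚠️  {errors} of {len(test_cases)} calls raised an error")
        else:
            print("\n✅ All test calls completed without errors")
else:
    print("❌ Cannot test - code generation failed")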
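
Printing values shows that the function runs, not that it is correct. A small comparison against known values catches wrong answers, such as an off-by-one in the indexing convention. The expected values below assume F(0) = 0, F(1) = 1; `check_against_expected` is a hypothetical helper that you pass the generated function object.

```python
from typing import Any, Callable

# Known Fibonacci values, assuming the convention F(0) = 0, F(1) = 1
EXPECTED_FIB = {0: 0, 1: 1, 2: 1, 5: 5, 10: 55, 15: 610}


def check_against_expected(func: Callable[[int], Any], expected: dict[int, Any]) -> list[str]:
    """Return human-readable mismatches (an empty list means every case matched)."""
    problems = []
    for arg, want in expected.items():
        try:
            got = func(arg)
        except Exception as exc:  # report crashes as failures instead of stopping
            problems.append(f"{func.__name__}({arg}) raised {type(exc).__name__}: {exc}")
            continue
        if got != want:
            problems.append(f"{func.__name__}({arg}) = {got!r}, expected {want!r}")
    return problems
```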


## 3. Custom Requirements

Now let's try a more complex task with specific requirements.


In [None]:
# Define a more complex task
complex_task = "Create a REST API client for a weather service"

requirements = [
    "Use httpx library for HTTP requests",
    "Implement exponential backoff for retries",
    "Add proper error handling for HTTP errors",
    "Include type hints for all functions",
    "Add logging for debugging",
    "Handle rate limiting gracefully",
    "Return structured data (Pydantic models)"
]

print(f"📝 Complex Task: {complex_task}")
print("\n📋 Requirements:")
for i, req in enumerate(requirements, 1):
    print(f"   {i}. {req}")

print("\n⏳ Generating code...")

# Process the complex task
complex_result = await process_simple_task(
    task_description=complex_task,
    language="python",
    requirements=requirements
)

if complex_result.success:
    print("✅ Complex code generated successfully!")
    print(f"⏱️  Workflow time: {complex_result.workflow_time:.2f}s")
    print(f"📊 Code quality score: {complex_result.review_result.code_quality_score}/10")
else:
    print(f"❌ Complex task failed: {complex_result.error_message}")
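
One of the requirements asks for exponential backoff. When reviewing the generated client, it helps to know what a sound retry schedule looks like: the delay doubles after each failed attempt, is capped at a maximum, and usually gets random jitter so that many clients do not retry in lockstep. A minimal sketch of the delay calculation (the so-called "full jitter" variant), with illustrative `base` and `cap` values:

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0, jitter: bool = True) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    Without jitter the schedule is base * 2**attempt, capped at `cap`.
    With full jitter the delay is drawn uniformly from [0, that value].
    """
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


print([backoff_delay(a, jitter=False) for a in range(8)])
# [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```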


In [None]:
# Display the complex generated code
if complex_result.success:
    print("📄 Generated Complex Code:")
    print("=" * 50)
    print(complex_result.generation_result.generated_code)
    
    print("\n🧪 Generated Tests:")
    print("=" * 50)
    print(complex_result.generation_result.tests)
    
    print("\n📊 Complex Code Metadata:")
    print("=" * 50)
    metadata = complex_result.generation_result.metadata
    print(f"• Complexity: {metadata.complexity}")
    print(f"• Lines of code: {metadata.lines_of_code}")
    print(f"• Estimated time: {metadata.estimated_time}")
    print(f"• Dependencies: {', '.join(metadata.dependencies)}")


## 4. Performance Analysis

Let's compare the two tasks side by side: workflow time, review quality score, lines of code, and complexity.
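
The `workflow_time` values are reported by the orchestrator. For an independent wall-clock measurement (for example, to see how much time is spent outside the agents), you can time the awaited call yourself. `timed` is a hypothetical helper; `time.perf_counter()` is the standard monotonic clock for measuring intervals.

```python
import time
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def timed(awaitable: Awaitable[T]) -> tuple[T, float]:
    """Await `awaitable` and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    value = await awaitable
    return value, time.perf_counter() - start
```

In the notebook: `result, wall_time = await timed(process_simple_task(...))`, then compare `wall_time` with `result.workflow_time`.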


In [None]:
# Create performance comparison
if result.success and complex_result.success:
    # Prepare data for visualization
    tasks_data = [
        {
            "Task": "Simple Fibonacci",
            "Workflow Time": result.workflow_time,
            "Quality Score": result.review_result.code_quality_score,
            "Lines of Code": result.generation_result.metadata.lines_of_code,
            "Complexity": result.generation_result.metadata.complexity
        },
        {
            "Task": "Complex Weather API",
            "Workflow Time": complex_result.workflow_time,
            "Quality Score": complex_result.review_result.code_quality_score,
            "Lines of Code": complex_result.generation_result.metadata.lines_of_code,
            "Complexity": complex_result.generation_result.metadata.complexity
        }
    ]
    
    df = pd.DataFrame(tasks_data)
    
    print("📊 Performance Comparison:")
    print("=" * 50)
    print(df.to_string(index=False))
    
    # Create visualizations
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    fig.suptitle('Code Generation Performance Analysis', fontsize=16)
    
    # Workflow time comparison
    axes[0, 0].bar(df['Task'], df['Workflow Time'], color=['skyblue', 'lightcoral'])
    axes[0, 0].set_title('Workflow Time (seconds)')
    axes[0, 0].set_ylabel('Time (s)')
    
    # Quality score comparison
    axes[0, 1].bar(df['Task'], df['Quality Score'], color=['lightgreen', 'gold'])
    axes[0, 1].set_title('Code Quality Score')
    axes[0, 1].set_ylabel('Score (/10)')
    axes[0, 1].set_ylim(0, 10)
    
    # Lines of code comparison
    axes[1, 0].bar(df['Task'], df['Lines of Code'], color=['plum', 'orange'])
    axes[1, 0].set_title('Lines of Code Generated')
    axes[1, 0].set_ylabel('Lines')
    
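    # Complexity comparison (labels matched case-insensitively; unknown labels become NaN and are not drawn)
    complexity_map = {'low': 1, 'medium': 2, 'high': 3}
    df['Complexity Numeric'] = df['Complexity'].astype(str).str.lower().map(complexity_map)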
    axes[1, 1].bar(df['Task'], df['Complexity Numeric'], color=['lightblue', 'darkblue'])
    axes[1, 1].set_title('Code Complexity')
    axes[1, 1].set_ylabel('Complexity Level')
    axes[1, 1].set_yticks([1, 2, 3])
    axes[1, 1].set_yticklabels(['Low', 'Medium', 'High'])
    
    plt.tight_layout()
    plt.show()
    
    print("\n📈 Key Insights:")
    print("=" * 50)
    print(f"• Simple task completed in {result.workflow_time:.2f}s")
    print(f"• Complex task completed in {complex_result.workflow_time:.2f}s")
    print(f"• Quality scores: {result.review_result.code_quality_score:.1f} vs {complex_result.review_result.code_quality_score:.1f}")
    print(f"• Code complexity: {result.generation_result.metadata.complexity} vs {complex_result.generation_result.metadata.complexity}")
else:
    print("❌ Cannot analyze - some tasks failed")


In [None]:
# Save results to JSON file
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
results_file = f"tutorial_basic_results_{timestamp}.json"

results_data = {
    "timestamp": timestamp,
    "simple_task": {
        "description": task_description,
        "success": result.success,
        "workflow_time": result.workflow_time if result.success else 0,
        "quality_score": result.review_result.code_quality_score if result.success else 0,
        "generated_code": result.generation_result.generated_code if result.success else None,
        "tests": result.generation_result.tests if result.success else None,
        "metadata": result.generation_result.metadata.__dict__ if result.success else None
    },
    "complex_task": {
        "description": complex_task,
        "requirements": requirements,
        "success": complex_result.success,
        "workflow_time": complex_result.workflow_time if complex_result.success else 0,
        "quality_score": complex_result.review_result.code_quality_score if complex_result.success else 0,
        "generated_code": complex_result.generation_result.generated_code if complex_result.success else None,
        "tests": complex_result.generation_result.tests if complex_result.success else None,
        "metadata": complex_result.generation_result.metadata.__dict__ if complex_result.success else None
    }
}

with open(results_file, 'w') as f:
    json.dump(results_data, f, indent=2, default=str)

print(f"💾 Results saved to: {results_file}")
print(f"📁 File size: {Path(results_file).stat().st_size} bytes")

# Display summary (total time and average score count successful tasks only)
successful = [r for r in (result, complex_result) if r.success]

print("\n📋 Summary:")
print("=" * 50)
print(f"• Simple task: {'✅ Success' if result.success else '❌ Failed'}")
print(f"• Complex task: {'✅ Success' if complex_result.success else '❌ Failed'}")
print(f"• Total workflow time: {sum(r.workflow_time for r in successful):.2f}s")
if successful:
    avg_score = sum(r.review_result.code_quality_score for r in successful) / len(successful)
    print(f"• Average quality score: {avg_score:.1f}/10")
else:
    print("• Average quality score: n/a (no task succeeded)")
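
Because each run writes a timestamped file, you can reload earlier runs and compare them. The sketch below reads every `tutorial_basic_results_*.json` file in a directory using only the keys written above; `load_results` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Union


def load_results(directory: Union[str, Path] = ".") -> list[dict]:
    """Return one row per task per saved run, read from tutorial_basic_results_*.json files."""
    rows = []
    # File-name timestamps (YYYYmmdd_HHMMSS) sort chronologically
    for path in sorted(Path(directory).glob("tutorial_basic_results_*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        for task_key in ("simple_task", "complex_task"):
            task = data.get(task_key, {})
            rows.append({
                "run": data.get("timestamp"),
                "task": task_key,
                "success": task.get("success"),
                "workflow_time": task.get("workflow_time"),
                "quality_score": task.get("quality_score"),
            })
    return rows
```

In the notebook, `pd.DataFrame(load_results())` turns the rows into a table of all saved runs.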
