# End-to-End Demo - Educational Tutor Agent System

This notebook demonstrates a complete workflow of the multi-agent educational tutoring system:

1. **TutorAgent**: Generates a structured explanation of "Bayes theorem"
2. **QuizAgent**: Creates quiz questions to test understanding
3. **EvaluatorAgent**: Evaluates the quality of the explanation using LLM-as-judge
4. **Memory Store**: Saves all results for future reference and adaptive learning

**Topic**: Bayes theorem (intermediate level)

In [47]:
# Set up the path so that `src` can be imported
import sys
from pathlib import Path

# Add the project root to sys.path before importing from src
project_root = Path().resolve().parent if Path().resolve().name == 'notebooks' else Path().resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

In [48]:
# Import framework and agents
from src.agent_framework import Coordinator
from src.agents import TutorAgent, QuizAgent, EvaluatorAgent
from src.memory import MemoryStore
from src.observability import log_event, get_metrics
import json
import os
from dotenv import load_dotenv

In [49]:
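# Load environment variables.
# Note: `__file__` is normally undefined in a notebook kernel, so env_path falls back
# to `.env` relative to the current working directory.
env_path = Path(__file__).parent.parent / '.env' if '__file__' in globals() else Path('.env')
print(f"Environment Path: {env_path}")
# encoding='utf-16' assumes the .env file was saved as UTF-16; use 'utf-8' if it was not
load_dotenv(dotenv_path=env_path, verbose=True, encoding='utf-16')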

Environment Path: .env


True

In [50]:
print("=" * 60)
print("EDUCATIONAL TUTOR AGENT - END-TO-END DEMO")
print("=" * 60)
print(f"\n✓ Project root: {project_root}")
print(f"✓ GEMINI_API_KEY configured: {os.getenv('GEMINI_API_KEY') is not None}")
print(f"✓ All modules imported successfully\n")

EDUCATIONAL TUTOR AGENT - END-TO-END DEMO

✓ Project root: C:\Users\punno\Documents\GitHub\google-agent-intensive-capstone-project
✓ GEMINI_API_KEY configured: True
✓ All modules imported successfully



## System Architecture

The system uses a multi-agent architecture:

- **Coordinator**: Manages agent-to-agent communication and message routing
- **TutorAgent**: LLM-powered tutor that generates explanations and educational content
- **QuizAgent**: Generates quiz questions and grades student answers
- **EvaluatorAgent**: LLM-as-judge that evaluates content quality
- **MemoryStore**: Persistent storage for user progress and session data

All agents communicate through the Coordinator using structured messages.
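
To make the routing idea concrete, here is a minimal sketch of a coordinator that looks up a registered agent and wraps its reply in an envelope. It is not the `Coordinator` from `src.agent_framework`: `MiniCoordinator`, `EchoAgent`, and the `handle` method are hypothetical names chosen for illustration. The envelope fields (`status`, `payload`, `request_id`, `meta`) mirror the response printed later in this notebook.

```python
import time
from typing import Any, Callable, Dict, Optional

Message = Dict[str, Any]


class EchoAgent:
    """Hypothetical agent that simply returns the payload it was sent."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, message: Message) -> Message:
        return {"status": "ok", "payload": message.get("payload", {})}


class MiniCoordinator:
    """Toy router: finds the target agent and wraps its reply in an envelope."""

    def __init__(self, hook: Optional[Callable[[str, str, dict], None]] = None) -> None:
        self._agents: Dict[str, Any] = {}
        self._hook = hook or (lambda agent, event, payload: None)

    def register(self, agent: Any) -> None:
        self._agents[agent.name] = agent
        self._hook(agent.name, "registered", {"agent_name": agent.name})

    def send(self, from_agent: str, to_agent: str, message: Message) -> Message:
        request_id = message.get("request_id")
        if to_agent not in self._agents:
            return {
                "status": "error",
                "payload": {"error": f"Unknown agent: {to_agent}"},
                "request_id": request_id,
            }
        started = time.perf_counter()
        try:
            reply = self._agents[to_agent].handle(message)
        except Exception as exc:  # turn agent failures into error envelopes
            reply = {"status": "error", "payload": {"error": str(exc)}}
        reply["request_id"] = request_id
        reply["meta"] = {
            "agent": to_agent,
            "action": message.get("action"),
            "elapsed_time": time.perf_counter() - started,
            "from_agent": from_agent,
        }
        return reply


events = []
mini = MiniCoordinator(hook=lambda agent, event, payload: events.append((agent, event)))
mini.register(EchoAgent("echo"))
reply = mini.send(
    "caller",
    "echo",
    {"action": "explain", "payload": {"topic": "Bayes theorem"}, "request_id": "r-1"},
)
unknown = mini.send("caller", "missing", {"action": "explain"})
```

Returning an error envelope instead of raising keeps the calling cells simple: they only need to branch on `reply["status"]`, which is how the cells in this notebook are written.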

In [51]:
# Create coordinator with observability hook
def observability_hook(agent_name, event_type, payload):
    """Hook for logging agent events."""
    log_event(agent_name, event_type, payload)

In [52]:
coordinator = Coordinator(timeout=60.0, observability_hook=observability_hook)

# Create memory store (shared across agents)
memory_store = MemoryStore(storage_path=str(project_root / "data" / "memory_store.json"))

# Create and register agents
# Using gemini-2.5-flash-lite for all agents (reliable and cost-effective)
print("Initializing agents...")
tutor_agent = TutorAgent(name="tutor_agent", model_name="gemini-2.5-flash-lite")
quiz_agent = QuizAgent(name="quiz_agent", model_name="gemini-2.5-flash-lite", memory_store=memory_store)
evaluator_agent = EvaluatorAgent(name="evaluator_agent", model_name="gemini-2.5-flash-lite")

coordinator.register(tutor_agent)
coordinator.register(quiz_agent)
coordinator.register(evaluator_agent)

print("✓ Coordinator created")
print("✓ Agents registered:")
for agent_name in coordinator.list_agents():
    print(f"  - {agent_name}")
print()

2025-11-23 16:26:25.631 | INFO     | src.observability:log_event:82 - [tutor_agent] registered: {'agent_name': 'tutor_agent'}
2025-11-23 16:26:25.635 | INFO     | src.observability:log_event:82 - [quiz_agent] registered: {'agent_name': 'quiz_agent'}
2025-11-23 16:26:25.637 | INFO     | src.observability:log_event:82 - [evaluator_agent] registered: {'agent_name': 'evaluator_agent'}


Initializing agents...
✓ Coordinator created
✓ Agents registered:
  - tutor_agent
  - quiz_agent
  - evaluator_agent



## Step 1: Generate Explanation

We'll ask the TutorAgent to explain "Bayes theorem" at an intermediate level.
The agent should return a structured explanation with a summary, a step-by-step breakdown,
examples, key equations, and potential difficulties.
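
Because later steps depend on the shape of this payload, it can help to check it before using it. The sketch below is one way to do that, not part of the tutor agent itself. The field names are the keys this notebook reads from the payload. The expected types and the helper name `find_schema_problems` are assumptions.

```python
from typing import Any, Dict, List

# Keys are the ones this notebook reads from the payload; the types are assumptions.
EXPECTED_FIELDS = {
    "summary": str,
    "step_by_step": list,
    "key_equations": list,
    "examples": list,
}


def find_schema_problems(explanation: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the payload looks usable."""
    problems: List[str] = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in explanation:
            problems.append(f"missing field: {field}")
        elif not isinstance(explanation[field], expected_type):
            problems.append(f"{field} should be a {expected_type.__name__}")
    examples = explanation.get("examples")
    if isinstance(examples, list):
        for index, example in enumerate(examples):
            has_keys = isinstance(example, dict) and "title" in example and "description" in example
            if not has_keys:
                problems.append(f"examples[{index}] needs 'title' and 'description'")
    return problems


good = {
    "summary": "Bayes' theorem updates a prior belief with new evidence.",
    "step_by_step": ["State the prior P(A).", "Weight it by the likelihood P(B|A).", "Divide by P(B)."],
    "key_equations": ["P(A|B) = P(B|A) * P(A) / P(B)"],
    "examples": [{"title": "Medical test", "description": "Update a disease probability after a positive result."}],
}
incomplete = {"summary": "Too short", "examples": ["not a dict"]}

good_problems = find_schema_problems(good)
incomplete_problems = find_schema_problems(incomplete)
```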

In [53]:
# Define the topic for this demo
TOPIC = "Bayes theorem"
LEVEL = "intermediate"
USER_ID = "demo_user"

print("=" * 60)
print("STEP 1: EXPLANATION GENERATION")
print("=" * 60)
print(f"\nTopic: {TOPIC}")
print(f"Level: {LEVEL}\n")

# Request explanation from TutorAgent
explain_message = {
    "action": "explain",
    "payload": {
        "topic": TOPIC,
        "level": LEVEL
    },
    "request_id": "demo-001"
}

STEP 1: EXPLANATION GENERATION

Topic: Bayes theorem
Level: intermediate



In [None]:
print("Requesting explanation from TutorAgent...")
try:
    explain_response = coordinator.send(
        from_agent="tutor_agent",
        to_agent="tutor_agent",
        message=explain_message
    )

    if explain_response["status"] == "ok":
        explanation = explain_response["payload"]
        print("✓ Explanation received successfully\n")
        
        print("SUMMARY:")
        print("-" * 60)
        print(explanation.get('summary', 'N/A'))
        print()
        
        print("STEP-BY-STEP BREAKDOWN:")
        print("-" * 60)
        for i, step in enumerate(explanation.get('step_by_step', [])[:5], 1):
            print(f"{i}. {step}")
        print()
        
        if explanation.get('key_equations'):
            print("KEY EQUATIONS:")
            print("-" * 60)
            for eq in explanation.get('key_equations', []):
                print(f"  • {eq}")
            print()
        
        if explanation.get('examples'):
            print("EXAMPLES:")
            print("-" * 60)
            for i, example in enumerate(explanation.get('examples', [])[:2], 1):
                print(f"\nExample {i}: {example.get('title', 'N/A')}")
                print(f"  {example.get('description', 'N/A')}")
            print()
        
        print(f"Confidence: {explanation.get('confidence', 'N/A')}")
    else:
        error_info = explain_response.get('payload', {})
        error_msg = error_info.get('error', 'Unknown error')
        print(f"✗ Error: {error_info}")
        
        # Check for truncation errors
        if 'truncated' in error_msg.lower() or 'unclosed braces' in error_msg.lower():
            print("\n⚠️  Response appears to be truncated.")
            print("   This usually means the response exceeded max_tokens.")
            print("   The system will automatically retry with error handling.")
            print("   If this persists, try a simpler topic or increase max_tokens.")
        
        explanation = None
except Exception as e:
    print(f"✗ Exception occurred: {e}")
    import traceback
    traceback.print_exc()
    explanation = None

2025-11-23 16:55:53.250 | INFO     | src.observability:log_event:82 - [tutor_agent] message_sent: {'to_agent': 'tutor_agent', 'action': 'explain', 'request_id': 'demo-001'}


Requesting explanation from TutorAgent...


2025-11-23 16:55:58.822 | INFO     | src.observability:log_event:82 - [tutor_agent] message_handled: {'from_agent': 'tutor_agent', 'action': 'explain', 'request_id': 'demo-001', 'elapsed_time': 5.5689332485198975}


Test explain response:  {
  "status": "error",
  "payload": {
    "error": "Failed to parse JSON from response: Expecting ',' delimiter: line 16 column 10 (char 2180)\nResponse: {\n    \"summary\": \"Bayes' Theorem is a fundamental concept in probability theory that describes how to update the probability of a hypothesis based on new evidence. It allows us to revise our beliefs i"
  },
  "request_id": "demo-001",
  "meta": {
    "agent": "tutor_agent",
    "action": "explain",
    "elapsed_time": 5.5689332485198975,
    "from_agent": "tutor_agent"
  }
}
✗ Error: {'error': 'Failed to parse JSON from response: Expecting \',\' delimiter: line 16 column 10 (char 2180)\nResponse: {\n    "summary": "Bayes\' Theorem is a fundamental concept in probability theory that describes how to update the probability of a hypothesis based on new evidence. It allows us to revise our beliefs i'}
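
The run above failed while parsing the model's JSON. As a hedged illustration of how such failures could be classified, the sketch below extracts the first JSON object from a raw response and separates truncated output (unclosed braces) from other malformed output. The helper names `brace_balance` and `parse_llm_json` are hypothetical, and this is not the parser used inside `TutorAgent`. The error wording is chosen so that the `'truncated'` / `'unclosed braces'` check in the cell above would match it.

```python
import json
from typing import Any, Dict


def brace_balance(text: str) -> int:
    """Count open minus closed braces, ignoring braces inside JSON strings."""
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
    return depth


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Parse the first JSON object in raw, returning a status envelope."""
    start = raw.find("{")
    if start == -1:
        return {"status": "error", "payload": {"error": "No JSON object found in response"}}
    end = raw.rfind("}")
    # Drop anything after the last closing brace (for example a trailing code fence).
    candidate = raw[start:end + 1] if end > start else raw[start:]
    try:
        return {"status": "ok", "payload": json.loads(candidate)}
    except json.JSONDecodeError as exc:
        reason = "truncated (unclosed braces)" if brace_balance(raw[start:]) > 0 else "malformed"
        return {"status": "error", "payload": {"error": f"Response {reason}: {exc.msg}"}}


fenced = parse_llm_json('```json\n{"summary": "ok"}\n```')
cut_off = parse_llm_json('{"summary": "Bayes", "step_by_step": ["a"')
broken = parse_llm_json('{"a": 1 "b": 2}')
```

Distinguishing the two cases matters for the remedy: a truncated response suggests raising the token limit or shortening the request, while a malformed but complete response suggests retrying or tightening the prompt.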


## Step 2: Generate Quiz Questions

After receiving the explanation, we'll ask the QuizAgent to generate quiz questions
on the same topic. This tests the student's understanding of the material.
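
For reference, here is a small sketch of the question shape the display code in this step expects: a `question` string, a list of `options`, and a zero-based `answer_index`. The `render_question` helper and the sample question are illustrative assumptions. The real payload comes from `QuizAgent`.

```python
from typing import Any, Dict, List


def render_question(question: Dict[str, Any]) -> List[str]:
    """Return display lines for one multiple-choice question, marking the answer."""
    options = question.get("options", [])
    answer_index = question.get("answer_index")
    if not isinstance(answer_index, int) or not 0 <= answer_index < len(options):
        raise ValueError("answer_index must point at one of the options")
    lines = [question.get("question", "N/A")]
    for index, option in enumerate(options):
        marker = "✓" if index == answer_index else " "
        lines.append(f"  {marker} {chr(ord('A') + index)}. {option}")
    return lines


sample_question = {
    "question": "Which term in Bayes' theorem is the prior probability?",
    "options": ["P(A|B)", "P(B|A)", "P(A)", "P(B)"],
    "answer_index": 2,
}
rendered = render_question(sample_question)
print("\n".join(rendered))
```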

In [55]:
if explanation:
    print("\n" + "=" * 60)
    print("STEP 2: QUIZ GENERATION")
    print("=" * 60)
    print()
    
    # Request a quiz on the same topic (the explanation text itself is not sent)
    quiz_message = {
        "action": "generate_quiz",
        "payload": {
            "topic": TOPIC,
            "difficulty": LEVEL,
            "n_questions": 5
        },
        "request_id": "demo-002"
    }
    
    print("Requesting quiz generation from QuizAgent...")
    quiz_response = coordinator.send(
        from_agent="tutor_agent",
        to_agent="quiz_agent",
        message=quiz_message
    )
    
    if quiz_response["status"] == "ok":
        quiz_data = quiz_response["payload"]
        questions = quiz_data.get("questions", [])
        quiz_id = quiz_data.get("quiz_id", "unknown")
        
        print(f"✓ Quiz generated successfully: {len(questions)} questions")
        print(f"  Quiz ID: {quiz_id}\n")
        print("QUIZ QUESTIONS:")
        print("-" * 60)
        
        for i, q in enumerate(questions, 1):
            print(f"\nQuestion {i}: {q.get('question', 'N/A')}")
            if 'options' in q:
                for j, opt in enumerate(q['options'], 1):
                    marker = "✓" if j-1 == q.get('answer_index', -1) else " "
                    print(f"  {marker} {chr(64+j)}. {opt}")
            print(f"  Explanation: {q.get('explanation', 'N/A')}")
    else:
        print(f"✗ Error: {quiz_response['payload']}")
        quiz_data = None
else:
    print("Skipping quiz generation - explanation not available")
    quiz_data = None

Skipping quiz generation - explanation not available


## Step 3: Evaluate Explanation Quality

We'll use the EvaluatorAgent (LLM-as-judge) to evaluate the quality of the explanation.
This demonstrates automatic quality assessment and helps ensure educational content
meets high standards.
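
The evaluation cell below prints both normalized scores (out of 1.0) and raw scores on a 1-5 scale. How `EvaluatorAgent` converts between them is not shown in this notebook. The sketch below assumes a simple linear mapping where 1 becomes 0.0 and 5 becomes 1.0, and the helper names are hypothetical.

```python
from typing import Dict


def normalize_score(raw_score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a raw judge score in [low, high] linearly onto [0, 1], clamping outliers."""
    clamped = min(max(raw_score, low), high)
    return (clamped - low) / (high - low)


def normalize_scores(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize every '<name>_score' entry and drop the '_score' suffix."""
    suffix = "_score"
    return {
        name[: -len(suffix)]: normalize_score(value)
        for name, value in raw_scores.items()
        if name.endswith(suffix)
    }


raw = {"accuracy_score": 4, "clarity_score": 5, "usefulness_score": 3}
normalized = normalize_scores(raw)
print(normalized)
```

Clamping guards against a judge model returning a value outside the requested scale, which would otherwise produce scores below 0 or above 1.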

In [56]:
if explanation:
    print("\n" + "=" * 60)
    print("STEP 3: EXPLANATION EVALUATION")
    print("=" * 60)
    print()
    
    # Create source text for evaluation (ground truth reference)
    source_text = """Bayes' theorem is a fundamental theorem in probability theory and statistics.
It describes the probability of an event based on prior knowledge of conditions that might be related to the event.
The formula is: P(A|B) = P(B|A) * P(A) / P(B)
Where:
- P(A|B) is the posterior probability
- P(B|A) is the likelihood
- P(A) is the prior probability
- P(B) is the marginal probability
Bayes' theorem is widely used in machine learning, medical diagnosis, and decision-making."""
    
    # Combine explanation into candidate text
    candidate_text = explanation.get("summary", "") + "\n\n"
    candidate_text += "\n".join(explanation.get("step_by_step", []))
    
    # Evaluate using EvaluatorAgent
    evaluate_message = {
        "action": "evaluate_summary",
        "payload": {
            "source_text": source_text,
            "candidate": candidate_text
        },
        "request_id": "demo-003"
    }
    
    print("Requesting evaluation from EvaluatorAgent...")
    eval_response = coordinator.send(
        from_agent="tutor_agent",
        to_agent="evaluator_agent",
        message=evaluate_message
    )
    
    if eval_response["status"] == "ok":
        evaluation = eval_response["payload"]
        print("✓ Evaluation completed successfully\n")
        
        print("EVALUATION SCORES:")
        print("-" * 60)
        print(f"Accuracy:  {evaluation.get('accuracy', 0):.2f} / 1.0")
        print(f"Clarity:   {evaluation.get('clarity', 0):.2f} / 1.0")
        print(f"Usefulness: {evaluation.get('usefulness', 0):.2f} / 1.0")
        
        if 'raw_scores' in evaluation:
            raw = evaluation['raw_scores']
            print(f"\nRaw Scores (1-5 scale):")
            print(f"  Accuracy: {raw.get('accuracy_score', 'N/A')}/5")
            print(f"  Clarity: {raw.get('clarity_score', 'N/A')}/5")
            print(f"  Usefulness: {raw.get('usefulness_score', 'N/A')}/5")
            print(f"  Overall: {raw.get('overall_score', 'N/A')}/5")
        
        hallucinations = evaluation.get('hallucinations', [])
        if hallucinations:
            print(f"\n⚠️  Hallucinations detected: {len(hallucinations)}")
            for hall in hallucinations[:3]:
                print(f"  - {hall}")
        else:
            print("\n✓ No hallucinations detected")
        
        print(f"\nEvaluation method: {evaluation.get('method', 'unknown')}")
    else:
        print(f"✗ Error: {eval_response['payload']}")
        evaluation = None
else:
    print("Skipping evaluation - explanation not available")
    evaluation = None

Skipping evaluation - explanation not available


## Step 4: Save Results to Memory

We'll save all results (explanation, quiz, evaluation) to the memory store.
This enables:
- **Session continuity**: Track user progress across sessions
- **Adaptive learning**: Identify weak areas from wrong answers
- **Personalization**: Adapt future content based on user history
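
The cell below relies on three operations: `save`, `load`, and `search_by_tag`. As a rough sketch of how a JSON-backed store could provide them, consider the class below. `MiniMemoryStore` is a hypothetical stand-in, not the `MemoryStore` from `src.memory`, and the on-disk layout is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class MiniMemoryStore:
    """Toy key-value store that persists every write to a single JSON file."""

    def __init__(self, storage_path: str) -> None:
        self._path = Path(storage_path)
        self._data: Dict[str, Any] = {}
        if self._path.exists():
            self._data = json.loads(self._path.read_text(encoding="utf-8"))

    def save(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")

    def load(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def search_by_tag(self, tag: str) -> List[str]:
        """Return the keys of all dict entries whose 'tags' list contains tag."""
        return [
            key
            for key, value in self._data.items()
            if isinstance(value, dict) and tag in value.get("tags", [])
        ]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "memory_store.json")
    store = MiniMemoryStore(path)
    store.save(
        "user:demo_user:session:bayes:summary",
        {"topic": "Bayes theorem", "tags": ["session", "bayes-theorem", "demo"]},
    )
    reloaded = MiniMemoryStore(path)  # a fresh instance reads the saved file
    found = reloaded.search_by_tag("bayes-theorem")
    missing = reloaded.load("user:demo_user:session:bayes:quiz")
```

The colon-separated key (`user:<id>:session:<name>:<kind>`) follows the naming used in this notebook. It keeps related entries easy to spot without requiring nested structures.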

In [57]:
print("\n" + "=" * 60)
print("STEP 4: MEMORY STORAGE")
print("=" * 60)
print()

# Save explanation
if explanation:
    memory_store.save(f"user:{USER_ID}:session:bayes:explanation", {
        "topic": TOPIC,
        "level": LEVEL,
        "explanation": explanation,
        "timestamp": memory_store._get_timestamp(),
        "tags": ["explanation", "bayes-theorem", LEVEL]
    })
    print("✓ Explanation saved to memory")

# Save quiz
if quiz_data:
    memory_store.save(f"user:{USER_ID}:session:bayes:quiz", {
        "topic": TOPIC,
        "quiz_data": quiz_data,
        "timestamp": memory_store._get_timestamp(),
        "tags": ["quiz", "bayes-theorem", LEVEL]
    })
    print("✓ Quiz saved to memory")

# Save evaluation
if evaluation:
    memory_store.save(f"user:{USER_ID}:session:bayes:evaluation", {
        "topic": TOPIC,
        "evaluation": evaluation,
        "timestamp": memory_store._get_timestamp(),
        "tags": ["evaluation", "bayes-theorem"]
    })
    print("✓ Evaluation saved to memory")

# Save session summary
session_summary = {
    "session_id": "bayes-demo",
    "topic": TOPIC,
    "level": LEVEL,
    "actions": ["explain", "generate_quiz", "evaluate"],
    "timestamp": memory_store._get_timestamp(),
    "tags": ["session", "bayes-theorem", "demo"]
}
memory_store.save(f"user:{USER_ID}:session:bayes:summary", session_summary)
print("✓ Session summary saved to memory")

# Verify saved data
print("\nVerifying saved data:")
saved_explanation = memory_store.load(f"user:{USER_ID}:session:bayes:explanation")
saved_quiz = memory_store.load(f"user:{USER_ID}:session:bayes:quiz")
saved_eval = memory_store.load(f"user:{USER_ID}:session:bayes:evaluation")

print(f"  Explanation: {'✓' if saved_explanation else '✗'}")
print(f"  Quiz: {'✓' if saved_quiz else '✗'}")
print(f"  Evaluation: {'✓' if saved_eval else '✗'}")

# Search by tag
bayes_items = memory_store.search_by_tag("bayes-theorem")
print(f"\n✓ Found {len(bayes_items)} items tagged 'bayes-theorem'")


STEP 4: MEMORY STORAGE

✓ Session summary saved to memory

Verifying saved data:
  Explanation: ✗
  Quiz: ✗
  Evaluation: ✗

✓ Found 1 items tagged 'bayes-theorem'


## Step 5: Grade Student Answers (Optional Demo)

Let's demonstrate quiz grading by answering one of the quiz questions.
The QuizAgent will grade the answer and save incorrect answers to memory.
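
As a simplified picture of what grading involves, the sketch below compares a student answer with the stored correct answer and records misses in a list. The case-insensitive exact match and the `grade_answer` helper are assumptions made for illustration. `QuizAgent` may grade differently. The result keys (`correct`, `score`, `explanation`) are the ones the cell below reads.

```python
from typing import Any, Dict, List


def grade_answer(
    question: Dict[str, Any],
    student_answer: str,
    wrong_answers: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Grade by case-insensitive exact match and record incorrect answers."""
    expected = str(question["correct_answer"]).strip().lower()
    given = str(student_answer).strip().lower()
    correct = given == expected
    if not correct:
        wrong_answers.append({"question_id": question["id"], "student_answer": student_answer})
    return {
        "correct": correct,
        "score": 1.0 if correct else 0.0,
        "explanation": question.get("explanation", ""),
    }


demo_question = {
    "id": "q1",
    "question": "Which term in Bayes' theorem is the prior probability?",
    "correct_answer": "P(A)",
    "explanation": "P(A) is the probability of A before seeing the evidence B.",
}
tracked_wrong_answers: List[Dict[str, Any]] = []
right = grade_answer(demo_question, " p(a) ", tracked_wrong_answers)
wrong = grade_answer(demo_question, "Wrong answer for testing", tracked_wrong_answers)
```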

In [58]:
if quiz_data and quiz_data.get("questions"):
    print("\n" + "=" * 60)
    print("STEP 5: QUIZ GRADING DEMO")
    print("=" * 60)
    print()
    
    # Get first question
    first_question = quiz_data["questions"][0]
    question_id = first_question["id"]
    quiz_id = quiz_data.get("quiz_id")
    
    print(f"Question: {first_question.get('question', 'N/A')}")
    print(f"Correct answer: {first_question.get('correct_answer', 'N/A')}\n")
    
    # Test with correct answer
    print("Testing with CORRECT answer...")
    grade_message_correct = {
        "action": "grade_answer",
        "payload": {
            "question_id": question_id,
            "student_answer": first_question.get("correct_answer", ""),
            "quiz_id": quiz_id,
            "user_id": USER_ID
        },
        "request_id": "demo-004"
    }
    
    grade_response_correct = coordinator.send(
        from_agent="tutor_agent",
        to_agent="quiz_agent",
        message=grade_message_correct
    )
    
    if grade_response_correct["status"] == "ok":
        result = grade_response_correct["payload"]
        print(f"  Correct: {result['correct']}")
        print(f"  Score: {result['score']}")
        print(f"  Explanation: {result['explanation'][:100]}...\n")
    
    # Test with incorrect answer
    print("Testing with INCORRECT answer...")
    incorrect_answer = "Wrong answer for testing"
    grade_message_incorrect = {
        "action": "grade_answer",
        "payload": {
            "question_id": question_id,
            "student_answer": incorrect_answer,
            "quiz_id": quiz_id,
            "user_id": USER_ID
        },
        "request_id": "demo-005"
    }
    
    grade_response_incorrect = coordinator.send(
        from_agent="tutor_agent",
        to_agent="quiz_agent",
        message=grade_message_incorrect
    )
    
    if grade_response_incorrect["status"] == "ok":
        result = grade_response_incorrect["payload"]
        print(f"  Correct: {result['correct']}")
        print(f"  Score: {result['score']}")
        print(f"  Explanation: {result['explanation'][:100]}...")
        
        if not result['correct']:
            print("\n✓ Incorrect answer saved to memory for adaptive learning")
            
            # Check wrong answers in memory
            wrong_answers = memory_store.load(f"user:{USER_ID}:wrong_answers", default=[])
            print(f"  Total wrong answers tracked: {len(wrong_answers)}")
else:
    print("Skipping quiz grading - quiz not available")

Skipping quiz grading - quiz not available


## Step 6: Observability and Metrics

The system tracks all agent interactions through the observability module.
Let's view the metrics and logs from this session.
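
The metric names in the output below follow an `agent:event` pattern plus a `total:event` roll-up. One simple way to produce counters of that shape is sketched here with `collections.Counter`. The `record_event` and `snapshot` helpers are hypothetical and are not the implementation in `src.observability`.

```python
from collections import Counter
from typing import Dict

# Module-level counter: it keeps accumulating for as long as the process lives.
_metrics: Counter = Counter()


def record_event(agent_name: str, event_type: str) -> None:
    """Increment the per-agent counter and the cross-agent total for one event."""
    _metrics[f"{agent_name}:{event_type}"] += 1
    _metrics[f"total:{event_type}"] += 1


def snapshot() -> Dict[str, int]:
    """Return a plain-dict copy of the current counters."""
    return dict(_metrics)


for name in ("tutor_agent", "quiz_agent", "evaluator_agent"):
    record_event(name, "registered")
record_event("tutor_agent", "message_sent")

for key, count in sorted(snapshot().items()):
    print(f"  {key}: {count}")
```

With a process-wide counter like this, re-running a registration cell in the same kernel adds to the existing counts, which would be consistent with the `registered` values greater than 1 in the output below.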

In [59]:
print("\n" + "=" * 60)
print("STEP 6: OBSERVABILITY & METRICS")
print("=" * 60)
print()

# Get metrics
metrics = get_metrics()
if metrics:
    print("Agent Event Metrics:")
    print("-" * 60)
    for metric, count in sorted(metrics.items()):
        print(f"  {metric}: {count}")
else:
    print("No metrics recorded yet")

print("\n✓ Logs written to: data/logs/")
print("✓ Metrics tracked for all agent interactions")


STEP 6: OBSERVABILITY & METRICS

Agent Event Metrics:
------------------------------------------------------------
  evaluator_agent:registered: 6
  quiz_agent:registered: 6
  total:message_handled: 7
  total:message_sent: 7
  total:registered: 18
  tutor_agent:message_handled: 7
  tutor_agent:message_sent: 7
  tutor_agent:registered: 6

✓ Logs written to: data/logs/
✓ Metrics tracked for all agent interactions


## Summary

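This end-to-end demo is designed to showcase the capabilities listed below. In the run recorded above, the TutorAgent response failed JSON parsing, so Steps 2, 3, and 5 were skipped and only the session summary was saved to memory.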

1. ✅ **Multi-Agent Coordination**: Three specialized agents working together
2. ✅ **Explanation Generation**: Structured educational content with examples
3. ✅ **Quiz Creation**: Assessment questions to test understanding
4. ✅ **Quality Evaluation**: LLM-as-judge evaluation of content quality
5. ✅ **Memory Persistence**: All results saved for future reference
6. ✅ **Adaptive Learning**: Wrong answers tracked for personalized tutoring

### Key Features Demonstrated:

- **Agent-to-Agent Communication**: Seamless message passing via Coordinator
- **Structured Outputs**: JSON-formatted responses for easy parsing
- **LLM Integration**: Google Gemini API for content generation
- **Quality Assurance**: Automatic evaluation of generated content
- **Persistent Memory**: Session data saved for continuity
- **Observability**: Full logging and metrics tracking

### Next Steps:

- Extend with SearchAgent for finding educational resources
- Implement user profiles for personalized learning paths
- Add more sophisticated adaptive learning algorithms
- Integrate with external educational content APIs

In [60]:
print("\n" + "=" * 60)
print("COMPLETE SESSION RESULTS")
print("=" * 60)
print()

# Compile all results
results = {
    "session_info": {
        "topic": TOPIC,
        "level": LEVEL,
        "user_id": USER_ID,
        "agents_used": coordinator.list_agents(),
        "timestamp": memory_store._get_timestamp()
    },
    "explanation": explanation if explanation else None,
    "quiz": quiz_data if quiz_data else None,
    "evaluation": evaluation if evaluation else None
}

# Display as formatted JSON
print(json.dumps(results, indent=2, default=str))

# Close memory store
memory_store.close()
print("\n✓ Memory store closed")
print("✓ Demo completed successfully!")


COMPLETE SESSION RESULTS

{
  "session_info": {
    "topic": "Bayes theorem",
    "level": "intermediate",
    "user_id": "demo_user",
    "agents_used": [
      "tutor_agent",
      "quiz_agent",
      "evaluator_agent"
    ],
    "timestamp": "2025-11-23T16:26:31.959063"
  },
  "explanation": null,
  "quiz": null,
  "evaluation": null
}

✓ Memory store closed
✓ Demo completed successfully!
