![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# Context Summarization: Intelligent Compression

## Learning Objectives (35 minutes)
By the end of this notebook, you will be able to:
1. **Understand** when and why to summarize context instead of pruning it
2. **Implement** multi-level summarization strategies
3. **Create** structured summary templates for different information types
4. **Design** progressive summarization for long conversations
5. **Measure** information preservation and compression effectiveness

## Prerequisites
- Completed previous notebooks in Section 5
- Understanding of context pruning techniques
- Familiarity with LLM summarization capabilities

---

## Introduction

**Context Summarization** is the practice of compressing accumulated context into concise summaries while preserving essential information. Unlike pruning (which removes information), summarization condenses information into a more compact form.

### When to Summarize vs. Prune

**Summarize when:**
- Information is valuable but verbose
- You need to preserve decision context
- Conversations contain important insights
- Academic progress needs tracking

**Prune when:**
- Information is outdated or irrelevant
- Content is duplicated
- Context is contaminated
- Storage or token budgets are tight and the content is not worth condensing
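The difference is easiest to see on a plain list of messages. The sketch below is a minimal illustration and not part of the course package: `prune_messages`, `summarize_older_messages` and `first_words` are hypothetical helpers, and `summarize` stands in for any summarization call (for example an LLM request). Pruning discards older turns outright; summarization replaces them with one condensed entry and keeps the most recent turns verbatim.

```python
from typing import Callable, List


def prune_messages(messages: List[str], keep_last: int) -> List[str]:
    """Pruning: keep only the most recent `keep_last` messages; older ones are gone."""
    start = max(len(messages) - keep_last, 0)
    return messages[start:]


def summarize_older_messages(
    messages: List[str],
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Summarization: condense older messages into one entry, keep recent ones verbatim."""
    split = max(len(messages) - keep_last, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return list(recent)
    return [f"Summary of earlier conversation: {summarize(older)}"] + recent


# Hypothetical stand-in for an LLM call: keeps the first four words of each message.
def first_words(msgs: List[str]) -> str:
    return "; ".join(" ".join(m.split()[:4]) for m in msgs)


history = [
    "Student asked about machine learning courses next semester",
    "Student prefers online courses due to work schedule",
    "Recommended CS401 first, then CS402",
]
print(prune_messages(history, keep_last=1))
print(summarize_older_messages(history, keep_last=1, summarize=first_words))
```

Both functions keep the same recent window; only the summarizing version retains a trace of the earlier decisions.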

### Our Solution: Multi-Level Summarization

We'll implement:
1. **Conversation summaries**: Key decisions and insights
2. **Academic progress summaries**: Completed courses and goals
3. **Preference profiles**: Consolidated student preferences
4. **Progressive summarization**: Hierarchical compression

## Environment Setup

In [None]:
# Environment setup
import os
import asyncio
import json
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
import re
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print("🔧 Environment Setup")
print("=" * 30)
print(f"Redis URL: {REDIS_URL}")
print(f"OpenAI API Key: {'✅ Set' if OPENAI_API_KEY else '❌ Not set'}")

In [None]:
# Import required modules
openai_client = None  # default, so later cells still run if OpenAI is unavailable
try:
    from openai import OpenAI
    import redis
    from redis_context_course.models import StudentProfile
    from redis_context_course.course_manager import CourseManager
    from redis_context_course.redis_config import redis_config
    
    # Initialize OpenAI client
    if OPENAI_API_KEY:
        openai_client = OpenAI(api_key=OPENAI_API_KEY)
        print("✅ OpenAI client initialized")
    else:
        openai_client = None
        print("⚠️  OpenAI client not available (API key not set)")
    
    # Redis connection
    redis_client = redis.from_url(REDIS_URL)
    if redis_config.health_check():
        print("✅ Redis connection healthy")
    else:
        print("❌ Redis connection failed")
    
    print("✅ Core modules imported successfully")
    
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("Please ensure you've completed the setup from previous sections.")

## Summarization Framework

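Let's define the framework: an enum of summary types, a `SummaryTemplate` that turns required and optional fields into a prompt, a `Summary` record that tracks its compression ratio, and templates for the individual summary types: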
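Templates only help if the output actually follows them. A minimal sketch of a conformance check, assuming summaries use the same `Title Case` headings that `get_prompt_template()` in the cell below asks for; `check_template_conformance` is a hypothetical helper, not part of the course package. Note that `max_length` is only an instruction in the prompt, so nothing in the framework enforces it on its own.

```python
import re
from typing import Any, Dict, List


def check_template_conformance(
    summary_text: str, required_fields: List[str], max_length: int
) -> Dict[str, Any]:
    """Report which required headings are missing and whether the length budget holds."""
    missing = []
    for field_name in required_fields:
        # Same heading form the prompt asks for: "key_decisions" -> "Key Decisions".
        heading = field_name.replace("_", " ").title()
        # Case-insensitive, because "current_gpa".title() gives "Current Gpa"
        # while a model (or the mock summary) will usually write "Current GPA".
        if not re.search(re.escape(heading), summary_text, re.IGNORECASE):
            missing.append(field_name)
    return {
        "missing_fields": missing,
        "length": len(summary_text),
        "within_length": len(summary_text) <= max_length,
    }


example = "Key Decisions:\n• Focus on ML courses\n\nMain Topics:\n• Prerequisites"
print(check_template_conformance(example, ["key_decisions", "main_topics", "action_items"], 400))
```

Once the templates exist, the same check can be run as `check_template_conformance(summary.content, template.required_fields, template.max_length)`.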

In [None]:
class SummaryType(Enum):
    """Types of summaries we can create."""
    CONVERSATION = "conversation"
    ACADEMIC_PROGRESS = "academic_progress"
    PREFERENCE_PROFILE = "preference_profile"
    COURSE_EXPLORATION = "course_exploration"
    CAREER_PLANNING = "career_planning"
    SESSION_RECAP = "session_recap"

@dataclass
class SummaryTemplate:
    """Template for structured summaries."""
    summary_type: SummaryType
    required_fields: List[str]
    optional_fields: List[str] = field(default_factory=list)
    max_length: int = 500
    format_instructions: str = ""
    
    def get_prompt_template(self) -> str:
        """Get the prompt template for this summary type."""
        base_prompt = f"Create a {self.summary_type.value} summary with the following structure:\n\n"
        
        # `field_name` avoids shadowing dataclasses.field inside this method
        for field_name in self.required_fields:
            base_prompt += f"• {field_name.replace('_', ' ').title()}: [Required]\n"
        
        for field_name in self.optional_fields:
            base_prompt += f"• {field_name.replace('_', ' ').title()}: [Optional]\n"
        
        base_prompt += f"\nMaximum length: {self.max_length} characters\n"
        
        if self.format_instructions:
            base_prompt += f"\nFormat instructions: {self.format_instructions}\n"
        
        return base_prompt

@dataclass
class Summary:
    """Represents a generated summary."""
    id: str
    summary_type: SummaryType
    content: str
    source_data: List[str]  # IDs of source records
    timestamp: datetime
    student_id: str
    compression_ratio: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)
    
    def calculate_compression_ratio(self, original_length: int):
        """Calculate compression ratio."""
        if original_length > 0:
            self.compression_ratio = len(self.content) / original_length
        else:
            self.compression_ratio = 0.0

# Define summary templates for different types
SUMMARY_TEMPLATES = {
    SummaryType.CONVERSATION: SummaryTemplate(
        summary_type=SummaryType.CONVERSATION,
        required_fields=["key_decisions", "main_topics", "action_items"],
        optional_fields=["questions_asked", "preferences_mentioned"],
        max_length=400,
        format_instructions="Use bullet points for clarity. Focus on actionable insights."
    ),
    
    SummaryType.ACADEMIC_PROGRESS: SummaryTemplate(
        summary_type=SummaryType.ACADEMIC_PROGRESS,
        required_fields=["completed_courses", "current_gpa", "major_progress"],
        optional_fields=["honors_achievements", "academic_goals", "graduation_timeline"],
        max_length=300,
        format_instructions="Include course codes and grades. Highlight major milestones."
    ),
    
    SummaryType.PREFERENCE_PROFILE: SummaryTemplate(
        summary_type=SummaryType.PREFERENCE_PROFILE,
        required_fields=["course_format_preferences", "schedule_preferences", "difficulty_preferences"],
        optional_fields=["subject_interests", "learning_style", "career_interests"],
        max_length=250,
        format_instructions="Consolidate similar preferences. Note any changes over time."
    ),
    
    SummaryType.COURSE_EXPLORATION: SummaryTemplate(
        summary_type=SummaryType.COURSE_EXPLORATION,
        required_fields=["courses_viewed", "search_patterns", "interest_areas"],
        optional_fields=["comparison_criteria", "decision_factors", "rejected_courses"],
        max_length=350,
        format_instructions="Group by subject area. Note selection criteria."
    ),
    
    SummaryType.CAREER_PLANNING: SummaryTemplate(
        summary_type=SummaryType.CAREER_PLANNING,
        required_fields=["career_goals", "target_industries", "skill_development_needs"],
        optional_fields=["internship_interests", "networking_activities", "timeline_goals"],
        max_length=400,
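        format_instructions="Connect career goals to academic planning. Include timeline."
    ),
    
    # SESSION_RECAP is used by the progressive summarizer below; without a template,
    # create_summary() raises ValueError for this type.
    SummaryType.SESSION_RECAP: SummaryTemplate(
        summary_type=SummaryType.SESSION_RECAP,
        required_fields=["session_overview", "key_decisions", "next_steps"],
        optional_fields=["open_questions"],
        max_length=400,
        format_instructions="Merge overlapping points from the input summaries. Keep only decisions and next steps that are still relevant."
    )
}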

print(f"✅ Summarization framework initialized with {len(SUMMARY_TEMPLATES)} templates")

## Context Summarizer Implementation

Now let's create the main summarization engine:
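One detail worth checking before reading the engine: `_generate_ai_summary` states the budget to the model in characters but caps output with `max_tokens=template.max_length // 2`. Using the common rule of thumb of roughly 4 characters per token for English text, a 400-character budget is about 100 tokens, so a cap of 200 tokens leaves about 2x headroom. The model can still exceed the character budget, and when `max_tokens` is reached the output is cut mid-sentence. A minimal sketch of a post-processing step, assuming you want a hard character limit; `approx_tokens` and `trim_to_char_limit` are hypothetical helpers.

```python
import math


def approx_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (~4 chars/token for English)."""
    return math.ceil(num_chars / chars_per_token)


def trim_to_char_limit(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars, preferring to end at a line break."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_newline = cut.rfind("\n")
    # Drop the partial last line (often a half-finished bullet) when possible.
    if last_newline > 0:
        cut = cut[:last_newline]
    return cut.rstrip()


# Compare the character budgets used by the templates with the max_tokens cap.
for budget in (250, 300, 350, 400):
    print(f"{budget} chars ≈ {approx_tokens(budget)} tokens; max_tokens cap = {budget // 2}")
```

Trimming at a line boundary suits the bullet-point format the templates request; for free-form prose you might cut at the last sentence end instead.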

In [None]:
class ContextSummarizer:
    """Intelligent context summarization engine."""
    
    def __init__(self, openai_client: Optional["OpenAI"] = None):  # string annotation works even if openai failed to import
        self.openai_client = openai_client
        self.templates = SUMMARY_TEMPLATES
        self.summarization_stats = {
            "total_summaries": 0,
            "by_type": {},
            "total_compression": 0.0
        }
    
    async def create_summary(self, 
                           summary_type: SummaryType,
                           source_content: List[str],
                           student_id: str,
                           additional_context: str = "") -> Summary:
        """Create a summary of the given content."""
        
        template = self.templates.get(summary_type)
        if not template:
            raise ValueError(f"No template found for summary type: {summary_type}")
        
        # Prepare content for summarization
        combined_content = "\n\n".join(source_content)
        original_length = len(combined_content)
        
        # Generate summary
        if self.openai_client:
            summary_content = await self._generate_ai_summary(
                template, combined_content, additional_context
            )
        else:
            summary_content = self._generate_mock_summary(
                template, combined_content, additional_context
            )
        
        # Create summary object (microseconds in the ID keep IDs unique when several
        # summaries are created within the same second, as in progressive summarization)
        now = datetime.now()
        summary = Summary(
            id=f"{summary_type.value}_{student_id}_{now.strftime('%Y%m%d_%H%M%S_%f')}",
            summary_type=summary_type,
            content=summary_content,
            source_data=[f"content_{i}" for i in range(len(source_content))],
            timestamp=now,
            student_id=student_id
        )
        
        # Calculate compression ratio
        summary.calculate_compression_ratio(original_length)
        
        # Update statistics
        self._update_stats(summary)
        
        return summary
    
    async def _generate_ai_summary(self, 
                                 template: SummaryTemplate, 
                                 content: str, 
                                 additional_context: str) -> str:
        """Generate summary using OpenAI."""
        
        prompt = template.get_prompt_template()
        prompt += f"\nContent to summarize:\n{content}"
        
        if additional_context:
            prompt += f"\nAdditional context: {additional_context}"
        
        prompt += "\n\nGenerate a structured summary following the template above:"
        
        try:
            response = self.openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "You are an expert at creating concise, structured summaries for academic contexts."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=template.max_length // 2,  # ~4 chars/token, so roughly 2x headroom over max_length
                temperature=0.3
            )
            
            return response.choices[0].message.content.strip()
            
        except Exception as e:
            print(f"❌ AI summarization failed: {e}")
            return self._generate_mock_summary(template, content, additional_context)
    
    def _generate_mock_summary(self, 
                             template: SummaryTemplate, 
                             content: str, 
                             additional_context: str) -> str:
        """Generate a mock summary for testing without OpenAI."""
        
        # Extract key information based on template type
        if template.summary_type == SummaryType.CONVERSATION:
            return self._mock_conversation_summary(content)
        elif template.summary_type == SummaryType.ACADEMIC_PROGRESS:
            return self._mock_academic_summary(content)
        elif template.summary_type == SummaryType.PREFERENCE_PROFILE:
            return self._mock_preference_summary(content)
        elif template.summary_type == SummaryType.COURSE_EXPLORATION:
            return self._mock_course_exploration_summary(content)
        elif template.summary_type == SummaryType.CAREER_PLANNING:
            return self._mock_career_summary(content)
        else:
            return f"Mock summary for {template.summary_type.value}: {content[:100]}..."
    
    def _mock_conversation_summary(self, content: str) -> str:
        """Generate mock conversation summary."""
        return """Key Decisions:
• Student decided to focus on machine learning courses
• Chose to prioritize online format due to work schedule

Main Topics:
• Course selection for AI specialization
• Prerequisites and academic planning
• Career goals in data science

Action Items:
• Research CS401 and CS402 course details
• Check prerequisites for advanced courses
• Plan course sequence for next 2 semesters"""
    
    def _mock_academic_summary(self, content: str) -> str:
        """Generate mock academic progress summary."""
        return """Completed Courses:
• CS101: Introduction to Programming (A)
• CS201: Data Structures (B+)
• MATH201: Calculus II (A-)

Current GPA: 3.7

Major Progress:
• Computer Science major: 45% complete
• Core requirements: 8/12 courses completed
• Electives: 2/6 courses completed

Graduation Timeline: Spring 2026 (on track)"""
    
    def _mock_preference_summary(self, content: str) -> str:
        """Generate mock preference profile summary."""
        return """Course Format Preferences:
• Strongly prefers online courses (work schedule)
• Accepts hybrid format for lab courses
• Avoids early morning classes

Schedule Preferences:
• Evening classes preferred (after 6 PM)
• Weekend courses acceptable
• Flexible with asynchronous content

Difficulty Preferences:
• Comfortable with intermediate to advanced courses
• Prefers challenging but manageable workload
• Values practical, hands-on learning"""
    
    def _mock_course_exploration_summary(self, content: str) -> str:
        """Generate mock course exploration summary."""
        return """Courses Viewed:
• CS401: Machine Learning Fundamentals
• CS402: Advanced Machine Learning
• CS403: Deep Learning Applications
• STAT301: Statistical Analysis

Search Patterns:
• Focused on AI/ML related courses
• Interested in practical applications
• Comparing prerequisite requirements

Interest Areas:
• Machine learning and AI
• Data science applications
• Statistical modeling"""
    
    def _mock_career_summary(self, content: str) -> str:
        """Generate mock career planning summary."""
        return """Career Goals:
• Data Scientist at tech company
• Machine Learning Engineer role
• Research opportunities in AI

Target Industries:
• Technology and software
• Healthcare analytics
• Financial services

Skill Development Needs:
• Advanced Python programming
• Statistical modeling expertise
• Cloud computing platforms
• Portfolio development"""
    
    def _update_stats(self, summary: Summary):
        """Update summarization statistics."""
        self.summarization_stats["total_summaries"] += 1
        
        summary_type = summary.summary_type.value
        if summary_type not in self.summarization_stats["by_type"]:
            self.summarization_stats["by_type"][summary_type] = 0
        self.summarization_stats["by_type"][summary_type] += 1
        
        self.summarization_stats["total_compression"] += summary.compression_ratio
    
    def get_summarization_stats(self) -> Dict[str, Any]:
        """Get summarization statistics."""
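        stats = self.summarization_stats.copy()
        stats["by_type"] = dict(stats["by_type"])  # copy the nested dict so callers can't mutate internal stats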
        if stats["total_summaries"] > 0:
            stats["average_compression"] = stats["total_compression"] / stats["total_summaries"]
        else:
            stats["average_compression"] = 0.0
        return stats

# Initialize the context summarizer
context_summarizer = ContextSummarizer(openai_client)

print("✅ Context summarizer initialized")

## Demonstration: Multi-Level Summarization

Let's create sample content and demonstrate different types of summarization:
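Compression ratio only measures size. Learning objective 5 also asks about information preservation; one cheap proxy, assumed here rather than taken from the course package, is to check whether concrete facts from the source (in this case course codes) survive into the summary. `course_code_retention` is a hypothetical helper, and the regex assumes codes look like `CS401` or `MATH201`.

```python
import re
from typing import List, Set, Tuple

# Assumed course-code format: 2-4 uppercase letters followed by 3 digits.
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\d{3}\b")


def course_code_retention(source: List[str], summary: str) -> Tuple[float, Set[str]]:
    """Fraction of course codes in the source that also appear in the summary."""
    source_codes = set(COURSE_CODE.findall(" ".join(source)))
    if not source_codes:
        return 1.0, set()  # nothing to preserve
    kept = source_codes & set(COURSE_CODE.findall(summary))
    return len(kept) / len(source_codes), source_codes - kept


ratio, missing = course_code_retention(
    ["Discussed prerequisites for CS401", "Student has completed CS201 and MATH201"],
    "Key Decisions:\n• Enroll in CS401 next semester",
)
print(f"Retained {ratio:.0%} of course codes; missing: {sorted(missing)}")
```

After the demo below runs, `course_code_retention(sample_content[t], summaries[t].content)` can be printed next to the compression figure for each type. Without an API key the summaries are fixed mock text, so retention then says nothing about the input.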

In [None]:
# Create sample content for different summary types
sample_content = {
    SummaryType.CONVERSATION: [
        "Student asked about machine learning courses available next semester",
        "Discussed prerequisites for CS401 - student has completed CS201 and MATH201",
        "Student expressed preference for online courses due to work schedule",
        "Recommended CS401 as starting point, then CS402 for advanced topics",
        "Student decided to enroll in CS401 and STAT301 for spring semester",
        "Action item: Student will check course availability and register early",
        "Student asked about career prospects in AI and data science",
        "Discussed importance of building portfolio with practical projects"
    ],
    
    SummaryType.ACADEMIC_PROGRESS: [
        "Completed CS101: Introduction to Programming with grade A in Fall 2023",
        "Completed CS201: Data Structures and Algorithms with grade B+ in Spring 2024",
        "Completed MATH201: Calculus II with grade A- in Spring 2024",
        "Currently enrolled in CS301: Database Systems and MATH301: Statistics",
        "Current cumulative GPA: 3.7 with 45 credit hours completed",
        "Computer Science major progress: 8 out of 12 core courses completed",
        "Elective progress: 2 out of 6 required electives completed",
        "On track for graduation in Spring 2026",
        "Dean's List recognition for Spring 2024 semester"
    ],
    
    SummaryType.PREFERENCE_PROFILE: [
        "Student strongly prefers online course format due to full-time work schedule",
        "Prefers evening classes after 6 PM when possible",
        "Comfortable with asynchronous learning and recorded lectures",
        "Avoids early morning classes (before 10 AM)",
        "Interested in hands-on, practical learning over theoretical approaches",
        "Prefers intermediate to advanced difficulty level",
        "Values courses with real-world applications and project-based learning",
        "Open to hybrid format for lab-intensive courses",
        "Prefers smaller class sizes for better interaction"
    ],
    
    SummaryType.COURSE_EXPLORATION: [
        "Searched for 'machine learning' courses multiple times",
        "Viewed detailed information for CS401: Machine Learning Fundamentals",
        "Compared CS401 vs CS402: Advanced Machine Learning",
        "Checked prerequisites for CS403: Deep Learning Applications",
        "Explored STAT301: Statistical Analysis as supporting course",
        "Looked into CS404: Natural Language Processing",
        "Researched course reviews and difficulty ratings",
        "Compared online vs in-person sections for CS401",
        "Added CS401 and STAT301 to course wishlist"
    ],
    
    SummaryType.CAREER_PLANNING: [
        "Student interested in data scientist role at technology companies",
        "Exploring machine learning engineer positions",
        "Considering research opportunities in artificial intelligence",
        "Target industries include tech, healthcare analytics, and finance",
        "Needs to develop advanced Python programming skills",
        "Wants to gain experience with cloud computing platforms",
        "Plans to build portfolio with machine learning projects",
        "Interested in internship opportunities for summer 2025",
        "Considering graduate school for advanced AI research"
    ]
}

print("📚 Sample content created for demonstration")
print(f"📋 Content types: {list(sample_content.keys())}")

# Test each summary type
print("\n🧪 Testing Different Summary Types")
print("=" * 60)

summaries = {}

for summary_type, content_list in sample_content.items():
    print(f"\n🎯 Creating {summary_type.value} summary...")
    
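    # Calculate original content length, joined the same way create_summary() joins it,
    # so the printed lengths match the reported compression ratio
    original_content = "\n\n".join(content_list)
    original_length = len(original_content)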
    
    # Create summary
    summary = await context_summarizer.create_summary(
        summary_type=summary_type,
        source_content=content_list,
        student_id="test_student",
        additional_context="Student is working full-time while pursuing CS degree"
    )
    
    summaries[summary_type] = summary
    
    # Display results
    print(f"📊 Compression: {original_length} → {len(summary.content)} chars ({summary.compression_ratio:.1%})")
    print(f"📝 Summary:")
    print(summary.content)
    print("-" * 50)

## Progressive Summarization

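Progressive summarization handles very long conversations hierarchically: messages are summarized in chunks (level 1), chunk summaries are condensed in groups when there are many of them (level 2), and a single executive summary is produced from the highest level available (level 3):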
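Because every summary is one model call, it helps to know how many calls a conversation will trigger. The small helper below is a hypothetical sketch, not part of the class; it mirrors the defaults used in `ProgressiveSummarizer` (`chunk_size = 10`, `group_size=3`, and level 2 only when there are more than 4 chunk summaries). For the 30-message conversation in the cell below this gives 3 chunk summaries, no level-2 summaries, and one executive summary.

```python
import math
from typing import Dict


def progressive_plan(num_messages: int, chunk_size: int = 10,
                     group_size: int = 3, level_2_threshold: int = 4) -> Dict[str, int]:
    """Count the summaries (model calls) each level of the hierarchy would produce."""
    level_1 = math.ceil(num_messages / chunk_size)
    # Level 2 only runs when there are MORE than `level_2_threshold` chunk summaries.
    level_2 = math.ceil(level_1 / group_size) if level_1 > level_2_threshold else 0
    return {"level_1": level_1, "level_2": level_2, "level_3": 1,
            "total_calls": level_1 + level_2 + 1}


for n in (30, 60, 120):
    print(n, progressive_plan(n))
```

The number of calls grows roughly linearly with conversation length, while the executive summary stays a single, bounded-size summary.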

In [None]:
class ProgressiveSummarizer:
    """Implements progressive summarization for long conversations."""
    
    def __init__(self, context_summarizer: ContextSummarizer):
        self.context_summarizer = context_summarizer
        self.chunk_size = 10  # Number of messages per chunk
        self.summary_levels = {
            "level_1": "detailed",    # Chunk-level summaries
            "level_2": "condensed",   # Multi-chunk summaries
            "level_3": "executive"    # High-level overview
        }
    
    async def create_progressive_summary(self, 
                                       conversation_messages: List[str],
                                       student_id: str) -> Dict[str, Any]:
        """Create progressive summaries at multiple levels."""
        
        total_messages = len(conversation_messages)
        print(f"📊 Processing {total_messages} conversation messages")
        
        # Level 1: Chunk-level summaries
        level_1_summaries = []
        chunks = self._chunk_messages(conversation_messages)
        
        print(f"🔄 Level 1: Creating {len(chunks)} chunk summaries...")
        for i, chunk in enumerate(chunks):
            chunk_summary = await self.context_summarizer.create_summary(
                summary_type=SummaryType.CONVERSATION,
                source_content=chunk,
                student_id=student_id,
                additional_context=f"Conversation chunk {i+1} of {len(chunks)}"
            )
            level_1_summaries.append(chunk_summary)
        
        # Level 2: Multi-chunk summaries (if we have many chunks)
        level_2_summaries = []
        if len(level_1_summaries) > 4:
            print(f"🔄 Level 2: Creating condensed summaries...")
            
            # Group level 1 summaries
            summary_groups = self._group_summaries(level_1_summaries, group_size=3)
            
            for i, group in enumerate(summary_groups):
                group_content = [s.content for s in group]
                condensed_summary = await self.context_summarizer.create_summary(
                    summary_type=SummaryType.SESSION_RECAP,
                    source_content=group_content,
                    student_id=student_id,
                    additional_context=f"Condensed summary group {i+1}"
                )
                level_2_summaries.append(condensed_summary)
        
        # Level 3: Executive summary
        print(f"🔄 Level 3: Creating executive summary...")
        
        # Use level 2 summaries if available, otherwise level 1
        source_summaries = level_2_summaries if level_2_summaries else level_1_summaries
        executive_content = [s.content for s in source_summaries]
        
        executive_summary = await self.context_summarizer.create_summary(
            summary_type=SummaryType.SESSION_RECAP,
            source_content=executive_content,
            student_id=student_id,
            additional_context="Executive summary of entire conversation"
        )
        
        # Calculate overall compression
        original_length = sum(len(msg) for msg in conversation_messages)
        final_length = len(executive_summary.content)
        overall_compression = final_length / original_length if original_length > 0 else 0
        
        return {
            "original_messages": total_messages,
            "original_length": original_length,
            "level_1_summaries": level_1_summaries,
            "level_2_summaries": level_2_summaries,
            "executive_summary": executive_summary,
            "overall_compression": overall_compression,
            "compression_stages": {
                "level_1": len(level_1_summaries),
                "level_2": len(level_2_summaries),
                "level_3": 1
            }
        }
    
    def _chunk_messages(self, messages: List[str]) -> List[List[str]]:
        """Split messages into chunks for processing."""
        chunks = []
        for i in range(0, len(messages), self.chunk_size):
            chunk = messages[i:i + self.chunk_size]
            chunks.append(chunk)
        return chunks
    
    def _group_summaries(self, summaries: List[Summary], group_size: int = 3) -> List[List[Summary]]:
        """Group summaries for higher-level summarization."""
        groups = []
        for i in range(0, len(summaries), group_size):
            group = summaries[i:i + group_size]
            groups.append(group)
        return groups

# Initialize progressive summarizer
progressive_summarizer = ProgressiveSummarizer(context_summarizer)

# Create a long conversation for testing
long_conversation = [
    "Hi, I need help planning my courses for next semester",
    "I'm interested in machine learning and AI courses",
    "What prerequisites do I need for CS401?",
    "I've completed CS201 and MATH201 already",
    "Are there any online sections available?",
    "I work full-time so I need flexible scheduling",
    "What about CS402? Is that too advanced for me?",
    "I want to become a data scientist after graduation",
    "Should I take statistics courses too?",
    "STAT301 looks interesting for data analysis",
    "How difficult is the workload for these courses?",
    "I can dedicate about 20 hours per week to studies",
    "What programming languages will I need to know?",
    "I'm comfortable with Python and Java",
    "Are there any project-based courses?",
    "I learn better with hands-on experience",
    "What about internship opportunities?",
    "I'd like to gain practical experience",
    "Can you help me create a 2-year plan?",
    "I want to graduate by Spring 2026",
    "What electives would complement my major?",
    "I'm also interested in cybersecurity",
    "Should I consider a minor in mathematics?",
    "How important is GPA for data science jobs?",
    "I currently have a 3.7 GPA",
    "What companies recruit from our program?",
    "I'd prefer to work in healthcare or finance",
    "Are there any networking events I should attend?",
    "I want to build professional connections",
    "Thank you for all the helpful advice!"
]

print(f"📚 Created long conversation with {len(long_conversation)} messages")

# Test progressive summarization
print("\n🧪 Testing Progressive Summarization")
print("=" * 60)

progressive_result = await progressive_summarizer.create_progressive_summary(
    long_conversation, "test_student"
)

# Display results
print(f"\n📊 Progressive Summarization Results:")
print(f"   Original: {progressive_result['original_messages']} messages, {progressive_result['original_length']} chars")
print(f"   Level 1: {len(progressive_result['level_1_summaries'])} chunk summaries")
print(f"   Level 2: {len(progressive_result['level_2_summaries'])} condensed summaries")
print(f"   Level 3: 1 executive summary")
print(f"   Overall compression: {progressive_result['overall_compression']:.1%}")

print(f"\n📝 Executive Summary:")
print(progressive_result['executive_summary'].content)

print(f"\n🔍 Sample Level 1 Summary:")
if progressive_result['level_1_summaries']:
    print(progressive_result['level_1_summaries'][0].content[:200] + "...")

## 🧪 Hands-on Exercise: Design Your Summarization Strategy

Now it's your turn to experiment with context summarization:
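`analyze_content_type` in the cell below scores content with a substring test (`keyword in combined_content`), so `work` also matches `workload` and `network`, and each keyword counts at most once however often it appears. If you want to experiment with the scoring, one alternative, sketched here with hypothetical helper names, is to tokenize on word boundaries and count occurrences:

```python
import re
from collections import Counter
from typing import Dict, List


def score_content_types(content: List[str],
                        keywords: Dict[str, List[str]]) -> Dict[str, int]:
    """Count whole-word keyword occurrences per content type."""
    tokens = Counter(re.findall(r"[a-z]+", " ".join(content).lower()))
    return {ctype: sum(tokens[k] for k in words) for ctype, words in keywords.items()}


def dominant_type(scores: Dict[str, int], default: str = "academic") -> str:
    """Pick the highest-scoring type; fall back to `default` when nothing matched."""
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


# Same keyword lists as the exercise cell.
KEYWORDS = {
    "academic": ["course", "grade", "gpa", "semester", "credit", "prerequisite", "graduation"],
    "career": ["job", "career", "internship", "industry", "skill", "experience", "work"],
    "preference": ["prefer", "like", "want", "avoid", "format", "schedule", "online"],
}

scores = score_content_types(["Heavy workload this semester", "Semester grade: A"], KEYWORDS)
print(scores, dominant_type(scores))
```

The trade-off: whole-word matching misses plurals such as `courses`, so add those forms to the keyword lists or apply stemming if that matters for your data.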

In [None]:
# Exercise: Create your own summarization strategy
print("🧪 Exercise: Design Your Context Summarization Strategy")
print("=" * 60)

# TODO: Create a custom summarization approach
class AdaptiveSummarizer:
    """Adaptive summarization that adjusts based on content type and context."""
    
    def __init__(self, context_summarizer: ContextSummarizer):
        self.context_summarizer = context_summarizer
        self.content_analyzers = {
            "academic": self._analyze_academic_content,
            "career": self._analyze_career_content,
            "preference": self._analyze_preference_content
        }
    
    def analyze_content_type(self, content: List[str]) -> str:
        """Analyze content to determine the best summarization approach."""
        combined_content = " ".join(content).lower()
        
        # Count keywords for different content types
        academic_keywords = ["course", "grade", "gpa", "semester", "credit", "prerequisite", "graduation"]
        career_keywords = ["job", "career", "internship", "industry", "skill", "experience", "work"]
        preference_keywords = ["prefer", "like", "want", "avoid", "format", "schedule", "online"]
        
        academic_score = sum(1 for keyword in academic_keywords if keyword in combined_content)
        career_score = sum(1 for keyword in career_keywords if keyword in combined_content)
        preference_score = sum(1 for keyword in preference_keywords if keyword in combined_content)
        
        # Determine dominant content type
        scores = {
            "academic": academic_score,
            "career": career_score,
            "preference": preference_score
        }
        
        return max(scores.items(), key=lambda x: x[1])[0]
    
    async def create_adaptive_summary(self, 
                                    content: List[str], 
                                    student_id: str,
                                    context: str = "") -> Dict[str, Any]:
        """Create summary adapted to content type."""
        
        # Analyze content type
        content_type = self.analyze_content_type(content)
        print(f"🔍 Detected content type: {content_type}")
        
        # Apply content-specific analysis
        analysis = self.content_analyzers[content_type](content)
        
        # Choose appropriate summary type
        summary_type_mapping = {
            "academic": SummaryType.ACADEMIC_PROGRESS,
            "career": SummaryType.CAREER_PLANNING,
            "preference": SummaryType.PREFERENCE_PROFILE
        }
        
        summary_type = summary_type_mapping[content_type]
        
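        # Create enhanced context with analysis (no leading separator if no context was given)
        analysis_note = f"Content analysis: {analysis['summary']}"
        enhanced_context = f"{context}. {analysis_note}" if context else analysis_note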
        
        # Generate summary
        summary = await self.context_summarizer.create_summary(
            summary_type=summary_type,
            source_content=content,
            student_id=student_id,
            additional_context=enhanced_context
        )
        
        return {
            "content_type": content_type,
            "analysis": analysis,
            "summary": summary,
            "adaptation_reason": f"Optimized for {content_type} content"
        }
    
    def _analyze_academic_content(self, content: List[str]) -> Dict[str, Any]:
        """Analyze academic-focused content."""
        courses_mentioned = []
        grades_mentioned = []
        
        for item in content:
            # Simple pattern matching for course codes (CS101, MATH201, etc.);
            # `re` is already imported in the setup cell
            course_pattern = r'\b[A-Z]{2,4}\d{3}\b'
            courses = re.findall(course_pattern, item)
            courses_mentioned.extend(courses)
            
            # Look for grade mentions ("grade A", "grade B+") or GPA references.
            # A bare substring check for 'A'/'B'/'C' would also match course codes like "CS201".
            if re.search(r"\b[Gg]rade\s+[A-DF]|\bGPA\b", item):
                grades_mentioned.append(item)
        
        return {
            "courses_found": list(set(courses_mentioned)),
            "grade_references": len(grades_mentioned),
            "summary": f"Found {len(set(courses_mentioned))} courses and {len(grades_mentioned)} grade references"
        }
    
    def _analyze_career_content(self, content: List[str]) -> Dict[str, Any]:
        """Analyze career-focused content."""
        career_terms = []
        industries = []
        
        career_keywords = ["data scientist", "engineer", "analyst", "developer", "researcher"]
        industry_keywords = ["tech", "healthcare", "finance", "education", "government"]
        
        combined_content = " ".join(content).lower()
        
        for term in career_keywords:
            if term in combined_content:
                career_terms.append(term)
        
        for industry in industry_keywords:
            if industry in combined_content:
                industries.append(industry)
        
        return {
            "career_roles": career_terms,
            "target_industries": industries,
            "summary": f"Identified {len(career_terms)} career roles and {len(industries)} industries"
        }
    
    def _analyze_preference_content(self, content: List[str]) -> Dict[str, Any]:
        """Analyze preference-focused content."""
        preferences = {
            "format": [],
            "schedule": [],
            "difficulty": []
        }
        
        for item in content:
            item_lower = item.lower()
            
            if any(word in item_lower for word in ["online", "hybrid", "in-person"]):
                preferences["format"].append(item)
            
            if any(word in item_lower for word in ["morning", "evening", "weekend", "schedule"]):
                preferences["schedule"].append(item)
            
            if any(word in item_lower for word in ["easy", "difficult", "challenging", "advanced"]):
                preferences["difficulty"].append(item)
        
        return {
            "preference_categories": {k: len(v) for k, v in preferences.items()},
            "total_preferences": sum(len(v) for v in preferences.values()),
            "summary": f"Found preferences in {len([k for k, v in preferences.items() if v])} categories"
        }

# Test adaptive summarization
adaptive_summarizer = AdaptiveSummarizer(context_summarizer)

# Test with different content types
test_contents = {
    "Academic Content": [
        "Completed CS201 with grade A last semester",
        "Currently enrolled in CS301 and MATH301",
        "Need to maintain 3.5 GPA for scholarship",
        "Planning to take CS401 next semester"
    ],
    "Career Content": [
        "Interested in data scientist positions",
        "Want to work in healthcare or tech industry",
        "Need to develop machine learning skills",
        "Looking for internship opportunities"
    ],
    "Preference Content": [
        "Prefer online courses due to work schedule",
        "Like evening classes after 6 PM",
        "Avoid early morning sessions",
        "Comfortable with challenging coursework"
    ]
}

print("\n🎯 Testing Adaptive Summarization:")
for content_name, content_list in test_contents.items():
    print(f"\n📝 {content_name}:")
    
    result = await adaptive_summarizer.create_adaptive_summary(
        content_list, "test_student", "Student working toward CS degree"
    )
    
    print(f"   Content Type: {result['content_type']}")
    print(f"   Analysis: {result['analysis']['summary']}")
    print(f"   Compression: {result['summary'].compression_ratio:.1%}")
    print(f"   Adaptation: {result['adaptation_reason']}")

print("\n🤔 Reflection Questions:")
print("1. How does adaptive summarization improve information preservation?")
print("2. What are the trade-offs between generic and specialized summaries?")
print("3. How would you handle mixed content types in a single conversation?")
print("4. What other content analysis techniques could improve summarization?")

print("\n🔧 Your Turn: Try These Modifications:")
print("   • Add sentiment analysis to summaries")
print("   • Implement priority-based summarization")
print("   • Create domain-specific summary templates")
print("   • Add user feedback to improve summary quality")
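
Reflection question 3 asks how to handle mixed content types in one conversation. One possible approach, sketched here under the assumption that simple keyword scoring is good enough, is to classify each item on its own and group the items before summarizing. `CONTENT_KEYWORDS`, `classify_item`, and `group_by_content_type` are hypothetical names, and the keyword lists are illustrative rather than the ones `AdaptiveSummarizer` uses internally.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Illustrative keyword lists (hypothetical; tune them for real data)
CONTENT_KEYWORDS: Dict[str, List[str]] = {
    "academic": ["course", "grade", "gpa", "semester", "enrolled"],
    "career": ["career", "internship", "job", "industry", "scientist", "engineer"],
    "preference": ["prefer", "like", "avoid", "online", "evening", "morning"],
}
COURSE_CODE = re.compile(r"\b[A-Z]{2,4}\d{3}\b")


def classify_item(item: str) -> str:
    """Assign one item to the content type with the most keyword hits."""
    text = item.lower()
    scores = {
        ctype: sum(1 for kw in keywords if kw in text)
        for ctype, keywords in CONTENT_KEYWORDS.items()
    }
    # Course codes are treated as a strong academic signal
    scores["academic"] += len(COURSE_CODE.findall(item))
    best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_type if best_score > 0 else "general"


def group_by_content_type(items: List[str]) -> Dict[str, List[str]]:
    """Split a mixed conversation into per-type buckets, preserving item order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[classify_item(item)].append(item)
    return dict(groups)


mixed = [
    "Completed CS201 with grade A last semester",
    "Interested in data scientist positions",
    "Prefer online courses due to work schedule",
    "Looking for internship opportunities",
]
for content_type, items in group_by_content_type(mixed).items():
    print(content_type, items)
```

Each bucket could then be summarized separately (for example with `create_adaptive_summary`), so a mixed conversation yields one specialized summary per content type instead of a single generic one.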

## Key Takeaways

From this exploration of context summarization, you've learned:

### 🎯 **Core Concepts**
- **Summarization vs. pruning**: Compression vs. removal strategies
- **Multi-level summarization**: Different granularities for different needs
- **Structured templates**: Consistent format for different information types
- **Progressive compression**: Hierarchical summarization for long content
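
To make progressive compression concrete, here is a minimal, self-contained sketch of hierarchical summarization. It assumes a hypothetical `toy_summarize` function standing in for the LLM call (it simply keeps the first few words of its input), so only the control flow is meaningful: summarize fixed-size chunks, then summarize the summaries, and repeat until the result fits a word budget.

```python
from typing import Callable, List


def toy_summarize(texts: List[str], max_words: int = 12) -> str:
    """Stand-in for an LLM call: join the texts and keep only the first max_words words."""
    words = " ".join(texts).split()
    return " ".join(words[:max_words])


def progressive_summarize(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    chunk_size: int = 4,
    budget_words: int = 20,
) -> List[str]:
    """Summarize chunks, then summaries of summaries, until the total fits budget_words."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 for each level to shrink")

    def total_words(items: List[str]) -> int:
        return sum(len(item.split()) for item in items)

    level = list(messages)
    while len(level) > 1 and total_words(level) > budget_words:
        # Each pass replaces every chunk of consecutive items with one summary
        level = [summarize(level[i:i + chunk_size]) for i in range(0, len(level), chunk_size)]
    if total_words(level) > budget_words:
        # A single oversized item gets one final summarization pass
        level = [summarize(level)]
    return level


conversation = [
    f"Message {i}: student discussed course planning and schedule options"
    for i in range(1, 17)
]
compressed = progressive_summarize(conversation, toy_summarize)
print(len(compressed), sum(len(m.split()) for m in compressed))
```

With 16 messages and `chunk_size=4`, the first pass produces 4 chunk summaries and the second pass merges them into one, so the number of LLM calls grows roughly linearly with conversation length while each call sees only a bounded amount of text.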

### 🛠️ **Implementation Patterns**
- **Template-based summarization** for consistent structure
- **Content-type adaptation** for specialized summarization
- **Progressive chunking** for handling long conversations
- **Compression ratio tracking** for performance monitoring
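
Template-based summarization can be as simple as a fixed, ordered list of sections per summary type that every summary must fill in. The sketch below uses hypothetical names (`SUMMARY_TEMPLATES`, `render_summary`) and section titles chosen for illustration; in the notebook, the section contents would come from the LLM rather than being passed in by hand.

```python
from typing import Dict, List

# Hypothetical templates: the ordered sections each summary type must contain
SUMMARY_TEMPLATES: Dict[str, List[str]] = {
    "conversation": ["Key Decisions", "Action Items", "Open Questions"],
    "academic_progress": ["Completed Courses", "Current Courses", "Goals"],
    "preferences": ["Format", "Schedule", "Difficulty"],
}


def render_summary(summary_type: str, sections: Dict[str, List[str]]) -> str:
    """Render section contents in the template's fixed order, rejecting gaps and extras."""
    template = SUMMARY_TEMPLATES[summary_type]
    missing = [name for name in template if name not in sections]
    unexpected = [name for name in sections if name not in template]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    lines = []
    for name in template:
        lines.append(f"## {name}")
        # An explicit placeholder keeps empty sections visible instead of silently dropping them
        items = sections[name] or ["(none)"]
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(render_summary("preferences", {
    "Schedule": ["Evening classes after 6 PM"],
    "Format": ["Online"],
    "Difficulty": [],
}))
```

Because the section order comes from the template rather than from the input, every summary of a given type has the same shape, which makes summaries easier to compare, diff, and parse later.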

### 📊 **Performance Benefits**
- **Significant compression** (the exact reduction depends on the content, so measure it rather than assume it)
- **Information preservation** of key decisions and insights
- **Structured output** for easy consumption and retrieval
- **Scalable processing** of long conversations through progressive chunking
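
Because the achievable compression depends heavily on the content, it is worth measuring. A minimal sketch, assuming whitespace-separated words as a rough proxy for tokens (the model's own tokenizer would give more accurate counts). `compression_stats` is a hypothetical helper, and its convention (ratio = summary size / original size) may differ from the one behind the notebook's `compression_ratio` field.

```python
from typing import Dict, List


def word_count(texts: List[str]) -> int:
    """Rough size proxy: number of whitespace-separated words."""
    return sum(len(t.split()) for t in texts)


def compression_stats(original: List[str], summary: str) -> Dict[str, float]:
    """Compare summary size with the original; ratio = summary / original."""
    original_words = word_count(original)
    if original_words == 0:
        raise ValueError("original content is empty")
    summary_words = word_count([summary])
    ratio = summary_words / original_words
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "ratio": ratio,
        "reduction": 1 - ratio,
    }


original = [
    "Prefer online courses due to work schedule",
    "Like evening classes after 6 PM",
    "Avoid early morning sessions",
    "Comfortable with challenging coursework",
]
stats = compression_stats(original, "Online, evening classes; open to challenging work")
print(f"ratio={stats['ratio']:.1%}, reduction={stats['reduction']:.1%}")  # ratio=33.3%, reduction=66.7%
```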

### 🔄 **Summarization Strategies**
- **Conversation summaries**: Key decisions and action items
- **Academic progress**: Courses, grades, and milestones
- **Preference profiles**: Consolidated student preferences
- **Progressive summarization**: Multi-level compression
- **Adaptive summarization**: Content-type specific approaches
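
Consolidating a preference profile mostly means resolving conflicts between statements made at different times. One simple policy, sketched here as an assumption rather than the notebook's method, is "latest statement wins" per preference category; `PreferenceStatement` and `consolidate_preferences` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class PreferenceStatement:
    category: str        # e.g. "format", "schedule", "difficulty"
    value: str           # the stated preference
    stated_at: datetime  # when the student said it


def consolidate_preferences(statements: List[PreferenceStatement]) -> Dict[str, str]:
    """Keep only the most recent statement for each category."""
    latest: Dict[str, PreferenceStatement] = {}
    for stmt in statements:
        current = latest.get(stmt.category)
        if current is None or stmt.stated_at > current.stated_at:
            latest[stmt.category] = stmt
    return {category: stmt.value for category, stmt in latest.items()}


history = [
    PreferenceStatement("format", "in-person", datetime(2024, 1, 10)),
    PreferenceStatement("schedule", "evening classes after 6 PM", datetime(2024, 1, 12)),
    PreferenceStatement("format", "online (started a job)", datetime(2024, 3, 2)),
]
print(consolidate_preferences(history))
```

The consolidated profile is far smaller than the raw history, but it drops the reason a preference changed; if that reason matters, it belongs in the conversation summary.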

### 📈 **Quality Factors**
- **Completeness**: All important information preserved
- **Accuracy**: Faithful representation of original content
- **Conciseness**: Maximum compression with minimal loss
- **Structure**: Organized format for easy consumption
- **Relevance**: Focus on actionable and important information
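
Completeness and accuracy are harder to measure than compression, but a cheap proxy is to check whether specific, verifiable facts from the source survive in the summary. The sketch below assumes that course codes and numbers (such as a GPA threshold) are the facts that matter; `key_facts` and `fact_recall` are hypothetical helpers, and a low score flags a summary for review rather than proving it wrong.

```python
import re
from typing import List, Set

# Course codes (e.g. CS201) and numbers (e.g. 3.5): an illustrative choice of "key facts"
FACT_PATTERN = re.compile(r"\b[A-Z]{2,4}\d{3}\b|\b\d+(?:\.\d+)?\b")


def key_facts(texts: List[str]) -> Set[str]:
    """Extract the set of verifiable tokens from a list of texts."""
    facts: Set[str] = set()
    for text in texts:
        facts.update(FACT_PATTERN.findall(text))
    return facts


def fact_recall(source: List[str], summary: str) -> float:
    """Fraction of the source's key facts that also appear in the summary."""
    source_facts = key_facts(source)
    if not source_facts:
        return 1.0  # nothing to preserve
    return len(source_facts & key_facts([summary])) / len(source_facts)


source = [
    "Completed CS201 with grade A last semester",
    "Currently enrolled in CS301 and MATH301",
    "Need to maintain 3.5 GPA for scholarship",
    "Planning to take CS401 next semester",
]
good = "Done: CS201 (A). Now: CS301, MATH301. Next: CS401. Keep GPA above 3.5."
lossy = "Completed CS201; taking CS301 and MATH301."
print(f"good: {fact_recall(source, good):.0%}, lossy: {fact_recall(source, lossy):.0%}")  # good: 100%, lossy: 60%
```

The lossy summary drops the planned course and the GPA requirement, which is exactly the kind of decision-relevant detail a summarizer should preserve.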

### 🚀 **Next Steps**
In the next notebook, we'll explore **Context Offloading** - how to move information out of the main context window into external storage systems while maintaining intelligent access patterns.

The summarization techniques you've learned provide the foundation for creating compact, structured representations that can be efficiently stored and retrieved.

---

**Ready to continue?** Move on to `05_context_offloading.ipynb` to learn about external context storage and scratchpad patterns!