# Story Generation Exercises

Practice implementing custom renderers, metadata tracking, and story generation patterns.

## Exercise 1: Build Custom CSV Renderer

Create a `CSVStoryRenderer` that exports pipeline story metadata as CSV.

Requirements:
- Export one row per node execution
- Include columns: node_name, status, duration, rows_in, rows_out, operation
- Include a summary row with pipeline totals
- Implement `render()` and `render_to_file()` methods
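
As a warm-up for the hint in the cell below, here is a minimal sketch of the `csv.DictWriter` plus `StringIO` pattern on its own. It uses hand-written dicts rather than the real metadata objects, and the `rows_to_csv` helper and the layout of the `TOTAL` row are illustrative assumptions, not part of odibi.

```python
import csv
from io import StringIO

COLUMNS = ["node_name", "status", "duration", "rows_in", "rows_out", "operation"]

def rows_to_csv(rows):
    """Serialize a list of dicts to a CSV string with a header row."""
    buffer = StringIO()
    # lineterminator="\n" avoids the default "\r\n" line endings
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical stand-in rows; in the exercise these values would come
# from the node metadata instead.
sample_rows = [
    {"node_name": "extract", "status": "success", "duration": 3.5,
     "rows_in": 0, "rows_out": 1000, "operation": "read_csv"},
    {"node_name": "transform", "status": "success", "duration": 2.0,
     "rows_in": 1000, "rows_out": 850, "operation": "filter"},
]

# One possible summary row (an assumption): summed duration,
# rows_in of the first node, rows_out of the last node.
sample_rows.append({
    "node_name": "TOTAL",
    "status": "",
    "duration": sum(r["duration"] for r in sample_rows),
    "rows_in": sample_rows[0]["rows_in"],
    "rows_out": sample_rows[-1]["rows_out"],
    "operation": "",
})

csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

What counts as a "pipeline total" is a design decision; summing `rows_out` across nodes, for example, would double-count rows that flow through several nodes.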

In [None]:
from odibi.story.metadata import PipelineStoryMetadata, NodeExecutionMetadata
import csv
from io import StringIO
from pathlib import Path

class CSVStoryRenderer:
    """Renders pipeline stories as CSV."""
    
    def render(self, metadata: PipelineStoryMetadata) -> str:
        """Render story as CSV string."""
        # TODO: Implement CSV rendering
        # Hint: Use csv.DictWriter with StringIO
        pass
    
    def render_to_file(self, metadata: PipelineStoryMetadata, output_path: str) -> str:
        """Render story and save to CSV file."""
        # TODO: Implement file writing
        pass

# Test your implementation
test_metadata = PipelineStoryMetadata(
    pipeline_name="test_pipeline",
    duration=10.5
)

test_metadata.add_node(NodeExecutionMetadata(
    node_name="extract",
    operation="read_csv",
    status="success",
    duration=3.5,
    rows_in=0,
    rows_out=1000
))

test_metadata.add_node(NodeExecutionMetadata(
    node_name="transform",
    operation="filter",
    status="success",
    duration=2.0,
    rows_in=1000,
    rows_out=850
))

renderer = CSVStoryRenderer()
csv_output = renderer.render(test_metadata)
print(csv_output)

## Exercise 2: Add New Metadata Fields

Extend `NodeExecutionMetadata` to track memory usage.

Requirements:
- Add `memory_mb` field (memory used in MB)
- Add `peak_memory_mb` field (peak memory during execution)
- Add method `calculate_memory_efficiency()` that returns rows_out / memory_mb
- Update `to_dict()` to include new fields
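
The ratio in the third requirement is undefined when either value is missing or when `memory_mb` is zero. The sketch below shows one way to handle that on a small, hypothetical `MemorySample` dataclass (not the odibi class); returning `None` for the undefined case is an assumption, chosen to match the `Optional[float]` return type in the scaffold.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class MemorySample:
    """Hypothetical stand-in for a metadata class with memory fields."""
    rows_out: Optional[int] = None
    memory_mb: Optional[float] = None
    peak_memory_mb: Optional[float] = None

    def rows_per_mb(self) -> Optional[float]:
        """Return rows_out / memory_mb, or None when the ratio is undefined."""
        if self.rows_out is None or not self.memory_mb:
            return None  # missing data or zero memory: no meaningful ratio
        return self.rows_out / self.memory_mb

sample = MemorySample(rows_out=100000, memory_mb=250.0, peak_memory_mb=400.0)
print(sample.rows_per_mb())
print(asdict(sample))  # dataclasses.asdict picks up every field automatically
```

Using `dataclasses.asdict` inside `to_dict()` means newly added fields are exported without further changes, at the cost of less control over the output shape.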

In [None]:
from dataclasses import dataclass, field
from typing import Optional, List, Dict, Any

@dataclass
class EnhancedNodeExecutionMetadata:
    """Extended metadata with memory tracking."""
    
    node_name: str
    operation: str
    status: str
    duration: float
    
    # Existing fields
    rows_in: Optional[int] = None
    rows_out: Optional[int] = None
    rows_change: Optional[int] = None
    rows_change_pct: Optional[float] = None
    
    # TODO: Add memory fields
    
    def calculate_row_change(self):
        """Calculate row count change metrics."""
        if self.rows_in is not None and self.rows_out is not None:
            self.rows_change = self.rows_out - self.rows_in
            if self.rows_in > 0:
                self.rows_change_pct = (self.rows_change / self.rows_in) * 100
            else:
                self.rows_change_pct = 0.0 if self.rows_out == 0 else 100.0
    
    def calculate_memory_efficiency(self) -> Optional[float]:
        """Calculate rows processed per MB of memory."""
        # TODO: Implement memory efficiency calculation
        pass
    
    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        # TODO: Include all fields including new memory fields
        pass

# Test your implementation
node = EnhancedNodeExecutionMetadata(
    node_name="memory_test",
    operation="transform",
    status="success",
    duration=5.0,
    rows_out=100000,
    # TODO: Add memory values
)

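import json

efficiency = node.calculate_memory_efficiency()
# Guard against None (method not implemented yet, or memory values missing)
if efficiency is not None:
    print(f"Memory Efficiency: {efficiency:.2f} rows/MB")
else:
    print("Memory Efficiency: n/a")
print("\nMetadata Dict:")
print(json.dumps(node.to_dict(), indent=2))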

## Exercise 3: Create HTML Theme

Create a custom theme for your organization with specific branding.

Requirements:
- Use your organization's color palette
- Include company logo URL
- Add custom CSS for:
  - Rounded corners on cards
  - Gradient background on headers
  - Custom hover effects
- Include company name and footer text
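
The three custom CSS requirements can be prototyped as a plain string before wiring them into a theme. The sketch below is standalone Python and does not use `StoryTheme`; the `build_custom_css` helper, the `.card` and `.header` selectors, and the colors are all hypothetical, so check the selectors the story HTML template actually uses.

```python
def build_custom_css(primary: str, accent: str, radius_px: int = 8) -> str:
    """Return a CSS snippet; the selectors (.card, .header) are hypothetical."""
    return f"""
.card {{
    border-radius: {radius_px}px;
    transition: box-shadow 0.2s ease;
}}
.card:hover {{
    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
}}
.header {{
    background: linear-gradient(90deg, {primary}, {accent});
}}
""".strip()

custom_css = build_custom_css("#003366", "#0099cc")
print(custom_css)
```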

In [None]:
from odibi.story.themes import StoryTheme

def create_organization_theme():
    """Create a branded theme for your organization."""
    
    # TODO: Define your organization's theme
    theme = StoryTheme(
        name="my_org_theme",
        # TODO: Add color scheme
        # TODO: Add typography
        # TODO: Add branding
        # TODO: Add custom CSS
    )
    
    return theme

# Test your theme
my_theme = create_organization_theme()
print("Theme CSS:")
print(my_theme.to_css_string())
print("\nCSS Variables:")
for var, value in my_theme.to_css_vars().items():
    print(f"  {var}: {value}")

## Exercise 4: Implement Story Diff

Create a `StoryDiff` class that compares two pipeline runs and generates a diff report.

Requirements:
- Compare two `PipelineStoryMetadata` objects
- Identify:
  - Performance changes (duration differences)
  - Data volume changes (row count differences)
  - New/removed nodes
  - Status changes (success -> failure, etc.)
- Generate a markdown report highlighting differences
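
Set operations on node names cover most of the "identify" items above. The following sketch works on plain dicts keyed by node name rather than on `PipelineStoryMetadata` objects; the `diff_nodes` helper and the shape of its result are assumptions made for illustration.

```python
def diff_nodes(before: dict, after: dict) -> dict:
    """Compare two {node_name: {"status": ..., "duration": ...}} mappings."""
    before_names, after_names = set(before), set(after)
    common = before_names & after_names
    return {
        "added": sorted(after_names - before_names),
        "removed": sorted(before_names - after_names),
        "status_changed": sorted(
            n for n in common if before[n]["status"] != after[n]["status"]
        ),
        # Negative delta means the node got faster in the second run
        "duration_delta": {
            n: after[n]["duration"] - before[n]["duration"] for n in sorted(common)
        },
    }

before = {
    "extract": {"status": "success", "duration": 3.0},
    "transform": {"status": "success", "duration": 7.0},
}
after = {
    "extract": {"status": "success", "duration": 2.5},
    "transform": {"status": "failed", "duration": 5.5},
    "load": {"status": "success", "duration": 1.0},
}

result = diff_nodes(before, after)
print(result)
```

Row count differences can be handled the same way as `duration_delta`, with an extra check for nodes where `rows_in` or `rows_out` is missing.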

In [None]:
from typing import Any, Dict, List, Tuple

class StoryDiff:
    """Compare two pipeline story executions."""
    
    def __init__(self, before: PipelineStoryMetadata, after: PipelineStoryMetadata):
        self.before = before
        self.after = after
    
    def compare_performance(self) -> Dict[str, Any]:
        """Compare overall performance metrics."""
        # TODO: Compare durations, success rates, etc.
        pass
    
    def compare_nodes(self) -> Dict[str, List[str]]:
        """Identify node changes."""
        # TODO: Find added, removed, and changed nodes
        pass
    
    def compare_data_volume(self) -> Dict[str, Any]:
        """Compare data processing volumes."""
        # TODO: Compare row counts across pipeline
        pass
    
    def generate_report(self) -> str:
        """Generate markdown diff report."""
        # TODO: Create comprehensive diff report
        pass

# Test with two different runs
run1 = PipelineStoryMetadata(
    pipeline_name="test",
    duration=10.0,
    started_at="2024-01-15T10:00:00"
)
run1.add_node(NodeExecutionMetadata(
    node_name="extract",
    operation="read",
    status="success",
    duration=3.0,
    rows_out=1000
))
run1.add_node(NodeExecutionMetadata(
    node_name="transform",
    operation="filter",
    status="success",
    duration=7.0,
    rows_in=1000,
    rows_out=900
))

run2 = PipelineStoryMetadata(
    pipeline_name="test",
    duration=8.0,
    started_at="2024-01-15T11:00:00"
)
run2.add_node(NodeExecutionMetadata(
    node_name="extract",
    operation="read",
    status="success",
    duration=2.5,
    rows_out=1000
))
run2.add_node(NodeExecutionMetadata(
    node_name="transform",
    operation="filter",
    status="success",
    duration=5.5,
    rows_in=1000,
    rows_out=900
))

diff = StoryDiff(run1, run2)
report = diff.generate_report()
print(report)

## Exercise 5: Story Analytics Dashboard

Create a `StoryAnalyzer` class that analyzes multiple story files and generates summary statistics.

Requirements:
- Read multiple JSON story files from a directory
- Calculate aggregate metrics:
  - Average pipeline duration
  - Most common failure points
  - Slowest nodes across all runs
  - Total data processed
- Generate a summary report with trends
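
The aggregate metrics map naturally onto `Counter` and `defaultdict`. The sketch below is one possible approach on plain story dicts, assuming the keys used in this exercise's sample data (`duration`, `total_rows_processed`, `nodes`, `node_name`, `status`). The `load_stories` and `summarize` helpers and the `run_{i}.json` file names are hypothetical, and ranking nodes by average duration is just one reading of "slowest".

```python
import json
import tempfile
from collections import Counter, defaultdict
from pathlib import Path

def load_stories(directory):
    """Parse every *.json file directly inside `directory`, in name order."""
    paths = sorted(Path(directory).glob("*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]

def summarize(stories):
    """Aggregate pipeline-level and node-level metrics across runs."""
    failures = Counter()
    durations = defaultdict(list)
    for story in stories:
        for node in story.get("nodes", []):
            durations[node["node_name"]].append(node["duration"])
            if node["status"] == "failed":
                failures[node["node_name"]] += 1
    avg_by_node = {name: sum(v) / len(v) for name, v in durations.items()}
    return {
        "average_duration": (
            sum(s["duration"] for s in stories) / len(stories) if stories else 0.0
        ),
        "failure_hotspots": failures.most_common(),
        "slowest_nodes": sorted(
            avg_by_node.items(), key=lambda kv: kv[1], reverse=True
        ),
        "total_rows": sum(s.get("total_rows_processed", 0) for s in stories),
    }

# Round-trip three sample stories through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        story = {
            "pipeline_name": "test",
            "duration": 10.0 + i,
            "total_rows_processed": 5000,
            "nodes": [
                {"node_name": "extract", "status": "success", "duration": 3.0},
                {"node_name": "transform",
                 "status": "failed" if i == 1 else "success", "duration": 7.0},
            ],
        }
        (Path(tmp) / f"run_{i}.json").write_text(json.dumps(story), encoding="utf-8")
    summary = summarize(load_stories(tmp))

print(summary)
```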

In [None]:
from pathlib import Path
from collections import Counter, defaultdict
from typing import Dict, List, Tuple
import json

class StoryAnalyzer:
    """Analyze multiple pipeline story executions."""
    
    def __init__(self):
        self.stories: List[Dict] = []
    
    def load_from_directory(self, directory: str):
        """Load all JSON story files from directory."""
        # TODO: Read all .json files and parse them
        pass
    
    def add_story(self, story_dict: Dict):
        """Add a story to the analysis."""
        self.stories.append(story_dict)
    
    def get_average_duration(self) -> float:
        """Calculate average pipeline duration."""
        # TODO: Calculate average
        pass
    
    def get_failure_hotspots(self) -> List[Tuple[str, int]]:
        """Find nodes that fail most frequently."""
        # TODO: Count failures by node name
        pass
    
    def get_slowest_nodes(self, top_n: int = 5) -> List[Tuple[str, float]]:
        """Find slowest nodes across all runs."""
        # TODO: Aggregate node durations
        pass
    
    def get_total_data_processed(self) -> int:
        """Calculate total rows processed across all runs."""
        # TODO: Sum total_rows_processed across all stories
        pass
    
    def generate_dashboard(self) -> str:
        """Generate markdown analytics dashboard."""
        # TODO: Create comprehensive analytics report
        pass

# Test with sample data
analyzer = StoryAnalyzer()

# Add sample stories
for i in range(3):
    sample_story = {
        "pipeline_name": "test",
        "duration": 10.0 + i,
        "total_rows_processed": 5000,
        "nodes": [
            {"node_name": "extract", "status": "success", "duration": 3.0},
            {"node_name": "transform", "status": "failed" if i == 1 else "success", "duration": 7.0}
        ]
    }
    analyzer.add_story(sample_story)

dashboard = analyzer.generate_dashboard()
print(dashboard)

## Bonus Exercise: Interactive Story Viewer

Create a simple interactive story viewer using `ipywidgets` (works only when running in Jupyter).

Requirements:
- Dropdown to select from multiple story files
- Display story metadata in formatted output
- Show node-by-node breakdown
- Include expandable sections for details

In [None]:
# Optional: Requires ipywidgets
try:
    from ipywidgets import interact, Dropdown, Output
    from IPython.display import display, HTML, Markdown
    
    def create_story_viewer(stories: List[PipelineStoryMetadata]):
        """Create interactive story viewer."""
        # TODO: Implement interactive viewer
        pass
    
    # Test implementation
    print("Interactive viewer implementation here")
    
except ImportError:
    print("Interactive viewer requires: pip install ipywidgets")