# 11 - CLI Story Tools: Generate, Diff, List

## 🧭 Goal

Master ODIBI's CLI story commands to generate, compare, and manage pipeline execution stories.

This notebook will:
- Demonstrate `odibi story generate`, `odibi story list`, and `odibi story diff` commands
- Use subprocess to call CLI commands (with fallback to Python functions)
- Show how to compare pipeline runs across configuration changes
- Practice the story diffing workflow for pipeline debugging
- Track changes between pipeline executions

**Estimated runtime:** under 30 seconds

---

## 🧱 Core Concepts

**Story Generation Workflow:**
```bash
# Generate a story from a pipeline run
odibi story generate --run-dir ./runs/run_1 --output story.json

# List all available stories
odibi story list --runs-dir ./runs

# Compare two story runs
odibi story diff --run-1 ./runs/run_1 --run-2 ./runs/run_2
```

**Why Stories Matter:**
- Track what changed between pipeline runs
- Debug configuration changes
- Understand data transformations
- Audit pipeline behavior

## 🔧 Setup

In [None]:
# ✅ Environment Setup
import os
import json
import subprocess
import time
from pathlib import Path

# Navigate to project root
project_root = Path.cwd().parent if Path.cwd().name == 'walkthroughs' else Path.cwd()
os.chdir(project_root)

# Create artifacts directory
artifacts_dir = Path('walkthroughs/.artifacts/11_cli')
artifacts_dir.mkdir(parents=True, exist_ok=True)

# Create run directories
run1_dir = artifacts_dir / 'runs' / 'run_1'
run2_dir = artifacts_dir / 'runs' / 'run_2'
run1_dir.mkdir(parents=True, exist_ok=True)
run2_dir.mkdir(parents=True, exist_ok=True)

# Check if CLI is available
def check_cli():
    """Check if odibi CLI is available."""
    try:
        result = subprocess.run(['odibi', '--version'], capture_output=True, text=True, timeout=5)
        return result.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False

cli_available = check_cli()

print("✅ Environment ready")
print(f"📁 Artifacts: {artifacts_dir}")
print(f"🔧 CLI available: {cli_available}")
if not cli_available:
    print("⚠️  Will use Python fallback functions")
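
The availability check above launches `odibi --version` as a subprocess. As a lighter pre-check, the standard library's `shutil.which` reports whether an executable is on `PATH` without running it. The sketch below is illustrative only: `cli_on_path` is a hypothetical helper name, not part of ODIBI, and locating the executable does not prove that the `story` subcommands work.

```python
import shutil


def cli_on_path(name: str = "odibi") -> bool:
    """Return True if an executable called `name` is found on PATH.

    Hypothetical helper for illustration. It only locates the file and
    never runs it, so it cannot detect a broken installation.
    """
    return shutil.which(name) is not None


print(f"odibi on PATH: {cli_on_path()}")
```

A pre-check like this can skip the subprocess call entirely when the command is absent, while the `--version` call remains useful for confirming that the installed CLI actually starts.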

## 🎨 Create: Sample Pipeline Configurations

In [None]:
# Create first configuration (conservative threshold)
config_v1 = {
    "pipeline": "sales_analysis",
    "version": "1.0",
    "parameters": {
        "threshold": 100,
        "filter_type": "greater_than",
        "aggregation": "sum"
    },
    "nodes": [
        {"id": "load", "operation": "load_csv", "params": {"path": "data.csv"}},
        {"id": "filter", "operation": "filter_threshold", "params": {"threshold": 100}},
        {"id": "aggregate", "operation": "group_by", "params": {"by": "category"}}
    ]
}

# Create second configuration (higher threshold)
config_v2 = {
    "pipeline": "sales_analysis",
    "version": "2.0",
    "parameters": {
        "threshold": 250,  # Changed!
        "filter_type": "greater_than",
        "aggregation": "sum"
    },
    "nodes": [
        {"id": "load", "operation": "load_csv", "params": {"path": "data.csv"}},
        {"id": "filter", "operation": "filter_threshold", "params": {"threshold": 250}},  # Changed!
        {"id": "aggregate", "operation": "group_by", "params": {"by": "category"}}
    ]
}

# Save configurations
with open(artifacts_dir / 'config_v1.yaml', 'w') as f:
    f.write(f"# Configuration v1.0 - Threshold: {config_v1['parameters']['threshold']}\n")
    json.dump(config_v1, f, indent=2)

with open(artifacts_dir / 'config_v2.yaml', 'w') as f:
    f.write(f"# Configuration v2.0 - Threshold: {config_v2['parameters']['threshold']}\n")
    json.dump(config_v2, f, indent=2)

print("📝 Created pipeline configurations:")
print(f"   v1: threshold={config_v1['parameters']['threshold']}")
print(f"   v2: threshold={config_v2['parameters']['threshold']}")
print(f"\n✅ Configs saved to {artifacts_dir}")
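
The files written above carry a `.yaml` extension but contain a `#` comment line followed by JSON. To read one back with only the standard library, one option is to drop the comment lines and parse the rest as JSON. This is a minimal sketch, assuming comments occupy whole lines that start with `#`. `load_config` is a hypothetical helper, and the demo writes to a temporary directory rather than the walkthrough's artifacts.

```python
import json
import tempfile
from pathlib import Path


def load_config(path: Path) -> dict:
    """Parse a JSON config file that may contain whole-line '#' comments."""
    lines = path.read_text(encoding="utf-8").splitlines()
    body = "\n".join(line for line in lines if not line.lstrip().startswith("#"))
    return json.loads(body)


# Demo: write a file in the same shape as config_v1.yaml, then read it back.
sample = {"pipeline": "sales_analysis", "parameters": {"threshold": 100}}
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config_v1.yaml"
    with open(cfg_path, "w", encoding="utf-8") as f:
        f.write("# Configuration v1.0 - Threshold: 100\n")
        json.dump(sample, f, indent=2)
    loaded = load_config(cfg_path)

print(loaded["parameters"]["threshold"])
```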

## ▶️ Run: Generate Stories (with CLI or Fallback)

In [None]:
# Create sample story data for both runs
def create_story(run_name, threshold, row_count):
    """Generate a sample story JSON."""
    return {
        "run_id": run_name,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "config": {
            "threshold": threshold,
            "filter_type": "greater_than"
        },
        "execution": [
            {"node": "load", "status": "success", "rows_in": 1000, "rows_out": 1000},
            {"node": "filter", "status": "success", "rows_in": 1000, "rows_out": row_count},
            {"node": "aggregate", "status": "success", "rows_in": row_count, "rows_out": 5}
        ],
        "summary": {
            "total_rows_processed": 1000,
            "final_rows": 5,
            "filtered_out": 1000 - row_count
        }
    }

# Generate stories using the CLI, falling back to Python
def generate_story_cli(run_dir, config, row_count):
    """Generate a story; return True if the CLI produced it, False if the fallback did."""
    story_path = run_dir / 'story.json'

    if cli_available:
        try:
            # Try CLI command
            cmd = ['odibi', 'story', 'generate', '--run-dir', str(run_dir), '--output', str(story_path)]
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            if result.returncode == 0 and story_path.exists():
                print(f"✅ CLI: Generated story for {run_dir.name}")
                return True
            # Report a non-zero exit instead of falling back silently
            print(f"⚠️  CLI exited with code {result.returncode}: {result.stderr.strip()}")
        except Exception as e:  # includes TimeoutExpired and a missing executable
            print(f"⚠️  CLI failed: {e}")

    # Fallback: build the sample story directly, reading the threshold from the config
    story = create_story(run_dir.name, config['parameters']['threshold'], row_count)
    with open(story_path, 'w') as f:
        json.dump(story, f, indent=2)
    print(f"✅ Python: Generated story for {run_dir.name}")
    return False

# Generate both stories
print("🔄 Generating stories...\n")
used_cli_1 = generate_story_cli(run1_dir, config_v1, 750)
time.sleep(0.1)  # Brief pause; timestamps have one-second resolution, so they may still match
used_cli_2 = generate_story_cli(run2_dir, config_v2, 400)

print("\n📦 Stories created:")
print(f"   - {run1_dir / 'story.json'}")
print(f"   - {run2_dir / 'story.json'}")
print(f"\n🔧 Method: run_1={'CLI' if used_cli_1 else 'Python fallback'}, run_2={'CLI' if used_cli_2 else 'Python fallback'}")
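
Each entry in a story's `execution` list records `rows_in` and `rows_out`, so in a linear pipeline like this sample one, a node's input count should equal the previous node's output count. The sketch below checks that invariant on a dictionary shaped like the one `create_story` returns. `check_row_flow` is a hypothetical helper, and the linear-pipeline assumption would not hold for branching or merging pipelines.

```python
def check_row_flow(story: dict) -> list:
    """Return one message per step whose rows_in differs from the previous rows_out."""
    problems = []
    steps = story.get("execution", [])
    # Pair each step with its successor: (load, filter), (filter, aggregate), ...
    for prev, curr in zip(steps, steps[1:]):
        if prev["rows_out"] != curr["rows_in"]:
            problems.append(
                f"{prev['node']} -> {curr['node']}: "
                f"{prev['rows_out']} rows out but {curr['rows_in']} rows in"
            )
    return problems


# Story shaped like the result of create_story("run_1", 100, 750)
story = {
    "execution": [
        {"node": "load", "status": "success", "rows_in": 1000, "rows_out": 1000},
        {"node": "filter", "status": "success", "rows_in": 1000, "rows_out": 750},
        {"node": "aggregate", "status": "success", "rows_in": 750, "rows_out": 5},
    ]
}
print(check_row_flow(story))
```

An empty list means the counts line up. Any message points at the pair of nodes where rows appear or disappear between steps.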

## 📋 List: Discover All Available Stories

In [None]:
# List stories using CLI or fallback
def list_stories_cli(runs_dir):
    """List stories using CLI or Python fallback."""
    output_path = artifacts_dir / 'story_list.txt'
    
    if cli_available:
        try:
            cmd = ['odibi', 'story', 'list', '--runs-dir', str(runs_dir)]
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            if result.returncode == 0:
                with open(output_path, 'w', encoding='utf-8') as f:
                    f.write(result.stdout)
                print("✅ CLI: Listed all stories")
                print(result.stdout)
                return
        except Exception as e:  # includes TimeoutExpired
            print(f"⚠️  CLI failed: {e}")
    
    # Fallback: List stories with Python
    stories = []
    for run_dir in sorted(runs_dir.iterdir()):  # sorted for a stable listing order
        if run_dir.is_dir():
            story_file = run_dir / 'story.json'
            if story_file.exists():
                with open(story_file) as f:
                    story = json.load(f)
                stories.append({
                    'run': run_dir.name,
                    'timestamp': story.get('timestamp', 'unknown'),
                    'threshold': story.get('config', {}).get('threshold', 'N/A'),
                    'rows_processed': story.get('summary', {}).get('total_rows_processed', 0)
                })
    
    output = "📋 Available Stories:\n\n"
    for s in stories:
        output += f"  • {s['run']:10} | {s['timestamp']} | threshold={s['threshold']:3} | rows={s['rows_processed']}\n"
    
    print(output)
    # Explicit UTF-8 so the non-ASCII symbols can be written on any platform
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(output)
    print("✅ Python: Listed all stories")

print("🔍 Listing all available stories...\n")
list_stories_cli(artifacts_dir / 'runs')
print(f"\n💾 List saved to: {artifacts_dir / 'story_list.txt'}")

## 🔍 Diff: Compare Two Pipeline Runs

In [None]:
# Diff stories using CLI or fallback
def diff_stories_cli(run1_dir, run2_dir):
    """Diff two stories using CLI or Python fallback."""
    output_path = artifacts_dir / 'diff_run_1_run_2.txt'
    
    if cli_available:
        try:
            cmd = ['odibi', 'story', 'diff', '--run-1', str(run1_dir), '--run-2', str(run2_dir)]
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            if result.returncode == 0:
                with open(output_path, 'w', encoding='utf-8') as f:
                    f.write(result.stdout)
                print("✅ CLI: Generated diff")
                print(result.stdout)
                return
        except Exception as e:  # includes TimeoutExpired
            print(f"⚠️  CLI failed: {e}")
    
    # Fallback: Diff stories with Python
    with open(run1_dir / 'story.json') as f:
        story1 = json.load(f)
    with open(run2_dir / 'story.json') as f:
        story2 = json.load(f)
    
    diff_output = f"🔍 Story Diff: {run1_dir.name} → {run2_dir.name}\n\n"
    diff_output += "="*60 + "\n\n"
    
    # Compare configurations
    diff_output += "📝 Configuration Changes:\n"
    threshold1 = story1.get('config', {}).get('threshold', 'N/A')
    threshold2 = story2.get('config', {}).get('threshold', 'N/A')
    
    if threshold1 != threshold2:
        diff_output += f"  ⚠️  threshold changed: {threshold1} → {threshold2}\n"
    else:
        diff_output += f"  ✓ threshold unchanged: {threshold1}\n"
    
    diff_output += "\n📊 Execution Comparison:\n"
    
    # Compare execution nodes pairwise (zip stops at the shorter list)
    for i, (node1, node2) in enumerate(zip(story1.get('execution', []), story2.get('execution', []))):
        node_name = node1.get('node', f'node_{i}')
        rows1 = node1.get('rows_out', 0)
        rows2 = node2.get('rows_out', 0)
        
        if rows1 != rows2:
            diff_output += f"  ⚠️  {node_name}: {rows1} → {rows2} rows (difference: {rows2-rows1:+d})\n"
        else:
            diff_output += f"  ✓ {node_name}: {rows1} rows (unchanged)\n"
    
    # Summary comparison
    diff_output += "\n📈 Summary Differences:\n"
    filtered1 = story1.get('summary', {}).get('filtered_out', 0)
    filtered2 = story2.get('summary', {}).get('filtered_out', 0)
    diff_output += f"  • Filtered out: {filtered1} → {filtered2} (difference: {filtered2-filtered1:+d})\n"
    
    print(diff_output)
    # Explicit UTF-8 so the non-ASCII symbols can be written on any platform
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(diff_output)
    print("✅ Python: Generated diff")

print("⚖️  Comparing run_1 vs run_2...\n")
diff_stories_cli(run1_dir, run2_dir)
print(f"\n💾 Diff saved to: {artifacts_dir / 'diff_run_1_run_2.txt'}")
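
The Python fallback above compares only `threshold`. A more general approach is to walk both dictionaries and report every leaf value that differs. This is a minimal sketch, assuming the inputs are nested dicts with string keys. `diff_dicts` is a hypothetical helper, not the algorithm in `odibi/story/diff.py`, and lists are compared as whole values rather than element by element.

```python
_MISSING = object()  # sentinel marking a key that exists on one side only


def diff_dicts(old: dict, new: dict, prefix: str = "") -> dict:
    """Return {dotted.key.path: (old_value, new_value)} for every differing leaf."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a = old.get(key, _MISSING)
        b = new.get(key, _MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections, extending the dotted path
            changes.update(diff_dicts(a, b, prefix=f"{path}."))
        elif a is _MISSING or b is _MISSING or a != b:
            changes[path] = (
                "<missing>" if a is _MISSING else a,
                "<missing>" if b is _MISSING else b,
            )
    return changes


# Trimmed versions of config_v1 and config_v2 from this notebook
v1 = {"version": "1.0", "parameters": {"threshold": 100, "aggregation": "sum"}}
v2 = {"version": "2.0", "parameters": {"threshold": 250, "aggregation": "sum"}}

for path, (before, after) in diff_dicts(v1, v2).items():
    print(f"{path}: {before} -> {after}")
```

Applied to two story or config dictionaries, this surfaces changes such as `version` that a threshold-only comparison would not report.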

## 🔎 Inspect: View Story Details

In [None]:
# Load and display both stories
print("📖 Story Details:\n")
print("=" * 60)

for run_dir in [run1_dir, run2_dir]:
    with open(run_dir / 'story.json') as f:
        story = json.load(f)
    
    print(f"\n🏷️  {story['run_id'].upper()}")
    print(f"⏰ Timestamp: {story['timestamp']}")
    print(f"⚙️  Threshold: {story['config']['threshold']}")
    print(f"📊 Processed: {story['summary']['total_rows_processed']} rows")
    print(f"🗑️  Filtered: {story['summary']['filtered_out']} rows")
    print(f"✅ Final: {story['summary']['final_rows']} rows")
    print("-" * 60)

print("\n✅ Both stories loaded and inspected")

## ✅ Self-Check

In [None]:
start_time = time.time()

try:
    # Check run directories exist
    assert run1_dir.exists(), f"Run 1 directory not found: {run1_dir}"
    assert run2_dir.exists(), f"Run 2 directory not found: {run2_dir}"
    
    # Check story files exist
    assert (run1_dir / 'story.json').exists(), "story.json not found in run_1"
    assert (run2_dir / 'story.json').exists(), "story.json not found in run_2"
    
    # Check diff file exists
    diff_file = artifacts_dir / 'diff_run_1_run_2.txt'
    assert diff_file.exists(), "diff_run_1_run_2.txt not found"
    
    # Validate diff contains meaningful content
    with open(diff_file, encoding='utf-8') as f:
        diff_content = f.read().lower()
    assert 'changed' in diff_content or 'difference' in diff_content, "Diff file doesn't contain change indicators"
    
    # Check list output exists
    list_file = artifacts_dir / 'story_list.txt'
    assert list_file.exists(), "story_list.txt not found"
    
    # Validate list contains both runs
    with open(list_file, encoding='utf-8') as f:
        list_content = f.read()
    assert 'run_1' in list_content, "run_1 not found in story list"
    assert 'run_2' in list_content, "run_2 not found in story list"
    
    # Check config files exist
    assert (artifacts_dir / 'config_v1.yaml').exists(), "config_v1.yaml not found"
    assert (artifacts_dir / 'config_v2.yaml').exists(), "config_v2.yaml not found"
    
    # Validate story JSON structure
    with open(run1_dir / 'story.json') as f:
        story1 = json.load(f)
    assert 'run_id' in story1, "Missing 'run_id' in story"
    assert 'execution' in story1, "Missing 'execution' in story"
    assert 'summary' in story1, "Missing 'summary' in story"
    
    # Check runtime (this measures the self-check cell only, not the whole notebook)
    elapsed = time.time() - start_time
    assert elapsed < 30, f"Runtime {elapsed:.1f}s exceeds 30s budget"
    
    print("🎉 Walkthrough verified successfully!")
    print(f"⏱️  Runtime: {elapsed:.2f}s")
    # Count files only; rglob('*') also yields directories
    n_files = sum(1 for p in artifacts_dir.rglob('*') if p.is_file())
    print(f"📁 Artifacts created: {n_files} files")
    print(f"🔧 CLI available: {cli_available}")
    print("✅ All checks passed!")
    
except AssertionError as e:
    print(f"❌ Walkthrough failed: {e}")
    raise
except Exception as e:
    print(f"❌ Unexpected error: {e}")
    raise

## 🧠 Reflection

### What You Learned

1. **Story Generation**: How to generate execution stories from pipeline runs using CLI or Python
2. **Story Listing**: Discovering all available pipeline run stories in a directory
3. **Story Diffing**: Comparing two pipeline runs to identify configuration and data changes
4. **CLI Fallback Pattern**: Gracefully handling CLI unavailability with Python fallbacks

### Where This Fits in ODIBI

```
Pipeline Development Cycle:
Run Pipeline → Generate Story → List Stories → Compare Runs → Debug Changes
                     ↑              ↑              ↑
              This notebook covered these steps!
```

Stories are **audit trails** for your pipeline executions. They help you understand what changed, why results differ, and how configurations impact data processing.

### Key Insights

- **CLI + Fallback**: Always provide a fallback when an external command might fail
- **Diff-Driven Development**: Use story diffs to understand pipeline evolution
- **Automation**: Story generation can be automated in CI/CD pipelines
- **Debugging**: Comparing stories reveals subtle configuration bugs
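
The three CLI cells in this notebook repeat the same shape: try the command, and on any failure run a Python function instead. One way to factor that out is sketched below. `run_with_fallback` is a hypothetical helper, not an ODIBI function, and the demo uses the current Python interpreter as a stand-in command so that it runs without ODIBI installed.

```python
import subprocess
import sys


def run_with_fallback(cmd, fallback, timeout=10.0):
    """Run `cmd`; return ('cli', stdout) on success, else ('fallback', fallback())."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
        if result.returncode == 0:
            return "cli", result.stdout
    except (OSError, subprocess.TimeoutExpired):
        pass  # missing executable or timeout: fall through to the fallback
    return "fallback", fallback()


# A command that exists (the running interpreter) and one that does not
ok = run_with_fallback([sys.executable, "-c", "print('from cli')"], lambda: "from python")
missing = run_with_fallback(["no-such-command-zzz-12345"], lambda: "from python")

print(ok)
print(missing)
```

Returning which path was taken alongside the output lets the caller report the method used for each call, rather than inferring it from a global availability flag.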

---

## ⏭ Next Steps

**Continue to:** [12_advanced_pipeline_composition.ipynb](12_advanced_pipeline_composition.ipynb)

Learn how to compose complex pipelines with branching, merging, and conditional execution.

**Deep dive:**
- Read `odibi/cli/story.py` - CLI story command implementation
- Read `odibi/story/generator.py` - Story generation logic
- Read `odibi/story/diff.py` - Story diffing algorithm