# Exercises: CLI Validation

**Practice building CLIs with Click**

---

## Exercise 1: Build `odibi list` Command

Create a command that lists available resources.

**Requirements:**
- `odibi list connections` - List all connections
- `odibi list pipelines` - List all pipelines
- `odibi list schemas` - List available schemas
- Add `--format` option (table, json, yaml)
- Add `--filter` option to filter by name pattern

**Example:**
```bash
$ odibi list connections
Connections:
  • prod_db (databricks)
  • staging_db (databricks)
  • warehouse (snowflake)

$ odibi list pipelines --filter "sales*"
Pipelines:
  • sales_etl (3 steps)
  • sales_report (5 steps)
```
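
As a starting hint, the nested `list` group and the two options could be wired up roughly as follows. This is a minimal sketch, not a reference solution: it assumes an in-memory `CONNECTIONS` dict (hypothetical) instead of a parsed config file, covers only the `connections` subcommand, and leaves out the `yaml` format.

```python
import fnmatch
import json

import click
from click.testing import CliRunner

# Hypothetical in-memory data standing in for a parsed config file.
CONNECTIONS = {
    "prod_db": {"type": "databricks"},
    "staging_db": {"type": "databricks"},
    "warehouse": {"type": "snowflake"},
}


@click.group()
def odibi():
    """Top-level command group."""


@odibi.group(name="list")
def list_group():
    """List available resources."""


@list_group.command()
@click.option("--format", "output_format",
              type=click.Choice(["table", "json"]), default="table")
@click.option("--filter", "pattern", default="*",
              help="Shell-style name pattern, e.g. 'prod*'.")
def connections(output_format, pattern):
    """List connections whose names match the pattern."""
    matched = {
        name: spec for name, spec in CONNECTIONS.items()
        if fnmatch.fnmatchcase(name, pattern)
    }
    if output_format == "json":
        click.echo(json.dumps(matched, indent=2))
        return
    click.echo("Connections:")
    for name, spec in matched.items():
        click.echo(f"  • {name} ({spec['type']})")


# Invoke the command in-process, the same way the test cell does.
result = CliRunner().invoke(odibi, ["list", "connections", "--filter", "*_db"])
print(result.output)
```

Mapping `--format` and `--filter` to `output_format` and `pattern` avoids shadowing the Python built-ins `format` and `filter` inside the function body.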

In [None]:
import click
import yaml
import json
from typing import Dict, Any, List
import fnmatch

# Sample config for testing
SAMPLE_CONFIG = """
connections:
  prod_db:
    type: databricks
    catalog: main
  staging_db:
    type: databricks
    catalog: staging
  warehouse:
    type: snowflake
    account: abc123

pipelines:
  sales_etl:
    steps:
      - extract:
          sql: "SELECT * FROM sales"
      - transform:
          code: "df['revenue'] = df['price'] * df['quantity']"
      - load:
          table: "sales_clean"
  
  sales_report:
    steps:
      - extract:
          sql: "SELECT * FROM sales_clean"
      - aggregate:
          groupby: ["region"]
      - transform:
          code: "df['margin'] = df['revenue'] - df['cost']"
      - validate:
          checks: ["no_nulls"]
      - load:
          table: "sales_summary"
  
  customer_analysis:
    steps:
      - extract:
          sql: "SELECT * FROM customers"
      - transform:
          code: "df['lifetime_value'] = df['total_purchases'] * df['avg_order']"
"""

# YOUR CODE HERE
# Create the 'list' command group with subcommands
# Implement connections, pipelines, and schemas subcommands
# Add --format and --filter options











# Test your implementation
if __name__ == "__main__":
    from click.testing import CliRunner
    
    runner = CliRunner()
    
    with runner.isolated_filesystem():
        # Create test config
        with open('config.yaml', 'w') as f:
            f.write(SAMPLE_CONFIG)
        
        # Test commands
        # (these calls assume each subcommand takes the config path as an argument)
        print("=" * 50)
        print("Test 1: List connections")
        print("=" * 50)
        # result = runner.invoke(odibi, ['list', 'connections', 'config.yaml'])
        # print(result.output)
        
        print("\n" + "=" * 50)
        print("Test 2: List pipelines with filter")
        print("=" * 50)
        # result = runner.invoke(odibi, ['list', 'pipelines', 'config.yaml', '--filter', 'sales*'])
        # print(result.output)
        
        print("\n" + "=" * 50)
        print("Test 3: List connections as JSON")
        print("=" * 50)
        # result = runner.invoke(odibi, ['list', 'connections', 'config.yaml', '--format', 'json'])
        # print(result.output)

---

## Exercise 2: Add Progress Bars

Enhance the `run` command with rich progress indication.

**Requirements:**
- Show progress bar for pipeline execution
- Display current step name
- Show timing for each step
- Add `--quiet` mode (no progress)
- Add `--summary` flag to show stats at end

**Example:**
```bash
$ odibi run config.yaml
Running pipeline: sales_etl
Progress [################................] 50% extract (1.2s)

Summary:
  Total time: 4.8s
  Steps completed: 3
  Rows processed: 10,234
```
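
One possible shape for the progress logic uses `click.progressbar`, which wraps an iterable and accepts an `item_show_func` callback for displaying the current item. The sketch below is a hint under assumptions: `STEPS` and `run_step` are hypothetical stand-ins for the real pipeline, and row counts are omitted.

```python
import time

import click
from click.testing import CliRunner

STEPS = ["extract", "transform", "load"]  # hypothetical step names


def run_step(name):
    """Stand-in for real work; returns the elapsed time in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - start


@click.command()
@click.option("--quiet", is_flag=True, help="Suppress the progress bar.")
@click.option("--summary", is_flag=True, help="Print statistics at the end.")
def run(quiet, summary):
    """Run every step, optionally showing progress and a summary."""
    timings = {}
    if quiet:
        for step in STEPS:
            timings[step] = run_step(step)
    else:
        # item_show_func receives the current item (None before the first one);
        # returning None shows nothing next to the bar.
        with click.progressbar(STEPS, label="Progress",
                               item_show_func=lambda step: step) as bar:
            for step in bar:
                timings[step] = run_step(step)
    if summary:
        click.echo("Summary:")
        click.echo(f"  Total time: {sum(timings.values()):.1f}s")
        click.echo(f"  Steps completed: {len(timings)}")


result = CliRunner().invoke(run, ["--quiet", "--summary"])
print(result.output)
```

The bar is only drawn in full on a real terminal, so output captured by `CliRunner` is a poor guide to what a user sees; tests are better off asserting on the summary lines.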

In [None]:
import click
import random
import time
from typing import List, Dict, Any

# Mock pipeline execution
class Pipeline:
    def __init__(self, name: str, steps: List[str]):
        self.name = name
        self.steps = steps
    
    def run_step(self, step_name: str) -> Dict[str, Any]:
        """Simulate step execution and report how long it took."""
        # Sleep for the same duration that is reported, so timings are consistent
        duration = random.uniform(0.5, 2.0)
        time.sleep(duration)
        return {
            'rows_processed': random.randint(1000, 50000),
            'duration': duration
        }

# YOUR CODE HERE
# Create 'run' command with progress bars
# Add --quiet and --summary options
# Track timing and statistics











# Test your implementation
if __name__ == "__main__":
    from click.testing import CliRunner

    runner = CliRunner()
    
    with runner.isolated_filesystem():
        # Create minimal config
        with open('config.yaml', 'w') as f:
            f.write('pipelines:\n  test:\n    steps: []\n')
        
        # Test
        # result = runner.invoke(run, ['config.yaml', '--summary'])
        # print(result.output)

---

## Exercise 3: Config Linter

Create a `lint` command that checks config quality.

**Requirements:**
- Check for common anti-patterns
- Detect missing best practices
- Report style issues
- Add `--fix` option to auto-fix issues
- Generate lint report

**Checks:**
- Unused connections
- Pipelines without descriptions
- Steps without explanations
- Missing validation steps
- Long pipelines (>10 steps)
- Hardcoded values in SQL

**Example:**
```bash
$ odibi lint config.yaml

❌ Errors (2):
  • sales_etl.step2: Missing explanation
  • customer_pipeline: No validation step

⚠️  Warnings (3):
  • Connection 'old_db' is unused
  • sales_etl: Long pipeline (12 steps)
  • Hardcoded date in SQL: '2024-01-01'

ℹ️  Suggestions:
  • Add pipeline descriptions
  • Consider splitting sales_etl into smaller pipelines
```
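
To illustrate what a single check might look like, here is a sketch of the hardcoded-date rule as a standalone function. It rests on two assumptions that are not part of the exercise: a "hardcoded date" means a single-quoted ISO literal such as `'2024-01-01'`, and locations are reported as `pipeline.stepN`. The name `find_hardcoded_dates` is hypothetical; in your solution the logic would live in `ConfigLinter.check_hardcoded_values`.

```python
import re
from typing import Any, Dict, List, Tuple

# Assumption: a hardcoded date is a single-quoted ISO literal like '2024-01-01'.
DATE_LITERAL = re.compile(r"'(\d{4}-\d{2}-\d{2})'")


def find_hardcoded_dates(config: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (location, literal) pairs for date literals found in step SQL."""
    findings = []
    for name, pipeline in config.get("pipelines", {}).items():
        for index, step in enumerate(pipeline.get("steps", []), start=1):
            # Each step is a one-key mapping: {step_type: params}
            for params in step.values():
                if not isinstance(params, dict):
                    continue
                for literal in DATE_LITERAL.findall(params.get("sql", "")):
                    findings.append((f"{name}.step{index}", literal))
    return findings


sample = {
    "pipelines": {
        "sales_etl": {
            "steps": [
                {"extract": {"sql": "SELECT * FROM sales WHERE date = '2024-01-01'"}},
                {"load": {"table": "output"}},
            ]
        }
    }
}
print(find_hardcoded_dates(sample))  # [('sales_etl.step1', '2024-01-01')]
```

A regular expression is a deliberately crude detector here; it will also flag dates inside SQL comments or string comparisons that are intentional.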

In [None]:
import click
import yaml
import re
from typing import List, Dict, Any
from dataclasses import dataclass

@dataclass
class LintResult:
    severity: str  # 'error', 'warning', 'info'
    location: str
    message: str
    rule: str
    fixable: bool = False

class ConfigLinter:
    """Lints Odibi configuration files."""
    
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.results: List[LintResult] = []
    
    def lint(self) -> List[LintResult]:
        """Run all lint checks."""
        self.check_unused_connections()
        self.check_missing_descriptions()
        self.check_missing_explanations()
        self.check_missing_validation()
        self.check_long_pipelines()
        self.check_hardcoded_values()
        return self.results
    
    def check_unused_connections(self):
        """Check for connections that are never used."""
        # YOUR CODE HERE
        pass
    
    def check_missing_descriptions(self):
        """Check for pipelines without descriptions."""
        # YOUR CODE HERE
        pass
    
    def check_missing_explanations(self):
        """Check for steps without explanations."""
        # YOUR CODE HERE
        pass
    
    def check_missing_validation(self):
        """Check for pipelines without validation steps."""
        # YOUR CODE HERE
        pass
    
    def check_long_pipelines(self, max_steps: int = 10):
        """Check for overly long pipelines."""
        # YOUR CODE HERE
        pass
    
    def check_hardcoded_values(self):
        """Check for hardcoded dates and numbers in SQL."""
        # YOUR CODE HERE
        pass

# YOUR CODE HERE
# Create 'lint' command
# Use ConfigLinter to check config
# Format and display results
# Add --fix option











# Test your implementation
if __name__ == "__main__":
    test_config = {
        'connections': {
            'prod_db': {'type': 'databricks'},
            'old_db': {'type': 'postgres'},  # Unused
        },
        'pipelines': {
            'sales_etl': {
                'connection': 'prod_db',
                'steps': [
                    {'extract': {'sql': "SELECT * FROM sales WHERE date = '2024-01-01'"}},
                    {'transform': {'code': 'df["x"] = 1'}},  # No explanation
                    {'load': {'table': 'output'}}
                ]
            }
        }
    }
    
    linter = ConfigLinter(test_config)
    results = linter.lint()
    
    print(f"Found {len(results)} issues:")
    for result in results:
        print(f"  {result.severity.upper()}: {result.location} - {result.message}")

---

## Exercise 4: Dry-Run Mode

Add a `--dry-run` flag to the `run` command.

**Requirements:**
- Show what would be executed without running
- Display execution plan
- Show which connections would be used
- Estimate runtime based on historical data
- Show data flow diagram

**Example:**
```bash
$ odibi run config.yaml --dry-run

Execution Plan:
================

Pipeline: sales_etl
Connection: prod_db (databricks)

Steps:
  1. extract
     • SQL: SELECT * FROM sales
     • Estimated rows: ~10,000
     • Estimated time: 2.3s
  
  2. transform
     • Operations: 3 columns added
     • Estimated time: 0.8s
  
  3. validate
     • Checks: no_nulls, range_check
     • Estimated time: 0.5s
  
  4. load
     • Target: sales_clean
     • Estimated time: 1.2s

Total estimated time: 4.8s

Data Flow:
  sales → extract → transform → validate → sales_clean

⚠️  This is a dry run. No data will be modified.
```
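
The core of a dry run is a boolean flag that short-circuits execution after printing the plan. The following sketch shows that control flow only, under assumptions: `PIPELINE` is a hypothetical hardcoded description rather than a loaded config, the `executed` list exists purely to make side effects observable, and time or row estimates are left out.

```python
import click
from click.testing import CliRunner

# Hypothetical pipeline description; a real command would load it from the config.
PIPELINE = {
    "name": "sales_etl",
    "source": "sales",
    "target": "sales_clean",
    "steps": ["extract", "transform", "validate", "load"],
}

executed = []  # records steps that actually ran, to make side effects visible


def data_flow(pipeline):
    """Join source, intermediate steps, and target with arrows."""
    middle = [step for step in pipeline["steps"] if step != "load"]
    return " → ".join([pipeline["source"], *middle, pipeline["target"]])


@click.command()
@click.option("--dry-run", is_flag=True, help="Print the plan without executing it.")
def run(dry_run):
    """Run the pipeline, or only describe it when --dry-run is given."""
    if dry_run:
        click.echo(f"Pipeline: {PIPELINE['name']}")
        click.echo("Steps:")
        for number, step in enumerate(PIPELINE["steps"], start=1):
            click.echo(f"  {number}. {step}")
        click.echo("Data Flow:")
        click.echo(f"  {data_flow(PIPELINE)}")
        click.echo("This is a dry run. No data will be modified.")
        return
    for step in PIPELINE["steps"]:
        executed.append(step)
        click.echo(f"Running {step}")


result = CliRunner().invoke(run, ["--dry-run"])
print(result.output)
```

Returning early from the dry-run branch keeps the planning and execution paths separate, which makes it easy to test that a dry run has no side effects.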

In [None]:
import click
import yaml
from typing import Dict, Any, List

class ExecutionPlanner:
    """Plans pipeline execution without running."""
    
    def __init__(self, config: Dict[str, Any]):
        self.config = config
    
    def plan(self, pipeline_name: str) -> Dict[str, Any]:
        """Generate execution plan."""
        # YOUR CODE HERE
        pass
    
    def estimate_step_time(self, step: Dict[str, Any]) -> float:
        """Estimate step execution time."""
        # YOUR CODE HERE
        # Use historical data, step type, complexity
        pass
    
    def estimate_row_count(self, step: Dict[str, Any]) -> int:
        """Estimate number of rows."""
        # YOUR CODE HERE
        pass
    
    def analyze_operations(self, step: Dict[str, Any]) -> List[str]:
        """Analyze what operations will be performed."""
        # YOUR CODE HERE
        pass
    
    def get_data_flow(self, pipeline_name: str) -> str:
        """Generate data flow diagram."""
        # YOUR CODE HERE
        pass

# YOUR CODE HERE
# Enhance 'run' command with --dry-run flag
# Use ExecutionPlanner to generate plan
# Display formatted execution plan











# Test your implementation
if __name__ == "__main__":
    test_config = """
connections:
  prod_db:
    type: databricks
    catalog: main

pipelines:
  sales_etl:
    connection: prod_db
    steps:
      - extract:
          sql: "SELECT * FROM sales"
      - transform:
          code: |
            df['revenue'] = df['price'] * df['quantity']
            df['margin'] = df['revenue'] - df['cost']
      - validate:
          checks:
            - no_nulls: ['revenue', 'margin']
      - load:
          table: "sales_clean"
"""
    
    from click.testing import CliRunner

    runner = CliRunner()
    with runner.isolated_filesystem():
        with open('config.yaml', 'w') as f:
            f.write(test_config)
        
        # Test dry run
        # result = runner.invoke(run, ['config.yaml', '--dry-run'])
        # print(result.output)

---

## Bonus Challenge: Interactive Config Builder

Create an `odibi init` command that interactively builds a config file.

**Requirements:**
- Ask user for project name
- Prompt for connections (with validation)
- Guide pipeline creation
- Suggest best practices
- Generate complete config file
- Add example pipelines

**Example:**
```bash
$ odibi init

Welcome to Odibi! Let's set up your project.

Project name: My Analytics Project

Add a connection? [y/n]: y
Connection name: prod_db
Connection type: 
  1. Databricks
  2. Snowflake
  3. Postgres
Choice [1]: 1

Databricks catalog: main
Databricks schema: analytics

✓ Connection 'prod_db' configured

Add another connection? [y/n]: n

Create a pipeline? [y/n]: y
Pipeline name: sales_etl
...

✅ Configuration saved to odibi_config.yaml
```
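
The prompt loop can be built from `click.prompt()` and `click.confirm()`, and exercised non-interactively by passing an `input` string to `CliRunner.invoke()`. The sketch below is one possible skeleton under assumptions: it handles connections only, and it prints the result as JSON rather than writing YAML to `odibi_config.yaml`, purely to keep the example small.

```python
import json

import click
from click.testing import CliRunner

CONNECTION_TYPES = ["databricks", "snowflake", "postgres"]


@click.command()
def init():
    """Interactively collect a project name and zero or more connections."""
    config = {"project": click.prompt("Project name"), "connections": {}}
    while click.confirm("Add a connection?", default=False):
        name = click.prompt("Connection name")
        # click.Choice rejects anything outside the list and re-prompts
        kind = click.prompt("Connection type",
                            type=click.Choice(CONNECTION_TYPES),
                            default="databricks")
        config["connections"][name] = {"type": kind}
        click.echo(f"Connection '{name}' configured")
    # Sketch only: emit JSON on one line instead of saving a YAML file.
    click.echo(json.dumps(config))


# Each line of `input` answers one prompt, in order.
answers = "My Analytics Project\ny\nprod_db\nsnowflake\nn\n"
result = CliRunner().invoke(init, input=answers)
print(result.output)
```

Using `click.Choice` as the prompt type gives validation for free; the numbered menu in the example session above would need a small mapping from numbers to type names on top of this.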

In [None]:
import click
import yaml
from typing import Dict, Any

# YOUR CODE HERE
# Create 'init' command with interactive prompts
# Use click.prompt() and click.confirm()
# Build config incrementally
# Generate YAML output











# Test
# Note: interactive prompts can't be fully exercised in a notebook,
# but CliRunner.invoke() accepts an `input` string that simulates typed answers.