# YAML Configuration - Exercises

Practice what you've learned! Solutions are in [solutions.ipynb](solutions.ipynb).

## Exercise 1: Fix the Broken YAML

This YAML has **6 problems**, each flagged by an inline comment. Fix them all!
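As a warm-up, here is a minimal sketch of how PyYAML's `safe_load` types unquoted scalars under its YAML 1.1 rules. The keys in `sample` are made up for illustration and are not part of the exercise.

```python
import yaml

# Illustrative sample only: each key shows one implicit-typing rule.
sample = """
unquoted_version: 1.0
quoted_version: "1.0"
unquoted_flag: YES
country_unquoted: NO
country_quoted: "NO"
port: 5432
"""

parsed = yaml.safe_load(sample)

# Print each value with its Python type to see what the parser decided.
for key, value in parsed.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

Quoting a scalar forces it to stay a string, which is the usual remedy for version numbers and country codes.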

In [None]:
import yaml

broken_yaml = """
project: Data Pipeline
version: 1.0  # Should be string, not float
enabled: YES  # Should be lowercase true/false

connections
  database:  # Missing colon above
    host: localhost
    port: "5432"  # Should be int, not string
  storage:
  type: local  # Wrong indentation
    path: /data

countries:
  - NO  # Norway problem - becomes False!
  - SE
"""

# Fix it here:
fixed_yaml = """
# Your fixed YAML here
"""

# Test
data = yaml.safe_load(fixed_yaml)
print(data)

## Exercise 2: Use Anchors to DRY Up Config

Refactor this repetitive config using anchors & aliases.
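If the syntax needs a refresher, the sketch below shows an anchor (`&`), an alias (`*`), and the merge key (`<<`) on an unrelated logging config. All key names here are hypothetical. It relies on PyYAML's `safe_load` supporting the YAML 1.1 merge key.

```python
import yaml

# Hypothetical logging config: one anchored block reused by two handlers.
anchor_demo = """
defaults: &handler_defaults
  level: INFO
  rotate: true
  max_files: 5

console_handler:
  <<: *handler_defaults   # copy every key from the anchored mapping
  target: stdout

file_handler:
  <<: *handler_defaults
  target: app.log
  level: DEBUG            # explicit keys override merged ones
"""

demo = yaml.safe_load(anchor_demo)
print(demo["console_handler"])
print(demo["file_handler"])
```

Keys written explicitly next to `<<` take precedence over merged keys, which is how a single environment can override one or two shared values.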

In [None]:
repetitive_yaml = """
dev_database:
  host: dev.db.com
  port: 5432
  timeout: 30
  pool_size: 5
  retry_attempts: 3

staging_database:
  host: staging.db.com
  port: 5432
  timeout: 30
  pool_size: 5
  retry_attempts: 3

prod_database:
  host: prod.db.com
  port: 5432
  timeout: 60  # Only difference: longer timeout
  pool_size: 10  # And larger pool
  retry_attempts: 3
"""

# Refactor using anchors:
refactored_yaml = """
# Your refactored YAML here using &anchor and *alias
"""

# Test
data = yaml.safe_load(refactored_yaml)
assert data['dev_database']['port'] == 5432
assert data['prod_database']['timeout'] == 60
print("✅ Refactored successfully!")

## Exercise 3: Create a Pydantic Schema

Write a Pydantic model to validate this Odibi-style pipeline config.
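As a reminder of the pattern, here is a small sketch of nested Pydantic models with a constrained field and a default, applied to a different, made-up config. `ServerConfig`, `ServiceConfig`, and their fields are hypothetical names, and the sketch assumes Python 3.9+ for the `list[...]` annotation.

```python
import yaml
from pydantic import BaseModel, Field, ValidationError


class ServerConfig(BaseModel):
    host: str
    # Default port with a range constraint checked at validation time.
    port: int = Field(default=8080, ge=1, le=65535)


class ServiceConfig(BaseModel):
    name: str
    servers: list[ServerConfig]  # nested models are validated recursively


raw = yaml.safe_load("""
name: demo-service
servers:
  - host: a.example.com
    port: 9000
  - host: b.example.com
""")

service = ServiceConfig(**raw)
print(service.servers[0].port, service.servers[1].port)

# An out-of-range port should be rejected.
rejected = False
try:
    ServiceConfig(name="bad", servers=[{"host": "x", "port": 70000}])
except ValidationError as exc:
    rejected = True
    print(f"Rejected with {len(exc.errors())} error(s)")
```

The same ideas (nested models, `Field` constraints, defaults for optional keys) carry over to the pipeline config in this exercise.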

In [None]:
from pydantic import BaseModel, Field, field_validator  # Pydantic v2 (matches model_dump_json below)
from typing import List, Dict, Optional, Literal

pipeline_yaml = """
project: Sales ETL
engine: pandas

connections:
  data:
    type: local
    base_path: ./data

retry:
  max_attempts: 3
  backoff_seconds: 2.0

pipelines:
  - pipeline: bronze_to_silver
    nodes:
      - name: load_sales
        operation: read
      - name: clean_sales
        operation: transform
        depends_on: [load_sales]
"""

# Define your schemas:
class ConnectionConfig(BaseModel):
    # TODO: Add fields
    pass

class RetryConfig(BaseModel):
    # TODO: Add fields with validation
    pass

class NodeConfig(BaseModel):
    # TODO: Add fields
    pass

class PipelineConfig(BaseModel):
    # TODO: Add fields
    pass

class AppConfig(BaseModel):
    # TODO: Add fields
    pass

# Test
data = yaml.safe_load(pipeline_yaml)
config = AppConfig(**data)
print(config.model_dump_json(indent=2))

## Exercise 4: Environment Variable Substitution

Implement a function to load YAML with environment variable substitution AND defaults.
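One possible building block is a regular expression that captures the variable name and an optional default. The sketch below works on plain strings only; `ENV_PATTERN` and `substitute_env` are hypothetical names, and it treats a default as applying only when the variable is unset (not when it is empty), which is an assumption.

```python
import os
import re
from typing import Mapping

# Group 1: variable name. Group 2: optional default after ":-".
ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace ${VAR} and ${VAR:-default} placeholders in a string."""

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"Environment variable {name!r} is not set and has no default")

    return ENV_PATTERN.sub(replace, text)


# A plain dict stands in for os.environ so the demo is reproducible.
fake_env = {"API_URL": "https://api.example.com"}
result = substitute_env("url: ${API_URL}, retries: ${RETRIES:-3}", fake_env)
print(result)
```

Substituting before parsing means the YAML parser still decides the final types, so a default like `5432` can end up as an integer.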

In [None]:
import os
import re

def load_yaml_with_env(yaml_string: str) -> dict:
    """
    Load YAML and replace:
    - ${VAR} with environment variable
    - ${VAR:-default} with environment variable OR default if not set
    
    Example:
      host: ${DB_HOST:-localhost}  # Uses localhost if DB_HOST not set
    """
    # TODO: Implement this function
    pass

# Test
os.environ['DB_HOST'] = 'production.db.com'
# DB_PORT not set - should use default

test_yaml = """
database:
  host: ${DB_HOST}
  port: ${DB_PORT:-5432}
  timeout: ${TIMEOUT:-30}
"""

config = load_yaml_with_env(test_yaml)
assert config['database']['host'] == 'production.db.com'
assert config['database']['port'] == 5432  # Default used
assert config['database']['timeout'] == 30
print("✅ All tests passed!")

## Exercise 5: Multi-Environment Configs

Create a system to load environment-specific configs (dev/staging/prod).
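A flat `dict.update` replaces nested mappings wholesale, so nested configs usually need a recursive merge. The sketch below is one way to write it; `deep_merge` is a hypothetical helper, and the sample dicts are made up.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins, merging nested dicts key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


base = {"db": {"host": "localhost", "port": 5432}, "debug": True}
override = {"db": {"host": "db.internal"}, "debug": False}

merged = deep_merge(base, override)
print(merged)  # {'db': {'host': 'db.internal', 'port': 5432}, 'debug': False}

# For comparison, a shallow update loses the nested 'port' key.
shallow = {**base, **override}
print(shallow["db"])  # {'host': 'db.internal'}
```

For the flat test config in this exercise either approach passes, but the recursive version keeps working once the configs gain nested sections.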

In [None]:
from pathlib import Path

def load_config_for_env(base_path: Path, env: str) -> dict:
    """
    Load config with inheritance:
    1. Load base.yaml (shared settings)
    2. Load {env}.yaml (environment-specific)
    3. Merge them (env overrides base)
    
    Example structure:
      config/
        base.yaml
        dev.yaml
        prod.yaml
    """
    # TODO: Implement this function
    # Hint: Use dict update or recursive merge
    pass

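# Test: create the config files first
config_dir = Path('example_configs/multi_env')
config_dir.mkdir(parents=True, exist_ok=True)  # parents=True also creates example_configs/

# Create base.yaml
base_config = """
project: My Project
timeout: 30
log_level: INFO
"""
(config_dir / 'base.yaml').write_text(base_config)

# Create prod.yaml
prod_config = """
timeout: 60  # Override
log_level: WARNING  # Override
replicas: 3  # New field
"""
(config_dir / 'prod.yaml').write_text(prod_config)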

config = load_config_for_env(config_dir, 'prod')
assert config['timeout'] == 60  # Overridden
assert config['project'] == 'My Project'  # From base
assert config['replicas'] == 3  # From prod
print("✅ Multi-environment config works!")

## Exercise 6: Analyze Odibi Config

Write code to extract insights from Odibi's YAML configs.
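The traversal itself can be sketched on a tiny inline config. The layout below (`pipelines` containing `nodes` with optional `depends_on`) is borrowed from the Exercise 3 sample; whether a real Odibi file uses exactly these keys is an assumption to check against the actual config.

```python
import yaml
from collections import Counter

# Made-up config that mirrors the Exercise 3 layout.
sample = yaml.safe_load("""
pipelines:
  - pipeline: demo
    nodes:
      - name: a
        operation: read
      - name: b
        operation: transform
        depends_on: [a]
      - name: c
        operation: write
        depends_on: [b]
""")

# Flatten all nodes across pipelines; .get() tolerates missing sections.
nodes = [
    node
    for pipeline in sample.get("pipelines", [])
    for node in pipeline.get("nodes", [])
]

operation_counts = Counter(node.get("operation") for node in nodes)
nodes_with_deps = [node["name"] for node in nodes if node.get("depends_on")]

print(len(nodes), dict(operation_counts), nodes_with_deps)
```

Using `.get()` with defaults keeps the analysis from crashing on configs that omit a section.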

In [None]:
def analyze_odibi_config(yaml_path: str) -> dict:
    """
    Analyze an Odibi pipeline config and return:
    - Total number of pipelines
    - Total number of nodes
    - Connection types used
    - Formats used (csv, parquet, delta, etc.)
    - Nodes with dependencies (depends_on)
    """
    # TODO: Implement this function
    pass

# Test with a real Odibi config (adjust this path to your local checkout)
odibi_path = r'c:\Users\hodibi\OneDrive - Ingredion\Desktop\Repos\Odibi\examples\example_delta_pipeline.yaml'
analysis = analyze_odibi_config(odibi_path)

print("Odibi Config Analysis:")
print(f"  Pipelines: {analysis['total_pipelines']}")
print(f"  Nodes: {analysis['total_nodes']}")
print(f"  Connection types: {analysis['connection_types']}")
print(f"  Formats: {analysis['formats']}")
print(f"  Nodes with dependencies: {analysis['nodes_with_deps']}")

## Bonus Exercise: Config Linter

Build a linter that checks for common YAML config mistakes.
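Some checks have to run on the raw text, because quoting and comments are gone once the YAML is parsed. Below is a sketch of a single such check, for unquoted boolean-like words. `AMBIGUOUS` and `find_ambiguous_scalars` are hypothetical names, and the pattern only covers simple `key: value` and `- item` lines.

```python
import re

# Matches "key: yes" or "- NO" (optionally followed by a comment), any case.
AMBIGUOUS = re.compile(
    r"^\s*(?:-\s+|[\w.-]+:\s+)(yes|no|on|off)\s*(?:#.*)?$",
    re.IGNORECASE,
)


def find_ambiguous_scalars(yaml_text: str) -> list[str]:
    """Return one warning per line holding an unquoted boolean-like word."""
    findings = []
    for line_number, line in enumerate(yaml_text.splitlines(), start=1):
        match = AMBIGUOUS.match(line)
        if match:
            findings.append(
                f"line {line_number}: unquoted {match.group(1)!r} "
                "may be parsed as a boolean"
            )
    return findings


lint_sample = (
    "countries:\n"
    "  - NO\n"
    '  - "NO"\n'
    "enabled: yes\n"
    "name: north\n"
)
for finding in find_ambiguous_scalars(lint_sample):
    print(finding)
```

Each of the other checks can follow the same shape: a small function that takes text or parsed data and returns a list of warning strings, which the linter then concatenates.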

In [None]:
from pathlib import Path
from typing import List

def lint_yaml_config(yaml_path: Path) -> List[str]:
    """
    Check for common issues:
    - Hardcoded passwords/secrets
    - Missing required fields
    - Inconsistent naming (snake_case vs camelCase)
    - Unquoted values that might be misinterpreted (NO, YES, 1.0)
    - TODO/FIXME comments
    
    Returns list of warnings.
    """
    warnings = []
    # TODO: Implement checks
    return warnings

# Test
warnings = lint_yaml_config(Path('example_configs/basic.yaml'))
for warning in warnings:
    print(f"⚠️  {warning}")