# Exercises: Odibi Configuration System

Practice extending and working with Odibi's config architecture.

## Exercise 1: Add New Enum - PartitionStrategy

**Goal**: Create a new enum for partition strategies used in data writing.

**Requirements**:
1. Create `PartitionStrategy` enum with values:
   - `NONE` = "none"
   - `HASH` = "hash"
   - `RANGE` = "range"
   - `DATE` = "date"
2. Inherit from `str, Enum`
3. Add docstring
4. Test with Pydantic model
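
As a warm-up, here is a minimal sketch of the same pattern on a different enum, so the exercise itself stays open. `CompressionCodec` and `CompressionConfig` are hypothetical names invented for illustration and are not part of Odibi. The sketch assumes Pydantic v2.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class CompressionCodec(str, Enum):
    """Hypothetical example enum (not part of Odibi)."""

    NONE = "none"
    GZIP = "gzip"
    SNAPPY = "snappy"


class CompressionConfig(BaseModel):
    """Hypothetical model used only to exercise the enum."""

    codec: CompressionCodec = CompressionCodec.NONE


# Pydantic coerces the raw string into the matching enum member.
cfg = CompressionConfig(codec="gzip")
assert cfg.codec is CompressionCodec.GZIP

# Because the enum also inherits from str, members compare equal to their values.
assert cfg.codec == "gzip"

# Unknown values are rejected; the error records which field failed.
try:
    CompressionConfig(codec="zip")
    bad_field = None
except ValidationError as exc:
    bad_field = exc.errors()[0]["loc"]
```

Inheriting from `str` is what lets YAML values such as `"gzip"` round-trip cleanly: the member behaves like its string value when compared or serialized.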

In [None]:
from enum import Enum
from typing import List

from pydantic import BaseModel, Field, ValidationError

# TODO: Create PartitionStrategy enum

# TODO: Create test model
# class PartitionConfig(BaseModel):
#     strategy: PartitionStrategy
#     columns: List[str] = Field(default_factory=list)

# TODO: Test valid and invalid values

## Exercise 2: Add Partitioning to WriteConfig

**Goal**: Extend `WriteConfig` to support partitioning.

**Requirements**:
1. Add optional `partition_strategy` field (use your enum from Exercise 1)
2. Add optional `partition_columns` field (list of strings)
3. Add a model validator: if `partition_strategy` is set to anything other than `NONE`, `partition_columns` must be provided and non-empty
4. Test all scenarios
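
The cross-field pattern can be sketched on a simpler rule: a write target needs either a table or a path. `TargetConfig` is a hypothetical stand-in, not Odibi's actual `WriteConfig`, and the sketch assumes Pydantic v2's `model_validator`.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class TargetConfig(BaseModel):
    """Hypothetical stand-in for a write target (not Odibi's WriteConfig)."""

    table: Optional[str] = None
    path: Optional[str] = None

    @model_validator(mode="after")
    def check_table_or_path(self) -> "TargetConfig":
        # mode="after" runs once all fields are validated, so both are populated.
        if self.table is None and self.path is None:
            raise ValueError("Provide either 'table' or 'path'.")
        return self


# Valid: one of the two fields is set.
ok = TargetConfig(path="data/out.parquet")

# Invalid: neither field is set, so the model validator raises.
try:
    TargetConfig()
    error_message = None
except ValidationError as exc:
    error_message = exc.errors()[0]["msg"]
```

The partitioning rule in this exercise has the same shape: inspect two fields on `self`, raise `ValueError` when they disagree, and return `self` otherwise.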

In [None]:
from typing import Optional, List, Dict, Any
from pydantic import BaseModel, Field, model_validator

# Copy your PartitionStrategy from Exercise 1

class WriteMode(str, Enum):
    OVERWRITE = "overwrite"
    APPEND = "append"

# TODO: Create enhanced WriteConfig
# class WriteConfig(BaseModel):
#     connection: str
#     format: str
#     table: Optional[str] = None
#     path: Optional[str] = None
#     mode: WriteMode = WriteMode.OVERWRITE
#     options: Dict[str, Any] = Field(default_factory=dict)
#     
#     # TODO: Add partition fields
#     # TODO: Add validators

# TODO: Test cases
# 1. No partitioning (valid)
# 2. Partitioning with columns (valid)
# 3. Partitioning without columns (invalid - should error)

## Exercise 3: Create ScheduleConfig

**Goal**: Add scheduling configuration for pipelines.

**Requirements**:
1. Create `ScheduleType` enum:
   - `MANUAL` = "manual"
   - `CRON` = "cron"
   - `INTERVAL` = "interval"
2. Create `ScheduleConfig` model:
   - `type: ScheduleType`
   - `cron_expression: Optional[str]` (required if type is CRON)
   - `interval_minutes: Optional[int]` (required if type is INTERVAL)
3. Add model validator to enforce the requirements above
4. Test all three schedule types

In [None]:
# TODO: Implement ScheduleType enum

# TODO: Implement ScheduleConfig with validation

# TODO: Test cases:
# 1. Manual schedule (no extra fields needed)
# 2. Cron schedule with expression
# 3. Interval schedule with minutes
# 4. Cron without expression (should fail)
# 5. Interval without minutes (should fail)

## Exercise 4: Discriminated Union - Storage Configs

**Goal**: Create different storage configurations based on storage type.

**Requirements**:
1. Create `StorageType` enum: `S3`, `GCS`, `AZURE`, `LOCAL`
2. Create base `BaseStorageConfig` with:
   - `type: StorageType`
   - `encryption: bool = False`
3. Create specific configs:
   - `S3StorageConfig`: bucket, region, access_key_id (optional)
   - `GCSStorageConfig`: bucket, project
   - `AzureStorageConfig`: account_name, container, sas_token (optional)
   - `LocalStorageConfig`: base_path
4. Create a union type `StorageConfig`, discriminated on the `type` field
5. Test each storage type validates correctly
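
One possible starting point, covering only two of the four storage types, is sketched below. It assumes Pydantic v2 and, for simplicity, tags each model with a plain string `Literal` instead of the `StorageType` enum; wiring in the enum and the remaining types is left to the exercise.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class BaseStorageConfig(BaseModel):
    encryption: bool = False


class S3StorageConfig(BaseStorageConfig):
    # The Literal value is the tag Pydantic uses to pick this model.
    type: Literal["s3"] = "s3"
    bucket: str
    region: str


class LocalStorageConfig(BaseStorageConfig):
    type: Literal["local"] = "local"
    base_path: str


# Field(discriminator=...) tells Pydantic to dispatch on "type"
# instead of trying each union member in turn.
StorageConfig = Annotated[
    Union[S3StorageConfig, LocalStorageConfig],
    Field(discriminator="type"),
]

# TypeAdapter validates a bare union without a wrapping model.
adapter = TypeAdapter(StorageConfig)
s3 = adapter.validate_python({"type": "s3", "bucket": "raw", "region": "us-east-1"})
local = adapter.validate_python({"type": "local", "base_path": "/tmp/data"})
```

With a discriminator, a bad S3 config reports only S3-specific errors, which keeps messages short compared with an undiscriminated `Union`.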

In [None]:
from typing import Union

# TODO: Implement storage config hierarchy

# TODO: Test each storage type

## Exercise 5: Field Validator - Validate Cron Expression

**Goal**: Add field-level validation for cron expressions.

**Requirements**:
1. Extend `ScheduleConfig` from Exercise 3
2. Add `@field_validator` for `cron_expression`
3. Validate it has exactly 5 parts (minute, hour, day, month, weekday)
4. Each part should be `*`, a number, a range (e.g. `9-17`), or a step (e.g. `*/15`)
5. Provide clear error messages

**Example valid expressions**:
- `"0 0 * * *"` (daily at midnight)
- `"*/15 * * * *"` (every 15 minutes)
- `"0 9-17 * * 1-5"` (hourly, 9am-5pm, weekdays)
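
The shape check itself can be prototyped as a plain function before it is wired into a `@field_validator`. The sketch below uses only the standard library; `CRON_FIELD` and `check_cron_shape` are hypothetical names. It checks syntax only and does not verify value ranges such as minutes being 0-59.

```python
import re

# One cron field: "*", "*/N", "N", or "N-M" (digits only; no lists or names).
CRON_FIELD = re.compile(r"^(\*(/\d+)?|\d+(-\d+)?)$")

FIELD_NAMES = ("minute", "hour", "day", "month", "weekday")


def check_cron_shape(expression: str) -> str:
    """Return the expression unchanged, or raise ValueError describing the problem."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"Cron expression must have 5 fields ({' '.join(FIELD_NAMES)}), "
            f"got {len(parts)}: '{expression}'"
        )
    for name, part in zip(FIELD_NAMES, parts):
        if not CRON_FIELD.match(part):
            raise ValueError(f"Invalid {name} field '{part}' in '{expression}'")
    return expression
```

Naming the offending field in the message gives the user something actionable, which is the point of requirement 5.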

In [None]:
from pydantic import field_validator
import re

# TODO: Enhance ScheduleConfig with field validator
# class ScheduleConfig(BaseModel):
#     type: ScheduleType
#     cron_expression: Optional[str] = None
#     interval_minutes: Optional[int] = None
#     
#     @field_validator("cron_expression")
#     @classmethod
#     def validate_cron(cls, v):
#         # TODO: Implement validation
#         pass

# TODO: Test valid and invalid cron expressions

## Exercise 6: Complete Pipeline Config with Extensions

**Goal**: Create a complete pipeline config using all your new features.

**Requirements**:
1. Add `schedule` field to `PipelineConfig` (use your `ScheduleConfig`)
2. Create YAML for a pipeline with:
   - Schedule (cron-based)
   - Multiple nodes
   - At least one write operation with partitioning
3. Load and validate it
4. Print summary of the pipeline
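
The load-and-validate step might look like the sketch below. `StepConfig` and `MiniPipelineConfig` are hypothetical stand-ins, not Odibi's real `PipelineConfig`, and the YAML keys are invented for illustration. It assumes PyYAML and Pydantic v2 are installed.

```python
from typing import List

import yaml
from pydantic import BaseModel


class StepConfig(BaseModel):
    """Hypothetical stand-in for a pipeline node."""

    name: str


class MiniPipelineConfig(BaseModel):
    """Hypothetical stand-in (not Odibi's PipelineConfig)."""

    pipeline: str
    steps: List[StepConfig]


raw_yaml = """
pipeline: daily_sales
steps:
  - name: load_orders
  - name: aggregate
"""

# safe_load returns plain dicts and lists; it never constructs arbitrary objects.
data = yaml.safe_load(raw_yaml)

# model_validate turns the nested dicts into typed models, raising on bad input.
config = MiniPipelineConfig.model_validate(data)

step_names = ", ".join(step.name for step in config.steps)
summary = f"{config.pipeline}: {len(config.steps)} steps ({step_names})"
print(summary)
```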

In [None]:
import yaml

# TODO: Define enhanced PipelineConfig

# TODO: Create YAML configuration
pipeline_yaml = """
# Your YAML here
"""

# TODO: Load and validate

# TODO: Print summary

## Exercise 7: Error Message Quality

**Goal**: Practice writing helpful validation errors.

**Requirements**:
1. Create a config model with complex validation
2. Each validation error should:
   - Clearly state what's wrong
   - Suggest what to do instead
   - Include relevant context (e.g., available options)
3. Test multiple error scenarios

**Example**:
```python
raise ValueError(
    f"Connection '{conn_name}' not found. "
    f"Available connections: {', '.join(available)}. "
    f"Add it to the 'connections' section of your YAML."
)
```
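
A message can go one step further and suggest the most likely intended value. The helper below is a hypothetical sketch using the standard library's `difflib`; the function name and message wording are illustrative assumptions, not an Odibi API.

```python
from difflib import get_close_matches
from typing import Iterable


def unknown_option_message(kind: str, name: str, available: Iterable[str]) -> str:
    """Build an error message that lists valid options and suggests a near match."""
    options = sorted(available)
    message = f"{kind} '{name}' not found. Available: {', '.join(options)}."
    # get_close_matches returns an empty list when nothing is similar enough.
    close = get_close_matches(name, options, n=1)
    if close:
        message += f" Did you mean '{close[0]}'?"
    return message


msg = unknown_option_message("Connection", "prd", ["dev", "prod"])
```

Inside a validator, the returned string would be passed to `ValueError`, so Pydantic includes it in the final error report.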

In [None]:
# TODO: Create config model with helpful error messages

# TODO: Test and show error messages

## Exercise 8: Config Inheritance

**Goal**: Implement config inheritance/overrides.

**Requirements**:
1. Create a `merge_configs` function that:
   - Takes a base config and override config
   - Returns merged config (override takes precedence)
   - Handles nested dictionaries
2. Use it to implement environment-specific overrides
3. Test with dev/prod configs

**Example**:
```python
base = {"logging": {"level": "INFO", "structured": False}}
prod = {"logging": {"level": "ERROR"}}
merged = merge_configs(base, prod)
# Result: {"logging": {"level": "ERROR", "structured": False}}
```
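
To see why nested handling matters, compare the expected result above with what a shallow merge produces on the same inputs:

```python
base = {"logging": {"level": "INFO", "structured": False}}
prod = {"logging": {"level": "ERROR"}}

# Dict unpacking merges top-level keys only: prod's whole "logging" dict
# replaces base's, so the "structured" setting is silently dropped.
shallow = {**base, **prod}
print(shallow)
```

A correct `merge_configs` must therefore recurse whenever both sides hold a dictionary under the same key.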

In [None]:
def merge_configs(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two config dictionaries, override takes precedence."""
    # TODO: Implement
    pass

# TODO: Test with various scenarios

## Exercise 9: Config Testing Framework

**Goal**: Write comprehensive tests for your configs.

**Requirements**:
1. Create test functions for:
   - Valid configs (should succeed)
   - Invalid configs (should fail with specific errors)
   - Edge cases (empty lists, missing optionals, etc.)
2. Use pytest-style assertions
3. Test at least 3 different config models
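
The test structure can be sketched on a small throwaway model. `RetryConfig` is a hypothetical example and not part of Odibi. With pytest installed, the `try`/`except`/`else` block is usually written as `with pytest.raises(ValidationError)`; plain Python is used here so the cell runs without pytest.

```python
from pydantic import BaseModel, Field, ValidationError


class RetryConfig(BaseModel):
    """Hypothetical model used only to demonstrate the test pattern."""

    max_attempts: int = Field(default=3, ge=1)


def test_retry_config_defaults():
    """Valid case: omitting the optional field falls back to the default."""
    assert RetryConfig().max_attempts == 3


def test_retry_config_rejects_zero_attempts():
    """Invalid case: assert on the failing field, not only that it failed."""
    try:
        RetryConfig(max_attempts=0)
    except ValidationError as exc:
        errors = exc.errors()
        assert len(errors) == 1
        assert errors[0]["loc"] == ("max_attempts",)
    else:
        raise AssertionError("expected ValidationError for max_attempts=0")


test_retry_config_defaults()
test_retry_config_rejects_zero_attempts()
```

Checking `loc` rather than the full message keeps the test stable if the wording of the error changes.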

In [None]:
# TODO: Write test functions

def test_write_config_valid():
    """Test valid WriteConfig."""
    # TODO: Implement
    pass

def test_write_config_missing_table_and_path():
    """Test WriteConfig with neither table nor path fails."""
    # TODO: Implement
    pass

# TODO: Add more tests

# Run tests
# test_write_config_valid()
# test_write_config_missing_table_and_path()

## Bonus Exercise: Complete Feature - Data Quality Config

**Goal**: Design and implement a complete data quality configuration system.

**Requirements**:
1. Create enums for:
   - `QualityCheckType`: `SCHEMA`, `COMPLETENESS`, `UNIQUENESS`, `RANGE`, `CUSTOM`
   - `Severity`: `WARNING`, `ERROR`, `CRITICAL`
2. Create config models:
   - `SchemaCheck`: Expected column names and types
   - `CompletenessCheck`: Columns that can't be null, min fill rate
   - `UniquenessCheck`: Columns that must be unique
   - `RangeCheck`: Min/max values for numeric columns
   - `CustomCheck`: SQL expression or Python function
3. Create `QualityConfig` that includes a list of checks
4. Add to `NodeConfig` as optional field
5. Create comprehensive YAML example
6. Write tests for all check types

This is a realistic production feature - take your time!

In [None]:
# TODO: Implement complete data quality config system

# Your implementation here

---

## 🎯 Learning Checklist

After completing these exercises, you should be able to:

- [ ] Create and use custom enums with Pydantic
- [ ] Write field validators with constraints and patterns
- [ ] Write model validators for cross-field validation
- [ ] Create discriminated unions for polymorphic configs
- [ ] Design clear, helpful error messages
- [ ] Test configuration models thoroughly
- [ ] Work with nested Pydantic models
- [ ] Load and validate YAML configurations
- [ ] Extend existing config systems with new features

**Solutions**: See `solutions.ipynb` for reference implementations.