# 07 - Transformation Decorator and Composition

## 🧭 Goal

Understand how the `@transformation` decorator works internally and how to compose transformations into pipelines.

This notebook will:
- Explain how the `@transformation` decorator attaches metadata to functions
- Show the decorator pattern: `@transformation(name, version, category, tags)`
- Demonstrate function metadata attachment and inspection
- Build a mini pipeline by composing multiple decorated functions
- Show how metadata flows through the pipeline
- Export pipeline execution metadata

**Estimated runtime:** under 30 seconds

---

## 🧱 Core Concepts

**The Decorator Pattern:**
```python
@transformation(name="clean_data", version="1.0.0", category="cleaning")
def clean_data(df):
    return df.dropna()

# The decorator:
# 1. Attaches metadata to the function
# 2. Registers it in the global registry
# 3. Returns the original function (unchanged behavior)
```

**Function Composition:**
```python
# Chain transformations together
result = transform_c(transform_b(transform_a(df)))

# Each function carries metadata about what it does
```
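
Nested calls read inside-out, which gets hard to follow as chains grow. One common alternative is a small `compose` helper that applies functions left to right. The helper and the toy functions below are illustrative only and are not part of ODIBI.

```python
from functools import reduce

def compose(*funcs):
    """Return a function that applies funcs in order, left to right."""
    def pipeline(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return pipeline

def add_one(x):
    return x + 1

def times_ten(x):
    return x * 10

run = compose(add_one, times_ten)
print(run(2))  # 30, the same as times_ten(add_one(2))
```

With no functions supplied, `compose()` returns its input unchanged, which makes an empty pipeline a safe default.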

## 🔧 Setup

In [None]:
# ✅ Environment Setup
import os
from pathlib import Path
import pandas as pd
import json
from datetime import datetime

# Navigate to project root
project_root = Path.cwd().parent if Path.cwd().name == 'walkthroughs' else Path.cwd()
os.chdir(project_root)

# Create artifacts directory
artifacts_dir = Path('walkthroughs/.artifacts/07_decorator')
artifacts_dir.mkdir(parents=True, exist_ok=True)

# Import ODIBI transformation system
from odibi.transformations import get_registry, transformation

print("✅ Environment ready")
print(f"📁 Artifacts: {artifacts_dir}")

## ▶️ Run: Inspect Decorator Internals

In [None]:
# Create a simple transformation to inspect
@transformation("demo_transform", version="1.0.0", category="demo", tags=["example", "test"])
def demo_transform(df):
    """A simple demo transformation."""
    return df

print("🔍 Inspecting Decorator Metadata:\n")

# Check function attributes
print(f"Function name: {demo_transform.__name__}")
print(f"Function docstring: {demo_transform.__doc__}")

# Get metadata from registry
registry = get_registry()
metadata = registry.get_metadata("demo_transform")

print("\nMetadata attached by decorator:")
for key, value in metadata.items():
    print(f"  • {key}: {value}")

print("\n✅ The decorator enriches functions with metadata without changing behavior")

## 🔍 Inspect: Test Metadata Persistence

In [None]:
# Create a test DataFrame
df_test = pd.DataFrame({
    "id": [1, 2, 3],
    "value": [10, 20, 30]
})

print("🧪 Testing that decorated functions still work normally:\n")
print("Input DataFrame:")
print(df_test)

# Call the decorated function
result = demo_transform(df_test)

print("\nOutput DataFrame (unchanged):")
print(result)

print("\n✅ Function behavior preserved despite decoration")
print("✅ Metadata still accessible via registry")

## 🎨 Create: Build a Transformation Pipeline

In [None]:
# Step 1: Create sample data with messy column names (mixed case, spaces)
df_raw = pd.DataFrame({
    "Product Name": ["Widget A", "Gadget B", "Gizmo C"],
    "Q1 Sales": [100, 150, 200],
    "Q2 Sales": [120, 160, 220],
    "Q3 Sales": [110, 140, 210]
})

print("📊 Original Data:")
print(df_raw)
print(f"Shape: {df_raw.shape}")

In [None]:
# Step 2: Define transformation functions with full metadata

@transformation(
    name="clean_column_names",
    version="1.0.0",
    category="cleaning",
    tags=["names", "standardization"]
)
def clean_column_names(df):
    """Convert column names to lowercase and replace spaces with underscores."""
    df = df.copy()
    df.columns = [col.lower().replace(" ", "_") for col in df.columns]
    return df

@transformation(
    name="add_total_column",
    version="1.0.0",
    category="aggregation",
    tags=["sum", "calculated"]
)
def add_total_column(df):
    """Add a total column summing all numeric columns."""
    df = df.copy()
    numeric_cols = df.select_dtypes(include=['number']).columns
    df['total'] = df[numeric_cols].sum(axis=1)
    return df

@transformation(
    name="normalize_values",
    version="1.0.0",
    category="normalization",
    tags=["scaling", "percentage"]
)
def normalize_values(df):
    """Add a `<col>_pct` column for each numeric column, as a percentage of `total`."""
    df = df.copy()
    if 'total' in df.columns:
        numeric_cols = [col for col in df.select_dtypes(include=['number']).columns 
                       if col != 'total']
        for col in numeric_cols:
            df[f"{col}_pct"] = (df[col] / df['total'] * 100).round(2)
    return df

print("✅ Created 3 custom transformations:")
print("  1. clean_column_names")
print("  2. add_total_column")
print("  3. normalize_values")

## 🔗 Compose: Chain Transformations Together

In [None]:
# Track pipeline execution
pipeline_steps = []
transformation_metadata = {}

print("🔗 Composing Pipeline: clean → add_total → normalize\n")

# Step 1: Clean column names
df_step1 = clean_column_names(df_raw)
pipeline_steps.append({
    "step": 1,
    "transformation": "clean_column_names",
    "columns_in": list(df_raw.columns),
    "columns_out": list(df_step1.columns),
    "shape": str(df_step1.shape)
})
transformation_metadata["clean_column_names"] = registry.get_metadata("clean_column_names")

print("Step 1 - After clean_column_names:")
print(df_step1)
print()

# Step 2: Add total column
df_step2 = add_total_column(df_step1)
pipeline_steps.append({
    "step": 2,
    "transformation": "add_total_column",
    "columns_in": list(df_step1.columns),
    "columns_out": list(df_step2.columns),
    "shape": str(df_step2.shape)
})
transformation_metadata["add_total_column"] = registry.get_metadata("add_total_column")

print("Step 2 - After add_total_column:")
print(df_step2)
print()

# Step 3: Normalize values
df_final = normalize_values(df_step2)
pipeline_steps.append({
    "step": 3,
    "transformation": "normalize_values",
    "columns_in": list(df_step2.columns),
    "columns_out": list(df_final.columns),
    "shape": str(df_final.shape)
})
transformation_metadata["normalize_values"] = registry.get_metadata("normalize_values")

print("Step 3 - After normalize_values (FINAL):")
print(df_final)
print()

print(f"✅ Pipeline complete: {df_raw.shape} → {df_final.shape}")
print(f"✅ Columns: {len(df_raw.columns)} → {len(df_final.columns)}")

## 💾 Export Pipeline Artifacts
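
The export cell below writes registry metadata with `json.dump`, which only handles basic types (dicts, lists, strings, numbers, booleans, `None`). Whether ODIBI's metadata ever contains other types is not shown in this notebook. If it did, passing `default=str` is one simple fallback. The `registered_at` field in this sketch is hypothetical and exists only to demonstrate the fallback.

```python
import json
from datetime import datetime

metadata = {
    "name": "clean_column_names",
    "version": "1.0.0",
    "tags": ["names", "standardization"],
    "registered_at": datetime(2024, 1, 1, 12, 0),  # hypothetical non-JSON value
}

# default=str is called only for objects json cannot encode natively.
text = json.dumps(metadata, indent=2, default=str)
restored = json.loads(text)

print(restored["registered_at"])  # 2024-01-01 12:00:00
print(restored["tags"])           # ['names', 'standardization']
```

The trade-off is that the value comes back as a string, so any code reading the file has to parse it again if it needs the original type.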

In [None]:
# Save final output
output_file = artifacts_dir / 'pipeline_output.parquet'
df_final.to_parquet(output_file, index=False)
print(f"✅ Saved pipeline output: {output_file}")

# Save transformation metadata
metadata_file = artifacts_dir / 'transformation_metadata.json'
with open(metadata_file, 'w') as f:
    json.dump(transformation_metadata, f, indent=2)
print(f"✅ Saved transformation metadata: {metadata_file}")

# Save pipeline execution steps
steps_file = artifacts_dir / 'pipeline_steps.json'
pipeline_summary = {
    "executed_at": datetime.now().isoformat(),
    "total_steps": len(pipeline_steps),
    "steps": pipeline_steps
}
with open(steps_file, 'w') as f:
    json.dump(pipeline_summary, f, indent=2)
print(f"✅ Saved pipeline steps: {steps_file}")

print("\n📦 All artifacts exported successfully!")

## ✅ Self-Check

In [None]:
import time
start_time = time.time()

try:
    # Check artifacts exist
    assert (artifacts_dir / 'pipeline_output.parquet').exists(), "pipeline_output.parquet not found"
    assert (artifacts_dir / 'transformation_metadata.json').exists(), "transformation_metadata.json not found"
    assert (artifacts_dir / 'pipeline_steps.json').exists(), "pipeline_steps.json not found"
    
    # Load and validate pipeline output
    df_check = pd.read_parquet(artifacts_dir / 'pipeline_output.parquet')
    expected_columns = ['product_name', 'q1_sales', 'q2_sales', 'q3_sales', 'total', 
                       'q1_sales_pct', 'q2_sales_pct', 'q3_sales_pct']
    assert list(df_check.columns) == expected_columns, f"Expected columns {expected_columns}, got {list(df_check.columns)}"
    assert len(df_check) == 3, f"Expected 3 rows, got {len(df_check)}"
    
    # Validate transformation metadata
    with open(artifacts_dir / 'transformation_metadata.json') as f:
        metadata = json.load(f)
    
    required_transforms = ['clean_column_names', 'add_total_column', 'normalize_values']
    for name in required_transforms:
        assert name in metadata, f"Missing metadata for {name}"
        assert 'name' in metadata[name], f"Missing 'name' in {name} metadata"
        assert 'version' in metadata[name], f"Missing 'version' in {name} metadata"
        assert 'category' in metadata[name], f"Missing 'category' in {name} metadata"
        assert 'tags' in metadata[name], f"Missing 'tags' in {name} metadata"
    
    # Validate pipeline steps
    with open(artifacts_dir / 'pipeline_steps.json') as f:
        steps = json.load(f)
    
    assert 'steps' in steps, "Missing 'steps' in pipeline_steps.json"
    assert len(steps['steps']) == 3, f"Expected 3 pipeline steps, got {len(steps['steps'])}"
    assert steps['total_steps'] == 3, "total_steps should be 3"
    
    # Check runtime (the timer started at the top of this cell, so this covers the self-check only)
    elapsed = time.time() - start_time
    assert elapsed < 30, f"Runtime {elapsed:.1f}s exceeds 30s budget"
    
    print("🎉 Walkthrough verified successfully!")
    print(f"⏱️  Runtime: {elapsed:.2f}s")
    print(f"📊 Pipeline steps: {len(steps['steps'])}")
    print(f"📋 Transformations: {len(metadata)}")
    print("✅ All checks passed!")
    
except AssertionError as e:
    print(f"❌ Walkthrough failed: {e}")
    raise
except Exception as e:
    print(f"❌ Unexpected error: {e}")
    raise

## 🧠 Reflection

### What You Learned

1. **Decorator Mechanics**: The `@transformation` decorator attaches metadata without changing function behavior
2. **Function Composition**: Transformations can be chained together to build data pipelines
3. **Metadata Flow**: Each transformation carries its own metadata (name, version, category, tags)
4. **Pipeline Tracking**: You can capture and export information about pipeline execution

### Where This Fits in ODIBI

```
Pipeline Construction:
YAML Definition → Parser → Compose Functions → Execute Pipeline → Track Metadata
                           ↑
                   This notebook showed composition!
```

The decorator pattern makes functions **self-documenting** and **traceable**. When ODIBI executes a pipeline, it can track exactly which transformations ran, their versions, and their metadata.
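
To make the traceability idea concrete, here is a sketch of a runner that reads metadata off each function as it executes and builds a trace. It is not how ODIBI's pipeline engine works: `tag`, `run_with_trace`, and the `metadata` attribute are hypothetical, and plain dicts stand in for DataFrames so the example has no dependencies.

```python
def tag(name, version):
    """Hypothetical decorator: attach a small metadata dict to the function."""
    def decorator(func):
        func.metadata = {"name": name, "version": version}
        return func
    return decorator

@tag("lowercase_keys", "1.0.0")
def lowercase_keys(row):
    return {key.lower(): value for key, value in row.items()}

@tag("add_total", "1.1.0")
def add_total(row):
    return {**row, "total": sum(row.values())}

def run_with_trace(row, steps):
    """Apply each step in order and record which step ran and what it changed."""
    trace = []
    for func in steps:
        keys_in = list(row)
        row = func(row)
        trace.append({**func.metadata, "keys_in": keys_in, "keys_out": list(row)})
    return row, trace

result, trace = run_with_trace({"Q1": 100, "Q2": 120}, [lowercase_keys, add_total])

print(result)
for entry in trace:
    print(f"{entry['name']}@{entry['version']}: {entry['keys_in']} -> {entry['keys_out']}")
```

The runner never needs the step names passed in separately, because each function carries its own name and version.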

### Key Insights

- **Decorators add capabilities**: Here, `@transformation` attaches metadata and registers the function while returning it unchanged
- **Composition is powerful**: Complex pipelines are just functions calling functions
- **Metadata enables traceability**: You know exactly what happened to your data
- **Behavior is preserved**: Decorated functions are called exactly like regular functions and return the same results
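
The word "decorator" covers two different mechanics: annotating a function and returning the same object (what the Core Concepts comments describe for `@transformation`), or returning a new wrapper function around it. The sketch below contrasts the two. The names are hypothetical, and `functools.wraps` is the standard-library tool for keeping `__name__` and `__doc__` intact in the wrapper case.

```python
import functools

def register_only(func):
    """Annotate the function and hand back the very same object."""
    func.registered = True
    return func

def counting_wrapper(func):
    """Return a new function that counts calls before delegating."""
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

def identity(x):
    """Return x unchanged."""
    return x

same = register_only(identity)
wrapped = counting_wrapper(identity)

print(same is identity)     # True: no new function was created
print(wrapped is identity)  # False: a wrapper sits in front of the original
print(wrapped.__name__)     # identity (preserved by functools.wraps)
```

Returning the original function is the simpler option when the decorator only needs to record information. A wrapper becomes necessary once it must act on every call, for example for timing or validation.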

---

## ⏭ Next Steps

**Continue to:** [08_advanced_transformations.ipynb](08_advanced_transformations.ipynb)

Learn about advanced transformation patterns including error handling, validation, and parameterization.

**Deep dive:**
- Read `odibi/transformations/decorators.py` - The decorator implementation
- Read `odibi/core/pipeline.py` - How pipelines compose transformations
- Experiment with creating your own transformation chains