# Understanding Narwhals Patterns for Backend-Agnostic Code

This tutorial demonstrates how to write robust, backend-agnostic DataFrame operations using Narwhals. We'll cover:

1. Core Narwhals Concepts
2. Expression-Based Operations
3. Type Safety and Validation
4. Lazy vs Eager Evaluation

## Core Narwhals Concepts

Narwhals provides three key mechanisms for backend-agnostic operations:

1. **@nw.narwhalify Decorator**:
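   - Converts native input frames to Narwhals objects and converts the result back to the original backend
   - Equivalent to calling from_native() on the way in and to_native() on the way out
   - Accepts eager_only=True to reject lazy frames, which is needed when the function calls eager-only methods such as item()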
   
   ```python
   # Example: Basic narwhalify usage
   @nw.narwhalify
   def process(df: FrameT) -> FrameT:
       return df.select([...])
   
   # Example: Eager-only frame, so a Python scalar can be returned
   @nw.narwhalify(eager_only=True)
   def process_multi(df: DataFrameT, values: List) -> int:
       return df.filter(...).shape[0]  # Row count; Narwhals frames have no count() method
   ```

2. **Manual Conversion** (Testing Only):
   - nw.from_native(): Convert to Narwhals format
   - nw.to_native(): Convert back to original backend
   - Only needed for testing or debugging
   
   ```python
   # Example: Manual conversion (testing only)
   df_nw = nw.from_native(df_pd)
   result = df_nw.select([...])
   df_pd = result.to_native()
   ```

3. **Execution Modes**:
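   - Lazy: Operations on lazy inputs (Polars LazyFrame, Dask) are deferred until materialized
   - Eager: Operations on eager inputs (pandas, Polars DataFrame, PyArrow) run immediately; eager_only=True rejects lazy inputs
   - Mixed: Materialize native results with compute() (Dask) or collect() (Polars)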
   
   ```python
   # Example: Lazy evaluation
   result = df.select([...])  # Deferred when df is a lazy frame
   
   # Example: Mixed mode handling
   if hasattr(result, "compute"):
       result = result.compute()
   elif hasattr(result, "collect"):
       result = result.collect()
   ```

Let's see these patterns in action with real examples.

In [3]:
import narwhals as nw
from narwhals.typing import FrameT, DataFrameT
from typing import Any, List

# Pattern 1: Basic Column Operations
@nw.narwhalify
def calculate_stats(df: FrameT, value_col: str) -> FrameT:
    """Calculate statistics using Narwhals expressions.
    
    Key Pattern: Expression Chaining
    - Use nw.col() for column references
    - Chain operations for readability
    - Cast results to specific types
    
    Common Pitfalls:
    - Don't use df[col] - use nw.col(col)
    - Don't forget to cast aggregation results
    - Don't aggregate on the frame returned by select(); aggregate inside select()
    
    Example:
    ```python
    # Good: Chain within select()
    df.select([
        nw.col("value").mean().cast(nw.Float64())
    ])
    
    # Bad: Aggregate on the frame returned by select()
    df.select([nw.col("value")]).mean()  # Fails: frames have no mean() method
    ```
    """
    return df.select([
        # Mean with proper type casting
        nw.col(value_col)
           .mean()
           .cast(nw.Float64())  # Always cast aggregations
           .alias("mean"),      # Always alias results
           
        # Count nulls with type casting
        nw.col(value_col)
           .is_null()           # Check nulls first
           .sum()               # Then aggregate
           .cast(nw.Int64())    # Cast to integer
           .alias("null_count") # Clear alias
    ])

# Pattern 2: Group-by Operations
@nw.narwhalify
def group_and_aggregate(df: FrameT, group_col: str, value_col: str) -> FrameT:
    """Group-by operations with proper aggregation.
    
    Key Pattern: Group-by with Aggregation
    - Use group_by() for grouping
    - Chain agg() for aggregations
    - Sort results if needed
    
    Performance Notes:
    - Each group_by() call is a separate pass over the data
    - Put all aggregations in a single agg() to avoid repeated passes
    - Sort after aggregation, when there is one row per group
    
    Example:
    ```python
    # Good: Multiple aggregations in one call
    df.group_by(group_col).agg([
        nw.col(value_col).mean(),
        nw.col(value_col).std()
    ])
    
    # Bad: Multiple group_by calls
    means = df.group_by(group_col).agg([...])
    stds = df.group_by(group_col).agg([...])  # Inefficient
    ```
    """
    return df.group_by(group_col).agg([
        # Multiple aggregations in one call
        nw.col(value_col).mean().alias("mean"),
        nw.col(value_col).std().alias("std"),
        nw.col(value_col).count().alias("count")
    ]).sort(group_col)  # Sort after aggregation

# Pattern 3: Horizontal Operations
@nw.narwhalify
def combine_columns(df: FrameT, col1: str, col2: str) -> FrameT:
    """Combine columns horizontally.
    
    Key Pattern: Horizontal Operations
    - Use with_columns for new columns
    - Use sum_horizontal for row-wise ops
    - Handle multiple columns together
    
    Backend Notes:
    - sum_horizontal works across backends and treats nulls as 0
    - Column arithmetic (+, -, etc.) propagates nulls to the result
    - Check how each backend represents nulls (NaN in pandas, null in Polars)
    
    Example:
    ```python
    # Good: Use sum_horizontal
    df.with_columns([nw.sum_horizontal("a", "b")])
    
    # Also works: Column arithmetic
    df.with_columns([nw.col("a") + nw.col("b")])
    ```
    """
    return df.with_columns([
        # Two ways to combine columns; they differ on rows containing nulls
        nw.sum_horizontal(col1, col2).alias("sum"),        # Nulls count as 0
        (nw.col(col1) + nw.col(col2)).alias("sum_alt")    # Nulls propagate
    ])

# Pattern 4: Multiple Inputs (Eager Only)
@nw.narwhalify(eager_only=True)
def filter_by_values(df: DataFrameT, values: List[Any], col_name: str) -> int:
    """Filter DataFrame using external values.
    
    Key Pattern: Eager Execution
    - Use eager_only=True when the function returns a Python value
    - Return Python types (int, float, etc.)
    - Extra arguments such as lists are passed through unchanged
    
    Why eager_only=True?
    - item() needs a materialized frame
    - A lazy frame cannot produce a scalar without being collected first
    - Better for interactive analysis
    
    Example:
    ```python
    # Good: Eager-only frame, scalar result
    @nw.narwhalify(eager_only=True)
    def func(df: DataFrameT, values: List) -> int:
        return df.filter(...).shape[0]
    
    # Limited: Lazy frames are accepted, so only a frame can be returned
    @nw.narwhalify  # Eager-only calls such as item() fail on lazy input
    def func(df: FrameT, values: List) -> FrameT:
        return df.filter(...)
    ```
    """
    return df.filter(
        nw.col(col_name).is_in(values)  # Filter using external values
    ).select([
        nw.col(col_name).count().alias("count")  # Count matches
    ]).item()  # Get scalar result

## Testing the Patterns

Let's test these patterns with different backends to understand:
1. How Narwhals handles conversion
2. Backend-specific behavior
3. Error handling differences

In [4]:
import pandas as pd
import polars as pl
import pyarrow as pa

# Create test data with edge cases
data = {
    "group": ["A", "A", "B", "B", "C"],           # Groups for aggregation
    "value1": [10, None, 30, 40, 50],           # Has null value
    "value2": [1, 2, 3, None, 5]                # Has null value
}

# Test with different backends
df_pd = pd.DataFrame(data)    # Pandas backend
df_pl = pl.DataFrame(data)    # Polars backend

# Test 1: Basic Stats - Shows null handling
print("Basic Stats (Pandas):")
print(calculate_stats(df_pd, "value1"))
print("\nBasic Stats (Polars):")
print(calculate_stats(df_pl, "value1"))

# Test 2: Group-by - Shows aggregation
print("\nGroup Aggregation (Pandas):")
print(group_and_aggregate(df_pd, "group", "value1"))

# Test 3: Horizontal Ops - Shows null propagation
print("\nHorizontal Operations (Pandas):")
print(combine_columns(df_pd, "value1", "value2"))

# Test 4: Eager Operation - Shows immediate execution
print("\nFiltered Count (Pandas):")
print(filter_by_values(df_pd, ["A", "B"], "group"))

Basic Stats (Pandas):
   mean  null_count
0  32.5           1

Basic Stats (Polars):
shape: (1, 2)
┌──────┬────────────┐
│ mean ┆ null_count │
│ ---  ┆ ---        │
│ f64  ┆ i64        │
╞══════╪════════════╡
│ 32.5 ┆ 1          │
└──────┴────────────┘

Group Aggregation (Pandas):
  group  mean       std  count
0     A  10.0       NaN      1
1     B  35.0  7.071068      2
2     C  50.0       NaN      1

Horizontal Operations (Pandas):
  group  value1  value2   sum  sum_alt
0     A    10.0     1.0  11.0     11.0
1     A     NaN     2.0   2.0      NaN
2     B    30.0     3.0  33.0     33.0
3     B    40.0     NaN  40.0      NaN
4     C    50.0     5.0  55.0     55.0

Filtered Count (Pandas):
4


## Lazy vs Eager Evaluation

Narwhals supports both lazy and eager evaluation modes, each with specific use cases:

1. **Lazy Evaluation**
   - Applies when the input is lazy (Polars LazyFrame, Dask DataFrame)
   - Operations are deferred, so the backend can optimize across them
   - Use collect() (Polars) or compute() (Dask) to materialize
   
   ```python
   # Example: Lazy chain of operations
   result = (
       df.select([...])   # Deferred
       .filter([...])     # Still deferred
       .sort([...])       # Still deferred
   )
   final = result.collect()  # Now executes (compute() for a native Dask result)
   ```
   ```

2. **Eager Evaluation**
   - Use eager_only=True to accept only eager frames and reject lazy ones
   - Required for eager-only methods such as item() and for returning Python scalars
   - Better for interactive analysis
   
   ```python
   # Example: Eager execution
   @nw.narwhalify(eager_only=True)
   def get_count(df: FrameT) -> int:
       return df.select([...]).item()
   ```

3. **Mixed Mode Handling**
   - Check hasattr(df, "compute") or hasattr(df, "collect")
   - Handle both modes gracefully
   - Materialize only when needed
   
   ```python
   # Example: Handle both modes
   if hasattr(df, "compute"):
       df = df.compute()    # Dask style
   elif hasattr(df, "collect"):
       df = df.collect()    # Polars style
   ```

## Key Takeaways

1. **Core Operations**
   - select() for transformations
   - with_columns() for new columns
   - group_by().agg() for aggregations

2. **Type Safety**
   - Use class-based dtypes (nw.Int64, nw.Float64)
   - Cast aggregation results explicitly
   - Handle PyArrow scalars with .as_py()

3. **Execution Modes**
   - Execution mode follows the input backend
   - Use eager_only=True when a function needs materialized data
   - Handle both modes safely

4. **Error Handling**
   - Validate inputs early
   - Use type hints properly
   - Handle backend differences

These patterns form the foundation for implementing TemporalScope's core utilities using Narwhals.