# Understanding Lazy vs Eager Evaluation in Narwhals

This notebook demonstrates when and why to use lazy vs eager evaluation in Narwhals, with practical examples from TemporalScope. We'll explore when each mode fits, how to write functions for each, and how to combine them.

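## When to Use Lazy Evaluation (Default)
By default, `@nw.narwhalify` accepts both eager DataFrames and LazyFrames. Laziness comes from the input. Given a lazy frame such as a Polars `LazyFrame`, operations are only recorded in a query plan and don't execute until you call `.collect()`. Eager inputs such as pandas or Polars `DataFrame`s still run each step immediately. Lazy evaluation is great for: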

1. **DataFrame Transformations**
   - Chain multiple operations together
   - Let the backend (e.g. Polars) optimize the whole query plan
   - Example: TemporalScope's target shifting operations

2. **Memory Efficiency**
   - Avoid storing intermediate results
   - Better for large datasets
   - Example: Processing time series data

## When to Use Eager Evaluation (eager_only=True)
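Sometimes you need results immediately. Passing `eager_only=True` restricts a function to eager (in-memory) DataFrames. This gives access to eager-only methods such as `DataFrame.item()`, which return Python scalars. Use eager evaluation when: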

1. **Computing Metrics**
   - Need scalar values (counts, means)
   - Validation checks
   - Example: Checking null values before processing

2. **External Data**
   - Working with Python lists/dicts
   - Need immediate Python types
   - Example: Validation functions that return counts

Let's see these patterns in action with real examples from TemporalScope.

In [1]:
import narwhals as nw
from narwhals.typing import FrameT
import pandas as pd
import polars as pl
from typing import Tuple

# Sample data with missing values
data = {
    "temporal_index": range(1, 6),
    "observation": [10, None, 30, None, 50],  # Has null values
    "stratum": ["A", "A", "B", "B", "C"],  # Groups for aggregation
}

# Create DataFrames with different backends
df_pd = pd.DataFrame(data)
df_pl = pl.DataFrame(data)


# Example 1: Lazy Evaluation in Transformation Chains
@nw.narwhalify
def analyze_temporal_sequence(df: FrameT) -> FrameT:
    """Build a transformation chain for temporal data analysis.

    With a lazy input (e.g. a Polars LazyFrame), the select and filter steps
    are deferred and can be optimized by the backend as a single execution
    plan. With eager inputs such as pandas, each step runs immediately.

    Parameters
    ----------
    df : FrameT
        Input DataFrame containing temporal sequences

    Returns
    -------
    FrameT
        Transformed DataFrame with computed metrics
    """
    print("Initializing transformation chain (deferred execution)")

    # Stage 1: Feature engineering (deferred only for lazy inputs)
    result = df.select([nw.col("observation").alias("raw_value"), (nw.col("observation") * 2).alias("scaled_value")])

    # Stage 2: Missing value handling (deferred only for lazy inputs)
    result = result.filter(~nw.col("raw_value").is_null())

    print("Transformation chain constructed but not yet executed")
    return result


# Example 2: Eager Evaluation for Validation
@nw.narwhalify(eager_only=True)
def validate_temporal_sequence(df: FrameT, metric_col: str) -> Tuple[int, float]:
    """Compute validation metrics eagerly.

    `eager_only=True` guarantees an in-memory DataFrame, so `.item()` can
    return the metrics as scalars immediately.

    Parameters
    ----------
    df : FrameT
        Input DataFrame containing temporal sequences
    metric_col : str
        Column name for metric computation

    Returns
    -------
    Tuple[int, float]
        Count of missing values and mean of valid observations
    """
    # Immediate computation of validation metrics
    metrics = df.select(
        [
            nw.col(metric_col).is_null().sum().cast(nw.Int64).alias("null_count"),
            nw.col(metric_col).mean().cast(nw.Float64).alias("mean_value"),
        ]
    )

    return (metrics.select([nw.col("null_count")]).item(), metrics.select([nw.col("mean_value")]).item())


# Example 3: Hybrid Approach (Mixed Evaluation)
@nw.narwhalify
def compute_temporal_aggregates(df: FrameT) -> FrameT:
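    """Aggregate temporal observations by stratum.

    Both stages (column selection and group-by aggregation) are
    lazy-compatible: they are deferred for lazy inputs and run immediately
    for eager ones such as pandas. In a hybrid pipeline, this step would
    typically follow an eager check such as `validate_temporal_sequence`.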

    Parameters
    ----------
    df : FrameT
        Input DataFrame containing temporal sequences

    Returns
    -------
    FrameT
        Aggregated results by stratum
    """
    # Stage 1: Lazy-compatible transformation
    result = df.select([nw.col("observation").alias("value"), nw.col("stratum").alias("group")])

    # Stage 2: Lazy-compatible aggregation
    return result.group_by("group").agg(
        [nw.col("value").mean().alias("stratum_mean"), nw.col("value").std().alias("stratum_std")]
    )

In [2]:
# Let's see these patterns in action

print("Example 1: Lazy Evaluation - Watch the Execution Order")
print("----------------------------------------")
# Note: pandas runs eagerly, so every step executes inside this call.
# Pass a Polars LazyFrame instead to defer the work until .collect().
lazy_result = analyze_temporal_sequence(df_pd)
print("\nNow we trigger computation (Pandas):")
print(lazy_result)  # Displays the already-computed pandas result

print("\nSame behavior with Polars:")
print(analyze_temporal_sequence(df_pl))  # Works the same way

print("\nExample 2: Eager Evaluation - Immediate Results")
print("----------------------------------------")
# These execute right away because we need the values
pandas_metrics = validate_temporal_sequence(df_pd, "observation")
polars_metrics = validate_temporal_sequence(df_pl, "observation")
print(f"Pandas Metrics - Missing: {pandas_metrics[0]}, Mean: {pandas_metrics[1]:.2f}")
print(f"Polars Metrics - Missing: {polars_metrics[0]}, Mean: {polars_metrics[1]:.2f}")

print("\nExample 3: Hybrid Approach - Best of Both Worlds")
print("----------------------------------------")
# Runs immediately here because the input is an eager pandas DataFrame
temporal_results = compute_temporal_aggregates(df_pd)
print("\nFinal Results:")
print(temporal_results)

Example 1: Lazy Evaluation - Watch the Execution Order
----------------------------------------
Initializing transformation chain (deferred execution)
Transformation chain constructed but not yet executed

Now we trigger computation (Pandas):
   raw_value  scaled_value
0       10.0          20.0
2       30.0          60.0
4       50.0         100.0

Same behavior with Polars:
Initializing transformation chain (deferred execution)
Transformation chain constructed but not yet executed
shape: (3, 2)
┌───────────┬──────────────┐
│ raw_value ┆ scaled_value │
│ ---       ┆ ---          │
│ i64       ┆ i64          │
╞═══════════╪══════════════╡
│ 10        ┆ 20           │
│ 30        ┆ 60           │
│ 50        ┆ 100          │
└───────────┴──────────────┘

Example 2: Eager Evaluation - Immediate Results
----------------------------------------
Pandas Metrics - Missing: 2, Mean: 30.00
Polars Metrics - Missing: 2, Mean: 30.00

Example 3: Hybrid Approach - Best of Both Worlds
---------------

## Key Takeaways

1. **Use Lazy Evaluation When:**
   - Chaining DataFrame operations
   - Want the backend to optimize across operations
   - Working with large datasets
   - Example: TemporalScope's target shifting uses lazy evaluation for efficient data transformations

2. **Use Eager Evaluation When:**
   - Need immediate scalar values
   - Doing validation checks
   - Working with external data
   - Example: TemporalScope's validation functions use eager evaluation to check data quality

3. **Combine Both When:**
   - Complex pipelines need both patterns
   - Some operations need immediate results
   - Others can be optimized together
   - Example: TemporalScope's data loading combines validation (eager) with transformations (lazy)

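Remember: The choice between lazy and eager evaluation can significantly affect your code's performance. Deferral only happens when the input frame is lazy (for example a Polars `LazyFrame`); pandas and Polars `DataFrame` inputs execute eagerly either way. Choose eager evaluation when you need immediate results, and lazy evaluation when the backend can optimize a chain of operations.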