# Complex Narwhals Functions: Time Operations and Aggregations

This notebook demonstrates advanced Narwhals patterns used in TemporalScope, focusing on:

1. **Time-Based Operations**
   - Converting between datetime and numeric representations
   - Validating temporal uniqueness
   - Handling mixed-frequency time series data

2. **Complex Aggregations**
   - Efficient null/NaN checking with eager evaluation
   - Multi-column type validation
   - Time-based sorting with different backends

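We'll show both lazy and eager evaluation patterns for each operation type. Whether an operation is actually deferred depends on the frame passed in: pandas and Polars DataFrames run each step eagerly, while a lazy frame such as a Polars LazyFrame defers work until it is collected.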

In [1]:
import narwhals as nw
from narwhals.typing import FrameT
import pandas as pd
import polars as pl
from typing import Dict, List, Literal, Optional

# Create sample time series data with daily timestamps and null values
data = {
    "patient_id": [1, 1, 1, 2, 2],
    "timestamp": pd.date_range("2023-01-01", periods=5, freq="D"),
    "value": [10.0, None, 30.0, None, 50.0],
    "category": ["A", "A", "B", "B", "C"]
}

# Create DataFrames with different backends
df_pd = pd.DataFrame(data)
df_pl = pl.DataFrame(data)

# Example 1: Time Column Conversion (Lazy)
@nw.narwhalify
def convert_time_column(df: FrameT, time_col: str, to_type: Literal["numeric", "datetime"]) -> FrameT:
    """Convert time column between datetime and numeric formats.
    
    This demonstrates a backend-agnostic time column conversion.
    The expression is deferred only when df is a lazy frame (for example
    a Polars LazyFrame); eager inputs such as pandas or Polars DataFrames
    are converted as soon as with_columns is called.

    Why write it this way?
    - Time conversions are often part of a larger transformation chain
    - On lazy backends, the whole chain can be optimized before it runs
    - The same code works unchanged on eager and lazy inputs
    """
    print(f"Initializing {to_type} conversion (deferred execution)")
    
    if to_type == "numeric":
        # Convert datetime to Unix timestamp (microseconds)
        result = df.with_columns([
            nw.col(time_col).dt.timestamp(time_unit="us")
               .cast(nw.Float64())
               .alias(time_col)
        ])
    else:
        # Convert numeric to datetime
        result = df.with_columns([
            nw.col(time_col).cast(nw.Datetime())
               .alias(time_col)
        ])
    
    print("Conversion chain constructed but not yet executed")
    return result

# Example 2: Null Value Checking (Eager)
@nw.narwhalify(eager_only=True)
def check_column_nulls(df: FrameT, columns: List[str]) -> Dict[str, int]:
    """Check for null values in specified columns.
    
    This demonstrates eager evaluation for immediate validation results.
    Results are computed immediately due to eager_only=True.
    
    Why Eager?
    - Validation needs immediate results to proceed
    - Returns Python types (Dict[str, int]) that can't be lazy
    - Used in TemporalScope's validation checks before processing
    """
    result = {}
    for col in columns:
        # Count nulls and extract the single resulting cell as a scalar
        result[col] = df.select(
            nw.col(col).is_null().sum().cast(nw.Int64).alias("null_count")
        ).item()
    return result

# Example 3: Temporal Uniqueness Validation (Mixed)
@nw.narwhalify
def validate_unique_timestamps(df: FrameT, time_col: str, group_col: Optional[str] = None) -> FrameT:
    """Validate and report temporal uniqueness.
    
    This demonstrates a hybrid approach:
    - Deferred (on lazy backends): group and count operations
    - Eager: the caller's final check on the returned frame

    Why Mixed?
    - Heavy computations (grouping) benefit from lazy optimization
    - The final validation needs immediate results, so the caller
      materializes the (usually small) frame of duplicates
    - Common pattern in TemporalScope's data loading
    """
    print("Starting temporal uniqueness check (mixed evaluation)")
    
    # Stage 1: Group by time (and optionally group_col); deferred on lazy backends
    group_cols = [time_col] if group_col is None else [group_col, time_col]
    counts = df.group_by(group_cols).agg([
        nw.col(time_col).count().alias("count")
    ])
    
    # Stage 2: Keep only the groups that occur more than once
    duplicates = counts.filter(nw.col("count") > 1)
    
    print("Uniqueness check chain constructed")
    return duplicates  # An empty result means every timestamp is unique
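
The numeric values produced by `convert_time_column` are microseconds since the Unix epoch. The following standard-library sketch shows that arithmetic without any DataFrame backend. It assumes the timestamps are interpreted as UTC, and the helper names `to_microseconds` and `from_microseconds` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_microseconds(ts: datetime) -> int:
    """Hypothetical helper: whole microseconds from the Unix epoch to a tz-aware ts."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_microseconds(us: int) -> datetime:
    """Hypothetical helper: inverse of to_microseconds."""
    return EPOCH + timedelta(microseconds=us)


first = datetime(2023, 1, 1, tzinfo=timezone.utc)
us = to_microseconds(first)
print(us)  # 1672531200000000, i.e. about 1.672531e+15

# Values of this size are below 2**53, so a Float64 cast stores them exactly.
print(int(float(us)) == us)  # True

# The round trip recovers the original timestamp.
print(from_microseconds(us) == first)  # True
```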

In [2]:
# Let's demonstrate these patterns with real examples

print("Example 1: Time Column Conversion (Lazy)")
print("-" * 50)
# Convert to numeric (runs immediately here because df_pd is an eager pandas DataFrame)
numeric_time = convert_time_column(df_pd, "timestamp", "numeric")
print("\nNumeric Timestamp Result (Pandas):")
print(numeric_time)  # Already computed; print only displays it

# Convert back to datetime (also immediate for pandas input)
datetime_time = convert_time_column(numeric_time, "timestamp", "datetime")
print("\nDatetime Result (Pandas):")
print(datetime_time)  # Already computed; print only displays it

print("\nExample 2: Null Value Checking (Eager)")
print("-" * 50)
# These execute immediately because we need the results
pandas_nulls = check_column_nulls(df_pd, ["value", "category"])
polars_nulls = check_column_nulls(df_pl, ["value", "category"])
print(f"Pandas Nulls: {pandas_nulls}")
print(f"Polars Nulls: {polars_nulls}")

print("\nExample 3: Temporal Uniqueness (Mixed)")
print("-" * 50)
# Check uniqueness by patient (eager here; deferred only if a lazy frame is passed)
duplicates = validate_unique_timestamps(df_pd, "timestamp", "patient_id")
print("\nDuplicate Timestamps by Patient:")
print(duplicates)  # Empty when every (patient_id, timestamp) pair is unique

Example 1: Time Column Conversion (Lazy)
--------------------------------------------------
Initializing numeric conversion (deferred execution)
Conversion chain constructed but not yet executed

Numeric Timestamp Result (Pandas):
   patient_id     timestamp  value category
0           1  1.672531e+15   10.0        A
1           1  1.672618e+15    NaN        A
2           1  1.672704e+15   30.0        B
3           2  1.672790e+15    NaN        B
4           2  1.672877e+15   50.0        C
Initializing datetime conversion (deferred execution)
Conversion chain constructed but not yet executed

Datetime Result (Pandas):
   patient_id  timestamp  value category
0           1 2023-01-01   10.0        A
1           1 2023-01-02    NaN        A
2           1 2023-01-03   30.0        B
3           2 2023-01-04    NaN        B
4           2 2023-01-05   50.0        C

Example 2: Null Value Checking (Eager)
--------------------------------------------------
Pandas Nulls: {'value': np.int64(2), 

## Key Takeaways

1. **Time Operations (Lazy Evaluation)**
   - Use lazy evaluation for time conversions to optimize chains of operations
   - Let Narwhals handle backend-specific datetime implementations
   - Operations are deferred only on lazy frames (such as a Polars LazyFrame); pandas and Polars DataFrames execute each step immediately
   - Example: TemporalScope's time column conversions are written as expressions so that lazy backends can optimize them

2. **Validation Checks (Eager Evaluation)**
   - Use eager evaluation when immediate results are needed (like null checks)
   - Perfect for validation that must happen before proceeding
   - Results are computed right away
   - Example: TemporalScope's data validation needs immediate results

3. **Complex Operations (Mixed Evaluation)**
   - Combine lazy and eager patterns: defer the heavy work, then materialize once for the final check
   - Use lazy for heavy computations that can be optimized
   - Use eager for final validation or results
   - Example: TemporalScope's data loading combines validation (eager) with transformations (lazy)

These patterns are used throughout TemporalScope to ensure efficient and correct handling of temporal data across different backends. The choice between lazy and eager evaluation can significantly affect performance, especially on large inputs:

- **Lazy Evaluation**: Better for large datasets and complex transformations
- **Eager Evaluation**: Better for validation and immediate results
- **Mixed Approach**: Best for real-world scenarios with both needs