# Narwhals Machine Learning Patterns

Narwhals provides a unified interface for DataFrame operations across Pandas, Polars, Dask, and other backends. This tutorial outlines patterns for writing backend-agnostic functions for typical machine learning workflows, including data validation, feature engineering, and time series processing. The same code can serve quick prototypes and production pipelines alike, whether you maintain an open-source library or an internal enterprise codebase.

## Overview of Patterns

1. **Backward Compatibility Policy**:  
   Narwhals gives library maintainers a strict backward-compatibility guarantee. Code written with `import narwhals.stable.v1 as nw` is intended to keep working indefinitely, even when later Narwhals releases introduce breaking changes. Those changes land in the main `narwhals` namespace and in newer stable namespaces such as `narwhals.stable.v2`, while `narwhals.stable.v1` stays unaffected, so several API versions can coexist safely.

2. **Data Validation**:  
   Use `@nw.narwhalify` to write validation functions that accept any supported backend, and build the checks from `nw.col(...)` expressions for type casting, null handling, and basic statistics.

3. **Feature Engineering**:  
   Chain transformations such as `.cast(nw.Float64())` and `.fill_null(...)` to preprocess numeric and categorical data. Defer materialization until a concrete value is actually needed to keep memory usage low.

4. **Time Series Handling**:  
   Validate temporal data by grouping with `.group_by([id_col, time_col])`, for example to check that timestamps are unique within each entity, or by applying rolling windows. The same code runs on every backend without backend-specific branches.

5. **Workflow Optimization**:  
   Start in lazy mode (e.g., `nw.from_native(dask_df)`) for scalability, call `.collect()` to switch to an eager frame when a task needs immediate results, and call `.to_native()` to hand the result to a native library.

6. **OSS and Enterprise Notes**:  
   Use an environment manager such as Hatch to keep production environments lean and test environments comprehensive:  
   - Define a lean setup in `[tool.hatch.envs.default]` with minimal dependencies (e.g., Narwhals and Pandas).  
   - Use `[tool.hatch.envs.test]` for broader testing across multiple backends (e.g., Polars and Dask).  

   To let a function accept objects that Narwhals does not recognize as DataFrames, returning them unchanged instead of raising, set `pass_through=True` (`strict=False` is the older spelling of the same option).

```python
import narwhals as nw
from narwhals.typing import FrameT
import pandas as pd
import polars as pl
import dask.dataframe as dd
import numpy as np
from typing import Dict, List, Optional, Union, Any
```

## Pattern 0: Backward Compatibility Policy

This pattern shows how Narwhals guarantees **backward compatibility**, giving production workflows a stable API that is shielded from breaking changes in newer releases.

1. **Stable Namespace**: Code written against `narwhalsstable.v1`-style imports, i.e. `narwhals.stable.v1`, is intended to keep working indefinitely, even when newer Narwhals releases make breaking changes elsewhere.
2. **Version Migration**: To adopt new behavior, switch explicitly to a newer namespace, such as `narwhals.stable.v2`, when you are ready.
3. **Integration**: Different API namespaces can be imported side by side in the same project, so dependencies written against different versions do not conflict.


```python
# Import the stable API for guaranteed compatibility
import narwhals.stable.v1 as nw
from narwhals.typing import IntoFrameT
import pandas as pd

# Example dataset
data = {"feature1": [1, 2, 3], "feature2": [4, 5, 6]}

# Workflow using Narwhals stable.v1
def backward_compatible_workflow(df: IntoFrameT) -> IntoFrameT:
    """Use the Narwhals stable.v1 API to process data."""
    # Wrap the native frame (an eager Narwhals DataFrame for Pandas input)
    df_nw = nw.from_native(df)

    # Perform transformations
    df_transformed = df_nw.select([
        nw.col("feature1").mean().alias("mean_feature1"),
        nw.col("feature2").sum().alias("sum_feature2"),
    ])

    # Convert back to the caller's native type (here, Pandas)
    return df_transformed.to_native()

# Testing the backward-compatible workflow
df_pd = pd.DataFrame(data)
result = backward_compatible_workflow(df_pd)

print("Result using Narwhals stable API v1:")
print(result)
```


```text
Result using Narwhals stable API v1:
   mean_feature1  sum_feature2
0            2.0            15
```


## Pattern 1: Lazy-to-Eager Frame Transitions

This pattern shows how Narwhals offers one interface for moving from lazy to eager evaluation, whatever the backend. Instead of calling backend-specific methods such as Dask's `compute()`, you call `collect()` and Narwhals dispatches to the right one. The example shows:

- **Unified Collection**: `collect()` materializes results the same way for every lazy backend; for Dask it returns an eager, Pandas-backed DataFrame
- **Lazy Operations**: Start with a lazy backend for memory-efficient processing of large datasets; operations are deferred, and can be optimized by the backend, until results are needed
- **Backend Transitions**: Narwhals manages the switch from lazy (e.g., Dask) to eager (e.g., Pandas) evaluation, which simplifies ML workflows
- **ML Integration**: A final `to_native()` call returns the DataFrame in the native format that ML libraries expect


```python
# Create sample ML dataset
data = {
    "numeric_feature": [1.5, 2.0, None, 4.0, 5.5],    # Has missing value
    "categorical_feature": ["A", "B", "A", "C", "B"]  # Needs encoding
}
df_pd = pd.DataFrame(data)

# Start with any lazy backend (e.g., Dask)
df_dask = dd.from_pandas(df_pd, npartitions=2)

# Unified lazy-to-eager pattern
df_nw = nw.from_native(df_dask)           # Works with any lazy frame
df_processed = df_nw.select([             # Lazy operations
    nw.col("numeric_feature")
       .fill_null(0)
       .cast(nw.Float64())
       .alias("numeric_feature"),
    nw.col("categorical_feature")
       .cast(nw.String())
       .alias("categorical_feature")
])
df_collected = df_processed.collect()      # Unified collection - no compute() needed
df_pandas = df_collected.to_native()       # Ready for ML libraries

print("Original Dask DataFrame (lazy):")
print(df_dask)
print("\nProcessed Pandas DataFrame (eager):")
print(df_pandas)
```

```text
Original Dask DataFrame (lazy):
Dask DataFrame Structure:
              numeric_feature categorical_feature
npartitions=2                                    
0                     float64              string
3                         ...                 ...
4                         ...                 ...
Dask Name: frompandas, 1 expression
Expr=df

Processed Pandas DataFrame (eager):
   numeric_feature categorical_feature
0              1.5                   A
1              2.0                   B
2              0.0                   A
3              4.0                   C
4              5.5                   B
```


## Pattern 2: Data Validation - Handling Unsupported Types

This pattern shows how to build validation code that copes with unexpected or unsupported data. In data engineering workflows you often receive data from sources that contain custom objects, mixed types, or other non-standard formats. Two separate mechanisms are involved, and it helps to keep them apart:

- **Unsupported inputs (`pass_through`)**: `pass_through` controls what `nw.from_native` does when the object it receives is not a DataFrame type Narwhals recognizes. With `pass_through=True` the object is returned unchanged; with `pass_through=False` (the default) `from_native` raises a `TypeError`.
- **Unsupported column values**: A Pandas DataFrame holding custom Python objects is still a supported Pandas DataFrame, so `from_native` accepts it in either mode. The problem only surfaces when an operation such as `.cast(nw.Float64())` touches the offending column, and the error is raised by the underlying backend.
- **Development Mode**: During exploration, `pass_through=True` lets you load and inspect whatever a source hands you, which helps when investigating data quality issues or unfamiliar sources.
- **Production Mode**: In production pipelines, keep the default `pass_through=False` so that an unexpected input type fails loudly at the pipeline boundary instead of propagating downstream, and wrap operations on suspect columns in `try`/`except` to report column-level failures.


```python
class CustomObject:
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return f"Custom({self.value})"

# Create DataFrame with unsupported objects
data = {
    "feature1": [1, 2, 3],
    "unsupported": [CustomObject(1), CustomObject(2), CustomObject(3)]
}
df_pd = pd.DataFrame(data)

# Development mode - allows inspection
print("Development Mode (pass_through=True):")
print("Expected: Can load and view all data")
df_nw1 = nw.from_native(df_pd, pass_through=True)
print(df_nw1.to_native())

print("\nTrying operations on normal column:")
print("Expected: Should work normally")
result = df_nw1.select([
    nw.col("feature1").cast(nw.Float64())
])
print(result.to_native())

print("\nTrying operations on unsupported column:")
print("Expected: Should fail gracefully")
try:
    result = df_nw1.select([
        nw.col("unsupported").cast(nw.Float64())
    ])
except Exception:
    print("Error: Cannot cast custom objects to Float64")

# Production mode - from_native still accepts the Pandas DataFrame, because
# an object column does not make the frame itself unsupported
print("\nProduction Mode (pass_through=False):")
print("Expected: from_native succeeds; the cast on the object column fails")
df_nw2 = nw.from_native(df_pd, pass_through=False)
try:
    result = df_nw2.select([
        nw.col("unsupported").cast(nw.Float64())
    ])
except Exception:
    print("Error: Cast failed in the backend, not in from_native")
```

```text
Development Mode (pass_through=True):
Expected: Can load and view all data
   feature1 unsupported
0         1   Custom(1)
1         2   Custom(2)
2         3   Custom(3)

Trying operations on normal column:
Expected: Should work normally
   feature1
0       1.0
1       2.0
2       3.0

Trying operations on unsupported column:
Expected: Should fail gracefully
Error: Cannot cast custom objects to Float64

Production Mode (pass_through=False):
Expected: from_native succeeds; the cast on the object column fails
Error: Cast failed in the backend, not in from_native
```


## Pattern 3: Data Validation

This pattern shows how to validate data types and data quality in a backend-agnostic way. Because the checks are written with Narwhals dtypes and expressions, they behave consistently regardless of the underlying DataFrame implementation. The example shows:

- **Type Safety**: Using Narwhals dtypes (`Float64`, `String`) ensures consistent type handling across backends, preventing type-related errors in ML pipelines
- **Validation Workflow**: Backend-agnostic operations for checking nulls, type compatibility, and data quality enable robust validation pipelines
- **Error Handling**: Graceful error recovery and clear error messages help identify data quality issues early in the pipeline
- **ML Integration**: Consistent validation behavior across training and inference ensures reliable model deployment

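Note: The Dask backend is left out of this example because its frames are lazy: extracting a value with `result['mean'].item()` would first require a `collect()`. See the Lazy-to-Eager Frame Transitions pattern for how to handle that step.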

```python
# Create sample data with validation issues
data = {
    'numeric': [1, 2, None, 4],           # Has null
    'mixed': ['1', '2', 'bad', '4'],      # Has invalid value
}

# Test across backends
backends = {
    'Pandas': pd.DataFrame(data),
    'Polars': pl.DataFrame(data)
}

for name, df in backends.items():
    print(f"\nTesting {name} backend:")
    print("=" * 50)
    df_nw = nw.from_native(df)

    # Validate numeric column (should succeed with nulls)
    print("\nValidating numeric column with nulls:")
    print("Expected: Should compute mean and null count")
    try:
        result = df_nw.select([
            nw.col("numeric")
               .cast(nw.Float64())
               .mean()
               .alias("mean"),
            nw.col("numeric")
               .is_null()
               .sum()
               .alias("nulls")
        ])
        print(f"Success - Mean: {result['mean'].item()}")
        print(f"Success - Null count: {result['nulls'].item()}")
    except Exception as e:
        print(f"Unexpected failure: {e}")

    # Try invalid column (should fail with a conversion error)
    print("\nTrying to convert invalid strings to float:")
    print("Expected: Should fail with type conversion error")
    try:
        result = df_nw.select([
            nw.col("mixed")
               .cast(nw.Float64())
               .mean()
        ])
        print("Unexpected success!")
    except Exception as e:
        print(f"Failed as expected: {e}")
```


```text
Testing Pandas backend:
==================================================

Validating numeric column with nulls:
Expected: Should compute mean and null count
Success - Mean: 2.3333333333333335
Success - Null count: 1

Trying to convert invalid strings to float:
Expected: Should fail with type conversion error
Failed as expected: could not convert string to float: 'bad'

Testing Polars backend:
==================================================

Validating numeric column with nulls:
Expected: Should compute mean and null count
Success - Mean: 2.3333333333333335
Success - Null count: 1

Trying to convert invalid strings to float:
Expected: Should fail with type conversion error
Failed as expected: conversion from `str` to `f64` failed in column 'mixed' for 1 out of 4 values: ["bad"]
```


## Pattern 4: Feature Engineering & Collect-Then-Item Pattern

This pattern demonstrates how to build efficient feature engineering pipelines for ML workflows. A critical challenge in ML pipelines is handling both lazy backends (like Dask for large datasets) and eager backends (like Pandas for interactive development). The key is following a consistent materialization pattern when you need concrete values (like computing means for imputation).

Key principles for consistent feature engineering:
- **Consistent Materialization**: Before calling `item()`, check whether the frame is lazy: use `result.item()` for eager frames and `result.collect().item()` for lazy ones. This lets the same code run on Dask for large-scale processing and on Pandas during development.
- **Evaluation Strategy**: Materialize only the statistics that need concrete values (means, medians, etc.) and keep other transformations (type casting, null filling) lazy so the backend can optimize them. Note that `narwhalify(eager_only=True)` does not materialize anything; it restricts conversion to eager DataFrames, so it does not help a function that should also accept Dask input.
- **Backend Independence**: With the collect-then-item check in place, the same function works on any backend, calling `collect()` only when the frame is lazy (Dask) and skipping it when the frame is already eager (Pandas, Polars).
- **Memory Efficiency**: Reserve materialization for steps that truly need concrete values, such as computing statistics, and leave everything else lazy so the backend can optimize the operations together.


```python
# Create sample data with preprocessing needs
data = {
    'integer_feature': [1, 2, None, 4, 5],           # Needs mean imputation
    'category_messy': ['a', 'B', None, 'c', 'b']     # Needs standardization
}

# Test across backends
backends = {
    'Pandas': pd.DataFrame(data),
    'Polars': pl.DataFrame(data),
    'Dask': dd.from_pandas(pd.DataFrame(data), npartitions=2)  # Lazy Dask frame
}

@nw.narwhalify  # Accepts eager and lazy frames; materialization is handled below
def process_numeric_feature(df: FrameT, column: str) -> FrameT:
    """Process a numeric feature for ML.

    Common ML transformations:
    - Convert to float
    - Fill nulls with mean
    - Standardize format
    """
    # Compute the mean first; lazy frames must be collected before .item()
    result = df.select([
        nw.col(column)
           .cast(nw.Float64())
           .mean()
    ])
    if isinstance(result, nw.LazyFrame):
        result = result.collect()
    mean_val = result.item()

    # Then use it to fill nulls
    return df.select([
        nw.col(column)
           .cast(nw.Float64())
           .fill_null(mean_val)
           .alias(column)
    ])

@nw.narwhalify  # Works with eager and lazy inputs alike
def process_categorical_feature(df: FrameT, column: str) -> FrameT:
    """Process a categorical feature for ML.

    Common ML transformations:
    - Handle nulls first
    - Standardize case
    - Ensure string format
    """
    return df.select([
        nw.col(column)
           .fill_null("UNKNOWN")     # Handle nulls before casting
           .cast(nw.String())
           .str.to_uppercase()
           .alias(column)
    ])

# Test features across backends
for name, df in backends.items():
    print(f"\n{name} backend:")
    print("=" * 50)
    df_nw = nw.from_native(df)  # Works with any backend (lazy or eager)

    print("\nProcessing numeric feature:")
    print("Expected: Nulls filled with mean value, cast to Float64")
    numeric_result = process_numeric_feature(df_nw, "integer_feature")
    print(numeric_result)

    print("\nProcessing categorical feature:")
    print("Expected: Uppercase strings, nulls filled with UNKNOWN")
    categorical_result = process_categorical_feature(df_nw, "category_messy")
    print(categorical_result)
```



```text
Pandas backend:
==================================================

Processing numeric feature:
Expected: Nulls filled with mean value, cast to Float64
   integer_feature
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0

Processing categorical feature:
Expected: Uppercase strings, nulls filled with UNKNOWN
  category_messy
0              A
1              B
2        UNKNOWN
3              C
4              B

Polars backend:
==================================================

Processing numeric feature:
Expected: Nulls filled with mean value, cast to Float64
shape: (5, 1)
┌─────────────────┐
│ integer_feature │
│ ---             │
│ f64             │
╞═════════════════╡
│ 1.0             │
│ 2.0             │
│ 3.0             │
│ 4.0             │
│ 5.0             │
└─────────────────┘

Processing categorical feature:
Expected: Uppercase strings, nulls filled with UNKNOWN
shape: (5, 1)
┌────────────────┐
│ category_messy │
│ ---            │
│ str            │
╞════════════════╡
│ A              │
│ B              │
│ UNKNOWN        │
│ C              │
│ B              │
└────────────────┘
```

## Pattern 5: Time Series Validation & Materialization

This pattern demonstrates how to validate temporal data quality in ML/DS workflows. Time series data processing often requires complex transformations (interpolation, resampling, windowing) that need to work efficiently across different scales - from interactive analysis in Pandas to large-scale processing in Dask. By handling lazy and eager evaluation consistently, you can write temporal validation code once and use it across your entire ML pipeline.

Key principles for time series validation:
- **Consistent Materialization**: When you need concrete results, collect lazy frames before using them, so behavior stays consistent across Pandas (eager) and Dask (lazy) backends
- **Entity Grouping**: Validate many time series at once by grouping on entity and timestamp, leaving the backend free to optimize the execution plan
- **Data Quality**: Detect and report duplicate timestamps that could skew temporal analysis, using lazy evaluation for memory efficiency
- **Backend Independence**: Write validation functions that work identically whether processing historical data with Dask or real-time data with Pandas

The output shows the same validation running on Pandas, Polars, and Dask, with each backend returning its native type. Pandas and Polars print the duplicate rows directly; the Dask result is still lazy, so printing it shows only the frame's structure, and calling `.compute()` on it returns the rows. This lets developers focus on temporal logic rather than backend-specific implementations.

```python
# Generate hourly timestamps, including one duplicate
base_timestamps = pd.date_range(
    start="2023-01-01",
    periods=3,
    freq="h"
)
timestamps = [
    base_timestamps[0],  # id=1
    base_timestamps[0],  # id=1: duplicate timestamp
    base_timestamps[1],  # id=1
    base_timestamps[0],  # id=2: same time as id=1, but unique within id=2
    base_timestamps[2],  # id=2
]

# Create synthetic dataset with known properties
data = {
    # Entity identifier and temporal index
    'id': [1, 1, 1, 2, 2],
    'timestamp': timestamps,

    # Numeric features for validation
    'feature1': [1.0, 2.0, None, 4.0, 5.0],     # Float with missing values
    'feature2': [1.5, 2.5, 3.5, None, 5.5],     # Float with missing values
    'feature3': [10, 20, 30, 40, 50],           # Integer without missing values
}

@nw.narwhalify
def validate_temporal_uniqueness(df: FrameT, id_col: str, time_col: str) -> FrameT:
    """Validate temporal uniqueness within entity groups.

    Parameters
    ----------
    df : FrameT
        Input DataFrame with time series data
    id_col : str
        Column name for entity identifier
    time_col : str
        Column name for temporal index

    Returns
    -------
    FrameT
        DataFrame containing any duplicate timestamps found
    """
    # Group by entity and timestamp (stays lazy if the input is lazy)
    counts = df.group_by([id_col, time_col]).agg([
        nw.col(time_col).count().alias("count")
    ])

    # Keep only duplicated (entity, timestamp) pairs; lazy inputs stay lazy
    return counts.filter(nw.col("count") > 1)

# Initialize DataFrames for each backend
df_pd = pd.DataFrame(data)           # Pandas backend
df_pl = pl.DataFrame(data)          # Polars backend
df_dask = dd.from_pandas(df_pd, npartitions=2)  # Dask backend

# Test across backends
print("Temporal Uniqueness Validation Results")
print("=" * 50)

for name, df in [
    ("Pandas", df_pd),
    ("Polars", df_pl),
    ("Dask", df_dask)
]:
    print(f"\n{name} Backend Results:")
    print("-" * 30)
    df_nw = nw.from_native(df)
    result = validate_temporal_uniqueness(df_nw, "id", "timestamp")
    print(result)
```

```text
Temporal Uniqueness Validation Results
==================================================

Pandas Backend Results:
------------------------------
   id  timestamp  count
0   1 2023-01-01      2

Polars Backend Results:
------------------------------
shape: (1, 3)
┌─────┬─────────────────────┬───────┐
│ id  ┆ timestamp           ┆ count │
│ --- ┆ ---                 ┆ ---   │
│ i64 ┆ datetime[μs]        ┆ u32   │
╞═════╪═════════════════════╪═══════╡
│ 1   ┆ 2023-01-01 00:00:00 ┆ 2     │
└─────┴─────────────────────┴───────┘

Dask Backend Results:
------------------------------
Dask DataFrame Structure:
                  id       timestamp  count
npartitions=1                              
               int64  datetime64[ns]  int64
                 ...             ...    ...
Dask Name: loc, 10 expressions
Expr=Loc(frame=ResetIndex(frame=ColumnsSetter(frame=(GroupbyAggregation(frame=df, arg=defaultdict(<class 'list'>, {'timestamp': ['count']}), observed=True, dropna=False))[[('timestamp', 'count')]], columns=('count',))), iindexer=Renam
```