# Converting Columns with Narwhals: ML-Ready Data Types

This notebook demonstrates how to convert columns to specific data types using Narwhals' own dtypes (`nw.Int64`, `nw.Float64`, and so on). Most ML libraries expect features in specific numeric or categorical types, so these conversions are a routine preprocessing step. We'll cover:

1. **Native Narwhals Types**
   - Using `nw.Int64`, `nw.Float64`, etc.
   - Why string-based types (`'int64'`, `'float64'`) should be avoided
   - Backend-agnostic type safety

2. **Column Type Conversion**
   - Converting single columns
   - Handling multiple columns with different types
   - Validating column types for ML

3. **ML Preprocessing Patterns**
   - Ensuring numeric features
   - Handling categorical columns
   - Validating feature types

In [1]:
import narwhals as nw
from narwhals.typing import FrameT
import pandas as pd
import polars as pl
from typing import Any, Dict, Union

# Create sample data with mixed types
data = {
    "int_col": ["1", "2", "3", "4", "5"],  # Strings that should be integers
    "float_col": [1, 2, None, 4, 5],  # Integers that should be floats
    "mixed_col": [1.5, None, "3.0", 4, 5.5],  # Mixed numeric types
    "cat_col": ["A", "B", "A", "C", "B"],  # Categorical
}

# Create DataFrames with different backends
df_pd = pd.DataFrame(data)
# Polars needs a single dtype per column, so stringify the mixed column first
pl_data = {
    "int_col": [str(x) for x in data["int_col"]],
    "float_col": data["float_col"],
    "mixed_col": [str(x) if x is not None else None for x in data["mixed_col"]],
    "cat_col": data["cat_col"],
}
df_pl = pl.DataFrame(pl_data)


# Example 1: Single Column Type Conversion
@nw.narwhalify
def convert_to_numeric(df: FrameT, col: str, target_type: Union[nw.Int64, nw.Float64]) -> FrameT:
    """Convert a column to a specific numeric type using Narwhals native types.

    Why Native Types?
    - Backend-agnostic type safety
    - Consistent behavior across implementations
    - Better error messages for type mismatches
    """
    print(f"Converting {col} to {target_type.__name__}")

    # Try direct cast first
    try:
        result = df.with_columns([nw.col(col).cast(target_type).alias(col)])
        return result
    except Exception:
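        # If direct cast fails, go through a string: strip thousands separators, then cast.
        # replace_all removes every comma; str.replace only replaces the first match by default.
        result = df.with_columns(
            [nw.col(col).cast(nw.String()).str.replace_all(",", "").cast(nw.Float64()).cast(target_type).alias(col)]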
        )
        return result


# Example 2: Multi-Column Type Conversion
@nw.narwhalify
def convert_feature_columns(df: FrameT, type_map: Dict[str, Any]) -> FrameT:
    """Convert multiple columns to specific types for ML preprocessing.

    Why This Pattern?
    - Common ML requirement: specific types for features
    - Handles multiple columns efficiently
    - Maintains type safety across operations
    """
    result = df

    for col, target_type in type_map.items():
        # Try direct cast first
        try:
            result = result.with_columns([nw.col(col).cast(target_type).alias(col)])
        except Exception:
            # If direct cast fails, try string conversion for numeric types
            if target_type in (nw.Int64(), nw.Float64()):
                result = result.with_columns(
                    [nw.col(col).cast(nw.String()).str.replace_all(",", "").cast(nw.Float64()).cast(target_type).alias(col)]
                )
            else:
                # For non-numeric types, just try string cast
                result = result.with_columns([nw.col(col).cast(nw.String()).alias(col)])

    return result


# Example 3: ML Feature Type Validation
@nw.narwhalify(eager_only=True)
def validate_feature_types(df: FrameT, required_types: Dict[str, Any]) -> Dict[str, bool]:
    """Check whether each column can be cast to its required type for ML.

    Why Eager?
    - Type validation needs immediate results
    - Used before model training
    - Returns Python dict for easy checking
    """
    result = {}

    for col, required_type in required_types.items():
        # Check if column can be cast to required type
        try:
            if required_type in (nw.Int64(), nw.Float64()):
                # Try numeric conversion
                _ = df.select(
                    [nw.col(col).cast(nw.String()).str.replace_all(",", "").cast(nw.Float64()).cast(required_type)]
                )
            else:
                # Try direct cast for other types
                _ = df.select([nw.col(col).cast(required_type)])
            result[col] = True
        except Exception:
            result[col] = False

    return result

In [2]:
# Let's demonstrate these patterns

print("Example 1: Single Column Type Conversion")
print("-" * 50)
# Convert string integers to actual integers
int_df = convert_to_numeric(df_pd, "int_col", nw.Int64)
print("\nConverted int_col:")
print(int_df["int_col"].dtype)
print(int_df["int_col"])

print("\nExample 2: Multi-Column Type Conversion")
print("-" * 50)
# Define required types for ML features
type_map = {"int_col": nw.Int64(), "float_col": nw.Float64(), "mixed_col": nw.Float64()}
ml_ready = convert_feature_columns(df_pd, type_map)
print("\nML-Ready DataFrame:")
print(ml_ready.dtypes)

print("\nExample 3: ML Feature Type Validation")
print("-" * 50)
# Check if columns can be used for ML
required_types = {"int_col": nw.Int64(), "float_col": nw.Float64(), "mixed_col": nw.Float64(), "cat_col": nw.String()}
validation = validate_feature_types(df_pd, required_types)
print("\nFeature Type Validation:")
for col, is_valid in validation.items():
    print(f"{col}: {'✓' if is_valid else '✗'}")

Example 1: Single Column Type Conversion
--------------------------------------------------
Converting int_col to Int64

Converted int_col:
int64
0    1
1    2
2    3
3    4
4    5
Name: int_col, dtype: int64

Example 2: Multi-Column Type Conversion
--------------------------------------------------

ML-Ready DataFrame:
int_col        int64
float_col    float64
mixed_col    float64
cat_col       object
dtype: object

Example 3: ML Feature Type Validation
--------------------------------------------------

Feature Type Validation:
int_col: ✓
float_col: ✓
mixed_col: ✗
cat_col: ✓


## Key Takeaways

1. **Use Native Narwhals Types**
   - Pass Narwhals dtypes such as `nw.Int64` or `nw.Float64()` to `cast`; the examples above use both the class and the instance form
   - Avoid backend-specific dtype strings like `'int64'`
   - Ensures backend-agnostic type safety

2. **Column Type Conversion**
   - Handle string-to-numeric conversion safely
   - Convert multiple columns efficiently
   - Use type maps for clear requirements

3. **ML Preprocessing**
   - Validate types before model training
   - Handle mixed-type columns explicitly: above, `mixed_col` converts with a direct cast but fails the string-based validation path
   - Use eager evaluation for validation

These patterns show how to use Narwhals for ML-specific column type handling, which is crucial for:
- Feature preprocessing
- Model input validation
- Type safety across backends

This completes our Narwhals tutorial series:
1. General Patterns (1_narwhals_patterns.ipynb)
2. Lazy vs Eager (2_narwhals_lazy_vs_eager.ipynb)
3. Complex Functions (3_narwhals_complex_functions.ipynb)
4. Column Types (4_narwhals_converting_columns.ipynb)

With these patterns, TemporalScope can eliminate DataFrame-to-DataFrame conversion functions and use Narwhals' native types for all data handling.