# TemporalScope Tutorial: Backend-Agnostic Functions Using Narwhals

### Purpose
This tutorial demonstrates how **TemporalScope** can leverage **Narwhals** to support backend-agnostic data operations. By building backend-agnostic functions, TemporalScope can run seamlessly across **Pandas**, **Modin**, **Polars**, and **PyArrow** without modifying the underlying code.

### Key Steps
- **Create a Large Synthetic Dataset**: Start with a sample Pandas DataFrame and test compatibility across Modin, Polars, and PyArrow.
- **Implement a Narwhals-Decorated Function**: Refactor a simple function to be backend-agnostic using Narwhals’ `@narwhalify`.
- **Test Across Backends**: Verify if our function executes smoothly in Pandas, Modin, Polars, and PyArrow.

### Advantages of Narwhals
- **Uniform API**: The `@narwhalify` decorator allows backend-neutral code execution.
- **Enhanced Compatibility**: Narwhals optimizes handling of DataFrames across multiple libraries.
- **Simplified Codebase**: Core logic remains backend-agnostic, reducing code duplication.

---

### Example Function
| **Operation**      | **Command**                          | **Purpose**                       |
|--------------------|--------------------------------------|-----------------------------------|
| Data Initialization| `nw.from_native(df)`                | Initialize a Narwhals DataFrame   |
| Null Check         | `nw.col("column_name").is_null()`   | Checks for nulls in a column      |
| Return to Native   | `nw.to_native(df)`                  | Convert back to original backend  |


In [7]:
import pandas as pd
import modin.pandas as mpd
import polars as pl
import pyarrow as pa
import narwhals as nw
from narwhals.typing import FrameT

# Step 1: Generate a large synthetic DataFrame in Pandas
def create_synthetic_dataframe() -> pd.DataFrame:
    """Creates a synthetic DataFrame with 1 million rows for testing."""
    data = pd.DataFrame({f"feature_{i}": range(1_000_000) for i in range(10)})
    data["target"] = range(1_000_000)
    print("\n--- Created Pandas DataFrame with 1 million rows ---\n")
    return data

# Step 2: Define a Narwhals-compatible null check function
@nw.narwhalify
def check_nulls_nw(df: FrameT) -> bool:
    """Backend-agnostic null check using Narwhals."""
    null_check_result = df.select([(nw.col(col).is_null().any()).alias(f"{col}_null") for col in df.columns])
    return null_check_result.to_pandas().any().any()

# Step 3: Backend test function
def test_narwhals_agnostic_function(data_df: pd.DataFrame):
    """Test `check_nulls_nw` across various data backends."""
    results = []

    # Backend data transformations
    backends = {
        "Pandas": data_df,
        "Modin": mpd.DataFrame(data_df),
        "Polars": pl.DataFrame(data_df),
        "PyArrow": nw.from_native(pa.Table.from_pandas(data_df))  # Narwhals Frame from PyArrow
    }

    for backend_name, backend_df in backends.items():
        try:
            # Execute Narwhals function
            nulls_found = check_nulls_nw(backend_df)
            post_backend_type = "Narwhals FrameT" if isinstance(backend_df, nw.DataFrame) else type(backend_df).__name__
            results.append({
                "Original Backend": backend_name,
                "Post-Narwhals Backend": post_backend_type,
                "Executed Successfully": True,
                "Nulls Found": nulls_found
            })
            print(f"{backend_name} -> {post_backend_type}: Executed Successfully, Nulls Found = {nulls_found}")
        except Exception as e:
            results.append({
                "Original Backend": backend_name,
                "Post-Narwhals Backend": type(backend_df).__name__,
                "Executed Successfully": False,
                "Error": str(e)
            })
            print(f"{backend_name} -> {type(backend_df).__name__}: Failed with error: {e}")

    return results

if __name__ == "__main__":
    # Generate synthetic data and test across backends
    data_df = create_synthetic_dataframe()
    backend_results = test_narwhals_agnostic_function(data_df)
    print("\n--- Summary of Backend Compatibility ---")
    for result in backend_results:
        print(result)



--- Created Pandas DataFrame with 1 million rows ---





Pandas -> DataFrame: Executed Successfully, Nulls Found = False
Modin -> DataFrame: Executed Successfully, Nulls Found = False
Polars -> DataFrame: Executed Successfully, Nulls Found = False
PyArrow -> Narwhals FrameT: Executed Successfully, Nulls Found = False

--- Summary of Backend Compatibility ---
{'Original Backend': 'Pandas', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': True, 'Nulls Found': np.False_}
{'Original Backend': 'Modin', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': True, 'Nulls Found': np.False_}
{'Original Backend': 'Polars', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': True, 'Nulls Found': np.False_}
{'Original Backend': 'PyArrow', 'Post-Narwhals Backend': 'Narwhals FrameT', 'Executed Successfully': True, 'Nulls Found': np.False_}


### Using Narwhals for Complex Backend-Agnostic Functions

Narwhals simplifies backend-agnostic data operations, enabling functions that work seamlessly across Pandas, Modin, Polars, and PyArrow. Below are the core steps for implementing and testing more complex functions in a backend-agnostic way.

1. **Initialize the Narwhals DataFrame**: Convert any native DataFrame to a Narwhals-compatible DataFrame using `nw.from_native` (or use the `@narwhalify` decorator for direct integration).
2. **Implement Backend-Agnostic Logic**: Write your processing logic using a subset of the Polars API supported by Narwhals. This can include operations like group-by, aggregation, and creating lagged features.
3. **Return Results in Native Format**: Use `nw.to_native` to convert the output back to the original DataFrame type if needed.

### Example Code: Feature Scaling and Lagging with Narwhals

Here’s an example using a smaller synthetic dataset to create scaled and lagged features:


In [2]:
import pandas as pd
import modin.pandas as mpd
import polars as pl
import pyarrow as pa
import narwhals as nw
from narwhals.typing import FrameT
import numpy as np

# Step 1: Create a smaller synthetic DataFrame with random target values
def create_synthetic_dataframe() -> pd.DataFrame:
    """Creates a synthetic DataFrame with 5000 rows for testing."""
    np.random.seed(42)
    data = pd.DataFrame({f"feature_{i}": range(5000) for i in range(10)})
    data["target"] = np.random.rand(5000)  # Randomized target
    print("\n--- Created Pandas DataFrame with 5000 rows ---\n")
    return data

# Step 2: Define a Narwhals-compatible function for scaling and lagging features
@nw.narwhalify
def scale_and_lag_features(df: FrameT, lag_steps: int = 1) -> FrameT:
    """Scales numerical features and creates lagged features using Narwhals."""
    standardized_df = df.with_columns([
        ((nw.col(col) - nw.col(col).mean()) / nw.col(col).std()).alias(f"{col}_scaled")
        for col in df.columns if col != "target"
    ])
    lagged_df = standardized_df.with_columns([
        nw.col(col).shift(lag_steps).alias(f"{col}_lag{lag_steps}")
        for col in standardized_df.columns if col.endswith("_scaled")
    ])
    return lagged_df

# Step 3: Backend test function
def test_narwhals_agnostic_function(data_df: pd.DataFrame):
    """Test `scale_and_lag_features` across various data backends."""
    results = []
    backends = {
        "Pandas": data_df,
        "Modin": mpd.DataFrame(data_df),
        "Polars": pl.DataFrame(data_df),
        "PyArrow": nw.from_native(pa.Table.from_pandas(data_df))  # Narwhals Frame from PyArrow
    }

    for backend_name, backend_df in backends.items():
        try:
            transformed_df = scale_and_lag_features(backend_df, lag_steps=3)
            # Convert to Pandas and format with Markdown for preview
            transformation_preview = transformed_df.to_pandas().iloc[:5].to_markdown()
            post_backend_type = "Narwhals FrameT" if isinstance(transformed_df, nw.DataFrame) else type(transformed_df).__name__
            results.append({
                "Original Backend": backend_name,
                "Post-Narwhals Backend": post_backend_type,
                "Executed Successfully": True,
                "Transformation Preview": transformation_preview
            })
            print(f"{backend_name} -> {post_backend_type}: Executed Successfully, Transformation Preview:\n{transformation_preview}\n")
        except Exception as e:
            results.append({
                "Original Backend": backend_name,
                "Post-Narwhals Backend": type(backend_df).__name__,
                "Executed Successfully": False,
                "Error": str(e)
            })
            print(f"{backend_name} -> {type(backend_df).__name__}: Failed with error: {e}\n")

    return results

if __name__ == "__main__":
    data_df = create_synthetic_dataframe()
    backend_results = test_narwhals_agnostic_function(data_df)
    print("\n--- Summary of Backend Compatibility ---")
    for result in backend_results:
        print(result)



--- Created Pandas DataFrame with 5000 rows ---

Pandas -> DataFrame: Failed with error: 'DataFrame' object has no attribute 'to_pandas'





Modin -> DataFrame: Failed with error: 'DataFrame' object has no attribute 'to_pandas'

Polars -> DataFrame: Failed with error: Missing optional dependency 'tabulate'.  Use pip or conda to install tabulate.

PyArrow -> DataFrame: Failed with error: Missing optional dependency 'tabulate'.  Use pip or conda to install tabulate.


--- Summary of Backend Compatibility ---
{'Original Backend': 'Pandas', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': False, 'Error': "'DataFrame' object has no attribute 'to_pandas'"}
{'Original Backend': 'Modin', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': False, 'Error': "'DataFrame' object has no attribute 'to_pandas'"}
{'Original Backend': 'Polars', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': False, 'Error': "Missing optional dependency 'tabulate'.  Use pip or conda to install tabulate."}
{'Original Backend': 'PyArrow', 'Post-Narwhals Backend': 'DataFrame', 'Executed Successfully': False, 'Error': "Miss