# TemporalScope Tutorial: Single-Step Target Shifting

## Engineering Design Overview

The `SingleStepTargetShifter` class provides backend-agnostic target shifting for time series data. It keeps validation and transformation in two separate phases:

### Core Components

1. **Validation Phase (fit)**:
   - Validates TimeFrame or supported DataFrame type
   - Ensures target column is set or can be inferred
   - No Narwhals operations at this stage

2. **Transformation Phase (transform)**:
   - Uses pure Narwhals operations
   - Shifts target using backend-agnostic operations
   - Preserves TimeFrame metadata if present
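The two phases above can be sketched with a small pandas-only stand-in. `ToyTargetShifter` is a hypothetical class written for this illustration; it is not the TemporalScope implementation, which works through Narwhals and also accepts `TimeFrame` objects. The sketch only shows the pattern: `fit` validates and changes nothing, `transform` does the shifting.

```python
import pandas as pd


class ToyTargetShifter:
    """Hypothetical stand-in used only to illustrate the fit/transform split."""

    def __init__(self, target_col: str, n_lags: int = 1):
        if n_lags < 1:
            raise ValueError("n_lags must be a positive integer")
        self.target_col = target_col
        self.n_lags = n_lags

    def fit(self, df: pd.DataFrame) -> "ToyTargetShifter":
        # Validation only: nothing is transformed here.
        if not isinstance(df, pd.DataFrame):
            raise TypeError("expected a pandas DataFrame")
        if self.target_col not in df.columns:
            raise ValueError(f"column {self.target_col!r} not found")
        return self

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Transformation only: pull the target up by n_lags rows, drop the
        # trailing rows that have no future value, and rename the target.
        shifted_name = f"{self.target_col}_shift_{self.n_lags}"
        out = df.drop(columns=[self.target_col])
        out[shifted_name] = df[self.target_col].shift(-self.n_lags)
        return out.dropna(subset=[shifted_name]).reset_index(drop=True)

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        return self.fit(df).transform(df)


demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10.0, 20.0, 30.0, 40.0]})
shifted = ToyTargetShifter(target_col="y").fit_transform(demo)
print(shifted)
```

Keeping validation in `fit` means a bad column name or unsupported input fails early, before any data is touched.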

### Engineering Design Assumptions

1. **Single-step Mode**:
   - Each row represents one time step
   - Target variable is shifted by the specified lag (`n_lags`), so each row is paired with a future target value
   - Compatible with traditional ML frameworks
   - Supports scalar target prediction tasks

2. **Backend Agnostic**:
   - Validation in fit() before any operations
   - Pure Narwhals operations in transform()
   - Clean separation of concerns

3. **Input Handling**:
   - TimeFrame: Uses existing metadata
   - DataFrame: Validates in fit
   - numpy array: Converts in fit
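As a rough illustration of the single-step assumptions, the sketch below shows how a shifted frame maps onto the feature matrix and scalar target vector that traditional ML frameworks expect. The column names (`feature_a`, `feature_b`, `y_shift_1`) are made up for the example and the frame is built by hand rather than produced by `SingleStepTargetShifter`.

```python
import pandas as pd

# Toy frame in the shape a single-step shift produces: one row per time step,
# features observed at time t, and the target observed at time t + 1.
shifted = pd.DataFrame(
    {
        "feature_a": [1.0, 2.0, 3.0],
        "feature_b": [0.5, 0.4, 0.3],
        "y_shift_1": [20.0, 30.0, 40.0],
    }
)

# Feature matrix: one row per time step, one column per feature.
X = shifted.drop(columns=["y_shift_1"]).to_numpy()

# Target vector: exactly one scalar per row.
y = shifted["y_shift_1"].to_numpy()

print(X.shape, y.shape)
```

With real data, a non-numeric column such as a timestamp would normally be dropped or encoded before building `X`.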

## Example 1: Using with TimeFrame

In [None]:
import pandas as pd
import polars as pl
import modin.pandas as mpd

from temporalscope.core.core_utils import print_divider
from temporalscope.core.temporal_data_loader import TimeFrame
from temporalscope.datasets.datasets import DatasetLoader
from temporalscope.target_shifters.single_step import SingleStepTargetShifter

# Load data using DatasetLoader
loader = DatasetLoader("macrodata")
data = loader.load_data(backend="pandas")

# Create TimeFrame
tf = TimeFrame(data, time_col="ds", target_col="realgdp")

print("Original TimeFrame:")
print(tf.df.head())
print_divider()

# Initialize SingleStepTargetShifter
# Note: target_col can be inferred from TimeFrame
shifter = SingleStepTargetShifter(n_lags=1, verbose=True)

# Transform data - target will be shifted for future prediction
transformed_tf = shifter.fit_transform(tf)

print("\nTransformed TimeFrame:")
print(transformed_tf.df.head())
print_divider()

# Verify metadata preservation
print("TimeFrame Configuration:")
print(f"Backend: {transformed_tf.backend}")
print(f"Mode: {transformed_tf.mode}")
print(f"Sort Order: {'Ascending' if transformed_tf.ascending else 'Descending'}")

Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Original TimeFrame:
    realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  tbilrate  \
0  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7      2.82   
1  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7      3.08   
2  2775.488    1751.8  289.226   491.260   1916.4  29.35  140.5      3.82   
3  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0      4.33   
4  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6      3.50   

   unemp      pop  infl  realint         ds  
0    5.8  177.146  0.00     0.00 1959-01-01  
1    5.1  177.830  2.34     0.74 1959-04-01  
2    5.3  178.657  2.74     1.09 1959-07-01  
3    5.6  179.386  0.27     4.06 1959-10-01  
4    5.2  180.007  2.31     1.19 1960-01-01  
Initialized SingleStepTargetShifter with target_col=None, n_lags=1
Rows before: 203; Rows after: 202; Dropped: 1

Transformed TimeFrame:
   realcons  realinv  rea

In [None]:
# You can explore the TimeFrame objects further here

# Original TimeFrame data
print("Original TimeFrame columns:")
print(tf.df.columns)
print("\nFirst few rows:")
tf.df.head()

# Note: The target column 'realgdp' is present in the original data

Original TimeFrame columns:
Index(['realgdp', 'realcons', 'realinv', 'realgovt', 'realdpi', 'cpi', 'm1',
       'tbilrate', 'unemp', 'pop', 'infl', 'realint', 'ds'],
      dtype='object')

First few rows:


|   | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | ds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 | 1959-01-01 |
| 1 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 | 1959-04-01 |
| 2 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 1959-07-01 |
| 3 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 1959-10-01 |
| 4 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 | 1960-01-01 |


In [None]:
# Explore the transformed TimeFrame

# Note: The original target column is replaced with a shifted version
print("Transformed TimeFrame columns:")
print(transformed_tf.df.columns)
print("\nFirst few rows:")
transformed_tf.df.head()

# Notice how 'realgdp' is now 'realgdp_shift_1', containing future values

Transformed TimeFrame columns:
Index(['realcons', 'realinv', 'realgovt', 'realdpi', 'cpi', 'm1', 'tbilrate',
       'unemp', 'pop', 'infl', 'realint', 'ds', 'realgdp_shift_1'],
      dtype='object')

First few rows:


|   | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | ds | realgdp_shift_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 | 1959-01-01 | 2778.801 |
| 1 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 | 1959-04-01 | 2775.488 |
| 2 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 1959-07-01 | 2785.204 |
| 3 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 1959-10-01 | 2847.699 |
| 4 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 | 1960-01-01 | 2834.39 |
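The relationship between the original and shifted target can be checked by hand. The sketch below uses plain pandas on the `realgdp` values visible in the outputs above (the sixth value is taken from the last `realgdp_shift_1` entry shown). It is an independent check of the arithmetic, not a call into TemporalScope.

```python
import pandas as pd

# First six quarterly realgdp values, copied from the outputs above.
realgdp = pd.Series([2710.349, 2778.801, 2775.488, 2785.204, 2847.699, 2834.39])

# A one-step shift pairs each row with the value from the next row.
shifted = realgdp.shift(-1)

# The first five shifted values match the realgdp_shift_1 column shown above.
expected = [2778.801, 2775.488, 2785.204, 2847.699, 2834.39]
assert shifted.iloc[:5].tolist() == expected

# The final row has no successor, so its shifted value is missing.
assert pd.isna(shifted.iloc[-1])

print(shifted)
```

That missing final value is why the verbose log reports one dropped row (203 before, 202 after) when `n_lags=1`.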


## Example 2: Using with DataFrame Directly

SingleStepTargetShifter also works directly with DataFrames from any supported backend (pandas, polars, modin, etc.).

In [None]:
import pandas as pd
import polars as pl
import modin.pandas as mpd

from temporalscope.core.core_utils import print_divider
from temporalscope.datasets.datasets import DatasetLoader
from temporalscope.target_shifters.single_step import SingleStepTargetShifter

# Load data using DatasetLoader
loader = DatasetLoader("macrodata")

# Demonstrate with different backends
for backend in ["pandas", "polars", "modin"]:
    print(f"\nUsing {backend} backend:")
    
    # Load data in specific backend
    data = loader.load_data(backend=backend)
    
    # Initialize SingleStepTargetShifter
    # Note: target_col must be specified when using raw DataFrames
    shifter = SingleStepTargetShifter(target_col="realgdp", n_lags=1, verbose=True)
    
    # Transform data
    transformed = shifter.fit_transform(data)
    
    print(f"Original shape: {data.shape}")
    print(f"Transformed shape: {transformed.shape}")
    print(f"Target column: {shifter.target_col}_shift_{shifter.n_lags}")
    print_divider()


Using pandas backend:
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Initialized SingleStepTargetShifter with target_col=realgdp, n_lags=1
Rows before: 203; Rows after: 202; Dropped: 1
Original shape: (203, 13)
Transformed shape: (202, 13)
Target column: realgdp_shift_1

Using polars backend:
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Initialized SingleStepTargetShifter with target_col=realgdp, n_lags=1
Rows before: 203; Rows after: 202; Dropped: 1
Original shape: (203, 13)
Transformed shape: (202, 13)
Target column: realgdp_shift_1

Using modin backend:
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Initialized SingleStepTargetShifter with target_col=realgdp, n_lags=1
Rows before: 203; Rows after: 202; Dropped: 1
Original shape: (203, 13)
Transformed shape: (202, 13)
Target column: realgdp_shift_1




In [None]:
# Let's examine the pandas DataFrame in detail

# Load pandas data
pandas_data = loader.load_data(backend="pandas")
print("Original pandas DataFrame:")
pandas_data.head()

# Note: You can see the original 'realgdp' column here

Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Original pandas DataFrame:


|   | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | ds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 | 1959-01-01 |
| 1 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 | 1959-04-01 |
| 2 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 1959-07-01 |
| 3 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 1959-10-01 |
| 4 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 | 1960-01-01 |


In [None]:
# Transform pandas data
shifter = SingleStepTargetShifter(target_col="realgdp", n_lags=1)
transformed_pandas = shifter.fit_transform(pandas_data)

print("Transformed pandas DataFrame:")
transformed_pandas.head()

# Note: 'realgdp' is now 'realgdp_shift_1' containing future values
# The DataFrame has one fewer row due to the shift

Transformed pandas DataFrame:


|   | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | ds | realgdp_shift_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 | 1959-01-01 | 2778.801 |
| 1 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 | 1959-04-01 | 2775.488 |
| 2 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 1959-07-01 | 2785.204 |
| 3 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 1959-10-01 | 2847.699 |
| 4 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 | 1960-01-01 | 2834.39 |


In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends
# Import the class under its own name: aliasing it as `tf` would shadow the
# `tf` TimeFrame instance created in Example 1 of this notebook.
from temporalscope.core.temporal_data_loader import TimeFrame
from temporalscope.datasets.datasets import DatasetLoader

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

def init_timeframes_for_backends(target_col: str):
    """Initialize TimeFrame objects for demonstration backends.

    This function demonstrates TimeFrame initialization across different backends,
    ensuring data meets the requirements for temporal XAI workflows:
    - Clean, preprocessed data
    - Proper time column format
    - Numeric features

    :param target_col: The target column for prediction
    :type target_col: str
    :return: A dictionary containing TimeFrame objects for each backend
    :rtype: dict
    """
    # Initialize DatasetLoader - the recommended way to load data in TemporalScope
    loader = DatasetLoader("macrodata")
    timeframes = {}

    # Load and initialize TimeFrames for demonstration backends
    for backend in ["pandas", "polars", "modin"]:
        print(f"Loading data with {backend} backend...")
        data = loader.load_data(backend=backend)
        timeframes[backend] = TimeFrame(data, time_col="ds", target_col=target_col)
        print(f"Successfully created TimeFrame with {backend} backend")
        print_divider()
    
    return timeframes

if __name__ == "__main__":
    # Initialize TimeFrames
    timeframes = init_timeframes_for_backends(target_col="realgdp")
    
    # Demonstrate with Modin backend
    print("\nDetailed Example with Modin Backend:")
    macro_modin_tf = timeframes["modin"]
    
    # Verify backend
    print(f"Backend type: {macro_modin_tf.backend}")
    
    print("\nPreview of the DataFrame:")
    print(macro_modin_tf.df.head())
    print_divider()
    
    print("TimeFrame Configuration:")
    print(f"Mode: {macro_modin_tf.mode}")
    print(f"Sort Order: {'Ascending' if macro_modin_tf.ascending else 'Descending'}")
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Loading data with pandas backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Successfully created TimeFrame with pandas backend
Loading data with polars backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Successfully created TimeFrame with polars backend
Loading data with modin backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Successfully created TimeFrame with modin backend

Detailed Example with Modin Backend:
Backend type: modin

Preview of the DataFrame:
    realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  tbilrate  \
0  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7      2.82   
1  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7      3.08   
2  2775.488    1751.8  289.226   491.260   1916.4  29.35  140.5      3.82   
3  2785.204    1753.7  299.3

## Implementation Details

The SingleStepTargetShifter handles different DataFrame backends through Narwhals operations:

1. **LazyFrame (Dask/Polars)**:
   - Uses collect() for scalar access
   - Avoids direct indexing
   - Handles lazy evaluation properly

2. **PyArrow**:
   - Uses nw.Int64 for numeric operations
   - Handles comparisons through Narwhals
   - Converts types before arithmetic operations

3. **All Backends**:
   - Uses @nw.narwhalify for backend conversions
   - Pure Narwhals operations throughout
   - Consistent behavior across supported types

This design ensures reliable target shifting operations across all supported DataFrame backends, preparing data for machine learning tasks and temporal feature importance analysis.