# TemporalScope Tutorial: Using the Target Shifter

## Overview

This tutorial shows how to load the `statsmodels` macroeconomic dataset and apply the **TemporalTargetShifter** with the **Modin** backend. It shifts the target variable (`realgdp`) in two modes: **machine learning** mode for one-step-ahead forecasting and **deep learning** mode for sequence-based forecasting. A final section repeats the workflow for the Pandas, Polars, and Modin backends.

### Summary

| **Step**  | **Description**                                                                 |
|-----------|---------------------------------------------------------------------------------|
| **1**     | **Data Loading**: Load macroeconomic data and create a datetime column (`ds`).   |
| **2**     | **Modin Backend Initialization**: Initialize a `TimeFrame` for scalable data processing with Modin. |
| **3**     | **Target Shifting (ML Mode)**: Shift the target variable (`realgdp`) for one-step-ahead forecasting in **machine learning mode**. |
| **4**     | **Target Shifting (DL Mode)**: Shift the target variable for sequence-based forecasting in **deep learning mode**. |

### Key Concepts

- **One-step ahead forecasting**: Shifting the target variable to predict the next time step for machine learning models.
- **Sequence forecasting**: Generating sequences of target variables for deep learning models.
- **Modin Backend**: A parallelized, drop-in replacement for the Pandas API, aimed at datasets too large for comfortable single-core processing.
- **TemporalTargetShifter**: A tool to shift target variables for forecasting tasks, supporting both machine learning and deep learning modes.
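
To make the one-step-ahead idea concrete, here is a minimal plain-pandas sketch of target shifting. It is only an illustration of the concept and is not how `TemporalTargetShifter` is implemented; the column name `realgdp_next` is a hypothetical choice for this example, and the values are the first four `realgdp` readings from the dataset used in this tutorial.

```python
import pandas as pd

# First four quarterly realgdp readings from the macrodata dataset.
df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1959-01-01", "1959-04-01", "1959-07-01", "1959-10-01"]),
        "realgdp": [2710.349, 2778.801, 2775.488, 2785.204],
    }
)

# One-step-ahead target: pair each row with the NEXT row's value.
# "realgdp_next" is a hypothetical column name used only in this sketch.
df["realgdp_next"] = df["realgdp"].shift(-1)

# The last row has no future value, so it is dropped before training.
supervised = df.dropna(subset=["realgdp_next"]).reset_index(drop=True)
print(supervised)
```

Each remaining row now holds the features observed at time `t` next to the value to predict at `t + 1`, which is the layout most tabular machine learning models expect.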

### Steps

1. **Load the macroeconomic dataset** using the `statsmodels` library.
2. **Initialize a TimeFrame** for the Modin backend.
3. **Apply the Target Shifter** in machine learning mode to shift the target variable by one step (for simple one-step-ahead forecasting).
4. **Apply the Target Shifter** in deep learning mode to create sequences for sequence-based forecasting tasks.


## Part 1: Load Macro-Economic Dataset

In [1]:
import pandas as pd
from statsmodels.datasets import macrodata

from temporalscope.core.core_utils import print_divider

# Constants for modes
MODE_MACHINE_LEARNING = "machine_learning"

def load_macrodata(target_col: str = "realgdp"):
    """Load the macrodata dataset and build a datetime column (`ds`) from year and quarter.

    :param target_col: The column to be used as the target for prediction.
    :type target_col: str, optional
    :return: Preprocessed DataFrame (with `ds` first) and the target column name.
    :rtype: tuple[pd.DataFrame, str]
    """
    print_divider()
    print("Loading the 'macrodata' dataset from statsmodels.")
    print(f"Using '{target_col}' as the target column for future prediction.")
    print_divider()

    # Load macrodata dataset
    macro_df = macrodata.load_pandas().data.copy()

    # Create 'ds' column combining 'year' and 'quarter'
    macro_df["ds"] = pd.to_datetime(
        macro_df["year"].astype(int).astype(str) + "-" + ((macro_df["quarter"] - 1) * 3 + 1).astype(int).astype(str) + "-01"
    )

    # Drop the 'year' and 'quarter' columns
    macro_df = macro_df.drop(columns=["year", "quarter"])

    # Reorder columns to put 'ds' (datetime) first
    cols = ["ds"] + [col for col in macro_df.columns if col != "ds"]
    macro_df = macro_df[cols].copy()

    return macro_df, target_col


# Load the macrodata dataset and preprocess
macro_df, target_col = load_macrodata()
macro_df

Loading the 'macrodata' dataset from statsmodels.
Using 'realgdp' as the target column for future prediction.


|     | ds         | realgdp   | realcons | realinv  | realgovt | realdpi | cpi     | m1     | tbilrate | unemp | pop     | infl  | realint |
|-----|------------|-----------|----------|----------|----------|---------|---------|--------|----------|-------|---------|-------|---------|
| 0   | 1959-01-01 | 2710.349  | 1707.4   | 286.898  | 470.045  | 1886.9  | 28.980  | 139.7  | 2.82     | 5.8   | 177.146 | 0.00  | 0.00    |
| 1   | 1959-04-01 | 2778.801  | 1733.7   | 310.859  | 481.301  | 1919.7  | 29.150  | 141.7  | 3.08     | 5.1   | 177.830 | 2.34  | 0.74    |
| 2   | 1959-07-01 | 2775.488  | 1751.8   | 289.226  | 491.260  | 1916.4  | 29.350  | 140.5  | 3.82     | 5.3   | 178.657 | 2.74  | 1.09    |
| 3   | 1959-10-01 | 2785.204  | 1753.7   | 299.356  | 484.052  | 1931.3  | 29.370  | 140.0  | 4.33     | 5.6   | 179.386 | 0.27  | 4.06    |
| 4   | 1960-01-01 | 2847.699  | 1770.5   | 331.722  | 462.199  | 1955.5  | 29.540  | 139.6  | 3.50     | 5.2   | 180.007 | 2.31  | 1.19    |
| ... | ...        | ...       | ...      | ...      | ...      | ...     | ...     | ...    | ...      | ...   | ...     | ...   | ...     |
| 198 | 2008-07-01 | 13324.600 | 9267.7   | 1990.693 | 991.551  | 9838.3  | 216.889 | 1474.7 | 1.17     | 6.0   | 305.270 | -3.16 | 4.33    |
| 199 | 2008-10-01 | 13141.920 | 9195.3   | 1857.661 | 1007.273 | 9920.4  | 212.174 | 1576.5 | 0.12     | 6.9   | 305.952 | -8.79 | 8.91    |
| 200 | 2009-01-01 | 12925.410 | 9209.2   | 1558.494 | 996.287  | 9926.4  | 212.671 | 1592.8 | 0.22     | 8.1   | 306.547 | 0.94  | -0.71   |
| 201 | 2009-04-01 | 12901.504 | 9189.0   | 1456.678 | 1023.528 | 10077.5 | 214.469 | 1653.6 | 0.18     | 9.2   | 307.226 | 3.37  | -3.19   |
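
The `ds` values above come from the expression `(quarter - 1) * 3 + 1`, which maps a quarter number to the first month of that quarter. The standalone sketch below spells out that arithmetic with the standard library only; the helper name `quarter_start` is hypothetical and not part of the tutorial's code.

```python
from datetime import date


def quarter_start(year: int, quarter: int) -> date:
    """Return the first day of the given quarter (1 to 4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1, 2, 3, or 4, got {quarter}")
    # Quarter 1 starts in month 1, quarter 2 in month 4,
    # quarter 3 in month 7, quarter 4 in month 10.
    month = (quarter - 1) * 3 + 1
    return date(year, month, 1)


# Print the four quarter start dates for 1959.
for q in range(1, 5):
    print(q, quarter_start(1959, q))
```

The second quarter of 1959 maps to 1959-04-01, which matches row 1 of the preview.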


## Part 2: Shifting for Machine Learning

In [4]:
import modin.pandas as mpd
from temporalscope.partition.sliding_window import SlidingWindowPartitioner
from temporalscope.core.core_utils import BACKEND_MODIN, print_divider
from temporalscope.core.temporal_data_loader import TimeFrame
from temporalscope.core.temporal_target_shifter import TemporalTargetShifter

# Constants for modes
MODE_MACHINE_LEARNING = "machine_learning"

# Step 1: Load the macrodata dataset and preprocess
macro_df, target_col = load_macrodata()

# Step 2: Initialize Modin TimeFrame for Modin backend
print_divider()
print("Initializing TimeFrame for the Modin backend...")
macro_modin_df = mpd.DataFrame(macro_df)
modin_tf = TimeFrame(macro_modin_df, time_col="ds", target_col=target_col, dataframe_backend=BACKEND_MODIN)

# Step 3: Preview the original data
print("Original DataFrame:")
print(modin_tf.get_data().head())
print_divider()

# Step 4: Create 2 partitions using `num_partitions=2` with train and test split
partitioner = SlidingWindowPartitioner(tf=modin_tf, num_partitions=2, train_pct=0.7, test_pct=0.3)  # Set train/test split

# Step 5: Get the partitions
partitions = list(partitioner.fit_transform())

print_divider()
print(f"Total partitions created: {len(partitions)}")
for partition in partitions:
    # Each yielded item maps a partition name to its splits,
    # so iterate over the keys instead of hard-coding 'partition_1'
    for name, splits in partition.items():
        print(f"{name} preview:")
        print(splits["train"].head())  # Access the train split
        print(splits["test"].head())  # Access the test split
    print_divider()


Loading the 'macrodata' dataset from statsmodels.
Using 'realgdp' as the target column for future prediction.
Initializing TimeFrame for the Modin backend...
Original DataFrame:
          ds   realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  \
0 1959-01-01  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7   
1 1959-04-01  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7   
2 1959-07-01  2775.488    1751.8  289.226   491.260   1916.4  29.35  140.5   
3 1959-10-01  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0   
4 1960-01-01  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6   

   tbilrate  unemp      pop  infl  realint  
0      2.82    5.8  177.146  0.00     0.00  
1      3.08    5.1  177.830  2.34     0.74  
2      3.82    5.3  178.657  2.74     1.09  
3      4.33    5.6  179.386  0.27     4.06  
4      3.50    5.2  180.007  2.31     1.19  


TypeError: SlidingWindowPartitioner._fit_pandas_modin() takes 1 positional argument but 2 were given
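
The cell above stops with a `TypeError` before any partition is printed. As a rough, library-independent illustration of what a 70/30 train/test split in time order means, here is a plain-pandas sketch. It does not reproduce `SlidingWindowPartitioner` (which also handles multiple windows); the helper name `chronological_split` and the toy column `y` are hypothetical.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_pct: float = 0.7):
    """Split rows in their existing order: the first train_pct for training, the rest for testing."""
    if not 0.0 < train_pct < 1.0:
        raise ValueError("train_pct must be strictly between 0 and 1")
    cut = int(len(df) * train_pct)
    return df.iloc[:cut], df.iloc[cut:]


# Toy frame with ten quarterly timestamps, already sorted by time.
toy = pd.DataFrame(
    {
        "ds": pd.date_range("1959-01-01", periods=10, freq="QS"),
        "y": range(10),
    }
)

train, test = chronological_split(toy, train_pct=0.7)
print(len(train), len(test))
```

Because the rows are never shuffled, every training timestamp precedes every test timestamp, which avoids leaking future information into the training set.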

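In [None]:
# Apply the TemporalTargetShifter in machine learning mode (one-step-ahead target)
shifter_ml = TemporalTargetShifter(n_lags=1, mode=MODE_MACHINE_LEARNING, verbose=True)
shifted_df = shifter_ml.fit_transform(modin_tf)
shifted_df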

In [None]:
shifted_df.head()

## Part 3: Shifting for Deep Learning

In [None]:
# Step 5: Apply the TemporalTargetShifter in deep learning mode
MODE_DEEP_LEARNING = "deep_learning"

print(f"\nApplying Target Shifter in {MODE_DEEP_LEARNING} mode...")

# Setup the TemporalTargetShifter for deep learning mode with a sequence length
sequence_length = 3  # Length of sequence for deep learning
shifter_dl = TemporalTargetShifter(n_lags=1, mode=MODE_DEEP_LEARNING, sequence_length=sequence_length, verbose=True)

# Apply the shifter
shifted_dl_df = shifter_dl.fit_transform(modin_tf)

# Print the shifted data with sequences
print("Shifted data for deep learning mode (sequences):")
print(shifted_dl_df.head())


In [None]:
shifted_dl_df
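
To give a feel for what a `sequence_length` of 3 with `n_lags=1` can mean, here is a pure-Python sketch of one common windowing convention: each window of three consecutive values is paired with the value one step after the window ends. This is an assumption made for illustration; the actual layout produced by `TemporalTargetShifter` in deep learning mode may differ. The helper name `make_sequences` is hypothetical.

```python
from typing import List, Tuple


def make_sequences(
    values: List[float], sequence_length: int, n_lags: int = 1
) -> List[Tuple[List[float], float]]:
    """Pair each window of `sequence_length` values with the value `n_lags` steps after the window ends."""
    if sequence_length < 1 or n_lags < 1:
        raise ValueError("sequence_length and n_lags must be positive")
    pairs = []
    # The last usable window must leave room for its target value.
    last_start = len(values) - sequence_length - n_lags
    for start in range(last_start + 1):
        window = values[start : start + sequence_length]
        target = values[start + sequence_length + n_lags - 1]
        pairs.append((window, target))
    return pairs


# First five realgdp readings from the macrodata dataset.
realgdp = [2710.349, 2778.801, 2775.488, 2785.204, 2847.699]

for window, target in make_sequences(realgdp, sequence_length=3):
    print(window, "->", target)
```

Five observations yield two (window, target) pairs under this convention, so a series must be longer than `sequence_length + n_lags - 1` to produce any training examples.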

## Part 4: Shifting for all backends

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
from statsmodels.datasets import macrodata

from temporalscope.core.core_utils import BACKEND_MODIN, BACKEND_PANDAS, BACKEND_POLARS, print_divider
from temporalscope.core.temporal_data_loader import TimeFrame as tf
from temporalscope.core.temporal_target_shifter import TemporalTargetShifter

# Constants for modes
MODE_MACHINE_LEARNING = "machine_learning"
MODE_DEEP_LEARNING = "deep_learning"

def load_macrodata(target_col: str = "realgdp"):
    """Load the macrodata dataset and build a datetime column (`ds`) from year and quarter.

    :param target_col: The column to be used as the target for prediction (defaults to 'realgdp').
    :type target_col: str, optional

    :return: Preprocessed DataFrame (with `ds` first) and the target column name.
    :rtype: tuple[pd.DataFrame, str]
    """
    print_divider()
    print("Loading the 'macrodata' dataset from the open-license statsmodels package.")
    print(f"Using '{target_col}' as the target column for future prediction.")
    print_divider()

    # Load macrodata dataset
    macro_df = macrodata.load_pandas().data.copy()

    # Create 'ds' column by combining 'year' and 'quarter'
    macro_df["ds"] = pd.to_datetime(
        macro_df["year"].astype(int).astype(str)
        + "-"
        + ((macro_df["quarter"] - 1) * 3 + 1).astype(int).astype(str)
        + "-01"
    )

    # Drop the 'year' and 'quarter' columns
    macro_df = macro_df.drop(columns=["year", "quarter"])

    # Reorder columns to place 'ds' first
    cols = ["ds"] + [col for col in macro_df.columns if col != "ds"]
    macro_df = macro_df[cols].copy()

    return macro_df, target_col


def init_timeframes_for_backends(macro_df, target_col: str):
    """Initialize TimeFrame objects for all backends (Pandas, Polars, Modin) using constants.

    :param macro_df: Preprocessed macro dataset.
    :type macro_df: pd.DataFrame
    :param target_col: The target column for prediction.
    :type target_col: str
    :return: A dictionary containing TimeFrame objects for Pandas, Polars, and Modin.
    :rtype: dict
    """
    timeframes = {}

    # Pandas backend
    # The keyword `dataframe_backend` matches the TimeFrame call that ran successfully in Part 2
    macro_pandas_df = pd.DataFrame(macro_df)
    timeframes[BACKEND_PANDAS] = tf(
        macro_pandas_df, time_col="ds", target_col=target_col, dataframe_backend=BACKEND_PANDAS
    )

    # Polars backend
    macro_polars_df = pl.DataFrame(macro_df)
    timeframes[BACKEND_POLARS] = tf(
        macro_polars_df, time_col="ds", target_col=target_col, dataframe_backend=BACKEND_POLARS
    )

    # Modin backend
    macro_modin_df = mpd.DataFrame(macro_df)
    timeframes[BACKEND_MODIN] = tf(
        macro_modin_df, time_col="ds", target_col=target_col, dataframe_backend=BACKEND_MODIN
    )

    return timeframes


def apply_target_shifter(tf_obj, mode: str):
    """Apply the TemporalTargetShifter in the specified mode.

    :param tf_obj: TimeFrame object to apply the shifter to.
    :param mode: Mode of operation (machine_learning or deep_learning).
    """
    print(f"\nApplying Target Shifter in {mode} mode...")

    # Setup the TemporalTargetShifter
    if mode == MODE_MACHINE_LEARNING:
        shifter = TemporalTargetShifter(n_lags=1, mode=MODE_MACHINE_LEARNING, verbose=True)
    elif mode == MODE_DEEP_LEARNING:
        # In deep learning mode, sequence_length must be provided
        shifter = TemporalTargetShifter(n_lags=1, mode=MODE_DEEP_LEARNING, sequence_length=3, verbose=True)
    else:
        raise ValueError(f"Invalid mode: {mode}")

    # Apply the shifter
    shifted_df = shifter.fit_transform(tf_obj)

    # Print the result (since it's already a DataFrame, no need for get_data())
    print("Shifted data:")
    print(shifted_df.head())


if __name__ == "__main__":
    # Load the macrodata dataset and preprocess
    macro_df, target_col = load_macrodata()

    # Initialize TimeFrame objects for various backends using constants
    timeframes = init_timeframes_for_backends(macro_df, target_col)

    # Apply and demonstrate shifting for all backends
    for backend, tf_obj in timeframes.items():
        print_divider()
        print(f"Demonstrating Target Shifter for backend: {backend}")
        print("Preview of the TimeFrame DataFrame:")
        print(tf_obj.get_data().head())
        print_divider()

        # Apply target shifting in machine learning mode
        apply_target_shifter(tf_obj, MODE_MACHINE_LEARNING)

        # Apply target shifting in deep learning mode
        apply_target_shifter(tf_obj, MODE_DEEP_LEARNING)
