# TemporalScope Tutorial

## Loading and Testing Macro Data Across Multiple Backends

This tutorial demonstrates how to load macroeconomic data and test it across **Pandas**, **Polars**, and **Modin** backends using TemporalScope’s `TimeFrame`.

- **Different Backends**:
  - **Pandas**: Ideal for small-to-medium datasets with seamless integration into the Python ecosystem.
  - **Polars**: Optimized for speed, suitable for handling large datasets using multi-threading.
  - **Modin**: Scales Pandas operations across multiple cores, ideal for large datasets.
- **TimeFrame**:
  - Consistent handling of temporal data across all backends.
  - Facilitates switching between backends based on dataset size and performance needs.
- **Strict Assumptions for Default Models**:
  - **One-step-ahead forecasting**: The target column must be shifted so that each row's features are paired with the next period's target value.
  - **Predictive performance**: The data must be properly cleaned and prepared before it is passed to TemporalScope’s partitioning or SHAP-based temporal analysis pipelines.
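Here is a minimal sketch of the one-step-ahead assumption in plain pandas. The five-value series and the `value`/`target_value` column names are made up for illustration and are not part of TemporalScope or macrodata. `shift(-1)` pairs each row with the next period's value, and the final row is dropped because it has no future value.

```python
import pandas as pd

# Hypothetical toy series; not taken from the macrodata dataset.
df = pd.DataFrame({"value": [10.0, 11.0, 12.5, 12.0, 13.0]})

# Each row's target is the *next* row's value (one step ahead).
df["target_value"] = df["value"].shift(-1)

# The last row has no future observation, so its target is NaN; drop it.
df = df.dropna().reset_index(drop=True)

print(df)
```

This is the same shift-then-drop pattern that `load_macrodata` applies to `realgdp` below.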


In [1]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
from statsmodels.datasets import macrodata

from temporalscope.core.temporal_data_loader import TimeFrame as tf
from temporalscope.core.core_utils import print_divider


def load_macrodata(target_col: str = "realgdp"):
    """Load the macrodata dataset, add a 'ds' datetime column, and shift the target.

    :param target_col: The column to be used as the target for prediction.
        Defaults to 'realgdp'.
    :type target_col: str, optional

    :return: The preprocessed DataFrame and the name of the shifted target column.
    :rtype: tuple[pd.DataFrame, str]
    """
    print_divider()
    print("Loading the 'macrodata' dataset from the open-license statsmodels package.")
    print(f"Using '{target_col}' as the target column for future prediction.")
    print_divider()

    # Load macrodata dataset
    macro_df = macrodata.load_pandas().data.copy()

    # Create 'ds' column by combining 'year' and 'quarter'
    macro_df["ds"] = pd.to_datetime(
        macro_df["year"].astype(int).astype(str)
        + "-"
        + ((macro_df["quarter"] - 1) * 3 + 1).astype(int).astype(str)
        + "-01"
    )

    # Drop the 'year' and 'quarter' columns
    macro_df = macro_df.drop(columns=["year", "quarter"])

    # Reorder columns to place 'ds' first
    cols = ["ds"] + [col for col in macro_df.columns if col != "ds"]
    macro_df = macro_df[cols].copy()

    # Shift the target column for future prediction and rename it
    shifted_target_col = f"target_{target_col}"
    macro_df[shifted_target_col] = macro_df[target_col].shift(-1)

    # Drop any rows with NaN (due to shifting)
    macro_df = macro_df.dropna().copy()

    # Print the shape of the DataFrame
    print(f"Loaded DataFrame shape: {macro_df.shape}")

    print_divider()
    print(
        f"""Shifted '{target_col}' to create a new target column '{shifted_target_col}'
        for future prediction."""
    )
    print_divider()

    return macro_df, shifted_target_col


def init_timeframes_for_backends(macro_df, target_col: str):
    """Initialize TimeFrame objects for multiple backends (Pandas, Polars, Modin).

    :param macro_df: Preprocessed macro dataset.
    :type macro_df: pd.DataFrame
    :param target_col: The target column for prediction.
    :type target_col: str
    :return: A dictionary containing TimeFrame objects for Pandas, Polars, and Modin.
    :rtype: dict
    """
    # Pandas backend
    macro_pandas_df = pd.DataFrame(macro_df)
    macro_pandas_tf = tf(
        macro_pandas_df, time_col="ds", target_col=target_col, backend="pd"
    )

    # Polars backend
    macro_polars_df = pl.DataFrame(macro_df)
    macro_polars_tf = tf(
        macro_polars_df, time_col="ds", target_col=target_col, backend="pl"
    )

    # Modin backend
    macro_modin_df = mpd.DataFrame(macro_df)
    macro_modin_tf = tf(
        macro_modin_df, time_col="ds", target_col=target_col, backend="mpd"
    )

    return {
        "pandas": macro_pandas_tf,
        "polars": macro_polars_tf,
        "modin": macro_modin_tf,
    }


if __name__ == "__main__":
    # Load the macrodata dataset and preprocess
    macro_df, shifted_target_col = load_macrodata()

    # Initialize TimeFrame objects, passing the shifted target column explicitly
    timeframes = init_timeframes_for_backends(macro_df, target_col=shifted_target_col)

    # We will only demonstrate detailed output for Modin
    print_divider()
    print("Using Modin backend:")
    macro_modin_tf = timeframes["modin"]

    # Assert that the backend is Modin
    assert macro_modin_tf.backend == "mpd", "Backend is not Modin!"

    print("Preview of the Modin DataFrame (macrodata):")
    print(macro_modin_tf.get_data().head())
    print_divider()

    # Print object's attributes (metadata)
    print("Metadata for Modin TimeFrame object:")
    print(macro_modin_tf.__dict__)
    print_divider()

Loading the 'macrodata' dataset from the open-license statsmodels package.
Using 'realgdp' as the target column for future prediction.
Loaded DataFrame shape: (202, 14)
Shifted 'realgdp' to create a new target column 'target_realgdp'
        for future prediction.


2024-09-20 02:19:57,110	INFO worker.py:1786 -- Started a local Ray instance.


Using Modin backend:
Preview of the Modin DataFrame (macrodata):
          ds   realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  \
0 1959-01-01  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7   
1 1959-04-01  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7   
2 1959-07-01  2775.488    1751.8  289.226   491.260   1916.4  29.35  140.5   
3 1959-10-01  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0   
4 1960-01-01  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6   

   tbilrate  unemp      pop  infl  realint  target_realgdp  
0      2.82    5.8  177.146  0.00     0.00        2778.801  
1      3.08    5.1  177.830  2.34     0.74        2775.488  
2      3.82    5.3  178.657  2.74     1.09        2785.204  
3      4.33    5.6  179.386  0.27     4.06        2847.699  
4      3.50    5.2  180.007  2.31     1.19        2834.390  
Metadata for Modin TimeFrame object:
{'_backend': 'mpd', '_cfg': {'BACKENDS': {'pl': 'polars', 'pd':
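`load_macrodata` builds the `ds` column by concatenating year and month strings. An alternative, which TemporalScope does not require, is to have `pd.to_datetime` assemble the dates directly from a DataFrame with `year`, `month`, and `day` columns. The toy `year`/`quarter` values below are assumed to mirror the first five macrodata rows in the preview above. The month mapping `3 * (quarter - 1) + 1` is the same one the tutorial uses.

```python
import pandas as pd

# Toy year/quarter columns shaped like macrodata's (stored as floats there).
raw = pd.DataFrame(
    {
        "year": [1959.0, 1959.0, 1959.0, 1959.0, 1960.0],
        "quarter": [1.0, 2.0, 3.0, 4.0, 1.0],
    }
)

# Quarter q starts in month 3*(q-1)+1: Q1->Jan, Q2->Apr, Q3->Jul, Q4->Oct.
parts = pd.DataFrame(
    {
        "year": raw["year"].astype(int),
        "month": ((raw["quarter"] - 1) * 3 + 1).astype(int),
        "day": 1,
    }
)

# pd.to_datetime assembles datetimes from year/month/day columns directly,
# avoiding string concatenation and format inference.
ds = pd.to_datetime(parts)
print(ds.dt.strftime("%Y-%m-%d").tolist())
```

Casting to `int` first keeps the assembly independent of how pandas treats float date components.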

In [2]:
macro_modin_tf.get_data()

| | ds | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | target_realgdp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1959-01-01 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.980 | 139.7 | 2.82 | 5.8 | 177.146 | 0.00 | 0.00 | 2778.801 |
| 1 | 1959-04-01 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.150 | 141.7 | 3.08 | 5.1 | 177.830 | 2.34 | 0.74 | 2775.488 |
| 2 | 1959-07-01 | 2775.488 | 1751.8 | 289.226 | 491.260 | 1916.4 | 29.350 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 2785.204 |
| 3 | 1959-10-01 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.370 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 2847.699 |
| 4 | 1960-01-01 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.540 | 139.6 | 3.50 | 5.2 | 180.007 | 2.31 | 1.19 | 2834.390 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | 2008-04-01 | 13415.266 | 9351.0 | 2026.518 | 961.280 | 10059.0 | 218.610 | 1409.3 | 1.74 | 5.4 | 304.483 | 8.53 | -6.79 | 13324.600 |
| 198 | 2008-07-01 | 13324.600 | 9267.7 | 1990.693 | 991.551 | 9838.3 | 216.889 | 1474.7 | 1.17 | 6.0 | 305.270 | -3.16 | 4.33 | 13141.920 |
| 199 | 2008-10-01 | 13141.920 | 9195.3 | 1857.661 | 1007.273 | 9920.4 | 212.174 | 1576.5 | 0.12 | 6.9 | 305.952 | -8.79 | 8.91 | 12925.410 |
| 200 | 2009-01-01 | 12925.410 | 9209.2 | 1558.494 | 996.287 | 9926.4 | 212.671 | 1592.8 | 0.22 | 8.1 | 306.547 | 0.94 | -0.71 | 12901.504 |
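Before passing a `TimeFrame` to downstream models, you can confirm that the shifted target really is one step ahead. The helper below, `check_one_step_target`, is hypothetical and not part of TemporalScope. It is written against the pandas API. Modin mirrors that API, so the helper may work on `get_data()` output directly, but that is an untested assumption. The last row is skipped because the value its target points to is not in the frame; in the full dataset, that value belongs to the final quarter dropped after shifting. The sample frame reuses the first five rows of `ds`, `realgdp`, and `target_realgdp` from the output above.

```python
import pandas as pd


def check_one_step_target(
    df: pd.DataFrame, source_col: str, target_col: str, time_col: str = "ds"
) -> bool:
    """Hypothetical helper: True if rows are time-ordered and each target is the next row's source value."""
    # One-step-ahead targets only make sense if rows are in chronological order.
    ordered = df[time_col].is_monotonic_increasing
    # Positional comparison: target at row i vs. source at row i + 1.
    # The last row is excluded because its "next" value is not in the frame.
    next_source = df[source_col].shift(-1).iloc[:-1].to_numpy()
    target = df[target_col].iloc[:-1].to_numpy()
    return bool(ordered and (next_source == target).all())


# First five rows of ds, realgdp and target_realgdp from the output above.
sample = pd.DataFrame(
    {
        "ds": pd.to_datetime(
            ["1959-01-01", "1959-04-01", "1959-07-01", "1959-10-01", "1960-01-01"]
        ),
        "realgdp": [2710.349, 2778.801, 2775.488, 2785.204, 2847.699],
        "target_realgdp": [2778.801, 2775.488, 2785.204, 2847.699, 2834.390],
    }
)

print(check_one_step_target(sample, "realgdp", "target_realgdp"))  # True
```

Exact equality is used because `shift` copies values unchanged. A frame whose target was never shifted, or whose rows are out of time order, returns `False`.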


In [3]:
macro_modin_tf.__dict__

{'_backend': 'mpd',
 '_cfg': {'BACKENDS': {'pl': 'polars', 'pd': 'pandas', 'mpd': 'modin'}},
 '_time_col': 'ds',
 '_target_col': 'target_realgdp',
 '_sort': True,
 'df':             ds    realgdp  realcons   realinv  realgovt  realdpi      cpi  \
 0   1959-01-01   2710.349    1707.4   286.898   470.045   1886.9   28.980   
 1   1959-04-01   2778.801    1733.7   310.859   481.301   1919.7   29.150   
 2   1959-07-01   2775.488    1751.8   289.226   491.260   1916.4   29.350   
 3   1959-10-01   2785.204    1753.7   299.356   484.052   1931.3   29.370   
 4   1960-01-01   2847.699    1770.5   331.722   462.199   1955.5   29.540   
 ..         ...        ...       ...       ...       ...      ...      ...   
 197 2008-04-01  13415.266    9351.0  2026.518   961.280  10059.0  218.610   
 198 2008-07-01  13324.600    9267.7  1990.693   991.551   9838.3  216.889   
 199 2008-10-01  13141.920    9195.3  1857.661  1007.273   9920.4  212.174   
 200 2009-01-01  12925.410    9209.2  1558.494   99