# TemporalScope Tutorial: TimeFrame and Backend-Agnostic Data Loading

## TimeFrame Modes

The `TimeFrame` class supports two key modes for handling temporal data:

1. **Implicit & Static Time Series** (Default Mode):
   - Time column is treated as a feature for static modeling
   - Supports mixed-frequency workflows
   - No strict temporal ordering enforced
   - Use when: Building ML models where time is just another feature
   - Example: `enforce_temporal_uniqueness=False` (default)

2. **Strict Time Series**:
   - Enforces strict temporal ordering and uniqueness
   - Suitable for forecasting tasks
   - Can validate by groups using `id_col`
   - Use when: Building forecasting models requiring temporal integrity
   - Example: `enforce_temporal_uniqueness=True`

## Engineering Design Overview

The `TimeFrame` class is designed with several key assumptions to ensure performance, scalability, and flexibility across temporal XAI workflows:

1. **Preprocessed Data Assumption**:
   - TemporalScope assumes users provide clean, preprocessed data
   - As with TensorFlow and GluonTS, preprocessing (categorical encoding, missing-data handling, scaling) should be done before the data reaches TemporalScope

2. **Time Column Constraints**:
   - `time_col` must be a numeric index or a timestamp
   - Critical for operations like sliding window partitioning and temporal XAI (e.g., MASV computations)

3. **Numeric Features Requirement**:
   - All features (except `time_col`) must be numeric
   - Ensures compatibility with ML models and XAI techniques like SHAP

4. **Universal Model Assumption**:
   - Models operate on the entire dataset without hidden groupings
   - Enables seamless integration with SHAP, Boruta-SHAP, and LIME
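
The first three assumptions can be checked on a pandas DataFrame before it is handed to `TimeFrame`. The sketch below is illustrative only: `check_design_assumptions` is a hypothetical helper, not part of TemporalScope, and it assumes that "preprocessed" means no missing values remain.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_design_assumptions(df: pd.DataFrame, time_col: str) -> list[str]:
    """Return a list of problems; an empty list means the frame passes.

    Hypothetical helper for illustration; not a TemporalScope function.
    """
    if time_col not in df.columns:
        return [f"missing time column '{time_col}'"]

    problems = []

    # Time column constraint: numeric index or timestamp.
    time_dtype = df[time_col].dtype
    if not (is_numeric_dtype(time_dtype) or is_datetime64_any_dtype(time_dtype)):
        problems.append(f"'{time_col}' must be numeric or datetime, got {time_dtype}")

    # Preprocessed-data assumption (as interpreted here): no missing values.
    null_cols = [c for c in df.columns if df[c].isna().any()]
    if null_cols:
        problems.append(f"missing values in: {null_cols}")

    # Numeric features requirement: everything except the time column.
    non_numeric = [
        c for c in df.columns
        if c != time_col and not is_numeric_dtype(df[c].dtype)
    ]
    if non_numeric:
        problems.append(f"non-numeric feature columns: {non_numeric}")

    return problems


clean = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "x": [1.0, 2.0],
    "y": [0, 1],
})
raw = pd.DataFrame({
    "ds": ["jan", "feb"],            # strings, not timestamps
    "x": [1.0, None],                # missing value
    "region": ["north", "south"],    # unencoded categorical
})

print(check_design_assumptions(clean, "ds"))
for problem in check_design_assumptions(raw, "ds"):
    print(problem)
```

Collecting every problem in one list, rather than raising on the first, lets a user fix all preprocessing gaps in one pass.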

## Backend Support

TemporalScope supports multiple DataFrame backends through its core utilities:

- **Core Backends**:
  - `pandas`: Core DataFrame library
  - `modin`: Parallelized Pandas operations
  - `pyarrow`: Apache Arrow-based processing
  - `polars`: High-performance Rust implementation
  - `dask`: Distributed computing framework

This tutorial demonstrates loading data with pandas, polars, and modin as examples, but the principles apply across all supported backends.

## Core Purpose

TimeFrame's primary purpose is ensuring data quality and compatibility for:
- Computing Mean Absolute SHAP Values (MASV) on partitions
- Temporal feature importance analysis
- Integration with model-agnostic explainability tools

By validating data upfront, TimeFrame ensures reliable XAI workflows downstream.
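
To make the MASV idea concrete, here is a minimal NumPy sketch that averages absolute attribution values per feature over sliding windows. It assumes a precomputed SHAP matrix with rows in time order. The function name `masv_per_partition` and the `window`/`stride` scheme are illustrative assumptions, not TemporalScope's API, and the toy matrix is made up for the example.

```python
import numpy as np


def masv_per_partition(shap_values: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Mean absolute SHAP value per feature for each sliding window.

    shap_values has shape (n_samples, n_features) with rows in time order.
    The result has shape (n_partitions, n_features).
    Hypothetical helper for illustration only.
    """
    n_samples = shap_values.shape[0]
    starts = range(0, n_samples - window + 1, stride)
    return np.array(
        [np.abs(shap_values[s : s + window]).mean(axis=0) for s in starts]
    )


# Toy attribution matrix: 6 time-ordered samples, 2 features.
shap_values = np.array([
    [1.0, -0.5],
    [-1.0, 0.5],
    [2.0, 0.0],
    [-2.0, 0.0],
    [0.0, 3.0],
    [0.0, -3.0],
])

masv = masv_per_partition(shap_values, window=2, stride=2)
print(masv)
```

In this toy data the second feature only becomes influential in the last window, which is the kind of shift in feature importance over time that a partition-wise view is meant to expose.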


## Demonstrating Default Mode (Implicit & Static Time Series)

This example demonstrates TimeFrame's default mode where time is treated as a static feature without strict ordering requirements.

In [1]:
import modin.pandas as mpd
import pandas as pd
import polars as pl

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends
from temporalscope.core.temporal_data_loader import TimeFrame as tf
from temporalscope.datasets.datasets import DatasetLoader

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

def init_timeframes_for_backends(target_col: str):
    """Initialize TimeFrame objects for demonstration backends.
    
    :param target_col: The target column for prediction
    :type target_col: str
    :return: A dictionary containing TimeFrame objects for each backend
    :rtype: dict
    """
    loader = DatasetLoader("macrodata")
    timeframes = {}
    
    for backend in ["pandas", "polars", "modin"]:
        print(f"Loading data with {backend} backend...")
        data = loader.load_data(backend=backend)
        # Default mode: time treated as static feature
        timeframes[backend] = tf(data, time_col="ds", target_col=target_col, enforce_temporal_uniqueness=False)
        print(f"Successfully created TimeFrame with {backend} backend")
        print_divider()
    
    return timeframes

if __name__ == "__main__":
    timeframes = init_timeframes_for_backends(target_col="realgdp")
    
    # Demonstrate with Modin backend
    print("\nDetailed Example with Modin Backend:")
    macro_modin_tf = timeframes["modin"]
    
    # Verify backend
    print(f"Backend type: {macro_modin_tf.backend}")
    
    print("\nPreview of the DataFrame:")
    print(macro_modin_tf.df.head())
    print_divider()
    
    print("TimeFrame Configuration:")
    print(f"Mode: {macro_modin_tf.mode}")
    print(f"Sort Order: {'Ascending' if macro_modin_tf.ascending else 'Descending'}")
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Loading data with pandas backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Successfully created TimeFrame with pandas backend
Loading data with polars backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp
Successfully created TimeFrame with polars backend
Loading data with modin backend...
Loading dataset: 'macrodata'
DataFrame shape: (203, 13)
Target column: realgdp


Successfully created TimeFrame with modin backend

Detailed Example with Modin Backend:
Backend type: modin

Preview of the DataFrame:
    realgdp  realcons  realinv  realgovt  realdpi    cpi     m1  tbilrate  \
0  2710.349    1707.4  286.898   470.045   1886.9  28.98  139.7      2.82   
1  2778.801    1733.7  310.859   481.301   1919.7  29.15  141.7      3.08   
2  2775.488    1751.8  289.226   491.260   1916.4  29.35  140.5      3.82   
3  2785.204    1753.7  299.356   484.052   1931.3  29.37  140.0      4.33   
4  2847.699    1770.5  331.722   462.199   1955.5  29.54  139.6      3.50   

   unemp      pop  infl  realint         ds  
0    5.8  177.146  0.00     0.00 1959-01-01  
1    5.1  177.830  2.34     0.74 1959-04-01  
2    5.3  178.657  2.74     1.09 1959-07-01  
3    5.6  179.386  0.27     4.06 1959-10-01  
4    5.2  180.007  2.31     1.19 1960-01-01  
TimeFrame Configuration:
Mode: single_target
Sort Order: Ascending


In [2]:
# Show all available attributes of TimeFrame
print("All TimeFrame attributes:")
print([attr for attr in dir(macro_modin_tf) if not attr.startswith('_')])
print_divider()

# Show key properties
print("Key TimeFrame Properties:")
print(f"Backend: {macro_modin_tf.backend}")
print(f"Mode: {macro_modin_tf.mode}")
print(f"Sort Order: {'Ascending' if macro_modin_tf.ascending else 'Descending'}")
print(f"Metadata: {macro_modin_tf.metadata}")
print_divider()

All TimeFrame attributes:
['ascending', 'backend', 'df', 'metadata', 'mode', 'setup', 'sort_dataframe_time', 'update_dataframe', 'validate_dataframe']
Key TimeFrame Properties:
Backend: modin
Mode: single_target
Sort Order: Ascending
Metadata: {}


In [3]:
macro_modin_tf.df.head()

| | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint | ds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.98 | 139.7 | 2.82 | 5.8 | 177.146 | 0.0 | 0.0 | 1959-01-01 |
| 1 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.15 | 141.7 | 3.08 | 5.1 | 177.83 | 2.34 | 0.74 | 1959-04-01 |
| 2 | 2775.488 | 1751.8 | 289.226 | 491.26 | 1916.4 | 29.35 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 | 1959-07-01 |
| 3 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.37 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 | 1959-10-01 |
| 4 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.54 | 139.6 | 3.5 | 5.2 | 180.007 | 2.31 | 1.19 | 1960-01-01 |


## Temporal Uniqueness Validation in TimeFrame

The TimeFrame class supports validation of temporal uniqueness within groups, which is essential for many time series applications. This validation ensures that timestamps are unique within specified groups (defined by an ID column) or globally across the dataset.

Key points:
- `enforce_temporal_uniqueness`: When `True`, validates that timestamps in `time_col` are unique within each group, or across the whole dataset when no `id_col` is given
- `id_col`: Optional name of the column that defines the groups for temporal uniqueness validation
- Validation is performed during TimeFrame initialization and data updates
- Mixed-frequency datasets are supported, with flexible handling of temporal data
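
The difference between group-wise and global uniqueness can be sketched in plain pandas. This is not how `TimeFrame` implements the check. `find_duplicate_timestamps` is a hypothetical helper built on `DataFrame.duplicated`, shown only to make the rule concrete.

```python
from typing import Optional

import pandas as pd


def find_duplicate_timestamps(
    df: pd.DataFrame, time_col: str, id_col: Optional[str] = None
) -> pd.DataFrame:
    """Return rows whose timestamp repeats within its group.

    With id_col=None the check is global across the whole frame.
    Hypothetical helper for illustration; not a TemporalScope function.
    """
    keys = [time_col] if id_col is None else [id_col, time_col]
    # keep=False flags every member of a duplicated key, not just the repeats.
    return df[df.duplicated(subset=keys, keep=False)]


df_valid = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "time": [1, 2, 1, 3],
    "target": [10, 20, 30, 40],
})

# Group-wise: no id repeats a timestamp, so nothing is flagged.
per_group = find_duplicate_timestamps(df_valid, "time", id_col="id")
print(per_group.empty)

# Global: time == 1 appears under both ids, so both rows are flagged.
global_dupes = find_duplicate_timestamps(df_valid, "time")
print(global_dupes)
```

The same frame passes the per-group check and fails the global one. That matters for panel data, where several entities are observed at the same timestamps.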

The example below demonstrates:
1. Data with temporal uniqueness violations (duplicate timestamps within ID groups)
2. Valid data with unique timestamps per ID group
3. Testing across different backends (pandas, polars, modin)


In [8]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     1       0.2       1.2      20
2   2     2       0.3       1.3      30
3   2     2       0.4       1.4      40
Created DataFrame with valid temporal uniqueness (unique timestamps per ID):
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   1     2       0.2       1.2      20
2   2     1       0.3       1.3      30
3   2     3       0.4       1.4      40

Testing with pandas backend...

Testing data with temporal uniqueness violations:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing valid data:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validat

In [5]:
# Show all available attributes of TimeFrame
print("All TimeFrame attributes:")
print([attr for attr in dir(strict_modin_tf) if not attr.startswith('_')])
print_divider()

# Show key properties
print("Key TimeFrame Properties:")
print(f"Backend: {strict_modin_tf.backend}")
print(f"Mode: {strict_modin_tf.mode}")
print(f"Sort Order: {'Ascending' if strict_modin_tf.ascending else 'Descending'}")
print(f"Metadata: {strict_modin_tf.metadata}")
print_divider()

All TimeFrame attributes:
['ascending', 'backend', 'df', 'metadata', 'mode', 'setup', 'sort_dataframe_time', 'update_dataframe', 'validate_dataframe']
Key TimeFrame Properties:
Backend: modin
Mode: single_target
Sort Order: Ascending
Metadata: {}


In [6]:
strict_modin_tf.df.head()

   id  time  feature1  feature2    target
0   0     0  0.496714 -0.463418  1.465649
1   1     1 -0.138264  -0.46573 -0.225776
2   2     2  0.647689  0.241962  0.067528
3   3     3   1.52303  -1.91328 -1.424748
4   4     4 -0.234153 -1.724918 -0.544383


## Demonstrating Invalid Strict Temporal Ordering

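This example shows how `TimeFrame` responds to temporal violations when `enforce_temporal_uniqueness=True`. Duplicate timestamps are rejected with a validation error. Note that in the pandas output below, the unsorted-timestamp case passes validation, so the flag appears to check timestamp uniqueness rather than row order.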

In [7]:
import modin.pandas as mpd
import pandas as pd
import polars as pl
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal violations
# Case 1: Duplicate timestamps
df_duplicates = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'time': [1, 1, 2, 3],  # Duplicate timestamp '1'
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with duplicate timestamps:")
print(df_duplicates)
print_divider()

# Case 2: Unsorted timestamps
df_unsorted = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'time': [3, 1, 4, 2],  # Unsorted timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with unsorted timestamps:")
print(df_unsorted)
print_divider()

# Try to initialize with each backend and each violation case
for backend in ["pandas", "polars", "modin"]:
    print(f"\nAttempting to load data with {backend} backend...")
    
    # Test duplicate timestamps
    print("\nTesting duplicate timestamps:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable strict validation
                      id_col="id",
                      verbose=True  # Enable verbose mode
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test unsorted timestamps
    print("\nTesting unsorted timestamps:")
    data = convert_to_backend(df_unsorted, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable strict validation
                      id_col="id",
                      verbose=True  # Enable verbose mode
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    print_divider()

Supported TemporalScope backends:
['pandas', 'modin', 'pyarrow', 'polars', 'dask']
Created DataFrame with duplicate timestamps:
   id  time  feature1  feature2  target
0   1     1       0.1       1.1      10
1   2     1       0.2       1.2      20
2   3     2       0.3       1.3      30
3   4     3       0.4       1.4      40
Created DataFrame with unsorted timestamps:
   id  time  feature1  feature2  target
0   1     3       0.1       1.1      10
1   2     1       0.2       1.2      20
2   3     4       0.3       1.3      30
3   4     2       0.4       1.4      40

Attempting to load data with pandas backend...

Testing duplicate timestamps:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
Got expected error with pandas backend: Duplicate timestamps in id_col 'id' column 'time'.

Testing unsorted timestamps:
Converted data type: <class 'pandas.core.frame.DataFrame'>
Validation completed successfully.
TimeFrame successfully initialized with 
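
The two cases above exercise different properties: whether a timestamp repeats, and whether rows arrive in time order. The sketch below separates them in plain pandas. It is not TemporalScope's implementation; the helper names `has_duplicate_timestamps` and `is_sorted_by_time` are hypothetical and exist only for illustration. The library's own check may group differently, since it reports an error for `df_duplicates` even though every ID appears once.

```python
from typing import Optional

import pandas as pd


def has_duplicate_timestamps(
    df: pd.DataFrame, time_col: str, id_col: Optional[str] = None
) -> bool:
    """Return True if a timestamp repeats (within an ID group when id_col is given)."""
    # Hypothetical helper: a row is a duplicate if its key columns match an earlier row.
    subset = [time_col] if id_col is None else [id_col, time_col]
    return bool(df.duplicated(subset=subset).any())


def is_sorted_by_time(df: pd.DataFrame, time_col: str) -> bool:
    """Return True if the time column is non-decreasing in row order."""
    return bool(df[time_col].is_monotonic_increasing)


# Same id/time values as the tutorial's two violation cases
df_duplicates = pd.DataFrame({"id": [1, 2, 3, 4], "time": [1, 1, 2, 3]})
df_unsorted = pd.DataFrame({"id": [1, 2, 3, 4], "time": [3, 1, 4, 2]})

print(has_duplicate_timestamps(df_duplicates, "time"))        # True
print(has_duplicate_timestamps(df_duplicates, "time", "id"))  # False: each ID appears once
print(has_duplicate_timestamps(df_unsorted, "time"))          # False: all timestamps distinct
print(is_sorted_by_time(df_unsorted, "time"))                 # False: rows are out of order
```

Keeping the two checks separate makes it clear that unsorted data can still satisfy a uniqueness constraint, which matches the pandas output shown above for the unsorted case.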

## Engineering Design Notes

The `TimeFrame` class enforces several constraints to ensure reliable XAI workflows:

1. **Data Validation**:
   - Checks for required columns
   - Validates time column format
   - Ensures numeric features
   - Verifies no missing values

2. **Backend Handling**:
   - Automatically infers appropriate backend
   - Validates backend compatibility
   - Maintains data integrity across conversions

3. **Temporal Ordering**:
   - Ensures proper time-based sorting
   - Critical for temporal feature importance analysis

These constraints ensure that data is properly prepared for downstream XAI workflows, particularly MASV computations on temporal partitions.
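
The data validation constraints listed above can be approximated before handing data to `TimeFrame`. The following is a minimal pandas-only sketch under stated assumptions: `check_frame` is a hypothetical helper, not part of TemporalScope, and the library's own validation may differ in detail and in error messages.

```python
from typing import List

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_frame(df: pd.DataFrame, time_col: str, target_col: str) -> List[str]:
    """Collect validation problems as messages instead of raising on the first one."""
    problems: List[str] = []

    # Required columns must be present before any other check makes sense
    for col in (time_col, target_col):
        if col not in df.columns:
            problems.append(f"missing required column: {col}")
    if problems:
        return problems

    # The time column must be a numeric index or a timestamp
    time_series = df[time_col]
    if not (is_numeric_dtype(time_series) or is_datetime64_any_dtype(time_series)):
        problems.append(f"time column '{time_col}' is neither numeric nor datetime")

    # Preprocessed data is assumed, so no missing values anywhere
    if df.isna().any().any():
        problems.append("missing values present")

    # Every column other than the time column must be numeric
    for col in df.columns:
        if col != time_col and not is_numeric_dtype(df[col]):
            problems.append(f"non-numeric feature: {col}")

    return problems


good = pd.DataFrame({"time": [1, 2], "feature1": [0.1, 0.2], "target": [10, 20]})
bad = pd.DataFrame({"time": [1, 2], "feature1": ["a", None], "target": [10, 20]})

print(check_frame(good, "time", "target"))  # []
print(check_frame(bad, "time", "target"))
```

Returning a list of messages rather than raising immediately lets a caller see every problem in one pass, which is convenient when cleaning a dataset before loading it.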