In [None]:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# TemporalScope Tutorial: TimeFrame and Backend-Agnostic Data Loading

## TimeFrame Modes

The `TimeFrame` class supports two key modes for handling temporal data:

1. **Implicit & Static Time Series** (Default Mode):
   - Time column is treated as a feature for static modeling
   - Supports mixed-frequency workflows
   - No strict temporal ordering enforced
   - Use when: Building ML models where time is just another feature
   - Example: `enforce_temporal_uniqueness=False` (default)

2. **Strict Time Series**:
   - Enforces strict temporal ordering and uniqueness
   - Suitable for forecasting tasks
   - Can validate by groups using `id_col`
   - Use when: Building forecasting models requiring temporal integrity
   - Example: `enforce_temporal_uniqueness=True`
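
The practical difference between the two modes is whether repeated or unordered timestamps are tolerated. The following is a minimal sketch in plain pandas, not TemporalScope's own validation logic; `is_strict_time_series` is a hypothetical helper introduced only for illustration.

```python
import pandas as pd


def is_strict_time_series(df: pd.DataFrame, time_col: str) -> bool:
    """Hypothetical helper: True if timestamps are unique and ascending."""
    times = df[time_col]
    return bool(times.is_unique and times.is_monotonic_increasing)


# Mixed-frequency data: two sources report at the same instant (time=2).
mixed = pd.DataFrame({"time": [1, 2, 2, 5], "value": [0.1, 0.2, 0.3, 0.4]})
# Forecasting-style data: one observation per time step, in order.
strict = pd.DataFrame({"time": [1, 2, 3, 4], "value": [0.1, 0.2, 0.3, 0.4]})

print(is_strict_time_series(mixed, "time"))   # False
print(is_strict_time_series(strict, "time"))  # True
```

Data like `mixed` is acceptable in the default mode, while data like `strict` is what the strict mode is meant to guarantee.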

## Engineering Design Overview

The `TimeFrame` class is designed with several key assumptions to ensure performance, scalability, and flexibility across temporal XAI workflows:

1. **Preprocessed Data Assumption**:
   - TemporalScope assumes users provide clean, preprocessed data
   - Similar to TensorFlow and GluonTS, preprocessing (categorical encoding, missing data, scaling) should be handled before using TemporalScope

2. **Time Column Constraints**:
   - `time_col` must be a numeric index or a timestamp
   - Critical for operations like sliding window partitioning and temporal XAI (e.g., MASV computations)

3. **Numeric Features Requirement**:
   - All features (except `time_col`) must be numeric
   - Ensures compatibility with ML models and XAI techniques like SHAP

4. **Universal Model Assumption**:
   - Models operate on the entire dataset without hidden groupings
   - Enables seamless integration with SHAP, Boruta-SHAP, and LIME
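
These assumptions can be checked before data ever reaches `TimeFrame`. The sketch below uses only pandas and is an assumption about what such a pre-flight check might look like; `check_assumptions` is a hypothetical helper, not part of TemporalScope.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_assumptions(df: pd.DataFrame, time_col: str) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    # Assumption 1: data is already clean (no missing values).
    if df.isna().any().any():
        problems.append("missing values present")
    # Assumption 2: the time column is a numeric index or a timestamp.
    time = df[time_col]
    if not (is_numeric_dtype(time) or is_datetime64_any_dtype(time)):
        problems.append(f"{time_col!r} is neither numeric nor a timestamp")
    # Assumption 3: every other column is numeric.
    for col in df.columns.drop(time_col):
        if not is_numeric_dtype(df[col]):
            problems.append(f"feature {col!r} is not numeric")
    return problems


clean = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=3),
    "x": [1.0, 2.0, 3.0],
    "y": [0, 1, 0],
})
raw = clean.assign(region=["north", "south", "north"])  # unencoded categorical

print(check_assumptions(clean, "ds"))  # []
print(check_assumptions(raw, "ds"))    # ["feature 'region' is not numeric"]
```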

## Backend Support

TemporalScope supports multiple DataFrame backends through its core utilities:

- **Core Backends**:
  - `pandas`: Core DataFrame library
  - `modin`: Parallelized pandas operations
  - `pyarrow`: Apache Arrow-based processing
  - `polars`: High-performance Rust implementation
  - `dask`: Distributed computing framework

This tutorial demonstrates loading data with pandas, polars, and modin as examples, but the principles apply across all supported backends.

## Core Purpose

TimeFrame's primary purpose is ensuring data quality and compatibility for:
- Computing Mean Absolute SHAP Values (MASV) on partitions
- Temporal feature importance analysis
- Integration with model-agnostic explainability tools

By validating data upfront, TimeFrame ensures reliable XAI workflows downstream.
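
To make the first item concrete, here is a rough sketch of computing mean absolute values per feature over sliding-window partitions. The matrix is random noise standing in for explainer output, and `masv_per_partition`, `window_size`, and `stride` are illustrative names rather than TemporalScope API.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a SHAP matrix: 12 time-ordered rows, 2 features.
# In a real workflow these values would come from an explainer such as SHAP.
shap_values = rng.normal(size=(12, 2))

window_size, stride = 6, 3  # illustrative partitioning parameters


def masv_per_partition(values: np.ndarray, window: int, step: int) -> np.ndarray:
    """Mean absolute value per feature for each sliding-window partition."""
    starts = range(0, values.shape[0] - window + 1, step)
    return np.array([np.abs(values[s:s + window]).mean(axis=0) for s in starts])


masv = masv_per_partition(shap_values, window_size, stride)
print(masv.shape)  # (3, 2): three partitions, two features
```

Each row of `masv` summarizes one partition, so comparing rows shows how a feature's importance drifts over time. That comparison is only meaningful when the rows are in a well-defined temporal order.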



# Example 1: Default Mode: Implicit & Static Time Series

This example demonstrates TimeFrame's default mode where time is treated as a static feature without strict ordering requirements. The following examples are for illustrative and academic purposes only. This software is distributed under the Apache License 2.0 "AS IS" and without warranties or conditions of any kind.

| Domain | Example Use Case | Time Column Treatment | Benefits |
|--------|-----------------|----------------------|-----------|
| Engineering/Signal Processing | Sensor data analysis where multiple sensors record at different frequencies | Time as a feature allows handling asynchronous measurements | - Handles mixed sampling rates<br>- Supports missing timestamps<br>- Enables cross-sensor analysis |
| Financial Markets | Factor model analysis examining price dynamics over different windows | Timestamps from different exchanges/assets treated as features | - Handles market hours differences<br>- Supports varying trade frequencies<br>- Enables cross-asset analysis |
| Healthcare Analytics | Patient monitoring where different measurements occur at irregular intervals | Timestamps treated as features rather than as a strict sequence | - Handles irregular visit schedules<br>- Supports missing measurements<br>- Enables cross-patient analysis |

In this mode (`enforce_temporal_uniqueness=False`), `TimeFrame` treats time as just another feature, allowing flexible analysis of mixed-frequency and asynchronous data. Let's demonstrate this default behavior:

In [None]:
import modin.pandas as mpd
import pandas as pd
import polars as pl

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends
from temporalscope.core.temporal_data_loader import TimeFrame as tf
from temporalscope.datasets.datasets import DatasetLoader

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

def init_timeframes_for_backends(target_col: str):
    """Initialize TimeFrame objects for demonstration backends.
    
    :param target_col: The target column for prediction
    :type target_col: str
    :return: A dictionary containing TimeFrame objects for each backend
    :rtype: dict
    """
    loader = DatasetLoader("macrodata")
    timeframes = {}
    
    for backend in ["pandas", "polars", "modin"]:
        print(f"Loading data with {backend} backend...")
        data = loader.load_data(backend=backend)
        # Default mode: time treated as static feature
        timeframes[backend] = tf(data, time_col="ds", target_col=target_col, enforce_temporal_uniqueness=False)
        print(f"Successfully created TimeFrame with {backend} backend")
        print_divider()
    
    return timeframes

if __name__ == "__main__":
    timeframes = init_timeframes_for_backends(target_col="realgdp")
    
    # Demonstrate with Modin backend
    print("\nDetailed Example with Modin Backend:")
    macro_modin_tf = timeframes["modin"]
    
    # Verify backend
    print(f"Backend type: {macro_modin_tf.backend}")
    
    print("\nPreview of the DataFrame:")
    print(macro_modin_tf.df.head())
    print_divider()
    
    print("TimeFrame Configuration:")
    print(f"Mode: {macro_modin_tf.mode}")
    print(f"Sort Order: {'Ascending' if macro_modin_tf.ascending else 'Descending'}")
    print_divider()

In [None]:
# Show all available attributes of TimeFrame
print("All TimeFrame attributes:")
print([attr for attr in dir(macro_modin_tf) if not attr.startswith('_')])
print_divider()

# Show key properties
print("Key TimeFrame Properties:")
print(f"Backend: {macro_modin_tf.backend}")
print(f"Mode: {macro_modin_tf.mode}")
print(f"Sort Order: {'Ascending' if macro_modin_tf.ascending else 'Descending'}")
print(f"Metadata: {macro_modin_tf.metadata}")
print_divider()

In [None]:
macro_modin_tf.df.head()

# Example 2: Group-Level Temporal Uniqueness Validation

The `TimeFrame` class supports validation of temporal uniqueness at the group level, which is essential for multi-entity time series applications. When enabled, this validation ensures that timestamps are unique within each group (defined by `id_col`), while different groups can share the same timestamps. This aligns with modern approaches in time series analysis where multiple entities can be observed concurrently.

Key points:
- `enforce_temporal_uniqueness`: When True, ensures unique timestamps within each group
- `id_col`: Defines the grouping entity (e.g., patient_id, sensor_id)
- Different groups can have events at the same timestamps
- Supports mixed-frequency and asynchronous observations

Note: Modern machine learning and deep learning frameworks often treat observations as IID (Independent and Identically Distributed), making strict temporal ordering optional. This includes recent transformer architectures that can learn from irregularly sampled or mixed-frequency data. The choice of temporal validation should match your specific modeling requirements.

| Domain | Use Case | Group-Level Uniqueness | Cross-Group Flexibility |
|--------|-----------|----------------------|------------------------|
| Time Series Forecasting | Multi-asset price prediction | Each asset requires unique timestamps in its history | Different assets can share observation times |
| Clinical Trials | Multi-patient monitoring | Each patient needs unique measurement times | Different patients can have concurrent visits |
| IoT Monitoring | Multi-sensor networks | Each sensor needs unique reading times | Different sensors can record simultaneously |
| Anomaly Detection | Multi-system monitoring | Each system needs unique state recordings | Different systems can have concurrent states |

This aligns with findings in recent literature on universal time series representation learning (Trirat et al., 2024) and mixed-frequency forecasting (Filho et al., 2024), where group-level temporal consistency is maintained while allowing cross-group temporal overlap.
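
The distinction between global and group-level uniqueness can be seen with plain pandas. This is a sketch of the idea only and makes no claim about how TemporalScope performs the check internally.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2],
    "time": [1, 2, 1, 2, 2],   # id=2 repeats time=2
})

# Timestamps 1 and 2 appear in both groups: allowed.
# Only a repeated (id, time) pair counts as a violation.
violations = df[df.duplicated(subset=["id", "time"], keep=False)]
print(violations)

global_dupes = int(df["time"].duplicated().sum())              # ignores grouping
group_dupes = int(df.duplicated(subset=["id", "time"]).sum())  # per-group check
print(global_dupes, group_dupes)  # 3 1
```

A global check would flag three rows here, even though only one of them is a genuine repeat within its own group.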

In [None]:
import pandas as pd

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# First, let's see all supported backends
print("Supported TemporalScope backends:")
print(get_temporalscope_backends())
print_divider()

# Create synthetic data with temporal uniqueness violations
# Case 1: Data with duplicate timestamps within same ID group
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],  # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Case 2: Valid data with unique timestamps within each ID group
df_valid = pd.DataFrame({
    'id': [1, 1, 2, 2],  # Two groups with IDs 1 and 2
    'time': [1, 2, 1, 3],  # Each ID has unique timestamps
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with valid temporal uniqueness (unique timestamps per ID):")
print(df_valid)
print_divider()

# Try to initialize with each backend and test temporal uniqueness validation
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    
    # Test data with temporal uniqueness violations
    print("\nTesting data with temporal uniqueness violations:")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    
    # Test valid data
    print("\nTesting valid data:")
    data = convert_to_backend(df_valid, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",  # Group by ID for validation
                      verbose=True
                     )
        print("Successfully created TimeFrame with valid data")
    except Exception as e:
        print(f"Unexpected error with {backend} backend: {str(e)}")
    
    print_divider()

# Example 3: Temporal Uniqueness Validation - Invalid Case

The `TimeFrame` class enforces temporal uniqueness within groups when `id_col` is provided. Let's examine a case where this validation fails due to duplicate timestamps within groups.

## Invalid Case Example
In this example, we create a DataFrame where:
- Group 1 (id=1) has two rows with timestamp=1
- Group 2 (id=2) has two rows with timestamp=2

This violates temporal uniqueness because each group contains duplicate timestamps. When we try to create a `TimeFrame` with this data and `enforce_temporal_uniqueness=True`, it will raise a `ValueError` with the message "Duplicate timestamps in id_col 'id' column 'time'."
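
To see which rows trigger the failure, the same layout can be checked with the standard library alone. `assert_unique_per_group` and its error message are hypothetical and written for this illustration; they are not TemporalScope's implementation or its exact message.

```python
from collections import Counter


def assert_unique_per_group(ids, times):
    """Hypothetical check: raise ValueError on repeated (id, time) pairs."""
    counts = Counter(zip(ids, times))
    repeated = sorted(pair for pair, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(f"Duplicate (id, time) pairs: {repeated}")


# Same layout as the invalid DataFrame described above.
ids = [1, 1, 2, 2]
times = [1, 1, 2, 2]

try:
    assert_unique_per_group(ids, times)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Duplicate (id, time) pairs: [(1, 1), (2, 2)]
```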

In [None]:
import pandas as pd

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# Create synthetic data with temporal uniqueness violations
df_duplicates = pd.DataFrame({
    'id': [1, 1, 2, 2],      # Two groups with IDs 1 and 2
    'time': [1, 1, 2, 2],    # Duplicate timestamps within each group
    'feature1': [0.1, 0.2, 0.3, 0.4],
    'feature2': [1.1, 1.2, 1.3, 1.4],
    'target': [10, 20, 30, 40]
})

print("Created DataFrame with temporal uniqueness violations (duplicate timestamps per ID):")
print(df_duplicates)
print_divider()

# Try to initialize with each backend
for backend in ["pandas", "polars", "modin"]:
    print(f"\nTesting with {backend} backend...")
    data = convert_to_backend(df_duplicates, backend=backend)
    print(f"Converted data type: {type(data)}")
    try:
        timeframe = tf(data, 
                      time_col="time", 
                      target_col="target",
                      enforce_temporal_uniqueness=True,  # Enable temporal uniqueness validation
                      id_col="id",                      # Group by ID for validation
                      verbose=True
                     )
        print("WARNING: Expected validation error did not occur!")
    except Exception as e:
        print(f"Got expected error with {backend} backend: {str(e)}")
    print_divider()

# Example 4: TimeFrame Metadata for XAI Workflows

`TimeFrame`'s metadata container enables custom end-to-end workflows, particularly for XAI analysis. The `metadata` property can store temporal feature importance metrics such as MASV (Mean Absolute SHAP Values) and support custom partitioning schemes.

Note: This example is for illustrative and academic purposes only. This software is distributed under the Apache License 2.0 "AS IS" and without warranties or conditions of any kind.
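
Partition bounds of the kind stored in the metadata can be derived from a window size and a stride. The sketch below assumes daily observations and inclusive bounds; `window_bounds` is a hypothetical helper, not a TemporalScope partitioning API.

```python
from datetime import date, timedelta


def window_bounds(start: date, end: date, window_size: int, stride: int):
    """Yield inclusive (first_day, last_day) ISO-date pairs for each full window."""
    first = start
    while first + timedelta(days=window_size - 1) <= end:
        last = first + timedelta(days=window_size - 1)
        yield first.isoformat(), last.isoformat()
        first += timedelta(days=stride)


bounds = list(
    window_bounds(date(2023, 1, 1), date(2023, 2, 28), window_size=30, stride=10)
)
for b in bounds:
    print(b)
# ('2023-01-01', '2023-01-30')
# ('2023-01-11', '2023-02-09')
# ('2023-01-21', '2023-02-19')
```

Generating the bounds from the configuration, rather than typing them by hand, keeps the stored `time_bounds` consistent with the stored `window_size` and `stride`.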

In [None]:
import pandas as pd
import numpy as np

from temporalscope.core.core_utils import print_divider, get_temporalscope_backends, convert_to_backend
from temporalscope.core.temporal_data_loader import TimeFrame as tf

# Create sample data
df = pd.DataFrame({
    'time': pd.date_range(start='2023-01-01', periods=5),
    'feature1': np.random.randn(5),
    'feature2': np.random.randn(5),
    'target': np.random.randn(5)
})

# Initialize TimeFrame
timeframe = tf(df, time_col='time', target_col='target', verbose=True)

# Store temporal feature importance metadata
timeframe.metadata['temporal_importance'] = {
    'config': {
        'window_size': 30,
        'stride': 10,
        'feature_importance': 'shap'
    },
    'partitions': [
        {
            'time_bounds': ('2023-01-01', '2023-01-30'),
            'feature_importance': {
                'feature1': 0.5,
                'feature2': 0.3
            }
        },
        {
            'time_bounds': ('2023-01-11', '2023-02-09'),  # starts one stride (10 days) after the first window
            'feature_importance': {
                'feature1': 0.4,
                'feature2': 0.6
            }
        }
    ]
}

print("TimeFrame Metadata Structure:")
print(timeframe.metadata['temporal_importance'])