# TemporalScope Tutorial: Synthetic Health Monitoring Analysis

## Overview

This tutorial demonstrates TemporalScope on synthetic health monitoring data. Standard academic healthcare datasets are planned for future releases; until then, this synthetic example illustrates the core functionality of TimeFrame and SingleStepTargetShifter.

### Current Features Demonstrated

1. **TimeFrame**:
   - Backend-agnostic data loading
   - Data validation for XAI workflows
   - Support for temporal data structures

2. **SingleStepTargetShifter**:
   - One-step-ahead target preparation
   - Clean separation of validation/transformation
   - Backend-agnostic operations

### Future Enhancements

- Integration with standard healthcare datasets
- Multi-step sequence prediction (planned MultiStepTargetShifter)
- Advanced temporal partitioning strategies

## Engineering Design

This tutorial follows TemporalScope's core engineering principles:

1. **Data Quality**:
   - Assumes clean, preprocessed input data
   - Requires a properly formatted time column
   - Requires numeric features

2. **Backend Agnostic**:
   - Works with pandas, Polars, and Modin
   - Uses pure Narwhals operations
   - Consistent behavior across backends

3. **XAI Ready**:
   - Prepared for MASV (Mean Absolute SHAP Values) computations
   - Compatible with temporal feature importance
   - Supports model-agnostic explainability
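
The data quality assumptions listed above can be checked before a frame is handed to TimeFrame. The following is a minimal sketch in plain pandas; `check_timeframe_inputs` is a hypothetical helper written for this tutorial, not part of TemporalScope, and TimeFrame's own validation may apply different or stricter rules.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_timeframe_inputs(df: pd.DataFrame, time_col: str) -> list[str]:
    """Return human-readable problems; an empty list means the frame looks usable.

    Hypothetical helper for illustration only; not part of TemporalScope.
    """
    if time_col not in df.columns:
        return [f"missing time column '{time_col}'"]
    problems = []
    # The time column should already be parsed as a datetime dtype.
    if not is_datetime64_any_dtype(df[time_col]):
        problems.append(f"'{time_col}' is not a datetime column")
    # Every other column is treated as a feature or target and must be numeric.
    for col in df.columns:
        if col != time_col and not is_numeric_dtype(df[col]):
            problems.append(f"feature '{col}' is not numeric")
    # Missing values should be handled upstream.
    if df.isna().any().any():
        problems.append("frame contains missing values")
    return problems


good = pd.DataFrame({"ds": pd.date_range("2023-01-01", periods=3, freq="D"),
                     "systolic": [120.0, 121.5, 119.0]})
bad = pd.DataFrame({"ds": ["2023-01-01", "2023-01-02"],
                    "systolic": ["high", None]})
print(check_timeframe_inputs(good, "ds"))  # no problems reported
print(check_timeframe_inputs(bad, "ds"))   # string dates, non-numeric feature, missing value
```

Returning a list of problems rather than raising on the first one makes it easier to fix all input issues in a single pass.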

In [1]:
import pandas as pd
import numpy as np

from temporalscope.core.temporal_data_loader import TimeFrame
from temporalscope.target_shifters.single_step import SingleStepTargetShifter
from temporalscope.core.core_utils import print_divider

def generate_health_data(start_date: str = '2023-01-01', days: int = 365) -> pd.DataFrame:
    """Generate synthetic health monitoring data.
    
    The synthetic signals combine three components:
    - A seasonal effect (yearly cycle, systolic only)
    - A weekly pattern (work stress)
    - Day-to-day variation (Gaussian noise)
    
    :param start_date: Starting date for the data
    :type start_date: str
    :param days: Number of days to generate
    :type days: int
    :return: DataFrame with synthetic health data
    :rtype: pd.DataFrame
    """
    # Create date range
    dates = pd.date_range(start=start_date, periods=days, freq='D')
    t = np.arange(days)
    
    # Generate patterns
    seasonal = 5 * np.sin(2 * np.pi * t / 365)  # Yearly cycle
    weekly = 3 * np.sin(2 * np.pi * t / 7)      # Weekly cycle
    
    # Generate metrics
    systolic = 120 + seasonal + weekly + np.random.normal(0, 3, days)
    heart_rate = 70 + weekly + np.random.normal(0, 3, days)
    
    return pd.DataFrame({
        'ds': dates,
        'systolic': systolic,
        'heart_rate': heart_rate
    })

# Generate synthetic data
print("Generating synthetic health data...")
health_df = generate_health_data()
print("Preview of generated health data:")
print(health_df.head())
print_divider()

Generating synthetic health data...
Preview of generated health data:
          ds    systolic  heart_rate
0 2023-01-01  114.150804   69.539203
1 2023-01-02  123.514289   69.639102
2 2023-01-03  126.973515   76.641613
3 2023-01-04  122.455920   71.553269
4 2023-01-05  125.396055   67.757135
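
The preview above comes from an unseeded call to `np.random.normal`, so re-running the cell produces different numbers. A minimal sketch of a reproducible variant follows, assuming a seeded NumPy `Generator`; the function name `generate_health_data_seeded` and its `seed` parameter are illustrative additions, and the values it produces will not match the preview printed above.

```python
import numpy as np
import pandas as pd


def generate_health_data_seeded(start_date: str = "2023-01-01", days: int = 365,
                                seed: int = 0) -> pd.DataFrame:
    """Same signal model as generate_health_data, with a seeded noise source.

    The `seed` parameter is an illustrative addition for reproducibility.
    """
    rng = np.random.default_rng(seed)  # local generator; global NumPy state is untouched
    dates = pd.date_range(start=start_date, periods=days, freq="D")
    t = np.arange(days)
    seasonal = 5 * np.sin(2 * np.pi * t / 365)  # yearly cycle
    weekly = 3 * np.sin(2 * np.pi * t / 7)      # weekly cycle
    systolic = 120 + seasonal + weekly + rng.normal(0, 3, days)
    heart_rate = 70 + weekly + rng.normal(0, 3, days)
    return pd.DataFrame({"ds": dates, "systolic": systolic, "heart_rate": heart_rate})


a = generate_health_data_seeded(seed=42)
b = generate_health_data_seeded(seed=42)
print(a.equals(b))  # True: identical seeds give identical frames
```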


In [2]:
# Explore the synthetic data

print("Data Overview:")
print(f"Shape: {health_df.shape}")
print("\nColumn Information:")
health_df.info()  # info() prints directly and returns None, so no print() wrapper

print("\nSummary Statistics:")
health_df.describe()

Data Overview:
Shape: (365, 3)

Column Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   ds          365 non-null    datetime64[ns]
 1   systolic    365 non-null    float64       
 2   heart_rate  365 non-null    float64       
dtypes: datetime64[ns](1), float64(2)
memory usage: 8.7 KB

Summary Statistics:


                        ds    systolic  heart_rate
count                  365       365.0       365.0
mean   2023-07-02 00:00:00   119.76367    69.75069
min    2023-01-01 00:00:00  107.842383   58.621709
25%    2023-04-02 00:00:00  116.178238    67.26319
50%    2023-07-02 00:00:00   119.66149   69.834875
75%    2023-10-01 00:00:00  123.514289    72.64032
max    2023-12-31 00:00:00  131.785669   79.212598
std                    NaN    4.942698    3.723178
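
The reported standard deviations can be sanity-checked against the generator. A sinusoid of amplitude A sampled over whole cycles has variance A²/2, and independent Gaussian noise adds its own variance. The sketch below applies that rule; it assumes the components are independent and treats 365 days as a whole number of weekly cycles, which is only approximately true (52 weeks plus one day).

```python
import math

NOISE_STD = 3.0  # matches np.random.normal(0, 3, days) in generate_health_data


def expected_std(amplitudes, noise_std=NOISE_STD):
    """Approximate std of a sum of sinusoids plus independent Gaussian noise."""
    variance = sum(a ** 2 / 2 for a in amplitudes) + noise_std ** 2
    return math.sqrt(variance)


systolic_std = expected_std([5.0, 3.0])  # seasonal + weekly terms
heart_rate_std = expected_std([3.0])     # weekly term only
print(f"systolic:   {systolic_std:.3f}")    # 5.099
print(f"heart_rate: {heart_rate_std:.3f}")  # 3.674
```

The observed values (4.94 and 3.72) sit close to these estimates; the remaining gap is consistent with sampling variation in a single 365-point draw.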


In [3]:
# Initialize TimeFrame for systolic blood pressure
systolic_tf = TimeFrame(
    df=health_df,
    time_col='ds',
    target_col='systolic'
)

print("Original TimeFrame:")
print(systolic_tf.df.head())
print_divider()

# Initialize SingleStepTargetShifter
shifter = SingleStepTargetShifter(n_lags=1, verbose=True)

# Transform data for one-step-ahead prediction
transformed_tf = shifter.fit_transform(systolic_tf)

print("\nTransformed TimeFrame:")
print(transformed_tf.df.head())
print_divider()

Original TimeFrame:
          ds    systolic  heart_rate
0 2023-01-01  114.150804   69.539203
1 2023-01-02  123.514289   69.639102
2 2023-01-03  126.973515   76.641613
3 2023-01-04  122.455920   71.553269
4 2023-01-05  125.396055   67.757135
Initialized SingleStepTargetShifter with target_col=None, n_lags=1
Rows before: 365; Rows after: 364; Dropped: 1

Transformed TimeFrame:
          ds  heart_rate  systolic_shift_1
0 2023-01-01   69.539203        123.514289
1 2023-01-02   69.639102        126.973515
2 2023-01-03   76.641613        122.455920
3 2023-01-04   71.553269        125.396055
4 2023-01-05   67.757135        113.028165
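
Judging from the output above, the transformation places each next-day systolic value on the current row, removes the original target column, and drops the final row, which has no next-day value. The sketch below reproduces that result in plain pandas on the five preview rows. It mirrors the observed output only and makes no claim about how SingleStepTargetShifter is implemented internally.

```python
import pandas as pd

# The five preview rows printed by the tutorial.
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=5, freq="D"),
    "systolic": [114.150804, 123.514289, 126.973515, 122.455920, 125.396055],
    "heart_rate": [69.539203, 69.639102, 76.641613, 71.553269, 67.757135],
})

target_col, n_lags = "systolic", 1
shifted_name = f"{target_col}_shift_{n_lags}"

out = (
    df.sort_values("ds")                      # shifting only makes sense on time-ordered rows
      .assign(**{shifted_name: lambda d: d[target_col].shift(-n_lags)})  # pull future value back
      .drop(columns=[target_col])             # keep only the shifted target
      .dropna(subset=[shifted_name])          # last n_lags rows have no future value
      .reset_index(drop=True)
)
print(out)
```

With five input rows the result has four, and the first shifted target equals the second original systolic reading, matching the pattern in the transformed TimeFrame.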


In [4]:
# Explore the transformed data

print("Original vs Transformed Shape:")
print(f"Original: {systolic_tf.df.shape}")
print(f"Transformed: {transformed_tf.df.shape}")
print("\nNote: One fewer row because the final row has no next-step target")

print("\nTransformed Data Preview:")
transformed_tf.df.head()

Original vs Transformed Shape:
Original: (365, 3)
Transformed: (364, 3)

Note: One fewer row because the final row has no next-step target

Transformed Data Preview:


          ds  heart_rate  systolic_shift_1
0 2023-01-01   69.539203        123.514289
1 2023-01-02   69.639102        126.973515
2 2023-01-03   76.641613        122.455920
3 2023-01-04   71.553269        125.396055
4 2023-01-05   67.757135        113.028165


## Implementation Notes

### Current Limitations

1. **Synthetic Data**:
   - This tutorial uses synthetic data for demonstration only
   - Integration of standard academic healthcare datasets is planned for future releases

2. **Single-Step Prediction**:
   - Currently limited to one-step-ahead forecasting
   - Multi-step sequence prediction is planned (MultiStepTargetShifter)
   - Deep learning support is in development
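
To make the multi-step idea concrete, the sketch below builds several future targets with plain pandas. It is not the planned MultiStepTargetShifter API: the helper `add_future_targets` is hypothetical, and its column naming is an assumption that simply extends the `_shift_1` pattern seen in the transformed output.

```python
import pandas as pd


def add_future_targets(df: pd.DataFrame, target_col: str, horizon: int) -> pd.DataFrame:
    """Add one column per future step (1..horizon); drop rows lacking a full horizon.

    Hypothetical helper; illustrates the concept only.
    """
    out = df.copy()
    names = []
    for step in range(1, horizon + 1):
        name = f"{target_col}_shift_{step}"
        out[name] = out[target_col].shift(-step)  # value observed `step` rows later
        names.append(name)
    return out.dropna(subset=names).reset_index(drop=True)


toy = pd.DataFrame({"ds": pd.date_range("2023-01-01", periods=6, freq="D"),
                    "systolic": [114.0, 123.0, 127.0, 122.0, 125.0, 113.0]})
multi = add_future_targets(toy, "systolic", horizon=3)
print(multi)
```

Six rows with a three-step horizon leave three usable rows, which shows how longer horizons shorten the training set.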

### Best Practices

1. **Data Preparation**:
   - Start from clean, preprocessed data
   - Store the time column as a proper datetime type
   - Handle missing values before passing data to TemporalScope

2. **Backend Selection**:
   - Choose a backend based on data size and compute resources
   - pandas: small to medium datasets
   - Polars or Modin: larger datasets

3. **XAI Workflows**:
   - TimeFrame validates the data that MASV computations depend on
   - SingleStepTargetShifter preserves row order and the time column while shifting the target
   - The transformed output is ready for temporal feature importance analysis
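
As an example of the data preparation practice above, the sketch below fills gaps in a daily series before it would be passed to TimeFrame. Time-based linear interpolation is an assumption chosen for illustration; the right strategy depends on the data, and clinical measurements may call for something more conservative.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing reading and one missing calendar day (2023-01-03).
raw = pd.DataFrame({
    "ds": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05"]),
    "systolic": [114.0, np.nan, 122.0, 125.0],
    "heart_rate": [69.0, 70.0, 72.0, 68.0],
})

clean = (
    raw.set_index("ds")
       .asfreq("D")                 # expose the missing calendar day as a NaN row
       .interpolate(method="time")  # fill gaps linearly with respect to time
       .reset_index()
)
print(clean)
```

Reindexing to a regular daily frequency first matters because a one-step shift assumes consecutive rows are one step apart.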