# Debug Labels - Activity Co-occurrence Analysis with Efficient Label-Centered Windowing

This notebook performs comprehensive activity co-occurrence analysis using an **efficient label-centered windowing approach** that generates 3-minute windows around the start of target activities:

## 🔧 **Raw Data Processing** (like raw_data_processor.py)
1. **Loading raw sensor data** directly from CSV files for the selected subject
2. **Applying configurable time corrections**:
   - Sensor timestamp shifts (easily toggle on/off)
   - Sensor drift corrections (easily toggle on/off) 
   - Label time corrections (easily toggle on/off)
3. **Signal processing**:
   - Duplicate timestamp handling
   - Missing data interpolation
   - Optional resampling to target frequency
   - Optional lowpass filtering

## 🎯 **Efficient Activity Co-occurrence Detection** ⚡
4. **Building continuous activity timeline** from corrected labels
5. **Applying activity label remapping** using Activity_Mapping_v2.csv
6. **Smart window generation**: Creates 3-minute windows **centered around target activity starts** (much faster than exhaustive search)
7. **Duplicate window removal**: Drops any candidate window that overlaps an already-kept window by more than 50%
8. **Co-occurrence validation**: Ensures both activities are present with minimum duration requirements
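
Steps 4 and 5 can be illustrated with a minimal sketch. The labels, timestamps, and the 1 Hz rate below are invented for illustration, and `build_timeline` is a hypothetical helper rather than a function from the pipeline; only the column names and the two mapped activity names are taken from this notebook.

```python
import pandas as pd

def build_timeline(labels, start, end, freq_hz, mapping=None):
    """Return a fixed-rate Series of activity names ('Unknown' where unlabeled)."""
    step = pd.Timedelta(seconds=1 / freq_hz)
    index = pd.date_range(start=start, end=end, freq=step)
    timeline = pd.Series("Unknown", index=index, name="Activity")
    for row in labels.itertuples(index=False):
        # Remap the raw label, keeping it unchanged when no mapping exists.
        name = (mapping or {}).get(row.Label, row.Label)
        mask = (index >= row.Real_Start_Time) & (index <= row.Real_End_Time)
        timeline[mask] = name  # later labels overwrite earlier ones
    return timeline

# Invented example labels (assumption: naive timestamps, no time zone).
labels = pd.DataFrame({
    "Label": ["self_propulsion", "waiting"],
    "Real_Start_Time": pd.to_datetime(["2024-01-01 00:00:02", "2024-01-01 00:00:06"]),
    "Real_End_Time": pd.to_datetime(["2024-01-01 00:00:04", "2024-01-01 00:00:08"]),
})
timeline = build_timeline(
    labels,
    start="2024-01-01 00:00:00",
    end="2024-01-01 00:00:09",
    freq_hz=1,
    mapping={"self_propulsion": "Self Propulsion", "waiting": "Resting"},
)
print(timeline.value_counts())
```

Because each timestamp holds a single label, intervals that overlap in time are resolved in favor of whichever label is applied last.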

## 📈 **Window Analysis & Comprehensive Visualization** 🎨
9. **Per-window activity timelines**: Each window stores its own activity timeline for efficient processing
10. **Generating PDF plots** with one window per page showing:
    - **Complete activity timeline** with ALL present activities highlighted and color-coded
    - **Target activities** marked with special indicators (⭐ stars)
    - **Trigger activity** marked with special indicator (🎯 target)
    - All sensor channels during the 3-minute window
    - **Unique color backgrounds** for each activity type (blue for target 1, green for target 2, other colors for additional activities)
    - Window boundaries, center markers, and trigger information
    - **Comprehensive legend** showing all activities with clear identification

## ⚙️ **Easy Configuration**
- **Toggle time corrections** independently to see their effects
- **Adjust window parameters** (size, minimum durations)
- **Modify activities and subjects** for different analyses
- **Control processing parameters** (frequency, filtering, etc.)

## 🚀 **Key Advantages of New Approach**
- **⚡ Much faster**: Only generates windows around actual activity starts (not exhaustive search)
- **🎯 More targeted**: Focuses on periods when target activities actually occur
- **📊 Better coverage**: Captures the most relevant activity transition periods
- **🔄 Efficient storage**: Each window contains its own activity timeline
- **📈 Scalable**: The number of candidate windows grows with the number of target activity starts, not with the length of the recording
- **🎨 Complete visualization**: ALL activities highlighted, not just targets

## 🎯 **Windowing Strategy**
- **Trigger-based**: Windows are centered on target activity start times
- **3-minute duration**: 1.5 minutes before and after each activity start
- **Automatic deduplication**: A window that overlaps an already-kept window by more than 50% is dropped
- **Co-occurrence validation**: Both activities must be present with minimum duration
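
The trigger-based strategy can be sketched in a few lines. This is an illustrative sketch, not the notebook's own code: `centered_windows` and `drop_overlapping` are hypothetical helper names and the start times are invented. The deduplication mirrors the greedy rule in the notebook's window-generation cell, where the first window is kept and a later window is discarded if it overlaps a kept one by more than half the window duration.

```python
import pandas as pd

def centered_windows(starts, window_minutes=3.0):
    """One window per start time, extending half the window size each way."""
    half = pd.Timedelta(minutes=window_minutes / 2)
    return [(t - half, t + half) for t in starts]

def drop_overlapping(windows, window_minutes=3.0, max_overlap=0.5):
    """Keep a window only if it overlaps every kept window by at most max_overlap."""
    limit = pd.Timedelta(minutes=window_minutes) * max_overlap
    kept = []
    for start, end in windows:
        # Overlap is negative when two windows do not intersect at all.
        overlaps = (min(end, k_end) - max(start, k_start) for k_start, k_end in kept)
        if all(o <= limit for o in overlaps):
            kept.append((start, end))
    return kept

starts = pd.to_datetime([
    "2024-01-01 10:00:00",
    "2024-01-01 10:01:00",  # 2 min overlap with the first window: dropped
    "2024-01-01 10:02:00",  # 1 min overlap with the first window: kept
])
windows = drop_overlapping(centered_windows(starts))
for start, end in windows:
    print(start, "->", end)
```

The result depends on input order, since earlier windows take precedence over later ones.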

## 🎨 **Enhanced Activity Visualization**
- **🎯 Target activities**: Special highlighting with blue (target 1) and green (target 2)
- **⭐ Star markers**: Target activities clearly marked in legend
- **🎪 Other activities**: Unique color coding (orange, purple, brown, pink, etc.)
- **🎯 Trigger indicators**: Shows which activity triggered window creation
- **📊 Background coloring**: All activity periods shown with colored backgrounds on sensor plots
- **🔍 Complete legend**: Shows all present activities with clear identification
- **💡 Activity context**: Easy to see activity transitions and overlaps

## 📊 **Analysis Output**
The output PDF contains detailed plots showing:
- **Complete activity timeline** - when ALL activities occur within the 3-minute window
- **Enhanced activity identification** - stars for targets, trigger markers, unique colors
- **Trigger information** - which activity triggered the window creation
- **Multi-sensor data** - all sensor channels synchronized to the same time window
- **Visual markers** - window boundaries, center points, and activity periods
- **Duration information** - how long each activity is present in the window
- **Activity context** - see all activities happening during the analysis period

This output is useful for analyzing sensor behavior when multiple activities occur, understanding complex activity patterns, checking labeling accuracy in multi-activity scenarios, and debugging time alignment issues. The visualization shows the complete activity context, not just the two target activities.

## 1. Setup and Configuration

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns
import logging
from datetime import datetime, timedelta
import warnings
import yaml
warnings.filterwarnings('ignore')

# Set up paths
script_dir = os.getcwd()
project_root = os.path.abspath(os.path.join(script_dir, '..'))
if script_dir not in sys.path:
    sys.path.insert(0, script_dir)
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Import pipeline modules
try:
    import config_loader
    import utils
    # Import specific data loading functions
    from raw_data_processor import (
        data_loader_no_dir,
        data_loader_with_dir, 
        select_data_loader,
        modify_modality_names
    )
except ImportError as e:
    print(f"Error importing modules: {e}")
    print("Make sure all pipeline modules are available")

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

print(f"Project root: {project_root}")
print(f"Script directory: {script_dir}")

Project root: /scai_data3/scratch/stirnimann_r
Script directory: /scai_data3/scratch/stirnimann_r/src


In [2]:
# Load configuration
config_path = os.path.join(project_root, 'config.yaml')
cfg = config_loader.load_config(config_path)

# Load sync parameters for timestamp shifts
sync_params_path = os.path.join(project_root, 'Sync_Parameters.yaml')
with open(sync_params_path, 'r') as f:
    sync_params = yaml.safe_load(f)

# Set seed for reproducibility
utils.set_seed(cfg.get('seed_number', 42))

print("Configuration loaded successfully")
print(f"Available subjects to process: {cfg.get('subjects_to_process', [])}")

2025-06-23 23:47:07,337 - INFO - Configuration loaded successfully from /scai_data3/scratch/stirnimann_r/config.yaml
2025-06-23 23:47:07,470 - INFO - Random seeds set to 42
2025-06-23 23:47:07,470 - INFO - Random seeds set to 42


Configuration loaded successfully
Available subjects to process: None


## 1.5. Load Activity Label Mapping

In [3]:
# Load activity label mapping from CSV
mapping_file_path = os.path.join(project_root, 'Activity_Mapping_v2.csv')

print(f"Loading activity mapping from: {mapping_file_path}")

if os.path.exists(mapping_file_path):
    # Load the mapping CSV
    mapping_df = pd.read_csv(mapping_file_path)
    print(f"Loaded mapping file with {len(mapping_df)} entries")
    print("Mapping columns:", mapping_df.columns.tolist())
    print("\nFirst few mapping entries:")
    print(mapping_df.head())
    
    # Create label mapping dictionary
    # Assuming the CSV has 'original_label' and 'mapped_label' columns
    # Adjust column names based on actual CSV structure
    if 'original_label' in mapping_df.columns and 'mapped_label' in mapping_df.columns:
        label_mapping = dict(zip(mapping_df['original_label'], mapping_df['mapped_label']))
    elif 'Original_Label' in mapping_df.columns and 'Mapped_Label' in mapping_df.columns:
        label_mapping = dict(zip(mapping_df['Original_Label'], mapping_df['Mapped_Label']))
    elif len(mapping_df.columns) >= 2:
        # Use first two columns if exact names don't match
        label_mapping = dict(zip(mapping_df.iloc[:, 0], mapping_df.iloc[:, 1]))
        print(f"Using columns '{mapping_df.columns[0]}' -> '{mapping_df.columns[1]}' for mapping")
    else:
        print("Warning: Could not determine mapping columns, using identity mapping")
        label_mapping = {}
    
    print(f"\nLabel mapping dictionary created with {len(label_mapping)} entries:")
    for orig, mapped in list(label_mapping.items())[:10]:  # Show first 10 mappings
        print(f"  '{orig}' -> '{mapped}'")
    
    # Define a function to apply label mapping
    def apply_label_mapping(labels, mapping_dict):
        """Apply label mapping to an array of labels"""
        if not mapping_dict:
            print("No mapping applied (empty mapping dictionary)")
            return labels
        
        # Apply mapping, keeping original if not found in mapping
        mapped_labels = np.array([mapping_dict.get(label, label) for label in labels])
        
        # Count how many labels were mapped
        n_mapped = np.sum(labels != mapped_labels)
        print(f"Applied label mapping: {n_mapped}/{len(labels)} labels were remapped")
        
        return mapped_labels
    
else:
    print(f"Warning: Mapping file not found at {mapping_file_path}")
    print("Proceeding without label remapping")
    label_mapping = {}
    
    def apply_label_mapping(labels, mapping_dict):
        """Identity function when no mapping is available"""
        print("No label mapping applied (mapping file not found)")
        return labels

print("Label mapping setup complete")

Loading activity mapping from: /scai_data3/scratch/stirnimann_r/Activity_Mapping_v2.csv
Loaded mapping file with 143 entries
Mapping columns: ['Former_Label', 'New_Label']

First few mapping entries:
                Former_Label      New_Label
0  wheelchair_to_sitting_bed       Transfer
1          wheelchair_to_sit       Transfer
2              handling_oven  handling_oven
3               conversation   Conversation
4                 exercising     Exercising
Using columns 'Former_Label' -> 'New_Label' for mapping

Label mapping dictionary created with 142 entries:
  'wheelchair_to_sitting_bed' -> 'Transfer'
  'wheelchair_to_sit' -> 'Transfer'
  'handling_oven' -> 'handling_oven'
  'conversation' -> 'Conversation'
  'exercising' -> 'Exercising'
  'petting_dog' -> 'petting_dog'
  'not recognized' -> 'Unknown'
  'using_laptop' -> 'Using Computer'
  'wheelchair_to_sit_bed' -> 'Transfer'
  'using_pc' -> 'Using Computer'
Label mapping setup complete
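
The output above reports 143 rows in the mapping file but only 142 dictionary entries, which suggests that one source label appears more than once: `dict(zip(...))` silently keeps the last occurrence of a repeated key. A minimal sketch for surfacing such rows follows. The three-row table is invented for illustration (including the conflicting target), and only the column names match the real file.

```python
import pandas as pd

# Invented stand-in for the mapping file; not the real contents.
mapping_df = pd.DataFrame({
    "Former_Label": ["using_pc", "using_laptop", "using_pc"],
    "New_Label": ["Using Computer", "Using Computer", "Using Phone"],
})

# Rows whose source label occurs more than once (all occurrences are shown).
duplicates = mapping_df[mapping_df["Former_Label"].duplicated(keep=False)]
print(duplicates)

# dict(zip(...)) keeps the last value for a repeated key.
label_mapping = dict(zip(mapping_df["Former_Label"], mapping_df["New_Label"]))
print(len(mapping_df), "rows ->", len(label_mapping), "entries")
```

If the repeated rows map to the same target the duplicate is harmless; if they disagree, the row order of the CSV decides which mapping wins.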


## 2. Subject and Transition Configuration

In [None]:
# Configuration for analysis
SUBJECT_ID = "OutSense-498"  # Change this to the subject you want to debug

# Original activity names (before mapping)
FROM_ACTIVITY_ORIGINAL = "self_propulsion"  # First activity to look for
TO_ACTIVITY_ORIGINAL = "waiting"            # Second activity to look for

# Apply label mapping to get the actual activity names used in the data
FROM_ACTIVITY = label_mapping.get(FROM_ACTIVITY_ORIGINAL, FROM_ACTIVITY_ORIGINAL)
TO_ACTIVITY = label_mapping.get(TO_ACTIVITY_ORIGINAL, TO_ACTIVITY_ORIGINAL)

OUTPUT_PDF_NAME = f"activity_windows_{SUBJECT_ID}_{FROM_ACTIVITY}_and_{TO_ACTIVITY}.pdf"

# Time window configuration for finding both activities
TIME_FRAME_MINUTES = 3.0  # Size of time frames to analyze (3 minutes)
OVERLAP_MINUTES = 3.0     # Sliding-window overlap setting; not used by the label-centered window generation below (only printed)
MIN_ACTIVITY_DURATION_SEC = 2.0  # Minimum duration each activity must be present in the window

# Time correction toggles - easily enable/disable different corrections
ENABLE_SENSOR_TIME_SHIFT = False     # Apply timestamp shift correction
ENABLE_SENSOR_DRIFT_CORRECTION = False  # Apply drift correction  
ENABLE_LABEL_TIME_CORRECTION = False  # Apply label time shift correction

# Processing configuration
DOWNSAMPLE_TO_TARGET_FREQ = True    # Whether to resample to target frequency
TARGET_FREQUENCY = cfg.get('downsample_freq', 25)  # Hz
APPLY_FILTERING = True              # Whether to apply lowpass filtering

print(f"Selected subject: {SUBJECT_ID}")
print(f"Original activities: {FROM_ACTIVITY_ORIGINAL} and {TO_ACTIVITY_ORIGINAL}")
print(f"Mapped activities: {FROM_ACTIVITY} and {TO_ACTIVITY}")
print(f"Analysis: Find {TIME_FRAME_MINUTES}-minute windows containing both activities")
print(f"Output PDF: {OUTPUT_PDF_NAME}")

print(f"\nTime frame settings:")
print(f"  - Window size: {TIME_FRAME_MINUTES} minutes")
print(f"  - Window overlap: {OVERLAP_MINUTES} minutes")
print(f"  - Min activity duration: {MIN_ACTIVITY_DURATION_SEC} seconds")

print(f"\nTime correction settings:")
print(f"  - Sensor time shift: {'ENABLED' if ENABLE_SENSOR_TIME_SHIFT else 'DISABLED'}")
print(f"  - Sensor drift correction: {'ENABLED' if ENABLE_SENSOR_DRIFT_CORRECTION else 'DISABLED'}")
print(f"  - Label time correction: {'ENABLED' if ENABLE_LABEL_TIME_CORRECTION else 'DISABLED'}")

print(f"\nProcessing settings:")
print(f"  - Target frequency: {TARGET_FREQUENCY} Hz")
print(f"  - Resampling: {'ENABLED' if DOWNSAMPLE_TO_TARGET_FREQ else 'DISABLED'}")
print(f"  - Filtering: {'ENABLED' if APPLY_FILTERING else 'DISABLED'}")

# Show mapping status
if FROM_ACTIVITY != FROM_ACTIVITY_ORIGINAL:
    print(f"\nNote: '{FROM_ACTIVITY_ORIGINAL}' was mapped to '{FROM_ACTIVITY}'")
if TO_ACTIVITY != TO_ACTIVITY_ORIGINAL:
    print(f"Note: '{TO_ACTIVITY_ORIGINAL}' was mapped to '{TO_ACTIVITY}'")

Selected subject: OutSense-498
Original activities: self_propulsion and waiting
Mapped activities: Self Propulsion and Resting
Analysis: Find 3.0-minute windows containing both activities
Output PDF: activity_windows_OutSense-498_Self Propulsion_and_Resting.pdf

Time frame settings:
  - Window size: 3.0 minutes
  - Window overlap: 3.0 minutes
  - Min activity duration: 2.0 seconds

Time correction settings:
  - Sensor time shift: ENABLED
  - Sensor drift correction: DISABLED
  - Label time correction: DISABLED

Processing settings:
  - Target frequency: 25 Hz
  - Resampling: ENABLED
  - Filtering: ENABLED

Note: 'self_propulsion' was mapped to 'Self Propulsion'
Note: 'waiting' was mapped to 'Resting'


## 3. Load and Process Raw Sensor Data

## 4. Efficient Window Generation Around Target Activity Starts

**New Approach**: Instead of exhaustively searching all possible 3-minute windows (which is slow), we now generate windows more efficiently by:

1. **Finding target activity starts**: Look for all instances where our target activities (`FROM_ACTIVITY` and `TO_ACTIVITY`) begin
2. **Creating centered windows**: Generate 3-minute windows centered around each target activity start (1.5 minutes before, 1.5 minutes after)
3. **Removing duplicates**: If multiple activities start close together, drop any window that overlaps an already-kept window by more than 50%
4. **Validating co-occurrence**: Check each window to ensure both target activities are present with sufficient duration

**Benefits**:
- ⚡ **Much faster**: Only generates windows where target activities actually occur
- 🎯 **More targeted**: Focuses on periods when activities are actually happening
- 📊 **Better coverage**: Ensures we capture the most relevant time periods
- 🔄 **Per-window timelines**: Each window stores its own activity timeline for efficient plotting

**Window Criteria**:
- Window size: 3 minutes (configurable)
- Minimum activity duration: 2 seconds each (configurable)
- Window overlap: Handled automatically by duplicate removal
- Trigger: Centered on target activity start times
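
The co-occurrence check itself reduces to counting samples per activity inside a window and comparing against a minimum sample count. The sketch below assumes a fixed-rate label series; `contains_both` is a hypothetical helper and the label sequence is invented.

```python
import pandas as pd

def contains_both(window_labels, first, second, min_seconds, freq_hz):
    """True if both activities cover at least min_seconds of the window."""
    min_samples = int(min_seconds * freq_hz)
    counts = window_labels.value_counts()
    return bool(counts.get(first, 0) >= min_samples
                and counts.get(second, 0) >= min_samples)

# Invented 25 Hz window: 3 s of one activity, 1 s unlabeled, 1 s of the other.
labels = pd.Series(["Self Propulsion"] * 75 + ["Unknown"] * 25 + ["Resting"] * 25)

print(contains_both(labels, "Self Propulsion", "Resting", min_seconds=2.0, freq_hz=25))
print(contains_both(labels, "Self Propulsion", "Resting", min_seconds=1.0, freq_hz=25))
```

Note that this measures total presence within the window, so several short segments of the same activity add up; it does not require one contiguous segment of the minimum duration.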

In [5]:
# Load and process raw sensor data using the same approach as raw_data_processor.py
print("Loading and processing raw sensor data...")

# Import additional functions from raw_data_processor
from raw_data_processor import (
    correct_timestamp_drift,
    process_modality_duplicates,
    handle_missing_data_interpolation,
    butter_lowpass_sos,
    apply_filter_combined
)

try:
    # Load global labels for the subject
    global_labels_path = os.path.join(project_root, cfg.get('global_labels_file'))
    global_labels_df = pd.read_csv(global_labels_path, parse_dates=['Real_Start_Time', 'Real_End_Time'])
    
    # Filter labels for the selected subject
    subject_labels = global_labels_df[global_labels_df['Video_File'].str.contains(SUBJECT_ID, na=False)].copy()
    
    if not subject_labels.empty:
        subject_labels['Real_Start_Time'] = pd.to_datetime(subject_labels['Real_Start_Time'], errors='coerce')
        subject_labels['Real_End_Time'] = pd.to_datetime(subject_labels['Real_End_Time'], errors='coerce')
        subject_labels.dropna(subset=['Real_Start_Time', 'Real_End_Time'], inplace=True)
    
    print(f"Loaded {len(subject_labels)} labels for subject {SUBJECT_ID}")
    
    # Apply label time correction if enabled
    if ENABLE_LABEL_TIME_CORRECTION and not subject_labels.empty:
        print("Applying label time correction...")
        subject_correction_params = sync_params.get(SUBJECT_ID, {})
        label_time_shift_str = subject_correction_params.get('Label_Time_Shift', '0h 0min 0s')
        
        # Parse time shift string
        import re
        shift_match = re.match(r'(?:(-?\d+)h)?\s*(?:(-?\d+)min)?\s*(?:(-?\d+)s)?', label_time_shift_str)
        if shift_match:
            shift_hours = int(shift_match.group(1) or 0)
            shift_minutes = int(shift_match.group(2) or 0) 
            shift_seconds = int(shift_match.group(3) or 0)
            total_shift_seconds = (shift_hours * 3600) + (shift_minutes * 60) + shift_seconds
            
            if total_shift_seconds != 0:
                time_delta_shift = pd.Timedelta(seconds=total_shift_seconds)
                subject_labels['Real_Start_Time'] += time_delta_shift
                subject_labels['Real_End_Time'] += time_delta_shift
                print(f"Applied label time shift: {label_time_shift_str} ({total_shift_seconds}s)")
    
    # Apply label mapping to loaded labels
    if label_mapping and not subject_labels.empty:
        print("Applying label mapping to loaded labels...")
        subject_labels['Label'] = subject_labels['Label'].map(lambda x: label_mapping.get(x, x))
        print(f"Unique mapped labels: {sorted(subject_labels['Label'].unique())}")
    
    # Determine labeled time range for filtering sensor data
    if not subject_labels.empty:
        min_label_time = subject_labels['Real_Start_Time'].min()
        max_label_time = subject_labels['Real_End_Time'].max()
        
        # Add buffer around labeled timeframe to avoid edge effects
        buffer_time = pd.Timedelta(minutes=TIME_FRAME_MINUTES)
        filter_start_time = min_label_time - buffer_time
        filter_end_time = max_label_time + buffer_time
        
        print(f"Label time range: {min_label_time} to {max_label_time}")
        print(f"Sensor data will be filtered to: {filter_start_time} to {filter_end_time}")
        print(f"(Added {TIME_FRAME_MINUTES}-minute buffer on each side)")
    else:
        filter_start_time = None
        filter_end_time = None
        print("Warning: No valid labels found - will not filter sensor data by time")
    
except Exception as e:
    print(f"Error loading labels: {e}")
    subject_labels = pd.DataFrame()
    filter_start_time = None
    filter_end_time = None

# Load and process each sensor
raw_data_base_dir = os.path.join(project_root, cfg.get('raw_data_input_dir'))
subject_dir = os.path.join(raw_data_base_dir, SUBJECT_ID)
raw_data_parsing_config = cfg.get('raw_data_parsing_config', {})
subject_correction_params = sync_params.get(SUBJECT_ID, {})

print(f"\nProcessing sensors for subject: {SUBJECT_ID}")
print(f"Subject directory: {subject_dir}")
print(f"Available sensors: {list(raw_data_parsing_config.keys())}")

processed_sensors = {}
sensor_time_ranges = {}

for sensor_name, sensor_settings in raw_data_parsing_config.items():
    print(f"\n--- Processing sensor: {sensor_name} ---")
    
    try:
        # Load raw sensor data
        loader = select_data_loader(sensor_name)
        sensor_data_raw = loader(subject_dir, sensor_name, sensor_settings)
        
        if sensor_data_raw.empty or 'time' not in sensor_data_raw.columns:
            print(f"No data loaded for {sensor_name}")
            continue
        
        print(f"Loaded {len(sensor_data_raw)} raw samples")
        
        # Apply time corrections
        sensor_corr_params = subject_correction_params.get(sensor_name, {'unit': 's'})
        time_unit = sensor_corr_params.get('unit', 's')
        time_col_num = sensor_data_raw['time'].astype(float)
        
        # Convert to seconds if needed
        if time_unit == 'ms':
            time_col_num = time_col_num / 1000.0
        
        # Apply shift correction if enabled
        if ENABLE_SENSOR_TIME_SHIFT:
            shift_val = sensor_corr_params.get('shift', 0)
            if shift_val != 0:
                time_col_num = time_col_num + shift_val
                print(f"Applied time shift: {shift_val}s")
        
        # Apply drift correction if enabled
        time_col_final_num = time_col_num
        if ENABLE_SENSOR_DRIFT_CORRECTION:
            drift_params = sensor_corr_params.get('drift')
            if drift_params and all(k in drift_params for k in ['t0', 't1', 'drift_secs']):
                t0_ts = pd.Timestamp(drift_params['t0'])
                t1_ts = pd.Timestamp(drift_params['t1'])
                if not pd.isna(t0_ts) and not pd.isna(t1_ts):
                    t0, t1 = t0_ts.timestamp(), t1_ts.timestamp()
                    drift = drift_params['drift_secs']
                    time_col_final_num = time_col_num.apply(correct_timestamp_drift, args=(t0, t1, drift))
                    print(f"Applied drift correction: {drift}s over {t1-t0:.1f}s interval")
        
        # Convert to datetime
        corrected_timestamps = pd.to_datetime(time_col_final_num, unit='s', errors='coerce')
        sensor_data_corrected = sensor_data_raw.drop(columns=['time']).copy()
        sensor_data_corrected['time'] = corrected_timestamps
        sensor_data_corrected.dropna(subset=['time'], inplace=True)
        
        if sensor_data_corrected.empty:
            print(f"No valid data after time correction for {sensor_name}")
            continue
        
        # Filter by labeled timeframe if available
        if filter_start_time is not None and filter_end_time is not None:
            print(f"Filtering data to labeled timeframe...")
            pre_filter_count = len(sensor_data_corrected)
            
            # Remove timezone info if present for comparison
            if filter_start_time.tzinfo is not None:
                filter_start_time_tz = filter_start_time.tz_localize(None)
            else:
                filter_start_time_tz = filter_start_time
                
            if filter_end_time.tzinfo is not None:
                filter_end_time_tz = filter_end_time.tz_localize(None)
            else:
                filter_end_time_tz = filter_end_time
            
            time_mask = (sensor_data_corrected['time'] >= filter_start_time_tz) & (sensor_data_corrected['time'] <= filter_end_time_tz)
            sensor_data_corrected = sensor_data_corrected[time_mask]
            
            post_filter_count = len(sensor_data_corrected)
            print(f"Filtered from {pre_filter_count} to {post_filter_count} samples ({post_filter_count/pre_filter_count*100:.1f}% retained)")
            
            if sensor_data_corrected.empty:
                print(f"No data remaining after time filtering for {sensor_name}")
                continue
        
        # Set time as index
        sensor_data_corrected.set_index('time', inplace=True)
        sensor_data_corrected.sort_index(inplace=True)
        
        # Apply modality-specific processing
        sample_rate = sensor_settings.get('sample_rate', TARGET_FREQUENCY)
        processed_data = process_modality_duplicates(sensor_data_corrected, sample_rate)
        processed_data = handle_missing_data_interpolation(processed_data, max_interp_gap_s=2, target_freq=TARGET_FREQUENCY)
        
        # Apply column renaming
        new_name, processed_data = modify_modality_names(processed_data, sensor_name)
        
        if processed_data.empty:
            print(f"No data after processing for {sensor_name}")
            continue
        
        print(f"Processed data shape: {processed_data.shape}")
        print(f"Time range: {processed_data.index.min()} to {processed_data.index.max()}")
        
        processed_sensors[new_name] = processed_data
        sensor_time_ranges[new_name] = (processed_data.index.min(), processed_data.index.max())
        
    except Exception as e:
        print(f"Error processing sensor {sensor_name}: {e}")
        import traceback
        traceback.print_exc()

print(f"\nSuccessfully processed {len(processed_sensors)} sensors:")
for name, (start, end) in sensor_time_ranges.items():
    duration = end - start
    print(f"  {name}: {duration} duration")

if not processed_sensors:
    print("ERROR: No sensor data was successfully processed!")
    raise ValueError("No sensor data available for analysis")

Loading and processing raw sensor data...
Loaded 640 labels for subject OutSense-498
Applying label mapping to loaded labels...
Unique mapped labels: ['Assisted Propulsion', 'Changing Clothes', 'Conversation', 'Eating', 'Exercising', 'Resting', 'Self Propulsion', 'Toileting', 'Transfer', 'Using Computer', 'Using Phone', 'Washing Hands', 'beard_hair_styling', 'bending', 'brushing_teeth', 'dark', 'housework', 'lying', 'manipulating', 'preparing_meal', 'putting_toothpast', 'reading', 'rinsing_mouth', 'writing']
Label time range: 2024-02-15 11:08:30 to 2024-02-17 08:46:23
Sensor data will be filtered to: 2024-02-15 11:05:30 to 2024-02-17 08:49:23
(Added 3.0-minute buffer on each side)

Processing sensors for subject: OutSense-498
Subject directory: /scai_data2/scai_datasets/interim/scai-outsense/OutSense-498
Available sensors: ['corsano_wrist_acc', 'cosinuss_ear_acc_x_acc_y_acc_z', 'mbient_imu_wc_accelerometer', 'mbient_imu_wc_gyroscope', 'vivalnk_vv330_acceleration', 'sensomative_bottom_l
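
The label time correction in the cell above parses shift strings such as `'0h 0min 0s'` with a regular expression whose groups are all optional, so `re.match` succeeds even on malformed input and silently yields a zero shift. A stricter variant is sketched below as one possible alternative; `parse_time_shift` is a hypothetical helper, and the exact format of `Label_Time_Shift` values is assumed from the default string in the code.

```python
import re
import pandas as pd

# Same three optional components as the notebook's pattern, but anchored
# with fullmatch so trailing garbage is rejected.
SHIFT_PATTERN = re.compile(r"\s*(?:(-?\d+)h)?\s*(?:(-?\d+)min)?\s*(?:(-?\d+)s)?\s*")

def parse_time_shift(text):
    """Convert a string like '1h 2min 3s' into a pandas Timedelta."""
    match = SHIFT_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"Unrecognized time shift: {text!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return pd.Timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_time_shift("0h 0min 0s"))
print(parse_time_shift("1h 2min 3s"))
print(parse_time_shift("-30s"))
```

Returning a `Timedelta` lets the caller add the shift directly to the `Real_Start_Time` and `Real_End_Time` columns.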

In [6]:
# Find 3-minute time frames containing both target activities
print(f"Searching for {TIME_FRAME_MINUTES}-minute windows containing both '{FROM_ACTIVITY}' and '{TO_ACTIVITY}'...")

if subject_labels.empty:
    print("ERROR: No labels available for window identification!")
    raise ValueError("No labels available for analysis")

# Create a continuous timeline for activity labels
if not sensor_time_ranges:
    print("ERROR: No sensor time ranges available!")
    raise ValueError("No sensor data available")

# Determine overall time range from all sensors
all_start_times = [start for start, end in sensor_time_ranges.values()]
all_end_times = [end for start, end in sensor_time_ranges.values()]
overall_start = min(all_start_times)
overall_end = max(all_end_times)

print(f"Overall data time range: {overall_start} to {overall_end}")
print(f"Duration: {overall_end - overall_start}")

# Create a timeline with target frequency for activity labels
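# Use a Timedelta step instead of a formatted string such as "0.04S":
# it avoids the deprecated 'S' alias and float-to-string formatting issues
timeline_freq = pd.Timedelta(seconds=1 / TARGET_FREQUENCY)  # Sample period (40 ms at 25 Hz)
timeline_index = pd.date_range(start=overall_start, end=overall_end, freq=timeline_freq)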

print(f"Created timeline with {len(timeline_index)} time points at {TARGET_FREQUENCY} Hz")

# Initialize activity timeline
activity_timeline = pd.Series(index=timeline_index, data='Unknown', name='Activity')

# Apply labels to timeline
print(f"Applying {len(subject_labels)} labels to timeline...")
labels_applied = 0
for _, row in subject_labels.iterrows():
    start_ts = row['Real_Start_Time']
    end_ts = row['Real_End_Time']
    activity = row['Label']
    
    if pd.isna(start_ts) or pd.isna(end_ts):
        continue
    
    # Remove timezone info if present
    if start_ts.tzinfo is not None:
        start_ts = start_ts.tz_localize(None)
    if end_ts.tzinfo is not None:
        end_ts = end_ts.tz_localize(None)
    
    try:
        # Apply label to timeline
        mask = (activity_timeline.index >= start_ts) & (activity_timeline.index <= end_ts)
        activity_timeline.loc[mask] = activity
        labels_applied += 1
        print(f"  Applied '{activity}' from {start_ts} to {end_ts}")
    except Exception as e:
        print(f"  Error applying label '{activity}': {e}")

print(f"Successfully applied {labels_applied}/{len(subject_labels)} labels")

# Check activity distribution
activity_counts = activity_timeline.value_counts()
print(f"\nActivity distribution in timeline:")
print(activity_counts)

# Find 3-minute windows around the start of each target activity
print(f"\nGenerating {TIME_FRAME_MINUTES}-minute windows around target activity starts...")

window_duration = pd.Timedelta(minutes=TIME_FRAME_MINUTES)
min_duration_samples = int(MIN_ACTIVITY_DURATION_SEC * TARGET_FREQUENCY)

# Find all instances where target activities start
target_activity_starts = []

print(f"Looking for start times of activities: '{FROM_ACTIVITY}' and '{TO_ACTIVITY}'")

# Process each label to find target activity starts
for _, label_row in subject_labels.iterrows():
    if label_row['Label'] in [FROM_ACTIVITY, TO_ACTIVITY]:
        start_time = label_row['Real_Start_Time']
        target_activity_starts.append({
            'activity': label_row['Label'],
            'start_time': start_time,
            'end_time': label_row['Real_End_Time']
        })

print(f"Found {len(target_activity_starts)} instances of target activities")

# Generate 3-minute windows centered around each target activity start
candidate_windows = []
for activity_start in target_activity_starts:
    start_time = activity_start['start_time']
    
    # Create window centered on activity start (1.5 minutes before, 1.5 minutes after)
    window_start = start_time - pd.Timedelta(minutes=TIME_FRAME_MINUTES/2)
    window_end = start_time + pd.Timedelta(minutes=TIME_FRAME_MINUTES/2)
    
    # Ensure window is within our data range
    if window_start >= overall_start and window_end <= overall_end:
        candidate_windows.append({
            'window_start': window_start,
            'window_end': window_end,
            'window_center': start_time,
            'trigger_activity': activity_start['activity'],
            'trigger_start': start_time,
            'trigger_end': activity_start['end_time']
        })

print(f"Generated {len(candidate_windows)} candidate windows")

# Remove duplicate windows (if multiple activities start near each other)
unique_windows = []
for window in candidate_windows:
    # Check if this window overlaps significantly with any existing window
    is_duplicate = False
    for existing_window in unique_windows:
        # Calculate overlap
        overlap_start = max(window['window_start'], existing_window['window_start'])
        overlap_end = min(window['window_end'], existing_window['window_end'])
        
        if overlap_start < overlap_end:
            overlap_duration = overlap_end - overlap_start
            # If overlap is more than 50% of window duration, consider it a duplicate
            if overlap_duration > window_duration * 0.5:
                is_duplicate = True
                break
    
    if not is_duplicate:
        unique_windows.append(window)

print(f"After removing duplicates: {len(unique_windows)} unique windows")

# For each unique window, check if it contains both target activities with sufficient duration
valid_windows = []

for window_idx, window in enumerate(unique_windows):
    window_start = window['window_start']
    window_end = window['window_end']
    
    # Get activity data for this window
    window_mask = (activity_timeline.index >= window_start) & (activity_timeline.index < window_end)
    window_activities = activity_timeline[window_mask]
    
    if len(window_activities) == 0:
        continue
    
    # Count samples per activity in this window. Summing the run lengths of
    # each activity is equivalent to a plain value count.
    activity_durations = window_activities.value_counts().to_dict()
    
    # Check if both target activities are present with sufficient duration
    from_activity_duration = activity_durations.get(FROM_ACTIVITY, 0)
    to_activity_duration = activity_durations.get(TO_ACTIVITY, 0)
    
    if (from_activity_duration >= min_duration_samples and 
        to_activity_duration >= min_duration_samples):
        
        # Convert durations back to seconds for reporting
        from_duration_sec = from_activity_duration / TARGET_FREQUENCY
        to_duration_sec = to_activity_duration / TARGET_FREQUENCY
        
        # Create activity timeline for this specific window
        window_activity_timeline = window_activities.copy()
        
        valid_windows.append({
            'window_start': window_start,
            'window_end': window_end,
            'window_center': window['window_center'],
            'trigger_activity': window['trigger_activity'],
            'from_activity_duration': from_duration_sec,
            'to_activity_duration': to_duration_sec,
            'total_labeled_duration': sum(activity_durations.values()) / TARGET_FREQUENCY,
            'activity_counts': dict(activity_durations),
            'window_idx': len(valid_windows),
            'activity_timeline': window_activity_timeline  # Store per-window timeline
        })
        
        print(f"  Valid window {len(valid_windows)}: {window_start} to {window_end}")
        print(f"    - Triggered by: {window['trigger_activity']} at {window['trigger_start']}")
        print(f"    - {FROM_ACTIVITY}: {from_duration_sec:.1f}s")
        print(f"    - {TO_ACTIVITY}: {to_duration_sec:.1f}s")

print(f"\nFound {len(valid_windows)} windows containing both '{FROM_ACTIVITY}' and '{TO_ACTIVITY}'")
print(f"Window generation approach: Centered on target activity starts (much faster than exhaustive search)")

if len(valid_windows) == 0:
    print(f"\nNo windows found containing both activities!")
    print(f"Requirements:")
    print(f"  - Window size: {TIME_FRAME_MINUTES} minutes")
    print(f"  - Minimum duration for {FROM_ACTIVITY}: {MIN_ACTIVITY_DURATION_SEC}s")
    print(f"  - Minimum duration for {TO_ACTIVITY}: {MIN_ACTIVITY_DURATION_SEC}s")
    
    print(f"\nActivity distribution across all data:")
    activity_counts = activity_timeline.value_counts()
    for activity, count in activity_counts.items():
        duration_sec = count / TARGET_FREQUENCY
        print(f"  {activity}: {duration_sec:.1f}s ({count} samples)")
    
    print(f"\nTarget activity start times found:")
    for activity_start in target_activity_starts:
        print(f"  {activity_start['activity']}: {activity_start['start_time']}")
    
    print(f"\nTips:")
    print(f"  1. Increase TIME_FRAME_MINUTES if activities don't co-occur within {TIME_FRAME_MINUTES}-minute windows")
    print(f"  2. Reduce MIN_ACTIVITY_DURATION_SEC if activities are too brief")
    print(f"  3. Check if the activity names match those in your labels")
    print(f"  4. Verify time corrections are properly aligning data")
    
    raise ValueError("No valid windows found for the specified activities")
else:
    print(f"\nWindow summary:")
    total_from_duration = sum(w['from_activity_duration'] for w in valid_windows)
    total_to_duration = sum(w['to_activity_duration'] for w in valid_windows)
    
    print(f"  - Total {FROM_ACTIVITY} duration: {total_from_duration:.1f}s")
    print(f"  - Total {TO_ACTIVITY} duration: {total_to_duration:.1f}s")
    print(f"  - Average {FROM_ACTIVITY} per window: {total_from_duration/len(valid_windows):.1f}s")
    print(f"  - Average {TO_ACTIVITY} per window: {total_to_duration/len(valid_windows):.1f}s")
    
    print(f"\nFirst 5 windows:")
    for i, window in enumerate(valid_windows[:5]):
        print(f"  {i+1}. {window['window_start']} - {window['window_end']} (triggered by {window['trigger_activity']})")
    if len(valid_windows) > 5:
        print(f"  ... and {len(valid_windows) - 5} more")

# Store the results for later use
activity_timeline_data = activity_timeline
analysis_windows = valid_windows

Searching for 3.0-minute windows containing both 'Self Propulsion' and 'Resting'...
Overall data time range: 2024-02-15 11:05:30 to 2024-02-17 08:49:23
Duration: 1 days 21:43:53
Created timeline with 4115826 time points at 25 Hz
Applying 640 labels to timeline...
  Applied 'Conversation' from 2024-02-15 16:21:00 to 2024-02-15 16:22:05
  Applied 'Eating' from 2024-02-15 16:22:07 to 2024-02-15 16:22:11
  Applied 'Self Propulsion' from 2024-02-15 16:22:13 to 2024-02-15 16:22:18
  Applied 'Self Propulsion' from 2024-02-15 16:22:39 to 2024-02-15 16:22:46
  Applied 'Self Propulsion' from 2024-02-15 16:23:23 to 2024-02-15 16:23:38
  Applied 'Self Propulsion' from 2024-02-15 16:23:45 to 2024-02-15 16:23:58
  Applied 'Using Computer' from 2024-02-15 16:22:48 to 2024-02-15 16:23:15
  Applied 'Using Phone' from 2024-02-15 16:24:50 to 2024-02-15 16:28:49
  Applied 'Using Phone' from 2024-02-15 16:31:39 to 2024-02-15 16:32:
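
The windowing logic in the cell above can be condensed into a small, dependency-free sketch. It assumes that each window is centered on an activity start (half the window before, half after), which matches the `window_center` field but is not shown explicitly in this excerpt. The helper names `centered_window`, `drop_overlapping`, and `durations_sec` are hypothetical and exist only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=3)  # assumed to mirror TIME_FRAME_MINUTES = 3


def centered_window(start_time, window=WINDOW):
    """Return (window_start, window_end) centered on an activity start."""
    half = window / 2
    return start_time - half, start_time + half


def drop_overlapping(windows, window=WINDOW, max_overlap=0.5):
    """Keep a window only if its overlap with every kept window is at most
    max_overlap of the window length (same 50% rule as the notebook)."""
    kept = []
    for w_start, w_end in windows:
        is_duplicate = any(
            min(w_end, k_end) - max(w_start, k_start) > window * max_overlap
            for k_start, k_end in kept
        )
        if not is_duplicate:
            kept.append((w_start, w_end))
    return kept


def durations_sec(labels, frequency_hz):
    """Seconds spent in each activity, given one label per sample."""
    return {act: n / frequency_hz for act, n in Counter(labels).items()}


# Two starts 26 s apart collapse into one window; the third is far enough away.
starts = [
    datetime(2024, 2, 15, 16, 22, 13),
    datetime(2024, 2, 15, 16, 22, 39),
    datetime(2024, 2, 15, 16, 30, 0),
]
windows = drop_overlapping([centered_window(s) for s in starts])

labels = ['Resting'] * 50 + ['Self Propulsion'] * 25 + ['Resting'] * 25
print(len(windows), durations_sec(labels, 25))
```

Counting samples per label gives the same totals as the run-length loop in the notebook, because the loop only sums segment lengths per activity. On a pandas Series, `value_counts()` would be the closest equivalent, with the caveat that it drops missing values by default.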

## 4. Prepare Sensor Data for Window Analysis

In [7]:
def load_shifted_sensor_data(subject_id, sensor_name, start_time, end_time, cfg, sync_params):
    """
    Load raw sensor data for a specific time window with only timestamp shifting applied.
    No drift correction is performed.
    """
    print(f"Loading {sensor_name} data from {start_time} to {end_time}")
    
    # Get configuration
    raw_data_base_dir = os.path.join(project_root, cfg.get('raw_data_input_dir'))
    subject_dir = os.path.join(raw_data_base_dir, subject_id)
    raw_data_parsing_config = cfg.get('raw_data_parsing_config', {})
    sensor_settings = raw_data_parsing_config.get(sensor_name)
    subject_correction_params = sync_params.get(subject_id, {})
    sensor_corr_params = subject_correction_params.get(sensor_name, {})
    
    if not sensor_settings:
        print(f"No parsing config found for sensor {sensor_name}")
        return pd.DataFrame()
    
    # Load raw data
    loader = select_data_loader(sensor_name) 
    df_raw = loader(subject_dir, sensor_name, sensor_settings)
    
    if df_raw.empty or 'time' not in df_raw.columns:
        print(f"No raw data loaded for {sensor_name}")
        return pd.DataFrame()
    
    # Apply only timestamp shifting (no drift correction)
    try:
        time_unit = sensor_corr_params.get('unit', 's')
        time_col_num = df_raw['time'].astype(float)
        
        # Convert to seconds if needed
        if time_unit == 'ms':
            time_col_num = time_col_num / 1000.0
        
        # Apply shift only
        shift_val = sensor_corr_params.get('shift', 0)
        time_col_final = time_col_num + shift_val
        
        # Convert to datetime
        corrected_timestamps = pd.to_datetime(time_col_final, unit='s', errors='coerce')
        df_raw['corrected_time'] = corrected_timestamps
        df_raw = df_raw.dropna(subset=['corrected_time'])
        
        # Filter by time window
        time_mask = (df_raw['corrected_time'] >= start_time) & (df_raw['corrected_time'] <= end_time)
        df_filtered = df_raw[time_mask].copy()
        
        if df_filtered.empty:
            print(f"No data in time window for {sensor_name}")
            return pd.DataFrame()
        
        # Set corrected time as index and remove original time column
        df_filtered = df_filtered.set_index('corrected_time').sort_index()
        if 'time' in df_filtered.columns:
            df_filtered = df_filtered.drop(columns=['time'])
        
        # Apply column renaming
        df_renamed = modify_modality_names(df_filtered.copy(), sensor_name)
        if isinstance(df_renamed, tuple):
            df_renamed = df_renamed[1]  # Extract DataFrame from tuple if returned
        
        print(f"Loaded {len(df_renamed)} samples for {sensor_name}")
        return df_renamed
        
    except Exception as e:
        print(f"Error processing {sensor_name}: {e}")
        return pd.DataFrame()

# Resample and combine all sensor data if needed
print("Preparing sensor data for analysis...")

combined_sensor_data = pd.DataFrame()
feature_columns = []

if DOWNSAMPLE_TO_TARGET_FREQ:
    print(f"Resampling all sensors to {TARGET_FREQUENCY} Hz...")
    
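    # Sampling period as a Timedelta; this avoids building a fractional alias
    # string such as "0.04S" (the "S" alias is deprecated in recent pandas releases)
    sample_period = pd.Timedelta(seconds=1 / TARGET_FREQUENCY)
    
    # Create unified time index based on filtered sensor data
    unified_index = pd.date_range(
        start=overall_start, 
        end=overall_end, 
        freq=sample_period
    )
    
    resampled_sensors = {}
    for sensor_name, sensor_data in processed_sensors.items():
        print(f"  Resampling {sensor_name}...")
        
        # Resample to target frequency
        resampled_data = sensor_data.resample(sample_period).mean()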
        
        # Reindex to unified timeline
        resampled_data = resampled_data.reindex(unified_index)
        
        # Handle missing data
        resampled_data = resampled_data.ffill(limit=int(TARGET_FREQUENCY*2))
        resampled_data = resampled_data.bfill(limit=int(TARGET_FREQUENCY*2))
        resampled_data = resampled_data.fillna(0)
        
        resampled_sensors[sensor_name] = resampled_data
        feature_columns.extend(resampled_data.columns.tolist())
        
        print(f"    Shape after resampling: {resampled_data.shape}")
    
    # Combine all sensor data
    if resampled_sensors:
        combined_sensor_data = pd.concat(resampled_sensors.values(), axis=1)
        print(f"Combined resampled data shape: {combined_sensor_data.shape}")
    
    processed_sensors_final = resampled_sensors
    data_index = unified_index

else:
    print("Using original sampling rates (no resampling)")
    processed_sensors_final = processed_sensors
    
    # Use the sensor with most data as reference
    reference_sensor = max(processed_sensors.keys(), key=lambda k: len(processed_sensors[k]))
    data_index = processed_sensors[reference_sensor].index
    print(f"Using {reference_sensor} as reference timeline")
    
    for sensor_name, sensor_data in processed_sensors.items():
        feature_columns.extend(sensor_data.columns.tolist())

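# Apply filtering if enabled.
# Note: only combined_sensor_data is filtered here. The per-sensor frames in
# processed_sensors_final (used for window extraction and plotting below) are
# separate objects and are not passed through the filter.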
if APPLY_FILTERING and not combined_sensor_data.empty:
    print("Applying lowpass filtering...")
    
    filter_params = cfg.get('filter_parameters', {})
    highcut = filter_params.get('highcut_kinematic', 9.9)
    filter_order = filter_params.get('filter_order', 4)
    
    # Create filter
    sos = butter_lowpass_sos(highcut, TARGET_FREQUENCY, filter_order)
    
    if sos is not None:
        # Apply filter to all numeric columns
        numeric_cols = combined_sensor_data.select_dtypes(include=[np.number]).columns.tolist()
        print(f"Filtering {len(numeric_cols)} numeric columns...")
        
        combined_sensor_data = apply_filter_combined(combined_sensor_data, sos, numeric_cols)
        print("Filtering completed")
    else:
        print("Could not create filter - skipping filtering")

print(f"\nFinal processed data summary:")
print(f"  Time range: {data_index.min()} to {data_index.max()}")
print(f"  Duration: {data_index.max() - data_index.min()}")
print(f"  Sampling points: {len(data_index)}")
if not combined_sensor_data.empty:
    print(f"  Combined data shape: {combined_sensor_data.shape}")
print(f"  Total feature columns: {len(feature_columns)}")

def get_sensor_data_for_window(start_time, end_time):
    """
    Get sensor data for a specific time window.
    Returns a dictionary with sensor data for the specified time range.
    """
    sensor_data_window = {}
    
    for sensor_name, sensor_data in processed_sensors_final.items():
        # Filter data for the time window
        time_mask = (sensor_data.index >= start_time) & (sensor_data.index <= end_time)
        windowed_data = sensor_data[time_mask].copy()
        
        if not windowed_data.empty:
            sensor_data_window[sensor_name] = windowed_data
            print(f"  {sensor_name}: {len(windowed_data)} samples")
    
    return sensor_data_window

# Test the function with the first window if available
if len(analysis_windows) > 0:
    test_window = analysis_windows[0]
    test_start = test_window['window_start']
    test_end = test_window['window_end']
    
    print(f"\nTesting data extraction for first window:")
    print(f"Window: {test_start} to {test_end}")
    test_data = get_sensor_data_for_window(test_start, test_end)
    print(f"Retrieved data for {len(test_data)} sensors")
else:
    print(f"\nNo analysis windows available for testing")

Preparing sensor data for analysis...
Resampling all sensors to 25 Hz...
  Resampling corsano_wrist...
    Shape after resampling: (4115826, 3)
  Resampling cosinuss_ear...
    Shape after resampling: (4115826, 3)
  Resampling mbient_acc...
    Shape after resampling: (4115826, 3)
  Resampling mbient_gyro...
    Shape after resampling: (4115826, 3)
  Resampling vivalnk_acc...
    Shape after resampling: (4115826, 3)
  Resampling sensomative_bottom...
    Shape after resampling: (4115826, 11)
  Resampling corsano_bioz...
    Shape after resampling: (4115826, 3)

2025-06-23 23:48:38,400 - INFO - Applying filter column by column IN-PLACE to 29 columns...


Filtering 29 numeric columns...
Filtering completed

Final processed data summary:
  Time range: 2024-02-15 11:05:30 to 2024-02-17 08:49:23
  Duration: 1 days 21:43:53
  Sampling points: 4115826
  Combined data shape: (4115826, 29)
  Total feature columns: 29

Testing data extraction for first window:
Window: 2024-02-15 16:38:15 to 2024-02-15 16:41:15
  corsano_wrist: 4501 samples
  cosinuss_ear: 4501 samples
  mbient_acc: 4501 samples
  mbient_gyro: 4501 samples
  vivalnk_acc: 4501 samples
  sensomative_bottom: 4501 samples
  corsano_bioz: 4501 samples
Retrieved data for 7 sensors
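
The shift-only correction in `load_shifted_sensor_data` comes down to three steps: normalize the raw time column to seconds, add a constant offset, and keep the samples inside a closed time window. Below is a minimal standard-library sketch of those steps. The function names `shift_timestamps` and `in_window` and the example values are hypothetical, and the sketch assumes raw times are Unix epoch values, as `pd.to_datetime(..., unit='s')` implies in the notebook.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive UTC, like pd.to_datetime(..., unit='s')


def shift_timestamps(raw_times, unit="s", shift=0.0):
    """Convert raw numeric timestamps to datetimes with a constant shift (seconds).

    No drift correction is applied; every sample moves by the same offset.
    """
    scale = 1000.0 if unit == "ms" else 1.0
    return [EPOCH + timedelta(seconds=float(t) / scale + shift) for t in raw_times]


def in_window(timestamps, start, end):
    """Keep timestamps inside the closed interval [start, end]."""
    return [t for t in timestamps if start <= t <= end]


# Three samples 40 ms apart (25 Hz), recorded in milliseconds, shifted by +2 s.
raw_ms = [1707955200000, 1707955200040, 1707955200080]
corrected = shift_timestamps(raw_ms, unit="ms", shift=2.0)

window_start = datetime(2024, 2, 15, 0, 0, 2)
kept = in_window(corrected, window_start, window_start + timedelta(milliseconds=50))
print(corrected[0], len(kept))
```

Because both window bounds are inclusive, a 3-minute window at 25 Hz yields 4501 samples rather than 4500, which agrees with the sample counts printed in the test output above.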

## 5. Generate PDF with Activity Window Plots

In [8]:
def create_activity_windows_pdf(analysis_windows, subject_id, from_activity, to_activity, 
                               processed_sensors, activity_timeline, output_filename):
    """
    Create PDF with activity window plots using efficient per-window approach.
    Each window already contains its own activity timeline.
    """
    def get_sensor_data_for_window(window_start, window_end):
        """Extract sensor data for a specific time window"""
        sensor_data_window = {}
        
        for sensor_name, sensor_data in processed_sensors.items():
            # Filter data to the window timeframe
            mask = (sensor_data.index >= window_start) & (sensor_data.index <= window_end)
            windowed_data = sensor_data[mask]
            
            if not windowed_data.empty:
                sensor_data_window[sensor_name] = windowed_data
        
        return sensor_data_window
    
    # Define a color palette for different activities
    def get_activity_color(activity, activity_list, target_activities):
        """Get color for activity - special colors for targets, others get unique colors"""
        if activity == target_activities[0]:  # FROM_ACTIVITY
            return 'blue', 0.3, 8  # color, alpha, linewidth
        elif activity == target_activities[1]:  # TO_ACTIVITY
            return 'green', 0.3, 8
        elif activity == 'Unknown':
            return 'lightgray', 0.15, 4
        else:
            # Use a color palette for other activities
            colors = ['orange', 'purple', 'brown', 'pink', 'olive', 'cyan', 'red', 'yellow']
            # Index into the palette by position among the non-target, non-Unknown activities
            other_activities = [a for a in activity_list if a not in target_activities and a != 'Unknown']
            if activity in other_activities:
                color_idx = other_activities.index(activity) % len(colors)
                return colors[color_idx], 0.25, 6
            # Fallback for activities that are not in the list
            return 'gray', 0.2, 4
    
    def get_background_color(activity, target_activities):
        """Get background color for activity spans"""
        if activity == target_activities[0]:  # FROM_ACTIVITY
            return 'blue', 0.1
        elif activity == target_activities[1]:  # TO_ACTIVITY
            return 'green', 0.1
        elif activity == 'Unknown':
            return 'lightgray', 0.03
        else:
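            # Use lighter colors for background.
            # Built-in hash() of a str is salted per Python process, so derive the
            # palette index from the characters to keep colors stable across runs.
            colors = ['orange', 'purple', 'brown', 'pink', 'olive', 'cyan', 'red', 'yellow']
            color_idx = sum(map(ord, str(activity))) % len(colors)
            return colors[color_idx], 0.06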
    
    # Create PDF
    pdf_path = os.path.join(project_root, output_filename)
    
    with PdfPages(pdf_path) as pdf:
        for window_idx, window in enumerate(analysis_windows):
            print(f"Processing window {window_idx + 1}/{len(analysis_windows)}")
            
            window_start = window['window_start']
            window_end = window['window_end']
            window_center = window['window_center']
            trigger_activity = window.get('trigger_activity', 'Unknown')
            
            # Get sensor data for this time window
            sensor_data_window = get_sensor_data_for_window(window_start, window_end)
            
            # Use the per-window activity timeline stored in the window
            activity_window = window.get('activity_timeline', pd.Series(dtype=object))
            
            if not sensor_data_window:
                print(f"No sensor data available for window {window_idx + 1}")
                continue
            
            # Create plot
            n_sensors = len(sensor_data_window)
            fig, axes = plt.subplots(n_sensors + 1, 1, figsize=(15, 3*(n_sensors + 1)))
            if n_sensors == 0:
                axes = [axes]
            
            # Create title with window information
            window_duration = window_end - window_start
            from_dur = window['from_activity_duration']
            to_dur = window['to_activity_duration']
            
            fig.suptitle(f'Window {window_idx + 1}: {from_activity} & {to_activity}\n'
                        f'Subject: {subject_id} | Duration: {window_duration}\n'
                        f'Triggered by: {trigger_activity} | Center: {window_center}\n'
                        f'{from_activity}: {from_dur:.1f}s | {to_activity}: {to_dur:.1f}s', 
                        fontsize=14, y=0.98)
            
            # Plot activity timeline first
            ax_activity = axes[0]
            if not activity_window.empty:
                # Create numeric representation of activities for plotting
                unique_activities = activity_window.unique()
                activity_to_num = {act: i for i, act in enumerate(unique_activities)}
                numeric_activities = [activity_to_num[act] for act in activity_window.values]
                
                # Plot as step function
                ax_activity.step(activity_window.index, numeric_activities, where='post', linewidth=2)
                ax_activity.set_yticks(range(len(unique_activities)))
                ax_activity.set_yticklabels(unique_activities)
                ax_activity.set_ylabel('Activity')
                ax_activity.set_title(f'Activity Timeline (Triggered by {trigger_activity})')
                ax_activity.grid(True, alpha=0.3)
                
                # Highlight ALL activities present in the window
                target_activities = [from_activity, to_activity]
                for i, act in enumerate(unique_activities):
                    color, alpha, linewidth = get_activity_color(act, unique_activities, target_activities)
                    
                    # Create label with special marking for target activities
                    if act == from_activity:
                        label = f'{act} ⭐ (Target 1)'
                    elif act == to_activity:
                        label = f'{act} ⭐ (Target 2)'
                    elif act == trigger_activity and act not in target_activities:
                        label = f'{act} 🎯 (Trigger)'
                    else:
                        label = f'{act}'
                    
                    ax_activity.axhline(y=i, color=color, alpha=alpha, linewidth=linewidth, label=label)
                
                # Mark window boundaries and center
                ax_activity.axvline(x=window_start, color='red', linestyle='-', linewidth=2, alpha=0.8, label='Window Start')
                ax_activity.axvline(x=window_end, color='red', linestyle='-', linewidth=2, alpha=0.8, label='Window End')
                ax_activity.axvline(x=window_center, color='orange', linestyle='--', linewidth=2, alpha=0.8, label='Window Center')
                
                # Create legend with better formatting
                ax_activity.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize='small', ncol=1)
            else:
                ax_activity.text(0.5, 0.5, 'No activity data for this window', 
                               ha='center', va='center', transform=ax_activity.transAxes)
                ax_activity.set_title('Activity Timeline - No data')
            
            # Plot sensor data
            for sensor_idx, (sensor_name, data) in enumerate(sensor_data_window.items()):
                ax = axes[sensor_idx + 1]
                
                # Plot all numeric columns for this sensor
                numeric_cols = data.select_dtypes(include=[np.number]).columns
                
                if len(numeric_cols) == 0:
                    ax.text(0.5, 0.5, f'No numeric data for {sensor_name}', 
                           ha='center', va='center', transform=ax.transAxes)
                    ax.set_title(f'{sensor_name} - No Data')
                    continue
                
                # Plot each column
                for col in numeric_cols:
                    ax.plot(data.index, data[col], label=col, alpha=0.7, linewidth=1)
                
                # Mark window boundaries and center
                ax.axvline(x=window_start, color='red', linestyle='-', linewidth=2, alpha=0.8, label='Window Start')
                ax.axvline(x=window_end, color='red', linestyle='-', linewidth=2, alpha=0.8, label='Window End')
                ax.axvline(x=window_center, color='orange', linestyle='--', linewidth=2, alpha=0.8, label='Window Center')
                
                # Add background colors for ALL activities using the per-window timeline
                if not activity_window.empty:
                    # Find activity changes in the window
                    activity_changes = []
                    current_activity = None
                    
                    for timestamp, activity in activity_window.items():
                        if activity != current_activity:
                            activity_changes.append((timestamp, activity))
                            current_activity = activity
                    
                    # Add background colors for each activity segment
                    target_activities = [from_activity, to_activity]
                    for i, (start_time, activity_label) in enumerate(activity_changes):
                        end_time = activity_changes[i+1][0] if i+1 < len(activity_changes) else window_end
                        
                        # Get background color for this activity
                        color, alpha = get_background_color(activity_label, target_activities)
                        ax.axvspan(start_time, end_time, alpha=alpha, color=color)
                
                ax.set_title(f'{sensor_name} ({len(numeric_cols)} channels)')
                ax.set_ylabel('Value')
                ax.grid(True, alpha=0.3)
                
                # Only show legend if not too many columns
                if len(numeric_cols) <= 4:
                    ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize='small')
                
                # Set time limits
                ax.set_xlim(window_start, window_end)
            
            # Set x-label only on bottom subplot
            if len(axes) > 0:
                axes[-1].set_xlabel('Time')
            
            plt.tight_layout(rect=[0, 0.03, 0.85, 0.95])
            pdf.savefig(fig, bbox_inches='tight')
            plt.close(fig)
    
    print(f"PDF saved to: {pdf_path}")
    return pdf_path

# Generate the PDF using the efficient windowing approach
print(f"\nGenerating PDF with {len(analysis_windows)} activity window plots...")
print(f"Using efficient per-window approach (each window contains its own activity timeline)")
print(f"ALL present activities will be highlighted and color-coded in the plots")

pdf_output_path = create_activity_windows_pdf(
    analysis_windows, SUBJECT_ID, FROM_ACTIVITY, TO_ACTIVITY,
    processed_sensors_final, activity_timeline_data, OUTPUT_PDF_NAME
)

print(f"\nPDF generation complete!")
print(f"Output file: {pdf_output_path}")
print(f"Total pages: {len(analysis_windows)}")

# Display processing summary
print(f"\n=== PROCESSING SUMMARY ===")
print(f"Subject: {SUBJECT_ID}")
print(f"Sensors processed: {len(processed_sensors_final)}")
print(f"Windowing approach: Label-centered (efficient)")
print(f"Activity visualization: ALL present activities highlighted")
print(f"Time corrections applied:")
print(f"  - Sensor shift: {'✓' if ENABLE_SENSOR_TIME_SHIFT else '✗'}")
print(f"  - Sensor drift: {'✓' if ENABLE_SENSOR_DRIFT_CORRECTION else '✗'}")
print(f"  - Label time: {'✓' if ENABLE_LABEL_TIME_CORRECTION else '✗'}")
print(f"Signal processing:")
print(f"  - Resampling: {'✓' if DOWNSAMPLE_TO_TARGET_FREQ else '✗'} ({TARGET_FREQUENCY} Hz)")
print(f"  - Filtering: {'✓' if APPLY_FILTERING else '✗'}")
print(f"Window analysis:")
print(f"  - Window size: {TIME_FRAME_MINUTES} minutes")
print(f"  - Min activity duration: {MIN_ACTIVITY_DURATION_SEC}s")
print(f"  - Activities: {FROM_ACTIVITY} & {TO_ACTIVITY}")
print(f"  - Windows found: {len(analysis_windows)}")
print(f"  - Approach: Windows centered on target activity starts")

if len(analysis_windows) > 0:
    print(f"\nWindow summary:")
    for i, window in enumerate(analysis_windows[:3]):  # Show first 3 windows
        print(f"  {i+1}. {window['window_start']} to {window['window_end']}")
        print(f"     - Triggered by: {window['trigger_activity']}")
        print(f"     - {FROM_ACTIVITY}: {window['from_activity_duration']:.1f}s")
        print(f"     - {TO_ACTIVITY}: {window['to_activity_duration']:.1f}s")
    if len(analysis_windows) > 3:
        print(f"  ... and {len(analysis_windows) - 3} more windows")

print(f"\n=== ACTIVITY VISUALIZATION FEATURES ===")
print(f"✨ Enhanced activity highlighting:")
print(f"  🎯 Target activities: {FROM_ACTIVITY} (blue) & {TO_ACTIVITY} (green)")
print(f"  🎪 Other activities: Unique colors (orange, purple, brown, etc.)")
print(f"  ⭐ Special markers: Target activities marked with stars")
print(f"  🎯 Trigger indicator: Activity that triggered the window")
print(f"  📊 Background colors: All activity periods shown with colored backgrounds")
print(f"  🔍 Legend: Complete activity list with clear identification")

Generating PDF with 20 activity window plots...
Using efficient per-window approach (each window contains its own activity timeline)
ALL present activities will be highlighted and color-coded in the plots
Processing window 1/20
Processing window 2/20
Processing window 3/20
Processing window 4/20
Processing window 5/20
Processing window 6/20
Processing window 7/20
Processing window 8/20
Processing window 9/20
Processing window 10/20
Processing window 11/20
Processing window 12/20
Processing window 13/20
Processing window 14/20
Processing window 15/20
Processing window 16/20
Processing window 17/20
Processing window 
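
The plotting function shades each sensor axis by activity, which requires turning the per-sample activity timeline into contiguous segments and assigning each activity a color. The sketch below shows one way to do both without pandas or matplotlib. The names `activity_segments` and `color_map` are hypothetical, and building a single shared color map (rather than separate lookups for the timeline and the backgrounds) is a design assumption, not what the notebook does.

```python
from itertools import groupby

PALETTE = ['orange', 'purple', 'brown', 'pink', 'olive', 'cyan', 'red', 'yellow']


def activity_segments(timestamps, labels, window_end):
    """Collapse per-sample labels into (start, end, activity) runs."""
    starts = []
    idx = 0
    for activity, run in groupby(labels):
        starts.append((timestamps[idx], activity))
        idx += len(list(run))
    # A run ends where the next one starts; the last run ends at window_end.
    ends = [start for start, _ in starts[1:]] + [window_end]
    return [(start, end, activity) for (start, activity), end in zip(starts, ends)]


def color_map(activities, targets):
    """Blue/green for the two targets, light gray for 'Unknown', and palette
    colors for everything else in sorted order, so the result is reproducible."""
    present = set(activities)
    fixed = {targets[0]: 'blue', targets[1]: 'green', 'Unknown': 'lightgray'}
    others = sorted(present - set(fixed))
    colors = {a: PALETTE[i % len(PALETTE)] for i, a in enumerate(others)}
    colors.update({a: c for a, c in fixed.items() if a in present})
    return colors


# Toy timeline: one label per second, window ends at t = 8.
timestamps = list(range(8))
labels = ['Resting', 'Resting', 'Eating', 'Eating', 'Eating',
          'Self Propulsion', 'Resting', 'Resting']

segments = activity_segments(timestamps, labels, window_end=8)
colors = color_map(labels, targets=('Self Propulsion', 'Resting'))
print(segments)
print(colors)
```

Each `(start, end, activity)` tuple maps directly onto one `ax.axvspan(start, end, color=colors[activity], ...)` call, and reusing the same `colors` dictionary for the timeline legend keeps the two views visually consistent.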

## 6. Summary and Analysis

In [9]:
# Provide comprehensive analysis summary
print("=== COMPREHENSIVE ANALYSIS SUMMARY ===")
print(f"Subject: {SUBJECT_ID}")
print(f"Activities analyzed: {FROM_ACTIVITY} & {TO_ACTIVITY}")

if not subject_labels.empty:
    print(f"\nLabel information:")
    print(f"  - Labels loaded: {len(subject_labels)}")
    print(f"  - Label time range: {subject_labels['Real_Start_Time'].min()} to {subject_labels['Real_End_Time'].max()}")
    print(f"  - Unique activities: {sorted(subject_labels['Label'].unique())}")

if processed_sensors_final:
    print(f"\nSensor data processing:")
    print(f"  - Sensors processed: {len(processed_sensors_final)}")
    for sensor_name, sensor_data in processed_sensors_final.items():
        print(f"    * {sensor_name}: {sensor_data.shape[0]} samples, {sensor_data.shape[1]} features")
    
    print(f"  - Overall time range: {overall_start} to {overall_end}")
    print(f"  - Total duration: {overall_end - overall_start}")

print(f"\nTime corrections applied:")
print(f"  - Sensor timestamp shift: {'ENABLED' if ENABLE_SENSOR_TIME_SHIFT else 'DISABLED'}")
print(f"  - Sensor drift correction: {'ENABLED' if ENABLE_SENSOR_DRIFT_CORRECTION else 'DISABLED'}")
print(f"  - Label time correction: {'ENABLED' if ENABLE_LABEL_TIME_CORRECTION else 'DISABLED'}")

print(f"\nSignal processing:")
print(f"  - Target frequency: {TARGET_FREQUENCY} Hz")
print(f"  - Resampling: {'ENABLED' if DOWNSAMPLE_TO_TARGET_FREQ else 'DISABLED'}")
print(f"  - Lowpass filtering: {'ENABLED' if APPLY_FILTERING else 'DISABLED'}")

print(f"\nWindow analysis:")
print(f"  - Window size: {TIME_FRAME_MINUTES} minutes")
print(f"  - Window overlap: {OVERLAP_MINUTES} minutes")
print(f"  - Min activity duration: {MIN_ACTIVITY_DURATION_SEC} seconds")
print(f"  - Activity timeline points: {len(activity_timeline_data) if 'activity_timeline_data' in locals() else 'N/A'}")
print(f"  - Valid windows found: {len(analysis_windows)}")

if len(analysis_windows) > 0:
    print("\nWindow details:")
    for i, window in enumerate(analysis_windows):
        print(f"  {i+1}. {window['window_start']} to {window['window_end']}")
        print(f"     - {FROM_ACTIVITY}: {window['from_activity_duration']:.1f}s")
        print(f"     - {TO_ACTIVITY}: {window['to_activity_duration']:.1f}s")
    
    # Calculate statistics
    from_durations = [w['from_activity_duration'] for w in analysis_windows]
    to_durations = [w['to_activity_duration'] for w in analysis_windows]
    
    print("\nActivity duration statistics:")
    print(f"  {FROM_ACTIVITY}:")
    print(f"    - Mean: {np.mean(from_durations):.1f}s")
    print(f"    - Min: {np.min(from_durations):.1f}s")
    print(f"    - Max: {np.max(from_durations):.1f}s")
    print(f"  {TO_ACTIVITY}:")
    print(f"    - Mean: {np.mean(to_durations):.1f}s")
    print(f"    - Min: {np.min(to_durations):.1f}s")
    print(f"    - Max: {np.max(to_durations):.1f}s")

print("\nOutput:")
print(f"  PDF file: {pdf_output_path}")
print(f"  Pages generated: {len(analysis_windows)}")

print("\n=== ANALYSIS COMPLETE ===")

# Instructions for next steps
print("\n=== CONFIGURATION OPTIONS ===")
print("To modify the analysis, change these variables in the configuration cell:")
print("\n1. Subject and Activities:")
print("   - SUBJECT_ID: Change to analyze different subjects")
print("   - FROM_ACTIVITY_ORIGINAL, TO_ACTIVITY_ORIGINAL: Set the two activities to find together")
print("\n2. Window Parameters:")
print("   - TIME_FRAME_MINUTES: Size of analysis windows (default: 3 minutes)")
print("   - OVERLAP_MINUTES: Overlap between windows (default: 1 minute)")
print("   - MIN_ACTIVITY_DURATION_SEC: Minimum time each activity must be present")
print("\n3. Time Corrections (True/False):")
print("   - ENABLE_SENSOR_TIME_SHIFT: Apply sensor timestamp shifts")
print("   - ENABLE_SENSOR_DRIFT_CORRECTION: Apply sensor drift correction")
print("   - ENABLE_LABEL_TIME_CORRECTION: Apply label time shifts")
print("\n4. Processing Options (True/False):")
print("   - DOWNSAMPLE_TO_TARGET_FREQ: Resample to unified frequency")
print("   - APPLY_FILTERING: Apply lowpass filtering")
print("   - TARGET_FREQUENCY: Set target sampling frequency (Hz)")

print("\n=== TROUBLESHOOTING ===")
if len(analysis_windows) == 0:
    print("No valid windows found? Try:")
    print("  1. Reduce TIME_FRAME_MINUTES - activities might not co-occur in long windows")
    print("  2. Reduce MIN_ACTIVITY_DURATION_SEC - activities might be brief")
    print("  3. Increase OVERLAP_MINUTES - catch activities at window boundaries")
    print("  4. Check if activity names match those in your labels")
    print("  5. Verify time corrections are properly aligning sensor and label data")
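# Guard against a missing timeline, matching the check used in the window analysis summary above
elif 'activity_timeline_data' in locals() and 'Unknown' in activity_timeline_data.unique():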
    unknown_pct = (activity_timeline_data == 'Unknown').sum() / len(activity_timeline_data) * 100
    if unknown_pct > 50:
        print(f"Warning: {unknown_pct:.1f}% of timeline is 'Unknown'")
        print("  - Check label coverage and time alignment")
        print("  - Verify time corrections are working properly")

print("\n=== INTERPRETATION ===")
print("Each PDF page shows:")
print(f"  1. Activity timeline - when each activity occurs in the {TIME_FRAME_MINUTES}-minute window")
print("  2. Sensor data - all sensor channels during the window")
print("  3. Colored backgrounds - blue for first activity, green for second activity")
print("  4. Vertical lines - window boundaries and center point")
print("\nUse this to:")
print("  - Verify both activities are properly detected in the window")
print("  - Examine sensor patterns when both activities are present")
print("  - Check for labeling accuracy and time alignment issues")
print("  - Understand sensor behavior during activity co-occurrence")

=== COMPREHENSIVE ANALYSIS SUMMARY ===
Subject: OutSense-498
Activities analyzed: Self Propulsion & Resting

Label information:
  - Labels loaded: 640
  - Label time range: 2024-02-15 11:08:30 to 2024-02-17 08:46:23
  - Unique activities: ['Assisted Propulsion', 'Changing Clothes', 'Conversation', 'Eating', 'Exercising', 'Resting', 'Self Propulsion', 'Toileting', 'Transfer', 'Using Computer', 'Using Phone', 'Washing Hands', 'beard_hair_styling', 'bending', 'brushing_teeth', 'dark', 'housework', 'lying', 'manipulating', 'preparing_meal', 'putting_toothpast', 'reading', 'rinsing_mouth', 'writing']

Sensor data processing:
  - Sensors processed: 7
    * corsano_wrist: 4115826 samples, 3 features
    * cosinuss_ear: 4115826 samples, 3 features
    * mbient_acc: 4115826 samples, 3 features
    * mbient_gyro: 4115826 samples, 3 features
    * vivalnk_acc: 4115826 samples, 3 features
    * sensomative_bottom: 4115826 samples, 11 features
    * corsano_bioz: 4115826 samples, 3 features
  - O