# Gait Deterioration Patterns During the 6-Minute Walk Test: Discriminating Fallers from Non-Fallers

**Author:** Ferdinand Delgado, PhD  
**Affiliation:** Move, Measure, Analyze LLC  

---

## Overview

This project examines whether **gait deterioration patterns** during the 6-minute walk test (6MWT) can discriminate between older adults who have experienced falls and those who have not. Rather than relying on overall gait averages, this analysis segments each participant's gait cycles into **temporal quarters** (Q1–Q4) to capture fatigue-related changes that may reveal underlying stability deficits.

**Key Idea:** Fallers may not differ from non-fallers at the start of a walk, but their gait may deteriorate differently as fatigue accumulates. The *pattern of change* across the walk may be more informative than any single metric.
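
To make the key idea concrete, here is a minimal sketch with two made-up participants (the names `steady` and `fading` and all values are hypothetical, not study data). Both have the same overall mean gait speed, yet only one slows down across the walk, which an overall average cannot show.

```python
from statistics import mean

# Hypothetical gait speed (m/s) per quarter for two made-up participants.
steady = {"Q1": 1.10, "Q2": 1.10, "Q3": 1.10, "Q4": 1.10}
fading = {"Q1": 1.22, "Q2": 1.14, "Q3": 1.06, "Q4": 0.98}

def summarize(quarter_means):
    """Return (overall mean, Q4 - Q1 change), both rounded to 2 decimals."""
    overall = mean(quarter_means.values())
    change = quarter_means["Q4"] - quarter_means["Q1"]
    return round(overall, 2), round(change, 2)

print(summarize(steady))  # (1.1, 0.0)
print(summarize(fading))  # (1.1, -0.24)
```

The overall means match, so only the Q4 − Q1 change separates the two profiles.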

### Analytical Approach
1. Parse raw cycle-by-cycle gait data from IMU-based gait analysis (APDM Mobility Lab)
2. Segment each participant's gait cycles into quarters (Q1–Q4) based on their individual cycle count
3. Engineer features capturing fatigue effects: quarter means, Q4–Q1 differences, linear slope across quarters, and within-quarter variability
4. Compare fallers vs. non-fallers using appropriate statistical tests with assumption checking
5. Identify the most discriminating features and build a parsimonious classification model

### Dataset
- **N = 60** community-dwelling older adults
- **23 fallers** (experienced ≥1 fall in the past year) vs. **37 non-fallers**
- **6-minute walk test** with cycle-by-cycle gait metrics from APDM Mobility Lab wearable sensors
- **34 gait metrics** including speed, cadence, stride length, double support time, asymmetry, trunk/lumbar ROM, and arm swing

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import shapiro, levene, mannwhitneyu, ttest_ind
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, classification_report, confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Plot style
plt.rcParams.update({
    'figure.facecolor': 'white',
    'axes.facecolor': 'white',
    'axes.grid': True,
    'grid.alpha': 0.3,
    'font.family': 'sans-serif',
    'font.size': 11,
    'axes.titlesize': 13,
    'axes.labelsize': 11,
    'figure.dpi': 120,
})

GOLD = '#c5a55a'
BLACK = '#1a1a1a'
GRAY = '#888888'
FALLER_COLOR = '#d64545'
NON_FALLER_COLOR = '#3a86a8'

print("All packages loaded successfully.")

## 1. Data Loading & Preparation

The raw walk trial CSV files from APDM Mobility Lab contain cycle-by-cycle values for each gait metric. Each row is a metric, and each column (after the first five) is an individual gait cycle. Participant demographics and fall history are stored in a separate REDCap export.
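
As a rough illustration of that layout, the sketch below reads a tiny made-up file held in memory. The column labels and values are assumptions for illustration only, not a real APDM export. It uses the standard-library `csv` module, which keeps quoted fields intact even if they contain commas.

```python
import csv
import io
import math

# Made-up miniature file: one row per metric, five summary columns,
# then one column per gait cycle. Labels and values are illustrative only.
sample = '''"Measure","Normative Mean","Normative SD","Mean","SD","1","2","3","4"
"Gait - Lower Limb - Gait Speed L (m/s)","1.20","0.15","1.08","0.05","1.10","1.05","","1.09"
"Gait - Lower Limb - Cadence L (steps/min)","110","8","104","3","105","103","104","102"
'''

def to_float(cell):
    """Convert a cell to float; empty or non-numeric cells become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan

rows = list(csv.reader(io.StringIO(sample)))
header, body = rows[0], rows[1:]

# Keep only the cycle columns (everything after the first five).
cycles = {row[0]: [to_float(c) for c in row[5:]] for row in body}

for measure, values in cycles.items():
    print(measure, values)
```

The empty cell in the first row becomes `NaN`, mirroring how missing cycles are handled later in the notebook.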

In [None]:
# Load demographics
demo = pd.read_csv('data/demographics.csv')
print(f"Participants: {len(demo)}")
print(f"\nFall Status Distribution:")
print(demo['Experienced a fall in past year'].value_counts())
print(f"\nAge: Mean = {demo['Age (years)'].mean():.1f} ± {demo['Age (years)'].std():.1f} years")
print(f"Sex: {demo['Biological Sex'].value_counts().to_dict()}")

In [None]:
def parse_walk_trial_csv(filepath):
    """Parse APDM Mobility Lab walk trial CSV into metadata and cycle-by-cycle data."""
    with open(filepath, 'r', encoding='utf-8-sig') as f:
        lines = f.readlines()
    
    # Extract metadata from header rows
    metadata = {}
    for line in lines[:15]:
        parts = line.strip().split(',')
        if len(parts) >= 2:
            metadata[parts[0].strip('"')] = parts[1].strip('"')
    
    # Find the data header row
    header_row = None
    for i, line in enumerate(lines):
        if line.startswith('"Measure"'):
            header_row = i
            break
    
    if header_row is None:
        raise ValueError("Could not find header row in CSV")
    
    # Parse cycle-by-cycle values for each metric
    data = {}
    for line in lines[header_row + 1:]:
        parts = line.strip().split(',')
        if len(parts) > 5:
            measure = parts[0].strip('"')
            vals = []
            for v in parts[5:]:  # Skip Measure, Normative Mean/SD, Mean, SD
                v = v.strip('"')
                try:
                    vals.append(float(v) if v else np.nan)
                except ValueError:
                    vals.append(np.nan)
            data[measure] = vals
    
    return metadata, data

# Test with one file
meta, data = parse_walk_trial_csv('data/walk_trials/20251212-104206EST_Walk_Trial.csv')
print(f"Subject ID: {meta.get('Subject Public ID')}")
print(f"Condition: {meta.get('Condition')}")
print(f"Gait metrics available: {len(data)}")

# Show an example metric
example_metric = 'Gait - Lower Limb - Gait Speed L (m/s)'
if example_metric in data:
    valid_cycles = [x for x in data[example_metric] if not np.isnan(x)]
    print(f"\n{example_metric}:")
    print(f"  Total gait cycles: {len(valid_cycles)}")
    print(f"  Mean: {np.mean(valid_cycles):.3f} m/s")
    print(f"  Range: {np.min(valid_cycles):.3f} – {np.max(valid_cycles):.3f} m/s")

## 2. Quarters Segmentation

The central methodological contribution of this analysis is segmenting the 6-minute walk into **temporal quarters**. Each participant's total gait cycles are divided into four equal (or near-equal) segments:

- **Q1** — Early walk (fresh, baseline performance)
- **Q2** — Early-mid walk
- **Q3** — Late-mid walk  
- **Q4** — Late walk (fatigued, end-of-test performance)

This is done per-participant since individuals with different cadences produce different numbers of cycles during the fixed 6-minute duration. A participant with 152 cycles gets ~38 per quarter; someone with 100 cycles gets ~25 per quarter.
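
A quick sketch of the quarter sizes this produces, assuming leftover cycles go to the later quarters (Q4 first). The helper name `quarter_sizes` is introduced here for illustration only.

```python
def quarter_sizes(n_cycles):
    """Number of cycles per quarter; any remainder goes to the later quarters."""
    base, remainder = divmod(n_cycles, 4)
    sizes = [base] * 4
    for i in range(remainder):
        sizes[3 - i] += 1  # Q4 first, then Q3, then Q2
    return sizes

for n in (152, 100, 103):
    print(n, quarter_sizes(n))
# 152 [38, 38, 38, 38]
# 100 [25, 25, 25, 25]
# 103 [25, 26, 26, 26]
```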

In [None]:
def split_into_quarters(cycle_data):
    """
    Split cycle-by-cycle data into four temporal quarters.
    
    Each participant's valid gait cycles are divided into 4 equal segments.
    Remainder cycles are distributed to later quarters (Q4 gets extras first),
    reflecting that the end of the walk is where fatigue effects accumulate.
    
    Returns None if fewer than 8 valid cycles (minimum 2 per quarter).
    """
    valid = [x for x in cycle_data if not np.isnan(x)]
    n = len(valid)
    
    if n < 8:
        return None
    
    q_size = n // 4
    remainder = n % 4
    sizes = [q_size] * 4
    for i in range(remainder):
        sizes[3 - i] += 1  # Distribute remainder to later quarters
    
    quarters = []
    start = 0
    for s in sizes:
        quarters.append(np.array(valid[start:start + s]))
        start += s
    
    return quarters


def calculate_quarter_features(quarters):
    """
    Engineer features from quarterly gait data.
    
    For each quarter:
      - Mean, SD, CV (within-quarter variability)
    
    Across quarters:
      - Q4 - Q1 difference (fatigue magnitude)
      - Q4 - Q1 percent change (relative fatigue)
      - Linear slope Q1→Q4 (deterioration rate)
      - Slope R² (linearity of deterioration)
      - Overall mean, SD, CV
    """
    if quarters is None:
        return None
    
    features = {}
    
    # Per-quarter statistics
    for i, q in enumerate(quarters, 1):
        features[f'Q{i}_mean'] = np.mean(q)
        features[f'Q{i}_sd'] = np.std(q, ddof=1)
        features[f'Q{i}_cv'] = (np.std(q, ddof=1) / abs(np.mean(q)) * 100) if np.mean(q) != 0 else np.nan
        features[f'Q{i}_n'] = len(q)
    
    # Fatigue features
    features['Q4_Q1_diff'] = features['Q4_mean'] - features['Q1_mean']
    features['Q4_Q1_pct'] = ((features['Q4_mean'] - features['Q1_mean']) / abs(features['Q1_mean']) * 100) if features['Q1_mean'] != 0 else np.nan
    
    # Linear trend across quarters
    q_means = [features[f'Q{i}_mean'] for i in range(1, 5)]
    slope, intercept, r, p, se = stats.linregress([1, 2, 3, 4], q_means)
    features['slope'] = slope
    features['slope_r2'] = r ** 2
    
    # Overall statistics
    all_data = np.concatenate(quarters)
    features['overall_mean'] = np.mean(all_data)
    features['overall_sd'] = np.std(all_data, ddof=1)
    features['overall_cv'] = (features['overall_sd'] / abs(features['overall_mean']) * 100) if features['overall_mean'] != 0 else np.nan
    features['total_cycles'] = len(all_data)
    
    return features


# Demonstrate on example data
example_cycles = data[example_metric]
quarters = split_into_quarters(example_cycles)
features = calculate_quarter_features(quarters)

print(f"Example: {example_metric}")
print(f"{'─' * 50}")
for i in range(1, 5):
    print(f"  Q{i}: Mean = {features[f'Q{i}_mean']:.3f}, SD = {features[f'Q{i}_sd']:.3f}, n = {features[f'Q{i}_n']:.0f}")
print(f"{'─' * 50}")
print(f"  Q4 − Q1 diff:    {features['Q4_Q1_diff']:+.4f}")
print(f"  Q4 − Q1 change:  {features['Q4_Q1_pct']:+.2f}%")
print(f"  Linear slope:     {features['slope']:+.5f}")
print(f"  Slope R²:         {features['slope_r2']:.3f}")

## 3. Feature Extraction — All Participants

In [None]:
import os
import glob

# Key gait metrics to analyze
KEY_METRICS = [
    'Gait - Lower Limb - Gait Speed L (m/s)',
    'Gait - Lower Limb - Gait Speed R (m/s)',
    'Gait - Lower Limb - Cadence L (steps/min)',
    'Gait - Lower Limb - Cadence R (steps/min)',
    'Gait - Lower Limb - Stride Length L (m)',
    'Gait - Lower Limb - Stride Length R (m)',
    'Gait - Lower Limb - Double Support L (%GCT)',
    'Gait - Lower Limb - Double Support R (%GCT)',
    'Gait - Lower Limb - Single Limb Support L (%GCT)',
    'Gait - Lower Limb - Single Limb Support R (%GCT)',
    'Gait - Lower Limb - Stance L (%GCT)',
    'Gait - Lower Limb - Swing L (%GCT)',
    'Gait - Lower Limb - Step Duration L (s)',
    'Gait - Lower Limb - Gait Cycle Duration L (s)',
    'Gait - Lower Limb - Gait Speed Asymmetry (%Diff)',
    'Gait - Lower Limb - Cadence Asymmetry (%Diff)',
    'Gait - Lower Limb - Double Support Asymmetry (%Diff)',
    'Gait - Lower Limb - Single Limb Support Asymmetry (%Diff)',
    'Gait - Lower Limb - Stride Length Asymmetry (%Diff)',
    'Gait - Lower Limb - Step Duration Asymmetry (%Diff)',
    'Gait - Lower Limb - Foot Strike Angle L (degrees)',
    'Gait - Lower Limb - Foot Strike Angle R (degrees)',
    'Gait - Lower Limb - Toe Off Angle L (degrees)',
    'Gait - Lower Limb - Toe Off Angle R (degrees)',
    'Gait - Lower Limb - Elevation at Midswing L (cm)',
    'Gait - Lower Limb - Circumduction L (cm)',
    'Gait - Lumbar - Coronal Range of Motion (degrees)',
    'Gait - Lumbar - Sagittal Range of Motion (degrees)',
    'Gait - Lumbar - Transverse Range of Motion (degrees)',
    'Gait - Trunk - Coronal Range of Motion (degrees)',
    'Gait - Trunk - Sagittal Range of Motion (degrees)',
    'Gait - Upper Limb - Arm Swing Velocity L (degrees/s)',
    'Gait - Upper Limb - Arm Swing Velocity R (degrees/s)',
    'Gait - Upper Limb - Arm Range of Motion L (degrees)',
]

def shorten_metric_name(metric):
    """Create a concise feature name from the full APDM metric label."""
    short = metric.replace('Gait - Lower Limb - ', 'LL_')
    short = short.replace('Gait - Lumbar - ', 'Lumb_')
    short = short.replace('Gait - Trunk - ', 'Trunk_')
    short = short.replace('Gait - Upper Limb - ', 'UL_')
    for unit in ['(%GCT)', '(m/s)', '(steps/min)', '(degrees/s)', '(degrees)', '(%Diff)', '(cm)', '(m)', '(s)']:
        short = short.replace(f' {unit}', '')
    return short.replace(' ', '_')


# Process all walk trial files
trial_files = sorted(glob.glob('data/walk_trials/*_Walk_Trial.csv'))
print(f"Walk trial files found: {len(trial_files)}")

all_results = []
for fp in trial_files:
    try:
        meta, data = parse_walk_trial_csv(fp)
        sid = int(meta.get('Subject Public ID', 0))
        if sid == 0:
            continue
        
        row = {'Subject_ID': sid}
        
        for metric in KEY_METRICS:
            if metric in data:
                quarters = split_into_quarters(data[metric])
                feats = calculate_quarter_features(quarters)
                if feats:
                    short = shorten_metric_name(metric)
                    for fn, fv in feats.items():
                        row[f'{short}__{fn}'] = fv
        
        all_results.append(row)
    except Exception as e:
        print(f"  Error processing {os.path.basename(fp)}: {e}")

results = pd.DataFrame(all_results)

# Merge with demographics
demo_sub = demo[['Participant ID', 'Age (years)', 'Biological Sex', 'Experienced a fall in past year']].copy()
demo_sub.columns = ['Subject_ID', 'Age', 'Sex', 'Fall_Status']
demo_sub['Faller'] = (demo_sub['Fall_Status'] == 'Yes').astype(int)

results = results.merge(demo_sub, on='Subject_ID', how='left')

feat_cols = [c for c in results.columns if '__' in c]
print(f"\nParticipants processed: {len(results)}")
print(f"  Fallers: {results['Faller'].sum()}")
print(f"  Non-Fallers: {(results['Faller'] == 0).sum()}")
print(f"Features extracted: {len(feat_cols)}")

## 4. Visualization — Gait Trajectories Across Quarters

If fallers deteriorate differently than non-fallers, we should see the groups diverge across Q1 → Q4. Let's visualize this for key metrics.

In [None]:
def plot_quarter_trajectories(results, metrics_to_plot, title_prefix=''):
    """Plot mean ± SE quarter trajectories for fallers vs non-fallers."""
    fallers = results[results['Faller'] == 1]
    non_fallers = results[results['Faller'] == 0]
    
    n_metrics = len(metrics_to_plot)
    n_cols = 3
    n_rows = (n_metrics + n_cols - 1) // n_cols
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(14, 4 * n_rows))
    if n_rows == 1:
        axes = axes.reshape(1, -1)
    axes_flat = axes.flatten()
    
    for idx, metric_short in enumerate(metrics_to_plot):
        ax = axes_flat[idx]
        
        for group, color, label in [(fallers, FALLER_COLOR, 'Fallers'), 
                                     (non_fallers, NON_FALLER_COLOR, 'Non-Fallers')]:
            means = []
            ses = []
            for q in range(1, 5):
                col = f'{metric_short}__Q{q}_mean'
                if col in group.columns:
                    vals = group[col].dropna()
                    means.append(vals.mean())
                    ses.append(vals.std() / np.sqrt(len(vals)))
                else:
                    means.append(np.nan)
                    ses.append(np.nan)
            
            x = [1, 2, 3, 4]
            ax.errorbar(x, means, yerr=ses, marker='o', markersize=6,
                       color=color, linewidth=2, capsize=4, label=label)
        
        # Clean metric name for title
        display_name = metric_short.replace('LL_', '').replace('Lumb_', 'Lumbar ').replace('Trunk_', 'Trunk ').replace('UL_', 'Arm ').replace('_', ' ')
        ax.set_title(display_name, fontsize=11, fontweight='500')
        ax.set_xticks([1, 2, 3, 4])
        ax.set_xticklabels(['Q1\n(Fresh)', 'Q2', 'Q3', 'Q4\n(Fatigued)'])
        ax.set_xlabel('')
    
    # Remove empty subplots
    for idx in range(len(metrics_to_plot), len(axes_flat)):
        axes_flat[idx].set_visible(False)
    
    axes_flat[0].legend(framealpha=0.9, fontsize=9)
    fig.suptitle(f'{title_prefix}Gait Quarter Trajectories: Fallers vs Non-Fallers (Mean ± SE)', 
                 fontsize=14, fontweight='600', y=1.02)
    plt.tight_layout()
    os.makedirs('figures', exist_ok=True)  # Ensure the output folder exists before saving
    plt.savefig('figures/quarter_trajectories.png', dpi=150, bbox_inches='tight')
    plt.show()

# Select key metrics to visualize
key_plot_metrics = [
    'LL_Gait_Speed_L', 'LL_Stride_Length_L', 'LL_Cadence_L',
    'LL_Double_Support_L', 'LL_Gait_Speed_Asymmetry', 'LL_Stride_Length_Asymmetry',
    'Trunk_Sagittal_Range_of_Motion', 'Lumb_Coronal_Range_of_Motion', 'UL_Arm_Swing_Velocity_L',
]

plot_quarter_trajectories(results, key_plot_metrics)

## 5. Statistical Analysis — Group Comparisons

For each extracted feature, we perform:
1. **Normality testing** (Shapiro-Wilk) for both groups
2. **Variance homogeneity** (Levene's test)
3. **Appropriate group comparison**: Student's t (parametric, equal variance), Welch's t (parametric, unequal variance), or Mann-Whitney U (non-parametric)
4. **Effect sizes**: Hedges' g (parametric) or rank-biserial r (non-parametric)
5. **Multiple comparison correction**: Benjamini-Hochberg FDR at α = 0.05
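
The Benjamini-Hochberg step in item 5 is a *step-up* procedure: find the largest rank k whose sorted p-value satisfies p(k) ≤ (k/m)·α, then reject that hypothesis and every one with a smaller p-value. A minimal pure-Python sketch on made-up p-values follows (the function name `benjamini_hochberg` is introduced here for illustration).

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first

    # Largest rank k whose p-value falls at or below its threshold (k/m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject everything up to and including rank k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Made-up p-values: the smallest (0.020) exceeds its own threshold (0.0125),
# but it is still rejected because a larger p-value meets its threshold.
print(benjamini_hochberg([0.020, 0.021, 0.022, 0.040]))  # [True, True, True, True]
```

Comparing each p-value only against its own rank threshold would miss the first value in this example, which is why the step-up rule matters.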

In [None]:
def compare_groups(f_data, nf_data):
    """Run assumption-appropriate statistical comparison between groups."""
    f = f_data.dropna().values
    nf = nf_data.dropna().values
    
    if len(f) < 3 or len(nf) < 3:
        return None
    
    # Assumption checks
    f_shapiro_p = shapiro(f)[1]
    nf_shapiro_p = shapiro(nf)[1]
    levene_p = levene(f, nf)[1]
    
    # Select test based on assumptions
    if f_shapiro_p >= 0.05 and nf_shapiro_p >= 0.05:
        if levene_p >= 0.05:
            stat, p = ttest_ind(f, nf, equal_var=True)
            test = "Student's t"
        else:
            stat, p = ttest_ind(f, nf, equal_var=False)
            test = "Welch's t"
        # Hedges' g (Cohen's d with a small-sample bias correction)
        n1, n2 = len(f), len(nf)
        pooled_sd = np.sqrt(((n1-1)*np.var(f, ddof=1) + (n2-1)*np.var(nf, ddof=1)) / (n1+n2-2))
        d = (np.mean(f) - np.mean(nf)) / pooled_sd if pooled_sd > 0 else 0
        g = d * (1 - 3 / (4*(n1+n2) - 9))  # Hedges' correction
        effect, effect_type = g, "Hedges' g"
    else:
        stat, p = mannwhitneyu(f, nf, alternative='two-sided')
        test = "Mann-Whitney U"
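        # SciPy (>= 1.7) returns U for the first sample, so this r is positive when
        # fallers tend to have larger values, matching the sign of Hedges' g above
        r = (2 * stat) / (len(f) * len(nf)) - 1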
        effect, effect_type = r, "rank-biserial r"
    
    # Effect size interpretation
    es = abs(effect)
    if 'g' in effect_type:
        interp = 'Negligible' if es < 0.2 else 'Small' if es < 0.5 else 'Medium' if es < 0.8 else 'Large'
    else:
        interp = 'Negligible' if es < 0.1 else 'Small' if es < 0.3 else 'Medium' if es < 0.5 else 'Large'
    
    return {
        'test': test, 'statistic': stat, 'p_value': p,
        'effect_size': effect, 'effect_type': effect_type, 'effect_interp': interp,
        'f_shapiro_p': f_shapiro_p, 'nf_shapiro_p': nf_shapiro_p, 'levene_p': levene_p
    }


# Run comparisons on all features
fallers = results[results['Faller'] == 1]
non_fallers = results[results['Faller'] == 0]

comparison_rows = []
for col in feat_cols:
    comp = compare_groups(fallers[col], non_fallers[col])
    if comp:
        comparison_rows.append({
            'Feature': col,
            'Fallers_Mean': fallers[col].mean(),
            'Fallers_SD': fallers[col].std(),
            'NonFallers_Mean': non_fallers[col].mean(),
            'NonFallers_SD': non_fallers[col].std(),
            **comp
        })

comp_df = pd.DataFrame(comparison_rows)

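# FDR correction (Benjamini-Hochberg step-up procedure)
p_vals = comp_df['p_value'].values
n_tests = len(p_vals)
order = np.argsort(p_vals)                             # indices of p-values, smallest first
bh_line = np.arange(1, n_tests + 1) / n_tests * 0.05   # rank-specific thresholds (k/m) * alpha
passed = p_vals[order] <= bh_line
sig_fdr = np.zeros(n_tests, dtype=bool)
if passed.any():
    k_max = np.nonzero(passed)[0].max()                # largest rank that meets its threshold
    sig_fdr[order[:k_max + 1]] = True                  # reject it and every smaller p-value
fdr_threshold = np.empty(n_tests)
fdr_threshold[order] = bh_line
comp_df['FDR_threshold'] = fdr_threshold
comp_df['Sig_uncorrected'] = comp_df['p_value'] < 0.05
comp_df['Sig_FDR'] = sig_fdr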

# Sort by absolute effect size
comp_df['abs_effect'] = comp_df['effect_size'].abs()
comp_df = comp_df.sort_values('abs_effect', ascending=False)

print(f"Features tested: {len(comp_df)}")
print(f"Significant (p < 0.05, uncorrected): {comp_df['Sig_uncorrected'].sum()}")
print(f"Significant (FDR corrected):         {comp_df['Sig_FDR'].sum()}")
print(f"\n{'─' * 90}")
print(f"{'TOP 15 DISCRIMINATING FEATURES (by effect size)':^90}")
print(f"{'─' * 90}")

display_cols = ['Feature', 'effect_size', 'effect_type', 'effect_interp', 'p_value', 'test', 'Sig_uncorrected', 'Sig_FDR']
print(comp_df[display_cols].head(15).to_string(index=False))

## 6. Effect Size Forest Plot

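A forest-style bar chart of the top discriminating features, ranked by absolute effect size. It shows *which* gait characteristics differ most between fallers and non-fallers and how large those differences are. Hedges' g and rank-biserial r are on different scales (g is unbounded, r lies between −1 and 1), so bars from the two effect types are not directly comparable, and the chart shows point estimates only, without confidence intervals.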
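
The bars mix two effect-size measures, Hedges' g and rank-biserial r, depending on which test was used for each feature. The sketch below computes both by hand on made-up numbers (`group_a` and `group_b` are hypothetical) to show how they are defined and why their scales differ. The rank-biserial r here uses the pairwise "wins minus losses" definition, which equals 2U/(n₁n₂) − 1 when U is the Mann-Whitney statistic for the first group.

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(a, b):
    """Standardized mean difference (a - b) with small-sample correction."""
    n1, n2 = len(a), len(b)
    pooled_sd = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def rank_biserial(a, b):
    """Share of (a, b) pairs where a is larger, minus the share where b is larger."""
    wins = sum(x > y for x in a for y in b)
    losses = sum(x < y for x in a for y in b)
    return (wins - losses) / (len(a) * len(b))

# Made-up values for two small groups.
group_a = [14.2, 15.1, 13.8, 16.0, 15.5]
group_b = [12.9, 13.5, 14.0, 12.2, 13.1, 13.7]

print(f"Hedges' g:       {hedges_g(group_a, group_b):+.3f}")
print(f"rank-biserial r: {rank_biserial(group_a, group_b):+.3f}")
```

Both values are positive when the first group tends to be larger, but g can exceed 1 while r cannot.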

In [None]:
# Forest plot of top features by effect size
top_n = 20
top_features = comp_df.head(top_n).copy()
top_features = top_features.iloc[::-1]  # Reverse for bottom-to-top plotting

fig, ax = plt.subplots(figsize=(10, 8))

# Clean feature names for display
def clean_feature_name(name):
    parts = name.split('__')
    metric = parts[0].replace('LL_', '').replace('Lumb_', 'Lumbar ').replace('Trunk_', 'Trunk ').replace('UL_', 'Arm ').replace('_', ' ')
    feature_type = parts[1] if len(parts) > 1 else ''
    return f"{metric} — {feature_type}"

labels = [clean_feature_name(f) for f in top_features['Feature']]
effects = top_features['effect_size'].values
colors = [FALLER_COLOR if top_features.iloc[i]['Sig_uncorrected'] else GRAY for i in range(len(top_features))]

y_pos = range(len(labels))
ax.barh(y_pos, effects, color=colors, height=0.7, edgecolor='white', linewidth=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels, fontsize=8)
ax.axvline(x=0, color=BLACK, linewidth=0.8)
ax.set_xlabel('Effect Size', fontsize=11)
ax.set_title(f'Top {top_n} Discriminating Features: Fallers vs Non-Fallers', fontsize=13, fontweight='600')

# Add legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=FALLER_COLOR, label='p < 0.05'),
                   Patch(facecolor=GRAY, label='p ≥ 0.05')]
ax.legend(handles=legend_elements, loc='lower right', fontsize=9)

plt.tight_layout()
plt.savefig('figures/effect_size_forest_plot.png', dpi=150, bbox_inches='tight')
plt.show()

## 7. Classification — Logistic Regression with LOO Cross-Validation

Using the top discriminating features, we build a parsimonious logistic regression model and evaluate it with Leave-One-Out cross-validation (appropriate for small samples). We deliberately limit the model to 2–3 predictors to avoid overfitting with n=60.
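
One caveat with small samples is that any step that looks at the outcome (scaling, feature selection) should be refit inside each fold. The sketch below shows one way to do that with a scikit-learn pipeline on pure-noise synthetic data. It swaps in univariate F-test selection (`SelectKBest`) as a stand-in and is not the category-based selection used in this notebook. All `*_demo` names and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: same group sizes as the study, but features are pure noise.
rng = np.random.default_rng(0)
y_demo = np.r_[np.ones(23, dtype=int), np.zeros(37, dtype=int)]
X_demo = rng.normal(size=(y_demo.size, 40))

# Scaling, selection, and the classifier are all refit on each training fold.
pipe_demo = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=3),
    LogisticRegression(max_iter=1000),
)

proba_demo = cross_val_predict(
    pipe_demo, X_demo, y_demo, cv=LeaveOneOut(), method='predict_proba'
)[:, 1]
auc_demo = roc_auc_score(y_demo, proba_demo)
print(f"LOO AUC on noise features: {auc_demo:.3f}")
```

Because the features carry no signal, an honest estimate should not sit far above 0.5. Selecting the top features on the full sample first and then cross-validating would tend to inflate it.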

In [None]:
# Select top features for classification (limit to 2-3 to avoid overfitting)
# Choose features that are conceptually distinct (not redundant)
candidate_features = comp_df.head(30)

# Pick features from different categories to minimize collinearity
# We'll select the top feature from: speed/efficiency, variability, and asymmetry
speed_feats = [f for f in candidate_features['Feature'] if any(k in f for k in ['Gait_Speed', 'Stride_Length', 'Cadence'])]
variability_feats = [f for f in candidate_features['Feature'] if any(k in f for k in ['cv', 'sd', 'slope'])]
asymmetry_feats = [f for f in candidate_features['Feature'] if 'Asymmetry' in f]

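selected = []
for feat_list, label in [(speed_feats, 'Speed/Efficiency'), 
                          (variability_feats, 'Variability'),
                          (asymmetry_feats, 'Asymmetry')]:
    # A feature can match more than one category (e.g. a gait-speed CV),
    # so take the best-ranked one that has not already been chosen
    pick = next((f for f in feat_list if f not in selected), None)
    if pick is not None:
        selected.append(pick)
        print(f"  {label}: {pick}")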

# If we don't get 3 distinct categories, fill from top overall
while len(selected) < 3 and len(candidate_features) > len(selected):
    for f in candidate_features['Feature']:
        if f not in selected:
            selected.append(f)
            break
    if len(selected) >= 3:
        break

print(f"\nSelected {len(selected)} features for classification model")

# Prepare data
X = results[selected].dropna()
valid_idx = X.index
y = results.loc[valid_idx, 'Faller'].values
X = X.values

print(f"Samples with complete data: {len(X)} (Fallers: {y.sum()}, Non-Fallers: {(y==0).sum()})")

# LOO Cross-Validation
scaler = StandardScaler()
loo = LeaveOneOut()
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)

y_pred_proba = np.zeros(len(y))
y_pred_class = np.zeros(len(y))

for train_idx, test_idx in loo.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train = y[train_idx]
    
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    model.fit(X_train_scaled, y_train)
    y_pred_proba[test_idx] = model.predict_proba(X_test_scaled)[:, 1]
    y_pred_class[test_idx] = model.predict(X_test_scaled)

# Results
fpr, tpr, thresholds = roc_curve(y, y_pred_proba)
roc_auc = auc(fpr, tpr)

print(f"\n{'═' * 50}")
print(f"  LOO Cross-Validation Results")
print(f"{'═' * 50}")
print(f"  AUC:          {roc_auc:.3f}")
print(f"  Accuracy:     {(y_pred_class == y).mean():.3f}")

cm = confusion_matrix(y, y_pred_class)
tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
print(f"  Sensitivity:  {sensitivity:.3f}")
print(f"  Specificity:  {specificity:.3f}")

In [None]:
# ROC Curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# ROC
ax1.plot(fpr, tpr, color=GOLD, linewidth=2.5, label=f'AUC = {roc_auc:.3f}')
ax1.plot([0, 1], [0, 1], color=GRAY, linewidth=1, linestyle='--', alpha=0.5)
ax1.fill_between(fpr, tpr, alpha=0.1, color=GOLD)
ax1.set_xlabel('False Positive Rate (1 − Specificity)')
ax1.set_ylabel('True Positive Rate (Sensitivity)')
ax1.set_title('ROC Curve — LOO Cross-Validation', fontweight='600')
ax1.legend(fontsize=11, loc='lower right')
ax1.set_xlim([-0.02, 1.02])
ax1.set_ylim([-0.02, 1.02])

# Confusion Matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='YlOrBr', ax=ax2,
            xticklabels=['Non-Faller', 'Faller'],
            yticklabels=['Non-Faller', 'Faller'],
            annot_kws={'size': 16}, linewidths=1, linecolor='white')
ax2.set_xlabel('Predicted', fontsize=11)
ax2.set_ylabel('Actual', fontsize=11)
ax2.set_title('Confusion Matrix', fontweight='600')

plt.tight_layout()
plt.savefig('figures/roc_and_confusion.png', dpi=150, bbox_inches='tight')
plt.show()

## 8. Summary & Key Findings

### Methodology
- Segmented 6-minute walk test gait cycles into temporal quarters to capture **fatigue-related deterioration patterns**
- Engineered features capturing quarter means, fatigue magnitude (Q4−Q1), deterioration rate (slope), and within-quarter variability
- Applied assumption-appropriate statistical tests with FDR correction for multiple comparisons
- Built a parsimonious logistic regression classifier evaluated with Leave-One-Out cross-validation

### Clinical Implications
The quarters-based approach to gait analysis may provide **additional discriminative value** beyond traditional overall gait averages. Fatigue-related changes during sustained walking could serve as early indicators of fall risk that are not apparent in short-duration or averaged gait assessments.

### Limitations
- Small sample size (n=60) limits statistical power and generalizability
- Retrospective fall classification (self-reported falls in past year)
- Single assessment timepoint
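- Features were ranked and selected on the full sample before cross-validation, so the LOO performance estimates are likely optimistic; selection inside each fold and external validation are needed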

### Tools & Technologies
`Python` · `pandas` · `NumPy` · `SciPy` · `scikit-learn` · `matplotlib` · `seaborn` · `APDM Mobility Lab`

---
*Analysis by Ferdinand Delgado, PhD — [ferdinanddelgadophd@gmail.com](mailto:ferdinanddelgadophd@gmail.com) · [LinkedIn](https://www.linkedin.com/in/ferdinanddelgado/)*