# Strava Activities Dashboard

This notebook provides a comprehensive analysis of your Strava activities using enriched data. It includes:
- Activity overview and summary statistics
- Time-based analysis and trends
- Performance metrics visualization
- Geographical distribution of activities
- Interactive data exploration tools

## Setup and Data Loading

In [None]:
# Import required libraries
import warnings

import folium
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

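# Silence library warnings to keep cell output readable.
# Note: this also hides deprecation notices from pandas and Plotly.
warnings.filterwarnings("ignore")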

# Set Plotly as default plotting backend
pd.options.plotting.backend = "plotly"

# Load the enriched activities data
ACTIVITIES_FILE = (
    "/home/hope0hermes/Workspace/StravaAnalyzer/temp/activities_enriched.csv"
)
df = pd.read_csv(ACTIVITIES_FILE, sep=";", parse_dates=["start_date"])

# Derive calendar fields from the activity start timestamp
df["year"] = df["start_date"].dt.year
df["month"] = df["start_date"].dt.month
df["day"] = df["start_date"].dt.day
df["weekday"] = df["start_date"].dt.day_name()
df["hour"] = df["start_date"].dt.hour

# Convert durations to hours
df["moving_time_hours"] = df["moving_time"] / 3600
df["total_time_hours"] = df["total_time"] / 3600

# Rename columns for easier access
df = df.rename(
    columns={
        "moving_average_power": "average_power",
        "moving_normalized_power": "normalized_power",
        "moving_intensity_factor": "intensity_factor",
        "moving_training_stress_score": "training_stress_score",
        "moving_average_hr": "average_hr",
        "moving_max_hr": "max_hr",
        "moving_hr_training_stress": "hr_training_stress",
        "moving_efficiency_factor": "efficiency_factor",
        "moving_power_hr_decoupling": "power_hr_decoupling",
        "moving_first_half_ef": "first_half_ef",
        "moving_second_half_ef": "second_half_ef",
        "moving_variability_index": "variability_index",
    }
)

# Collect the per-zone percentage columns for power and heart rate
power_zone_cols = [
    col
    for col in df.columns
    if col.startswith("moving_power_z") and col.endswith("_percentage")
]
hr_zone_cols = [
    col
    for col in df.columns
    if col.startswith("moving_hr_z") and col.endswith("_percentage")
]

print(
    f"Loaded {len(df)} activities from {df['start_date'].min().date()} to {df['start_date'].max().date()}"
)
print(f"\nMetrics available: {len(df.columns)} columns")
print(f"Power zones: {len(power_zone_cols)} zones")
print(f"HR zones: {len(hr_zone_cols)} zones")

## Activity Overview Metrics

Let's start by looking at the overall statistics and distributions of our activities.

### ✅ Raw vs Moving Metrics - Fixed!

The enriched data contains two sets of metrics:
- **Raw Metrics** (`raw_*`): Time-weighted averages that include stopped periods (gaps in time series with 0W/0 HR)
- **Moving Metrics** (`moving_*`): Intended to include only periods of active movement

**How it works:**
- Stream data has gaps in the time column (e.g., time jumps from 17s to 171s = 154s gap)
- These gaps represent stopped periods where the device wasn't recording
- At the edges of gaps, power/HR are typically 0W/0bpm
- Time-weighted averaging weights each sample by the time it spans, so a long gap at 0W/0bpm lowers the average in proportion to its duration
- **Result**: Raw metrics match Strava's values (includes stopped time)
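
The weighting described above can be sketched as follows. This is a minimal illustration, not the enrichment pipeline's actual code: the function name `time_weighted_mean`, the use of the trapezoid rule, and the sample values are all assumptions made for the example.

```python
import numpy as np


def time_weighted_mean(time_s, values):
    """Average `values` over elapsed time using the trapezoid rule.

    Hypothetical helper: each interval between consecutive samples
    contributes the mean of its two endpoint values times its duration.
    """
    t = np.asarray(time_s, dtype=float)
    v = np.asarray(values, dtype=float)
    dt = np.diff(t)  # duration of each interval, including long gaps
    area = np.sum(0.5 * (v[1:] + v[:-1]) * dt)
    return area / (t[-1] - t[0])


# Made-up stream: the device stops recording between t=3s and t=103s,
# and power reads 0 W on both edges of the gap.
time_s = [0, 1, 2, 3, 103, 104]
power_w = [200, 200, 200, 0, 0, 200]

weighted = time_weighted_mean(time_s, power_w)  # gap counts for 100 s
unweighted = float(np.mean(power_w))            # gap counts as two samples
print(f"time-weighted: {weighted:.1f} W, simple mean: {unweighted:.1f} W")
```

In this made-up stream the simple mean treats the 100-second stop as just two samples, so it lands far above the time-weighted value.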

**Note**: Currently raw and moving metrics are identical because Strava's `moving` column is incorrectly set to True everywhere. We're working on improving the moving period detection to properly differentiate between raw and moving metrics.

In [None]:
# Calculate summary statistics with new time-weighted metrics
summary_stats = pd.DataFrame(
    {
        "Total Activities": [len(df)],
        "Total Distance (km)": [df["distance"].sum() / 1000],
        "Total Elevation (m)": [df["elevation_gain"].sum()],
        "Total Moving Time (hours)": [df["moving_time_hours"].sum()],
        "Avg Power (W)": [df["average_power"].mean()],
        "Avg Normalized Power (W)": [df["normalized_power"].mean()],
        "Avg Intensity Factor": [df["intensity_factor"].mean()],
        "Avg TSS": [df["training_stress_score"].mean()],
        "Avg Heart Rate (bpm)": [df["average_hr"].mean()],
        "Avg hrTSS": [df["hr_training_stress"].mean()],
        "Avg Efficiency Factor": [df["efficiency_factor"].mean()],
        "Avg Variability Index": [df["variability_index"].mean()],
        "Avg Power:HR Decoupling (%)": [df["power_hr_decoupling"].mean()],
    }
).T

summary_stats.columns = ["Value"]
summary_stats["Value"] = summary_stats["Value"].round(2)

# Create a styled table
fig = go.Figure(
    data=[
        go.Table(
            header=dict(
                values=["Metric", "Value"],
                fill_color="royalblue",
                align="left",
                font=dict(color="white", size=12),
            ),
            cells=dict(
                values=[summary_stats.index, summary_stats["Value"]],
                fill_color="lavender",
                align="left",
            ),
        )
    ]
)

fig.update_layout(
    title="Overall Activity Statistics (Time-Weighted Metrics)", width=800, height=500
)
fig.show()

# Print some key insights
print("\n📊 Key Insights:")
print(f"Average workout intensity: {df['intensity_factor'].mean():.2f} IF")
print(f"Average efficiency: {df['efficiency_factor'].mean():.2f} W/bpm")
print(f"Average variability: {df['variability_index'].mean():.2f} VI")
print(f"Average decoupling: {df['power_hr_decoupling'].mean():.2f}%")

In [None]:
# Create subplots for activity distributions with new metrics
fig = make_subplots(
    rows=3,
    cols=2,
    subplot_titles=(
        "Distance Distribution",
        "Elevation Distribution",
        "Power Distribution (Time-Weighted)",
        "Heart Rate Distribution (Time-Weighted)",
        "Training Stress Score",
        "Efficiency Factor",
    ),
)

# Distance histogram
fig.add_trace(
    go.Histogram(
        x=df["distance"] / 1000, name="Distance", nbinsx=30, marker_color="royalblue"
    ),
    row=1,
    col=1,
)

# Elevation histogram
fig.add_trace(
    go.Histogram(
        x=df["elevation_gain"], name="Elevation", nbinsx=30, marker_color="green"
    ),
    row=1,
    col=2,
)

# Power histogram (time-weighted)
fig.add_trace(
    go.Histogram(
        x=df["average_power"].dropna(), name="Power", nbinsx=30, marker_color="red"
    ),
    row=2,
    col=1,
)

# Heart rate histogram (time-weighted)
fig.add_trace(
    go.Histogram(
        x=df["average_hr"].dropna(), name="Heart Rate", nbinsx=30, marker_color="purple"
    ),
    row=2,
    col=2,
)

# TSS histogram
fig.add_trace(
    go.Histogram(
        x=df["training_stress_score"].dropna(),
        name="TSS",
        nbinsx=30,
        marker_color="orange",
    ),
    row=3,
    col=1,
)

# Efficiency Factor histogram
fig.add_trace(
    go.Histogram(
        x=df["efficiency_factor"].dropna(), name="EF", nbinsx=30, marker_color="teal"
    ),
    row=3,
    col=2,
)

# Update layout
fig.update_layout(
    height=1000,
    width=1000,
    showlegend=False,
    title_text="Activity Metrics Distributions (Time-Weighted Averaging)",
)
fig.update_xaxes(title_text="Distance (km)", row=1, col=1)
fig.update_xaxes(title_text="Elevation (m)", row=1, col=2)
fig.update_xaxes(title_text="Power (W)", row=2, col=1)
fig.update_xaxes(title_text="Heart Rate (bpm)", row=2, col=2)
fig.update_xaxes(title_text="TSS", row=3, col=1)
fig.update_xaxes(title_text="Efficiency Factor (W/bpm)", row=3, col=2)

fig.show()

# Print statistics
print("\n📈 Time-Weighted Metrics Statistics:")
print("\nPower:")
print(
    f"  Mean: {df['average_power'].mean():.1f}W, Median: {df['average_power'].median():.1f}W"
)
print(f"  Normalized: {df['normalized_power'].mean():.1f}W")
print("\nHeart Rate:")
print(
    f"  Mean: {df['average_hr'].mean():.1f}bpm, Median: {df['average_hr'].median():.1f}bpm"
)
print("\nTraining Load:")
print(f"  Avg TSS: {df['training_stress_score'].mean():.1f}")
print(f"  Avg hrTSS: {df['hr_training_stress'].mean():.1f}")
print("\nEfficiency Metrics:")
print(f"  Avg EF: {df['efficiency_factor'].mean():.2f} W/bpm")
print(f"  Avg VI: {df['variability_index'].mean():.2f}")
print(f"  Avg Decoupling: {df['power_hr_decoupling'].mean():.2f}%")

## Time-based Analysis

Let's analyze how your activities are distributed over time and identify any patterns.

In [None]:
# Create time series plot of activities
# Note: "M" is the month-end alias; pandas 2.2+ deprecates it in favor of "ME"
monthly_stats = (
    df.groupby(pd.Grouper(key="start_date", freq="M"))
    .agg(
        {
            "distance": "sum",
            "elevation_gain": "sum",
            "moving_time_hours": "sum",
            "training_stress_score": "mean",
        }
    )
    .reset_index()
)

# Create subplot figure
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Monthly Distance",
        "Monthly Elevation",
        "Monthly Moving Time",
        "Average Training Stress",
    ),
)

# Add traces
fig.add_trace(
    go.Scatter(
        x=monthly_stats["start_date"],
        y=monthly_stats["distance"] / 1000,
        mode="lines+markers",
        name="Distance",
        line=dict(color="royalblue"),
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=monthly_stats["start_date"],
        y=monthly_stats["elevation_gain"],
        mode="lines+markers",
        name="Elevation",
        line=dict(color="green"),
    ),
    row=1,
    col=2,
)

fig.add_trace(
    go.Scatter(
        x=monthly_stats["start_date"],
        y=monthly_stats["moving_time_hours"],
        mode="lines+markers",
        name="Time",
        line=dict(color="red"),
    ),
    row=2,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=monthly_stats["start_date"],
        y=monthly_stats["training_stress_score"],
        mode="lines+markers",
        name="TSS",
        line=dict(color="purple"),
    ),
    row=2,
    col=2,
)

# Update layout
fig.update_layout(
    height=800, width=1000, showlegend=False, title_text="Monthly Activity Trends"
)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)
fig.update_yaxes(title_text="Distance (km)", row=1, col=1)
fig.update_yaxes(title_text="Elevation (m)", row=1, col=2)
fig.update_yaxes(title_text="Moving Time (hours)", row=2, col=1)
fig.update_yaxes(title_text="Training Stress Score", row=2, col=2)

fig.show()

In [None]:
# Weekly patterns - separate visualizations for each metric
weekly_stats = (
    df.groupby("weekday")
    .agg(
        {
            "distance": ["count", "mean"],
            "moving_time_hours": "mean",
            "training_stress_score": "mean",
        }
    )
    .round(2)
)

weekly_stats.columns = [
    "Activity Count",
    "Avg Distance (km)",
    "Avg Moving Time (hrs)",
    "Avg TSS",
]
# Distance was aggregated in meters; convert the column to kilometers
weekly_stats["Avg Distance (km)"] = weekly_stats["Avg Distance (km)"] / 1000
weekly_stats = weekly_stats.reindex(
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
)

# Create separate bar charts for each metric (better than heatmap with different scales)
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Activity Count by Day",
        "Avg Distance by Day",
        "Avg Moving Time by Day",
        "Avg Training Stress by Day",
    ),
)

# Activity count
fig.add_trace(
    go.Bar(
        x=weekly_stats.index,
        y=weekly_stats["Activity Count"],
        name="Count",
        marker_color="royalblue",
        text=weekly_stats["Activity Count"],
        textposition="auto",
    ),
    row=1,
    col=1,
)

# Average distance
fig.add_trace(
    go.Bar(
        x=weekly_stats.index,
        y=weekly_stats["Avg Distance (km)"],
        name="Distance",
        marker_color="green",
        text=weekly_stats["Avg Distance (km)"].round(1),
        textposition="auto",
    ),
    row=1,
    col=2,
)

# Average moving time
fig.add_trace(
    go.Bar(
        x=weekly_stats.index,
        y=weekly_stats["Avg Moving Time (hrs)"],
        name="Time",
        marker_color="orange",
        text=weekly_stats["Avg Moving Time (hrs)"].round(1),
        textposition="auto",
    ),
    row=2,
    col=1,
)

# Average TSS
fig.add_trace(
    go.Bar(
        x=weekly_stats.index,
        y=weekly_stats["Avg TSS"],
        name="TSS",
        marker_color="purple",
        text=weekly_stats["Avg TSS"].round(1),
        textposition="auto",
    ),
    row=2,
    col=2,
)

fig.update_layout(
    height=800, width=1200, showlegend=False, title_text="Weekly Activity Patterns"
)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_yaxes(title_text="Distance (km)", row=1, col=2)
fig.update_yaxes(title_text="Hours", row=2, col=1)
fig.update_yaxes(title_text="TSS", row=2, col=2)

# Rotate x-axis labels for better readability
fig.update_xaxes(tickangle=45)

fig.show()

# Print weekly insights
print("\n📅 Weekly Patterns:")
most_active_day = weekly_stats["Activity Count"].idxmax()
longest_day = weekly_stats["Avg Distance (km)"].idxmax()
hardest_day = weekly_stats["Avg TSS"].idxmax()

print(
    f"Most active day: {most_active_day} ({weekly_stats.loc[most_active_day, 'Activity Count']:.0f} activities)"
)
print(
    f"Longest rides on: {longest_day} ({weekly_stats.loc[longest_day, 'Avg Distance (km)']:.1f} km avg)"
)
print(
    f"Hardest workouts on: {hardest_day} ({weekly_stats.loc[hardest_day, 'Avg TSS']:.1f} TSS avg)"
)

## Performance Analysis

Let's analyze your performance metrics and their relationships.

### Advanced Performance Metrics

Let's analyze the advanced metrics including Efficiency Factor, Variability Index, and Power:HR Decoupling. These metrics use time-weighted averaging for improved accuracy.
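
For orientation, the three metrics can be sketched with their commonly used definitions. This is an assumption about how the enriched columns were derived, not a copy of the processing code; the function names and the example numbers are hypothetical.

```python
def variability_index(normalized_power, average_power):
    """VI: normalized power divided by average power (1.0 = perfectly steady)."""
    return normalized_power / average_power


def efficiency_factor(normalized_power, average_hr):
    """EF: normalized power per heartbeat, in W/bpm."""
    return normalized_power / average_hr


def decoupling_pct(first_half_ef, second_half_ef):
    """Power:HR decoupling: percentage drop in EF from first to second half."""
    return (first_half_ef - second_half_ef) / first_half_ef * 100.0


# Made-up ride: NP 210 W, average power 200 W, average HR 140 bpm
vi = variability_index(210.0, 200.0)
ef = efficiency_factor(210.0, 140.0)
# Made-up half-ride EF values; a positive result means EF dropped
drift = decoupling_pct(1.60, 1.52)
print(f"VI={vi:.2f}, EF={ef:.2f} W/bpm, decoupling={drift:.1f}%")
```

With this sign convention, a positive decoupling value means efficiency fell in the second half, which matches how the fatigue thresholds are read later in the notebook.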

In [None]:
# Advanced metrics over time
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Efficiency Factor Trend",
        "Variability Index Trend",
        "Power:HR Decoupling Trend",
        "Normalized Power vs Average Power",
    ),
)

# Efficiency Factor trend
fig.add_trace(
    go.Scatter(
        x=df["start_date"],
        y=df["efficiency_factor"],
        mode="markers",
        name="EF",
        marker=dict(color=df["intensity_factor"], colorscale="Viridis", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Date: %{x}<br>EF: %{y:.2f}<extra></extra>",
    ),
    row=1,
    col=1,
)

# Variability Index trend
fig.add_trace(
    go.Scatter(
        x=df["start_date"],
        y=df["variability_index"],
        mode="markers",
        name="VI",
        marker=dict(color=df["training_stress_score"], colorscale="Plasma", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Date: %{x}<br>VI: %{y:.2f}<extra></extra>",
    ),
    row=1,
    col=2,
)

# Power:HR Decoupling trend
fig.add_trace(
    go.Scatter(
        x=df["start_date"],
        y=df["power_hr_decoupling"],
        mode="markers",
        name="Decoupling",
        marker=dict(color=df["distance"] / 1000, colorscale="RdYlGn_r", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Date: %{x}<br>Decoupling: %{y:.2f}%<extra></extra>",
    ),
    row=2,
    col=1,
)

# Normalized Power vs Average Power
fig.add_trace(
    go.Scatter(
        x=df["average_power"],
        y=df["normalized_power"],
        mode="markers",
        name="NP vs AP",
        marker=dict(color=df["variability_index"], colorscale="Turbo", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Avg Power: %{x:.0f}W<br>Normalized Power: %{y:.0f}W<extra></extra>",
    ),
    row=2,
    col=2,
)

# Add diagonal reference line for NP vs AP
max_power = max(df["average_power"].max(), df["normalized_power"].max())
fig.add_trace(
    go.Scatter(
        x=[0, max_power],
        y=[0, max_power],
        mode="lines",
        name="Equal line",
        line=dict(color="gray", dash="dash"),
        showlegend=False,
    ),
    row=2,
    col=2,
)

fig.update_layout(
    height=800,
    width=1200,
    showlegend=False,
    title_text="Advanced Performance Metrics Trends (Time-Weighted)",
)
fig.update_yaxes(title_text="Efficiency Factor (W/bpm)", row=1, col=1)
fig.update_yaxes(title_text="Variability Index", row=1, col=2)
fig.update_yaxes(title_text="Decoupling (%)", row=2, col=1)
fig.update_yaxes(title_text="Normalized Power (W)", row=2, col=2)
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Date", row=1, col=2)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Average Power (W)", row=2, col=2)

fig.show()

# Print insights
print("\n🎯 Advanced Metrics Insights:")
print("\nEfficiency Factor:")
print(
    f"  Range: {df['efficiency_factor'].min():.2f} - {df['efficiency_factor'].max():.2f} W/bpm"
)
print(f"  Best activity: {df.loc[df['efficiency_factor'].idxmax(), 'name']}")
print("\nVariability Index:")
print(
    f"  Range: {df['variability_index'].min():.2f} - {df['variability_index'].max():.2f}"
)
print(
    f"  Most steady (VI closest to 1.0): {df.loc[(df['variability_index'] - 1.0).abs().idxmin(), 'name']}"
)
print(f"  Most variable: {df.loc[df['variability_index'].idxmax(), 'name']}")
print("\nPower:HR Decoupling:")
print(
    f"  Range: {df['power_hr_decoupling'].min():.2f}% - {df['power_hr_decoupling'].max():.2f}%"
)
print(
    f"  Best endurance (lowest decoupling): {df.loc[df['power_hr_decoupling'].idxmin(), 'name']}"
)

In [None]:
# Efficiency Factor breakdown: First Half vs Second Half
fig = go.Figure()

# Add first half EF
fig.add_trace(
    go.Scatter(
        x=df["start_date"],
        y=df["first_half_ef"],
        mode="markers",
        name="First Half EF",
        marker=dict(color="green", size=8, opacity=0.6),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Date: %{x}<br>First Half EF: %{y:.2f}<extra></extra>",
    )
)

# Add second half EF
fig.add_trace(
    go.Scatter(
        x=df["start_date"],
        y=df["second_half_ef"],
        mode="markers",
        name="Second Half EF",
        marker=dict(color="red", size=8, opacity=0.6),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Date: %{x}<br>Second Half EF: %{y:.2f}<extra></extra>",
    )
)

fig.update_layout(
    title="Efficiency Factor: First Half vs Second Half (Fatigue Analysis)",
    xaxis_title="Date",
    yaxis_title="Efficiency Factor (W/bpm)",
    height=500,
    width=1200,
    hovermode="closest",
)

fig.show()

print("\n💪 Fatigue Analysis:")
print(f"Average First Half EF: {df['first_half_ef'].mean():.2f} W/bpm")
print(f"Average Second Half EF: {df['second_half_ef'].mean():.2f} W/bpm")
print(
    f"Average EF drop: {(df['first_half_ef'].mean() - df['second_half_ef'].mean()):.2f} W/bpm"
)
print(f"Average decoupling: {df['power_hr_decoupling'].mean():.2f}%")
print(
    f"\nActivities with <5% decoupling (good endurance): {len(df[df['power_hr_decoupling'].abs() < 5])}"
)
print(
    f"Activities with >10% decoupling (fatigue present): {len(df[df['power_hr_decoupling'] > 10])}"
)

In [None]:
# Power and Heart Rate Zone Analysis (Time-Weighted)

# Re-collect the zone columns (same filters as in setup) so this cell can run on its own
power_zone_cols = [
    col
    for col in df.columns
    if col.startswith("moving_power_z") and col.endswith("_percentage")
]
hr_zone_cols = [
    col
    for col in df.columns
    if col.startswith("moving_hr_z") and col.endswith("_percentage")
]

print(f"Found {len(power_zone_cols)} power zones and {len(hr_zone_cols)} HR zones")
print(f"Power zone columns: {power_zone_cols}")
print(f"HR zone columns: {hr_zone_cols}")

# Calculate average time in each zone
if power_zone_cols:
    power_zones_avg = df[power_zone_cols].mean()
    power_zones_avg.index = [
        f"Zone {col.split('_z')[1].split('_')[0]}" for col in power_zone_cols
    ]
else:
    power_zones_avg = pd.Series(dtype=float)

if hr_zone_cols:
    hr_zones_avg = df[hr_zone_cols].mean()
    hr_zones_avg.index = [
        f"Zone {col.split('_z')[1].split('_')[0]}" for col in hr_zone_cols
    ]
else:
    hr_zones_avg = pd.Series(dtype=float)

# Create zone distribution plots
fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=(
        "Power Zone Distribution (Time-Weighted)",
        "Heart Rate Zone Distribution (Time-Weighted)",
    ),
)

# Power zones plot
if not power_zones_avg.empty:
    fig.add_trace(
        go.Bar(
            x=power_zones_avg.index,
            y=power_zones_avg.values,
            name="Power Zones",
            text=power_zones_avg.values.round(1),
            textposition="auto",
            marker_color=[
                "lightblue",
                "green",
                "yellow",
                "orange",
                "red",
                "darkred",
                "purple",
            ][: len(power_zones_avg)],
        ),
        row=1,
        col=1,
    )

    print("\n⚡ Power Zone Distribution (Time-Weighted):")
    for zone, value in power_zones_avg.items():
        print(f"  {zone}: {value:.1f}%")
else:
    print("No power zone data available")

# Heart rate zones plot
if not hr_zones_avg.empty:
    fig.add_trace(
        go.Bar(
            x=hr_zones_avg.index,
            y=hr_zones_avg.values,
            name="HR Zones",
            text=hr_zones_avg.values.round(1),
            textposition="auto",
            marker_color=["lightblue", "green", "yellow", "orange", "red"][
                : len(hr_zones_avg)
            ],
        ),
        row=1,
        col=2,
    )

    print("\n❤️ Heart Rate Zone Distribution (Time-Weighted):")
    for zone, value in hr_zones_avg.items():
        print(f"  {zone}: {value:.1f}%")
else:
    print("No heart rate zone data available")

fig.update_layout(
    height=500,
    width=1200,
    showlegend=False,
    title_text="Training Zone Distributions (Time-Weighted Averages)",
)
fig.update_yaxes(title_text="Percentage of Time (%)", row=1, col=1)
fig.update_yaxes(title_text="Percentage of Time (%)", row=1, col=2)
fig.update_traces(texttemplate="%{text}%")

fig.show()

# Zone distribution over time
if power_zone_cols:
    fig = go.Figure()

    for col in power_zone_cols:
        zone_num = col.split("_z")[1].split("_")[0]
        fig.add_trace(
            go.Scatter(
                x=df["start_date"],
                y=df[col],
                mode="lines",
                name=f"Zone {zone_num}",
                stackgroup="one",
                groupnorm="percent",
            )
        )

    fig.update_layout(
        title="Power Zone Distribution Over Time (Time-Weighted)",
        xaxis_title="Date",
        yaxis_title="Percentage of Time (%)",
        height=500,
        width=1200,
        hovermode="x unified",
    )

    fig.show()

In [None]:
# Fatigue Resistance Analysis
fatigue_cols = [
    "moving_fatigue_index",
    "moving_power_sustainability_index",
    "moving_first_half_power",
    "moving_second_half_power",
]

has_fatigue = all(col in df.columns for col in fatigue_cols)

if has_fatigue:
    print("✅ Fatigue metrics found in data!")

    # Create fatigue metrics visualization
    fig = make_subplots(
        rows=2,
        cols=2,
        subplot_titles=(
            "Fatigue Index Distribution",
            "Power Sustainability Index",
            "First vs Second Half Power",
            "Fatigue Index Over Time",
        ),
    )

    # Fatigue Index histogram
    fig.add_trace(
        go.Histogram(
            x=df["moving_fatigue_index"].dropna(),
            name="Fatigue Index",
            nbinsx=30,
            marker_color="red",
        ),
        row=1,
        col=1,
    )

    # Power Sustainability Index histogram
    fig.add_trace(
        go.Histogram(
            x=df["moving_power_sustainability_index"].dropna(),
            name="Sustainability",
            nbinsx=30,
            marker_color="blue",
        ),
        row=1,
        col=2,
    )

    # First vs Second Half scatter
    fig.add_trace(
        go.Scatter(
            x=df["moving_first_half_power"],
            y=df["moving_second_half_power"],
            mode="markers",
            name="Half Comparison",
            marker=dict(
                color=df["moving_fatigue_index"], colorscale="RdYlGn_r", size=8
            ),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>1st Half: %{x:.0f}W<br>2nd Half: %{y:.0f}W<extra></extra>",
        ),
        row=2,
        col=1,
    )

    # Add diagonal reference line
    max_power = max(
        df["moving_first_half_power"].max(), df["moving_second_half_power"].max()
    )
    fig.add_trace(
        go.Scatter(
            x=[0, max_power],
            y=[0, max_power],
            mode="lines",
            name="Equal",
            line=dict(color="gray", dash="dash"),
            showlegend=False,
        ),
        row=2,
        col=1,
    )

    # Fatigue Index over time
    fig.add_trace(
        go.Scatter(
            x=df["start_date"],
            y=df["moving_fatigue_index"],
            mode="markers",
            name="Fatigue Index",
            marker=dict(color=df["distance"] / 1000, colorscale="Plasma", size=8),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>Date: %{x}<br>Fatigue Index: %{y:.1f}%<extra></extra>",
        ),
        row=2,
        col=2,
    )

    fig.update_layout(
        height=800,
        width=1200,
        showlegend=False,
        title_text="Fatigue Resistance Metrics",
    )
    fig.update_xaxes(title_text="Fatigue Index (%)", row=1, col=1)
    fig.update_xaxes(title_text="Sustainability Index", row=1, col=2)
    fig.update_xaxes(title_text="First Half Power (W)", row=2, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=2)
    fig.update_yaxes(title_text="Count", row=1, col=1)
    fig.update_yaxes(title_text="Count", row=1, col=2)
    fig.update_yaxes(title_text="Second Half Power (W)", row=2, col=1)
    fig.update_yaxes(title_text="Fatigue Index (%)", row=2, col=2)

    fig.show()

    # Print insights
    print("\n💪 Fatigue Resistance Statistics:")
    print("\nFatigue Index:")
    print(f"  Average: {df['moving_fatigue_index'].mean():.1f}%")
    print(
        f"  Best (lowest): {df['moving_fatigue_index'].min():.1f}% - {df.loc[df['moving_fatigue_index'].idxmin(), 'name']}"
    )
    print(
        f"  Worst (highest): {df['moving_fatigue_index'].max():.1f}% - {df.loc[df['moving_fatigue_index'].idxmax(), 'name']}"
    )

    print("\nPower Sustainability:")
    print(f"  Average: {df['moving_power_sustainability_index'].mean():.2f}")
    print(
        f"  Best: {df['moving_power_sustainability_index'].max():.2f} - {df.loc[df['moving_power_sustainability_index'].idxmax(), 'name']}"
    )

    print("\nActivities by Fatigue Level:")
    print(f"  Excellent (<5% fatigue): {len(df[df['moving_fatigue_index'] < 5])}")
    print(
        f"  Good (5-10% fatigue): {len(df[(df['moving_fatigue_index'] >= 5) & (df['moving_fatigue_index'] < 10)])}"
    )
    print(
        f"  Moderate (10-15% fatigue): {len(df[(df['moving_fatigue_index'] >= 10) & (df['moving_fatigue_index'] < 15)])}"
    )
    print(f"  High (>=15% fatigue): {len(df[df['moving_fatigue_index'] >= 15])}")

else:
    print(
        "❌ No fatigue metrics found in data. These are calculated during processing."
    )

In [None]:
# Fatigue vs Activity Characteristics
if has_fatigue:
    fig = make_subplots(
        rows=1,
        cols=3,
        subplot_titles=(
            "Fatigue vs Distance",
            "Fatigue vs Duration",
            "Fatigue vs Intensity",
        ),
    )

    # Fatigue vs Distance
    fig.add_trace(
        go.Scatter(
            x=df["distance"] / 1000,
            y=df["moving_fatigue_index"],
            mode="markers",
            name="Distance",
            marker=dict(color=df["elevation_gain"], colorscale="Viridis", size=8),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>Distance: %{x:.1f}km<br>Fatigue: %{y:.1f}%<extra></extra>",
        ),
        row=1,
        col=1,
    )

    # Fatigue vs Duration
    fig.add_trace(
        go.Scatter(
            x=df["moving_time_hours"],
            y=df["moving_fatigue_index"],
            mode="markers",
            name="Duration",
            marker=dict(color=df["training_stress_score"], colorscale="Plasma", size=8),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>Duration: %{x:.1f}hrs<br>Fatigue: %{y:.1f}%<extra></extra>",
        ),
        row=1,
        col=2,
    )

    # Fatigue vs Intensity
    fig.add_trace(
        go.Scatter(
            x=df["intensity_factor"],
            y=df["moving_fatigue_index"],
            mode="markers",
            name="Intensity",
            marker=dict(color=df["normalized_power"], colorscale="Turbo", size=8),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>IF: %{x:.2f}<br>Fatigue: %{y:.1f}%<extra></extra>",
        ),
        row=1,
        col=3,
    )

    fig.update_layout(
        height=500,
        width=1400,
        showlegend=False,
        title_text="Fatigue Index vs Activity Characteristics",
    )
    fig.update_xaxes(title_text="Distance (km)", row=1, col=1)
    fig.update_xaxes(title_text="Duration (hours)", row=1, col=2)
    fig.update_xaxes(title_text="Intensity Factor", row=1, col=3)
    fig.update_yaxes(title_text="Fatigue Index (%)", row=1, col=1)
    fig.update_yaxes(title_text="Fatigue Index (%)", row=1, col=2)
    fig.update_yaxes(title_text="Fatigue Index (%)", row=1, col=3)

    fig.show()

    # Calculate correlations
    print("\n🔗 Fatigue Correlations:")
    print(f"Fatigue vs Distance: {df['moving_fatigue_index'].corr(df['distance']):.2f}")
    print(
        f"Fatigue vs Duration: {df['moving_fatigue_index'].corr(df['moving_time_hours']):.2f}"
    )
    print(
        f"Fatigue vs Intensity: {df['moving_fatigue_index'].corr(df['intensity_factor']):.2f}"
    )
    print(
        f"Fatigue vs TSS: {df['moving_fatigue_index'].corr(df['training_stress_score']):.2f}"
    )

## Fatigue Resistance Analysis

Fatigue resistance metrics measure how well you maintain power output throughout an activity. These metrics help identify endurance strengths and weaknesses.

### Key Fatigue Metrics:
- **Fatigue Index**: Percentage decline from initial to final power
- **Power Sustainability Index**: Ratio of second-half to first-half power
- **First/Second Half Comparison**: Direct comparison of power in each half
- **Coefficient of Variation**: Power consistency throughout the ride
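
The definitions above can be sketched on a single power stream. This is only an illustration under stated assumptions: the helper name `fatigue_metrics` is hypothetical, samples are assumed to be evenly spaced, "initial" and "final" power are taken to be the first and second half of the ride, and the processing pipeline may define these metrics differently.

```python
import numpy as np


def fatigue_metrics(power_w):
    """Compute illustrative fatigue metrics from evenly spaced power samples."""
    p = np.asarray(power_w, dtype=float)
    mid = len(p) // 2
    first = p[:mid].mean()   # mean power in the first half
    second = p[mid:].mean()  # mean power in the second half
    return {
        "first_half_power": first,
        "second_half_power": second,
        # Percentage decline from first-half to second-half power
        "fatigue_index_pct": (first - second) / first * 100.0,
        # Ratio of second-half to first-half power (1.0 = no fade)
        "sustainability_index": second / first,
        # Population standard deviation relative to the mean
        "coefficient_of_variation": p.std() / p.mean(),
    }


# Made-up ride that fades from 250 W to 200 W halfway through
metrics = fatigue_metrics([250, 250, 250, 250, 200, 200, 200, 200])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Splitting by sample count keeps the sketch short; for streams with recording gaps, splitting by elapsed time would be the more careful choice.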

In [None]:
# TID Analysis - Check for TID metrics
tid_power_cols = [
    "moving_power_tid_z1_percentage",
    "moving_power_tid_z2_percentage",
    "moving_power_tid_z3_percentage",
]
tid_hr_cols = [
    "moving_hr_tid_z1_percentage",
    "moving_hr_tid_z2_percentage",
    "moving_hr_tid_z3_percentage",
]

has_tid_power = all(col in df.columns for col in tid_power_cols)
has_tid_hr = all(col in df.columns for col in tid_hr_cols)

if has_tid_power or has_tid_hr:
    print("✅ TID metrics found in data!")

    # Create TID zone distribution comparison
    fig = make_subplots(
        rows=1, cols=2, subplot_titles=("Power-Based TID", "Heart Rate-Based TID")
    )

    if has_tid_power:
        tid_power_avg = df[tid_power_cols].mean()
        tid_power_labels = ["Z1 (Low)", "Z2 (Moderate)", "Z3 (High)"]

        fig.add_trace(
            go.Bar(
                x=tid_power_labels,
                y=tid_power_avg.values,
                name="Power TID",
                text=tid_power_avg.values.round(1),
                textposition="auto",
                marker_color=["green", "orange", "red"],
            ),
            row=1,
            col=1,
        )

        print("\n⚡ Power-Based TID Distribution:")
        for label, value in zip(tid_power_labels, tid_power_avg.values, strict=False):
            print(f"  {label}: {value:.1f}%")

    if has_tid_hr:
        tid_hr_avg = df[tid_hr_cols].mean()
        tid_hr_labels = ["Z1 (Low)", "Z2 (Moderate)", "Z3 (High)"]

        fig.add_trace(
            go.Bar(
                x=tid_hr_labels,
                y=tid_hr_avg.values,
                name="HR TID",
                text=tid_hr_avg.values.round(1),
                textposition="auto",
                marker_color=["green", "orange", "red"],
            ),
            row=1,
            col=2,
        )

        print("\n❤️ Heart Rate-Based TID Distribution:")
        for label, value in zip(tid_hr_labels, tid_hr_avg.values, strict=False):
            print(f"  {label}: {value:.1f}%")

    fig.update_layout(
        height=500,
        width=1200,
        showlegend=False,
        title_text="Training Intensity Distribution (3-Zone Model)",
    )
    fig.update_yaxes(title_text="Percentage of Time (%)", row=1, col=1)
    fig.update_yaxes(title_text="Percentage of Time (%)", row=1, col=2)
    fig.update_traces(texttemplate="%{text}%")

    fig.show()

    # Polarization analysis
    if "moving_power_polarization_index" in df.columns:
        print("\n📊 Polarization Analysis (Power):")
        print(f"  Average PI: {df['moving_power_polarization_index'].mean():.2f}")
        # TDR column is optional; only report it when present
        if "moving_power_tdr" in df.columns:
            print(f"  Average TDR: {df['moving_power_tdr'].mean():.2f}")
        print("\n  Interpretation:")
        avg_pi = df["moving_power_polarization_index"].mean()
        if avg_pi > 4.0:
            print(f"  ✅ Highly polarized training (PI={avg_pi:.2f}) - excellent!")
        elif avg_pi > 2.0:
            print(f"  ✓ Moderately polarized (PI={avg_pi:.2f}) - good")
        else:
            print(
                f"  ⚠️ Low polarization (PI={avg_pi:.2f}) - consider more low/high intensity"
            )

    if "moving_hr_polarization_index" in df.columns:
        print("\n📊 Polarization Analysis (HR):")
        print(f"  Average PI: {df['moving_hr_polarization_index'].mean():.2f}")
        # TDR column is optional; only report it when present
        if "moving_hr_tdr" in df.columns:
            print(f"  Average TDR: {df['moving_hr_tdr'].mean():.2f}")

else:
    print("❌ No TID metrics found in data. These are calculated during processing.")

In [None]:
# TID Polarization Index over time
if has_tid_power and "moving_power_polarization_index" in df.columns:
    fig = go.Figure()

    fig.add_trace(
        go.Scatter(
            x=df["start_date"],
            y=df["moving_power_polarization_index"],
            mode="markers+lines",
            name="Polarization Index",
            marker=dict(
                size=8, color=df["training_stress_score"], colorscale="Viridis"
            ),
            line=dict(width=1, color="lightblue"),
            text=df["name"],
            hovertemplate="<b>%{text}</b><br>Date: %{x}<br>PI: %{y:.2f}<extra></extra>",
        )
    )

    # Add reference lines
    fig.add_hline(
        y=4.0,
        line_dash="dash",
        line_color="green",
        annotation_text="Highly Polarized (PI > 4.0)",
    )
    fig.add_hline(
        y=2.0,
        line_dash="dash",
        line_color="orange",
        annotation_text="Moderately Polarized (PI > 2.0)",
    )

    fig.update_layout(
        title="Polarization Index Over Time (Power-Based)",
        xaxis_title="Date",
        yaxis_title="Polarization Index",
        height=500,
        width=1200,
        hovermode="closest",
    )

    fig.show()

    print("\n🎯 Polarization Trends:")
    print(
        f"Activities with high polarization (PI > 4.0): {len(df[df['moving_power_polarization_index'] > 4.0])}"
    )
    print(
        f"Activities with moderate polarization (2.0 < PI <= 4.0): {len(df[(df['moving_power_polarization_index'] > 2.0) & (df['moving_power_polarization_index'] <= 4.0)])}"
    )
    print(
        f"Activities with low polarization (PI <= 2.0): {len(df[df['moving_power_polarization_index'] <= 2.0])}"
    )

## Training Intensity Distribution (TID) Analysis

Training Intensity Distribution (TID) describes how your training time is split across intensity zones. This dashboard uses a 3-zone model: Z1 (low), Z2 (moderate), and Z3 (high). The **polarized training model** suggests spending most of your time at low intensity (Z1) and a smaller share at high intensity (Z3), with minimal time at moderate intensity (Z2).

### Key TID Metrics:
- **Polarization Index (PI)**: Ratio of (Z1+Z3) to Z2 time - higher values indicate more polarized training
- **Training Distribution Ratio (TDR)**: Z1 / (Z2 + Z3) - measures low intensity emphasis
- **Zone Distribution**: Percentage of time in each of the 3 TID zones
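
As a worked example of the two ratios defined above, here is a minimal sketch that computes them from zone percentages. The helper name `tid_metrics` and the convention of returning infinity when a denominator is zero are assumptions of this example; the processing step that fills the dashboard's columns may handle these cases differently.

```python
def tid_metrics(z1_pct, z2_pct, z3_pct):
    """Return (polarization_index, training_distribution_ratio).

    Inputs are percentages of time in the three TID zones. A zero
    denominator is reported as infinity; that convention is an
    assumption of this sketch.
    """
    high_plus_moderate = z2_pct + z3_pct
    # PI: (Z1 + Z3) / Z2
    pi = (z1_pct + z3_pct) / z2_pct if z2_pct > 0 else float("inf")
    # TDR: Z1 / (Z2 + Z3)
    tdr = z1_pct / high_plus_moderate if high_plus_moderate > 0 else float("inf")
    return pi, tdr


# Example: 75% low, 10% moderate, 15% high intensity
pi, tdr = tid_metrics(75.0, 10.0, 15.0)
print(f"PI = {pi:.2f}, TDR = {tdr:.2f}")  # PI = 9.00, TDR = 3.00
```

With the thresholds used in this dashboard, a PI of 9.0 falls in the "highly polarized" band (PI > 4.0).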

In [None]:
# Performance correlations with time-weighted metrics
performance_cols = [
    "distance",
    "moving_time_hours",
    "elevation_gain",
    "average_power",
    "normalized_power",
    "intensity_factor",
    "training_stress_score",
    "hr_training_stress",
    "average_hr",
    "efficiency_factor",
    "variability_index",
    "power_hr_decoupling",
]

# Calculate correlation matrix
corr_matrix = df[performance_cols].corr().round(2)

# Create heatmap
fig = px.imshow(
    corr_matrix,
    title="Performance Metrics Correlations (Time-Weighted)",
    color_continuous_scale="RdBu",
    # Fix the color range so that zero correlation maps to the scale midpoint
    zmin=-1,
    zmax=1,
    aspect="auto",
    labels=dict(color="Correlation"),
)
fig.update_layout(height=700, width=900)
fig.show()

# Detailed scatter plots
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Power vs Heart Rate",
        "Efficiency Factor vs Intensity",
        "TSS vs hrTSS",
        "Variability Index vs Intensity",
    ),
)

# Power vs HR
fig.add_trace(
    go.Scatter(
        x=df["average_power"],
        y=df["average_hr"],
        mode="markers",
        name="Power vs HR",
        marker=dict(color=df["intensity_factor"], colorscale="Viridis", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>Power: %{x:.0f}W<br>HR: %{y:.0f}bpm<extra></extra>",
    ),
    row=1,
    col=1,
)

# EF vs Intensity
fig.add_trace(
    go.Scatter(
        x=df["intensity_factor"],
        y=df["efficiency_factor"],
        mode="markers",
        name="EF vs IF",
        marker=dict(color=df["power_hr_decoupling"], colorscale="RdYlGn_r", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>IF: %{x:.2f}<br>EF: %{y:.2f}<extra></extra>",
    ),
    row=1,
    col=2,
)

# TSS vs hrTSS
fig.add_trace(
    go.Scatter(
        x=df["training_stress_score"],
        y=df["hr_training_stress"],
        mode="markers",
        name="TSS vs hrTSS",
        marker=dict(color=df["distance"] / 1000, colorscale="Plasma", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>TSS: %{x:.0f}<br>hrTSS: %{y:.0f}<extra></extra>",
    ),
    row=2,
    col=1,
)

# VI vs Intensity
fig.add_trace(
    go.Scatter(
        x=df["intensity_factor"],
        y=df["variability_index"],
        mode="markers",
        name="VI vs IF",
        marker=dict(color=df["training_stress_score"], colorscale="Turbo", size=8),
        text=df["name"],
        hovertemplate="<b>%{text}</b><br>IF: %{x:.2f}<br>VI: %{y:.2f}<extra></extra>",
    ),
    row=2,
    col=2,
)

fig.update_layout(
    height=800,
    width=1200,
    showlegend=False,
    title_text="Performance Metrics Relationships (Time-Weighted)",
)
fig.update_xaxes(title_text="Average Power (W)", row=1, col=1)
fig.update_xaxes(title_text="Intensity Factor", row=1, col=2)
fig.update_xaxes(title_text="TSS", row=2, col=1)
fig.update_xaxes(title_text="Intensity Factor", row=2, col=2)
fig.update_yaxes(title_text="Average HR (bpm)", row=1, col=1)
fig.update_yaxes(title_text="Efficiency Factor", row=1, col=2)
fig.update_yaxes(title_text="hrTSS", row=2, col=1)
fig.update_yaxes(title_text="Variability Index", row=2, col=2)

fig.show()

# Print key correlations
print("\n🔗 Key Correlations:")
print(f"Power vs HR: {corr_matrix.loc['average_power', 'average_hr']:.2f}")
print(
    f"TSS vs hrTSS: {corr_matrix.loc['training_stress_score', 'hr_training_stress']:.2f}"
)
print(
    f"EF vs Decoupling: {corr_matrix.loc['efficiency_factor', 'power_hr_decoupling']:.2f}"
)
print(
    f"VI vs Intensity: {corr_matrix.loc['variability_index', 'intensity_factor']:.2f}"
)

## Geographical Analysis

Let's visualize where your activities take place (if GPS data is available).

In [None]:
from IPython.display import display

# Check if we have GPS data
if "start_latlng" in df.columns:
    # Convert string representation of latlng to actual coordinates
    def parse_latlng(x):
        if isinstance(x, str) and x.strip() and x != "[]":
            try:
                coords = x.strip("[]").split(",")
                if len(coords) == 2:
                    return [float(coords[0].strip()), float(coords[1].strip())]
            except (ValueError, IndexError):
                pass
        return None

    df["start_coords"] = df["start_latlng"].apply(parse_latlng)

    # Filter activities with valid coordinates
    activities_with_coords = df.dropna(subset=["start_coords"])

    if not activities_with_coords.empty:
        # Create a map centered on the mean coordinates
        center_lat = activities_with_coords["start_coords"].apply(lambda x: x[0]).mean()
        center_lng = activities_with_coords["start_coords"].apply(lambda x: x[1]).mean()

        m = folium.Map(location=[center_lat, center_lng], zoom_start=11)

        # Add markers for each activity
        for idx, row in activities_with_coords.iterrows():
            lat, lng = row["start_coords"]
            folium.CircleMarker(
                location=[lat, lng],
                radius=5,
                popup=f"Activity: {row['name']}<br>Date: {row['start_date']}<br>Distance: {row['distance'] / 1000:.1f}km",
                color="red",
                fill=True,
            ).add_to(m)

        # Display the map
        display(m)
    else:
        print("No activities with valid GPS coordinates found.")
else:
    print("No GPS data available in the activities file.")

## Interactive Data Explorer

Use the dropdown menus below to choose which metrics appear on the x-axis, y-axis, and color scale.

In [None]:
# Create an interactive scatter plot with dropdown menus for all metrics
available_metrics = [
    "distance",
    "moving_time_hours",
    "elevation_gain",
    "average_power",
    "normalized_power",
    "intensity_factor",
    "training_stress_score",
    "hr_training_stress",
    "average_hr",
    "max_hr",
    "efficiency_factor",
    "variability_index",
    "power_hr_decoupling",
    "first_half_ef",
    "second_half_ef",
]

# Create initial figure
fig = px.scatter(
    df,
    x="distance",
    y="average_power",
    color="intensity_factor",
    size="training_stress_score",
    hover_data=["name", "start_date"],
    title="Interactive Activity Explorer (Time-Weighted Metrics)",
    labels={
        "distance": "Distance (m)",
        "average_power": "Average Power (W)",
        "intensity_factor": "Intensity Factor",
        "training_stress_score": "TSS",
    },
)

# Create dropdown buttons for x-axis
x_buttons = []
for metric in available_metrics:
    x_buttons.append(
        dict(
            # "update" expects [data_updates, layout_updates]
            args=[
                {"x": [df[metric]]},
                {"xaxis.title.text": metric.replace("_", " ").title()},
            ],
            label=metric.replace("_", " ").title(),
            method="update",
        )
    )

# Create dropdown buttons for y-axis
y_buttons = []
for metric in available_metrics:
    y_buttons.append(
        dict(
            # "update" expects [data_updates, layout_updates]
            args=[
                {"y": [df[metric]]},
                {"yaxis.title.text": metric.replace("_", " ").title()},
            ],
            label=metric.replace("_", " ").title(),
            method="update",
        )
    )

# Create dropdown buttons for color
color_buttons = []
for metric in available_metrics:
    color_buttons.append(
        dict(
            args=[{"marker.color": [df[metric]]}],
            label=metric.replace("_", " ").title(),
            method="update",
        )
    )

# Add dropdown menus
fig.update_layout(
    updatemenus=[
        dict(
            buttons=x_buttons,
            direction="down",
            showactive=True,
            x=0.05,
            xanchor="left",
            y=1.15,
            yanchor="top",
            bgcolor="lightgray",
            bordercolor="black",
            borderwidth=1,
        ),
        dict(
            buttons=y_buttons,
            direction="down",
            showactive=True,
            x=0.35,
            xanchor="left",
            y=1.15,
            yanchor="top",
            bgcolor="lightgray",
            bordercolor="black",
            borderwidth=1,
        ),
        dict(
            buttons=color_buttons,
            direction="down",
            showactive=True,
            x=0.65,
            xanchor="left",
            y=1.15,
            yanchor="top",
            bgcolor="lightgray",
            bordercolor="black",
            borderwidth=1,
        ),
    ],
    annotations=[
        dict(
            text="X-Axis:",
            x=0.02,
            y=1.13,
            yref="paper",
            xref="paper",
            showarrow=False,
            font=dict(size=12),
        ),
        dict(
            text="Y-Axis:",
            x=0.32,
            y=1.13,
            yref="paper",
            xref="paper",
            showarrow=False,
            font=dict(size=12),
        ),
        dict(
            text="Color:",
            x=0.62,
            y=1.13,
            yref="paper",
            xref="paper",
            showarrow=False,
            font=dict(size=12),
        ),
    ],
)

fig.update_layout(height=700, width=1200)
fig.show()

print("\n🎨 Interactive Explorer:")
print(f"Available metrics: {len(available_metrics)}")
print(f"Total activities: {len(df)}")
print(
    "\nUse the dropdown menus above to explore relationships between different metrics!"
)
print("All metrics use time-weighted averaging for improved accuracy.")

## Time-Weighted Averaging Benefits

All metrics in this dashboard use **time-weighted averaging** instead of simple arithmetic means. This provides several key benefits:

### What is Time-Weighted Averaging?

Instead of treating each data point equally (simple mean):
```
Average = Σ(values) / count
```

Time-weighted averaging accounts for the time interval each value represents:
```
Average = Σ(value × time_delta) / Σ(time_delta)
```
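
The difference between the two formulas is easiest to see on a small example. The sketch below is illustrative only: the function names are hypothetical, and it assumes each sample is held until the next timestamp (so the last sample carries no weight). The dashboard's processing step may use a different convention.

```python
def simple_mean(values):
    """Arithmetic mean: every sample counts equally."""
    return sum(values) / len(values)


def time_weighted_mean(timestamps, values):
    """Weight each value by the time until the next sample.

    Assumption of this sketch: a sample is held until the next timestamp,
    so the last sample has no interval and gets zero weight.
    """
    if len(timestamps) != len(values) or len(values) < 2:
        raise ValueError("need at least two samples with matching timestamps")
    deltas = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    weighted_sum = sum(v * dt for v, dt in zip(values, deltas))
    return weighted_sum / sum(deltas)


# Power samples (W): recorded every 1 s during a hard effort,
# then every 5 s during easy riding
timestamps = [0, 1, 2, 3, 8, 13]
power = [300, 300, 300, 100, 100, 100]

print(f"Simple mean:        {simple_mean(power):.1f} W")                     # 200.0 W
print(f"Time-weighted mean: {time_weighted_mean(timestamps, power):.1f} W")  # 146.2 W
```

The simple mean overstates the average here because the hard effort is sampled more densely than the easy riding that follows it.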

### Why It Matters

1. **Recording Gaps**: Auto-pause or GPS loss leaves gaps between samples; time-weighting uses the actual interval each sample covers instead of counting every sample equally
2. **Variable Sampling**: If recording rate changes, values are weighted by their actual duration
3. **Accurate Duration**: TSS/hrTSS calculations use actual elapsed time, not data point count
4. **Better Metrics**: Average power, average HR, efficiency factor, and the metrics derived from them reflect time spent at each value rather than the number of samples

### Impact on Your Data

- **Power Metrics**: Average power, normalized power, and TSS are now more accurate for activities with variable recording
- **Heart Rate**: Average HR and hrTSS properly weighted for time spent at each heart rate
- **Efficiency**: Efficiency Factor and decoupling calculations use time-weighted power and HR
- **Zones**: Time in zones calculated based on actual duration, not sample count

This is especially important for long rides with auto-pause, activities with GPS issues, or any workout with variable recording quality!