# ECE5831 - Final Project: Multi-Task Rider Behavior Modeling for Micromobility Systems

# Phase 1: Data preparation

This phase downloads or loads the raw Divvy/Cyclistic trip data and applies core cleaning rules to produce a consistent, analysis-ready dataset. We validate the schema and datatypes, remove duplicates and invalid trips (e.g., missing timestamps or stations, and durations that are negative or longer than 24 hours), and standardize the time fields so that downstream steps build on a reliable foundation.
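
A minimal pandas sketch of these rules follows. It assumes the raw Divvy columns `ride_id`, `started_at`, and `ended_at`. The helper name `clean_trips` is hypothetical, and the script's actual logic (for example, the separate `*_clean` string columns in the log) may differ. Following the log, durations that are negative or longer than 24 hours are dropped, while zero-length trips are kept, which is consistent with the minimum of 0 minutes in the Phase 2 summary.

```python
import pandas as pd

MAX_DURATION_MIN = 24 * 60  # durations above 24 h are treated as invalid, per the Phase 1 log


def clean_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dedupe, parse timestamps, and drop invalid durations."""
    df = raw.drop_duplicates(subset="ride_id").copy()

    # Strip stray whitespace, then parse; unparseable values become NaT instead of raising.
    for col in ("started_at", "ended_at"):
        df[f"{col}_parsed"] = pd.to_datetime(df[col].astype(str).str.strip(), errors="coerce")

    # Drop trips whose start or end time could not be parsed.
    df = df.dropna(subset=["started_at_parsed", "ended_at_parsed"])

    # Duration in minutes; keep 0 <= duration <= 24 h, so zero-length trips survive.
    duration = (df["ended_at_parsed"] - df["started_at_parsed"]).dt.total_seconds() / 60
    df = df.assign(trip_duration_min=duration)
    return df[duration.between(0, MAX_DURATION_MIN)].reset_index(drop=True)


raw = pd.DataFrame({
    "ride_id": ["a", "a", "b", "c", "d", "e"],
    "started_at": ["2024-04-01 08:00:00", "2024-04-01 08:00:00", "2024-04-01 09:00:00",
                   "not a time", "2024-04-02 10:00:00", "2024-04-03 07:00:00"],
    "ended_at": ["2024-04-01 08:15:00", "2024-04-01 08:15:00", "2024-04-01 08:50:00",
                 "2024-04-01 10:00:00", "2024-04-04 11:00:00", "2024-04-03 07:00:00"],
})
clean = clean_trips(raw)
print(clean[["ride_id", "trip_duration_min"]])
```

In this toy input, the duplicate `a`, the negative trip `b`, the unparseable `c`, and the 49-hour trip `d` are removed, while the zero-length trip `e` is kept.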

In [8]:
!python phase1_data_preparation.py

[INFO] Ensuring clean and parsed datetime columns are consistent...
[INFO] Creating clean column 'started_at_clean' from 'started_at'...
[INFO] Creating parsed column 'started_at_parsed' from 'started_at_clean'...
[INFO] Column 'started_at_parsed': missing before=4,755,050, after=0 (fixed 4,755,050)
[INFO] Creating clean column 'ended_at_clean' from 'ended_at'...
[INFO] Creating parsed column 'ended_at_parsed' from 'ended_at_clean'...
[INFO] Column 'ended_at_parsed': missing before=4,755,050, after=0 (fixed 4,755,050)
[INFO] Validating columns and dtypes...
[CHECK] Null count in 'started_at_parsed': 0
[CHECK] Null count in 'ended_at_parsed': 0
[INFO] Adding duration features based on started_at_parsed / ended_at_parsed...
[INFO] Filtering trips with invalid durations (negative or > 24h)...
[INFO] Removing 7,041 trips with invalid durations.
[INFO] Running Phase 1 summary checks...

PHASE 1 SUMMARY: DATASET OVERVIEW

[1] SHAPE
  Rows   : 5,772,527
  Columns: 19

[2] DATE RANGE (started_

# Phase 2: Feature engineering

This phase derives modeling features from the cleaned trips: time-based signals (year/month/day/hour/weekday, weekend flags), behavioral features (`is_roundtrip`, log-transformed duration), and station/location attributes used by the models. The strict temporal split (Train → Validation → Test) is applied to these features in Phase 3.
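
The feature construction can be sketched in pandas as follows. `is_roundtrip` and `trip_duration_min_log1p` are column names taken from the logs. The time-feature names and the helper `add_features` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: time-based and behavioral features for cleaned trips."""
    out = df.copy()
    ts = out["started_at_parsed"]
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day"] = ts.dt.day
    out["hour"] = ts.dt.hour
    out["weekday"] = ts.dt.weekday  # Monday=0 ... Sunday=6
    out["is_weekend"] = out["weekday"] >= 5
    # A roundtrip starts and ends at the same station; missing IDs compare unequal.
    out["is_roundtrip"] = out["start_station_id"] == out["end_station_id"]
    # log1p compresses the long right tail of durations and maps 0 minutes to 0.
    out["trip_duration_min_log1p"] = np.log1p(out["trip_duration_min"])
    return out


trips = pd.DataFrame({
    "started_at_parsed": pd.to_datetime(["2024-04-06 17:30:00", "2024-04-08 08:05:00"]),
    "start_station_id": ["S1", "S2"],
    "end_station_id": ["S1", "S3"],
    "trip_duration_min": [0.0, 12.5],
})
feats = add_features(trips)
print(feats[["hour", "weekday", "is_weekend", "is_roundtrip", "trip_duration_min_log1p"]])
```

The first toy trip falls on a Saturday and returns to its start station, so it is flagged as both a weekend trip and a roundtrip.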

In [9]:
!python phase2_feature_engineering.py

[INFO] Loading Phase 1 dataset from: data/processed/full_bike_dataset_phase1.parquet
[INFO] Validating Phase 1 input schema...
[INFO] Filtering trips with invalid durations (negative or > 24h)...
[INFO] Found 0 trips with invalid durations.
[INFO] No invalid durations found; no rows removed.
[INFO] Adding time-based features...
[INFO] Adding behavioral features (is_roundtrip, log-duration)...
[INFO] Running Phase 2 summary checks...

PHASE 2 SUMMARY: FEATURE ENGINEERING OVERVIEW

[1] SHAPE
  Rows   : 5,772,527
  Columns: 29

[2] DATE RANGE (started_at_parsed)
  Min: 2024-04-01 00:00:42
  Max: 2025-03-31 23:50:16.157000

[3] TRIP DURATION SUMMARY AFTER FILTERING (minutes)
count    5.772527e+06
mean     1.527964e+01
std      2.990354e+01
min      0.000000e+00
1%       2.954000e-01
5%       2.250600e+00
50%      9.651567e+00
95%      4.205955e+01
99%      9.407585e+01
max      1.439935e+03
Name: trip_duration_min, dtype: float64

    Sanity checks (post-filter):
      Negative durations (

# Phase 3: Build datasets and artifacts

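This phase converts the engineered features into model-ready tensors/tables. It computes the demand-contribution target (`start_station_day_share`), builds and saves station vocabularies/ID mappings and other categorical encodings, performs the time-based train/validation/test split, imputes missing numeric values with training-set means, and applies normalization/scaling to numerical features. The resulting datasets for both the baselines and HVAE training are saved as reusable artifacts (e.g., `phase3_artifacts.pkl`), so experiments stay reproducible and consistent across runs.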
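
The category sizes reported later (`num_ride_types: 4`, `num_member_types: 3`) and the `<UNK>` row in the Phase 9 classification report suggest that each vocabulary reserves index 0 for unknown or missing values. Below is a minimal sketch of that encoding. It assumes the vocabulary is built from the training split and sorted alphabetically, which matches the label order in the Phase 9 report. Both are assumptions, and `build_vocab` and `encode_categories` are hypothetical names.

```python
import pandas as pd

UNK = "<UNK>"


def build_vocab(train_values: pd.Series) -> dict[str, int]:
    """Map each category seen in training to an integer; index 0 is reserved for <UNK>."""
    categories = sorted(train_values.dropna().astype(str).unique())
    return {UNK: 0, **{cat: i + 1 for i, cat in enumerate(categories)}}


def encode_categories(values: pd.Series, vocab: dict[str, int]) -> pd.Series:
    """Encode values; missing or unseen categories fall back to the <UNK> id (0)."""
    return values.astype(str).map(vocab).fillna(vocab[UNK]).astype("int64")


train_types = pd.Series(["electric_bike", "classic_bike", "electric_scooter", "classic_bike"])
vocab = build_vocab(train_types)
test_types = pd.Series(["classic_bike", "electric_bike", None, "cargo_bike"])
codes = encode_categories(test_types, vocab)
print(vocab)
print(codes.tolist())
```

With three ride types seen in training, the vocabulary has four entries, which is the same size as `num_ride_types` in the later logs.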

In [12]:
!python phase3_build_datasets.py

[INFO] Loading Phase 2 dataset from: data/processed/full_bike_dataset_phase2.parquet
[INFO] Validating Phase 2 schema...
[INFO] Computing demand contribution target (start_station_day_share)...
[INFO] Encoding categorical columns...
[INFO] Encoding categorical column 'start_station_id'...
[INFO] Encoding categorical column 'end_station_id'...
[INFO] Encoding categorical column 'rideable_type'...
[INFO] Encoding categorical column 'member_casual'...
[INFO] Adding integer versions of boolean flags...
[INFO] Performing time-based train/val/test split...
[INFO] Time-based split sizes:
  Train: 5,322,818
  Val  : 151,832
  Test : 297,877
[INFO] Imputing missing numeric features and targets using train means...
[INFO] Imputation mean for 'trip_duration_min': 15.564427
[INFO] Imputation mean for 'trip_duration_min_log1p': 2.415799
[INFO] Imputation mean for 'start_lat': 41.902416
[INFO] Imputation mean for 'start_lng': -87.646130
[INFO] Imputation mean for 'end_lat': 41.902805
[INFO] Imputati
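
The split and imputation steps in this log can be sketched as follows. Rows are assigned to train/validation/test by start time, and missing numeric values in every split are filled with means computed on the training split only, so that no statistics from later periods leak into training. The cutoff dates and helper names are hypothetical because the log does not show the actual boundaries.

```python
import numpy as np
import pandas as pd


def time_split(df, val_start, test_start, time_col="started_at_parsed"):
    """Chronological split: train < val_start <= val < test_start <= test."""
    ts = df[time_col]
    train = df[ts < val_start]
    val = df[(ts >= val_start) & (ts < test_start)]
    test = df[ts >= test_start]
    return train, val, test


def impute_with_train_means(train, val, test, cols):
    """Fill NaNs in all splits with per-column means computed on train only."""
    means = train[cols].mean()  # NaNs are skipped when computing the means
    return means, train.fillna(means), val.fillna(means), test.fillna(means)


trips = pd.DataFrame({
    "started_at_parsed": pd.to_datetime(
        ["2025-01-10", "2025-01-20", "2025-02-05", "2025-03-02", "2025-03-15"]),
    "start_lat": [41.90, np.nan, 41.88, np.nan, 41.95],
    "trip_duration_min": [10.0, 20.0, np.nan, 5.0, 7.0],
})
train, val, test = time_split(trips, pd.Timestamp("2025-02-01"), pd.Timestamp("2025-03-01"))
means, train, val, test = impute_with_train_means(
    train, val, test, ["start_lat", "trip_duration_min"])
print(means)
```

The validation and test gaps are filled with the January (train) means, not with statistics from their own periods.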

# Phase 4–5: Baselines, HVAE definition & training

Phase 4 trains and evaluates independent single-task baselines for each target (duration, demand contribution, rideable type). These benchmarks quantify what can be achieved without shared representations and provide a fair comparison point for measuring the benefit of multi-task learning.


Phase 5 trains the proposed HVAE with a shared latent structure to jointly model multiple outcomes. We report test performance on all tasks using the same feature set and the same temporal test split, then compare against baselines to quantify multi-task gains.
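
The layer sizes printed in the training log below point to a specific input layout. The encoder appears to concatenate a 32-d start-station embedding, a 32-d end-station embedding, a 4-d member embedding, and 10 numeric features (32 + 32 + 4 + 10 = 78 inputs). These pass through a shared 256-unit MLP, which produces a 16-d global latent. A 16-d individual latent is then conditioned on the hidden state plus that global sample (256 + 16 = 272 inputs to `fc_individual_prep`). The NumPy sketch below mirrors that forward pass with random weights. It assumes a ReLU after `fc_individual_prep` and standard-normal priors for both latents in the KL term. The logs confirm neither assumption, and the decoders/task heads are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Widths read off the printed HierarchicalVAE (hvae_v2); weights are random stand-ins.
IN_DIM, HIDDEN, Z_DIM = 78, 256, 16  # 78 = 32 + 32 + 4 + 10


def linear(in_dim, out_dim):
    """Random weights and zero bias, standing in for a trained nn.Linear."""
    return rng.normal(0.0, 0.05, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


def reparameterize(mu, logvar):
    """z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


params = {
    "enc1": linear(IN_DIM, HIDDEN), "enc2": linear(HIDDEN, HIDDEN),
    "mu_g": linear(HIDDEN, Z_DIM), "logvar_g": linear(HIDDEN, Z_DIM),
    "prep_i": linear(HIDDEN + Z_DIM, HIDDEN),  # 256 + 16 = 272 inputs
    "mu_i": linear(HIDDEN, Z_DIM), "logvar_i": linear(HIDDEN, Z_DIM),
}


def apply(name, x):
    w, b = params[name]
    return x @ w + b


def hvae_encode(x):
    """Shared MLP -> global latent -> individual latent conditioned on [h, z_global]."""
    h = relu(apply("enc2", relu(apply("enc1", x))))
    mu_g, logvar_g = apply("mu_g", h), apply("logvar_g", h)
    z_g = reparameterize(mu_g, logvar_g)
    h_i = relu(apply("prep_i", np.concatenate([h, z_g], axis=-1)))  # assumed ReLU
    mu_i, logvar_i = apply("mu_i", h_i), apply("logvar_i", h_i)
    z_i = reparameterize(mu_i, logvar_i)
    kl = kl_to_standard_normal(mu_g, logvar_g) + kl_to_standard_normal(mu_i, logvar_i)
    return z_g, z_i, kl


x = rng.standard_normal((8, IN_DIM))  # a batch of 8 encoded trips
z_g, z_i, kl = hvae_encode(x)
print(z_g.shape, z_i.shape, kl.shape)
```

Conditioning the individual latent on the global sample is what makes the hierarchy explicit. Trip-level variation is modeled relative to a shared "intent" code rather than independently of it.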


In [6]:
!python phase5_train_hvae.py --data-dir data/model_ready --checkpoint-dir checkpoints/hvae_v2 --epochs 10 --batch-size 2048 --lr 5e-4

[INFO] Using device: cuda
[INFO] Loading Phase 3 artifacts from: data/model_ready\phase3_artifacts.pkl
[INFO] Category sizes:
  num_start_stations: 1807
  num_end_stations  : 1802
  num_ride_types    : 4
  num_member_types  : 3
  num_numeric_features: 10
[INFO] Building datasets...
[INFO] Building DataLoaders...
HierarchicalVAE(
  (emb_start_station): Embedding(1807, 32)
  (emb_end_station): Embedding(1802, 32)
  (emb_member): Embedding(3, 4)
  (encoder_mlp): Sequential(
    (0): Linear(in_features=78, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
  )
  (fc_mu_global): Linear(in_features=256, out_features=16, bias=True)
  (fc_logvar_global): Linear(in_features=256, out_features=16, bias=True)
  (fc_individual_prep): Linear(in_features=272, out_features=256, bias=True)
  (fc_mu_individual): Linear(in_features=256, out_features=16, bias=True)
  (fc_logvar_individual): Linear(in_features=256, out_features=16, bia



In [None]:
!python phase5_train_hvae_2.py  --data-dir data/model_ready  --checkpoint-dir checkpoints/hvae_v3  --batch-size 4096  --lr 5e-4

[INFO] Using device: cuda
[INFO] Loading Phase 3 artifacts from: data/model_ready/phase3_artifacts.pkl
[INFO] Category sizes:
  num_start_stations: 1807
  num_end_stations  : 1802
  num_ride_types    : 4
  num_member_types  : 3
  numeric_feature_cols (raw): 10
[INFO] Train n=5,322,818  Val n=151,832  Test n=297,877
[WARN] Leakage guard: OFF. You may be leaking targets via x_num.
[INFO] Using 10 numeric cols for x_num.
[INFO] Using 10 numeric cols for x_num.
[INFO] Using 10 numeric cols for x_num.
HierarchicalVAE(
  (emb_start_station): Embedding(1807, 32)
  (emb_member): Embedding(3, 4)
  (encoder_mlp): Sequential(
    (0): Linear(in_features=46, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
  )
  (fc_mu_global): Linear(in_features=256, out_features=16, bias=True)
  (fc_logvar_global): Linear(in_features=256, out_features=16, bias=True)
  (fc_individual_prep): Linear(in_features=272, out_features=256, bias=Tru
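
In the `hvae_v3` run, the encoder input shrinks to 46 features (32 for the start-station embedding + 4 for membership + 10 numeric). The script also warns that the leakage guard is off, so target columns may still be present in `x_num`. Below is a minimal sketch of such a guard. It assumes the target columns are `trip_duration_min`, `trip_duration_min_log1p`, and `start_station_day_share` (names from the Phase 3 log). The function `apply_leakage_guard` is hypothetical and is not the script's actual implementation.

```python
# Hypothetical target names, taken from the Phase 3 log; adjust to the real schema.
TARGET_COLS = frozenset({"trip_duration_min", "trip_duration_min_log1p", "start_station_day_share"})


def apply_leakage_guard(numeric_cols, target_cols=TARGET_COLS, enabled=True):
    """Drop target (or target-derived) columns from the numeric features fed to x_num."""
    if not enabled:
        return list(numeric_cols)
    dropped = [c for c in numeric_cols if c in target_cols]
    if dropped:
        print(f"[INFO] Leakage guard: dropping {dropped} from x_num.")
    return [c for c in numeric_cols if c not in target_cols]


cols = ["start_lat", "start_lng", "hour", "trip_duration_min_log1p"]  # illustrative only
x_num_cols = apply_leakage_guard(cols)
print(x_num_cols)
```

Any duration-derived column left in `x_num` would let the duration head read its own target, which would inflate test metrics.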

# Phase 6: Anomaly scoring

This phase uses reconstruction-based anomaly scores to flag unusual trips without supervision. We analyze the score distribution, select operational percentile thresholds (e.g., p95/p99), and qualitatively inspect high-scoring trips to categorize anomaly types (temporal, behavioral, spatial, and vehicle mismatch).
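
Percentile thresholding can be sketched with NumPy as follows. The helper name `percentile_flags` and the synthetic scores are illustrative. With NumPy's default linear interpolation and 297,877 distinct scores, the `>=` rule flags 14,894 trips at p95 and 2,979 at p99, which are the counts reported in the Phase 8 log.

```python
import numpy as np


def percentile_flags(scores, quantiles=(0.95, 0.99)):
    """Return {q: (threshold, mask)} where mask marks scores >= the q-quantile."""
    scores = np.asarray(scores, dtype=float)
    out = {}
    for q in quantiles:
        thresh = np.quantile(scores, q)  # linear interpolation (NumPy default)
        out[q] = (thresh, scores >= thresh)
    return out


rng = np.random.default_rng(0)
scores = rng.standard_normal(297_877)  # stand-in scores; same count as the test split
flags = percentile_flags(scores)
for q, (thresh, mask) in flags.items():
    print(f"p{round(q * 100)}: threshold={thresh:.4f}, flagged={mask.sum()} ({mask.mean():.2%})")
```

Because thresholds are taken from the score distribution itself, they adapt to whatever scale the anomaly score has. That matters here, since the test scores range from about -5.6 to nearly 9,000.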

In [8]:
!python phase6_anomaly_scoring.py --data-dir data/model_ready --checkpoint-path checkpoints/hvae_v2/best_model.pt --split test --output-dir data/anomaly_scores --batch-size 4096 --device cuda


[INFO] Using device: cuda
[INFO] Loading Phase 3 artifacts from: data/model_ready\phase3_artifacts.pkl
[INFO] Category sizes for model reconstruction:
  num_start_stations: 1807
  num_end_stations  : 1802
  num_ride_types    : 4
  num_member_types  : 3
  num_numeric_features: 10
[INFO] Loading model checkpoint from: checkpoints/hvae_v2/best_model.pt
[INFO] Building anomaly dataset for split='test'...
[INFO] Computing anomaly scores on 297,877 samples...
  [INFO] Processed 204,800 samples...

ANOMALY SCORE SUMMARY (test split)
count    297877.000000
mean         -1.039350
std          21.731602
min          -5.626651
50%          -1.601471
90%          -0.902468
95%          -0.273548
99%           3.682961
max        8967.341797
Name: anomaly_score, dtype: float64

[TOP 10 MOST ANOMALOUS TRIPS]
                 ride_id       started_at_parsed  ... anomaly_score anomaly_rank
171004  7E66493C044A73FE 2025-03-24 23:23:26.152  ...   8967.341797            1
76658   6B562428E92AB639 2025-03



# Phase 7: Latent space analysis (intent discovery)

This phase evaluates the interpretability of the learned global latent space by extracting latent codes (`z_global`) and clustering them with k-means. We summarize cluster-level behavior (duration, casual/member share, weekend share, roundtrips, peak-hour patterns) to identify “intent-like” modes such as commuting, leisure, and tourism.
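
Below is a minimal sketch of this step using scikit-learn's `KMeans`. It assumes a trips DataFrame that is aligned row by row with `z_global` and contains `trip_duration_min`, `is_roundtrip`, and two boolean flags whose names are hypothetical here (`is_casual`, `is_weekend`). The output column `intent_cluster` matches the log, but `cluster_latents` is an illustrative name, and the script's actual k-means settings are not shown. The example uses two well-separated synthetic blobs instead of K=8.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_latents(z_global, trips, k=8, seed=0):
    """Assign each trip to a k-means cluster of its global latent and summarize clusters."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labeled = trips.assign(intent_cluster=km.fit_predict(z_global))
    summary = labeled.groupby("intent_cluster").agg(
        n_trips=("trip_duration_min", "size"),
        mean_duration=("trip_duration_min", "mean"),
        median_duration=("trip_duration_min", "median"),
        casual_share=("is_casual", "mean"),
        weekend_share=("is_weekend", "mean"),
        roundtrip_share=("is_roundtrip", "mean"),
    )
    return labeled, summary


rng = np.random.default_rng(0)
z = np.vstack([rng.normal(-3, 0.5, (50, 16)), rng.normal(3, 0.5, (50, 16))])
trips = pd.DataFrame({
    "trip_duration_min": np.r_[np.full(50, 8.0), np.full(50, 30.0)],
    "is_casual": np.r_[np.zeros(50, bool), np.ones(50, bool)],
    "is_weekend": np.r_[np.zeros(50, bool), np.ones(50, bool)],
    "is_roundtrip": np.zeros(100, bool),
})
labeled, summary = cluster_latents(z, trips, k=2)
print(summary)
```

Clustering only `z_global` (not `z_individual`) keeps the clusters focused on the shared, intent-level structure the hierarchy is meant to capture.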


In [1]:
!python phase7_latent_analysis.py --data-dir data/model_ready --anomaly-path data/anomaly_scores/anomaly_scores_test.parquet --checkpoint-path checkpoints/hvae_v2/best_model.pt --split test --output-dir data/latent_analysis --num-clusters 8 --batch-size 4096 --device cuda


[INFO] Using device: cuda
[INFO] Loading Phase 3 artifacts from: data/model_ready\phase3_artifacts.pkl
[INFO] Category sizes for model reconstruction:
  num_start_stations: 1807
  num_end_stations  : 1802
  num_ride_types    : 4
  num_member_types  : 3
  num_numeric_features: 10
[INFO] Loading model checkpoint from: checkpoints/hvae_v2/best_model.pt
[INFO] Building latent dataset from: data/anomaly_scores/anomaly_scores_test.parquet
[INFO] Extracting latents for 297,877 samples...
  [INFO] Processed 204,800 samples...
[INFO] Latent shapes: z_global=(297877, 16), z_individual=(297877, 16)
[INFO] Running k-means with K=8 on z_global...

CLUSTER SUMMARY (split=test, K=8)

[Cluster sizes]
intent_cluster
0    61219
1    40764
2    43541
3    22789
4    55084
5     9656
6     3696
7    61128
Name: count, dtype: int64

[Cluster: duration & anomaly stats]
                n_trips  mean_duration  median_duration  mean_anom    p95_anom
intent_cluster                                               



# Phase 8: Case studies

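Phase 8 turns the anomaly scores and intent clusters into concrete case studies. It selects the global top-200 anomalies and the top-50 anomalies within each cluster, flags trips above the p95 and p99 score thresholds, and reports per-cluster anomaly counts. Individual trips can then be inspected to explain why the model considers them unusual.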
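
The two selections can be sketched in pandas as follows. Column names follow the logs (`anomaly_score`, `intent_cluster`, `ride_id`), and the parameters mirror the `--top-k-global` and `--top-k-per-cluster` flags. The function name `select_case_studies` is hypothetical.

```python
import pandas as pd


def select_case_studies(df, top_k_global=200, top_k_per_cluster=50):
    """Top anomalies overall and within each intent cluster, ranked by anomaly_score."""
    global_top = df.nlargest(top_k_global, "anomaly_score")
    per_cluster = (
        df.sort_values("anomaly_score", ascending=False)
          .groupby("intent_cluster")
          .head(top_k_per_cluster)  # keeps the descending order within each cluster
    )
    return global_top, per_cluster


df = pd.DataFrame({
    "ride_id": list("abcdef"),
    "intent_cluster": [0, 0, 0, 1, 1, 1],
    "anomaly_score": [5.0, -1.0, 2.0, 9.0, 0.5, -3.0],
})
global_top, per_cluster = select_case_studies(df, top_k_global=2, top_k_per_cluster=2)
print(global_top["ride_id"].tolist(), per_cluster["ride_id"].tolist())
```

The per-cluster selection surfaces anomalies in small clusters that a single global ranking would bury under the largest scores.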

In [2]:
!python phase8_case_studies.py --input-path data/latent_analysis/latent_analysis_test.parquet --split test --output-dir data/case_studies --top-k-global 200 --top-k-per-cluster 50 --p95-thresh 0.95 --p99-thresh 0.99


[INFO] Loading latent+anomaly data from: data/latent_analysis/latent_analysis_test.parquet
[INFO] Selecting global top-200 anomalies...
[INFO] Saved global top-200 anomalies to:
       data/case_studies\global_top200_anomalies_test.parquet
       data/case_studies\global_top200_anomalies_test.csv
[INFO] Selecting top-50 anomalies per cluster...
[INFO] Saved per-cluster top-50 anomalies to:
       data/case_studies\cluster_top50_anomalies_test.parquet
       data/case_studies\cluster_top50_anomalies_test.csv
[INFO] Computing threshold-based anomaly flags...
  [STATS] p95 threshold (top 5%): -0.2735
          Count >= p95: 14894 (5.00 % of trips)
  [STATS] p99 threshold (top 1%): 3.6830
          Count >= p99: 2979 (1.00 % of trips)
[INFO] Saved latent+flags dataframe to: data/case_studies\latent_with_flags_test.parquet

[Cluster-level anomaly counts (p99 flag)]
                n_anom_p99  n_total  pct_anom_p99
intent_cluster                                   
0                        2 

# Phase 9: Plots and results

In [3]:
!python phase9_plots_and_results.py

[INFO] Using device: cuda
[INFO] Loading Phase 3 artifacts from: data/model_ready\phase3_artifacts.pkl
[INFO] Loading model checkpoint from: checkpoints/hvae_v2/best_model.pt

=== Rideable-type classification (test) ===

[RESULT] Test rideable_type accuracy: 84.70%

[CONFUSION MATRIX (raw counts)]
[[     0      0      0      0]
 [     0  94324  16427      0]
 [     0  29132 157992      2]
 [     0      0      0      0]]

[CLASSIFICATION REPORT]
                  precision    recall  f1-score   support

           <UNK>       0.00      0.00      0.00         0
    classic_bike       0.76      0.85      0.81    110751
   electric_bike       0.91      0.84      0.87    187126
electric_scooter       0.00      0.00      0.00         0

        accuracy                           0.85    297877
       macro avg       0.42      0.42      0.42    297877
    weighted avg       0.85      0.85      0.85    297877

[INFO] Saved confusion matrix figure to: figures\fig_rideable_confusion_test.png

==

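
In the report, the `<UNK>` and `electric_scooter` rows have zero support in the test split. scikit-learn reports the scores that are undefined for them as 0.00 and raises `UndefinedMetricWarning`, and this drags the macro average down to 0.42. Recomputing the metrics from the confusion matrix values copied from the output makes this explicit. Over the two classes that are actually present, macro F1 is about 0.84. The weighted average (0.85) is unaffected because zero-support classes receive zero weight.

```python
import numpy as np

labels = ["<UNK>", "classic_bike", "electric_bike", "electric_scooter"]
# Rows = true class, columns = predicted class (values from the Phase 9 output).
cm = np.array([
    [0, 0, 0, 0],
    [0, 94324, 16427, 0],
    [0, 29132, 157992, 2],
    [0, 0, 0, 0],
])

tp = np.diag(cm).astype(float)
support = cm.sum(axis=1)    # true count per class
predicted = cm.sum(axis=0)  # predicted count per class
accuracy = tp.sum() / cm.sum()

# Undefined ratios (0/0) are set to 0.0, mirroring scikit-learn's default behavior.
precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
f1 = np.divide(2 * precision * recall, precision + recall,
               out=np.zeros_like(tp), where=(precision + recall) > 0)

present = support > 0  # classes that actually occur in the test split
print(f"accuracy                 : {accuracy:.4f}")
print(f"macro F1, all 4 labels   : {f1.mean():.2f}")
print(f"macro F1, present labels : {f1[present].mean():.2f}")
print(f"weighted F1              : {np.average(f1, weights=support):.2f}")
```

For this model, the present-class macro F1 (or the weighted average) is the more informative summary. The four-label macro average mostly reflects labels that never occur in the test month.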
