# WiDS Datathon 2026 - TRACK 1
Predicting Delayed Evacuation Alerts for Equitable Emergency Response

### Problem Definition

Wildfires pose a rapidly evolving threat to communities, where timely evacuation alerts can be the difference between safety and severe harm.  
However, evacuation alerts are not always issued promptly after a wildfire event is first reported. Delays in alert issuance may disproportionately affect vulnerable populations and reduce the effectiveness of emergency response.

### Decision Context

Emergency managers must decide **when** to issue evacuation alerts under uncertainty, often based on incomplete or evolving information.  
This project aims to support that decision-making process by identifying situations where an evacuation alert is likely to be **issued late**, allowing authorities to intervene earlier and reduce risk.

Rather than predicting wildfire severity, we focus on **alert timeliness**, a critical but underexplored operational dimension of emergency management.


### Selected Track: Accelerating Equitable Evacuations

This project follows **Track 1: Accelerating Equitable Evacuations**.

Our objective is to analyze and model delays in evacuation alerts following wildfire event reports, with the goal of:
- identifying patterns associated with delayed alerts,
- quantifying alert lag in a data-driven way,
- and proposing a predictive framework that can flag high-risk situations early.

The ultimate goal is not only predictive performance, but also actionable insight for emergency response planning.


### Data Sources

We use the official **WiDS Datathon WatchDuty wildfire dataset**, which integrates wildfire events, evacuation zones, and detailed change logs.

The key data sources used in this project are:

- **Geo Events (`geo_events_geoevent.csv`)**  
  Records wildfire-related events and incidents.

- **Geo Event Change Log (`geo_events_geoeventchangelog.csv`)**  
  Tracks updates and modifications to wildfire events over time.  
  This dataset is critical for identifying the earliest moment an event was reported.

- **Evacuation Zones (`evac_zones_gis_evaczone.csv`)**  
  Contains evacuation zone declarations, including the timestamp at which alerts become effective.

- **Evacuation Zone – Geo Event Mapping (`evac_zone_status_geo_event_map.csv`)**  
  Links wildfire events to affected evacuation zones.

These datasets allow us to reconstruct the timeline between the first report of a wildfire event and the issuance of evacuation alerts.
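A minimal sketch of loading the four tables with pandas, assuming the CSVs sit in a local `data/` folder. The file names come from the list above; the short keys (`events`, `event_changes`, `zones`, `zone_event_map`) and the helper `load_tables` are names chosen for this illustration only.

```python
from pathlib import Path

import pandas as pd

# File names as listed above; the dictionary keys are labels chosen for this sketch.
TABLE_FILES = {
    "events": "geo_events_geoevent.csv",
    "event_changes": "geo_events_geoeventchangelog.csv",
    "zones": "evac_zones_gis_evaczone.csv",
    "zone_event_map": "evac_zone_status_geo_event_map.csv",
}


def load_tables(data_dir="data"):
    """Read the four CSVs into a dict of DataFrames, failing early if any file is missing."""
    data_dir = Path(data_dir)
    missing = [name for name in TABLE_FILES.values() if not (data_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing files in {data_dir}: {missing}")
    # low_memory=False makes pandas infer each column's dtype from the whole file.
    return {
        key: pd.read_csv(data_dir / name, low_memory=False)
        for key, name in TABLE_FILES.items()
    }
```

Calling `load_tables("data")` after the download step would return one DataFrame per table, keyed by the labels above.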


### Analytical Unit

The primary unit of analysis in this project is a **(wildfire event × evacuation zone)** pair.

This choice reflects how evacuation decisions are made in practice:
- a single wildfire event may affect multiple evacuation zones,
- and each zone may receive alerts at different times.
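
To make the unit of analysis concrete, the sketch below builds one row per (event, zone) pair from a toy mapping table. The column names `geo_event_id` and `evac_zone_id` are assumptions for illustration and may differ from the real schema.

```python
import pandas as pd

# Toy stand-in for the event-to-zone mapping table; column names are assumed.
zone_event_map = pd.DataFrame({
    "geo_event_id": [101, 101, 101, 202],
    "evac_zone_id": [1, 2, 2, 1],
})

# Keep one row per unique (wildfire event, evacuation zone) pair.
pairs = (
    zone_event_map[["geo_event_id", "evac_zone_id"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# A single event can map to several zones.
zones_per_event = pairs.groupby("geo_event_id")["evac_zone_id"].nunique()
```

Deduplicating first matters because a mapping table may record the same pair more than once, for example once per status change.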

### Target Variable: Delayed Evacuation Alert

To support decision-making, we define a binary target variable:

**Late Evacuation Alert**

An alert is considered *late* if the alert lag exceeds a data-driven threshold, where the alert lag is the time between:
- the first recorded report of a wildfire event, and
- the time at which an evacuation alert becomes effective for a given zone.

This formulation allows us to frame the problem as a classification task, where the goal is to identify high-risk situations for delayed alerts.
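
One way to write the definition above down formally is sketched here. The notation is introduced purely for illustration: $e$ indexes wildfire events, $z$ indexes evacuation zones, and $\tau$ stands for the lateness threshold, whose value is not fixed at this point.

```latex
\mathrm{lag}_{e,z} = t^{\mathrm{alert}}_{e,z} - t^{\mathrm{report}}_{e},
\qquad
y_{e,z} =
\begin{cases}
1 & \text{if } \mathrm{lag}_{e,z} > \tau \quad \text{(late alert)} \\
0 & \text{otherwise (on-time alert)}
\end{cases}
```

Here $t^{\mathrm{report}}_{e}$ is the first recorded report of event $e$ and $t^{\mathrm{alert}}_{e,z}$ is the time at which the alert for zone $z$ becomes effective.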


### Methodological Approach

Our analysis proceeds in the following steps:

1. Identify the earliest reported timestamp for each wildfire event using change logs.
2. Extract evacuation alert effective times for each evacuation zone.
3. Compute the time delay ("alert lag") between event reporting and alert issuance.
4. Define a data-driven threshold to label alerts as delayed or on-time.
5. Explore patterns and drivers of delayed alerts through exploratory data analysis.
6. Build an interpretable classification model to flag high-risk cases.
7. Translate model outputs into a decision-support framework.

This structured approach is intended to support transparency, reproducibility, and relevance for real-world use.
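
Steps 1 to 4 can be sketched on toy data as follows. Everything specific here is an assumption for illustration: the column names (`geo_event_id`, `evac_zone_id`, `date_created`, `alert_effective_at`), the timestamps, and the use of the 75th percentile as the threshold. The real threshold would be chosen after inspecting the lag distribution.

```python
import pandas as pd

# Toy stand-ins for the change log and the per-zone alert times (assumed columns).
changelog = pd.DataFrame({
    "geo_event_id": [101, 101, 202, 303],
    "date_created": pd.to_datetime(
        ["2025-01-07 10:00", "2025-01-07 10:30", "2025-01-08 09:00", "2025-01-09 12:00"],
        utc=True,
    ),
})
alerts = pd.DataFrame({
    "geo_event_id": [101, 101, 202, 303],
    "evac_zone_id": [1, 2, 1, 3],
    "alert_effective_at": pd.to_datetime(
        ["2025-01-07 11:00", "2025-01-07 16:00", "2025-01-08 09:30", "2025-01-09 14:00"],
        utc=True,
    ),
})

# Step 1: earliest change-log timestamp per event.
first_report = (
    changelog.groupby("geo_event_id")["date_created"]
    .min()
    .rename("first_reported_at")
    .reset_index()
)

# Steps 2-3: attach the first report to each (event, zone) pair and compute the lag in hours.
lags = alerts.merge(first_report, on="geo_event_id", how="left", validate="many_to_one")
lags["alert_lag_hours"] = (
    lags["alert_effective_at"] - lags["first_reported_at"]
).dt.total_seconds() / 3600

# Step 4: label pairs whose lag exceeds a quantile-based threshold (placeholder choice).
threshold_hours = lags["alert_lag_hours"].quantile(0.75)
lags["late_alert"] = (lags["alert_lag_hours"] > threshold_hours).astype(int)
```

In real data some lags may be negative or missing (for example, an alert recorded before the first change-log entry), so those cases would need an explicit rule before labelling.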


### Technical Setup

All analyses are conducted in Google Colab to ensure reproducibility and alignment with WiDS Datathon guidelines.

Data files are downloaded programmatically using the Kaggle API and are not stored in this repository.  
Once a Kaggle API token (`kaggle.json`) has been uploaded to the Colab session, reviewers can rerun the notebook end-to-end without further manual data handling.
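
Because the token is a secret, it is worth keeping it out of cell outputs. The sketch below writes the token file with owner-only permissions. `write_kaggle_token` is a hypothetical helper written for this example, not part of the Kaggle client.

```python
import json
import os
from pathlib import Path


def write_kaggle_token(username, key, config_dir=None):
    """Write kaggle.json with owner-only permissions and return its path."""
    # Default to ~/.kaggle, the location used by the setup cell in this notebook.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    token_path.write_text(json.dumps({"username": username, "key": key}))
    # Restrict the file to the current user (same effect as chmod 600).
    os.chmod(token_path, 0o600)
    return token_path
```

The key could be read interactively with `getpass.getpass()` from the standard library and passed to this helper, so that it never appears in the saved notebook.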


In [1]:
from google.colab import files

# Upload kaggle.json; assigning the result keeps the cell from echoing the API key as output.
_ = files.upload()

Saving kaggle.json to kaggle.json


In [3]:
# ===============================
# 1) Kaggle download (Colab)
# ===============================
!pip -q install kaggle

import os

# If you already uploaded kaggle.json earlier, keep it in the current folder.
assert os.path.exists("kaggle.json"), "Upload kaggle.json in the Colab session first."

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download competition data
COMP = "wids-university-datathon-2025"   # Kaggle competition slug; adjust here if the download fails
!kaggle competitions download -c {COMP}

# Unzip into /content/data
!mkdir -p data
zip_path = f"{COMP}.zip"
assert os.path.exists(zip_path), f"Expected {zip_path} after download."
!unzip -o {zip_path} -d data


Downloading wids-university-datathon-2025.zip to /content
 92% 372M/403M [00:00<00:00, 658MB/s] 
100% 403M/403M [00:00<00:00, 638MB/s]
Archive:  wids-university-datathon-2025.zip
  inflating: data/WiDS _-_ Watch Duty_ Data Dictionary.docx  
  inflating: data/evac_zone_status_geo_event_map.csv  
  inflating: data/evac_zones_gis_evaczone.csv  
  inflating: data/evac_zones_gis_evaczonechangelog.csv  
  inflating: data/fire_perimeters_gis_fireperimeter.csv  
  inflating: data/fire_perimeters_gis_fireperimeterchangelog.csv  
  inflating: data/geo_events_externalgeoevent.csv  
  inflating: data/geo_events_externalgeoeventchangelog.csv  
  inflating: data/geo_events_geoevent.csv  
  inflating: data/geo_events_geoeventchangelog.csv  


In [4]:
import os, glob
print("Top-level in data/:", os.listdir("data"))

all_files = sorted(glob.glob("data/**/*", recursive=True))
print("Num files:", len(all_files))
for f in all_files[:120]:
    print(f)


Top-level in data/: ['geo_events_geoevent.csv', 'evac_zones_gis_evaczonechangelog.csv', 'geo_events_geoeventchangelog.csv', 'fire_perimeters_gis_fireperimeterchangelog.csv', 'WiDS _-_ Watch Duty_ Data Dictionary.docx', 'fire_perimeters_gis_fireperimeter.csv', 'geo_events_externalgeoeventchangelog.csv', 'evac_zone_status_geo_event_map.csv', 'evac_zones_gis_evaczone.csv', 'geo_events_externalgeoevent.csv']
Num files: 10
data/WiDS _-_ Watch Duty_ Data Dictionary.docx
data/evac_zone_status_geo_event_map.csv
data/evac_zones_gis_evaczone.csv
data/evac_zones_gis_evaczonechangelog.csv
data/fire_perimeters_gis_fireperimeter.csv
data/fire_perimeters_gis_fireperimeterchangelog.csv
data/geo_events_externalgeoevent.csv
data/geo_events_externalgeoeventchangelog.csv
data/geo_events_geoevent.csv
data/geo_events_geoeventchangelog.csv
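
Before any joins, it helps to see which columns each CSV actually contains. The sketch below reads only the header row of every CSV in a folder. `list_columns` is a helper name chosen for this example, and the `data` default assumes the unzip location used above.

```python
from pathlib import Path

import pandas as pd


def list_columns(data_dir="data"):
    """Map each CSV file name to its column names, reading only the header row."""
    return {
        path.name: pd.read_csv(path, nrows=0).columns.tolist()
        for path in sorted(Path(data_dir).glob("*.csv"))
    }
```

Reading with `nrows=0` avoids loading the full files, which matters here because the archive is several hundred megabytes.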
