# HURDAT2 to ML Features Workflow

Transform HURDAT2 Atlantic hurricane data into census tract-level features using the **Max Distance Envelope Approach**.

**Input**: Raw HURDAT2 text file  
**Output**: CSV where each row = one storm's impact on one census tract  
**Key Innovation**: Envelope polygon method for efficient wind field modeling

---

## Notebook Structure (7 Sections, 22 Cells)

1. **Data Acquisition & Basic Parsing** (Cells 1-3)
2. **Data Profiling & Understanding** (Cells 4-6)
3. **Single Storm Envelope (Hurricane Ida Test)** (Cells 7-10)
4. **Census Tract Integration** (Cells 11-13)
5. **Wind Speed Calculations** (Cells 14-16)
6. **Scale to Multiple Storms** (Cells 17-19)
7. **Export & Validation** (Cells 20-22)

---

## Section 1: Data Acquisition & Basic Parsing

Download the raw HURDAT2 text file and parse it into a clean DataFrame.

In [4]:
# Cell 1: Download HURDAT2 data
import os
import requests
import pandas as pd
import numpy as np
from pathlib import Path

# Set up paths
base_dir = Path("..").resolve()
input_dir = base_dir / "input_data"
output_dir = base_dir / "outputs"

# Create directories if they don't exist
input_dir.mkdir(exist_ok=True)
output_dir.mkdir(exist_ok=True)

# Download HURDAT2 Atlantic data if not present
# Alternative: use raw GitHub source or archive.org mirror
hurdat_url = "https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2024-040425.txt"
hurdat_file = input_dir / "hurdat2-atlantic.txt"

if not hurdat_file.exists():
    print("Downloading HURDAT2 Atlantic data...")
    
    # Try with headers to mimic browser request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    
    # Set a timeout so a stalled connection cannot hang the notebook indefinitely
    response = requests.get(hurdat_url, headers=headers, timeout=60)
    response.raise_for_status()
    
    # Check if we got HTML instead of text data
    if response.text.strip().startswith('<'):
        print("ERROR: Got HTML page instead of data file")
        print("Manual download required from: https://www.nhc.noaa.gov/data/hurdat/")
        print(f"Please download hurdat2-1851-2024-040425.txt manually and save it as {hurdat_file}")
    else:
        with open(hurdat_file, 'w', encoding='utf-8') as f:
            f.write(response.text)
        print(f"Downloaded to {hurdat_file}")
else:
    print(f"HURDAT2 file already exists: {hurdat_file}")

if hurdat_file.exists():
    print(f"File size: {hurdat_file.stat().st_size:,} bytes")

Downloading HURDAT2 Atlantic data...
Downloaded to /Users/Michael/hurricane-data-etl/hurdat2/input_data/hurdat2-atlantic.txt
File size: 7,034,638 bytes
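
With the file on disk, the parsing step this section describes can be sketched as follows. This is a minimal illustration, not the notebook's own parser: the function name `parse_hurdat2`, the helper names, and the output column names are assumptions, and the sample text is a made-up storm in the HURDAT2 layout rather than data copied from the real file. It relies on the published format: a header line per storm (ID, name, entry count) followed by comma-separated track lines whose first eight fields are date, time, record identifier, status, latitude, longitude, maximum wind (kt), and minimum pressure (mb), with `-999` marking missing values. The wind-radii columns are ignored here.

```python
import pandas as pd

# Hypothetical sample in HURDAT2 layout; the values are illustrative only.
SAMPLE = """\
AL012000,            EXAMPLE,      2,
20000801, 0000,  , TS, 16.5N,  78.9W,  40, 1004,
20000801, 0600, L, HU, 17.0N,  79.5W,  70, -999,
"""

def _signed_coord(token):
    """Convert '16.5N' or '78.9W' to signed decimal degrees (S and W negative)."""
    value = float(token[:-1])
    return -value if token[-1] in "SW" else value

def _int_or_nan(token):
    """Parse an integer field, mapping the -999 missing-value code to NaN."""
    value = int(token)
    return float("nan") if value == -999 else value

def parse_hurdat2(lines):
    """Return one row per track point, tagged with its storm's ID and name."""
    rows = []
    storm_id = storm_name = None
    for raw in lines:
        fields = [f.strip() for f in raw.split(",")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        if fields[0][:2].isalpha():
            # Header line: basin + cyclone number + year, then the storm name
            storm_id, storm_name = fields[0], fields[1]
            continue
        rows.append({
            "storm_id": storm_id,
            "storm_name": storm_name,
            "date": fields[0],
            "time": fields[1],
            "record_id": fields[2],   # e.g. 'L' marks a landfall record
            "status": fields[3],      # e.g. 'TS', 'HU'
            "lat": _signed_coord(fields[4]),
            "lon": _signed_coord(fields[5]),
            "max_wind_kt": _int_or_nan(fields[6]),
            "min_pressure_mb": _int_or_nan(fields[7]),
        })
    df = pd.DataFrame(rows)
    # Combine the date and time strings into a single timestamp column
    df["datetime"] = pd.to_datetime(df["date"] + df["time"], format="%Y%m%d%H%M")
    return df.drop(columns=["date", "time"])

df = parse_hurdat2(SAMPLE.splitlines())
print(df)
```

To run the same sketch on the downloaded file, pass the open file handle instead of the sample, for example `with open(hurdat_file) as f: df = parse_hurdat2(f)`.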
