# Fetch Historical Weather Data for Pullman, WA

**Purpose:** Download hourly historical weather data from 2020-01-01 to 2025-10-31 for the WSU Pullman campus.

**Data Source:** Open-Meteo Historical Weather API (free, no API key required)

**Location:** Pullman, WA (46.7298°N, 117.1817°W)

**Weather Variables (hourly):**
- Temperature (2 m above ground)
- Precipitation
- Snowfall and snow depth
- Wind speed (10 m above ground)
- Weather code (rain, snow, clear, etc.)

In [18]:
import pandas as pd
import numpy as np
import requests
from datetime import datetime, timedelta
import time

pd.set_option('display.max_columns', None)
print("Libraries loaded successfully")

Libraries loaded successfully


## Define Location and Date Range

In [19]:
# Pullman, WA coordinates (WSU campus)
LATITUDE = 46.7298
LONGITUDE = -117.1817

# Date range for our parking data
START_DATE = "2020-01-01"
END_DATE = "2025-10-31"

print(f"Fetching weather data for:")
print(f"  Location: Pullman, WA ({LATITUDE}°N, {abs(LONGITUDE)}°W)")
print(f"  Date Range: {START_DATE} to {END_DATE}")
print(f"  Total Days: {(pd.to_datetime(END_DATE) - pd.to_datetime(START_DATE)).days + 1}")

Fetching weather data for:
  Location: Pullman, WA (46.7298°N, 117.1817°W)
  Date Range: 2020-01-01 to 2025-10-31
  Total Days: 2131
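The fetch cell below requests one calendar year at a time from a hand-written list of `(year, start, end)` tuples. As a sketch, the same chunks can be generated from `START_DATE` and `END_DATE`, so changing the end date does not require editing the list. `year_chunks` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import pandas as pd

def year_chunks(start_date, end_date):
    """Split [start_date, end_date] into (year, start, end) tuples, one per calendar year."""
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    chunks = []
    for year in range(start.year, end.year + 1):
        # Clamp the first and last chunks to the requested range
        chunk_start = max(start, pd.Timestamp(year=year, month=1, day=1))
        chunk_end = min(end, pd.Timestamp(year=year, month=12, day=31))
        chunks.append((year, chunk_start.strftime("%Y-%m-%d"), chunk_end.strftime("%Y-%m-%d")))
    return chunks

for chunk in year_chunks("2020-01-01", "2025-10-31"):
    print(chunk)
```

In the notebook this would be used as `years = year_chunks(START_DATE, END_DATE)`; the final chunk is clamped to the partial year automatically.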


## Fetch Weather Data from Open-Meteo API

Open-Meteo provides free historical weather data at daily and hourly granularity.

We'll fetch:
- Hourly data: temperature, precipitation, snowfall, snow depth, wind speed
- Weather codes to categorize conditions (clear, rain, snow, etc.)
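To see what `fetch_hourly_weather_data` actually sends, the request can be prepared without being sent. This sketch uses a trimmed, illustrative subset of the parameters. It shows that `requests` expands a list-valued `hourly` parameter into repeated `hourly=` keys, whereas Open-Meteo's documentation writes variable lists as a single comma-separated value; both forms are built here for comparison.

```python
import requests

url = "https://archive-api.open-meteo.com/v1/archive"

# Illustrative subset of the parameters used in fetch_hourly_weather_data
params_list = {
    "latitude": 46.7298,
    "longitude": -117.1817,
    "start_date": "2020-01-01",
    "end_date": "2020-01-02",
    "hourly": ["temperature_2m", "precipitation"],
}

# Same request, with the variables joined into one comma-separated value
params_csv = {**params_list, "hourly": ",".join(params_list["hourly"])}

# Prepare (but do not send) each request to inspect the final URL
url_list = requests.Request("GET", url, params=params_list).prepare().url
url_csv = requests.Request("GET", url, params=params_csv).prepare().url

print(url_list)  # list form: one hourly= key per variable
print(url_csv)   # comma form: a single hourly= key (comma is percent-encoded)
```

The API accepted the list form in this notebook's runs (every year returned HTTP 200), so this is only a way to inspect the query, not a required change.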

In [20]:
def fetch_hourly_weather_data(lat, lon, start_date, end_date, year_label):
    """
    Fetch HOURLY historical weather data from Open-Meteo API.
    
    Free tier: no API key required for non-commercial use, up to 10,000 calls/day
    """
    # Open-Meteo Historical Weather API endpoint
    url = "https://archive-api.open-meteo.com/v1/archive"
    
    # Parameters for HOURLY data
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": [
            "temperature_2m",
            "precipitation",
            "snowfall",
            "snow_depth",
            "windspeed_10m",
            "weather_code"
        ],
        "temperature_unit": "fahrenheit",
        "windspeed_unit": "mph",
        "precipitation_unit": "inch",
        "timezone": "America/Los_Angeles"  # PST/PDT (same as Pullman)
    }
    
    print(f"\n{'='*70}")
    print(f"Fetching HOURLY weather data for {year_label}...")
    print(f"{'='*70}")
    print(f"URL: {url}")
    print(f"Date range: {start_date} to {end_date}")
    
    response = requests.get(url, params=params, timeout=60)  # Don't hang forever on a stalled connection
    
    if response.status_code == 200:
        print(f"✓ Success! Received {len(response.content):,} bytes")
        return response.json()
    else:
        print(f"✗ Error: HTTP {response.status_code}")
        print(f"Response: {response.text}")
        return None

# Fetch one year per request; each year is also saved to its own CSV file later
years = [
    (2020, "2020-01-01", "2020-12-31"),
    (2021, "2021-01-01", "2021-12-31"),
    (2022, "2022-01-01", "2022-12-31"),
    (2023, "2023-01-01", "2023-12-31"),
    (2024, "2024-01-01", "2024-12-31"),
    (2025, "2025-01-01", "2025-10-31")  # Partial year
]

all_weather_data = {}

for year, start, end in years:
    weather_json = fetch_hourly_weather_data(LATITUDE, LONGITUDE, start, end, str(year))
    if weather_json:
        all_weather_data[year] = weather_json
        time.sleep(1)  # Be polite to the API
    else:
        print(f"⚠️  Failed to fetch data for {year}")

print(f"\n{'='*70}")
print(f"FETCHED DATA FOR {len(all_weather_data)} YEARS")
print(f"{'='*70}")



Fetching HOURLY weather data for 2020...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2020-01-01 to 2020-12-31
✓ Success! Received 426,275 bytes

Fetching HOURLY weather data for 2021...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2021-01-01 to 2021-12-31
✓ Success! Received 424,946 bytes

Fetching HOURLY weather data for 2022...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2022-01-01 to 2022-12-31
✓ Success! Received 424,771 bytes

Fetching HOURLY weather data for 2023...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2023-01-01 to 2023-12-31
✓ Success! Received 424,284 bytes

Fetching HOURLY weather data for 2024...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2024-01-01 to 2024-12-31
✓ Success! Received 425,619 bytes

Fetching HOURLY weather data for 2025...
URL: https://archive-api.open-meteo.com/v1/archive
Date range: 2025-01-01 to 2025-10-31
✓ Success! Received 352,706 bytes

FETCHED DATA FOR 6 YEARS
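The fetch loop above makes a single attempt per year and moves on if it fails. For transient failures (a timeout, HTTP 429 rate limiting, or a 5xx server error), retrying with a growing delay is a common pattern. The helper below, `get_with_retry`, is hypothetical (not part of this notebook or of `requests`); it takes the `get` and `sleep` functions as parameters so it can be exercised without network access.

```python
import time
import requests

# Status codes worth retrying; other errors (e.g. 400 for a bad parameter) are returned immediately
RETRY_STATUSES = {429, 500, 502, 503, 504}

def get_with_retry(url, params, get=requests.get, max_attempts=4,
                   base_delay=1.0, timeout=60, sleep=time.sleep):
    """Hypothetical helper: GET with exponential backoff on transient failures.

    Returns the parsed JSON on HTTP 200, or None if every attempt fails.
    """
    for attempt in range(max_attempts):
        try:
            response = get(url, params=params, timeout=timeout)
        except (requests.ConnectionError, requests.Timeout) as exc:
            print(f"Attempt {attempt + 1}: {exc!r}")
        else:
            if response.status_code == 200:
                return response.json()
            if response.status_code not in RETRY_STATUSES:
                # Client errors will not fix themselves on retry
                print(f"HTTP {response.status_code}: {response.text[:200]}")
                return None
            print(f"Attempt {attempt + 1}: HTTP {response.status_code}")
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return None
```

Inside `fetch_hourly_weather_data`, the single `requests.get(...)` call and status check could be replaced by `return get_with_retry(url, params)`.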

## Parse Weather Data into DataFrame

In [21]:
if all_weather_data:
    # Parse HOURLY data for each year
    weather_by_year = {}
    
    for year, weather_json in all_weather_data.items():
        if 'hourly' in weather_json:
            hourly = weather_json['hourly']
            
            # Create DataFrame with HOURLY granularity
            df = pd.DataFrame({
                'datetime': pd.to_datetime(hourly['time']),
                'temperature_f': hourly['temperature_2m'],
                'precipitation_inches': hourly['precipitation'],
                'snowfall_inches': hourly['snowfall'],
                'snow_depth_inches': hourly['snow_depth'],
                'wind_mph': hourly['windspeed_10m'],
                'weather_code': hourly['weather_code']
            })
            
            # Extract date and hour
            df['date'] = df['datetime'].dt.date
            df['hour'] = df['datetime'].dt.hour
            df['year'] = year
            
            weather_by_year[year] = df
            
            print(f"\n{year}: {len(df):,} hourly records")
            print(f"  Date range: {df['datetime'].min()} to {df['datetime'].max()}")
    
    # Combine all years
    weather_df = pd.concat(weather_by_year.values(), ignore_index=True)
    
    print(f"\n{'='*70}")
    print(f"COMBINED HOURLY WEATHER DATA")
    print(f"{'='*70}")
    print(f"Total hours: {len(weather_df):,}")
    print(f"Date range: {weather_df['datetime'].min()} to {weather_df['datetime'].max()}")
    print(f"\nFirst few rows:")
    print(weather_df.head(10))
    
    print(f"\nData summary:")
    print(weather_df[['temperature_f', 'precipitation_inches', 'wind_mph']].describe())
else:
    print("No weather data fetched. Check API responses above.")



2020: 8,784 hourly records
  Date range: 2020-01-01 00:00:00 to 2020-12-31 23:00:00

2021: 8,760 hourly records
  Date range: 2021-01-01 00:00:00 to 2021-12-31 23:00:00

2022: 8,760 hourly records
  Date range: 2022-01-01 00:00:00 to 2022-12-31 23:00:00

2023: 8,760 hourly records
  Date range: 2023-01-01 00:00:00 to 2023-12-31 23:00:00

2024: 8,784 hourly records
  Date range: 2024-01-01 00:00:00 to 2024-12-31 23:00:00

2025: 7,296 hourly records
  Date range: 2025-01-01 00:00:00 to 2025-10-31 23:00:00

COMBINED HOURLY WEATHER DATA
Total hours: 51,144
Date range: 2020-01-01 00:00:00 to 2025-10-31 23:00:00

First few rows:
             datetime  temperature_f  precipitation_inches  snowfall_inches  \
0 2020-01-01 00:00:00           45.1                 0.016              0.0   
1 2020-01-01 01:00:00           43.6                 0.047              0.0   
2 2020-01-01 02:00:00           43.0                 0.000              0.0   
3 2020-01-01 03:00:00           42.6                
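Since the `hourly` block of each response is a dict of parallel arrays (`time` plus one list per requested variable), the per-year DataFrame can also be built directly from that dict and renamed in one step. A sketch under that assumption; `parse_hourly` and `RENAME` are hypothetical names, and the two-row `sample` payload is hand-made for testing (only the first two temperature and precipitation values come from the output above).

```python
import pandas as pd

# Map the Open-Meteo hourly variable names requested above to this notebook's column names
RENAME = {
    "time": "datetime",
    "temperature_2m": "temperature_f",
    "precipitation": "precipitation_inches",
    "snowfall": "snowfall_inches",
    "snow_depth": "snow_depth_inches",
    "windspeed_10m": "wind_mph",
    "weather_code": "weather_code",
}

def parse_hourly(weather_json, year):
    """Turn the 'hourly' block of one API response into a DataFrame with derived date/hour/year."""
    df = pd.DataFrame(weather_json["hourly"]).rename(columns=RENAME)
    df["datetime"] = pd.to_datetime(df["datetime"])
    df["date"] = df["datetime"].dt.date
    df["hour"] = df["datetime"].dt.hour
    df["year"] = year
    return df

# Hand-made payload with the same shape as an hourly response (illustrative values)
sample = {"hourly": {
    "time": ["2020-01-01T00:00", "2020-01-01T01:00"],
    "temperature_2m": [45.1, 43.6],
    "precipitation": [0.016, 0.047],
    "snowfall": [0.0, 0.0],
    "snow_depth": [0.0, 0.0],
    "windspeed_10m": [5.0, 6.0],
    "weather_code": [51, 53],
}}
print(parse_hourly(sample, 2020))
```

A variable missing from `RENAME` simply keeps its API name, so adding a new `hourly` variable does not break parsing.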

## Interpret Weather Codes

WMO Weather Codes (as documented by Open-Meteo):
- 0: Clear sky
- 1, 2, 3: Mainly clear, partly cloudy, overcast
- 45, 48: Fog, depositing rime fog
- 51, 53, 55: Drizzle (light, moderate, dense)
- 56, 57: Freezing drizzle (light, dense)
- 61, 63, 65: Rain (slight, moderate, heavy)
- 66, 67: Freezing rain (light, heavy)
- 71, 73, 75: Snowfall (slight, moderate, heavy)
- 77: Snow grains
- 80, 81, 82: Rain showers (slight, moderate, violent)
- 85, 86: Snow showers (slight, heavy)
- 95: Thunderstorm (slight or moderate)
- 96, 99: Thunderstorm with hail (slight, heavy)
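The `if/elif` chain in `categorize_weather` can also be expressed as a lookup table, which keeps each category's codes in one place and lets pandas map the whole column with a dict instead of calling a Python function per row. A sketch assuming the same groupings as `categorize_weather` (including freezing drizzle 56/57 and freezing rain 66/67); `CATEGORY_CODES`, `CODE_TO_CATEGORY`, and `categorize_codes` are hypothetical names.

```python
import pandas as pd

# Same groupings as categorize_weather()
CATEGORY_CODES = {
    "Clear": [0],
    "Cloudy": [1, 2, 3],
    "Fog": [45, 48],
    "Drizzle": [51, 53, 55, 56, 57],
    "Rain": [61, 63, 65, 66, 67, 80, 81, 82],
    "Snow": [71, 73, 75, 77, 85, 86],
    "Thunderstorm": [95, 96, 99],
}
# Invert to code -> category for lookups
CODE_TO_CATEGORY = {code: cat for cat, codes in CATEGORY_CODES.items() for code in codes}

def categorize_codes(codes: pd.Series) -> pd.Series:
    """Map WMO codes to categories: missing -> 'Unknown', unlisted codes -> 'Other'."""
    categories = codes.map(CODE_TO_CATEGORY).fillna("Other")
    return categories.where(codes.notna(), "Unknown")

print(categorize_codes(pd.Series([0, 3, 53, 66, 86, 99, 4, None])).tolist())
```

Applied to the notebook's data this would be `weather_df['weather_category'] = categorize_codes(weather_df['weather_code'])`.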

In [22]:
def categorize_weather(code):
    """
    Convert WMO weather code to human-readable category.
    """
    if pd.isna(code):
        return 'Unknown'
    
    code = int(code)
    
    if code == 0:
        return 'Clear'
    elif code in [1, 2, 3]:
        return 'Cloudy'
    elif code in [45, 48]:
        return 'Fog'
    elif code in [51, 53, 55, 56, 57]:
        return 'Drizzle'
    elif code in [61, 63, 65, 66, 67, 80, 81, 82]:
        return 'Rain'
    elif code in [71, 73, 75, 77, 85, 86]:
        return 'Snow'
    elif code in [95, 96, 99]:
        return 'Thunderstorm'
    else:
        return 'Other'

# Add weather category
weather_df['weather_category'] = weather_df['weather_code'].apply(categorize_weather)

print("Weather category distribution:")
print(weather_df['weather_category'].value_counts())
print(f"\nWeather categories added!")

Weather category distribution:
weather_category
Cloudy     25945
Clear      18300
Drizzle     4185
Snow        2336
Rain         378
Name: count, dtype: int64

Weather categories added!


## Add Derived Weather Features

In [23]:
# Binary indicators for weather conditions (thresholds apply per HOUR)
weather_df['is_rainy'] = (weather_df['precipitation_inches'] > 0.05).astype(int)  # > 0.05 in of precipitation (rain, showers, or snow water equivalent) this hour
weather_df['is_snowy'] = (weather_df['snowfall_inches'] > 0.05).astype(int)  # > 0.05 in of new snowfall this hour
weather_df['is_cold'] = (weather_df['temperature_f'] < 32).astype(int)  # Below freezing
weather_df['is_hot'] = (weather_df['temperature_f'] > 80).astype(int)
weather_df['is_windy'] = (weather_df['wind_mph'] > 20).astype(int)

# Severe weather indicator: heavy precipitation, heavy snowfall, or strong wind within the hour
weather_df['is_severe'] = (
    (weather_df['precipitation_inches'] > 0.1) |  # > 0.1 in of precipitation this hour
    (weather_df['snowfall_inches'] > 0.5) |  # > 0.5 in of snowfall this hour
    (weather_df['wind_mph'] > 25)  # Wind above 25 mph
).astype(int)

print("Derived weather features added:")
print(f"  Rainy hours: {weather_df['is_rainy'].sum():,} ({weather_df['is_rainy'].mean()*100:.1f}%)")
print(f"  Snowy hours: {weather_df['is_snowy'].sum():,} ({weather_df['is_snowy'].mean()*100:.1f}%)")
print(f"  Cold hours (<32°F): {weather_df['is_cold'].sum():,} ({weather_df['is_cold'].mean()*100:.1f}%)")
print(f"  Hot hours (>80°F): {weather_df['is_hot'].sum():,} ({weather_df['is_hot'].mean()*100:.1f}%)")
print(f"  Windy hours (>20mph): {weather_df['is_windy'].sum():,} ({weather_df['is_windy'].mean()*100:.1f}%)")
print(f"  Severe weather hours: {weather_df['is_severe'].sum():,} ({weather_df['is_severe'].mean()*100:.1f}%)")


Derived weather features added:
  Rainy hours: 557 (1.1%)
  Snowy hours: 1,466 (2.9%)
  Cold hours (<32°F): 6,835 (13.4%)
  Hot hours (>80°F): 3,238 (6.3%)
  Windy hours (>20mph): 542 (1.1%)
  Severe weather hours: 167 (0.3%)
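The thresholds above are hard-coded in each line, so checking how sensitive later results are to them means editing several places. A minimal sketch that collects the same cut-offs in one dict and returns a flagged copy; `THRESHOLDS` and `add_weather_flags` are hypothetical names, and the values are this notebook's choices rather than official definitions.

```python
import pandas as pd

# Same cut-offs as the cell above (inches per hour, °F, mph)
THRESHOLDS = {
    "rain_in": 0.05,
    "snow_in": 0.05,
    "cold_f": 32,
    "hot_f": 80,
    "windy_mph": 20,
    "severe_precip_in": 0.1,
    "severe_snow_in": 0.5,
    "severe_wind_mph": 25,
}

def add_weather_flags(df, t=THRESHOLDS):
    """Return a copy of df with 0/1 indicator columns for each weather condition."""
    out = df.copy()
    out["is_rainy"] = (out["precipitation_inches"] > t["rain_in"]).astype(int)
    out["is_snowy"] = (out["snowfall_inches"] > t["snow_in"]).astype(int)
    out["is_cold"] = (out["temperature_f"] < t["cold_f"]).astype(int)
    out["is_hot"] = (out["temperature_f"] > t["hot_f"]).astype(int)
    out["is_windy"] = (out["wind_mph"] > t["windy_mph"]).astype(int)
    out["is_severe"] = (
        (out["precipitation_inches"] > t["severe_precip_in"])
        | (out["snowfall_inches"] > t["severe_snow_in"])
        | (out["wind_mph"] > t["severe_wind_mph"])
    ).astype(int)
    return out
```

For example, `add_weather_flags(weather_df, {**THRESHOLDS, "rain_in": 0.01})['is_rainy'].sum()` would count rainy hours under a looser 0.01 in/hour cut-off without touching the main DataFrame.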


## Check for Missing Data

In [24]:
print("Missing data check:")
print(weather_df.isnull().sum())

# Check time coverage: each row is one HOUR, so compare against expected hours, not days
expected_days = (pd.to_datetime(END_DATE) - pd.to_datetime(START_DATE)).days + 1
expected_hours = expected_days * 24
actual_hours = len(weather_df)

print(f"\nTime coverage:")
print(f"  Expected: {expected_days:,} days = {expected_hours:,} hours")
print(f"  Actual hours: {actual_hours:,}")
print(f"  Coverage: {actual_hours/expected_hours*100:.1f}%")

if actual_hours < expected_hours:
    print(f"\n⚠️  WARNING: Missing {expected_hours - actual_hours:,} hours of weather data")
elif actual_hours > expected_hours:
    print(f"\n⚠️  WARNING: {actual_hours - expected_hours:,} more rows than expected (duplicate timestamps?)")
else:
    print(f"\n✓ Complete weather data coverage!")

Missing data check:
datetime                0
temperature_f           0
precipitation_inches    0
snowfall_inches         0
snow_depth_inches       0
wind_mph                0
weather_code            0
date                    0
hour                    0
year                    0
weather_category        0
is_rainy                0
is_snowy                0
is_cold                 0
is_hot                  0
is_windy                0
is_severe               0
dtype: int64

Time coverage:
  Expected: 2,131 days = 51,144 hours
  Actual hours: 51,144
  Coverage: 100.0%

✓ Complete weather data coverage!
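A row count can look complete even when one hour is missing and another is duplicated, since the two cancel out. A stricter sketch compares the timestamps against a complete hourly grid built with `pd.date_range` and reports missing and duplicated hours separately. `check_hourly_coverage` is a hypothetical helper. Because the data was requested in `America/Los_Angeles` local time, daylight-saving transitions are one place where a skipped or repeated hour could appear, depending on how the API labels local times; this check would reveal either case.

```python
import pandas as pd

def check_hourly_coverage(timestamps, start_date, end_date):
    """Compare timestamps with a complete hourly grid from start_date 00:00 to end_date 23:00."""
    ts = pd.to_datetime(pd.Series(timestamps))
    grid_end = pd.Timestamp(end_date) + pd.Timedelta(hours=23)
    expected = pd.date_range(start=start_date, end=grid_end, freq="h")
    missing = expected.difference(pd.DatetimeIndex(ts))  # hours on the grid with no row
    duplicated = ts[ts.duplicated()]                      # second and later copies of a timestamp
    print(f"Expected hours: {len(expected):,}  Actual rows: {len(ts):,}")
    print(f"Missing hours: {len(missing):,}  Duplicated hours: {len(duplicated):,}")
    return missing, duplicated
```

On this notebook's data: `missing, dups = check_hourly_coverage(weather_df['datetime'], START_DATE, END_DATE)`.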


## Save Weather Data

In [25]:
# Save combined data to main CSV
weather_df.to_csv('../data/weather_pullman_hourly_2020_2025.csv', index=False)

print(f"\n{'='*70}")
print(f"MAIN FILE SAVED")
print(f"{'='*70}")
print(f"\nFile: data/weather_pullman_hourly_2020_2025.csv")
print(f"Total hours: {len(weather_df):,}")
print(f"Date range: {weather_df['datetime'].min()} to {weather_df['datetime'].max()}")

# Also save by year for easier parsing
print(f"\n{'='*70}")
print(f"SAVING BY YEAR")
print(f"{'='*70}")

for year, year_df in weather_by_year.items():
    filename = f'../data/weather_pullman_{year}_hourly.csv'
    year_df.to_csv(filename, index=False)
    print(f"{year}: {filename} ({len(year_df):,} hours)")

print(f"\n{'='*70}")
print(f"ALL FILES SAVED")
print(f"{'='*70}")
print(f"\nMain file: weather_pullman_hourly_2020_2025.csv ({len(weather_df):,} hours)")
print(f"By year: {len(weather_by_year)} separate files")

print(f"\nColumns ({len(weather_df.columns)}):")
for col in weather_df.columns:
    print(f"  - {col}")

print(f"\n{'='*70}")
print(f"NEXT STEPS")
print(f"{'='*70}")
print("\n1. Merge hourly weather with enforcement data by datetime + hour")
print("2. Match: weather['datetime'] with enforcement['datetime']")
print("3. Or merge by: weather['date'] + weather['hour'] = enforcement['date'] + enforcement['hour']")
print("4. Now we can analyze HOURLY weather impact on enforcement!")
print("\n✓ No more daily aggregation - precise hour-by-hour weather!")



MAIN FILE SAVED

File: data/weather_pullman_hourly_2020_2025.csv
Total hours: 51,144
Date range: 2020-01-01 00:00:00 to 2025-10-31 23:00:00

SAVING BY YEAR
2020: ../data/weather_pullman_2020_hourly.csv (8,784 hours)
2021: ../data/weather_pullman_2021_hourly.csv (8,760 hours)
2022: ../data/weather_pullman_2022_hourly.csv (8,760 hours)
2023: ../data/weather_pullman_2023_hourly.csv (8,760 hours)
2024: ../data/weather_pullman_2024_hourly.csv (8,784 hours)
2025: ../data/weather_pullman_2025_hourly.csv (7,296 hours)

ALL FILES SAVED

Main file: weather_pullman_hourly_2020_2025.csv (51,144 hours)
By year: 6 separate files

Columns (17):
  - datetime
  - temperature_f
  - precipitation_inches
  - snowfall_inches
  - snow_depth_inches
  - wind_mph
  - weather_code
  - date
  - hour
  - year
  - weather_category
  - is_rainy
  - is_snowy
  - is_cold
  - is_hot
  - is_windy
  - is_severe

NEXT STEPS

1. Merge hourly weather with enforcement data by datetime + hour
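2. Match: weather['datetime'] with enforcement['datetime']
3. Or merge by: weather['date'] + weather['hour'] = enforcement['date'] + enforcement['hour']
4. Now we can analyze HOURLY weather impact on enforcement!

✓ No more daily aggregation - precise hour-by-hour weather!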
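The next steps call for joining weather to enforcement records by hour. A sketch of one way to do that, under assumptions: the enforcement table is hypothetical here (a `citation_id` and a minute-level `datetime` in the same local time zone), and each event is matched to the weather row for the hour it falls in by flooring its timestamp with `dt.floor("h")`. The weather CSV is read back with `parse_dates`, because `to_csv` writes timestamps as text and `read_csv` returns them as strings unless told otherwise. The two weather rows reuse values from the output above; everything else is illustrative.

```python
import io
import pandas as pd

# Stand-in for '../data/weather_pullman_hourly_2020_2025.csv' (two rows, subset of columns)
weather_csv = io.StringIO(
    "datetime,temperature_f,precipitation_inches\n"
    "2020-01-01 00:00:00,45.1,0.016\n"
    "2020-01-01 01:00:00,43.6,0.047\n"
)
# parse_dates restores datetime64 values; otherwise 'datetime' is read back as text
weather = pd.read_csv(weather_csv, parse_dates=["datetime"])

# Hypothetical enforcement events with minute-level timestamps
enforcement = pd.DataFrame({
    "citation_id": [101, 102, 103],
    "datetime": pd.to_datetime(["2020-01-01 00:15", "2020-01-01 01:59", "2020-01-01 02:05"]),
})

# Match each event to the weather observation for the hour it falls in
enforcement["weather_hour"] = enforcement["datetime"].dt.floor("h")
merged = enforcement.merge(
    weather.rename(columns={"datetime": "weather_hour"}),
    on="weather_hour",
    how="left",              # keep events even when no weather row matches
    validate="many_to_one",  # raise if any weather hour appears more than once
)
print(merged)
```

`validate="many_to_one"` makes pandas raise a `MergeError` if the weather table contains a duplicated hour, which would otherwise silently multiply enforcement rows.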

## Preview Weather Statistics by Season

In [None]:
# Add season column
def get_season(date):
    month = date.month
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

weather_df['season'] = weather_df['date'].apply(get_season)

print("\n" + "="*70)
print("WEATHER STATISTICS BY SEASON (HOURLY DATA)")
print("="*70)

# Note: sums are totals over ALL years in the file (2020-2025), not per-year averages
seasonal_stats = weather_df.groupby('season').agg({
    'temperature_f': 'mean',
    'precipitation_inches': 'sum',
    'snowfall_inches': 'sum',
    'is_rainy': 'sum',
    'is_snowy': 'sum',
    'hour': 'count'  # Number of hourly records in each season
}).round(2)

seasonal_stats.columns = ['Avg Temp (°F)', 'Total Precip (in)', 'Total Snow (in)', 
                           'Rainy Hours', 'Snowy Hours', 'Total Hours']

# Reorder seasons
season_order = ['Winter', 'Spring', 'Summer', 'Fall']
seasonal_stats = seasonal_stats.reindex(season_order)

print(seasonal_stats)





WEATHER STATISTICS BY SEASON (HOURLY DATA)
        Avg Temp (°F)  Total Precip (in)  Total Snow (in)  Rainy Hours  \
season                                                                   
Winter          32.91              50.67           171.18          184   
Spring          47.20              31.39            42.35          132   
Summer          70.59              12.57             0.00           68   
Fall            52.18              33.74            31.70          173   

        Snowy Hours  Total Hours  
season                            
Winter         1035        12264  
Spring          257        13248  
Summer            0        13248  
Fall            174        12384  

Pullman, WA typical weather:
  - Cold, snowy winters (Dec-Feb)
  - Mild springs with some rain (Mar-May)
  - Warm, dry summers (Jun-Aug)
  - Cool, rainy falls (Sep-Nov)
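The seasonal totals above are summed over every year in the file, and the data ends on 2025-10-31, so Winter covers five Decembers but six Januaries and Februaries, and Fall covers only five Novembers. Comparing seasons by rates avoids that imbalance. A sketch that assigns seasons with a month lookup (mapping the whole column at once rather than applying `get_season` row by row) and reports per-hour shares and average daily precipitation; `MONTH_TO_SEASON` and `seasonal_rates` are hypothetical names.

```python
import pandas as pd

MONTH_TO_SEASON = {12: "Winter", 1: "Winter", 2: "Winter",
                   3: "Spring", 4: "Spring", 5: "Spring",
                   6: "Summer", 7: "Summer", 8: "Summer",
                   9: "Fall", 10: "Fall", 11: "Fall"}

def seasonal_rates(df):
    """Season summary normalized by hours observed, so unequal season lengths compare fairly."""
    season = pd.to_datetime(df["datetime"]).dt.month.map(MONTH_TO_SEASON).rename("season")
    summary = df.groupby(season).agg(
        avg_temp_f=("temperature_f", "mean"),
        precip_in_per_day=("precipitation_inches", "mean"),
        rainy_share=("is_rainy", "mean"),   # fraction of hours flagged rainy
        snowy_share=("is_snowy", "mean"),   # fraction of hours flagged snowy
        hours=("temperature_f", "count"),
    )
    summary["precip_in_per_day"] *= 24  # mean hourly precipitation -> mean daily precipitation
    return summary.reindex(["Winter", "Spring", "Summer", "Fall"]).round(3)
```

On the notebook's data this would be `seasonal_rates(weather_df)`.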
