# California Fire Data Ingestion

**Dataset**: CAL FIRE Historical Fire Perimeters (fire24_1.gdb)  
**Date Range**: 1878 - 2025 (focusing on 2000-2025 for modeling)  
**Source**: [CAL FIRE FRAP](https://www.fire.ca.gov/what-we-do/fire-resource-assessment-program/fire-perimeters)

**Objective**: 
- Load and validate historical California fire data
- Verify data quality and completeness
- Understand dataset structure and features
- Document January 2025 fires in California
- Prepare for comprehensive EDA in Phase 2


## Why Historical Fire Data Matters for Prediction

Understanding past fire patterns is fundamental to predicting future fire risk. Historical fire data provides:

### **1. Spatial Patterns** 🗺️
- **Fire-prone zones**: Some areas burn repeatedly due to geography, vegetation, and climate
- **Fire frequency**: How often does each region experience fires?
- **Fire clustering**: Do fires tend to occur near previous burns?
- **Geographic risk factors**: Identify consistently high-risk counties and landscapes

### **2. Temporal Patterns** 📅
- **Seasonality**: California's fire season has traditionally run from May to October, but the January 2025 fires suggest it is no longer confined to those months
- **Multi-year trends**: Are fires increasing in frequency and size?
- **Climate change signals**: Longer fire seasons, earlier starts, later ends
- **Fire return intervals**: How long between fires in the same location?
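A minimal sketch of the seasonality check, assuming `ALARM_DATE` parses as a datetime (as it does in the loaded data). The helper names are hypothetical, the May-October window simply mirrors the traditional season described above, and the toy dates are illustrative rather than real records.

```python
import pandas as pd

def monthly_fire_counts(alarm_dates: pd.Series) -> pd.Series:
    """Count fires by calendar month of ALARM_DATE (1 = January ... 12 = December)."""
    months = pd.to_datetime(alarm_dates, errors="coerce").dt.month
    # Drop missing/unparseable dates, count per month, and keep zero-fire months
    return months.dropna().astype(int).value_counts().reindex(range(1, 13), fill_value=0)

def share_outside_season(counts: pd.Series, season=range(5, 11)) -> float:
    """Fraction of dated fires whose alarm month falls outside May-October."""
    total = counts.sum()
    in_season = counts.loc[list(season)].sum()
    return float((total - in_season) / total) if total else float("nan")

# Toy alarm dates (not real records): two January fires, one July fire, one missing date
toy_dates = pd.Series(["2025-01-07", "2025-01-08", "2024-07-15", None])
counts = monthly_fire_counts(toy_dates)
print(counts.loc[[1, 7]].tolist())   # [2, 1]
print(share_outside_season(counts))  # 0.6666666666666666
```

On the real data this would be `monthly_fire_counts(fires_gdf['ALARM_DATE'])`; records without an alarm date are dropped, so the shares describe dated fires only.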

### **3. Fire Behavior Insights** 🔥
- **Fire size distribution**: Most fires are small, but a small number of large fires account for most of the burned area and damage
- **Fire duration**: Days from alarm (discovery) to containment, i.e. `CONT_DATE - ALARM_DATE`
- **Rapid vs. slow-growing fires**: What conditions lead to explosive growth?
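Duration and size concentration can both be computed straight from the perimeter attributes. The sketch below uses hypothetical helper names and the five January 2025 records from this notebook's `head()` output; note that `CONT_DATE` marks containment, not the moment the fire stopped burning.

```python
import pandas as pd

def add_duration_days(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with DURATION_DAYS = CONT_DATE - ALARM_DATE in whole days."""
    out = df.copy()
    alarm = pd.to_datetime(out["ALARM_DATE"], errors="coerce")
    cont = pd.to_datetime(out["CONT_DATE"], errors="coerce")
    # Missing or unparseable dates propagate as NaN instead of raising
    out["DURATION_DAYS"] = (cont - alarm).dt.days
    return out

def top_share_of_acres(acres: pd.Series, top_frac: float = 0.1) -> float:
    """Share of total acres burned by the largest `top_frac` of fires (at least one fire)."""
    sizes = acres.dropna().sort_values(ascending=False)
    n_top = max(1, round(len(sizes) * top_frac))
    return float(sizes.iloc[:n_top].sum() / sizes.sum())

# The five January 2025 records from the head() output in this notebook
sample = pd.DataFrame({
    "FIRE_NAME": ["PALISADES", "EATON", "HUGHES", "KENNETH", "HURST"],
    "ALARM_DATE": ["2025-01-07", "2025-01-08", "2025-01-22", "2025-01-09", "2025-01-07"],
    "CONT_DATE": ["2025-01-31", "2025-01-31", "2025-01-28", "2025-02-04", "2025-01-09"],
    "GIS_ACRES": [23448.882812, 14056.260742, 10396.798828, 998.737793, 831.385498],
})
sample = add_duration_days(sample)
print(sample[["FIRE_NAME", "DURATION_DAYS"]])
# Share of these five fires' acreage burned by the single largest fire
print(top_share_of_acres(sample["GIS_ACRES"], top_frac=0.2))
```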

### **4. Ignition Source Intelligence** ⚡
- **Human vs. natural causes**: Power lines, equipment, campfires vs. lightning
- **Cause-specific patterns**: When and where do different ignition sources occur?
- **Risk mitigation**: Target prevention efforts based on common causes

### **5. Model Training Foundation** 🤖
- **Labeled examples**: Historical fires = positive class, non-fire events = negative class
- **Feature importance**: Which conditions consistently preceded fires?
- **Validation dataset**: Test predictions against known fire outcomes
- **Spatial-temporal learning**: Model learns "when + where = fire risk"
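One way to turn perimeters into positive and negative examples is a cell-by-month panel. This sketch assumes each fire has already been assigned to a spatial unit in a hypothetical `cell_id` column (for example via a spatial join to a grid); that column, the grid, and the `build_labels` name are assumptions, not part of the CAL FIRE data.

```python
import pandas as pd

def build_labels(fires: pd.DataFrame, cells, months) -> pd.DataFrame:
    """Label every (cell_id, month) pair: 1 if at least one fire started there, else 0.

    Assumes `fires` has a hypothetical `cell_id` column plus `ALARM_DATE`.
    """
    fire_keys = (
        fires.assign(month=pd.to_datetime(fires["ALARM_DATE"]).dt.to_period("M"))
        .loc[:, ["cell_id", "month"]]
        .drop_duplicates()  # several fires in one cell-month still count as one positive
        .assign(fire=1)
    )
    # Full panel of candidate examples: every cell crossed with every month
    grid = pd.MultiIndex.from_product(
        [list(cells), pd.PeriodIndex(months, freq="M")], names=["cell_id", "month"]
    ).to_frame(index=False)
    labeled = grid.merge(fire_keys, on=["cell_id", "month"], how="left")
    labeled["fire"] = labeled["fire"].fillna(0).astype(int)
    return labeled

# Toy input (not real records): cell A burns twice in January, cell B once in February
toy_fires = pd.DataFrame({
    "cell_id": ["A", "A", "B"],
    "ALARM_DATE": ["2025-01-07", "2025-01-20", "2025-02-03"],
})
labeled = build_labels(toy_fires, cells=["A", "B", "C"], months=["2025-01", "2025-02"])
print(labeled)
```

Two caveats: a 0 label means "no mapped perimeter", not "no fire", since small fires may be absent from the dataset; and fires outside the listed cells or months are silently dropped by the left merge.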

### **Key Insight for ML:**
By pairing 147 years of fire history (1878-2025) with the conditions observed before each fire, we aim to identify the environmental signatures that precede ignition and use them to predict future risk.


```python
# Import libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")
```


✅ Libraries imported successfully!


### Dataset Columns Explained

The CAL FIRE fire perimeters dataset contains rich attributes for each fire:

#### Key Columns:

| Column | Description | Usage in Fire Prediction |
|--------|-------------|--------------------------|
| **FIRE_NAME** | Name of the fire (UPPER CASE) | Identification |
| **YEAR_** | Year fire started | Temporal trends |
| **ALARM_DATE** | Date the fire was reported (loaded as a timezone-aware datetime) | When fire began |
| **CONT_DATE** | Containment date | Fire duration |
| **GIS_ACRES** | Fire size in acres | Fire severity/impact |
| **CAUSE** | Ignition source (numeric code) | Human vs. natural, ignition patterns |
| **UNIT_ID** | Code of the responsible fire unit/agency | Approximate geographic location |
| **STATE** | State (CA, NV, OR, AZ) | Geographic filter |
| **AGENCY** | Responsible agency | Management classification |
| **geometry** | Fire perimeter polygon | Spatial analysis |
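Since later steps depend on these columns, a quick schema and completeness check is cheap insurance. A minimal sketch with a hypothetical `check_schema` helper; the required list is just the table above and can be trimmed.

```python
import pandas as pd

# Columns from the table above that later steps rely on (adjust as needed)
REQUIRED_COLUMNS = ("FIRE_NAME", "YEAR_", "ALARM_DATE", "CONT_DATE", "GIS_ACRES",
                    "CAUSE", "UNIT_ID", "STATE", "AGENCY", "geometry")

def check_schema(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> pd.DataFrame:
    """Raise KeyError if a required column is absent; else return missing-value share per column."""
    absent = [col for col in required if col not in df.columns]
    if absent:
        raise KeyError(f"Missing expected columns: {absent}")
    # list() matters: indexing with a tuple would be treated as a single column key
    shares = df[list(required)].isna().mean().sort_values(ascending=False)
    return shares.rename("missing_share").to_frame()

# Toy frame (not real data): two rows, one missing its containment date
toy_data = {col: ["x", "y"] for col in REQUIRED_COLUMNS}
toy_data["CONT_DATE"] = ["2025-01-31", None]
toy = pd.DataFrame(toy_data)
print(check_schema(toy))
```

On the real data, `check_schema(fires_gdf)` would report, for example, the share of records lacking `CONT_DATE` (such as the 1952 Mt. Hollywood Dr. fire further down, whose containment date is `NaT`).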

#### CAUSE Codes (Fire Ignition Sources):
According to the [CAL FIRE Data Dictionary](https://www.fire.ca.gov/what-we-do/fire-resource-assessment-program/fire-perimeters), the numeric `CAUSE` codes map to categories including:
- **Lightning** (natural ignition)
- **Equipment Use** (machinery, power tools)
- **Electrical Power** (power lines - major cause in CA!)
- **Arson** (deliberate ignition)
- **Debris Burning** (escaped burning of debris or yard-waste piles)
- **Campfire** (recreational fires)
- **Railroad** (train sparks)
- **Smoking** (cigarettes)
- **Miscellaneous** (other sources)
- **Unknown** (cause not determined)
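Because `CAUSE` is stored as a number (all five 2025 rows in the `head()` output carry `CAUSE = 14`), the categories above have to be joined on through a code table. A sketch under stated assumptions: the mapping is deliberately partial, its two entries (1 = Lightning, 14 = Unknown/Unidentified) reflect our reading of the FRAP data dictionary and should be verified there, and any code not in the mapping is counted as `Unmapped` rather than guessed. `summarize_causes` is a hypothetical helper.

```python
import pandas as pd

# ASSUMPTION: partial code table; verify and extend from the CAL FIRE data dictionary
CAUSE_GROUPS = {1: "Natural (lightning)", 14: "Unknown/Unidentified"}

def summarize_causes(cause_codes: pd.Series, code_to_group: dict) -> pd.Series:
    """Count fires per cause label; unlisted codes -> 'Unmapped', missing codes -> 'Missing'."""
    codes = pd.to_numeric(cause_codes, errors="coerce")
    labels = codes.map(
        lambda c: code_to_group.get(int(c), "Unmapped") if pd.notna(c) else "Missing"
    )
    return labels.value_counts()

# Toy codes (not real records)
toy_counts = summarize_causes(pd.Series([1, 14, 14, 11, None]), CAUSE_GROUPS)
print(toy_counts)
```

Keeping `Unmapped` separate makes gaps in the code table visible instead of quietly inflating the human or natural share.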

#### Data Source:
- **Agency**: CAL FIRE Fire and Resource Assessment Program (FRAP)
- **URL**: [CAL FIRE Fire Perimeters](https://www.fire.ca.gov/what-we-do/fire-resource-assessment-program/fire-perimeters)
- **Data Dictionary**: [PDF Documentation](https://www.fire.ca.gov/media/fire-perimeters-data-dictionary.pdf)
- **Update Frequency**: Annually (latest: April 2025)
- **Coverage**: Federal, state, and local fire agencies across California


## 1. Load Fire Perimeter Data


```python
import fiona

# Define path to geodatabase
gdb_path = Path('../data/raw/fires/fire24_1.gdb')

print(f"Loading data from: {gdb_path}")
print(f"File exists: {gdb_path.exists()}")

# List layers in the geodatabase
layers = fiona.listlayers(str(gdb_path))
print("\nLayers in geodatabase:")
for i, layer in enumerate(layers, 1):
    print(f"{i}. {layer}")
```


Loading data from: ../data/raw/fires/fire24_1.gdb
File exists: True

Layers in geodatabase:
1. rxburn24_1
2. firep24_1


```python
# Load the wildfire perimeter layer (firep*), skipping the prescribed-burn layer (rxburn*)
fire_layer = next(name for name in layers if 'fire' in name.lower() and 'rx' not in name.lower())
print(f"Loading layer: {fire_layer}")

fires_gdf = gpd.read_file(gdb_path, layer=fire_layer)
print(f"\n✅ Loaded {len(fires_gdf):,} fire records!")
print(f"Columns: {list(fires_gdf.columns)}")
print("\nFirst 5 records:")
fires_gdf.head()
```


Loading layer: firep24_1

✅ Loaded 22,810 fire records!
Columns: ['YEAR_', 'STATE', 'AGENCY', 'UNIT_ID', 'FIRE_NAME', 'INC_NUM', 'IRWINID', 'ALARM_DATE', 'CONT_DATE', 'C_METHOD', 'CAUSE', 'COMPLEX_NAME', 'COMPLEX_ID', 'OBJECTIVE', 'GIS_ACRES', 'COMMENTS', 'FIRE_NUM', 'Shape_Length', 'Shape_Area', 'geometry']

First 5 records:


| | YEAR_ | STATE | AGENCY | UNIT_ID | FIRE_NAME | INC_NUM | IRWINID | ALARM_DATE | CONT_DATE | C_METHOD | CAUSE | COMPLEX_NAME | COMPLEX_ID | OBJECTIVE | GIS_ACRES | COMMENTS | FIRE_NUM | Shape_Length | Shape_Area | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025.0 | CA | CDF | LDF | PALISADES | 738 | {A7EA5D21-F882-44B8-BF64-44AB11059DC1} | 2025-01-07 00:00:00+00:00 | 2025-01-31 00:00:00+00:00 | 7.0 | 14 | | | 1.0 | 23448.882812 | | | 116028.197349 | 94894260.0 | MULTIPOLYGON (((136696.228 -441776.379, 136683... |
| 1 | 2025.0 | CA | CDF | LAC | EATON | 9087 | {72660ADC-B5EF-4D96-A33F-B4EA3740A4E3} | 2025-01-08 00:00:00+00:00 | 2025-01-31 00:00:00+00:00 | 7.0 | 14 | | | 1.0 | 14056.260742 | | | 86677.545056 | 56883670.0 | MULTIPOLYGON (((175062.878 -425909.927, 175066... |
| 2 | 2025.0 | CA | CDF | ANF | HUGHES | 250270 | {994072D2-E154-434A-BB95-6F6C94C40829} | 2025-01-22 00:00:00+00:00 | 2025-01-28 00:00:00+00:00 | 7.0 | 14 | | | 1.0 | 10396.798828 | | | 79554.126153 | 42074350.0 | MULTIPOLYGON (((132177.534 -380697.661, 132181... |
| 3 | 2025.0 | CA | CCO | VNC | KENNETH | 3155 | {842FB37B-7AC8-4700-BB9C-028BF753D149} | 2025-01-09 00:00:00+00:00 | 2025-02-04 00:00:00+00:00 | 2.0 | 14 | | | 1.0 | 998.737793 | from OES Intel 24 | | 12891.056545 | 4041748.0 | MULTIPOLYGON (((121967.885 -426575.817, 121970... |
| 4 | 2025.0 | CA | CDF | LDF | HURST | 3294 | {F4E810AD-CDF3-4ED4-B63F-03D43785BA7B} | 2025-01-07 00:00:00+00:00 | 2025-01-09 00:00:00+00:00 | 7.0 | 14 | | | 1.0 | 831.385498 | | | 13274.108148 | 3364498.0 | MULTIPOLYGON (((140774.966 -408332.494, 140784... |
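In these five rows, `Shape_Area / GIS_ACRES` comes out at roughly 4,046.86 for every fire, which is the number of square meters in one acre. That suggests `Shape_Area` is in square meters of a projected CRS (`fires_gdf.crs` will confirm) and gives a cheap consistency check. A sketch assuming that unit, with a hypothetical `acreage_mismatches` helper and an arbitrarily chosen 1% tolerance:

```python
import pandas as pd

SQ_METERS_PER_ACRE = 4046.8564224  # one international acre in square meters

def acreage_mismatches(df: pd.DataFrame, rel_tol: float = 0.01) -> pd.DataFrame:
    """Rows where GIS_ACRES and Shape_Area (assumed m^2) converted to acres differ by > rel_tol."""
    implied = df["Shape_Area"] / SQ_METERS_PER_ACRE
    rel_diff = (implied - df["GIS_ACRES"]).abs() / df["GIS_ACRES"]
    # NaN comparisons evaluate to False, so rows with missing values are not flagged here
    return df.assign(implied_acres=implied, rel_diff=rel_diff)[rel_diff > rel_tol]

# Values copied from the head() output above
rows = pd.DataFrame({
    "FIRE_NAME": ["PALISADES", "EATON", "HUGHES", "KENNETH", "HURST"],
    "GIS_ACRES": [23448.882812, 14056.260742, 10396.798828, 998.737793, 831.385498],
    "Shape_Area": [94894260.0, 56883670.0, 42074350.0, 4041748.0, 3364498.0],
})
print(len(acreage_mismatches(rows)))  # 0
```

Applied to `fires_gdf`, any rows returned would deserve a closer look before acreage is used as a feature or target.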


## 2. Explore January 2025 Fires in California 🔥


```python
# Filter for 2025 fires (YEAR_ is stored as a float, e.g. 2025.0)
fires_2025 = fires_gdf[fires_gdf['YEAR_'] == 2025]
print(f"Total fires in 2025: {len(fires_2025)}")
print("\n2025 Fires:")
print(fires_2025[['FIRE_NAME', 'ALARM_DATE', 'GIS_ACRES', 'UNIT_ID']].sort_values('ALARM_DATE'))

# Search for Hollywood-related fires (case-insensitive; missing names count as no match)
hollywood_fires = fires_gdf[fires_gdf['FIRE_NAME'].str.contains('HOLLYWOOD', case=False, na=False)]
print(f"\n\nFires with 'Hollywood' in name: {len(hollywood_fires)}")
if len(hollywood_fires) > 0:
    print("\nHollywood Fires:")
    print(hollywood_fires[['FIRE_NAME', 'ALARM_DATE', 'CONT_DATE', 'GIS_ACRES', 'YEAR_']])

# Filter for January 2025 fires handled by the LAC and LDF units
# (UNIT_ID is the responsible unit, so it is only a proxy for location)
la_jan_2025 = fires_gdf[
    (fires_gdf['YEAR_'] == 2025)
    & (fires_gdf['ALARM_DATE'].dt.month == 1)
    & (fires_gdf['UNIT_ID'].isin(['LAC', 'LDF']))
]
print(f"\n\nLos Angeles area fires in 2025: {len(la_jan_2025)}")
if len(la_jan_2025) > 0:
    print("\nLA Area 2025 Fires:")
    print(la_jan_2025[['FIRE_NAME', 'ALARM_DATE', 'GIS_ACRES', 'UNIT_ID']])
```


Total fires in 2025: 6

2025 Fires:
   FIRE_NAME                ALARM_DATE     GIS_ACRES UNIT_ID
0  PALISADES 2025-01-07 00:00:00+00:00  23448.882812     LDF
4      HURST 2025-01-07 00:00:00+00:00    831.385498     LDF
1      EATON 2025-01-08 00:00:00+00:00  14056.260742     LAC
5      LIDIA 2025-01-08 00:00:00+00:00    347.704163     LAC
3    KENNETH 2025-01-09 00:00:00+00:00    998.737793     VNC
2     HUGHES 2025-01-22 00:00:00+00:00  10396.798828     ANF


Fires with 'Hollywood' in name: 2

Hollywood Fires:
               FIRE_NAME                ALARM_DATE                 CONT_DATE  \
2346     HOLLYWOOD HILLS 2019-06-09 00:00:00+00:00 2019-06-09 00:00:00+00:00   
17089  MT. HOLLYWOOD DR. 1952-08-09 00:00:00+00:00                       NaT   

       GIS_ACRES   YEAR_  
2346    1.006725  2019.0  
17089   6.635286  1952.0  


Los Angeles area fires in 2025: 4

LA Area 2025 Fires:
   FIRE_NAME                ALARM_DATE     GIS_ACRES UNIT_ID
0  PALISADES 2025-01-07 00:00:00+00:00  234