# California Wildfire Analysis (1878-2023)

**Author:** FireAnalyst Research Team  
**Date:** 2025  
**Dataset:** California Fire Perimeters (historical)

## Overview

This notebook presents a statistical analysis of California wildfire patterns from 1878 to 2023. The analysis examines:

1. **Data Quality & Validation** - Data cleaning, validation, and feature engineering
2. **Temporal Trends** - Long-term patterns in fire occurrence and severity
3. **Seasonal Patterns** - Monthly distribution and seasonal effects
4. **Fire Causes** - Analysis of ignition sources and human factors
5. **Containment Effectiveness** - Evaluation of suppression methods
6. **Statistical Testing** - Hypothesis tests and confidence intervals

---

## 1. Setup and Configuration

```python
# Import required modules
import sys
sys.path.insert(0, '.')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from src.data_processing import full_pipeline
from src.analysis import (
    calculate_containment_effectiveness,
    analyze_containment_methods,
    temporal_trend_analysis,
    seasonal_analysis,
    cause_analysis,
    compare_decades,
    summary_statistics,
    generate_analysis_report
)
from src.visualization import (
    plot_fires_over_time,
    plot_peak_years,
    plot_fires_by_decade,
    plot_fire_area_by_decade,
    plot_seasonality,
    plot_cause_distribution,
    plot_containment_effectiveness,
    plot_fire_size_distribution,
    generate_all_visualizations
)

# Configure display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 120)
pd.set_option('display.precision', 2)

%matplotlib inline

print("✓ Modules imported successfully")
```

## 2. Data Processing Pipeline

Execute the complete data processing workflow including loading, cleaning, validation, and feature engineering.

```python
# Run the complete data processing pipeline
df, validation_stats = full_pipeline(verbose=True)

print(f"\n{'='*60}")
print(f"Final dataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"{'='*60}")
```
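The internals of `full_pipeline` live in `src/data_processing` and are not shown here. As a rough illustration of what the cleaning and feature-engineering steps might involve, the sketch below derives `Containment_Duration` in hours from ignition and containment dates and drops physically impossible records. The raw column names `ALARM_DATE` and `CONT_DATE` and the helper `clean_fire_records` are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def clean_fire_records(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning sketch; not the implementation in src.data_processing.

    Assumes raw columns ALARM_DATE and CONT_DATE (ignition and containment
    dates), GIS_ACRES, and YEAR_.
    """
    df = raw.copy()
    # Parse dates; unparseable values become NaT instead of raising
    for col in ("ALARM_DATE", "CONT_DATE"):
        df[col] = pd.to_datetime(df[col], errors="coerce")

    # Feature: containment duration in hours
    elapsed = df["CONT_DATE"] - df["ALARM_DATE"]
    df["Containment_Duration"] = elapsed.dt.total_seconds() / 3600

    # Validation: drop non-positive areas and containment before ignition.
    # Rows with missing dates (NaN duration) fail this check and are dropped too.
    valid = (df["GIS_ACRES"] > 0) & (df["Containment_Duration"] >= 0)
    df = df.loc[valid].copy()

    # Features used by the seasonal and decade analyses
    df["Month"] = df["ALARM_DATE"].dt.month
    df["Decade"] = (df["YEAR_"] // 10) * 10
    return df


# Toy records for illustration only (not real fires)
raw = pd.DataFrame({
    "YEAR_": [2020, 2020, 2021],
    "ALARM_DATE": ["2020-08-16", "2020-09-01", "2021-07-13"],
    "CONT_DATE": ["2020-08-18", "2020-08-30", "2021-07-20"],
    "GIS_ACRES": [1200.0, 50.0, 300.0],
})
clean = clean_fire_records(raw)
print(clean[["YEAR_", "Month", "Containment_Duration"]])
```

The second toy record is "contained" before it was reported, so it is removed; a real pipeline might instead keep such rows for size-only analyses.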

## 3. Summary Statistics

Compute descriptive statistics for fire size and containment duration, and preview the first rows of the cleaned data.

```python
# Calculate summary statistics
stats = summary_statistics(df)

print("\n" + "="*60)
print("SUMMARY STATISTICS")
print("="*60)
print(f"Total fires analyzed: {stats['total_fires']:,}")
print(f"Year range: {stats['year_range'][0]} - {stats['year_range'][1]}")
print("\nFire Size (acres):")
print(f"  Total burned: {stats['total_acres_burned']:,.0f}")
print(f"  Mean: {stats['mean_fire_size']:,.1f}")
print(f"  Median: {stats['median_fire_size']:,.1f}")
print(f"  Maximum: {stats['max_fire_size']:,.0f}")
print(f"  95th percentile: {stats['p95_fire_size']:,.0f}")
print("\nContainment Duration:")
print(f"  Mean: {stats['mean_containment_hours']:.1f} hours ({stats['mean_containment_hours']/24:.1f} days)")
print(f"  Median: {stats['median_containment_hours']:.1f} hours ({stats['median_containment_hours']/24:.1f} days)")
print("="*60)

# Display first few rows
print("\nSample data:")
df[['YEAR_', 'FIRE_NAME', 'GIS_ACRES', 'Containment_Duration', 'CAUSE_DESCRIPTION']].head(10)
```
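For reference, the core figures printed by `summary_statistics` can be approximated with plain pandas. This is a hypothetical re-implementation (`basic_summary` is not part of `src.analysis`), assuming `Containment_Duration` is stored in hours, as the `mean_containment_hours` key suggests. The toy data shows why both mean and median are reported: a single large fire pulls the mean far above the median.

```python
import pandas as pd


def basic_summary(df: pd.DataFrame) -> dict:
    """Hypothetical sketch of the figures printed above (not src.analysis)."""
    acres = df["GIS_ACRES"].dropna()
    hours = df["Containment_Duration"].dropna()
    return {
        "total_fires": len(df),
        "year_range": (int(df["YEAR_"].min()), int(df["YEAR_"].max())),
        "total_acres_burned": acres.sum(),
        "mean_fire_size": acres.mean(),
        "median_fire_size": acres.median(),
        "max_fire_size": acres.max(),
        "p95_fire_size": acres.quantile(0.95),  # linear interpolation by default
        "mean_containment_hours": hours.mean(),
        "median_containment_hours": hours.median(),
    }


# Toy data: one large fire pulls the mean far above the median
toy = pd.DataFrame({
    "YEAR_": [1950, 1980, 2000, 2020],
    "GIS_ACRES": [10.0, 20.0, 30.0, 1000.0],
    "Containment_Duration": [24.0, 48.0, 12.0, 480.0],
})
s = basic_summary(toy)
print(s["mean_fire_size"], s["median_fire_size"])
```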

## 4. Temporal Trend Analysis

Examine long-term trends in fire occurrence with statistical testing.

```python
# Analyze temporal trends
yearly_stats, trend_test = temporal_trend_analysis(df)

print("\n" + "="*60)
print("TEMPORAL TREND ANALYSIS")
print("="*60)
print(f"\n{trend_test.interpretation}")
print("="*60)

# Visualize trends
fig = plot_fires_over_time(df, save=True)
plt.show()

# Peak years
fig = plot_peak_years(df, top_n=10, save=True)
plt.show()

# Decadal trends
fig = plot_fires_by_decade(df, save=True)
plt.show()

fig = plot_fire_area_by_decade(df, save=True)
plt.show()
```
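The specific test used by `temporal_trend_analysis` is not documented in this notebook. One common approach, sketched here under that caveat, is to aggregate fires per year and report both an ordinary least-squares slope and Kendall's tau between year and annual count. The function name `yearly_trend` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def yearly_trend(df: pd.DataFrame) -> dict:
    """Hypothetical trend sketch; the tests in src.analysis may differ."""
    # Fires per year. Years without records are counted as 0, which assumes
    # gaps mean "no fires" rather than "no data" (questionable for early years).
    years = np.arange(df["YEAR_"].min(), df["YEAR_"].max() + 1)
    counts = df.groupby("YEAR_").size().reindex(years, fill_value=0)
    x, y = counts.index.to_numpy(), counts.to_numpy()

    # Ordinary least squares: slope in fires per year
    ols = stats.linregress(x, y)
    # Kendall's tau between year and count: a rank-based monotonic trend test,
    # closely related to the Mann-Kendall test and robust to extreme years
    tau, tau_p = stats.kendalltau(x, y)
    return {"slope_per_year": ols.slope, "slope_p": ols.pvalue,
            "kendall_tau": tau, "kendall_p": tau_p}


# Toy data: 1, 2, 3, 4 fires in consecutive years
toy = pd.DataFrame({"YEAR_": [2000, 2001, 2001, 2002, 2002, 2002,
                              2003, 2003, 2003, 2003]})
trend = yearly_trend(toy)
print(trend)
```

Both tests treat years as independent observations; real annual counts may be autocorrelated, and changes in reporting practice can create an apparent trend on their own.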

## 5. Seasonal Pattern Analysis

Investigate monthly patterns in fire occurrence and test for seasonal variation.

```python
# Analyze seasonal patterns
monthly_stats, seasonal_test = seasonal_analysis(df)

print("\n" + "="*60)
print("SEASONAL PATTERN ANALYSIS")
print("="*60)
print(f"\n{seasonal_test.interpretation}")
print("\nMonthly fire counts:")
print(monthly_stats)
print("="*60)

# Visualize seasonality
fig = plot_seasonality(df, save=True)
plt.show()
```
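The seasonal test inside `seasonal_analysis` is likewise not shown. A chi-square goodness-of-fit test is one standard way to ask whether fires are spread evenly across months; this sketch weights the expected counts by month length. The function name and a `Month` column holding values 1-12 are assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Days per month, ignoring leap years
DAYS_IN_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])


def monthly_uniformity_test(months: pd.Series):
    """Chi-square goodness-of-fit sketch: are fires spread evenly over the year?

    Expected counts are proportional to month length, so February is not
    penalized for having fewer days.
    """
    observed = months.value_counts().reindex(range(1, 13), fill_value=0)
    expected = observed.sum() * DAYS_IN_MONTH / DAYS_IN_MONTH.sum()
    result = stats.chisquare(observed.to_numpy(), f_exp=expected)
    return observed, result.statistic, result.pvalue


peak = pd.Series([8] * 12)               # toy: every fire in August
spread = pd.Series(list(range(1, 13)))   # toy: one fire per month
obs_peak, chi2_peak, p_peak = monthly_uniformity_test(peak)
_, chi2_spread, p_spread = monthly_uniformity_test(spread)
print(f"peaked: p = {p_peak:.2e}; spread: p = {p_spread:.3f}")
```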

## 6. Fire Cause Analysis

Examine the distribution and characteristics of different fire causes.

```python
# Analyze fire causes
cause_stats = cause_analysis(df)

print("\n" + "="*60)
print("FIRE CAUSE ANALYSIS")
print("="*60)
print("\nTop 10 fire causes:")
print(cause_stats.head(10))
print("="*60)

# Visualize causes
fig = plot_cause_distribution(df, top_n=10, save=True)
plt.show()
```
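The exact columns returned by `cause_analysis` are not shown. A hypothetical pandas equivalent is sketched below; comparing each cause's share of fires with its share of burned area separates causes that start many small fires from causes behind fewer, larger ones.

```python
import pandas as pd


def cause_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-cause summary; column names follow the sample data above."""
    table = (
        df.groupby("CAUSE_DESCRIPTION")["GIS_ACRES"]
        # "count" tallies records with a non-missing GIS_ACRES value
        .agg(fires="count", total_acres="sum", median_acres="median")
        .sort_values("fires", ascending=False)
    )
    # Compare each cause's share of ignitions with its share of burned area
    table["pct_fires"] = 100 * table["fires"] / table["fires"].sum()
    table["pct_acres"] = 100 * table["total_acres"] / table["total_acres"].sum()
    return table


# Toy records for illustration only
toy = pd.DataFrame({
    "CAUSE_DESCRIPTION": ["Lightning", "Lightning", "Arson", "Equipment Use"],
    "GIS_ACRES": [100.0, 300.0, 50.0, 50.0],
})
t = cause_table(toy)
print(t)
```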

## 7. Containment Method Effectiveness

**⚠️ IMPORTANT METHODOLOGICAL NOTE:**

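The effectiveness metric used here (acres/hour) has significant limitations:
- It conflates fire size with suppression effectiveness: a high acres/hour value can reflect a large or fast-spreading fire rather than effective suppression
- Larger fires naturally take longer to contain, so size and duration are not independent
- It does not control for initial conditions, weather, or terrain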

Results should be interpreted cautiously and supplemented with size-controlled analyses.
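A minimal size-controlled comparison could stratify fires into size classes and compare methods only within each class. This sketch assumes `Containment_Duration` is in hours and uses a hypothetical `CONTAINMENT_METHOD` column; the method column actually used by `analyze_containment_methods` is not shown in this notebook.

```python
import pandas as pd


def effectiveness_by_size_class(df: pd.DataFrame,
                                method_col: str = "CONTAINMENT_METHOD",
                                n_classes: int = 4) -> pd.DataFrame:
    """Size-stratified comparison sketch; method_col is a hypothetical column name."""
    # Keep fires with known area and a positive containment time
    d = df[df["GIS_ACRES"].notna() & (df["Containment_Duration"] > 0)].copy()
    # Acres per hour, assuming Containment_Duration is measured in hours
    d["acres_per_hour"] = d["GIS_ACRES"] / d["Containment_Duration"]
    # Quantile-based size classes: 0 = smallest fires, n_classes - 1 = largest
    d["size_class"] = pd.qcut(d["GIS_ACRES"], q=n_classes, labels=False,
                              duplicates="drop")
    # Median metric per method *within* each size class, so methods are only
    # compared against fires of similar size
    return (d.groupby(["size_class", method_col])["acres_per_hour"]
             .median()
             .unstack(method_col))


# Toy data: two small and two large fires, one of each per method
toy = pd.DataFrame({
    "GIS_ACRES": [10.0, 20.0, 1000.0, 2000.0],
    "Containment_Duration": [10.0, 5.0, 100.0, 50.0],
    "CONTAINMENT_METHOD": ["A", "B", "A", "B"],
})
by_class = effectiveness_by_size_class(toy, n_classes=2)
print(by_class)
```

Quantile bins keep every class populated even when the size distribution is heavily skewed; fixed acreage thresholds would be easier to interpret but can leave the top class nearly empty.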

```python
# Calculate effectiveness and analyze by method
df_with_eff = calculate_containment_effectiveness(df)
method_stats, anova_results = analyze_containment_methods(df_with_eff)

print("\n" + "="*60)
print("CONTAINMENT METHOD ANALYSIS")
print("="*60)
print(f"\n{anova_results['interpretation']}")
print("\nMethod effectiveness (with 95% confidence intervals):")
print(method_stats[['Mean_Effectiveness', 'CI_Lower', 'CI_Upper', 'Count']])
print("="*60)

# Visualize effectiveness
fig = plot_containment_effectiveness(method_stats, save=True)
plt.show()
```
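The table above reports per-method means with 95% confidence intervals alongside an ANOVA. The sketch below shows one common way to compute such numbers, using t-based intervals and SciPy's one-way ANOVA; the column names are hypothetical and this is not necessarily what `analyze_containment_methods` does.

```python
import numpy as np
import pandas as pd
from scipy import stats


def method_means_with_ci(df: pd.DataFrame, value_col: str = "acres_per_hour",
                         method_col: str = "CONTAINMENT_METHOD",
                         confidence: float = 0.95):
    """t-based confidence intervals per method plus a one-way ANOVA (sketch).

    value_col and method_col are hypothetical column names.
    """
    rows, groups = {}, []
    for method, values in df.groupby(method_col)[value_col]:
        v = values.dropna().to_numpy()
        if len(v) < 2:
            continue  # a sample standard deviation needs at least two values
        groups.append(v)
        mean = v.mean()
        sem = v.std(ddof=1) / np.sqrt(len(v))
        half_width = stats.t.ppf((1 + confidence) / 2, df=len(v) - 1) * sem
        rows[method] = {"Mean_Effectiveness": mean,
                        "CI_Lower": mean - half_width,
                        "CI_Upper": mean + half_width,
                        "Count": len(v)}
    table = pd.DataFrame.from_dict(rows, orient="index")
    # One-way ANOVA across methods (requires at least two groups)
    f_stat, p_value = stats.f_oneway(*groups)
    return table, f_stat, p_value


toy = pd.DataFrame({"CONTAINMENT_METHOD": ["A"] * 3 + ["B"] * 3,
                    "acres_per_hour": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
table, f_stat, p_value = method_means_with_ci(toy)
print(table)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

Because acres/hour values are likely to be strongly right-skewed, a rank-based alternative such as `scipy.stats.kruskal`, or an ANOVA on log-transformed values, may be more robust.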

## 8. Fire Size Distribution

Examine the distribution of fire sizes to see how skewed it is, which determines whether the mean or the median is the more representative summary.

```python
# Visualize fire size distribution
fig = plot_fire_size_distribution(df, save=True)
plt.show()

# Calculate percentiles in a single vectorized call
percentiles = [50, 75, 90, 95, 99]
values = df['GIS_ACRES'].quantile([p / 100 for p in percentiles])
print("\nFire size percentiles:")
for p, value in zip(percentiles, values):
    print(f"  {p}th percentile: {value:,.1f} acres")
```
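Percentiles alone do not show how much of the burned area is concentrated in a few fires. The sketch below adds simple skewness diagnostics: the mean-to-median ratio, skewness before and after a log10 transform, and the share of burned area from fires at or above the 99th percentile. The helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def size_distribution_summary(acres: pd.Series) -> dict:
    """Skewness diagnostics for fire size (sketch)."""
    a = acres.dropna()
    a = a[a > 0]  # log10 is undefined for zero or negative areas
    p99 = a.quantile(0.99)
    return {
        "percentiles": a.quantile([0.50, 0.75, 0.90, 0.95, 0.99]),
        "mean_to_median": a.mean() / a.median(),
        "skewness": a.skew(),                  # sample skewness, raw acres
        "log10_skewness": np.log10(a).skew(),  # after a log10 transform
        # Share of all burned area from fires at or above the 99th percentile
        "top1pct_area_share": a[a >= p99].sum() / a.sum(),
    }


toy = pd.Series([1.0, 10.0, 100.0, 1000.0, 10000.0])  # toy values, 10x apart
summary = size_distribution_summary(toy)
print(summary["mean_to_median"], summary["log10_skewness"])
```

A large mean-to-median ratio whose skewness nearly vanishes on the log scale suggests plotting fire sizes on logarithmic axes.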

## 9. Comprehensive Analysis Report

Generate a complete analysis report with all statistical tests.

```python
# Generate comprehensive report
# (effectiveness is recomputed so this cell can run on its own)
df_with_eff = calculate_containment_effectiveness(df)
report = generate_analysis_report(df_with_eff)

print("\n" + "="*60)
print("COMPREHENSIVE ANALYSIS REPORT")
print("="*60)

print("\n1. Summary Statistics:")
for key, value in report['summary_statistics'].items():
    print(f"  {key}: {value}")

print("\n2. Temporal Trend Test:")
print(f"  {report['temporal_analysis']['trend_test'].interpretation}")

print("\n3. Seasonal Pattern Test:")
print(f"  {report['seasonal_analysis']['test'].interpretation}")

print("\n4. Containment Method ANOVA:")
if report['method_analysis']:
    print(f"  {report['method_analysis']['anova']['interpretation']}")
else:
    print("  Not available")

print("\n5. Decade Comparison Test:")
print(f"  {report['decade_comparison']['test']['interpretation']}")

print("\n" + "="*60)
```
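The report includes a decade comparison test, but neither `compare_decades` nor the test it runs is shown here. As one plausible choice, a Kruskal-Wallis test compares fire-size distributions across decades without assuming normality; the sketch below is hypothetical and not the project's implementation.

```python
import pandas as pd
from scipy import stats


def decade_size_test(df: pd.DataFrame, min_fires: int = 5):
    """Kruskal-Wallis sketch: do fire-size distributions differ across decades?

    Rank-based, so a handful of very large fires does not dominate the test.
    Decades with fewer than min_fires records are skipped; at least two
    decades must remain.
    """
    d = df.dropna(subset=["YEAR_", "GIS_ACRES"])
    decade = (d["YEAR_"] // 10 * 10).astype(int)
    groups = {int(dec): g.to_numpy()
              for dec, g in d.groupby(decade)["GIS_ACRES"]
              if len(g) >= min_fires}
    h_stat, p_value = stats.kruskal(*groups.values())
    medians = {dec: float(pd.Series(v).median()) for dec, v in groups.items()}
    return h_stat, p_value, medians


# Toy data: five small fires in the 1950s, five larger ones in the 2010s
toy = pd.DataFrame({"YEAR_": [1952] * 5 + [2015] * 5,
                    "GIS_ACRES": [1.0, 2.0, 3.0, 4.0, 5.0,
                                  101.0, 102.0, 103.0, 104.0, 105.0]})
h_stat, p_value, medians = decade_size_test(toy)
print(h_stat, p_value, medians)
```

As with the temporal trend, differences between early and recent decades may partly reflect changes in reporting standards rather than fire behavior.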

## 10. Conclusions

### Key Findings:

1. **Data Quality:** The dataset spans about 145 years (1878-2023) and was cleaned and validated by the processing pipeline
2. **Temporal Trends:** [Interpretation based on statistical tests]
3. **Seasonality:** Fire occurrence shows strong seasonal patterns
4. **Fire Causes:** Lightning and human factors are primary ignition sources
5. **Containment Methods:** Statistical differences exist between methods (with caveats)

### Methodological Limitations:

- Effectiveness metric requires revision to control for fire size
- Missing confounding variables (weather, vegetation, population density)
- Changes in reporting standards over time, which may bias comparisons between early and recent decades
- No spatial analysis of geographic patterns

### Future Work:

1. Develop size-controlled effectiveness metrics
2. Incorporate weather and environmental data
3. Add predictive modeling for fire risk
4. Perform spatial analysis with GIS data
5. Conduct sensitivity analyses

---

**For full methodology and technical details, see `docs/latex/methodology.tex`**