# 10 – Wildfire Data & Exploratory Analysis

In this notebook we:

1. Load a synthetic wildfire dataset (`wildfire_synthetic.csv`).
2. Perform basic exploratory data analysis (EDA).
3. Visualize distributions and relationships between key features and `fire_occurred`.

This mirrors what you would do with a real wildfire dataset from sources such as USGS and NASA FIRMS,
but keeps the data small and self-contained for teaching.

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

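# Resolve the project root: two levels above the working directory when running
# inside the repo, otherwise fall back to the default mount location.
cwd = os.getcwd()
if 'cardiff_ai_talk_runbook' in cwd:
    base_dir = os.path.dirname(os.path.dirname(cwd))
else:
    base_dir = "/mnt/data/cardiff_ai_talk_runbook"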
data_path = os.path.join(base_dir, "data", "raw", "wildfire_synthetic.csv")

df = pd.read_csv(data_path)
print("Shape:", df.shape)
df.head()
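
Before looking at distributions, it usually helps to check column types and missing values. The cell below is a minimal sketch of such a check. The helper name `summarize_quality` and the small `demo` frame are hypothetical stand-ins so the example runs on its own; in the notebook you would pass `df` instead.

```python
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing-value count, and unique-value count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a value
    })


# Hypothetical stand-in for df (values are illustrative, not from the dataset)
demo = pd.DataFrame({
    "temp_c": [31.2, 18.5, None, 27.0],
    "humidity": [22.0, 64.0, 40.0, 35.0],
    "fire_occurred": [1, 0, 0, 1],
})

quality = summarize_quality(demo)
print(quality)
```

A non-zero `n_missing` or an unexpected `dtype` (for example, a numeric column read as `object`) is worth resolving before interpreting the summary statistics.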

In [None]:
# Basic summary statistics
df.describe()

In [None]:
# Class balance for fire occurrence
fire_counts = df['fire_occurred'].value_counts(normalize=True)
print(fire_counts)

fire_counts.plot(kind='bar')
plt.title("Fire Occurrence (class balance)")
plt.xlabel("fire_occurred")
plt.ylabel("Proportion")
plt.show()
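
The class proportions also give a reference point for later modelling: a classifier that always predicts the most common class reaches an accuracy equal to that class's share. The sketch below computes this value. `majority_baseline` and `demo_target` are hypothetical names with illustrative values; in the notebook you would pass `df['fire_occurred']`.

```python
import pandas as pd


def majority_baseline(target: pd.Series) -> float:
    """Accuracy of always predicting the most frequent class."""
    return float(target.value_counts(normalize=True).max())


# Hypothetical labels: 6 "no fire" rows and 2 "fire" rows
demo_target = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="fire_occurred")

baseline = majority_baseline(demo_target)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

If the classes turn out to be strongly imbalanced, accuracy alone is likely to be a misleading metric, which is one reason to inspect the balance this early.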

In [None]:
# Correlation matrix of the numeric features and the target
numeric_cols = ['temp_c', 'humidity', 'wind_speed', 'rain_mm_last_7d', 'vegetation_index', 'population_density']
corr = df[numeric_cols + ['fire_occurred']].corr()
corr
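
The full matrix can be hard to scan, so one option is to pull out only the column for the target and sort it by absolute value. The sketch below does that. `rank_by_target_corr` and the `demo` frame are hypothetical and the values are illustrative; with the notebook's data you would call it on `df[numeric_cols + ['fire_occurred']]`. Note that Pearson correlation against a 0/1 target only captures linear association.

```python
import pandas as pd


def rank_by_target_corr(frame: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of each column with the target, strongest (by |r|) first."""
    corr_with_target = frame.corr()[target].drop(target)
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.reindex(order)


# Hypothetical stand-in for the notebook's data
demo = pd.DataFrame({
    "temp_c": [10, 15, 20, 25, 30, 35],
    "humidity": [80, 70, 65, 40, 30, 20],
    "wind_speed": [5, 12, 7, 9, 6, 11],
    "fire_occurred": [0, 0, 0, 1, 1, 1],
})

ranked = rank_by_target_corr(demo, "fire_occurred")
print(ranked)
```

The sign is kept in the output, so a strongly negative feature (such as humidity in this toy frame) still appears near the top.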

In [None]:
# Visualize relationship between temperature and fire occurrence.
# A small vertical jitter keeps overlapping points at 0 and 1 visible.
jitter = np.random.normal(0, 0.02, size=len(df))
plt.scatter(df['temp_c'], df['fire_occurred'] + jitter, alpha=0.3)
plt.yticks([0, 1], ["No Fire", "Fire"])
plt.xlabel("Temperature (°C)")
plt.title("Temperature vs Fire Occurrence")
plt.show()
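
A jittered scatter of a binary target can be hard to read, so a complementary view is the fire rate within temperature bins. The sketch below is one way to compute it, assuming equal-width bins from `pd.cut`. `fire_rate_by_bin` and the `demo` frame are hypothetical and the values are illustrative; in the notebook you would pass `df` and a bin count suited to its size.

```python
import pandas as pd


def fire_rate_by_bin(frame: pd.DataFrame, feature: str,
                     target: str = "fire_occurred", bins: int = 4) -> pd.Series:
    """Mean of the binary target within equal-width bins of the feature."""
    binned = pd.cut(frame[feature], bins=bins)
    # observed=True keeps only the bins that actually contain rows
    return frame.groupby(binned, observed=True)[target].mean()


# Hypothetical stand-in for the notebook's data
demo = pd.DataFrame({
    "temp_c": [5, 8, 12, 18, 22, 26, 31, 35],
    "fire_occurred": [0, 0, 0, 0, 1, 0, 1, 1],
})

rates = fire_rate_by_bin(demo, "temp_c", bins=3)
print(rates)
```

Calling `rates.plot(kind='bar')` would give a chart in the same style as the class-balance plot above.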