Dear editor,

Given that the story is planned for release around wildfire season, I focused specifically on fire-related FEMA disaster declarations. I think there is a story here about the accelerating rate at which fires are occurring in the US South and West, especially within the last 20 years. Two key angles we could take on this piece are:
1. While fires have historically been a regular occurrence in the US West, their frequency has accelerated rapidly over the last two decades.
2. Fires in the US South were historically rare, but over the last two decades they have been occurring at a rate similar to the increasingly frequent fires in the US West.

To support these claims, I've included plots that group the FEMA fire data into major regions of the US (South, West, Northeast, Midwest). I've plotted:
1. All fires in the data from 1954 to 2023, grouped by region.
2. The distribution of fires over time for each region, showing how heavily they are concentrated in the years after 2000.

# Install and import packages

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy
import pandas as pd
import scipy.special

# Our main plotting package (must have explicit import of submodules)
import bokeh.io
import bokeh.plotting
from bokeh.plotting import figure, show
from bokeh.transform import jitter
from bokeh.models import ColumnDataSource
from bokeh.plotting import output_notebook
output_notebook()

In [None]:
!pip install iqplot

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting iqplot
  Downloading iqplot-0.3.2-py2.py3-none-any.whl (32 kB)
Installing collected packages: iqplot
Successfully installed iqplot-0.3.2


In [None]:
import iqplot

# Read in data

In [None]:
# Read in all FEMA data
df = pd.read_csv('DisasterDeclarationsSummaries.csv')

Since the article will be released around fire season, I started by exploring only the fire data.

In [None]:
# Select just fire data
df_fire = df.loc[(df['incidentType'] == "Fire")]

In [None]:
df_fire

Unnamed: 0,femaDeclarationString,disasterNumber,state,declarationType,declarationDate,fyDeclared,incidentType,declarationTitle,ihProgramDeclared,iaProgramDeclared,...,disasterCloseoutDate,fipsStateCode,fipsCountyCode,placeCode,designatedArea,declarationRequestNumber,lastIAFilingDate,hash,id,lastRefresh
0,FM-5444-TX,5444,TX,FM,2022-07-19T00:00:00.000Z,2022,Fire,CHALK MOUNTAIN FIRE,0,0,...,,48,221,99221,Hood (County),22060,,373c5ec27998afc08a53302dae796f476b1a6546,867be42a-71d5-4f13-aa21-d91e0a6fd577,2022-07-20T21:21:23.941Z
1,FM-5436-NE,5436,NE,FM,2022-04-23T00:00:00.000Z,2022,Fire,ROAD 702 FIRE,0,0,...,,31,63,99063,Frontier (County),22034,,ea3487ef36cff455236ce4c63d32fb8b5412bcef,e671348b-9782-42df-99f4-d38b8b1a89e6,2022-07-20T21:21:23.942Z
2,FM-5444-TX,5444,TX,FM,2022-07-19T00:00:00.000Z,2022,Fire,CHALK MOUNTAIN FIRE,0,0,...,,48,425,99425,Somervell (County),22060,,1f35dd8137e1b4cf003fb73d53d8aaaf642e6190,40f3ff75-0b80-4d25-8e85-156cd6a6f40b,2022-07-20T21:21:23.943Z
3,FM-5436-NE,5436,NE,FM,2022-04-23T00:00:00.000Z,2022,Fire,ROAD 702 FIRE,0,0,...,,31,65,99065,Furnas (County),22034,,9d813a845ab86f546bdf642fda53d3d4a0fbd098,de47838f-32db-4058-a34c-64997333789e,2022-07-20T21:21:23.943Z
4,FM-5436-NE,5436,NE,FM,2022-04-23T00:00:00.000Z,2022,Fire,ROAD 702 FIRE,0,0,...,,31,145,99145,Red Willow (County),22034,,d436856ce1d5205fa78597b93661fb27d3cea796,483c03b7-160a-414e-bb4f-5ccecc18d94a,2022-07-20T21:21:23.944Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43692,FM-5126-OR,5126,OR,FM,2016-06-08T00:00:00.000Z,2016,Fire,AKAWANA FIRE,0,0,...,2022-09-29T00:00:00.000Z,41,31,99031,Jefferson (County),16046,,19b67c1967fc562d5d42961ba6439b8a7cb8b6a0,8361a108-44c8-48d3-b2e0-7b395730e3b3,2022-09-29T15:21:28.029Z
43821,FM-5092-OR,5092,OR,FM,2015-07-31T00:00:00.000Z,2015,Fire,STOUTS CREEK FIRE,0,0,...,,41,19,99019,Douglas (County),15048,,f6b61081e59865e243f6b626eb7df1ac07ba8a5b,cdfb84a7-3774-4e17-b9d0-1fc28a05773b,2022-10-22T01:41:13.199Z
43950,FM-5114-OR,5114,OR,FM,2015-09-14T00:00:00.000Z,2015,Fire,DRY GULCH FIRE,0,0,...,2022-09-28T00:00:00.000Z,41,1,99001,Baker (County),15075,,c0577c1ff975b5af81e20a56f0b83bfb8761fe6c,70def5d2-e6ce-470a-932e-e37ae91b0309,2022-09-28T13:41:13.929Z
44259,FM-5456-WA,5456,WA,FM,2022-10-17T00:00:00.000Z,2023,Fire,NAKIA CREEK FIRE,0,0,...,,53,11,99011,Clark (County),22097,,73993b1c2a24962b4e18c3d29696d8e18f7f212a,18b172e2-bdcb-4af0-80fd-05c6290114b0,2022-10-17T19:01:32.347Z
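Each row in the table above is a county-level designation, so a single declaration can appear more than once (FM-5444-TX is listed for both Hood and Somervell counties). If we ever want to count declarations rather than county rows, one possible approach is sketched below. It uses a toy DataFrame copied from the first five rows above and assumes that `disasterNumber` identifies a single declaration; the variable names are illustrative only.

```python
import pandas as pd

# Toy rows copied from the first five rows of the table above
rows = pd.DataFrame({
    "disasterNumber": [5444, 5436, 5444, 5436, 5436],
    "state": ["TX", "NE", "TX", "NE", "NE"],
    "fyDeclared": [2022, 2022, 2022, 2022, 2022],
    "designatedArea": [
        "Hood (County)", "Frontier (County)", "Somervell (County)",
        "Furnas (County)", "Red Willow (County)",
    ],
})

# Keep one row per declaration (assumption: disasterNumber is unique per declaration)
declarations = rows.drop_duplicates(subset="disasterNumber")

# Compare the two ways of counting
county_rows_per_state = rows.groupby("state").size()
declarations_per_state = declarations.groupby("state").size()

print(county_rows_per_state)
print(declarations_per_state)
```

The plots in this notebook use the county-level rows as they come, so each point is one county designation.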


In [None]:
#Set source data for plots
source = ColumnDataSource(df_fire)

# Plots

## Preliminary plotting

First, I made a jitter plot for the fire data across all states, just to see what it looks like over time.

In [None]:
# Prelim jitter plot showing fires in all states over time
states = ['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN',
          'KS','KY','LA', 'MA', 'ME','MH','MI','MN','MO','MS','MT','NC','NE','NH','NJ',
          'NM', 'NV','NY','OK','OR','PA','SC','SD','TN','TX','UT','VA','VT','WA','WV','WY']

p = figure(width=800, height=700, y_range= states, title="Fires in the US (FEMA disaster declarations 1954 - Present)")

p.scatter(x='fyDeclared', y=jitter('state', width=0.6, range=p.y_range), source=source, alpha=0.3)

p.x_range.range_padding = 0
p.ygrid.grid_line_color = None


show(p)

Plotting all the states at once is messy, but it's enough to notice that states in the South and West regions of the US appear to show an uptick in fires from around 1990 onwards. Separating the data into major regions of the US should give us more clarity.

In [None]:
#Ensure bokeh recognizes each plot as a new plot
from bokeh.models import Model

for model in p.select({'type': Model}):
    prev_doc = model.document
    model._document = None
    if prev_doc:
        prev_doc.remove_root(model)

## Separated by US region

In [None]:
#Separated states
west = ['AK','AZ', 'CO', 'CA', 'HI','ID','MT','NV','NM','UT']
midwest = ['IA','IL','IN','KS','MI','MN','MO','NE']
south = ['AL', 'AR','FL', 'GA','KY','LA','MS','NC','OK','SC','TN','TX','WV','VA']
northeast = ['MA','ME','NH','NJ','NY','PA','VT']

#Made separate datasets just in case I needed to analyze them individually
#df_fire_w = df_fire.loc[(df_fire['state'].isin(west))]
#df_fire_mw = df_fire.loc[(df_fire['state'].isin(midwest))]
#df_fire_s = df_fire.loc[(df_fire['state'].isin(south))]
#df_fire_ne = df_fire.loc[(df_fire['state'].isin(northeast))]                   

In [None]:
#Create new column "region" in the fire dataset
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
df_fire = df_fire.copy()
# Default every row to 'West'; states missing from the other three lists keep this label
df_fire['region'] = 'West'

In [None]:
#Add region to the respective states
df_fire.loc[df_fire['state'].isin(midwest), 'region'] = 'Midwest'
df_fire.loc[df_fire['state'].isin(south), 'region'] = 'South'
df_fire.loc[df_fire['state'].isin(northeast), 'region'] = 'Northeast'
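Because every row starts out as 'West', any state that is missing from the other three lists silently stays in the West group. A possible alternative, sketched here with shortened, hypothetical region lists and a toy DataFrame, is to build a state-to-region lookup and label anything unmapped explicitly so it can be reviewed.

```python
import pandas as pd

# Hypothetical, shortened region lists for illustration only
regions = {
    "West": ["CA", "OR", "WA"],
    "Midwest": ["NE", "KS"],
    "South": ["TX", "FL"],
    "Northeast": ["NY", "ME"],
}

# Invert to a state -> region lookup
state_to_region = {
    state: region for region, states in regions.items() for state in states
}

# Toy data; "MH" is deliberately absent from every list
toy = pd.DataFrame({"state": ["CA", "TX", "NE", "MH", "NY"]})

# Map each state to its region; unmapped states are labelled rather than defaulted
toy["region"] = toy["state"].map(state_to_region).fillna("Unassigned")

# List the states that still need a region
unassigned = sorted(toy.loc[toy["region"] == "Unassigned", "state"].unique())
print(toy)
print(unassigned)
```

This is only a sketch; the analysis in this notebook keeps the default-to-West approach.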


In [None]:
# Import color palette for plotting
import bokeh.palettes
from bokeh.palettes import Set2

In [None]:
from bokeh.models import ColumnDataSource, RangeTool

### All data plotted by US region

With a jitter plot divided by region, two things stand out: (1) the number of fires has increased over time, and (2) Southern and Western US states have seen the most dramatic upticks in fires, from around the mid-1990s onward.

In [None]:
TOOLTIPS=[("Year", "@fyDeclared"),
          ("State", "@state"),
          ("County", "@designatedArea"),
          ("Fire", "@declarationTitle")]

# Strip-jitterplot by region of the US
p = iqplot.strip(
    data=df_fire,
    q="fyDeclared",
    cats="region",
    spread="jitter",
    marker_kwargs=dict(alpha=0.4),
    palette=Set2[4],
    frame_width=700,
    frame_height=400,
    tooltips=TOOLTIPS,
    x_axis_label = "Year",
    title="Fires in the US by region (FEMA disaster declarations 1954 - Present)"
)

bokeh.io.show(p)



Hovering over individual points will display when and where a specific fire incident occurred. You can zoom in on certain parts of the graph by using the box zoom to select the area of the graph that you want to look at more closely, or the wheel zoom by scrolling to zoom in on a particular section. This may be helpful to find individual incidents within dense areas of points. The reset button will restore the graph to its original appearance. 

### Plots showing the distribution of fires over time

Next, I want to show quantitatively *how* drastic the increase is, i.e. how the fire data is distributed over time. I've used a Python package built on Bokeh called iqplot (https://iqplot.github.io/) to plot the distributions. iqplot lets you specify one quantitative variable and one categorical variable: in our case, the year each fire was declared (`fyDeclared`) and the region where it occurred, respectively.


I'll plot the data by region as an ECDF (empirical cumulative distribution function) plot. An ECDF shows what fraction of the data points (y-value) are less than or equal to the corresponding x-value. In this case, the plot shows **what fraction of the total fires in each US region occurred during or before any given year.**
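To make the definition concrete, here is a minimal hand-rolled ECDF using NumPy. The years are a hypothetical toy sample, not FEMA data, and the helper names `ecdf` and `fraction_at_or_before` are my own; iqplot does the equivalent work internally when it draws the plot.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of points <= each value."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def fraction_at_or_before(values, year):
    """Fraction of data points that are less than or equal to `year`."""
    return float(np.mean(np.asarray(values) <= year))

# Hypothetical declaration years for one region
toy_years = [1990, 2000, 2000, 2005, 2010, 2015, 2020, 2021]

x, y = ecdf(toy_years)
frac = fraction_at_or_before(toy_years, 2000)
print(frac)  # 3 of the 8 toy years are <= 2000, so 0.375
```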

In [None]:
#Empirical cumulative distribution function (ECDF) plot
p = iqplot.ecdf(
    data=df_fire,
    q="fyDeclared",
    cats="region",
    style= "staircase",
    frame_height=250,
    frame_width = 500,
    title = "Fraction of total fires by region that occurred before or during a given year",
    x_axis_label = None,
    y_axis_label = None,
    palette=Set2[4]
    )
p.xaxis.axis_label_text_font_style = "bold"

bokeh.io.show(p)


*You can interact with this plot by clicking on a region in the legend to remove it and view the regions separately*

This plot shows just how skewed the distribution of fire occurrences is: in the South, half of all fires occurred after 2006, and 80% occurred after 1999. While the West is the most fire-prone region generally, half of all fires there occurred after 2008, and 80% occurred after 2000.
Fewer fires occur in the Midwest, and very few in the Northeast, but the pattern is similar: 80% of fires in those regions occurred during or after 2000.

**Even in the hotter, drier and more fire-prone regions of the US South and West, the overwhelming majority of FEMA-recorded fire disasters have occurred in the last two decades, and about half have occurred since 2006-2008!**
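Figures like these can be read off the ECDF, but they can also be computed directly with a pandas groupby. The sketch below uses a hypothetical toy DataFrame with the same `region` and `fyDeclared` column names as the fire data; the numbers it produces are illustrative and are not the results quoted above.

```python
import pandas as pd

# Hypothetical toy data, not the FEMA records
toy = pd.DataFrame({
    "region": ["South"] * 5 + ["West"] * 5,
    "fyDeclared": [1985, 2001, 2006, 2011, 2018,
                   1957, 1996, 2008, 2015, 2021],
})

# Median declaration year per region (half of the rows fall at or before it)
median_year = toy.groupby("region")["fyDeclared"].median()

# Share of rows per region declared in or after a cutoff year
cutoff = 2000
share_since_cutoff = (
    toy.assign(recent=toy["fyDeclared"] >= cutoff)
       .groupby("region")["recent"]
       .mean()
)

print(median_year)
print(share_since_cutoff)
```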

For another way of looking at the distributions of fire occurrences (ECDFs are not always immediately intuitive to interpret), I used a boxplot. 

In [None]:

#Boxplot of fires in the US by region
p = iqplot.stripbox(
    data=df_fire,
    q="fyDeclared",
    cats="region",
    frame_width=500,
    marker_kwargs=dict(alpha=0.4),
    palette=Set2[4],
    y_axis_label="# of fires in the US by region",
    x_axis_label=None,
)

p.yaxis.axis_label_text_font_style = "bold"

bokeh.io.show(p)

Likewise, the boxplot shows that the median fire year (the line inside each box) falls in 2006-2008 for the US West and South; that is, half of all recorded fires in those regions occurred in or before that year.

The distribution of fire occurrences in the US West is particularly striking: the first FEMA-recorded fire there is in 1957, and the median fire year is 2008. The first half of all fires in the US West occurred over ~50 years, while the second half occurred in only 15 years.
Another observation is that FEMA data records barely any fires in the US South before 1985, which suggests that fire disaster declarations in the South are a relatively recent occurrence. As in the West, the first half of all fires in the South took 35 years to occur, and the second half only 16.
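The "first half versus second half" comparison can be reproduced from the first, median and last year in each region. This is a rough sketch on hypothetical toy data (the column names match the fire data, the values do not), with made-up summary column names.

```python
import pandas as pd

# Hypothetical toy data, not the FEMA records
toy = pd.DataFrame({
    "region": ["South"] * 5 + ["West"] * 5,
    "fyDeclared": [1985, 2001, 2006, 2011, 2018,
                   1957, 1996, 2008, 2015, 2021],
})

# First, median and last declaration year per region
summary = toy.groupby("region")["fyDeclared"].agg(
    first_year="min", median_year="median", last_year="max"
)

# Years spanned by the first and second half of the records
summary["first_half_years"] = summary["median_year"] - summary["first_year"]
summary["second_half_years"] = summary["last_year"] - summary["median_year"]

print(summary)
```

A much longer first half than second half is the same skew toward recent years that the ECDF and boxplot show.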