# Wildfire Data Pre-Processing

#### Data Source:

Monitoring Trends in Burn Severity (MTBS)

Burned Areas Boundaries Dataset (1984-2022)

https://www.mtbs.gov/direct-download

In [18]:
import pandas as pd
import numpy as np

## Load Data From MTBS

In [19]:
import geopandas as gpd

In [20]:
df_raw = gpd.read_file('data/mtbs/mtbs_perims_DD.shp')

In [21]:
df = df_raw.copy()

## Drop Unneeded Columns

In [22]:
df = df.drop(columns=[
    'Map_ID',
    'Map_Prog',
    'Asmnt_Type',
    'Pre_ID',
    'Post_ID',
    'Perim_ID',
    'dNBR_offst',
    'dNBR_stdDv',
    'NoData_T',
    'IncGreen_T',
    'Low_T',
    'Mod_T',
    'High_T',
    'Comment'
])
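In a notebook, this cell may be re-run after the columns have already been removed. When that happens, `drop` raises a `KeyError`. The sketch below makes the step safe to re-run by passing `errors='ignore'`. It uses a small made-up frame in place of the MTBS table, and `UNNEEDED` is a hypothetical name for the column list.

```python
import pandas as pd

# Toy stand-in for the MTBS attribute table (values are made up).
df = pd.DataFrame({
    'Event_ID': ['FIRE_0001'],
    'Map_ID': [1],
    'Comment': [''],
})

# Hypothetical list of columns to discard.
UNNEEDED = ['Map_ID', 'Comment']

# errors='ignore' skips labels that are not present, so a second run
# of the same cell does not raise a KeyError.
df = df.drop(columns=UNNEEDED, errors='ignore')
df = df.drop(columns=UNNEEDED, errors='ignore')  # second run: no-op

print(list(df.columns))  # ['Event_ID']
```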

## Rename Columns

In [23]:
df = df.rename(columns={
            'Event_ID': 'mtbs_id',
            'irwinID': 'irwin_id',
            'Incid_Name': 'incident_name',
            'Incid_Type': 'incident_type',
            'BurnBndAc': 'burned_acreage',
            'Ig_Date': 'ignition_date',
            'BurnBndLat': 'latitude',
            'BurnBndLon': 'longitude'
})
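By default, `rename` silently ignores mapping keys that are not in the frame. A typo in a source column name would therefore leave the old name in place without any warning. The following is a minimal sketch on made-up data that uses `errors='raise'` to make such a mismatch fail loudly.

```python
import pandas as pd

# Toy frame with two of the raw MTBS column names (values are made up).
df = pd.DataFrame({'Event_ID': ['FIRE_0001'], 'BurnBndAc': [1200]})

mapping = {'Event_ID': 'mtbs_id', 'BurnBndAc': 'burned_acreage'}

# errors='raise' raises a KeyError if any key in the mapping is missing.
df = df.rename(columns=mapping, errors='raise')
print(list(df.columns))  # ['mtbs_id', 'burned_acreage']

# A key that is not present (here 'Ig_Date') is reported instead of ignored.
raised = False
try:
    df.rename(columns={'Ig_Date': 'ignition_date'}, errors='raise')
except KeyError as exc:
    raised = True
    print('missing column:', exc)
```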

## Remove Prescribed Burns

In [24]:
df = df[df.incident_type != 'Prescribed Fire']

In [25]:
df = df.drop(columns=['incident_type'])
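The filter above keeps every incident type except `'Prescribed Fire'`, so any ambiguous categories pass through. The sketch below inspects the categories with `value_counts` before filtering. The data is made up, and the labels other than `'Prescribed Fire'` are illustrative rather than taken from MTBS.

```python
import pandas as pd

# Toy data; apart from 'Prescribed Fire', these labels are illustrative.
df = pd.DataFrame({
    'mtbs_id': ['F1', 'F2', 'F3', 'F4'],
    'incident_type': ['Wildfire', 'Prescribed Fire', 'Unknown', 'Wildfire'],
})

# See which categories exist before deciding what to drop.
print(df.incident_type.value_counts())

# Same filter as above: everything that is not 'Prescribed Fire' is kept,
# including a catch-all category such as 'Unknown'.
kept = df[df.incident_type != 'Prescribed Fire'].drop(columns=['incident_type'])
print(kept.mtbs_id.tolist())  # ['F1', 'F3', 'F4']
```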

## Add a Year Column

In [30]:
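# Year = first four characters of the date string, e.g. '1984'.
# astype(str) also covers the case where dates were read as datetime64.
df['ignition_year'] = df.ignition_date.astype(str).str[:4]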
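Depending on the geopandas version and I/O engine, `Ig_Date` may be read as strings or as `datetime64` values. The sketch below compares two vectorized options on made-up `'YYYY-MM-DD'` strings. A string slice keeps the year as text. Parsing with `pd.to_datetime` returns an integer year that compares numerically.

```python
import pandas as pd

# Ignition dates as 'YYYY-MM-DD' strings (made-up values).
dates = pd.Series(['1984-04-02', '2015-08-14', '2022-07-30'])

# Vectorized string slice: the year stays a string.
year_str = dates.str[:4]

# Parsing to datetimes gives an integer year instead.
year_int = pd.to_datetime(dates).dt.year

print(year_str.tolist())  # ['1984', '2015', '2022']
print(year_int.tolist())  # [1984, 2015, 2022]
```

Which form to use depends on downstream consumers. A string year serializes as text in JSON, while an integer year supports range filters such as `year >= 2015` directly.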

## (Optional) Add a State Column

In [27]:
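# Optional: compute the state for each fire. This works, but with one
# reverse-geocoder lookup per row it takes 3+ hours for all 30k fires,
# so it is disabled by default. Set COMPUTE_STATE = True to run it.
COMPUTE_STATE = False

if COMPUTE_STATE:
    import reverse_geocoder
    from tqdm import tqdm

    tqdm.pandas()  # registers DataFrame.progress_apply

    # Coordinates are cast to float before geocoding.
    df = df.astype({
        'latitude': np.float64,
        'longitude': np.float64
    })

    def get_state(row):
        # 'admin1' is the first-level administrative region (the state, for US points).
        response = reverse_geocoder.search((row.latitude, row.longitude))
        return response[0]['admin1']

    df['state'] = df.progress_apply(get_state, axis=1)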
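The state step casts latitude and longitude with `astype(np.float64)`, which raises a `ValueError` on the first entry it cannot parse. The sketch below shows a more forgiving alternative using `pd.to_numeric(..., errors='coerce')`, which turns such entries into `NaN` so they can be inspected afterwards. The coordinates are made up, and the empty string is a hypothetical bad value.

```python
import pandas as pd

# Toy coordinates stored as text (made-up values; '' is a deliberately bad entry).
df = pd.DataFrame({
    'latitude': ['38.5', '44.1', ''],
    'longitude': ['-120.2', '-114.9', '-105.0'],
})

# errors='coerce' converts unparseable entries to NaN instead of raising.
for col in ['latitude', 'longitude']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.latitude.isna().sum())  # 1
print(df.dtypes.tolist())        # [dtype('float64'), dtype('float64')]
```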

## Data Cleanliness

In the data as provided by MTBS, the only missing values are in the IRWIN ID column. This is expected: IRWIN IDs were introduced in 2015, and this dataset goes back to 1984. Otherwise, MTBS has already cleaned the dataset.

In [28]:
df.isna().sum()

mtbs_id               0
irwin_id          18946
incident_name         0
burned_acreage        0
latitude              0
longitude             0
ignition_date         0
geometry              0
ignition_year         0
dtype: int64
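The explanation above predicts that missing IRWIN IDs cluster in years before 2015. One way to check this is to compute the share of missing IDs per year. The sketch below does that with a `groupby` on made-up rows that mimic the column names used in this notebook.

```python
import pandas as pd

# Made-up rows: IRWIN IDs present only for recent fires.
df = pd.DataFrame({
    'ignition_year': ['1984', '1999', '2014', '2016', '2020'],
    'irwin_id': [None, None, None, 'ID-A', 'ID-B'],
})

# Fraction of fires per year with no IRWIN ID. If the 2015 explanation
# holds, this should be near 1.0 before 2015 and drop afterwards.
missing_share = df.irwin_id.isna().groupby(df.ignition_year).mean()
print(missing_share.to_dict())
# {'1984': 1.0, '1999': 1.0, '2014': 1.0, '2016': 0.0, '2020': 0.0}
```

On the real data, any post-2015 years with a high missing share would point to fires that the 2015 explanation does not cover.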

## Save Data

In [32]:
df_sans_perimeters = df.drop(columns=['geometry'])

In [34]:
df_sans_perimeters.to_json('data/mtbs.json')
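`to_json` on a DataFrame defaults to `orient='columns'`, which nests values as `{column: {index: value}}`. If the JSON is meant to be read as a list of fires, `orient='records'` may be more convenient. The sketch below compares both layouts on a made-up two-row frame. Which orient suits the consumer of `data/mtbs.json` is an assumption left to the reader.

```python
import json
import pandas as pd

# Made-up two-row frame with columns like those saved above.
df = pd.DataFrame({'mtbs_id': ['F1', 'F2'], 'burned_acreage': [1200, 560]})

# Default orient='columns': {column: {index: value}}; index keys become strings.
by_column = json.loads(df.to_json())

# orient='records': one JSON object per row (per fire).
records = json.loads(df.to_json(orient='records'))

print(by_column['mtbs_id'])  # {'0': 'F1', '1': 'F2'}
print(records[0])            # {'mtbs_id': 'F1', 'burned_acreage': 1200}
```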

In [35]:
df.to_file('data/mtbs_incl_perimeters.geojson', driver='GeoJSON')  