# County-level maps of FEMA disasters

We found a [FEMA archive](https://www.fema.gov/media-library/assets/documents/28318) of disaster declarations throughout the USA, split up by county. This notebook wrangles that data and writes out CSV files, including the two consolidated tables listed here:

- incidentdf.csv: consolidated by County FIPS Code.
- incidentdf_year.csv: consolidated by Year and County FIPS Code.

These CSV files are in turn used for data visualization in another [notebook](https://github.com/labs12-should-i-live-here/DS/blob/master/notebooks/DMA3%20-%20FEMA%20disasters%20by%20county.ipynb).


## A note about FIPS codes

[FIPS codes](https://en.wikipedia.org/wiki/FIPS_county_code) are unique identifiers used by the US government to identify states and counties (or county-equivalent areas). Each county has a full five-digit FIPS code: two digits for the state followed by three digits for the county within that state.
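
As a small illustration of that two-plus-three digit layout, here is a minimal sketch. The helper names `format_fips` and `split_fips` are hypothetical and are not used elsewhere in this notebook. It may be useful because the county file loaded later stores the code as an integer, so a leading zero (as in Alabama's state code `01`) is not kept.

```python
def format_fips(state_fips: int, county_fips: int) -> str:
    """Build the full 5-digit county FIPS code as a zero-padded string."""
    return f"{state_fips:02d}{county_fips:03d}"


def split_fips(full_code: int) -> tuple:
    """Split an integer county code into (state part, county part)."""
    return divmod(full_code, 1000)


# Lee County, Alabama: state 1, county 81
print(format_fips(1, 81))
# Clallam County, Washington stored as the integer 53009
print(split_fips(53009))
```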

In [0]:
# Generic Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [0]:
# Data based on data.gov.FEMADeclarations.3.15.19
femadf = pd.read_csv('fema_dataset.csv')
femadf.shape

(48555, 16)

In [0]:
femadf.head()

Unnamed: 0,Disaster Number,IH Program Declared,IA Program Declared,PA Program Declared,HM Program Declared,State,Declaration Date,Disaster Type,Incident Type,Title,Incident Begin Date,Incident End Date,Disaster Close Out Date,Place Code,Declared County/Area,Declaration Request Number
0,4419,Yes,No,Yes,Yes,AL,3/5/2019,DR,Tornado,"SEVERE STORMS, STRAIGHT-LINE WINDS, AND TORNADOES",3/3/2019,3/3/2019,,99081.0,Lee (County),19006
1,4418,No,No,Yes,Yes,WA,3/4/2019,DR,Severe Storm(s),"SEVERE WINTER STORMS, STRAIGHT-LINE WINDS, FLO...",12/10/2018,12/24/2018,,99009.0,Clallam (County),19005
2,4418,No,No,Yes,Yes,WA,3/4/2019,DR,Severe Storm(s),"SEVERE WINTER STORMS, STRAIGHT-LINE WINDS, FLO...",12/10/2018,12/24/2018,,99027.0,Grays Harbor (County),19005
3,4418,No,No,Yes,Yes,WA,3/4/2019,DR,Severe Storm(s),"SEVERE WINTER STORMS, STRAIGHT-LINE WINDS, FLO...",12/10/2018,12/24/2018,,99029.0,Island (County),19005
4,4418,No,No,Yes,Yes,WA,3/4/2019,DR,Severe Storm(s),"SEVERE WINTER STORMS, STRAIGHT-LINE WINDS, FLO...",12/10/2018,12/24/2018,,99031.0,Jefferson (County),19005


In [0]:
# Checking for NaNs
femadf.isnull().sum()

Disaster Number                  0
IH Program Declared              0
IA Program Declared              0
PA Program Declared              0
HM Program Declared              0
State                            0
Declaration Date                 0
Disaster Type                    0
Incident Type                    0
Title                            0
Incident Begin Date              0
Incident End Date              339
Disaster Close Out Date       9385
Place Code                     197
Declared County/Area           197
Declaration Request Number       0
dtype: int64

In [0]:
# Drop columns Incident End Date, Disaster Close Out Date to remove NaNs
# 'Place Code' has location information but is not consistent.
femadf.drop(columns=['Incident End Date', 'Disaster Close Out Date', 
                     'Place Code'], inplace=True)

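# Treat NaNs in Declared County/Area as statewide declarations.
# Assigning the result back avoids relying on inplace=True on a column selection.
femadf['Declared County/Area'] = femadf['Declared County/Area'].fillna('Statewide')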

In [0]:
# Recheck that no NaNs remain
femadf.isnull().sum()

Disaster Number               0
IH Program Declared           0
IA Program Declared           0
PA Program Declared           0
HM Program Declared           0
State                         0
Declaration Date              0
Disaster Type                 0
Incident Type                 0
Title                         0
Incident Begin Date           0
Declared County/Area          0
Declaration Request Number    0
dtype: int64

In [0]:
# Rechecking the shape
femadf.shape

(48555, 13)

In [0]:
# Load state to code dataset and update information in femadf
statedf = pd.read_csv('states_code.csv', index_col=1)
statedf.head()

Abbreviation,State
AL,Alabama
AK,Alaska
AZ,Arizona
AR,Arkansas
CA,California


In [0]:
# Get State name from Abbreviation
def getstatename(col):
  return statedf.loc[col]['State']

In [0]:
# Rename State column to StateCode
femadf.rename(columns={'State ':'StateCode'}, inplace=True)

# Add a new State column holding the full state name for each StateCode
femadf['State'] = \
  femadf['StateCode'].apply(getstatename)
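
The row-by-row lookup above could also be written with `Series.map`, which looks each abbreviation up in an index in one pass and yields NaN for an unknown code instead of raising a `KeyError`. The sketch below uses a tiny hand-made stand-in for `states_code.csv` (the `statedf_demo` frame and the `ZZ` code are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical stand-in for states_code.csv, indexed by abbreviation
statedf_demo = pd.DataFrame(
    {"State": ["Alabama", "Washington"]},
    index=pd.Index(["AL", "WA"], name="Abbreviation"),
)

codes = pd.Series(["AL", "WA", "WA", "ZZ"], name="StateCode")

# map() aligns each code with the index of the lookup Series;
# codes that are not in the index become NaN rather than raising.
names = codes.map(statedf_demo["State"])
print(names)
```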

In [0]:
femadf['Incident Type'].unique()

array(['Tornado', 'Severe Storm(s)', 'Flood', 'Hurricane', 'Earthquake',
       'Fire', 'Typhoon', 'Snow', 'Coastal Storm', 'Volcano',
       'Mud/Landslide', 'Severe Ice Storm', 'Dam/Levee Break',
       'Toxic Substances', 'Chemical', 'Other', 'Terrorist', 'Freezing',
       'Tsunami', 'Drought', 'Human Cause', 'Fishing Losses'],
      dtype=object)

In [0]:
# Load county data
url = 'https://raw.githubusercontent.com/1wheel/whitehouse-petitions/master/Gaz_counties_national.txt'
countydf = pd.read_csv(url, sep='\t', encoding='ISO-8859-1')

# Replace special characters.
countydf.NAME = countydf.NAME.apply(lambda x: x.replace('á', 'a'))
countydf.NAME = countydf.NAME.apply(lambda x: x.replace('í', 'i'))
countydf.NAME = countydf.NAME.apply(lambda x: x.replace('ñ', 'n'))
countydf.NAME = countydf.NAME.apply(lambda x: x.replace('ó', 'o'))
countydf.NAME = countydf.NAME.apply(lambda x: x.replace('ü', 'u'))

countydf.head()

Unnamed: 0,USPS,GEOID,ANSICODE,NAME,POP10,HU10,ALAND,AWATER,ALAND_SQMI,AWATER_SQMI,INTPTLAT,INTPTLONG
0,AL,1001,161526,Autauga County,54571,22135,1539582278,25775735,594.436,9.952,32.536382,-86.64449
1,AL,1003,161527,Baldwin County,182265,104061,4117521611,1133190229,1589.784,437.527,30.659218,-87.746067
2,AL,1005,161528,Barbour County,27457,11829,2291818968,50864716,884.876,19.639,31.87067,-85.405456
3,AL,1007,161529,Bibb County,22915,8981,1612480789,9289057,622.582,3.587,33.015893,-87.127148
4,AL,1009,161530,Blount County,57322,23887,1669961855,15157440,644.776,5.852,33.977448,-86.567246


In [0]:
# Map each declared county/area to a county name in countydf,
# matching by name prefix within the same state.
def updatecountyinfo(row):
  statecode = row['StateCode']
  county = row['Declared County/Area']
  
  if county == 'Statewide':
    return county
    
  series = countydf[countydf['USPS'] == statecode]['NAME']
  county_words_split = county.split(" ")
  search_string = county_words_split[0]
  output = series[series.str.startswith(search_string, na=False)]
  
  if output.shape[0] == 1:
    return output.iloc[0]
  
  # Check for two words
  if len(county_words_split) > 1:
    search_string = county.split(" ")[0] + " " + county.split(" ")[1]
    output = series[series.str.startswith(search_string, na=False)]
    
    if output.shape[0] == 1:
      return output.iloc[0]
    elif output.shape[0] > 1:
      # More than one selection so choosing 1st.
      return output.iloc[0]
    else:
      return None
    
  if output.shape[0] > 1:
    # More than one selection so choosing 1st.
    return output.iloc[0]

  return None
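
The prefix search above picks the first candidate when several county names share a prefix. One possible alternative, sketched here under the assumption that declared names follow the `Name (County)` pattern seen in the sample rows, is to normalize the declared name and then compare it exactly. The helper `normalize_area_name` is hypothetical, and other suffix forms in the full dataset have not been checked.

```python
import re


def normalize_area_name(declared: str) -> str:
    """Turn 'Lee (County)' into 'Lee County'; leave other strings unchanged."""
    # Move a trailing parenthesised suffix out of its parentheses.
    return re.sub(r"\s*\(([^)]*)\)\s*$", r" \1", declared).strip()


for raw in ["Lee (County)", "Grays Harbor (County)", "Statewide"]:
    print(raw, "->", normalize_area_name(raw))
```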

In [0]:
femadf['Updated County Info'] = femadf.apply(updatecountyinfo, axis=1)

In [0]:
sample = femadf.sample(10)
sample.loc[:,['StateCode','Declared County/Area','State','Updated County Info']]

Unnamed: 0,StateCode,Declared County/Area,State,Updated County Info
46294,MD,Calvert (County),Maryland,Calvert County
27561,IA,Louisa (County),Iowa,Louisa County
23500,NY,Chenango (County),New York,Chenango County
11727,KS,Wabaunsee (County),Kansas,Wabaunsee County
14986,KS,Republic (County),Kansas,Republic County
2402,CA,Yolo (County),California,Yolo County
24053,NE,Nuckolls (County),Nebraska,Nuckolls County
1749,FL,Holmes (County),Florida,Holmes County
34282,NC,Haywood (County),North Carolina,Haywood County
11805,MO,Oregon (County),Missouri,Oregon County


In [0]:
femadf.isnull().sum()

Disaster Number                  0
IH Program Declared              0
IA Program Declared              0
PA Program Declared              0
HM Program Declared              0
StateCode                        0
Declaration Date                 0
Disaster Type                    0
Incident Type                    0
Title                            0
Incident Begin Date              0
Declared County/Area             0
Declaration Request Number       0
State                            0
Updated County Info           1005
dtype: int64

In [0]:
femadf.shape

(48555, 15)

In [0]:
# Drop rows that have no mapped county name.
femadf.dropna(inplace=True)
femadf.shape

(47550, 15)

In [0]:
# Fetch County FIPS Code with State code
def getcountycode_with_statecode(row):
  statecode = row['StateCode']
  county = row['Updated County Info']
  
  series = countydf[(countydf['USPS'] == statecode) & (countydf['NAME'] == county)]
  output = series['GEOID']
  
  if output.shape[0] == 1:
    return int(output.iloc[0])
    
  # Handling Statewide county code
  series = countydf[(countydf['USPS'] == statecode)]
  output = series['GEOID']
    
  if output.shape[0] < 1:
    return None
  
  stateFIPScode = output.iloc[0] // 1000
  
  return stateFIPScode * 1000

# Fetch County FIPS Code
def getcountycode(row):
  statecode = row['StateCode']
  county = row['Updated County Info']
  
  series = countydf[(countydf['USPS'] == statecode) & (countydf['NAME'] == county)]
  output = series['GEOID']
  
  if output.shape[0] == 1:
    return int(output.iloc[0])
  else:
    return None

In [0]:
femadf['County FIPS Code'] = femadf.apply(getcountycode, axis=1)
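
Applying `getcountycode` filters `countydf` once per row. A left merge on state and county name is a possible vectorized alternative; the sketch below uses small made-up frames (`counties_demo`, `declarations_demo`, and the `Nowhere County` row are assumptions). Unlike the function above, a plain merge would duplicate rows if a name matched more than once, so `validate="many_to_one"` is used here to raise an error in that case.

```python
import pandas as pd

# Hypothetical miniature versions of countydf and femadf
counties_demo = pd.DataFrame({
    "USPS": ["AL", "WA", "WA"],
    "NAME": ["Lee County", "Clallam County", "Island County"],
    "GEOID": [1081, 53009, 53029],
})
declarations_demo = pd.DataFrame({
    "StateCode": ["AL", "WA", "WA"],
    "Updated County Info": ["Lee County", "Island County", "Nowhere County"],
})

# Left merge keeps every declaration; unmatched rows get NaN in GEOID.
merged = declarations_demo.merge(
    counties_demo,
    how="left",
    left_on=["StateCode", "Updated County Info"],
    right_on=["USPS", "NAME"],
    validate="many_to_one",
)
merged["County FIPS Code"] = merged["GEOID"]
print(merged[["StateCode", "Updated County Info", "County FIPS Code"]])
```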

In [0]:
femadf.isnull().sum()

Disaster Number                 0
IH Program Declared             0
IA Program Declared             0
PA Program Declared             0
HM Program Declared             0
StateCode                       0
Declaration Date                0
Disaster Type                   0
Incident Type                   0
Title                           0
Incident Begin Date             0
Declared County/Area            0
Declaration Request Number      0
State                           0
Updated County Info             0
County FIPS Code              212
dtype: int64

In [0]:
femadf.shape

(47550, 16)

In [0]:
# Drop rows that have no mapped County FIPS Code.
femadf.dropna(inplace=True)
femadf.shape

(47338, 16)

In [0]:
femadf['County FIPS Code'] = femadf['County FIPS Code'].astype(int)

In [0]:
"""
incident_types = ['Tornado', 'Severe Storm(s)', 'Flood', 'Hurricane', 
                 'Earthquake', 'Fire', 'Typhoon', 'Snow', 'Coastal Storm', 
                 'Volcano', 'Mud/Landslide', 'Severe Ice Storm', 
                 'Dam/Levee Break', 'Toxic Substances', 'Chemical', 
                 'Other', 'Terrorist', 'Freezing', 'Tsunami', 'Drought', 
                 'Human Cause', 'Fishing Losses']

for incident_type in incident_types:
  countydf[incident_type] = 0
"""

"\nincident_types = ['Tornado', 'Severe Storm(s)', 'Flood', 'Hurricane', \n                 'Earthquake', 'Fire', 'Typhoon', 'Snow', 'Coastal Storm', \n                 'Volcano', 'Mud/Landslide', 'Severe Ice Storm', \n                 'Dam/Levee Break', 'Toxic Substances', 'Chemical', \n                 'Other', 'Terrorist', 'Freezing', 'Tsunami', 'Drought', \n                 'Human Cause', 'Fishing Losses']\n\nfor incident_type in incident_types:\n  countydf[incident_type] = 0\n"

In [0]:
!pip install category_encoders

Collecting category_encoders
  Downloading https://files.pythonhosted.org/packages/6e/a1/f7a22f144f33be78afeb06bfa78478e8284a64263a3c09b1ef54e673841e/category_encoders-2.0.0-py2.py3-none-any.whl (87kB)
    100% |████████████████████████████████| 92kB 3.5MB/s
Installing collected packages: category-encoders
Successfully installed category-encoders-2.0.0


In [0]:
import category_encoders as ce

encoder = ce.OneHotEncoder(cols=['Incident Type'])
encoder.fit(femadf)
femadf_encoded = encoder.transform(femadf)

In [0]:
femadf_encoded.columns

Index(['Disaster Number', 'IH Program Declared', 'IA Program Declared',
       'PA Program Declared', 'HM Program Declared', 'StateCode',
       'Declaration Date', 'Disaster Type', 'Incident Type_1',
       'Incident Type_2', 'Incident Type_3', 'Incident Type_4',
       'Incident Type_5', 'Incident Type_6', 'Incident Type_7',
       'Incident Type_8', 'Incident Type_9', 'Incident Type_10',
       'Incident Type_11', 'Incident Type_12', 'Incident Type_13',
       'Incident Type_14', 'Incident Type_15', 'Incident Type_16',
       'Incident Type_17', 'Incident Type_18', 'Incident Type_19',
       'Incident Type_20', 'Incident Type_21', 'Incident Type_22', 'Title',
       'Incident Begin Date', 'Declared County/Area',
       'Declaration Request Number', 'State', 'Updated County Info',
       'County FIPS Code'],
      dtype='object')

In [0]:
"""
# Test code used to compare columns
print(femadf['Incident Type'].unique())

for i in range(1, 23):
  print(femadf_encoded['Incident Type_'+str(i)].sum())

femadf['Incident Type'].value_counts()
"""

"\n# Test code used to compare columns\nprint(femadf['Incident Type'].unique())\n\nfor i in range(1, 23):\n  print(femadf_encoded['Incident Type_'+str(i)].sum())\n\nfemadf['Incident Type'].value_counts()\n"

In [0]:
# Update encoded column names to more meaningful names
col_names = {'Incident Type_1': 'Tornado',
             'Incident Type_2': 'Severe Storm(s)',
             'Incident Type_3': 'Flood',
             'Incident Type_4': 'Hurricane',
             'Incident Type_5': 'Earthquake',
             'Incident Type_6': 'Fire',
             'Incident Type_7': 'Snow',
             'Incident Type_8': 'Coastal Storm', 
             'Incident Type_9': 'Volcano',
             'Incident Type_10': 'Mud/Landslide',
             'Incident Type_11': 'Severe Ice Storm',
             'Incident Type_12': 'Dam/Levee Break',
             'Incident Type_13': 'Toxic Substances',
             'Incident Type_14': 'Chemical',
             'Incident Type_15': 'Other',
             'Incident Type_16': 'Terrorist',
             'Incident Type_17': 'Freezing',
             'Incident Type_18': 'Tsunami', 
             'Incident Type_19': 'Human Cause',
             'Incident Type_20': 'Fishing Losses',
             'Incident Type_21': 'Drought',
             'Incident Type_22': 'Typhoon'
}

femadf_encoded.rename(columns=col_names, inplace=True)
femadf_encoded.drop(columns=['Disaster Number', 
                             'IH Program Declared', 
                             'IA Program Declared',
                             'PA Program Declared', 
                             'HM Program Declared', 
                             'Disaster Type',
                             'Title', 
                             'Incident Begin Date', 
                             'Declared County/Area',
                             'Declaration Request Number'], inplace=True)
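
The hand-written mapping above has to agree with the order in which the encoder numbered the categories, which appears to depend on the data itself. As an alternative sketch, `pd.get_dummies` names each indicator column after its category, so no renaming step is needed. The small `demo` frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature version of femadf
demo = pd.DataFrame({
    "Incident Type": ["Tornado", "Flood", "Tornado", "Fire"],
    "County FIPS Code": [1081, 6003, 1081, 6005],
})

# One indicator column per category, named after the category itself
dummies = pd.get_dummies(demo["Incident Type"], dtype=int)
encoded = pd.concat([demo.drop(columns="Incident Type"), dummies], axis=1)
print(encoded)
```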

In [0]:
femadf_encoded.columns

Index(['StateCode', 'Declaration Date', 'Tornado', 'Severe Storm(s)', 'Flood',
       'Hurricane', 'Earthquake', 'Fire', 'Snow', 'Coastal Storm', 'Volcano',
       'Mud/Landslide', 'Severe Ice Storm', 'Dam/Levee Break',
       'Toxic Substances', 'Chemical', 'Other', 'Terrorist', 'Freezing',
       'Tsunami', 'Human Cause', 'Fishing Losses', 'Drought', 'Typhoon',
       'State', 'Updated County Info', 'County FIPS Code'],
      dtype='object')

In [0]:
femadf_encoded['Declaration Date'] = pd.to_datetime(femadf_encoded['Declaration Date'])
femadf_encoded['Year'] = femadf_encoded['Declaration Date'].dt.year
femadf_encoded['Month'] = femadf_encoded['Declaration Date'].dt.month
femadf_encoded['Day of Month'] = femadf_encoded['Declaration Date'].dt.day
femadf_encoded['Day of Week'] = femadf_encoded['Declaration Date'].dt.weekday

In [0]:
femadf_encoded.head()

Unnamed: 0,StateCode,Declaration Date,Tornado,Severe Storm(s),Flood,Hurricane,Earthquake,Fire,Snow,Coastal Storm,...,Fishing Losses,Drought,Typhoon,State,Updated County Info,County FIPS Code,Year,Month,Day of Month,Day of Week
0,AL,2019-03-05,1,0,0,0,0,0,0,0,...,0,0,0,Alabama,Lee County,1081,2019,3,5,1
1,WA,2019-03-04,0,1,0,0,0,0,0,0,...,0,0,0,Washington,Clallam County,53009,2019,3,4,0
2,WA,2019-03-04,0,1,0,0,0,0,0,0,...,0,0,0,Washington,Grays Harbor County,53027,2019,3,4,0
3,WA,2019-03-04,0,1,0,0,0,0,0,0,...,0,0,0,Washington,Island County,53029,2019,3,4,0
4,WA,2019-03-04,0,1,0,0,0,0,0,0,...,0,0,0,Washington,Jefferson County,53031,2019,3,4,0


In [0]:
femadf_encoded.shape

(47338, 31)

In [0]:
femadf_encoded.columns

Index(['StateCode', 'Declaration Date', 'Tornado', 'Severe Storm(s)', 'Flood',
       'Hurricane', 'Earthquake', 'Fire', 'Snow', 'Coastal Storm', 'Volcano',
       'Mud/Landslide', 'Severe Ice Storm', 'Dam/Levee Break',
       'Toxic Substances', 'Chemical', 'Other', 'Terrorist', 'Freezing',
       'Tsunami', 'Human Cause', 'Fishing Losses', 'Drought', 'Typhoon',
       'State', 'Updated County Info', 'County FIPS Code', 'Year', 'Month',
       'Day of Month', 'Day of Week'],
      dtype='object')

In [0]:
# Incident-type indicator columns to total up
cols = ['Tornado', 'Severe Storm(s)', 'Flood', 'Hurricane', 'Earthquake', 'Fire',
        'Snow', 'Coastal Storm', 'Volcano', 'Mud/Landslide', 'Severe Ice Storm',
        'Dam/Levee Break', 'Toxic Substances', 'Chemical', 'Other', 'Terrorist',
        'Freezing', 'Tsunami', 'Human Cause', 'Fishing Losses', 'Drought',
        'Typhoon']

# Count of declarations per county, and per year and county
incidentdf = pd.pivot_table(femadf_encoded, index=['County FIPS Code'],
                            values=cols, aggfunc='sum')
incidentdf_year = pd.pivot_table(femadf_encoded, index=['Year', 'County FIPS Code'],
                                 values=cols, aggfunc='sum')
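
To make the aggregation concrete, here is a minimal sketch on made-up data showing the same kind of consolidation written with `groupby(...).sum()`: each cell ends up holding the number of declarations of that incident type for the county (or for the year and county). The `demo` frame and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical one-hot encoded declarations
demo = pd.DataFrame({
    "Year": [1964, 1964, 1964, 1965],
    "County FIPS Code": [6003, 6003, 6005, 6003],
    "Flood": [1, 1, 0, 0],
    "Fire": [0, 0, 1, 1],
})
type_cols = ["Fire", "Flood"]

# Totals per county across all years
by_county = demo.groupby("County FIPS Code")[type_cols].sum()
# Totals per (year, county) pair
by_year = demo.groupby(["Year", "County FIPS Code"])[type_cols].sum()

print(by_county)
print(by_year)
```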

In [0]:
incidentdf_year

Year,County FIPS Code,Chemical,Coastal Storm,Dam/Levee Break,Drought,Earthquake,Fire,Fishing Losses,Flood,Freezing,Human Cause,...,Other,Severe Ice Storm,Severe Storm(s),Snow,Terrorist,Tornado,Toxic Substances,Tsunami,Typhoon,Volcano
1959,18021,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6003,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6005,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6007,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6011,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6015,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6017,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6021,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6023,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
1964,6033,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [0]:
femadf_encoded.to_csv('femadf_encoded.csv')
femadf.to_csv('femadf.csv')
incidentdf.to_csv('incidentdf.csv')
incidentdf_year.to_csv('incidentdf_year.csv')
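
For anyone picking these files up later, the sketch below shows one way a file shaped like incidentdf_year.csv might be read back with its two index levels restored, and how the integer county codes could be zero-padded to five characters. The CSV text is a made-up two-row sample read from memory rather than from the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same layout as incidentdf_year.csv
csv_text = (
    "Year,County FIPS Code,Fire,Flood\n"
    "1964,6003,0,1\n"
    "1964,6005,0,1\n"
)

# Restore the (Year, County FIPS Code) index written by to_csv
loaded = pd.read_csv(io.StringIO(csv_text), index_col=["Year", "County FIPS Code"])

# Zero-pad the integer codes to the full 5-digit FIPS form
padded = loaded.index.get_level_values("County FIPS Code").astype(str).str.zfill(5)
print(list(padded))
```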