# Disaster Data Merge

In this notebook, we'll read in our **cleaned FEMA dataset** and combine it with additional information from FEMA's **disaster code dataset**. This additional dataset gives us insight into the type of disaster, along with other features that will be important to our end product.

### Read In Data

We'll start by reading in our data and taking a look at the columns for our *new* dataset.

In [2]:
# To generate and store data.
import numpy as np
import pandas as pd

In [3]:
# Read in the cleaned FEMA data and the disaster code data from .csv files
clean_df = pd.read_csv('./data/fema_clean.csv')
codes_df = pd.read_csv('./data/disaster_code_api.csv')

In [4]:
# Take a look at the columns of the new disaster code dataset
codes_df.columns

Index(['declarationDate', 'declaredCountyArea', 'disasterCloseOutDate',
       'disasterNumber', 'disasterType', 'fyDeclared', 'hash',
       'hmProgramDeclared', 'iaProgramDeclared', 'id', 'ihProgramDeclared',
       'incidentBeginDate', 'incidentEndDate', 'incidentType', 'lastRefresh',
       'paProgramDeclared', 'placeCode', 'state', 'title'],
      dtype='object')

### Clean Columns Prior to Merge

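We want to be consistent with our column naming convention (snake case). Let's clean up the disaster code column names before proceeding any further: strip whitespace, lowercase, replace spaces with underscores, and remove parentheses. Because FEMA's names are camelCase with no spaces, this step only lowercases them (for example, `disasterNumber` becomes `disasternumber`), so they are not fully snake case.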

In [5]:
# Disaster code column name clean: strip whitespace, lowercase, spaces -> underscores, drop parentheses
codes_df.columns = (codes_df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False)
                    .str.replace('(', '', regex=False).str.replace(')', '', regex=False))
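
If true snake case is wanted, camelCase names need an underscore inserted at each lowercase-to-uppercase boundary. A minimal sketch, using a hypothetical `camel_to_snake` helper passed to `DataFrame.rename`, on a toy frame with a few of the column names listed above. Note that the later cells refer to names such as `disasternumber` and `incidenttype`, so they would need updating if this version were adopted.

```python
import re

import pandas as pd


def camel_to_snake(name: str) -> str:
    """Hypothetical helper: convert a camelCase column name to snake_case."""
    # Insert "_" before every uppercase letter that follows a lowercase letter
    # or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Toy frame using a few of the disaster code column names shown above.
toy = pd.DataFrame(columns=["disasterNumber", "incidentBeginDate", "fyDeclared"])

# DataFrame.rename accepts a function and applies it to every column label.
toy = toy.rename(columns=camel_to_snake)
print(list(toy.columns))  # ['disaster_number', 'incident_begin_date', 'fy_declared']
```

With this naming, both dataframes would share the key `disaster_number`, so the later merge could use `on='disaster_number'` instead of separate `left_on`/`right_on` arguments.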


We have a significant amount of information to sift through in this dataset. For our purposes, we're only concerned with *five* columns. We'll keep those five and drop duplicate rows so that each disaster number appears only once.

In [6]:
# Keep the five relevant columns, then keep only the first row for each disaster number
codes_df = codes_df[['disasternumber','disastertype', 'incidenttype', 'incidentbegindate', 'incidentenddate']]
codes_df = codes_df.drop_duplicates('disasternumber')
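
Keeping only the first row per disaster number is safe only if the other four columns never vary within a disaster. The raw data includes `declaredCountyArea` and `placeCode`, so a disaster presumably spans several rows. A minimal sketch of a check to run before `drop_duplicates`, shown on hypothetical toy data shaped like `codes_df` after column selection:

```python
import pandas as pd

# Hypothetical toy data: several rows can share one disaster number.
codes = pd.DataFrame({
    "disasternumber": [1, 1, 2, 2],
    "incidenttype": ["Flood", "Flood", "Fire", "Fire"],
    "incidentbegindate": ["2017-08-25", "2017-08-25", "2018-11-08", "2018-11-09"],
})

# Count the distinct values of every other column within each disaster number.
n_unique = codes.groupby("disasternumber").nunique()

# Disaster numbers where any column takes more than one value; keeping only
# the first row for these would silently discard information.
conflicts = n_unique[(n_unique > 1).any(axis=1)]
print(conflicts)
```

On the real data, an empty `conflicts` frame would confirm that `drop_duplicates('disasternumber')` loses nothing.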


### Merge DataFrames

Now we can merge the two dataframes on the disaster number (`disaster_number` in the cleaned data, `disasternumber` in the disaster code data).

In [7]:
# Left join: keep every row of clean_df and attach the matching disaster details
combined_df = pd.merge(clean_df, codes_df, left_on='disaster_number', right_on='disasternumber', how='left')
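
`pd.merge` can also check the join as it runs: `validate="many_to_one"` raises a `MergeError` if a key appears more than once on the right-hand side, and `indicator=True` adds a `_merge` column recording whether each row found a match. A minimal sketch on hypothetical toy stand-ins for `clean_df` and the deduplicated `codes_df`:

```python
import pandas as pd

# Hypothetical toy stand-ins for clean_df and the deduplicated codes_df.
clean = pd.DataFrame({"disaster_number": [10, 10, 11, 99], "tot_damage": [5, 3, 0, 2]})
codes = pd.DataFrame({"disasternumber": [10, 11], "incidenttype": ["Flood", "Fire"]})

combined = pd.merge(
    clean,
    codes,
    left_on="disaster_number",
    right_on="disasternumber",
    how="left",
    validate="many_to_one",  # raise MergeError if a disaster number repeats in codes
    indicator=True,          # add a "_merge" column: "both", "left_only", or "right_only"
)

# Rows of clean whose disaster number has no entry in codes.
unmatched = combined[combined["_merge"] == "left_only"]
print(unmatched["disaster_number"].tolist())  # [99]
print(len(combined) == len(clean))            # True
```

The `_merge` column is only a diagnostic, so it would normally be dropped before writing the result to disk.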


Check the shapes of our dataframes for consistency, then look at the combined column names.

In [8]:
clean_df.shape, codes_df.shape

((90579, 23), (3932, 5))

In [9]:
combined_df.shape

(90579, 28)
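
These shapes are what a left join should produce. Because `codes_df` has one row per disaster number, each row of `clean_df` matches at most one code row, so the row count stays at 90,579. Because `left_on` and `right_on` name different columns, all five `codes_df` columns are appended, giving 23 + 5 = 28. A small sketch that turns those expectations into assertions, using a hypothetical helper name and the shapes printed above:

```python
def check_left_join_shape(left_shape, right_shape, combined_shape):
    """Hypothetical helper: assert the shape expected from a left join on differently named keys."""
    left_rows, left_cols = left_shape
    _, right_cols = right_shape
    rows, cols = combined_shape
    # Unique right-hand keys mean no left row is duplicated or dropped.
    assert rows == left_rows, f"expected {left_rows} rows, got {rows}"
    # With left_on/right_on, both key columns survive, so every right column is added.
    assert cols == left_cols + right_cols, f"expected {left_cols + right_cols} columns, got {cols}"


# Shapes printed above for clean_df, codes_df, and combined_df.
check_left_join_shape((90579, 23), (3932, 5), (90579, 28))
```

In the notebook this would be called as `check_left_join_shape(clean_df.shape, codes_df.shape, combined_df.shape)`.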

In [274]:
combined_df.columns

Index(['disaster_number', 'state', 'county', 'city', 'zipCode',
       'valid_registration', 'avg_damage', 'tot_inspected', 'tot_damage',
       'no_damage', 'inspect_1_10000', 'inspect_10001_20000',
       'inspect_20001_30000', 'inspect_greater_30000', 'approve_assistance',
       'tot_approve_ihp_amt', 'repair_replace_amt', 'rental_amt',
       'other_needs_amt', 'approve_1_10000', 'approve_10001_25000',
       'approve_25001_max', 'tot_max_grants', 'disasternumber', 'disastertype',
       'incidenttype', 'incidentbegindate', 'incidentenddate'],
      dtype='object')
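
The listing shows the merge key twice, as `disaster_number` and `disasternumber`. One optional cleanup, sketched on a hypothetical toy frame: confirm the two columns agree wherever a match was found, then drop the right-hand copy before saving. After a left join, `disasternumber` is NaN for rows without a match; if that "unmatched" flag is wanted, passing `indicator=True` to `pd.merge` records it more explicitly than the NaNs do.

```python
import pandas as pd

# Hypothetical toy frame shaped like combined_df after the left join.
combined = pd.DataFrame({
    "disaster_number": [10, 11, 99],
    "disasternumber": [10.0, 11.0, None],  # NaN where no disaster code matched
    "incidenttype": ["Flood", "Fire", None],
})

# Where a match exists, the right-hand key must equal the left-hand key.
matched = combined["disasternumber"].notna()
assert (combined.loc[matched, "disaster_number"] == combined.loc[matched, "disasternumber"]).all()

# The right-hand key adds no information, so drop it.
combined = combined.drop(columns="disasternumber")
print(list(combined.columns))  # ['disaster_number', 'incidenttype']
```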

### Write to .csv

Good to go! Write the combined dataframe to a .csv file.

In [276]:
combined_df.to_csv('combined.csv', index=False)
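
CSV files store no type information, so if a later notebook reads `combined.csv` back in, `incidentbegindate` and `incidentenddate` will load as plain strings unless `read_csv` is told to parse them. A self-contained sketch, assuming simple ISO-style date strings, that uses a hypothetical toy frame and an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

# Hypothetical toy frame standing in for combined_df; dates are plain strings.
combined = pd.DataFrame({
    "disaster_number": [10, 11],
    "incidentbegindate": ["2017-08-25", "2018-11-08"],
    "incidentenddate": ["2017-09-15", "2018-11-25"],
})

# Round-trip through an in-memory buffer instead of 'combined.csv'.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
buffer.seek(0)

# Ask read_csv to convert the date columns to datetime64 while loading.
reloaded = pd.read_csv(buffer, parse_dates=["incidentbegindate", "incidentenddate"])
print(reloaded.dtypes)

# Datetime columns support arithmetic, e.g. incident duration in days.
duration = reloaded["incidentenddate"] - reloaded["incidentbegindate"]
print(duration.dt.days.tolist())  # [21, 17]
```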