# FEMA Disaster Cost Forecasting
#### Capstone 2 - Pre-processing and Training Data Development
Michael Garber


* NOTE: Run the FEMA data wrangling notebook (below) before this one; it creates the data file loaded in Step 0.
    * On GitHub
        * > [FEMA-DataWrangling.ipynb on GitHub](https://github.com/mdgarber/FEMADisasterCostForecasting/blob/ef70129c4bf06a38b13c61e1254fdb6a3105486b/femadisastercostforecasting/notebooks/FEMA-DataWrangling.ipynb)
    * Or local path
        * > /FEMADisasterCostForecasting/femadisastercostforecasting/notebooks/FEMA-DataWrangling.ipynb



#### Pre-processing and Training Data Development High-Level Steps
1. Creating dummy features
2. Scale standardization
3. Split data into training and testing subsets

Goal: Create a cleaned development dataset you can use to complete the
modeling step of your project.

## Step 0 - Import libraries, load data, and clean data

In [160]:
# import libraries
import pandas as pd

In [162]:
# load data
femaDataCleanV2 = pd.read_csv('../data/interim/femaDataCleanV2.csv')

In [164]:
# Clean data

# Create new V3 df & Set index to disasterNumber
femaDataCleanV3 = femaDataCleanV2.set_index('disasterNumber')

# drop the leftover CSV index column 'Unnamed: 0'
femaDataCleanV3.drop(['Unnamed: 0'], axis=1, inplace=True)

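# handle NaNs and NULLs in the program-declared flag columns
programCols = ['ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared', 'hmProgramDeclared']

# set NaNs to 0 (assumes a missing flag means the program was not declared)
femaDataCleanV3[programCols] = femaDataCleanV3[programCols].fillna(0)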

## Step 1 - Creating Dummy Features

In [167]:
# view all columns
print(femaDataCleanV3.columns)

Index(['declarationDate', 'disasterName', 'incidentBeginDate',
       'incidentEndDate', 'declarationType', 'stateCode', 'stateName',
       'incidentType', 'entryDate', 'updateDate', 'closeoutDate', 'region',
       'ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared',
       'hmProgramDeclared', 'designatedIncidentTypes',
       'declarationRequestDate', 'id_x', 'hash_x', 'lastRefresh_x',
       'totalNumberIaApproved', 'totalAmountIhpApproved',
       'totalAmountHaApproved', 'totalAmountOnaApproved',
       'totalObligatedAmountPa', 'totalObligatedAmountCatAb',
       'totalObligatedAmountCatC2g', 'paLoadDate', 'iaLoadDate',
       'totalObligatedAmountHmgp', 'hash_y', 'lastRefresh_y', 'id_y',
       'totalDisasterCost'],
      dtype='object')


In [169]:
# Choose categorical variables
categoricalCols = ['incidentType', 'stateCode', 'region', 'designatedIncidentTypes']

In [173]:
# designatedIncidentTypes Key
'''
0: Not applicable
1: Explosion
2: Straight-Line Winds
3: Tidal Wave
4: Tropical Storm
5: Winter Storm
8: Tropical Depression
A: Tsunami
B: Biological
C: Coastal Storm
D: Drought
E: Earthquake
F: Flood
G: Freezing
H: Hurricane
I: Terrorist
J: Typhoon
K: Dam/Levee Break
L: Chemical
M: Mud/Landslide
N: Nuclear
O: Severe Ice Storm
P: Fishing Losses
Q: Crop Losses
R: Fire
S: Snowstorm
T: Tornado
U: Civil Unrest
V: Volcanic Eruption
W: Severe Storm
X: Toxic Substances
Y: Human Cause
Z: Other
'''

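One possible way to turn the categorical variables chosen above into dummy features is sketched here. It is not the notebook's final method. The toy frame and its values are made up for illustration, and two things are assumptions: that `designatedIncidentTypes` can hold several comma-separated codes per disaster, and that a missing value can be treated as code `0` ("Not applicable" in the key above).

```python
import pandas as pd

# Toy stand-in for femaDataCleanV3; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "incidentType": ["Flood", "Hurricane", "Flood"],
        "stateCode": ["TX", "FL", "LA"],
        "region": [6, 4, 6],
        "designatedIncidentTypes": ["F", "H,F", None],
        "totalDisasterCost": [1.0e6, 5.0e7, 2.5e6],
    },
    index=pd.Index([1001, 1002, 1003], name="disasterNumber"),
)

# Single-valued categoricals: one 0/1 indicator column per category.
# Listing them in `columns` encodes them even when they are numeric (region).
single_valued = ["incidentType", "stateCode", "region"]
encoded = pd.get_dummies(toy, columns=single_valued, dtype=int)

# Assumption: designatedIncidentTypes may list several codes separated by commas,
# so each code gets its own indicator column.
multi = (
    toy["designatedIncidentTypes"]
    .fillna("0")  # assumption: missing is treated as "0: Not applicable"
    .str.replace(" ", "", regex=False)
    .str.get_dummies(sep=",")
    .add_prefix("designated_")
)
encoded = encoded.drop(columns="designatedIncidentTypes").join(multi)

print(encoded.columns.tolist())
```

If `designatedIncidentTypes` turns out to hold a single code per row, it can simply be added to the `columns` list passed to `pd.get_dummies`.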

## Step 2 - Scale Standardization

#### Step 2 Stuff here
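As a placeholder for this step, here is a minimal sketch of scale standardization (rescaling each numeric column to mean 0 and standard deviation 1) using pandas only. The helper `standardize` is hypothetical, the toy values are invented, and which columns should be scaled is an assumption. scikit-learn's `StandardScaler` is a common alternative for the same transform.

```python
import pandas as pd

# Toy numeric columns named after two columns in femaDataCleanV3; values are invented.
toy = pd.DataFrame(
    {
        "totalNumberIaApproved": [120.0, 4500.0, 0.0, 980.0],
        "totalObligatedAmountPa": [1.0e6, 5.0e7, 2.5e5, 8.0e6],
    }
)


def standardize(df, columns):
    """Return a copy of df with the given columns rescaled to mean 0 and std 1."""
    out = df.copy()
    means = out[columns].mean()
    stds = out[columns].std(ddof=0)  # population standard deviation
    stds = stds.replace(0, 1)  # constant columns become all zeros instead of NaN
    out[columns] = (out[columns] - means) / stds
    return out


scaled = standardize(toy, list(toy.columns))
print(scaled.round(3))
```

When a model is evaluated on held-out data, the means and standard deviations are usually computed from the training rows only and then reused for the test rows, so that the test set does not influence the scaling.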

## Step 3 - Split data into training and testing subsets

#### Step 3 Stuff here
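As a placeholder for this step, the sketch below shows one way to split a frame into training and testing subsets with pandas alone. The helper `split_train_test`, the 80/20 ratio, the seed, and the use of `totalDisasterCost` as the target are all assumptions. It also assumes the `disasterNumber` index is unique. scikit-learn's `train_test_split` is a common alternative.

```python
import pandas as pd

# Toy stand-in for the processed data; values are invented for illustration.
toy = pd.DataFrame(
    {
        "feature": range(10),
        "totalDisasterCost": [float(i) * 1000 for i in range(10)],
    },
    index=pd.Index(range(4000, 4010), name="disasterNumber"),
)


def split_train_test(df, target, test_frac=0.2, seed=42):
    """Randomly hold out test_frac of the rows; requires a unique index."""
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)  # every row not sampled into the test set
    X_train, y_train = train.drop(columns=target), train[target]
    X_test, y_test = test.drop(columns=target), test[target]
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = split_train_test(toy, "totalDisasterCost")
print(len(X_train), len(X_test))
```

Fixing the seed keeps the split reproducible between notebook runs.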