# FEMA Disaster Cost Forecasting
#### Capstone 2 - Pre-processing and Training Data Development
Michael Garber


* NOTE: Please run the FEMA data wrangling notebook (linked below) before this one; it creates the data this notebook loads.
    * On GitHub
        * > [FEMA-DataWrangling.ipynb on Github](https://github.com/mdgarber/FEMADisasterCostForecasting/blob/ef70129c4bf06a38b13c61e1254fdb6a3105486b/femadisastercostforecasting/notebooks/FEMA-DataWrangling.ipynb)
    * OR local path
        * > /FEMADisasterCostForecasting/femadisastercostforecasting/notebooks/FEMA-DataWrangling.ipynb



#### Pre-processing and Training Data Development High-Level Steps
1. Creating dummy features
2. Scale standardization
3. Split data into training and testing subsets

Goal: Create a cleaned development dataset you can use to complete the
modeling step of your project.

## Step 0 - Import libraries, load data, and clean data

In [5]:
# import libraries
import pandas as pd

In [7]:
# load data
femaDataCleanV2 = pd.read_csv('../data/interim/femaDataCleanV2.csv')

In [9]:
# Clean data

# Create new V3 df & Set index to disasterNumber
femaDataCleanV3 = femaDataCleanV2.set_index('disasterNumber')

# drop 'Unnamed: 0', the old row index that was saved to the CSV
femaDataCleanV3.drop(['Unnamed: 0'], axis=1, inplace=True)

# handle NaNs and NULLs
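Before deciding how to handle NaNs in each column, it can help to see where they are concentrated. The sketch below builds a per-column report of missing-value counts and percentages. The small frame `df` is a hypothetical stand-in for `femaDataCleanV3` with illustrative values only; on the real data you would run the same three steps against `femaDataCleanV3`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for femaDataCleanV3 (values are illustrative only)
df = pd.DataFrame(
    {
        "disasterName": ["SEVERE STORMS", "HURRICANE", "WILDFIRES", "FLOODING"],
        "closeoutDate": ["2019-01-01", None, None, "2020-05-05"],
        "ihProgramDeclared": [1.0, np.nan, 0.0, np.nan],
        "totalDisasterCost": [1.0e6, np.nan, 2.5e6, 3.0e5],
    }
)

# Count and percentage of missing values per column
nullCounts = df.isnull().sum()
nullReport = pd.DataFrame(
    {"nulls": nullCounts, "pctNull": (nullCounts / len(df) * 100).round(1)}
)

# Keep only columns that actually have missing values, most missing first
nullReport = nullReport[nullReport["nulls"] > 0].sort_values("nulls", ascending=False)
print(nullReport)
```

Columns with no missing values are dropped from the report, so `disasterName` does not appear.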


## Step 1 - Creating Dummy Features

In [12]:
# view all columns
print(femaDataCleanV3.columns)

Index(['declarationDate', 'disasterName', 'incidentBeginDate',
       'incidentEndDate', 'declarationType', 'stateCode', 'stateName',
       'incidentType', 'entryDate', 'updateDate', 'closeoutDate', 'region',
       'ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared',
       'hmProgramDeclared', 'designatedIncidentTypes',
       'declarationRequestDate', 'id_x', 'hash_x', 'lastRefresh_x',
       'totalNumberIaApproved', 'totalAmountIhpApproved',
       'totalAmountHaApproved', 'totalAmountOnaApproved',
       'totalObligatedAmountPa', 'totalObligatedAmountCatAb',
       'totalObligatedAmountCatC2g', 'paLoadDate', 'iaLoadDate',
       'totalObligatedAmountHmgp', 'hash_y', 'lastRefresh_y', 'id_y',
       'totalDisasterCost'],
      dtype='object')


__Choose categorical variables__
- incidentType
- stateCode
- region
- ihProgramDeclared (already a boolean)
- iaProgramDeclared (already a boolean)
- paProgramDeclared (already a boolean)
- hmProgramDeclared (already a boolean)
- designatedIncidentTypes (multi-value field)

In [15]:
#check for NULLs
femaDataCleanV3[['incidentType', 'stateCode', 'region','ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared', 'hmProgramDeclared']].isnull().sum()

incidentType           0
stateCode              0
region                 0
ihProgramDeclared    251
iaProgramDeclared    251
paProgramDeclared    251
hmProgramDeclared    251
dtype: int64

__Fields with NULLs found__
- ihProgramDeclared
- iaProgramDeclared
- paProgramDeclared
- hmProgramDeclared

In [18]:
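#Handle NULLs for the program-declared flags: treat a missing flag as 0 (not declared)
programCols = ['ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared', 'hmProgramDeclared']
# assign back into these columns only, so the rest of the DataFrame is kept
femaDataCleanV3[programCols] = femaDataCleanV3[programCols].fillna(0)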

In [20]:
#check for NULLs after fillna
femaDataCleanV3[['ihProgramDeclared', 'iaProgramDeclared', 'paProgramDeclared', 'hmProgramDeclared']].isnull().sum()

ihProgramDeclared    0
iaProgramDeclared    0
paProgramDeclared    0
hmProgramDeclared    0
dtype: int64

In [24]:
# Create Dummies for incidentType, stateCode, region
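One way to fill in this cell is `pd.get_dummies` with an explicit `columns` list. Listing the columns matters for `region`: FEMA regions are stored as integers, and `get_dummies` only encodes numeric columns when they are named explicitly. The sketch uses a hypothetical four-row frame in place of `femaDataCleanV3`; the disaster numbers and values are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for femaDataCleanV3: three categorical columns plus
# a numeric column that should pass through unchanged.
df = pd.DataFrame(
    {
        "incidentType": ["Flood", "Hurricane", "Flood", "Fire"],
        "stateCode": ["TX", "FL", "LA", "CA"],
        "region": [6, 4, 6, 9],  # FEMA region numbers are integers
        "totalDisasterCost": [1.0e6, 5.0e7, 2.5e6, 8.0e5],
    },
    index=pd.Index([4001, 4002, 4003, 4004], name="disasterNumber"),
)

# Naming the columns makes get_dummies encode 'region' as well, even though it
# is numeric; each new column is prefixed with its source column name.
dummyCols = ["incidentType", "stateCode", "region"]
dfDummies = pd.get_dummies(df, columns=dummyCols, dtype=int)

print(dfDummies.columns.tolist())
```

For linear models, passing `drop_first=True` drops one level per variable to avoid perfectly collinear dummy columns; tree-based models usually do not need it.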

In [22]:
# Update Dummy fields for designatedIncidentTypes

In [None]:
# designatedIncidentTypes Key
'''
0: Not applicable
1: Explosion
2: Straight-Line Winds
3: Tidal Wave
4: Tropical Storm
5: Winter Storm
8: Tropical Depression
A: Tsunami
B: Biological
C: Coastal Storm
D: Drought
E: Earthquake
F: Flood
G: Freezing
H: Hurricane
I: Terrorist
J: Typhoon
K: Dam/Levee Break
L: Chemical
M: Mud/Landslide
N: Nuclear
O: Severe Ice Storm
P: Fishing Losses
Q: Crop Losses
R: Fire
S: Snowstorm
T: Tornado
U: Civil Unrest
V: Volcanic Eruption
W: Severe Storm
X: Toxic Substances
Y: Human Cause
Z: Other
'''
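Because `designatedIncidentTypes` can hold several codes per disaster, a single `get_dummies` call on the raw string would treat each combination as its own category. `Series.str.get_dummies(sep=...)` instead creates one 0/1 column per code, and the key above can then rename the columns. This sketch assumes the codes are stored as a comma-separated string (for example `"F,W"`), possibly with spaces; check the real column's format before relying on it. The sample values and disaster numbers are hypothetical.

```python
import pandas as pd

# designatedIncidentTypes key from above, as a dict (code -> readable name)
incidentTypeKey = {
    "0": "Not applicable", "1": "Explosion", "2": "Straight-Line Winds",
    "3": "Tidal Wave", "4": "Tropical Storm", "5": "Winter Storm",
    "8": "Tropical Depression", "A": "Tsunami", "B": "Biological",
    "C": "Coastal Storm", "D": "Drought", "E": "Earthquake", "F": "Flood",
    "G": "Freezing", "H": "Hurricane", "I": "Terrorist", "J": "Typhoon",
    "K": "Dam/Levee Break", "L": "Chemical", "M": "Mud/Landslide",
    "N": "Nuclear", "O": "Severe Ice Storm", "P": "Fishing Losses",
    "Q": "Crop Losses", "R": "Fire", "S": "Snowstorm", "T": "Tornado",
    "U": "Civil Unrest", "V": "Volcanic Eruption", "W": "Severe Storm",
    "X": "Toxic Substances", "Y": "Human Cause", "Z": "Other",
}

# Hypothetical sample values; assumes comma-separated codes, possibly with spaces
designated = pd.Series(
    ["F,W", "H, F", None, "4"],
    index=pd.Index([4001, 4002, 4003, 4004], name="disasterNumber"),
    name="designatedIncidentTypes",
)

# Remove spaces, then build one 0/1 column per code; a missing value gives a row of zeros
codeDummies = designated.str.replace(" ", "", regex=False).str.get_dummies(sep=",")

# Rename columns from codes to readable names; unknown codes keep their raw code
codeDummies = codeDummies.rename(
    columns=lambda code: "designated_" + incidentTypeKey.get(code, code)
)
print(codeDummies)
```

The result shares the `disasterNumber` index, so it could be attached with `femaDataCleanV3.join(codeDummies)`. This handles the "update multiple fields based on a single column" case from the TODO without a function of if statements.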

## Step 2 - Scale Standardization

#### Step 2 Stuff here
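Standardization rescales each numeric feature to z = (x - mean) / std. To avoid leaking information from the test rows into the model, the mean and standard deviation are usually computed on the training subset only and then applied to both subsets. For that reason, this sketch takes both subsets as input, so in practice it would run after the split in Step 3. It uses plain pandas and the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses. The helper name `standardize` and the toy columns `x` and `flag` are hypothetical; which feature columns to scale is left to the modeling step.

```python
import pandas as pd


def standardize(train, test, cols):
    """Z-score `cols` in both subsets using mean/std computed from `train` only."""
    mean = train[cols].mean()
    std = train[cols].std(ddof=0)  # population std (ddof=0)
    std = std.replace(0, 1)  # constant columns are only centered, never divided by zero
    trainScaled = train.copy()
    testScaled = test.copy()
    trainScaled[cols] = (train[cols] - mean) / std
    testScaled[cols] = (test[cols] - mean) / std
    return trainScaled, testScaled


# Hypothetical example: 'x' is a numeric feature, 'flag' is a 0/1 dummy left unscaled
train = pd.DataFrame({"x": [1.0, 2.0, 3.0], "flag": [0, 1, 0]})
test = pd.DataFrame({"x": [4.0], "flag": [1]})
trainScaled, testScaled = standardize(train, test, ["x"])
print(trainScaled)
print(testScaled)
```

Dummy columns are typically left as 0/1 rather than standardized, which is why only `x` is passed in `cols`.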

## Step 3 - Split data into training and testing subsets

#### Step 3 Stuff here
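A minimal random train/test split can be written with pandas and NumPy alone: shuffle the row positions with a seeded generator, send the first `testSize` fraction to the test subset, and separate the target column from the features. The helper `trainTestSplit`, the 25% test size, and the seed are illustrative assumptions. The usual library route is scikit-learn's `train_test_split(X, y, test_size=0.25, random_state=42)`.

```python
import numpy as np
import pandas as pd


def trainTestSplit(df, target, testSize=0.25, seed=42):
    """Shuffle rows, then split into X_train, X_test, y_train, y_test."""
    rng = np.random.default_rng(seed)  # seeded so the split is reproducible
    order = rng.permutation(len(df))  # shuffled row positions
    nTest = int(np.ceil(len(df) * testSize))  # number of test rows, rounded up
    testPos, trainPos = order[:nTest], order[nTest:]
    X = df.drop(columns=[target])
    y = df[target]
    # iloc selects by position, so repeated index labels cannot cause overlap
    return X.iloc[trainPos], X.iloc[testPos], y.iloc[trainPos], y.iloc[testPos]


# Hypothetical stand-in for the fully prepared frame, indexed by disasterNumber
df = pd.DataFrame(
    {
        "region": [1, 2, 3, 4, 5, 6, 7, 8],
        "totalDisasterCost": [1e5, 2e5, 3e5, 4e5, 5e5, 6e5, 7e5, 8e5],
    },
    index=pd.Index(range(4001, 4009), name="disasterNumber"),
)
X_train, X_test, y_train, y_test = trainTestSplit(df, "totalDisasterCost")
print(len(X_train), len(X_test))
```

Because the project forecasts costs, a time-based split may also be worth considering, for example training on older disasters and testing on newer ones by `declarationDate`. This sketch only shows the plain random split.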

In [None]:
#TODO

'''
- create dummies
- How do we create dummy variables and then update multiple fields based on a single column? Idea: write a function of if statements.

'''