## Intro
This notebook aggregates the environmental data by flood event, whereas before we were looking at the data by date.

### Calculate number of locations that flooded

In [1]:
%matplotlib inline
from db_scripts.focus_intersection import subset_floods, flood_df, subset_locations
from db_scripts.get_server_data import get_table_for_variable, get_db_table_as_df, data_dir
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None  # default='warn'

In this case we are focusing only on the subset of points in the downtown area, hence "subset_floods".

In [2]:
event_total_flooded = subset_floods['event'].value_counts()

In [3]:
grouped = subset_floods.groupby('event')

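Get the dates each event spanned, the number of those dates, and the number of unique locations flooded during the event. (event_total_flooded, from the previous cell, is the total number of flood reports across all of the event's dates, so a location reported twice is counted twice.)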

In [4]:
event_dates = grouped['_date'].unique()
num_event_dates = grouped['_date'].nunique()
num_locations = grouped['location'].nunique()

In [5]:
event_df = pd.concat([event_dates, event_total_flooded, num_event_dates, num_locations], axis=1)
event_df.columns = ['dates', 'num_flooded', 'num_dates', 'num_locations']
event_df.reset_index(inplace=True)
event_df.head()

| | index | dates | num_flooded | num_dates | num_locations |
|---|---|---|---|---|---|
| 0 | 01/15/2016 (1/15/2016) | [2016-01-15T00:00:00.000000000] | 1 | 1 | 1 |
| 1 | 09/02/15 (9/2/2015) | [2015-09-02T00:00:00.000000000] | 1 | 1 | 1 |
| 2 | 7/10 Thunderstorms (7/10/2014) | [2014-07-10T00:00:00.000000000] | 27 | 1 | 27 |
| 3 | Bernie (Training) (7/25/2016) | [2016-07-25T00:00:00.000000000] | 1 | 1 | 1 |
| 4 | February 24th Storm (2/24/2016) | [2016-02-24T00:00:00.000000000] | 1 | 1 | 1 |
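
The three counts differ in what they count: value_counts() counts every report, while nunique() counts distinct dates or distinct locations. A minimal sketch on a hypothetical four-row table (the events and locations are made up; only the column names follow subset_floods):

```python
import pandas as pd

# Hypothetical flood reports, one row per report as in subset_floods
toy = pd.DataFrame({
    'event': ['A (1/1/2015)', 'A (1/1/2015)', 'A (1/1/2015)', 'B (2/1/2015)'],
    '_date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02', '2015-02-01']),
    'location': ['MAIN ST', 'MAIN ST', 'ELM ST', 'OAK ST'],
})

grouped_toy = toy.groupby('event')
summary = pd.DataFrame({
    # every report counts, so MAIN ST is counted twice for event A
    'num_flooded': toy['event'].value_counts(),
    # distinct days on which the event produced reports
    'num_dates': grouped_toy['_date'].nunique(),
    # distinct places reported during the event
    'num_locations': grouped_toy['location'].nunique(),
})
print(summary)
```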


Split the event dates and event names into two columns.

In [6]:
s = pd.Series(event_df['index'])
# remove the parentheses around "(Training)" so the only "(" left is the one before the date
s = s.str.replace('(Training)', 'Training', regex=False)

# split "Event Name (m/d/yyyy)" into the name and the date
event_date_names = s.str.split('(', expand=True)
event_names = event_date_names[0]
event_date = event_date_names[1].str.replace(')', '', regex=False)
event_date = pd.to_datetime(event_date)
event_df['event_name'] = event_names.str.strip()
event_df['event_date'] = event_date
del event_df['index']
event_df.set_index(['event_date', 'event_name'], inplace=True)
event_df.head()

| event_date | event_name | dates | num_flooded | num_dates | num_locations |
|---|---|---|---|---|---|
| 2016-01-15 | 01/15/2016 | [2016-01-15T00:00:00.000000000] | 1 | 1 | 1 |
| 2015-09-02 | 09/02/15 | [2015-09-02T00:00:00.000000000] | 1 | 1 | 1 |
| 2014-07-10 | 7/10 Thunderstorms | [2014-07-10T00:00:00.000000000] | 27 | 1 | 27 |
| 2016-07-25 | Bernie Training | [2016-07-25T00:00:00.000000000] | 1 | 1 | 1 |
| 2016-02-24 | February 24th Storm | [2016-02-24T00:00:00.000000000] | 1 | 1 | 1 |
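
An alternative to the replace-then-split approach, sketched here under the assumption that every label ends in a "(m/d/yyyy)" date, is a single regular expression that treats everything before the last parenthesised group as the name. Unlike the cell above, it keeps "Bernie (Training)" as-is rather than rewriting it to "Bernie Training":

```python
import pandas as pd

labels = pd.Series(['01/15/2016 (1/15/2016)',
                    '7/10 Thunderstorms (7/10/2014)',
                    'Bernie (Training) (7/25/2016)'])

# Greedy ".*" backtracks to the LAST "(", so earlier parentheses stay in the name
parts = labels.str.extract(r'^(?P<event_name>.*)\((?P<event_date>[^()]*)\)\s*$')
parts['event_name'] = parts['event_name'].str.strip()
parts['event_date'] = pd.to_datetime(parts['event_date'], format='%m/%d/%Y')
print(parts)
```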


### Investigation: where num_flooded does not equal num_locations
Let's check out one of the events where num_flooded is greater than num_locations. I would expect this to mean that one location flooded on more than one day of the same event, but the '2014-07-24' event spans only one day, so that can't be the explanation here.

In [7]:
idx = pd.IndexSlice
event_df.sort_index(inplace=True)
event_df.loc[idx['2014-07-24', :], :]

| event_date | event_name | dates | num_flooded | num_dates | num_locations |
|---|---|---|---|---|---|
| 2014-07-24 | unnamed | [2014-07-24T00:00:00.000000000] | 8 | 1 | 7 |


In [8]:
fl_724 = subset_floods[subset_floods['_date'] == '2014-07-24']
fl_724[fl_724['location'].duplicated(keep=False)]

| | Unnamed: 0 | recid | location | event | eventType | xcoord | ycoord | dt | \_date | \_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 444 | 444 | 4264 | HAMPTON BOULEVARD & W 21ST STREET | unnamed (7/24/2014) | Flooded street | 12125900.0 | 3484891.0 | 2014-07-24 20:29:25.000 | 2014-07-24 | 2014-07-24 20:29:25.000 |
| 445 | 445 | 4265 | HAMPTON BOULEVARD & W 21ST STREET | unnamed (7/24/2014) | Flooded underpass | 12125900.0 | 3484891.0 | 2014-07-24 20:29:25.000 | 2014-07-24 | 2014-07-24 20:29:25.000 |


So _here's_ what is happening: the location is the same in both rows, but the event types differ ("Flooded street" and "Flooded underpass"), so this location counts twice in num_flooded but only once in num_locations.
Now that I think about it, that may explain all of the differences between the num_locations and num_flooded columns. Let's try another event, this time one that spans more than one day: Irene.
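
One way to test this hunch for every event at once, rather than one event at a time, is to count distinct (location, eventType) pairs per event: if the gap comes only from one location being reported under several event types, that count should equal num_flooded. A minimal sketch on hypothetical rows (only the column names follow the notebook):

```python
import pandas as pd

# Hypothetical reports: HAMPTON appears twice with different event types (event X),
# DUKE appears twice with the same event type (event Y)
toy = pd.DataFrame({
    'event': ['X', 'X', 'X', 'Y', 'Y'],
    'location': ['HAMPTON', 'HAMPTON', 'GRANBY', 'DUKE', 'DUKE'],
    'eventType': ['Flooded street', 'Flooded underpass', 'Flooded street',
                  'Flooded street', 'Flooded street'],
})

g = toy.groupby('event')
check = pd.DataFrame({
    'num_flooded': g.size(),
    'num_locations': g['location'].nunique(),
    # distinct (location, eventType) combinations per event
    'num_loc_type_pairs': toy.drop_duplicates(['event', 'location', 'eventType'])
                              .groupby('event').size(),
})
# events whose extra reports are NOT explained by differing event types
unexplained = check[check['num_flooded'] != check['num_loc_type_pairs']]
print(check)
print(unexplained)
```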

In [9]:
event_df.sort_index(inplace=True)
event_df.loc[idx[:, 'Irene'], :]

| event_date | event_name | dates | num_flooded | num_dates | num_locations |
|---|---|---|---|---|---|
| 2011-08-27 | Irene | [2011-08-27T00:00:00.000000000, 2011-08-28T00:... | 32 | 2 | 30 |


In [10]:
irene = subset_floods[subset_floods['event'].str.contains('Irene')].sort_values('location')
irene[irene['location'].duplicated(keep=False)]

| | Unnamed: 0 | recid | location | event | eventType | xcoord | ycoord | dt | \_date | \_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 182 | 1151 | 1000 BLOCK OF E VIRGINIA BEACH BOULEVARD | Irene (8/27/2011) | Flooded underpass | 12134230.0 | 3478210.0 | 2011-08-28 04:18:01.000 | 2011-08-28 | 2011-08-28 04:18:01.000 |
| 181 | 181 | 1150 | 1000 BLOCK OF E VIRGINIA BEACH BOULEVARD | Irene (8/27/2011) | Flooded street | 12134230.0 | 3478210.0 | 2011-08-28 04:18:01.000 | 2011-08-28 | 2011-08-28 04:18:01.000 |
| 104 | 104 | 926 | E 21ST STREET & MONTICELLO AVENUE | Irene (8/27/2011) | Flooded street | 12131100.0 | 3482796.0 | 2011-08-27 06:08:00.000 | 2011-08-27 | 2011-08-27 06:08:00.000 |
| 185 | 185 | 1248 | E 21ST STREET & MONTICELLO AVENUE | Irene (8/27/2011) | Flooded underpass | 12131100.0 | 3482796.0 | 2011-08-28 08:44:35.000 | 2011-08-28 | 2011-08-28 08:44:35.000 |


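That accounts for both duplicates: each repeated location was reported under two different event types (for E 21ST STREET & MONTICELLO AVENUE the two reports also fall on different days). That's not what I was hoping to show; I thought the gap would tell me something about how many different locations flooded over the days of an event, but that's not the case.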

Let's try this one more time with Hurricane Joaquin.

In [11]:
jqn = flood_df[flood_df['event'].str.contains('Joaquin')]

In [12]:
jqn[jqn['location'].duplicated(keep=False)]

| | Unnamed: 0 | recid | location | event | eventType | xcoord | ycoord | dt | \_date | \_time |
|---|---|---|---|---|---|---|---|---|---|---|


That is interesting: even though Hurricanes Matthew and Joaquin spanned seven and six days respectively, no flooded location was reported twice within either event. To me, this means we really should be looking at these data by 'event' and not by '\_date'. It also means the num_locations column doesn't add much information: in the events checked above, it differed from num_flooded only when one location was reported under more than one event type. So I'll delete it.

In [13]:
del event_df['num_locations']

***

### Looking into date in "event" column versus dates in "\_date" column
Sometimes the date listed in the "event" column is quite different from the date(s) listed in the "\_date" column. A good example is the event "unnamed (2/25/2016)", where the dates in the "\_date" column are 2016-05-05, 2016-05-06, and 2016-05-31.

In [14]:
flood_df[flood_df['event'].str.contains('2/25/2016')]

| | Unnamed: 0 | recid | location | event | eventType | xcoord | ycoord | dt | \_date | \_time |
|---|---|---|---|---|---|---|---|---|---|---|
| 760 | 760 | 4815 | 19TH BAY STREET & PLEASANT AVENUE | unnamed (2/25/2016) | Flooded street | 12156460.0 | 3505946.0 | 2016-05-05 20:43:53.000 | 2016-05-05 | 2016-05-05 20:43:53.000 |
| 761 | 761 | 4816 | 20TH BAY STREET & PLEASANT AVENUE | unnamed (2/25/2016) | Flooded street | 12156800.0 | 3505887.0 | 2016-05-05 20:44:41.000 | 2016-05-05 | 2016-05-05 20:44:41.000 |
| 762 | 762 | 4817 | BOUSH STREET & W OLNEY ROAD | unnamed (2/25/2016) | Flooded street | 12129210.0 | 3478803.0 | 2016-05-05 20:46:10.000 | 2016-05-05 | 2016-05-05 20:46:10.000 |
| 763 | 763 | 4818 | 900 BLOCK OF E CHARLOTTE STREET | unnamed (2/25/2016) | Flooded street | 12132230.0 | 3476292.0 | 2016-05-05 20:51:34.000 | 2016-05-05 | 2016-05-05 20:51:34.000 |
| 764 | 764 | 4819 | LLEWELLYN AVENUE & W VIRGINIA BEACH BOULEVARD | unnamed (2/25/2016) | Flooded street | 12129060.0 | 3479121.0 | 2016-05-05 20:52:17.000 | 2016-05-05 | 2016-05-05 20:52:17.000 |
| 765 | 765 | 4820 | ORLEANS STREET & LAFAYETTE AVENUE | unnamed (2/25/2016) | Flooded street | 12137870.0 | 3513103.0 | 2016-05-06 20:51:41.000 | 2016-05-06 | 2016-05-06 20:51:41.000 |
| 766 | 766 | 4821 | GRANBY STREET & LLEWELLYN AVENUE | unnamed (2/25/2016) | Flooded street | 12131870.0 | 3490070.0 | 2016-05-06 20:52:52.000 | 2016-05-06 | 2016-05-06 20:52:52.000 |
| 767 | 767 | 4823 | DUKE STREET & W OLNEY ROAD | unnamed (2/25/2016) | Flooded street | 12128850.0 | 3478992.0 | 2016-05-31 08:45:33.000 | 2016-05-31 | 2016-05-31 08:45:33.000 |


So to look at this more closely, I will calculate the difference in days between the "event" column date and the dates in the "\_date" column.

When I tried to calculate the time between 'event_date' and 'dates' to see how far off these were, I found that two events shared the same 'event_date' (2016-07-30), so 'event_date' can't serve as a unique index. I think it's appropriate to drop the 'unnamed' one, since its dates in the "\_date" column are further from the "event_date".

In [15]:
event_df.sort_index(inplace=True)
event_df.loc[idx['2016-07-30', :], :]

| event_date | event_name | dates | num_flooded | num_dates |
|---|---|---|---|---|
| 2016-07-30 | Thunderstorm | [2016-07-30T00:00:00.000000000] | 3 | 1 |
| 2016-07-30 | unnamed | [2016-08-02T00:00:00.000000000, 2016-08-31T00:... | 4 | 2 |
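
Rather than stumbling on shared event dates, one option is to list every row whose event_date repeats with Index.duplicated(keep=False) on that index level, and then drop a specific (date, name) key. A sketch on a hypothetical frame shaped like event_df (the rows echo the table above, but the frame itself is made up):

```python
import pandas as pd

# Hypothetical (event_date, event_name) index mirroring event_df
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp('2016-07-25'), 'Bernie Training'),
     (pd.Timestamp('2016-07-30'), 'Thunderstorm'),
     (pd.Timestamp('2016-07-30'), 'unnamed')],
    names=['event_date', 'event_name'])
toy_events = pd.DataFrame({'num_flooded': [1, 3, 4]}, index=idx)

# all rows whose event_date appears more than once
dates = toy_events.index.get_level_values('event_date')
shared = toy_events[dates.duplicated(keep=False)]
print(shared)

# drop one specific row by its full (date, name) key
deduped = toy_events.drop(index=(pd.Timestamp('2016-07-30'), 'unnamed'))
print(deduped)
```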


In [16]:
# drop the 'unnamed' event; it shares the 2016-07-30 event_date with 'Thunderstorm'
event_df.drop(index=(pd.Timestamp('2016-07-30'), 'unnamed'), inplace=True)

In [17]:
event_df.reset_index(inplace=True)
event_df.set_index('event_date', inplace=True)

In [18]:
days_away = []
max_days = []
for d in event_df.index:
    # whole days between each report date and the date in the "event" label
    ar = event_df.loc[d, 'dates'] - np.datetime64(d)
    ar = ar.astype('timedelta64[D]')
    days = ar / np.timedelta64(1, 'D')
    days_away.append(days)
    max_days.append(days.max())
event_df['days_away_from_event'] = days_away
event_df['max_days_away'] = max_days
print(event_df.shape)
event_df.head()

(33, 6)


| event_date | event_name | dates | num_flooded | num_dates | days_away_from_event | max_days_away |
|---|---|---|---|---|---|---|
| 2010-09-30 | Nicole | [2010-09-30T00:00:00.000000000, 2010-10-01T00:... | 48 | 3 | [0.0, 1.0, 4.0] | 4.0 |
| 2011-08-27 | Irene | [2011-08-27T00:00:00.000000000, 2011-08-28T00:... | 32 | 2 | [0.0, 1.0] | 1.0 |
| 2012-10-28 | Sandy | [2012-10-28T00:00:00.000000000, 2012-10-29T00:... | 45 | 2 | [0.0, 1.0] | 1.0 |
| 2013-10-09 | Heavy Rain | [2013-10-08T00:00:00.000000000, 2013-10-09T00:... | 6 | 3 | [-1.0, 0.0, 1.0] | 1.0 |
| 2014-05-16 | Heavy Rain | [2014-05-16T00:00:00.000000000] | 21 | 1 | [0.0] | 0.0 |
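
The days-away arithmetic in the loop reduces to subtracting a datetime64 scalar from an array of report dates and dividing by a one-day timedelta64. A sketch using the report dates of the "unnamed (2/25/2016)" event shown earlier (that event's values are not printed above; these numbers are just the calendar arithmetic, and since all dates are at midnight the result matches the astype('timedelta64[D]') truncation in the loop):

```python
import numpy as np

event_date = np.datetime64('2016-02-25')
flood_dates = np.array(['2016-05-05', '2016-05-06', '2016-05-31'], dtype='datetime64[ns]')

# difference in days (as floats) between each report date and the event-label date
days_away = (flood_dates - event_date) / np.timedelta64(1, 'D')
max_days_away = days_away.max()
print(days_away, max_days_away)
```

An event like this one, with reports more than two months after its label date, is exactly the kind the max_days_away filter is meant to catch.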


I don't trust events whose flood reports are far from the date in the event label, so I will disregard any event with a "max_days_away" of 10 or more. Five events fall into this category.

In [19]:
event_filt = event_df[event_df['max_days_away']<10]
event_df = event_filt
event_filt.shape


(28, 6)

## Now we'll get the rainfall, groundwater, tide, and wind for the events
First we need to get all of the data for each variable, aggregate it in various ways up to a daily time step, and combine it into one dataframe.

In [20]:
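rain_df = get_table_for_variable('rainfall')
gw_df = get_table_for_variable('groundwater')
tide_df = get_table_for_variable('tide')
wind_dir_df = get_table_for_variable('wind_dir')
wind_vel_df = get_table_for_variable('wind_vel')

rain_df.sort_index(inplace=True)
gw_df.sort_index(inplace=True)
tide_df.sort_index(inplace=True)

# aggregate the rainfall in various ways
rain_grouped = rain_df.groupby('SiteID')

# daily total at each gauge, then the mean of those totals across gauges
rain_daily = rain_grouped.resample('D').agg({'Value':np.sum, 'SiteID':np.mean, 'VariableID':np.mean})
rain_daily.reset_index(level=0, drop=True, inplace=True)
rain_daily_mean = rain_daily.resample('D').mean()

# rolling sum of 4 consecutive readings per gauge (one hour, assuming 15-minute data),
# then the largest hourly total of the day across all gauges
rain_hourly_totals = rain_grouped.rolling(window=4).sum()
rain_hourly_totals.reset_index(level=0, drop=True, inplace=True)
rain_hourly_max = rain_hourly_totals.resample('D').max()
# largest single reading of the day across all gauges
rain_max_15_min_all = rain_df.resample('D').max()

# 3-day rolling total (the day itself plus the two before it), computed within each gauge
# so the window never spans two gauges, then averaged across gauges
rain_daily_by_site = rain_grouped['Value'].resample('D').sum()
rain_prev_3_days = (rain_daily_by_site.groupby(level='SiteID')
                    .transform(lambda x: x.rolling(window=3).sum())
                    .reset_index(level=0, drop=True)
                    .resample('D').mean()
                    .to_frame('Value'))

# daily means of groundwater and tide levels
gw_daily_avg = gw_df.resample('D').mean()
tide_daily_avg = tide_df.resample('D').mean()

# note: arithmetic mean of directions, which ignores the wrap-around at 0/360 degrees
wind_dir_daily_avg = wind_dir_df.resample('D').mean()
wind_vel_daily_avg = wind_vel_df.resample('D').mean()
# daily mean of the hourly maximum wind speed
wind_vel_hourly_max_avg = wind_vel_df.resample('H').max().resample('D').mean()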
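
The 3-day rainfall total needs its rolling window to stay within a single gauge. A minimal sketch on two hypothetical gauges (site IDs and values are made up) contrasting a rolling sum over the stacked (SiteID, Datetime) series with a per-site groupby(...).transform(...) rolling sum:

```python
import pandas as pd

# Hypothetical daily totals for two gauges, stacked as (SiteID, Datetime)
days = pd.date_range('2016-01-01', periods=3, freq='D')
daily = pd.Series(
    [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    index=pd.MultiIndex.from_product([[1, 2], days], names=['SiteID', 'Datetime']),
    name='Value')

# naive: the window slides straight across the boundary between site 1 and site 2
naive = daily.rolling(window=3).sum()

# per-site: each gauge gets its own 3-day window (the day itself plus the two before)
per_site = daily.groupby(level='SiteID').transform(lambda x: x.rolling(window=3).sum())

print(pd.DataFrame({'naive': naive, 'per_site': per_site}))
```

In the naive column, site 2's first day is 15.0 because the window reaches back into site 1's last two days; the per-site column is NaN there instead.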




In [21]:
feat_columns = ['rain_daily_sum',
                'rain_hourly_max',
                'rain_15_min_max',
                'rain_prev_3_days',
                'gw_elev_avg',
                'tide_elev_avg',
                'wind_dir_avg',
                'wind_vel_avg',
                'wind_vel_hourly_max_avg'               
               ]

feature_df = pd.concat([rain_daily_mean['Value'],
                        rain_hourly_max['Value'], 
                        rain_max_15_min_all['Value'], 
                        rain_prev_3_days['Value'], 
                        gw_daily_avg['Value'], 
                        tide_daily_avg['Value'],
                        wind_dir_daily_avg['Value'],
                        wind_vel_daily_avg['Value'],
                        wind_vel_hourly_max_avg['Value']
                       ], 
                       axis=1)
feature_df.columns = feat_columns
feature_df = feature_df.loc["2010-01-01":"2016-10-31"]
feature_df.head()


| Datetime | rain_daily_sum | rain_hourly_max | rain_15_min_max | rain_prev_3_days | gw_elev_avg | tide_elev_avg | wind_dir_avg | wind_vel_avg | wind_vel_hourly_max_avg |
|---|---|---|---|---|---|---|---|---|---|
| 2010-01-01 | 0.055 | 0.04 | 0.02 | 0.145 | 3.165459 | 0.387667 | | | |
| 2010-01-02 | 0.0 | 0.0 | 0.0 | 0.13 | 3.236243 | 0.352125 | | | |
| 2010-01-03 | 0.0 | 0.0 | 0.0 | 0.055 | 3.125281 | -0.853333 | | | |
| 2010-01-04 | 0.05 | 0.1 | 0.1 | 0.05 | 2.989199 | -0.789292 | | | |
| 2010-01-05 | 0.0 | 0.0 | 0.0 | 0.05 | 2.871405 | -0.235708 | | | |
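
One caveat on wind_dir_avg: resample('D').mean() averages directions arithmetically, so readings of 350° and 30° average to 190° rather than 10°. If direction matters as a feature, a vector (circular) mean is the usual alternative. A sketch, assuming the directions are in degrees (the values in the table above suggest so, but that is an assumption) and using a hypothetical helper name, circular_mean_deg:

```python
import numpy as np
import pandas as pd

def circular_mean_deg(directions):
    """Vector mean of angles in degrees, returned in [0, 360); NaN if no valid readings."""
    rad = np.deg2rad(np.asarray(directions, dtype=float))
    rad = rad[~np.isnan(rad)]
    if rad.size == 0:
        return np.nan
    # average the unit vectors, then convert the resulting vector back to an angle
    angle = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))
    return angle % 360

# hypothetical readings: two per day
readings = pd.Series([350.0, 30.0, 90.0, 180.0],
                     index=pd.to_datetime(['2016-01-01 00:00', '2016-01-01 12:00',
                                           '2016-01-02 00:00', '2016-01-02 12:00']))
arithmetic = readings.resample('D').mean()
vector = readings.resample('D').apply(circular_mean_deg)
print(pd.DataFrame({'arithmetic': arithmetic, 'vector': vector}))
```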


Now, for each event, we aggregate each variable over the event's dates.

In [22]:
def add_event_data(evnt_data, evnt_df, row_label, in_col_name, out_col_name, func):
    """Apply func to evnt_data[in_col_name] and store the result at evnt_df.loc[row_label, out_col_name]."""
    res = func(evnt_data[in_col_name])
    evnt_df.loc[row_label, out_col_name] = res
    return evnt_df

In [23]:
event_df = pd.concat([event_df, pd.DataFrame(columns=feat_columns)])

for ind in event_df.index:
    # get the dates of the event and include the date in the "event" column
    ds = event_df.loc[ind, 'dates']
    ev_date = np.datetime64(ind)
    ds = np.append(ds, ev_date) if ev_date not in ds else ds
    ds = np.sort(ds)  # so ds[0] is the earliest day of the event

    # feature rows for every day of the event
    event_data = feature_df.loc[ds]

    # daily rainfall: event total and wettest single day
    event_df = add_event_data(event_data, event_df, ind, 'rain_daily_sum', 'rain_event_total', np.sum)
    event_df = add_event_data(event_data, event_df, ind, 'rain_daily_sum', 'rain_daily_max', np.max)

    # hourly rainfall
    event_df = add_event_data(event_data, event_df, ind, 'rain_hourly_max', 'rain_hourly_max', np.max)

    # max fifteen-minute rainfall
    event_df = add_event_data(event_data, event_df, ind, 'rain_15_min_max', 'rain_15_min_max', np.max)

    # three-day rainfall total ending on the first day of the event
    event_df = add_event_data(event_data.loc[ds[0]], event_df, ind, 'rain_prev_3_days', 'rain_prev_3_days',
                              np.mean)

    # avg gw level
    event_df = add_event_data(event_data, event_df, ind, 'gw_elev_avg', 'gw_elev_avg', np.mean)

    # avg tide level
    event_df = add_event_data(event_data, event_df, ind, 'tide_elev_avg', 'tide_elev_avg', np.mean)

    # avg wind direction (arithmetic mean of the daily averages)
    event_df = add_event_data(event_data, event_df, ind, 'wind_dir_avg', 'wind_dir_avg', np.mean)

    # largest daily average of the hourly max wind speed
    event_df = add_event_data(event_data, event_df, ind, 'wind_vel_hourly_max_avg', 'wind_vel_hourly_max_avg',
                              np.max)

    # largest daily average wind speed
    event_df = add_event_data(event_data, event_df, ind, 'wind_vel_avg', 'wind_vel_avg', np.max)

event_df.head()

| | dates | days_away_from_event | event_name | gw_elev_avg | max_days_away | num_dates | num_flooded | rain_15_min_max | rain_daily_sum | rain_hourly_max | rain_prev_3_days | tide_elev_avg | wind_dir_avg | wind_vel_avg | wind_vel_hourly_max_avg | rain_event_total | rain_daily_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-09-30 | [2010-09-30T00:00:00.000000000, 2010-10-01T00:... | [0.0, 1.0, 4.0] | Nicole | 1.44729 | 4.0 | 3.0 | 48.0 | 0.67 | | 1.59 | 11.695 | 1.11147 | | | | 11.415 | 10.255 |
| 2011-08-27 | [2011-08-27T00:00:00.000000000, 2011-08-28T00:... | [0.0, 1.0] | Irene | 1.03384 | 1.0 | 2.0 | 32.0 | 0.33 | | 1.22 | 8.245 | 1.4935 | | | | 7.895 | 7.82 |
| 2012-10-28 | [2012-10-28T00:00:00.000000000, 2012-10-29T00:... | [0.0, 1.0] | Sandy | 1.34151 | 1.0 | 2.0 | 45.0 | 0.21 | | 0.58 | 2.495 | 2.90383 | 132.216 | 17.9594 | 22.1479 | 3.795 | 2.0 |
| 2013-10-09 | [2013-10-08T00:00:00.000000000, 2013-10-09T00:... | [-1.0, 0.0, 1.0] | Heavy Rain | 0.869149 | 1.0 | 3.0 | 6.0 | 0.22 | | 0.5 | 0.685 | 1.94981 | 52.9563 | 14.9616 | 18.4242 | 4.32 | 3.14 |
| 2014-05-16 | [2014-05-16T00:00:00.000000000] | [0.0] | Heavy Rain | 2.3677 | 0.0 | 1.0 | 21.0 | 0.76 | | 2.23 | 3.88333 | 0.328958 | 216.666 | 5.01215 | 8.27917 | 3.853333 | 3.853333 |


In [24]:
event_df

| | dates | days_away_from_event | event_name | gw_elev_avg | max_days_away | num_dates | num_flooded | rain_15_min_max | rain_daily_sum | rain_hourly_max | rain_prev_3_days | tide_elev_avg | wind_dir_avg | wind_vel_avg | wind_vel_hourly_max_avg | rain_event_total | rain_daily_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-09-30 | [2010-09-30T00:00:00.000000000, 2010-10-01T00:... | [0.0, 1.0, 4.0] | Nicole | 1.44729 | 4.0 | 3.0 | 48.0 | 0.67 | | 1.59 | 11.695 | 1.11147 | | | | 11.415 | 10.255 |
| 2011-08-27 | [2011-08-27T00:00:00.000000000, 2011-08-28T00:... | [0.0, 1.0] | Irene | 1.03384 | 1.0 | 2.0 | 32.0 | 0.33 | | 1.22 | 8.245 | 1.4935 | | | | 7.895 | 7.82 |
| 2012-10-28 | [2012-10-28T00:00:00.000000000, 2012-10-29T00:... | [0.0, 1.0] | Sandy | 1.34151 | 1.0 | 2.0 | 45.0 | 0.21 | | 0.58 | 2.495 | 2.90383 | 132.216 | 17.9594 | 22.1479 | 3.795 | 2.0 |
| 2013-10-09 | [2013-10-08T00:00:00.000000000, 2013-10-09T00:... | [-1.0, 0.0, 1.0] | Heavy Rain | 0.869149 | 1.0 | 3.0 | 6.0 | 0.22 | | 0.5 | 0.685 | 1.94981 | 52.9563 | 14.9616 | 18.4242 | 4.32 | 3.14 |
| 2014-05-16 | [2014-05-16T00:00:00.000000000] | [0.0] | Heavy Rain | 2.3677 | 0.0 | 1.0 | 21.0 | 0.76 | | 2.23 | 3.88333 | 0.328958 | 216.666 | 5.01215 | 8.27917 | 3.853333 | 3.853333 |
| 2014-06-19 | [2014-06-20T00:00:00.000000000] | [1.0] | Thunderstorms | 1.63786 | 1.0 | 1.0 | 5.0 | 0.54 | | 1.13 | 0.843333 | 0.317563 | 176.878 | 5.92839 | 8.58375 | 0.843333 | 0.843333 |
| 2014-07-09 | [2014-07-09T00:00:00.000000000] | [0.0] | Thunderstorms | 1.33984 | 0.0 | 1.0 | 1.0 | 0.63 | | 1.28 | 1.48 | -0.495208 | 230.59 | 7.93232 | 12.4308 | 1.48 | 1.48 |
| 2014-07-10 | [2014-07-10T00:00:00.000000000] | [0.0] | 7/10 Thunderstorms | 1.41384 | 0.0 | 1.0 | 27.0 | 1.27 | | 1.97 | 3.76667 | -0.213583 | 206.126 | 4.20326 | 7.67583 | 2.286667 | 2.286667 |
| 2014-07-24 | [2014-07-24T00:00:00.000000000] | [0.0] | unnamed | 1.69844 | 0.0 | 1.0 | 8.0 | 0.64 | | 1.3 | 1.89 | 0.223333 | 180.255 | 5.22707 | 7.87125 | 1.89 | 1.89 |
| 2014-09-04 | [2014-09-04T00:00:00.000000000] | [0.0] | Thunderstorm | 1.02555 | 0.0 | 1.0 | 2.0 | 0.77 | | 2.11 | 2.14667 | 0.199708 | 194.608 | 3.53836 | 6.0925 | 2.146667 | 2.146667 |
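
Each pass of the loop boils down to selecting the feature rows for a list of dates and reducing each column with its own function. A stripped-down sketch of one pass, on a hypothetical three-day feature table with a subset of the columns (the values are made up):

```python
import numpy as np
import pandas as pd

features = pd.DataFrame(
    {'rain_daily_sum': [0.5, 2.0, 1.0],
     'rain_hourly_max': [0.2, 1.1, 0.4],
     'tide_elev_avg': [1.0, 1.5, 0.5]},
    index=pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']))

# hypothetical two-day event
event_dates = np.array(['2016-01-02', '2016-01-03'], dtype='datetime64[ns]')
rows = features.loc[event_dates]

# one summary value per output column, each with its own reduction
summary = pd.Series({
    'rain_event_total': rows['rain_daily_sum'].sum(),
    'rain_daily_max': rows['rain_daily_sum'].max(),
    'rain_hourly_max': rows['rain_hourly_max'].max(),
    'tide_elev_avg': rows['tide_elev_avg'].mean(),
})
print(summary)
```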


### Combining with the non-flooding event data
First we combine all the dates in the "dates" column of event_df, together with the event dates themselves (the index), into one array so we can filter them out of the overall dataset.

In [25]:
flooded_dates = [np.datetime64(i) for i in event_df.index]
flooded_dates = np.array(flooded_dates)
fl_event_dates = np.concatenate(event_df['dates'].tolist())
all_fl_dates = np.concatenate([fl_event_dates, flooded_dates])

In [26]:
non_flooded_records = feature_df[~feature_df.index.isin(all_fl_dates)].copy()
non_flooded_records['num_flooded'] = 0
non_flooded_records['flooded'] = False
non_flooded_records['event_name'] = np.nan
non_flooded_records['event_date'] = non_flooded_records.index
non_flooded_records.reset_index(drop=True, inplace=True)
non_flooded_records.head()

| | rain_daily_sum | rain_hourly_max | rain_15_min_max | rain_prev_3_days | gw_elev_avg | tide_elev_avg | wind_dir_avg | wind_vel_avg | wind_vel_hourly_max_avg | num_flooded | flooded | event_name | event_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.055 | 0.04 | 0.02 | 0.145 | 3.165459 | 0.387667 | | | | 0 | False | | 2010-01-01 |
| 1 | 0.0 | 0.0 | 0.0 | 0.13 | 3.236243 | 0.352125 | | | | 0 | False | | 2010-01-02 |
| 2 | 0.0 | 0.0 | 0.0 | 0.055 | 3.125281 | -0.853333 | | | | 0 | False | | 2010-01-03 |
| 3 | 0.05 | 0.1 | 0.1 | 0.05 | 2.989199 | -0.789292 | | | | 0 | False | | 2010-01-04 |
| 4 | 0.0 | 0.0 | 0.0 | 0.05 | 2.871405 | -0.235708 | | | | 0 | False | | 2010-01-05 |
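
Flattening the per-event date arrays and excluding those days with Index.isin can be sketched on a hypothetical five-day index (the dates and values are made up):

```python
import numpy as np
import pandas as pd

daily = pd.DataFrame({'rain_daily_sum': [0.0, 1.2, 0.3, 0.0, 2.5]},
                     index=pd.date_range('2016-01-01', periods=5, freq='D'))

# hypothetical per-event report-date arrays, plus the dates from the "event" labels
event_date_arrays = [np.array(['2016-01-02', '2016-01-03'], dtype='datetime64[ns]'),
                     np.array(['2016-01-05'], dtype='datetime64[ns]')]
label_dates = np.array(['2016-01-02'], dtype='datetime64[ns]')

# one flat array of every flood-related date, then keep only the other days
all_flood_dates = np.concatenate(event_date_arrays + [label_dates])
non_flooded = daily[~daily.index.isin(all_flood_dates)].copy()
non_flooded['flooded'] = False
print(non_flooded)
```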


Prepare the flooded events in the same shape, then combine them with the non-flooded records.

In [27]:
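event_df.reset_index(inplace=True)
flooded_records = event_df.copy()
# after reset_index the event date sits in a column named 'index'
flooded_records['event_date'] = flooded_records['index']
# use the wettest day's total so rain_daily_sum is a single-day value, as on non-flooded days
flooded_records['rain_daily_sum'] = flooded_records['rain_daily_max']
flooded_records['flooded'] = True
flooded_records.head()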

| | index | dates | days_away_from_event | event_name | gw_elev_avg | max_days_away | num_dates | num_flooded | rain_15_min_max | rain_daily_sum | rain_hourly_max | rain_prev_3_days | tide_elev_avg | wind_dir_avg | wind_vel_avg | wind_vel_hourly_max_avg | rain_event_total | rain_daily_max | event_date | flooded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-09-30 | [2010-09-30T00:00:00.000000000, 2010-10-01T00:... | [0.0, 1.0, 4.0] | Nicole | 1.44729 | 4.0 | 3.0 | 48.0 | 0.67 | 10.255 | 1.59 | 11.695 | 1.11147 | | | | 11.415 | 10.255 | 2010-09-30 | True |
| 1 | 2011-08-27 | [2011-08-27T00:00:00.000000000, 2011-08-28T00:... | [0.0, 1.0] | Irene | 1.03384 | 1.0 | 2.0 | 32.0 | 0.33 | 7.82 | 1.22 | 8.245 | 1.4935 | | | | 7.895 | 7.82 | 2011-08-27 | True |
| 2 | 2012-10-28 | [2012-10-28T00:00:00.000000000, 2012-10-29T00:... | [0.0, 1.0] | Sandy | 1.34151 | 1.0 | 2.0 | 45.0 | 0.21 | 2.0 | 0.58 | 2.495 | 2.90383 | 132.216 | 17.9594 | 22.1479 | 3.795 | 2.0 | 2012-10-28 | True |
| 3 | 2013-10-09 | [2013-10-08T00:00:00.000000000, 2013-10-09T00:... | [-1.0, 0.0, 1.0] | Heavy Rain | 0.869149 | 1.0 | 3.0 | 6.0 | 0.22 | 3.14 | 0.5 | 0.685 | 1.94981 | 52.9563 | 14.9616 | 18.4242 | 4.32 | 3.14 | 2013-10-09 | True |
| 4 | 2014-05-16 | [2014-05-16T00:00:00.000000000] | [0.0] | Heavy Rain | 2.3677 | 0.0 | 1.0 | 21.0 | 0.76 | 3.853333 | 2.23 | 3.88333 | 0.328958 | 216.666 | 5.01215 | 8.27917 | 3.853333 | 3.853333 | 2014-05-16 | True |


In [28]:
# join='inner' keeps only the columns present in both frames
reformat = pd.concat([flooded_records, non_flooded_records], join='inner')
reformat.reset_index(inplace=True, drop=True)
reformat.head()
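
pd.concat(..., join='inner') keeps only the columns both frames share, which is why event-only columns such as dates, days_away_from_event, and num_dates drop out of the combined table (and why rain_daily_sum was filled from rain_daily_max above). A minimal sketch with hypothetical columns:

```python
import pandas as pd

flooded = pd.DataFrame({'event_date': pd.to_datetime(['2016-01-02']),
                        'rain_daily_sum': [2.5],
                        'num_dates': [1],          # exists only on the flooded side
                        'flooded': [True]})
not_flooded = pd.DataFrame({'event_date': pd.to_datetime(['2016-01-01', '2016-01-03']),
                            'rain_daily_sum': [0.0, 0.1],
                            'flooded': [False, False]})

# inner join on columns: num_dates is dropped because not_flooded lacks it
combined = pd.concat([flooded, not_flooded], join='inner', ignore_index=True)
print(combined)
```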

In [29]:
reformat.to_csv("{}reformat_by_event.csv".format(data_dir), index=False)