# Sandy Inflow Analysis
## Organising the Dataset
We import the ```pandas```, ```numpy```, ```csv```, and ```os``` libraries.

In [11]:
import pandas as pd
import numpy as np
import csv
import os

We store the path to the ```IRS_migration_data``` repository folder in a string variable

In [19]:
repo_path = os.getcwd()[0:len(os.getcwd())-7]
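
The slice above drops the last seven characters of the working directory, so it presumably relies on the notebook being run from a subfolder with a seven-character name. A minimal alternative sketch, assuming the notebook sits in a direct subfolder of the repository (the folder name ```scripts``` and the helper ```repo_root_from``` are hypothetical), takes the parent directory instead:

```python
import os

def repo_root_from(cwd):
    """Return the parent directory of `cwd`, ending with a path separator."""
    parent = os.path.dirname(os.path.normpath(cwd))
    # Joining with an empty string appends the trailing separator.
    return os.path.join(parent, '')

# Hypothetical layout: <repo>/scripts is the working directory.
example_cwd = os.path.join('IRS_migration_data', 'scripts')
print(repo_root_from(example_cwd))
```

In the notebook this would be called as ```repo_root_from(os.getcwd())```, which keeps working if the subfolder is renamed.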

We also create a new ```tables``` folder in which we will store the tables produced by this script. It will be a subfolder of your ```IRS_migration_data``` repository. If such a folder already exists, a new one will not be created.

In [None]:
results_path = repo_path + 'tables/'
if not os.path.exists(results_path):
    os.makedirs(results_path)
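
As a side note, the existence check can be folded into the call itself with the ```exist_ok``` argument of ```os.makedirs```. The sketch below uses a temporary directory as a stand-in for the repository, so the paths are illustrative only:

```python
import os
import tempfile

# Temporary directory standing in for the repository root (illustrative).
base = tempfile.mkdtemp()
tables_dir = os.path.join(base, 'tables')

os.makedirs(tables_dir, exist_ok=True)  # creates the folder
os.makedirs(tables_dir, exist_ok=True)  # second call is a no-op, no error
print(os.path.isdir(tables_dir))
```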

We load the data on inflows from a csv file.
It covers the period 1998-2015.

In [20]:
inflow_df = pd.read_csv(repo_path + 'inflows/inflow.csv')

We print the first 5 rows of the dataframe we just created to see how it is structured.

In [21]:
inflow_df.head()

Unnamed: 0,year,state_code_dest,county_code_dest,destination,state_code_origin,county_code_origin,origin,state_abbrv,county_name,return_num,exmpt_num,aggr_agi
0,1998-1999,0,0,0,96,0,96000,US,Total Mig - US & For,6916177,13132561,272835496
1,1998-1999,0,0,0,97,0,97000,US,Total Mig - US,6711480,12800546,268111409
2,1998-1999,0,0,0,97,1,97001,US,Total Mig - US Same St,3803650,7221528,143410291
3,1998-1999,0,0,0,97,3,97003,US,Total Mig - US Diff St,2907830,5579018,124701118
4,1998-1999,0,0,0,98,0,98000,US,Total Mig - Foreign,204697,332015,4724087


We store the unique FIPS codes of the destination and origin counties in numpy arrays.

In [22]:
destinations = inflow_df[(inflow_df['state_code_dest']<=56) & ((inflow_df['state_code_origin']<=56))]
destination_codes = pd.unique(destinations['destination'].values)
origin_codes = np.append(destination_codes, [58000,59000,57009])
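
To make the filtering step concrete, here is a small sketch on made-up rows (the values in ```toy``` are invented for illustration and are not taken from the IRS file). Rows whose origin state code is above 56, such as the aggregate codes 96 and 97 seen in the output above, are dropped before the unique destination codes are collected:

```python
import numpy as np
import pandas as pd

# Invented rows that mimic the three columns used by the filter.
toy = pd.DataFrame({
    'state_code_dest':   [1, 1, 36, 36, 0],
    'state_code_origin': [1, 96, 34, 36, 97],
    'destination':       [1001, 1001, 36061, 36061, 0],
})

# Keep rows where both state codes are at most 56.
valid = toy[(toy['state_code_dest'] <= 56) & (toy['state_code_origin'] <= 56)]

# pd.unique keeps the order of first appearance.
dest_codes = pd.unique(valid['destination'].values)
origin_codes_toy = np.append(dest_codes, [58000, 59000, 57009])
print(dest_codes, origin_codes_toy)
```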

We load the cleaned and restructured datasets for the four years we will analyse, grouped into two periods:

* before Sandy (2010-2011);
* and after Sandy (2012-2013).

In [24]:
pre_1 = pd.read_csv(repo_path + 'inflows/1011in.csv')
pre_1.rename(columns={'Unnamed: 0':''}, inplace=True)
pre_1.set_index([''], inplace=True)
new_col_names = list(map(int, pre_1.columns.values))
pre_1.columns = new_col_names

pre_2 = pd.read_csv(repo_path + 'inflows/1112in.csv')
pre_2.rename(columns={'Unnamed: 0':''}, inplace=True)
pre_2.set_index([''], inplace=True)
new_col_names = list(map(int, pre_2.columns.values))
pre_2.columns = new_col_names

re_1 = pd.read_csv(repo_path + 'inflows/1213in.csv')
re_1.rename(columns={'Unnamed: 0':''}, inplace=True)
re_1.set_index([''], inplace=True)
new_col_names = list(map(int, re_1.columns.values))
re_1.columns = new_col_names

re_2 = pd.read_csv(repo_path + 'inflows/1314in.csv')
re_2.rename(columns={'Unnamed: 0':''}, inplace=True)
re_2.set_index([''], inplace=True)
new_col_names = list(map(int, re_2.columns.values))
re_2.columns = new_col_names
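
The four loading blocks above repeat the same steps. As a sketch only, they could be wrapped in a small helper. The name ```load_flow_matrix``` is hypothetical and not part of the original notebook, and the toy CSV below stands in for the real files:

```python
import io
import pandas as pd

def load_flow_matrix(source):
    """Read a square flow matrix whose first column holds the row codes."""
    df = pd.read_csv(source, index_col=0)   # first (unnamed) column becomes the index
    df.index.name = ''                      # same empty index name as in the notebook
    df.columns = df.columns.astype(int)     # column labels '1001' -> 1001
    return df

# Toy file with two invented county codes.
toy_csv = io.StringIO(",1001,1003\n1001,50,12\n1003,25,40\n")
toy_matrix = load_flow_matrix(toy_csv)
print(toy_matrix)
```

With the real files this would be called as, for example, ```load_flow_matrix(repo_path + 'inflows/1011in.csv')```.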

We adjust the estimates for all years from 2011 onwards to account for a change in methodology by the IRS. Improvements in the collection mechanisms allowed the IRS to increase the coverage rate by 4.7 percent, so we scale these flows down to make them comparable to their pre-2011 equivalents. Note that the factor applied in the code below is 0.963, which corresponds to a reduction of 3.7 percent rather than 4.7 percent.

In [27]:
pre_2_adj = pre_2 * 0.963
re_1_adj = re_1 * 0.963
re_2_adj = re_2 * 0.963
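
A quick worked example of what this scaling does, using invented flows: a flow of 1000 households becomes about 963, and a flow of exactly 20 households becomes 19.26, which is below the 20-household threshold applied later in the notebook.

```python
import pandas as pd

ADJ_FACTOR = 0.963  # the factor used in the cell above

# Invented flows: one large, one exactly at the 20-household threshold.
toy = pd.DataFrame({1001: [1000, 20]}, index=[1003, 1005])
toy_adj = toy * ADJ_FACTOR
print(toy_adj)
```

This interaction is worth keeping in mind when comparing the adjusted and unadjusted ties tables.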

We import a set of csv files that contain the FIPS codes for the different groups of counties we will use in the analysis.

In [37]:
disaster_sandy_counties_df = pd.read_csv(repo_path + 'county_groups/disaster_sandy_counties.csv', usecols = ['fip_code'])
nearby_sandy_counties_df = pd.read_csv(repo_path + 'county_groups/nearby_sandy_counties.csv', usecols = ['fip_code'])
distant_sandy_counties_df = pd.read_csv(repo_path + 'county_groups/distant_sandy_counties.csv', usecols = ['fip_code'])
urban_nc_counties_df = pd.read_csv(repo_path + 'county_groups/urban_nc_counties.csv', usecols = ['fip_code'])

We now convert the dataframes into lists and add one list with all the counties.

In [44]:
disaster_sandy_counties = list(disaster_sandy_counties_df['fip_code'])
nearby_sandy_counties = list(nearby_sandy_counties_df['fip_code'])
distant_sandy_counties = list(distant_sandy_counties_df['fip_code'])
urban_nc_counties = list(urban_nc_counties_df['fip_code'])
all_nc_counties = disaster_sandy_counties + nearby_sandy_counties + distant_sandy_counties

Finally, using list comprehensions, we divide all the groups we have defined so far into urban and rural areas based on their 2010 Census population. If the proportion of the population living in rural areas is equal to or above 50%, we classify the county as rural; otherwise we classify it as urban.

In [43]:
disaster_sandy_urban_counties = [x for x in disaster_sandy_counties if x in urban_nc_counties]
nearby_sandy_urban_counties = [x for x in nearby_sandy_counties if x in urban_nc_counties]
distant_sandy_urban_counties = [x for x in distant_sandy_counties if x in urban_nc_counties]
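
The comprehensions above test membership in a list, which scans the list for every county. A minimal sketch of the same intersection using a set for the lookup is shown below. The codes are invented, and the result keeps the order of the first list:

```python
# Invented county codes for illustration.
disaster = [34001, 34003, 36047]
urban = [34003, 36047, 1001]

urban_set = set(urban)  # constant-time membership tests
disaster_urban = [x for x in disaster if x in urban_set]
print(disaster_urban)
```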

We now summarize the number of counties in each group:

In [48]:
print('There are', len(disaster_sandy_counties), 'disaster counties, of which', 
      len(disaster_sandy_urban_counties), 'are urban.' )

print('There are', len(nearby_sandy_counties), 'nearby counties, of which', 
      len(nearby_sandy_urban_counties), 'are urban.' )

print('There are', len(distant_sandy_counties), 'distant counties, of which', 
      len(distant_sandy_urban_counties), 'are urban.' )

print('There is a total of', len(all_nc_counties), 'counties, of which', 
      len(urban_nc_counties), 'are urban and the remaining',
      len(all_nc_counties) - len(urban_nc_counties), 'are rural.' )

There are 41 disaster counties, of which 40 are urban.
There are 132 nearby counties, of which 84 are urban.
There are 2939 distant counties, of which 1123 are urban.
There is a total of 3112 counties, of which 1247 are urban and the remaining 1865 are rural.


## Ties Analysis

We create four dataframes, one for each year considered, containing the ties that connect each county to the others. Here a tie is defined as the presence of a recorded flow between two counties. The IRS changed its data collection methodology: before 2012, flows of at least 10 households were recorded, whereas afterwards only flows of at least 20 households are recorded. We therefore apply the latter threshold to all years in order not to bias the results.

In [49]:
pre_1_ties = pre_1.drop([58000,59000,57009],axis=1).where(pre_1<20,1)
pre_1_ties = pre_1_ties.where(pre_1_ties==1,0)

pre_2_ties = pre_2.drop([58000,59000,57009],axis=1).where(pre_2<20,1)
pre_2_ties = pre_2_ties.where(pre_2_ties==1,0)

re_1_ties = re_1.drop([58000,59000,57009],axis=1).where(re_1<20,1)
re_1_ties = re_1_ties.where(re_1_ties==1,0)

re_2_ties = re_2.drop([58000,59000,57009],axis=1).where(re_2<20,1)
re_2_ties = re_2_ties.where(re_2_ties==1,0)
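
The two ```where``` calls first replace every flow of 20 or more with 1 and then set everything that is not 1 to 0. The sketch below reproduces this on an invented 2x2 matrix and compares it with a one-step boolean version. The two agree here, but they would differ if a raw flow were exactly 1, since the two-step version would keep it as a tie:

```python
import pandas as pd

THRESHOLD = 20  # households, as in the cells above

# Invented flows between two counties.
toy = pd.DataFrame([[50, 12], [25, 0]], index=[1001, 1003], columns=[1001, 1003])

# Two-step version, as used in the notebook.
two_step = toy.where(toy < THRESHOLD, 1)      # flows >= 20 become 1
two_step = two_step.where(two_step == 1, 0)   # everything else becomes 0

# One-step version: a tie exists where the flow reaches the threshold.
one_step = (toy >= THRESHOLD).astype(int)

print(one_step)
```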

The final result is a matrix for each period whose rows and columns are all the counties in the dataset and where a 1 indicates the presence of a tie between the two counties and a 0 its absence. We consider a tie to exist if a flow was recorded in at least one of the two years composing the before and after periods.

In [50]:
pre_ties = (pre_1_ties + pre_2_ties)
pre_ties = pre_ties.where(pre_ties==0,1)
re_ties = (re_1_ties + re_2_ties)
re_ties = re_ties.where(re_ties==0,1)
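
Adding the two yearly matrices and resetting every non-zero entry to 1 amounts to an element-wise logical OR. A small sketch with invented 0/1 matrices shows the equivalence:

```python
import pandas as pd

labels = [1001, 1003]  # invented county codes
year_a = pd.DataFrame([[1, 0], [0, 0]], index=labels, columns=labels)
year_b = pd.DataFrame([[1, 1], [0, 0]], index=labels, columns=labels)

# Version used in the notebook: sum, then set every non-zero entry to 1.
summed = year_a + year_b
combined = summed.where(summed == 0, 1)

# Equivalent formulation: a tie exists if it appears in either year.
either = ((year_a + year_b) > 0).astype(int)
print(combined)
```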

Finally, we compute the number of unique ties in the before and recovery periods. A tie is unique if present only in one of the two periods.

In [51]:
ties = re_ties - pre_ties
uties_pre = ties.where(ties==-1,0)*-1
uties_re = ties.where(ties==1,0)

In [52]:
for i in uties_pre.index:
    uties_pre.loc[i, i] = 0
for i in uties_re.index:
    uties_re.loc[i, i] = 0
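
The difference ```re_ties - pre_ties``` is -1 for ties seen only before the disaster, 1 for ties seen only in the recovery period, and 0 otherwise. The loops above then clear the main diagonal. The sketch below walks through both steps on invented data. The helper ```zero_diagonal``` is hypothetical and assumes a square matrix whose rows and columns are in the same order:

```python
import numpy as np
import pandas as pd

def zero_diagonal(df):
    """Return a copy of a square dataframe with its main diagonal set to 0."""
    arr = df.to_numpy(copy=True)
    np.fill_diagonal(arr, 0)
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

labels = [1001, 1003]  # invented county codes
toy_pre = pd.DataFrame([[1, 1], [0, 0]], index=labels, columns=labels)
toy_re = pd.DataFrame([[1, 0], [1, 1]], index=labels, columns=labels)

diff = toy_re - toy_pre                                # -1: only before, 1: only after
only_pre = zero_diagonal(diff.where(diff == -1, 0) * -1)
only_re = zero_diagonal(diff.where(diff == 1, 0))
print(only_pre)
print(only_re)
```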

We set up the table headers

In [60]:
county_groups= ['All','Disaster Affected','Nearby','Distant',
                   'All (Urban)', 'Disaster Affected (Urban)', 'Nearby (Urban)','Distant (Urban)']
periods = ['Pre-Disaster','Recovery','% Change']

We initialise the table

In [61]:
ties_df = pd.DataFrame(0, index=county_groups, columns=periods)

Finally we fill it

In [62]:
ties_df.loc['All',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            all_nc_counties].sum(axis=1).sum(axis=0)
ties_df.loc['All',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       all_nc_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Disaster Affected',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            disaster_sandy_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Disaster Affected',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       disaster_sandy_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Nearby',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            nearby_sandy_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Nearby',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       nearby_sandy_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Distant',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            distant_sandy_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Distant',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       distant_sandy_counties].sum(axis=1).sum(axis=0)

ties_df.loc['All (Urban)',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            urban_nc_counties].sum(axis=1).sum(axis=0)
ties_df.loc['All (Urban)',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       urban_nc_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Disaster Affected (Urban)',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Disaster Affected (Urban)',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Nearby (Urban)',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Nearby (Urban)',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_df.loc['Distant (Urban)',
            'Pre-Disaster'] = uties_pre.loc[disaster_sandy_counties,
                                            distant_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_df.loc['Distant (Urban)',
            'Recovery'] = uties_re.loc[disaster_sandy_counties,
                                       distant_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_df.loc[:,'% Change'] = (ties_df.loc[:,'Recovery'] - ties_df.loc[:,'Pre-Disaster'])/ties_df.loc[:,'Pre-Disaster']*100

ties_df.loc[:,'% Change'] = ties_df.loc[:,'% Change'].round(decimals=1)
ties_df

Unnamed: 0,Pre-Disaster,Recovery,% Change
All,258.0,241.0,-6.6
Disaster Affected,30.0,37.0,23.3
Nearby,50.0,56.0,12.0
Distant,178.0,148.0,-16.9
All (Urban),252.0,234.0,-7.1
Disaster Affected (Urban),29.0,37.0,27.6
Nearby (Urban),48.0,51.0,6.2
Distant (Urban),175.0,146.0,-16.6
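
The last column is the percentage change between the two periods. For the first row this is (241 - 258) / 258 * 100, which rounds to -6.6. Since every row of the table is filled with the same pattern, the cell could also be written as a loop. The following is a sketch under stated assumptions, not the notebook's own code: ```fill_table``` is a hypothetical helper, and the matrices and county codes are invented.

```python
import pandas as pd

def fill_table(pre, post, row_labels, groups):
    """Sum pre/post matrices over row_labels x each column group."""
    table = pd.DataFrame(0.0, index=list(groups),
                         columns=['Pre-Disaster', 'Recovery', '% Change'])
    for name, cols in groups.items():
        table.loc[name, 'Pre-Disaster'] = pre.loc[row_labels, cols].to_numpy().sum()
        table.loc[name, 'Recovery'] = post.loc[row_labels, cols].to_numpy().sum()
    change = (table['Recovery'] - table['Pre-Disaster']) / table['Pre-Disaster'] * 100
    table['% Change'] = change.round(1)
    return table

labels = [1, 2, 3]  # invented county codes
toy_pre = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0]], index=labels, columns=labels)
toy_post = pd.DataFrame([[0, 1, 1], [1, 0, 1], [1, 0, 0]], index=labels, columns=labels)

# Rows 1 and 2 play the role of the disaster counties.
toy_table = fill_table(toy_pre, toy_post, [1, 2], {'All': [1, 2, 3], 'Nearby': [3]})
print(toy_table)
```

In the notebook, ```groups``` would map each label in ```county_groups``` to the matching list of county codes, and ```row_labels``` would be ```disaster_sandy_counties```.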


To export the table to a csv file, uncomment the following line

In [28]:
#ties_df.to_csv(results_path + 'inties_table_sandy.csv')

## Flows Analysis
We apply the same adjustment to flows, removing all flows below 20 households

In [63]:
pre_1_flows = pre_1.where(pre_1>=20,0)
pre_2_flows = pre_2.where(pre_2>=20,0)
re_1_flows = re_1.where(re_1>=20,0)
re_2_flows = re_2.where(re_2>=20,0)

We create two dataframes with the same structure as the ones with the ties, containing the average flows for the pre-disaster and recovery periods respectively.

In [64]:
pre_avg = (pre_1_flows + pre_2_flows)/2
re_avg = (re_1_flows + re_2_flows)/2

pre_avg = pre_avg.round(decimals=0)
re_avg = re_avg.round(decimals=0)

pre_flows = pre_avg
re_flows = re_avg
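
To see how thresholding, averaging and rounding combine, here is a sketch on invented flows. Two points stand out: a flow that passes the threshold in only one year is halved by the average, and ```round``` follows numpy's convention of sending exact halves to the nearest even integer (12.5 becomes 12, 40.5 becomes 40).

```python
import pandas as pd

THRESHOLD = 20  # households
labels = [1001, 1003]  # invented county codes

year_1 = pd.DataFrame([[50, 12], [25, 40]], index=labels, columns=labels)
year_2 = pd.DataFrame([[30, 25], [10, 41]], index=labels, columns=labels)

# Drop flows below the threshold, as in the previous cell.
flows_1 = year_1.where(year_1 >= THRESHOLD, 0)
flows_2 = year_2.where(year_2 >= THRESHOLD, 0)

# Average the two years and round to whole households.
avg = ((flows_1 + flows_2) / 2).round(decimals=0)
print(avg)
```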

We remove the flows on the main diagonal, as these represent households that remained in the same county and are thus not of interest in our migration analysis.

In [65]:
for i in pre_flows.index:
    pre_flows.loc[i, i] = 0
for i in re_flows.index:
    re_flows.loc[i, i] = 0

We initialise the flows table as before

In [66]:
flows_df = pd.DataFrame(0, index=county_groups, columns=periods)

and now we fill it

In [68]:
flows_df.loc['All',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties,
                                         all_nc_counties].sum(axis=1).sum(axis=0)
flows_df.loc['All',
            'Recovery'] = re_avg.loc[disaster_sandy_counties,
                                    all_nc_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Disaster Affected',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          disaster_sandy_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Disaster Affected',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     disaster_sandy_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Nearby',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          nearby_sandy_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Nearby',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     nearby_sandy_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Distant',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          distant_sandy_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Distant',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     distant_sandy_counties].sum(axis=1).sum(axis=0)

flows_df.loc['All (Urban)',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          urban_nc_counties].sum(axis=1).sum(axis=0)
flows_df.loc['All (Urban)',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     urban_nc_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Disaster Affected (Urban)',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Disaster Affected (Urban)',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Nearby (Urban)',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Nearby (Urban)',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_df.loc['Distant (Urban)',
            'Pre-Disaster'] = pre_avg.loc[disaster_sandy_counties, 
                                          distant_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_df.loc['Distant (Urban)',
            'Recovery'] = re_avg.loc[disaster_sandy_counties, 
                                     distant_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_df.loc[:,'% Change'] = (flows_df.loc[:,'Recovery'] - flows_df.loc[:,'Pre-Disaster'])/flows_df.loc[:,'Pre-Disaster']*100
flows_df.loc[:,'% Change'] = flows_df.loc[:,'% Change'].round(decimals=1)
flows_df

Unnamed: 0,Pre-Disaster,Recovery,% Change
All,439425.0,452139.0,2.9
Disaster Affected,325898.0,341816.0,4.9
Nearby,47163.0,46670.0,-1.0
Distant,66364.0,63653.0,-4.1
All (Urban),437728.0,450414.0,2.9
Disaster Affected (Urban),325106.0,341044.0,4.9
Nearby (Urban),46387.0,45820.0,-1.2
Distant (Urban),66235.0,63550.0,-4.1


To export the table to a csv file, uncomment the following line

In [34]:
#flows_df.to_csv(results_path + 'inflows_table_sandy.csv')

## Adjusted Data After 2011
We now repeat the analysis, this time using the adjusted data.

In [69]:
pre_2_ties_adj = pre_2_adj.drop([58000,59000,57009],axis=1).where(pre_2_adj<20,1)
pre_2_ties_adj = pre_2_ties_adj.where(pre_2_ties_adj==1,0)

re_1_ties_adj = re_1_adj.drop([58000,59000,57009],axis=1).where(re_1_adj<20,1)
re_1_ties_adj = re_1_ties_adj.where(re_1_ties_adj==1,0)

re_2_ties_adj = re_2_adj.drop([58000,59000,57009],axis=1).where(re_2_adj<20,1)
re_2_ties_adj = re_2_ties_adj.where(re_2_ties_adj==1,0)

The final result is a matrix for each period whose rows and columns are all the counties in the dataset and where a 1 indicates the presence of a tie between the two counties and a 0 its absence. We consider a tie to exist if a flow was recorded in at least one of the two years composing the before and after periods.

In [70]:
pre_ties_adj = (pre_1_ties + pre_2_ties_adj)
pre_ties_adj = pre_ties_adj.where(pre_ties_adj==0,1)
re_ties_adj = (re_1_ties_adj + re_2_ties_adj)
re_ties_adj = re_ties_adj.where(re_ties_adj==0,1)

Finally, we compute the number of unique ties in the before and recovery periods. A tie is unique if present only in one of the two periods.

In [71]:
ties_adj = re_ties_adj - pre_ties_adj
uties_pre_adj = ties_adj.where(ties_adj==-1,0)*-1
uties_re_adj = ties_adj.where(ties_adj==1,0)

In [72]:
for i in uties_pre_adj.index:
    uties_pre_adj.loc[i, i] = 0
for i in uties_re_adj.index:
    uties_re_adj.loc[i, i] = 0

We initialise the table

In [73]:
ties_adj_df = pd.DataFrame(0, index=county_groups, columns=periods)

Finally we fill it

In [74]:
ties_adj_df.loc['All',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            all_nc_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['All',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       all_nc_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Disaster Affected',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            disaster_sandy_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Disaster Affected',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       disaster_sandy_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Nearby',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            nearby_sandy_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Nearby',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       nearby_sandy_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Distant',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            distant_sandy_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Distant',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       distant_sandy_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['All (Urban)',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            urban_nc_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['All (Urban)',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       urban_nc_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Disaster Affected (Urban)',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Disaster Affected (Urban)',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Nearby (Urban)',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Nearby (Urban)',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc['Distant (Urban)',
            'Pre-Disaster'] = uties_pre_adj.loc[disaster_sandy_counties,
                                            distant_sandy_urban_counties].sum(axis=1).sum(axis=0)
ties_adj_df.loc['Distant (Urban)',
            'Recovery'] = uties_re_adj.loc[disaster_sandy_counties,
                                       distant_sandy_urban_counties].sum(axis=1).sum(axis=0)

ties_adj_df.loc[:,'% Change'] = (ties_adj_df.loc[:,'Recovery'] - ties_adj_df.loc[:,'Pre-Disaster'])/ties_adj_df.loc[:,'Pre-Disaster']*100

ties_adj_df.loc[:,'% Change'] = ties_adj_df.loc[:,'% Change'].round(decimals=1)
ties_adj_df

Unnamed: 0,Pre-Disaster,Recovery,% Change
All,257.0,197.0,-23.3
Disaster Affected,29.0,33.0,13.8
Nearby,52.0,38.0,-26.9
Distant,176.0,126.0,-28.4
All (Urban),250.0,192.0,-23.2
Disaster Affected (Urban),28.0,33.0,17.9
Nearby (Urban),50.0,35.0,-30.0
Distant (Urban),172.0,124.0,-27.9


To export the table to a csv file, uncomment the following line

In [41]:
#ties_adj_df.to_csv(results_path + 'inties_table_adj.csv')

## Flows Analysis (Adjusted Data)
We apply the same adjustment to flows, removing all flows below 20 households

In [78]:
pre_2_flows_adj = pre_2_adj.where(pre_2_adj>=20,0)
re_1_flows_adj = re_1_adj.where(re_1_adj>=20,0)
re_2_flows_adj = re_2_adj.where(re_2_adj>=20,0)

We create two dataframes with the same structure as the ones with the ties, containing the average flows for the pre-disaster and recovery periods respectively.

In [79]:
pre_avg_adj = (pre_1_flows + pre_2_flows_adj)/2
re_avg_adj = (re_1_flows_adj + re_2_flows_adj)/2

pre_avg_adj = pre_avg_adj.round(decimals=0)
re_avg_adj = re_avg_adj.round(decimals=0)

pre_flows_adj = pre_avg_adj
re_flows_adj = re_avg_adj

We remove the flows on the main diagonal, as these represent households that remained in the same county and are thus not of interest in our migration analysis.

In [80]:
for i in pre_flows_adj.index:
    pre_flows_adj.loc[i, i] = 0
for i in re_flows_adj.index:
    re_flows_adj.loc[i, i] = 0

We initialise the flows table as before

In [81]:
flows_adj_df = pd.DataFrame(0, index=county_groups, columns=periods)

and now we fill it

In [83]:
flows_adj_df.loc['All',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties,
                                         all_nc_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['All',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties,
                                    all_nc_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Disaster Affected',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          disaster_sandy_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Disaster Affected',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     disaster_sandy_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Nearby',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          nearby_sandy_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Nearby',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     nearby_sandy_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Distant',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          distant_sandy_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Distant',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     distant_sandy_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['All (Urban)',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          urban_nc_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['All (Urban)',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     urban_nc_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Disaster Affected (Urban)',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Disaster Affected (Urban)',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     disaster_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Nearby (Urban)',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Nearby (Urban)',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     nearby_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc['Distant (Urban)',
            'Pre-Disaster'] = pre_flows_adj.loc[disaster_sandy_counties, 
                                          distant_sandy_urban_counties].sum(axis=1).sum(axis=0)
flows_adj_df.loc['Distant (Urban)',
            'Recovery'] = re_flows_adj.loc[disaster_sandy_counties, 
                                     distant_sandy_urban_counties].sum(axis=1).sum(axis=0)

flows_adj_df.loc[:,'% Change'] = (flows_adj_df.loc[:,'Recovery'] - flows_adj_df.loc[:,'Pre-Disaster'])/flows_adj_df.loc[:,'Pre-Disaster']*100
flows_adj_df.loc[:,'% Change'] = flows_adj_df.loc[:,'% Change'].round(decimals=1)
flows_adj_df

Unnamed: 0,Pre-Disaster,Recovery,% Change
All,429818.0,433609.0,0.9
Disaster Affected,319329.0,328916.0,3.0
Nearby,46058.0,44442.0,-3.5
Distant,64431.0,60251.0,-6.5
All (Urban),428159.0,432000.0,0.9
Disaster Affected (Urban),318553.0,328172.0,3.0
Nearby (Urban),45302.0,43667.0,-3.6
Distant (Urban),64304.0,60161.0,-6.4
Disaster Affected (Coastline),319329.0,328916.0,3.0


To export the table to a csv file, uncomment the following line

In [None]:
#flows_adj_df.to_csv(results_path + 'inflows_table_adj.csv')