## Methods: 03 Develop and Threshold Flood Risk Index

**The purpose of this notebook is to assign each of Ghana's 216 districts to either the treatment or the control group, based on whether the district is flood-prone (treatment) or not flood-prone (control).** Specifically, this notebook does the following:

1. Prepares flood recurrence and damage estimates data
2. Develops a Flood Risk Index using flood recurrence interval data and flood damage estimates
3. Thresholds the Flood Risk Index to bin sampling units into treatment and control groups


### 1 Prepare flood recurrence and damage estimates data

The following code prepares the flood recurrence and flood damage estimates data. The raster layers from which these data come were overlaid by Cloud to Street staff; that step is not included in this notebook.

In [12]:
import pandas as pd
import constants

# load file paths ---
path_input_pop = constants.path_output+constants.name_output_marginal_population
path_input_cropland = constants.path_output+constants.name_output_marginal_cropland

path_output = constants.path_output+constants.name_output_tc_groups

# load population and cropland impact estimates ---
pop = pd.read_csv(path_input_pop, header=0)
cropland = pd.read_csv(path_input_cropland, header=0)

# merge people and cropland on district name ---
tot_flood_risk = pop.merge(cropland, on='Region')

# display data for illustration purposes ---
tot_flood_risk.head(5)


| | Region | Avg Impact Per Year - People | Avg Impact Per Year - Crops |
|---|---|---|---|
| 0 | Abura / Asebu / Kwamankese | 0.0 | 0.0 |
| 1 | Kwabre | 0.0 | 0.0 |
| 2 | Kwahu East | 0.0 | 0.001205 |
| 3 | Kwahu West | 0.0 | 0.000125 |
| 4 | La Dade Kotopon | 0.0 | 0.0 |
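
The merge on `Region` above uses pandas' default inner join, which silently drops any district whose name does not appear in both files. The following is a minimal sketch of one way to check for that, using hypothetical toy frames rather than the real CSVs; the values and the `unmatched` name are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy frames standing in for the population and cropland CSVs
pop = pd.DataFrame({
    "Region": ["Kwabre", "Kwahu East", "Ga South"],
    "Avg Impact Per Year - People": [0.0, 0.0, 81.6],
})
cropland = pd.DataFrame({
    "Region": ["Kwabre", "Kwahu East", "Accra Metropolis"],
    "Avg Impact Per Year - Crops": [0.0, 0.0012, 0.00003],
})

# An outer merge with indicator=True keeps unmatched rows and labels where
# each row came from; validate raises pandas.errors.MergeError if a district
# name is duplicated on either side.
checked = pop.merge(cropland, on="Region", how="outer",
                    indicator=True, validate="one_to_one")

# Districts present in only one of the two inputs
unmatched = checked.loc[checked["_merge"] != "both", ["Region", "_merge"]]
print(unmatched)
```

If `unmatched` is empty on the real inputs, the inner merge loses no districts; otherwise the listed names likely need reconciling before the index is built.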


### 2 Develop Flood Risk Index using flood recurrence interval data and flood damage estimates

The following code develops a flood risk index using the data prepared above. Specifically, it does the following:

* converts cropland impacts into population impacts using an arbitrary conversion rate
* constructs a raw index by adding cropland and population impacts together (now that units of each impact are consistent)
* normalizes the index using a z-score conversion


In [5]:
# ASSUMPTION -----------------

hectares_to_people = 10

print('The following arbitrary conversion rate will be used to develop the Flood Risk Index:', '\n')
print(str(hectares_to_people)+' hectares of flood-affected cropland = 1 flood-affected person in any given year')


The following arbitrary conversion rate will be used to develop the Flood Risk Index: 

10 hectares of flood-affected cropland = 1 flood-affected person in any given year


In [7]:
import numpy as np

# convert cropland impacts into population impacts by dividing by the conversion rate (10 hectares = 1 person)
tot_flood_risk['Avg Impact Per Year - Crops Scaled'] = tot_flood_risk['Avg Impact Per Year - Crops']/hectares_to_people

# calculate raw Flood Risk Index (column arithmetic stays aligned on the index, even if this cell is re-run after sorting)
tot_flood_risk['FRI'] = (tot_flood_risk['Avg Impact Per Year - People'] +
                         tot_flood_risk['Avg Impact Per Year - Crops Scaled'])

# calculate z-scores for Flood Risk Index
tot_flood_risk['FRI_z_score'] = ( tot_flood_risk['FRI'] - tot_flood_risk['FRI'].mean() ) / tot_flood_risk['FRI'].std()

# sort dataset by normalized Flood Risk Index
tot_flood_risk = tot_flood_risk.sort_values(by='FRI_z_score', ascending=False)

# display data for illustration purposes ---
tot_flood_risk.head(5)


| | Region | Avg Impact Per Year - People | Avg Impact Per Year - Crops | Avg Impact Per Year - Crops Scaled | FRI | FRI_z_score |
|---|---|---|---|---|---|---|
| 215 | Ga South | 81.644667 | 0.072383 | 0.007238 | 81.651905 | 8.798205 |
| 214 | Gonja Central | 68.578833 | 0.164963 | 0.016496 | 68.59533 | 7.358079 |
| 213 | East Gonja | 55.174833 | 1.187827 | 0.118783 | 55.293616 | 5.890914 |
| 212 | Komenda Edna Eguafo / Abirem | 42.764667 | 4.7e-05 | 5e-06 | 42.764671 | 4.508985 |
| 211 | Accra Metropolis | 30.7445 | 2.5e-05 | 3e-06 | 30.744502 | 3.183174 |


### 3 Threshold Flood Risk Index to bin sampling units into treatment and control groups

The following code bins districts into treatment and control groups by thresholding the Flood Risk Index.
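
Because many districts share the same index value (for example, a raw index of zero), a rank-based cutoff can split a group of tied districts between treatment and control purely by sort order. The sketch below, on hypothetical z-scores, shows one way to apply the cutoff and count how many control districts are tied with the lowest-ranked treatment district. The function name `assign_groups` and the toy values are assumptions for illustration.

```python
import pandas as pd

def assign_groups(z, cutoff=0.50):
    """Rank districts by z-score and flag the top `cutoff` share as treatment."""
    # mergesort gives a deterministic ordering across runs
    ranked = z.sort_values(ascending=False, kind="mergesort")
    n_treatment = round(len(ranked) * cutoff)

    treatment = pd.Series(0, index=ranked.index, dtype=int)
    treatment.iloc[:n_treatment] = 1

    # Count control districts whose score equals the lowest treatment score
    tied_controls = 0
    if 0 < n_treatment < len(ranked):
        boundary_value = ranked.iloc[n_treatment - 1]
        tied_controls = int(((ranked == boundary_value) & (treatment == 0)).sum())
    return treatment, tied_controls

# Hypothetical z-scores: four districts tied at the bottom
z = pd.Series([2.0, 0.5, -0.2, -0.2, -0.2, -0.2],
              index=["A", "B", "C", "D", "E", "F"])

treatment, tied_controls = assign_groups(z)
print(treatment)
print("Control districts tied with the treatment boundary:", tied_controls)
```

A non-zero count means the boundary falls inside a group of identical scores, so which of those districts land in treatment is not determined by flood risk.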


In [8]:
# THRESHOLD -----------------

treatment_control_cutoff = 0.50

print('The following arbitrary threshold will be used to bin districts into treatment and control groups:', '\n')
print('Districts in the', str(int(treatment_control_cutoff*100))+'th percentile and above on the normalized Flood Risk Index will be put in the treatment group')
print('The remaining districts which were in the bottom '+ str(int(((1-treatment_control_cutoff)*100)))+'th percentile will be put in the control group')


The following arbitrary threshold will be used to bin districts into treatment and control groups: 

Districts in the 50th percentile and above on the normalized Flood Risk Index will be put in the treatment group
The remaining districts which were in the bottom 50th percentile will be put in the control group


In [9]:
import numpy as np

# bin groups into treatment and control groups using threshold defined above ---
tot_flood_risk = tot_flood_risk.sort_values(by='FRI_z_score', ascending=False) # sort in descending order
tot_flood_risk['Treatment'] = None
n_treatment = round(len(tot_flood_risk) * treatment_control_cutoff)
tot_flood_risk.loc[tot_flood_risk.index[0:n_treatment], 'Treatment'] = 1
tot_flood_risk.loc[tot_flood_risk.index[n_treatment:len(tot_flood_risk)], 'Treatment'] = 0
print('Number of districts in each group (0 = Control, 1 = Treatment):')
print(tot_flood_risk.groupby('Treatment').size())

# assert to ensure all districts were assigned to either treatment or control ---
# (checks every row directly, so it also works if one group is empty)

assert tot_flood_risk['Treatment'].isin([0, 1]).all()

# show Flood Risk Index and treatment/control assignment for display purposes ----
tot_flood_risk


Number of districts in each group (0 = Control, 1 = Treatment):
Treatment
0    108
1    108
dtype: int64


| | Region | Avg Impact Per Year - People | Avg Impact Per Year - Crops | Avg Impact Per Year - Crops Scaled | FRI | FRI_z_score | Treatment |
|---|---|---|---|---|---|---|---|
| 215 | Ga South | 81.644667 | 0.072383 | 0.007238 | 81.651905 | 8.798205 | 1 |
| 214 | Gonja Central | 68.578833 | 0.164963 | 0.016496 | 68.595330 | 7.358079 | 1 |
| 213 | East Gonja | 55.174833 | 1.187827 | 0.118783 | 55.293616 | 5.890914 | 1 |
| 212 | Komenda Edna Eguafo / Abirem | 42.764667 | 0.000047 | 0.000005 | 42.764671 | 4.508985 | 1 |
| 211 | Accra Metropolis | 30.744500 | 0.000025 | 0.000003 | 30.744502 | 3.183174 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 96 | Adansi North | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -0.207911 | 0 |
| 78 | Asunafo South | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -0.207911 | 0 |
| 80 | Dormaa West | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -0.207911 | 0 |
| 90 | Berekum | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -0.207911 | 0 |


In [11]:
# write file to disk ---

tot_flood_risk.to_csv(path_output, index = False, header=True)
