# Finding Those Most At Risk

### By: Climate Dreamer Pros

## Introduction

"*Celebrating a birthday while ash-filled rain pours outside from a nearby wildfire*"

"*Seeing cars ditched in the middle of the streets after a flash flood had made it impossible to drive*"

"*Arguing over who gets the icepack next on the 15th blisteringly hot day in an apartment where the air conditioning has stopped working...again*"

Everyone on our team has at least one story about how climate change has affected their lives. 
Although we had all shared several uncomfortable experiences, we knew we had not come close to experiencing the true scale of climate-change-related adversity that exists. 

We were not the ones who would suffer most if we lost our disaster-insured home in a wildfire, or the ones who would experience the detrimental health effects of inhaling the resulting smoke. 

We were not the ones who had to live in a low-elevation coastal area where housing is cheaper but flooding is more likely.  

## Intersection between Environmental and Social Issues

The relationship between climate change and the most disadvantaged groups in society is often described as a positive feedback loop, or vicious cycle. Initial inequality causes the most disadvantaged groups to suffer disproportionately from the impacts of climate change, which in turn increases subsequent inequality.

Specifically, it has been [documented](http://www.un.org/esa/desa/papers/2017/wp152_2017.pdf) that disadvantaged groups are more exposed to the consequences of climate change, more susceptible to them, and have more difficulty recovering from them. Given this, it is crucial to determine which pockets of the world are either currently facing or at risk of facing this positive feedback loop. 

## Useful Data for Both Governments and Corporations

Finding the areas that are most at risk of the positive feedback loop exacerbated by climate change can be of use to both governments and corporations. 

By knowing the most at-risk locations, governments can determine where to direct funding and resources in order to prevent the cycle from perpetuating. As we have seen with the current pandemic, governments have difficulty providing resources to the corporations that need them most and instead opt for [widespread injection of stimulus](https://www.washingtonpost.com/business/2020/05/04/small-businesses-still-afloat-grapple-with-whether-accept-stimulus-funds/) into company balance sheets. Knowing which companies have locations in the areas that are most disadvantaged (from both a social and environmental standpoint) would provide a more useful tool for directing resources.

If companies knew the most at-risk locations, this would not only help with risk assessment for their physical locations (e.g. headquarters) but also give them a concrete reason to contribute to their local communities.  

## Creating a Metric

Our proposed metric combines socio-economic variables with the climate-change-induced hazards that an area expects to face. We designed the metric as a combination of a 'disadvantaged' score and a 'hazard' score for cities in the United States. Below is the methodology that we followed when developing our 'Most At Risk' metric.

### Importing Libraries

In [None]:
import numpy as np
import pandas as pd

### Loading Data

In order to create a 'disadvantaged' score for each city in the United States, we used the 2018 CDC Social Vulnerability Index data. Determining a 'hazard' score for each city in the United States was done using the 2018 CDP Cities Responses data.

In [None]:
survey_year = '2018'
survey_type = 'Climate Change'
path = '/kaggle/input/cdp-unlocking-climate-solutions/'

cities_df = pd.read_csv(path+'Cities/Cities Responses/'+survey_year+'_Full_Cities_Dataset.csv')
cities_info_df = pd.read_csv(path+'Cities/Cities Disclosing/'+survey_year+'_Cities_Disclosing_to_CDP.csv')
corps_cc_df = pd.read_csv(path+'Corporations/Corporations Responses/'+survey_type+'/'+survey_year+'_Full_Climate_Change_Dataset.csv')
na_hq_df = pd.read_csv(path+'Supplementary Data/Locations of Corporations/NA_HQ_public_data.csv')
corps_cc_info_df = pd.read_csv(path+'Corporations/Corporations Disclosing/'+survey_type+'/'+survey_year+'_Corporates_Disclosing_to_CDP_Climate_Change.csv')
census_df = pd.read_csv(path+'Supplementary Data/CDC 500 Cities Census Tract Data/500_Cities__Census_Tract-level_Data__GIS_Friendly_Format___2019_release.csv')
social_df = pd.read_csv(path+'Supplementary Data/CDC Social Vulnerability Index 2018/SVI2018_US.csv')
social_county_df = pd.read_csv(path+'Supplementary Data/CDC Social Vulnerability Index 2018/SVI2018_US_COUNTY.csv')
uscities_df = pd.read_csv(path+'Supplementary Data/Simple Maps US Cities Data/uscities.csv')

### Cleaning City Data

When looking through the CDP Cities Responses data, we realized that a few fixes were needed in order to create a metric from a clean dataset. 

These fixes included dealing with duplicate city names (with different account numbers) and renaming cities to their more common names (removing 'City of' from the name). 

We also separated the latitude and longitude information for each city into two columns, which later helps us determine which state in the United States each city belongs to. 

In [None]:
# disambiguating a duplicated city name (same name, different account number)
cities_info = cities_info_df.copy()
cities_info.loc[cities_info['Account Number'] == 60393, ['City']] = 'Santiago Government'

# cleaning city names
city_names = cities_info[['Account Number','City']]
cities_clean = pd.merge(cities_df,city_names,how='left',on='Account Number')
cities_clean['Organization'] = cities_clean['City']
cities_clean.drop(labels='City', axis="columns", inplace=True)
cities_clean = cities_clean.rename(columns={'Organization':'City'})

# cleaning lat and long column
latlong = cities_info['City Location'].str.strip('POINT ()').str.split(' ', expand=True).rename(columns={0:'Longitude', 1:'Latitude'}) 
cities_info = pd.merge(cities_info,latlong,left_index=True,right_index=True)
cities_info.head()
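
As a minimal sketch of the coordinate split above, the snippet below parses a few made-up `POINT (longitude latitude)` strings with a regular expression rather than `str.strip`, and returns numeric columns directly. The sample values and the `parse_point` helper name are assumptions for illustration; only the `POINT (...)` layout is taken from the `City Location` column.

```python
import pandas as pd

def parse_point(series):
    """Split 'POINT (lon lat)' strings into numeric Longitude/Latitude columns.

    Hypothetical helper: rows that do not match the pattern become NaN.
    """
    number = r'-?\d+(?:\.\d+)?'
    pattern = (r'POINT\s*\(\s*(?P<Longitude>' + number + r')\s+'
               r'(?P<Latitude>' + number + r')\s*\)')
    return series.str.extract(pattern).astype(float)

# made-up sample rows, including one malformed entry
sample = pd.Series(['POINT (-86.2532 41.6574)',
                    'POINT (-76.2851 36.8468)',
                    'unknown'])
coords = parse_point(sample)
print(coords)
```

Parsing to floats at this stage would also make swapped or sign-flipped coordinates easier to spot, for example by checking that latitudes fall between -90 and 90.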

### Adding States to Cities Data

In [None]:
# adding states to cities dataframe
uscities = uscities_df.copy()
uscities = uscities.rename(columns = {'county_name':'County','state_id':'State','city':'City'})
uscities = uscities[['City','State','County','lat','lng']].drop_duplicates()

# fix coordinate errors in the CDP data (e.g. switched latitude and longitude)
cities_info.loc[cities_info['City'] == 'Key West',['Latitude']] = 24.5551
cities_info.loc[cities_info['City'] == 'Key West',['Longitude']] = -81.78

cities_info.loc[cities_info['City'] == 'South Bend',['Latitude']] = 41.6574 
cities_info.loc[cities_info['City'] == 'South Bend',['Longitude']] = -86.2532

cities_info.loc[cities_info['City'] == 'Aurora',['Latitude']] = 41.7606 
cities_info.loc[cities_info['City'] == 'Aurora',['Longitude']] = -88.3201

cities_info.loc[cities_info['City'] == 'Norfolk',['Latitude']] = 36.8468 
cities_info.loc[cities_info['City'] == 'Norfolk',['Longitude']] = -76.2851

# use Euclidean (Pythagorean) distance in degrees to determine the most likely state that the city is in
pyth_dist = pd.merge(uscities[['City','State','lat','lng']],cities_info[['City','Latitude','Longitude']],on='City',how='left')

def pyth(lat,Lat,lng,Long):
    # lat/lng come from Simple Maps (US longitudes are negative); Lat/Long come from CDP.
    # Absolute values are taken of the CDP coordinates, so the result does not depend on their sign:
    # lat - |Lat| and lng + |Long| are both close to zero for the matching US city.
    Q = (( float(lat) - abs(float(Lat)) )**2  + (float(lng) + abs(float(Long)))**2 )**0.5
    return Q

pyth_dist['dist'] = pyth_dist.apply(lambda row:pyth(row['lat'],row['Latitude'],row['lng'],row['Longitude']), axis=1)
pyth_dist = pyth_dist[pyth_dist['Latitude'].notnull()]
# keep, for each city name, the Simple Maps candidate closest to the CDP coordinates
pyth_dist = pyth_dist.loc[pyth_dist.groupby('City')['dist'].idxmin()]

# add states to cities_clean 
city_states = pyth_dist[['City','State']]
cities_clean = pd.merge(cities_clean,city_states,on='City',how='left')
cities_clean.head()
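
The state assignment above compares raw differences in degrees, which is enough to tell apart same-named cities in different states. As a hedged alternative, the sketch below performs the same nearest-candidate selection with the haversine great-circle distance. The candidate rows, the approximate coordinates, and the `haversine_km` helper are all hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# hypothetical reference rows: same city name in several states (approximate coordinates)
candidates = pd.DataFrame({
    'City':  ['Aurora', 'Aurora', 'Norfolk', 'Norfolk'],
    'State': ['IL', 'CO', 'VA', 'NE'],
    'lat':   [41.76, 39.73, 36.85, 42.03],
    'lng':   [-88.32, -104.83, -76.29, -97.42],
})

# hypothetical reported coordinates, one row per disclosing city
reported = pd.DataFrame({
    'City': ['Aurora', 'Norfolk'],
    'Latitude': [41.7606, 36.8468],
    'Longitude': [-88.3201, -76.2851],
})

pairs = candidates.merge(reported, on='City', how='inner')
pairs['dist_km'] = haversine_km(pairs['lat'], pairs['lng'],
                                pairs['Latitude'], pairs['Longitude'])

# keep the closest candidate for each city name
nearest = pairs.loc[pairs.groupby('City')['dist_km'].idxmin(), ['City', 'State']]
print(nearest)
```

Unlike the absolute-value approach, this version assumes the reported coordinates already carry the correct sign.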

### Cleaning Social Vulnerability Data

We also realized that we had to add City names to the Social Vulnerability data (which only had county names) in order to eventually match the score we developed to a hazard score. This was done by using the Simple Maps data, which had information matching Cities with Counties. 

In [None]:
# adding cities, latitude and longitude to counties
uscities = uscities_df.copy()
social_county = social_county_df.copy()

uscities = uscities.rename(columns = {'county_name':'County','state_id':'State','city':'City'})
uscities = uscities[['City','State','County','lat','lng']].drop_duplicates()
social_county = social_county.rename(columns = {'COUNTY':'County','ST_ABBR':'State'})

social_cities = pd.merge(social_county,uscities,on=['State','County'],how='right')
social_cities.head()
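
Because this merge joins on county and state names, any city whose county label differs between the two sources ends up with empty vulnerability columns. The sketch below shows one way such rows could be spotted using the `indicator` option of `pd.merge`. The rows and values are made up, and the `unmatched` name is only illustrative.

```python
import pandas as pd

# made-up county-level rows (values are illustrative, not real SVI figures)
counties = pd.DataFrame({
    'State': ['TX', 'FL'],
    'County': ['Harris', 'Monroe'],
    'EP_POV': [16.5, 12.0],
})

# made-up city-to-county lookup
cities = pd.DataFrame({
    'City': ['Houston', 'Key West', 'Anchorage'],
    'State': ['TX', 'FL', 'AK'],
    'County': ['Harris', 'Monroe', 'Anchorage'],
})

# how='right' keeps every city; indicator=True records where each row came from
merged = pd.merge(counties, cities, on=['State', 'County'], how='right', indicator=True)

# cities with no matching county row have no vulnerability data
unmatched = merged[merged['_merge'] == 'right_only']
print(unmatched[['City', 'State', 'County']])
```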

### Creating a Social Vulnerability Score

In [None]:
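def normalize(df,var):
    # min-max scaling of column `var` to the range [0, 1], stored in 'new_var'
    # work on a copy so that filtered slices are not modified in place
    df = df.copy()
    min_var = df[var].min()
    max_var = df[var].max()
    diff = max_var - min_var
    df['new_var'] = (df[var] - min_var) / diff
    return df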

sc_metrics = social_cities

# poverty metric
sc_pov = sc_metrics[['State','County','City','EP_POV']][sc_metrics['EP_POV'] >= 0]
sc_pov = normalize(sc_pov,'EP_POV').rename(columns={'new_var':'EP_POV_std'})
sc_pov = sc_pov[['City','State','EP_POV_std']].drop_duplicates()

# unemployment metric
sc_unp = sc_metrics[['State','County','City','EP_UNEMP']][sc_metrics['EP_UNEMP'] >= 0]
sc_unp = normalize(sc_unp,'EP_UNEMP').rename(columns={'new_var':'EP_UNEMP_std'})
sc_unp = sc_unp[['City','State','EP_UNEMP_std']].drop_duplicates()

# education metric
sc_edu = sc_metrics[['State','County','City','EP_NOHSDP']][sc_metrics['EP_NOHSDP'] >= 0]
sc_edu = normalize(sc_edu,'EP_NOHSDP').rename(columns={'new_var':'EP_NOHSDP_std'})
sc_edu = sc_edu[['City','State','EP_NOHSDP_std']].drop_duplicates()

# transportation
sc_tsp = sc_metrics[['State','County','City','EP_NOVEH']][sc_metrics['EP_NOVEH'] >= 0]
sc_tsp = normalize(sc_tsp,'EP_NOVEH').rename(columns={'new_var':'EP_NOVEH_std'})
sc_tsp = sc_tsp[['City','State','EP_NOVEH_std']].drop_duplicates()

# crowded living situation
sc_crd = sc_metrics[['State','County','City','EP_CROWD']][sc_metrics['EP_CROWD'] >= 0]
sc_crd = normalize(sc_crd,'EP_CROWD').rename(columns={'new_var':'EP_CROWD_std'})
sc_crd = sc_crd[['City','State','EP_CROWD_std']].drop_duplicates()

# disabled but non institutionalized
sc_dis = sc_metrics[['State','County','City','EP_DISABL']][sc_metrics['EP_DISABL'] >= 0]
sc_dis = normalize(sc_dis,'EP_DISABL').rename(columns={'new_var':'EP_DISABL_std'})
sc_dis = sc_dis[['City','State','EP_DISABL_std']].drop_duplicates()

# single parents with young kids
sc_sng = sc_metrics[['State','County','City','EP_SNGPNT']][sc_metrics['EP_SNGPNT'] >= 0]
sc_sng = normalize(sc_sng,'EP_SNGPNT').rename(columns={'new_var':'EP_SNGPNT_std'})
sc_sng = sc_sng[['City','State','EP_SNGPNT_std']].drop_duplicates()

# disadvantaged score = (poverty+unemployment+education+transportation+crowded+disabled+single)/7
disadv_score = pd.merge(sc_pov,sc_unp,on=['State','City'],how='left')
disadv_score = pd.merge(disadv_score,sc_edu,on=['State','City'],how='left')
disadv_score = pd.merge(disadv_score,sc_tsp,on=['State','City'],how='left')
disadv_score = pd.merge(disadv_score,sc_crd,on=['State','City'],how='left')
disadv_score = pd.merge(disadv_score,sc_dis,on=['State','City'],how='left')
disadv_score = pd.merge(disadv_score,sc_sng,on=['State','City'],how='left')
std_cols = ['EP_POV_std','EP_UNEMP_std','EP_NOHSDP_std','EP_NOVEH_std','EP_CROWD_std','EP_DISABL_std','EP_SNGPNT_std']
# equal-weight average of the seven indicators; skipna=False keeps the score null if any indicator is missing
disadv_score['disadv_score'] = disadv_score[std_cols].mean(axis=1, skipna=False)

disadv_score = disadv_score[['City','State','disadv_score']].drop_duplicates()
disadv_score = disadv_score[disadv_score['disadv_score'].notnull()]
disadv_score = disadv_score.loc[disadv_score.groupby(['City','State'])['disadv_score'].idxmin()]

disadv_score[disadv_score['City'].isin(cities_info.City.unique())].sort_values('disadv_score',ascending=False)
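
To make the scoring step concrete, here is a small worked sketch on made-up numbers: each indicator is min-max scaled to the range 0 to 1 and the scaled values are averaged with equal weights. The cities and values are hypothetical, and only two indicators are used instead of seven. The sketch also assumes, as the `>= 0` filters above suggest, that negative values mark missing data. As a simplification, it drops a row if any indicator is missing, whereas the cell above filters each indicator separately.

```python
import pandas as pd

# made-up indicator values; -999 stands in for a missing-data sentinel
toy = pd.DataFrame({
    'City': ['A', 'B', 'C', 'D'],
    'EP_POV': [10.0, 20.0, 30.0, -999.0],
    'EP_UNEMP': [4.0, 4.0, 8.0, 5.0],
})
indicators = ['EP_POV', 'EP_UNEMP']

# drop rows with a negative (missing) value in any indicator
toy = toy[(toy[indicators] >= 0).all(axis=1)].copy()

# min-max scaling: (x - min) / (max - min), column by column
scaled = (toy[indicators] - toy[indicators].min()) / (toy[indicators].max() - toy[indicators].min())

# equal-weight average of the scaled indicators
toy['disadv_score'] = scaled.mean(axis=1, skipna=False)
print(toy[['City', 'disadv_score']])
```

City C has the highest value on both indicators and therefore scores 1, while city A has the lowest on both and scores 0. A missing sentinel such as -999 would otherwise become the column minimum and compress every other scaled value towards 1, which is why it is filtered out first.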

### Creating a Hazard Score

Within the CDP Cities Responses data, we found that a number of cities in the United States described both the current and the upcoming challenges they face due to climate change, using categorical variables that allowed us to determine whether one city was more or less at risk than the others. By mapping these categories to numerical values and probabilities, we could create a 'hazard' score, where the highest score indicates a city facing, on average, the most immediate, most serious, and most likely high-consequence climate-change-related hazards. 

In [None]:
# finding hazards per city

hazards = cities_clean[cities_clean['Question Number'] == '2.2a']
hazards = hazards[hazards['Response Answer'].notnull()]

hazards = hazards.pivot_table(index=['Account Number', 'City', 'State','Row Number'],
                                     columns='Column Name', 
                                     values='Response Answer',
                                     aggfunc=lambda x: ' '.join(x)).reset_index()
hazards.head()

In [None]:
hazard_score = hazards.copy()

# filtering out uncertain responses to be analyzed separately
hazard_score = hazard_score[(hazard_score['Consequence of hazard'] != 'Do not know') & (hazard_score['Probability of hazard'] != 'Do not know')]

# keeping only the most urgent hazards, i.e. those currently affecting cities
# (.copy() so that the column assignments below modify an independent dataframe)
hazard_score = hazard_score[hazard_score['Hazard status'] == 'Currently affecting the city'].copy()

# creating numerical mappings for hazard characteristics
timescale = {'Long-term':2.0, 'Medium-term':6.0,  'Short-term':10.0}
magnitude = {'Less serious':2.0, 'Serious':6.0, 'Extremely serious':10.0}
consequence = {'Low':2.0, 'Medium Low':4.0, 'Medium':6.0, 'Medium High':8.0, 'High':10.0}
probability = {'Low':0.2, 'Medium Low':0.4, 'Medium':0.6, 'Medium High':0.8, 'High':0.95}

hazard_score['Anticipated timescale'] = hazard_score['Anticipated timescale'].replace(timescale)
hazard_score['Consequence of hazard'] = hazard_score['Consequence of hazard'].replace(consequence)
hazard_score['Magnitude of impact'] = hazard_score['Magnitude of impact'].replace(magnitude)
hazard_score['Probability of hazard'] = hazard_score['Probability of hazard'].replace(probability)

# defining score per hazard: timescale + magnitude + (consequence x probability)
hazard_score['hazard_score'] = hazard_score['Anticipated timescale']+hazard_score['Magnitude of impact']+(hazard_score['Consequence of hazard']*hazard_score['Probability of hazard'])
# taking the average across all hazards reported by each city
# (rows without a matched US state are dropped by the groupby)
hazard_score = hazard_score[['City','State','hazard_score']].groupby(['City','State']).mean()

hazard_score.sort_values('hazard_score', ascending=False)
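
As a worked illustration of the scoring formula above, the sketch below scores two hypothetical hazards using the same category mappings. The `score_hazard` helper is not part of the notebook and is introduced here only for illustration.

```python
# category mappings copied from the cell above
timescale = {'Long-term': 2.0, 'Medium-term': 6.0, 'Short-term': 10.0}
magnitude = {'Less serious': 2.0, 'Serious': 6.0, 'Extremely serious': 10.0}
consequence = {'Low': 2.0, 'Medium Low': 4.0, 'Medium': 6.0, 'Medium High': 8.0, 'High': 10.0}
probability = {'Low': 0.2, 'Medium Low': 0.4, 'Medium': 0.6, 'Medium High': 0.8, 'High': 0.95}

def score_hazard(time, mag, cons, prob):
    """Hypothetical helper: timescale + magnitude + consequence weighted by probability."""
    return timescale[time] + magnitude[mag] + consequence[cons] * probability[prob]

# most severe combination: 10 + 10 + 10 * 0.95
worst = score_hazard('Short-term', 'Extremely serious', 'High', 'High')

# mildest combination: 2 + 2 + 2 * 0.2
mildest = score_hazard('Long-term', 'Less serious', 'Low', 'Low')

print(round(worst, 2), round(mildest, 2))
```

Under these mappings a single hazard scores between roughly 4.4 and 29.5, and a city's hazard score is the mean of its per-hazard scores. Because scores are averaged, a city reporting one severe hazard ranks above a city reporting that same hazard together with several mild ones.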

### Combining Scores To Create 'Most At Risk' Metric

Finally, multiplying the 'hazard' score by the 'disadvantaged' score of a city creates a 'most at risk' metric. A high value indicates a city that is highly disadvantaged from both a socio-economic and a climate change standpoint, which gives a clear mechanism for determining who most urgently needs help. 

In [None]:
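# hazard_score is indexed by (City, State) after the groupby, so turn the index back into columns before merging
most_at_risk = pd.merge(hazard_score.reset_index(), disadv_score, on=['City','State'], how='left')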
most_at_risk['mar_metric'] = most_at_risk['hazard_score']*most_at_risk['disadv_score']
most_at_risk.sort_values('mar_metric',ascending=False)
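
The combination step can be sketched on made-up scores as follows. The city names and values are hypothetical. The example also shows that, with a left merge, a city with no 'disadvantaged' score gets a null metric and sorts to the bottom of the ranking.

```python
import pandas as pd

# made-up scores for three hypothetical cities
hazard = pd.DataFrame({
    'City': ['X', 'Y', 'Z'],
    'State': ['CA', 'TX', 'FL'],
    'hazard_score': [20.0, 10.0, 15.0],
})
disadv = pd.DataFrame({
    'City': ['X', 'Y'],
    'State': ['CA', 'TX'],
    'disadv_score': [0.5, 0.8],
})

# left merge keeps every city with a hazard score, even without a disadvantaged score
combined = pd.merge(hazard, disadv, on=['City', 'State'], how='left')
combined['mar_metric'] = combined['hazard_score'] * combined['disadv_score']

# rows with a null metric are placed last by default
ranked = combined.sort_values('mar_metric', ascending=False)
print(ranked)
```

Because the two scores are multiplied, a city needs to score high on both to rank near the top, so a very hazardous but well-resourced city can rank below a moderately hazardous but highly disadvantaged one.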