## INTRODUCTION

CDP takes a unique view of climate change, from the perspective of both cities and corporations, and incentivizes both to become leaders in environmental transparency and action through disclosure. While this provides a good starting point for addressing the global climate crisis, CDP recognizes that many factors influence both the causes and the consequences of climate change.

This gives the data analysis a compelling starting point: the overlap between climate change, social issues, global health, the economy, and the corporations, cities, and societies that underpin all of this.

In [None]:
# standard libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import dask.dataframe as dd
from pandas.api.types import CategoricalDtype
from dask.diagnostics import ProgressBar

# Input data files are available in the read-only "../input/" directory

import os
import matplotlib.pyplot as plt
import re
import json

# plotting libraries
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

# geospatial libraries
import geopandas as gpd
import folium

# render plotly figures inline in the notebook
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

print(os.getcwd())

In [None]:
# import corporate response data - climate change
cc_df_2018 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Climate Change/2018_Full_Climate_Change_Dataset.csv')
cc_df_2019 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Climate Change/2019_Full_Climate_Change_Dataset.csv')
cc_df_2020 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Climate Change/2020_Full_Climate_Change_Dataset.csv')

# import corporate response data - water security
ws_df_2018 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Water Security/2018_Full_Water_Security_Dataset.csv')
ws_df_2019 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Water Security/2019_Full_Water_Security_Dataset.csv')
ws_df_2020 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Water Security/2020_Full_Water_Security_Dataset.csv')

# import cities response df
cities_df_2018 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2018_Full_Cities_Dataset.csv")
cities_df_2019 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2019_Full_Cities_Dataset.csv")
cities_df_2020 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2020_Full_Cities_Dataset.csv")

# external data - import CDC social vulnerability index data - census tract level
svi_df = pd.read_csv("../input/cdp-unlocking-climate-solutions/Supplementary Data/CDC Social Vulnerability Index 2018/SVI2018_US.csv")

# cities metadata - lat,lon locations for US cities
cities_meta_df = pd.read_csv("../input/cdp-unlocking-climate-solutions/Supplementary Data/Simple Maps US Cities Data/uscities.csv")

# cities metadata - CDP metadata on organisation HQ cities
cities_cdpmeta_df = pd.read_csv("../input/cdp-unlocking-climate-solutions/Supplementary Data/Locations of Corporations/NA_HQ_public_data.csv")

### PROBLEM STATEMENT

Develop a methodology for calculating key performance indicators (KPIs) that relate to the *environmental and social issues* that are discussed in the CDP survey data. 

* How do you help cities adapt to a rapidly changing climate amidst a global pandemic, but do it in a way that is socially equitable?

* What are the projects that can be invested in that will help pull cities out of a recession, mitigate climate issues, but not perpetuate racial/social inequities?

* What are the practical and actionable points where city and corporate ambition join, i.e. where do cities have problems that corporations affected by those problems could solve, and vice versa?

* How can we measure the intersection between environmental risks and social equity, as a contributor to resiliency?

### APPROACH

To tackle these questions, we decided to look beyond the symptoms of climate change and climate health to the wider socio-economic factors that drive the underlying causes of climate change and shape its impact on those most vulnerable. The aim is to inform investment decisions towards the projects that have the greatest impact.

**STEP 1: UNDERSTAND THE PROBLEM**

Before beginning to define a solution, we had to understand the impact of climate change. After looking at the impact both nationwide and globally, we realised there was a common theme emerging: the groups of people within the population that are most vulnerable to climate hazards, according to global city responses.

In [None]:
# Data Cleansing and EDA of Vulnerable Populations

# Remove duplicates from dataset - courtesy of cdp_starter_notebook

def list_dedupe(x):
    """
    Convert list to dict and back to list to dedupe, keeping first-seen order
    
    Parameters
    ----------
    x: list
        Python list object
        
    Returns
    -------
    list:
        list object with duplicates removed
        
    """
    return list(dict.fromkeys(x))
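
As a quick illustration of the helper above: `dict.fromkeys` keeps the first occurrence of each key in insertion order, so the result is a de-duplicated list in the original order. The sample values below are invented for the example and are not taken from the CDP data.

```python
def list_dedupe(x):
    """Return a new list with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys(x))

# Hypothetical sample values, not taken from the CDP data
sample_groups = [
    "Elderly",
    "Children & youth",
    "Elderly",
    "Low-income households",
    "Children & youth",
]

deduped_groups = list_dedupe(sample_groups)
print(deduped_groups)
```

Note that this only works for hashable items such as strings or numbers; a list of lists would raise a `TypeError`.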

In [None]:
# Identify the groups of people within the population that are most vulnerable to climate hazards according to city responses.

# CDP Dataset Reference: 2020_Full_Cities_Dataset, Question 2.1, Column 7 "Please identify which vulnerable populations are affected"

cities_2_1 = cities_df_2020[cities_df_2020['Question Number'] == '2.1']\
    .rename(columns={'Organization': 'City'})

cities_2_1_7 = cities_2_1[cities_2_1['Column Number'] == 7]\
    .rename(columns={'Response Answer': 'vulnerable_groups'})

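# label missing answers explicitly ('Column Number' is always 7 here, so it has nothing to fill)
cities_2_1_7['vulnerable_groups'] = cities_2_1_7['vulnerable_groups'].fillna('No Response')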

cities_2_1_7.head()

In [None]:
# Count how often each vulnerable group is named across all city responses
vulnerable_populations = cities_2_1_7.vulnerable_groups.value_counts()

In [None]:
# How can we measure the intersection between environmental risks and social equity, as a contributor to resiliency?

vulnerable_populations = vulnerable_populations[:8]
plt.figure(figsize=(23, 5))
sns.barplot(x=vulnerable_populations.index, y=vulnerable_populations.values, alpha=0.8)
plt.title('Vulnerable Populations Most Impacted by Climate Change')
plt.ylabel('Vulnerable Populations Across All Climate Hazards')
plt.xlabel('Vulnerable Groups')
plt.show()

In [None]:
cities_2_1_8 = cities_2_1[cities_2_1['Column Number'] == 8]\
    .rename(columns={'Response Answer': 'future_frequency'})

cities_2_1_9 = cities_2_1[cities_2_1['Column Number'] == 9]\
    .rename(columns={'Response Answer': 'future_intensity'})

cities_2_1_10 = cities_2_1[cities_2_1['Column Number'] == 10]\
    .rename(columns={'Response Answer': 'future_magnitude'})

cities_2_1_11 = cities_2_1[cities_2_1['Column Number'] == 11]\
    .rename(columns={'Response Answer': 'impact_intensity_timing'})

cities_2_1_9.head()



Let's correlate the vulnerable populations with the frequency, intensity, and impact of climate hazards. This will allow us to do further analysis against other social factors like the global pandemic.

In [None]:
# Let's correlate the vulnerable populations with the frequency, intensity, and impact of climate hazards. 
# This will allow us to do further analysis against other social factors like the global pandemic.

cities_2_1_7 = cities_2_1_7.rename(columns={'Account Number': 'account_number'})
cities_2_1_8 = cities_2_1_8.rename(columns={'Account Number': 'account_number'})
cities_2_1_9 = cities_2_1_9.rename(columns={'Account Number': 'account_number'})
cities_2_1_10 = cities_2_1_10.rename(columns={'Account Number': 'account_number'})
cities_2_1_11 = cities_2_1_11.rename(columns={'Account Number': 'account_number'})

cities_2_1_7_join = cities_2_1_7[['account_number','Country', 'CDP Region', 'vulnerable_groups', 'Row Number']]
cities_2_1_8_join = cities_2_1_8[['account_number', 'future_frequency']]
cities_2_1_9_join = cities_2_1_9[['account_number', 'future_intensity']]
cities_2_1_10_join = cities_2_1_10[['account_number', 'future_magnitude']]
cities_2_1_11_join = cities_2_1_11[['account_number', 'impact_intensity_timing']]
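
The filter, rename, and select steps above could also be expressed as a single reshape from long to wide format. The following is a minimal sketch on invented data, not the notebook's method: `toy_long` and `toy_wide` are hypothetical names, and it assumes each combination of account, row, and column holds a single answer (with `aggfunc="first"`, any additional answers would be silently dropped).

```python
import pandas as pd

# Toy long-format table mimicking the CDP layout (all values are invented)
toy_long = pd.DataFrame({
    "Account Number": [1, 1, 1, 1, 2, 2],
    "Row Number": [1, 1, 2, 2, 1, 1],
    "Column Number": [7, 10, 7, 10, 7, 10],
    "Response Answer": ["Elderly", "High", "Children & youth", "Medium", "Elderly", "Low"],
})

# One output row per (city, table row); one output column per question column
toy_wide = (
    toy_long.pivot_table(
        index=["Account Number", "Row Number"],
        columns="Column Number",
        values="Response Answer",
        aggfunc="first",
    )
    .rename(columns={7: "vulnerable_groups", 10: "future_magnitude"})
    .reset_index()
)
toy_wide.columns.name = None  # drop the leftover "Column Number" axis label

print(toy_wide)
```

Because the reshape is keyed on both the account and the row number, answers that belong to the same table row end up side by side without a separate merge.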

In [None]:
# Let's home in on future magnitude, as this takes into account frequency and intensity.

cities_2_1_7_10 = pd.merge(left=cities_2_1_7_join, right=cities_2_1_10_join,on='account_number')


cities_2_1_7_10['future_magnitude'] = cities_2_1_7_10['future_magnitude'].fillna('None')

# drop rows whose answer contains "Other" or "None" (the pattern is a regex alternation;
# a second positional argument to str.contains would be read as `case`, not as a pattern)
cities_2_1_7_10 = cities_2_1_7_10[~cities_2_1_7_10.vulnerable_groups.str.contains("Other|None", na=False)]
cities_2_1_7_10 = cities_2_1_7_10[~cities_2_1_7_10.future_magnitude.str.contains("Other|None", na=False)]

cities_2_1_7_10 = cities_2_1_7_10[:500]

cities_2_1_7_10.head()
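
One thing worth checking in a join like the one above is the merge key. When a city has several rows for question 2.1, merging on `account_number` alone pairs every vulnerable-group row of that city with every magnitude row of that city. The sketch below uses invented data to show the effect; it assumes that `Row Number` identifies one table row (one hazard) per city, which should be verified against the CDP data before relying on it.

```python
import pandas as pd

# Invented responses: city 1 reports two table rows, city 2 reports one
groups = pd.DataFrame({
    "account_number": [1, 1, 2],
    "Row Number": [1, 2, 1],
    "vulnerable_groups": ["Elderly", "Children & youth", "Elderly"],
})
magnitude = pd.DataFrame({
    "account_number": [1, 1, 2],
    "Row Number": [1, 2, 1],
    "future_magnitude": ["High", "Medium", "Low"],
})

# Joining on the city alone pairs every group row with every magnitude row of that city
by_city = pd.merge(
    groups,
    magnitude[["account_number", "future_magnitude"]],
    on="account_number",
)

# Adding the row number to the key keeps answers from the same table row together
by_city_and_row = pd.merge(groups, magnitude, on=["account_number", "Row Number"])

print(len(by_city), len(by_city_and_row))
```

In this toy case the first merge returns 5 rows and the second returns 3, so counts computed from the first would overstate how often a group and a magnitude were reported together.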

In [None]:
# Repeat the join for column 11 (impact_intensity_timing).

cities_2_1_7_11 = pd.merge(left=cities_2_1_7_join, right=cities_2_1_11_join,on='account_number')

cities_2_1_7_11['impact_intensity_timing'] = cities_2_1_7_11['impact_intensity_timing'].fillna('None')

# drop rows whose answer contains "Other" or "None" (regex alternation)
cities_2_1_7_11 = cities_2_1_7_11[~cities_2_1_7_11.vulnerable_groups.str.contains("Other|None", na=False)]
cities_2_1_7_11 = cities_2_1_7_11[~cities_2_1_7_11.impact_intensity_timing.str.contains("Other|None", na=False)]

cities_2_1_7_11 = cities_2_1_7_11[:500]

cities_2_1_7_11.head()

In [None]:
cities_2_1_7_10_group = cities_2_1_7_10.groupby('vulnerable_groups')['future_magnitude'].value_counts()
cities_2_1_7_10_group = cities_2_1_7_10_group[:10]
cities_2_1_7_10_group.head()

In [None]:
cities_2_1_7_10_groupdf = cities_2_1_7_10_group.to_frame()

In [None]:
cities_2_1_7_10_groupdf.info()

In [None]:
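# take the group label from the (vulnerable_groups, future_magnitude) MultiIndex instead of storing whole tuples
cities_2_1_7_10_groupdf['vulnerable_groups_col'] = cities_2_1_7_10_group.index.get_level_values('vulnerable_groups')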
cities_2_1_7_10_groupdf['vulnerable_groups_col'] = cities_2_1_7_10_groupdf['vulnerable_groups_col'].fillna('None')
cities_2_1_7_10_groupdf['future_magnitude'] = cities_2_1_7_10_groupdf['future_magnitude'].fillna('None')
cities_2_1_7_10_groupdf.head()

In [None]:
cities_2_1_7_10_groupdf.info()

In [None]:
fig = px.bar_polar(cities_2_1_7_10, 
                   r="future_magnitude", 
                   theta="vulnerable_groups",
                   color="future_magnitude", 
                   template="plotly_dark",
                   color_discrete_sequence= px.colors.sequential.Plasma_r, height=600)
fig.show()
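
The radial axis of a polar bar chart is easiest to read when it carries a number. One option, sketched here on invented data rather than the notebook's output, is to count how often each pair of vulnerable group and magnitude occurs first; `toy` and `pair_counts` are hypothetical names.

```python
import pandas as pd

# Invented merged responses standing in for a frame like cities_2_1_7_10
toy = pd.DataFrame({
    "vulnerable_groups": ["Elderly", "Elderly", "Elderly", "Children & youth", "Children & youth"],
    "future_magnitude": ["High", "High", "Medium", "High", "Low"],
})

# Count how often each (group, magnitude) pair occurs
pair_counts = (
    toy.groupby(["vulnerable_groups", "future_magnitude"])
    .size()
    .reset_index(name="count")
)

print(pair_counts)
```

A frame shaped like `pair_counts` could then be passed to `px.bar_polar` with `r="count"`, `theta="vulnerable_groups"`, and `color="future_magnitude"`, so that bar length reflects the number of responses.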

In [None]:
cities_2_1_7_11['impact_intensity_timing'] = cities_2_1_7_11['impact_intensity_timing'].fillna('None')

cities_2_1_7_11 = cities_2_1_7_11[~cities_2_1_7_11.vulnerable_groups.str.contains("Other|None", na=False)]

fig = px.bar_polar(cities_2_1_7_11, 
                   r="impact_intensity_timing", 
                   theta="vulnerable_groups",
                   color="impact_intensity_timing", 
                   template="plotly_dark",
                   color_discrete_sequence= px.colors.sequential.Plasma_r, height=600)
fig.show()