# KPIs to Keep an Eye On

## Quantifying Environmental Issues with a Social Context

## Content:
- [Introduction](#introduction)
    * [Problem Statement](#problem-statement)
- [The Data](#the-data)
- [Analysis](#analysis)
    * [Climate Hazards](#climatehazards)
    * [Adaptation](#adaptation)
    * [Clustering with various metrics](#clustering)
        * [Clustering Categorical Features](#catclustering)
        * [Clustering Numerical Features](#numclustering)
- [Conclusion](#conclusion)

# Introduction<a class="anchor" id="introduction"></a>
The Carbon Disclosure Project (CDP) is a global non-profit that drives companies and governments to reduce their greenhouse gas emissions, safeguard water resources, and protect forests. Each year, CDP takes the information supplied in its annual reporting process and scores companies and cities based on their journey through disclosure and towards environmental leadership.

### Problem Statement <a class="anchor" id="problem-statement"></a>
Develop a methodology for calculating key performance indicators (KPIs) that relate to the environmental and social issues that are discussed in the CDP survey data. Leverage external data sources and thoroughly discuss the intersection between environmental issues and social issues. Mine information to create automated insight generation demonstrating whether city and corporate ambitions take these factors into account.

## The Data<a class="anchor" id="the-data"></a>

In [None]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# None means "no truncation"; the older -1 value is deprecated in recent pandas
pd.set_option('display.max_colwidth', None)
import re
import matplotlib.pyplot as plt
import plotly.graph_objects as go

from kmodes.kmodes import KModes
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

import warnings
warnings.filterwarnings('ignore')

The climate crisis is acknowledged throughout the world. Putting numbers to the issue removes ambiguity and gives us an objective basis for moving forward.

The data consists of responses from 1032 different cities reported between 2018 and 2020. For simplicity, the 2019 dataset was used, as it had the most reporting cities (861) and a fairly up-to-date questionnaire. Cities reporting in 2019 account for approximately 2.2 billion people. For perspective, the current world population is 7.6 billion.


In [None]:
# loading city details and concatenating data
cities18 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing/2018_Cities_Disclosing_to_CDP.csv')
cities19 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing/2019_Cities_Disclosing_to_CDP.csv')
cities20 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing/2020_Cities_Disclosing_to_CDP.csv')
citiesall = pd.concat([cities18,cities19, cities20])

# loading responses and concatenating data
resp18 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2018_Full_Cities_Dataset.csv')
resp19 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2019_Full_Cities_Dataset.csv')
resp20 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2020_Full_Cities_Dataset.csv')
respall = pd.concat([resp18,resp19,resp20])

In [None]:
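def filtq(qdf, qnum, cnum, rnum):
    '''Filters the CDP questionnaire dataset by question, column and row number.'''
    # accept a single question number (e.g. '3.0') as well as a list of them;
    # a bare string would otherwise be matched by substring rather than exactly
    if isinstance(qnum, str):
        qnum = [qnum]
    filcon = (qdf['Question Number'].isin(list(qnum))
              & qdf['Column Number'].isin(list(cnum))
              & qdf['Row Number'].isin(list(rnum)))
    # return a copy so that later column assignments do not touch a view
    return qdf[filcon].copy()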
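
To see what this kind of filter selects, it can help to run it on a few made-up rows in the same long format as the responses file. The sketch below is illustrative only: the rows are invented, and `filter_long` is a hypothetical stand-in that matches values exactly with `isin`.

```python
import pandas as pd

# Invented rows in the same long format as the CDP responses file
toy = pd.DataFrame({
    'Account Number':  [1, 1, 2, 2],
    'Question Number': ['2.1', '2.1', '2.1', '3.0'],
    'Column Number':   [1, 3, 1, 1],
    'Row Number':      [1, 1, 1, 1],
    'Column Name':     ['Climate Hazards', 'Current probability of hazard',
                        'Climate Hazards', 'Some other column'],
    'Response Answer': ['Flood', 'High', 'Drought', 'Flood'],
})

def filter_long(qdf, qnum, cnum, rnum):
    """Keep rows whose question, column and row numbers are in the given lists."""
    mask = (qdf['Question Number'].isin(qnum)
            & qdf['Column Number'].isin(cnum)
            & qdf['Row Number'].isin(rnum))
    return qdf[mask].copy()

# Question 2.1, columns 1 and 3, first row only
subset = filter_long(toy, ['2.1'], [1, 3], [1])

# One row per city, one column per questionnaire column
wide = subset.pivot(index='Account Number', columns='Column Name',
                    values='Response Answer')
print(wide)
```

City 2 did not answer the probability column in this toy data, so its cell is left missing after the pivot. The same pattern of filtering and then pivoting is what most cells in this notebook rely on.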

In [None]:
city_count_yearly = [len(citdf['Account Number'].unique()) for citdf in [cities18, cities19, cities20]]

In [None]:
city_series = pd.concat([cities18['Account Number'],cities19['Account Number'],cities20['Account Number']])
city_count_total = len(city_series.unique())

In [None]:
city_count_yearly

In [None]:
city_count_total

In [None]:
import geopandas
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from shapely import wkt

In [None]:
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
ax = world.plot(color='white', edgecolor='black', figsize=(20,20))
df = cities19[['City','Country','City Location']].dropna(subset=['City Location'])
df['City Location'] = df['City Location'].apply(wkt.loads)
gdf = geopandas.GeoDataFrame(df, geometry='City Location')
gdf.plot(ax=ax, color='red')
plt.title('World Map of Disclosing Cities in 2019')

Acknowledging a problem and planning for it are the first steps in tackling it. Most of the reporting cities have created climate vulnerability assessments or intend to do so in the near future.

In [None]:

# filtering to vulnerability assessment question
ax = resp19[resp19['Question Number']=='2.0']\
.groupby('Response Answer').count().sort_values(by=['Account Number'], ascending=False)['Account Number']\
.plot.barh()

plt.title('Climate Change Vulnerability Assessments')
plt.xlabel('Number of Cities')

In [None]:
# filtering hazards and probability of hazards
tempdf2 = resp19[(resp19['Question Number']=='2.1') & ((resp19['Column Number']==1) | (resp19['Column Number']==3))]
pivind = pd.MultiIndex.from_tuples(list(zip(tempdf2['Account Number'],tempdf2['Row Number'])), names=['Account Number', 'Row Number'])
tempdf2['Pivot Index'] = pivind
tempdf2 = tempdf2.pivot(values='Response Answer',columns='Column Name',index='Pivot Index')
tempdf2 = tempdf2.reset_index().groupby(['Climate Hazards','Current probability of hazard']).count().reset_index()
tempdf2 = tempdf2.pivot(values='Pivot Index',index='Climate Hazards',columns='Current probability of hazard')
tempdf2 = tempdf2[['High','Medium High','Medium','Medium Low','Low','Does not currently impact the city','Do not know']]
tempdf2['Total']=tempdf2.sum(axis=1)
tempdf2 = tempdf2.sort_values(by='Total', ascending=False).head(16).drop(columns=['Total'])
# top most listed hazards
tophaz = tempdf2.index

In [None]:

tempdf2.plot.barh(stacked=True, figsize=(10,10), color=['red','darkorange','orange','gold','khaki','lightgreen','lightblue'])
plt.gca().invert_yaxis()
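# take the count from the data so the title matches the number of hazards kept by head()
plt.title(f'Top {len(tempdf2)} most listed Climate Hazards')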
plt.xlabel('Number of Cities')

In [None]:
# hazards listed
tempdf = resp19[(resp19['Question Number']=='2.1')\
                & (resp19['Column Number']==1)]
tempdf = tempdf[tempdf['Response Answer'].apply(lambda x: x in tophaz)]
# specific population affected by hazard
tempdf2 = resp19[(resp19['Question Number']=='2.1')\
       & ((resp19['Column Number']==10))]

pivind = list(zip(tempdf['Account Number'],tempdf['Row Number']))
pivind2 = list(zip(tempdf2['Account Number'],tempdf2['Row Number']))
tempdf['Pivot Index'] = pivind
tempdf2['Pivot Index'] = pivind2

topcols = tempdf2.groupby('Response Answer').count().sort_values(by=['Account Number'],ascending=False).head(16).index
tempdf2 = tempdf2.pivot(values='Account Number',index='Pivot Index',columns='Response Answer')

tempdf3 = tempdf.merge(tempdf2[topcols].notna().reset_index(), left_on='Pivot Index', right_on='Pivot Index',\
          suffixes=('_left', '_right'))
tempdf3 = tempdf3.groupby('Response Answer').sum()[topcols].loc[tophaz[0:10]]
tempdf4 = tempdf3.reset_index().melt(id_vars=['Climate Hazards'])


In [None]:
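# unique node labels (hazards first, then affected populations), order preserved
all_nodes = list(dict.fromkeys(tempdf4['Climate Hazards'].values.tolist() + tempdf4['variable'].values.tolist()))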
source_indices = [all_nodes.index(haz) for haz in tempdf4['Climate Hazards']]
target_indices = [all_nodes.index(affpop) for affpop in tempdf4['variable']]

fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 20,
      thickness = 20,
      line = dict(color = "black", width = 1.0),
      label =  all_nodes,
    ),

    link = dict(
      source =  source_indices,
      target =  target_indices,
      value =  tempdf4.value,
))])

fig.update_layout(title_text="Populations Affected by Climate Hazards",
                  font_size=10)
fig.show()

The Sankey diagram shows the number of cities listing a specific population as most affected by a given hazard. Although Extreme Precipitation > Rain storm was the most listed hazard, certain populations, such as the elderly, children and low-income households, are most impacted by extremely hot temperatures.

### Adaptation<a class="anchor" id="adaptation"></a>

In [None]:
adap19 = filtq(resp19, '3.0', [1,2,8,10], list(range(1000)))
pivind = pd.MultiIndex.from_tuples(list(zip(adap19['Account Number'],adap19['Row Number'])), names=['Account Number', 'Row Number'])
adap19['Pivot Index'] = pivind
adap19 = adap19.pivot(values='Response Answer',columns='Column Name', index='Pivot Index')

In [None]:
adact19 = adap19.reset_index().groupby('Action').count().sort_values(by=['Pivot Index'], ascending=False).head(10)['Pivot Index']
ax = adact19.plot.barh()
plt.gca().invert_yaxis()
plt.title('Action taken for Hazards')
plt.xlabel('Number of times listed')

The top actions taken to combat climate hazards were planting trees and flood mapping. These actions directly address the top climate hazards, which were extreme temperatures, precipitation/flooding and drought, so they are an appropriate response to the most common issues.

In [None]:
# factors affecting adaptation
adaff = filtq(resp19,'2.2',[1,2,3],[1])
pivind = pd.MultiIndex.from_tuples(list(zip(adaff['Account Number'],adaff['Row Number'])), names=['Account Number', 'Row Number'])
adaff['Pivot Index'] = pivind
adaff = adaff.pivot(values='Response Answer',columns='Column Name', index='Pivot Index')

adaff = adaff.reset_index()\
.groupby(['Factors that affect ability to adapt','Support / Challenge'])\
.count()\
.reset_index()[['Factors that affect ability to adapt','Pivot Index','Support / Challenge']]
adaff['Factors that affect ability to adapt'] = adaff['Factors that affect ability to adapt'].apply(lambda x: x[0:70])


adaff = adaff.pivot(index='Factors that affect ability to adapt',values='Pivot Index', columns='Support / Challenge')\
.fillna(0)
adaff=adaff[['Support','Challenge']]

In [None]:
ax1 = adaff.sort_values(by='Challenge')\
.plot.barh(figsize=(10,10))
plt.title('Factors Affecting Ability to Adapt')
# hide the y-axis label on the axes returned by the plot call
# (calling plt.axes() here would create a new, empty axes in recent matplotlib)
ax1.yaxis.get_label().set_visible(False)
plt.xlabel('Number of Cities')

Cities listed the above factors as either a challenge or a support in adapting to climate hazards. 'Access to basic services' was listed about 100 times. About 30 cities have adequate access to basic services and listed it as a support in adapting to climate change. The remaining 70 or so cities still need to bolster basic services, and studying the strategy of the aforementioned 30 could reveal insights to address this challenge. <br>
<br>
Budgetary capacity was almost always listed as a challenge, suggesting more funds are needed for adaptation plans.
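
The support/challenge split can be condensed into a simple per-factor ratio. The sketch below uses the approximate counts quoted above for 'Access to basic services' (about 30 supports and 70 challenges); `challenge_share` is a hypothetical helper and not part of the original analysis.

```python
def challenge_share(support, challenge):
    """Fraction of mentions in which a factor was listed as a challenge."""
    total = support + challenge
    if total == 0:
        return float('nan')  # factor was never listed
    return challenge / total

# Approximate counts read off the chart for 'Access to basic services'
share = challenge_share(support=30, challenge=70)
print(f"Challenge share: {share:.0%}")
```

A ratio close to 1 marks a factor that cities almost always experience as an obstacle, which is how budgetary capacity is described above.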

### Clustering with various metrics <a class="anchor" id="clustering"></a>

The data, as it is, reveals valuable insights and simple calculations can be adequate KPIs. However, the purpose of this endeavor is to devise a KPI that can measure whether an individual city's response effectively addresses looming environmental issues while also considering social issues.

The most pertinent data from the emissions, emission reduction, energy, opportunity, buildings, transport and waste sections of the questionnaire have been extracted to compile a city-level data frame.

It is difficult to create a standard methodology for all cities as there are many different standards of reporting. The reported metrics are not always directly comparable or substitutable. Therefore, many fields have been imputed. Perhaps once a KPI methodology is established, it will be easier to create a standard and collect the specific data readily from participating cities.

In [None]:
# compile data at city level
# emissions, emission reduction, energy, opportunity, buildings, transport, waste

# getting city population and area
filtdf = filtq(resp19, ['0.5', '0.6'], [1,3],[1])
citycompile = filtdf.pivot(values='Response Answer', columns='Column Name', index='Account Number')
citycompile = citycompile.rename(columns={"Land area of the city boundary as defined in question 0.1 (in square km)":"Land Area (sq km)"})
citycompile = citycompile.apply(pd.to_numeric)

# getting whether a climate risk and vulnerability assessment was made
filtdf = filtq(resp19, ['2.0'], [0],[0])
pivdf = filtdf.pivot(values='Response Answer', columns='Row Name', index='Account Number')
pivdf.columns = ['Risk Vulnerability Assessment']
citycompile = citycompile.join(pivdf, how='outer')

# getting emissions, CO2 metric tonnes from different scopes
filtdf = filtq(resp19, ['4.6b'],[1], range(100))
pivdf = filtdf.pivot(values='Response Answer', columns='Row Name', index='Account Number')
pivdf = pivdf.apply(pd.to_numeric)
citycompile = citycompile.join(pivdf, how='outer')

# getting emission reduction targets, % target and % achieved
filtdf = filtq(resp19, ['5.0a'], [1,7,10], range(100))
# (account, row) tuples identify one target per row, as in the hazard cells above
filtdf['pivind'] = list(zip(filtdf['Account Number'], filtdf['Row Number']))
pivdf = filtdf.pivot(values='Response Answer', columns='Column Name', index='pivind')
pivdf = pivdf[pivdf.Sector=='All emissions sources included in city inventory']
pivdf = pivdf.reset_index()
pivdf['Account Number'] = pivdf.pivind.apply(lambda x: x[0])
pivdf = pivdf.set_index('Account Number')[['Percentage of target achieved so far', 'Percentage reduction target']]
pivdf.columns = ['Emission Reduction Achieved (%)', 'Emission Reduction Target (%)']
pivdf = pivdf.apply(pd.to_numeric)
citycompile = citycompile.join(pivdf, how='outer')

# getting energy data, renewable target exists and percentage of different sources
filtdf = filtq(resp19, ['8.0'], [0], [0])
pivdf = filtdf.pivot(values='Response Answer', columns='Column Name', index='Account Number')
pivdf.columns = ['Renewable Target Exists']
citycompile = citycompile.join(pivdf, how='outer')

filtdf = filtq(resp19, ['8.0a'], [5], range(20))
filtdf['Response Answer'] = pd.to_numeric(filtdf['Response Answer'])
filtdf = filtdf.groupby(['Account Number', 'Column Name']).max()['Response Answer']
filtdf = filtdf.reset_index(level='Column Name', drop=True).fillna(0)
citycompile = citycompile.join(filtdf, how='outer')
citycompile = citycompile.rename(columns={'Response Answer':'Renewable Energy (%)'})

# getting opportunity data, ESG in investing, business collab, established funds,
# green growth strategy, development bank funding, green jobs
filtdf = filtq(resp19, ['6.1','6.3','6.4','6.6','6.9','6.11'],[0],[0])
pivdf = filtdf.pivot(values='Response Answer', columns='Question Name', index='Account Number')
pivdf = pivdf.rename(columns={'How many people within your City are employed in green jobs/ industries?':'Green Jobs'})
pivdf['Green Jobs'] = pd.to_numeric(pivdf['Green Jobs'])
citycompile = citycompile.join(pivdf, how='outer')

# getting buildings data, CO2 from different buildings
filtdf = filtq(resp19, ['9.0'], [1],range(10))
pivdf = filtdf.pivot(values='Response Answer', columns='Row Name', index='Account Number')
pivdf = pivdf.rename(columns={'Municipal' : 'Municipal Buildings', 
                              'Commercial':'Commerical Buildings',
                             'Residential': 'Residential Buildings'})
pivdf.columns = [colname+' (CO2 Tonnes per capita)' for colname in pivdf.columns]
pivdf = pivdf.apply(pd.to_numeric)
citycompile = citycompile.join(pivdf, how='outer')

# getting transportation data, % of different transport modes
filtdf = filtq(resp19, ['10.1'], range(10),range(10))
pivdf = filtdf.pivot(values='Response Answer', columns='Column Name', index='Account Number').fillna(0)
pivdf = pivdf.apply(pd.to_numeric)
citycompile = citycompile.join(pivdf, how='outer')

# getting waste data, waste per capita
filtdf = filtq(resp19, ['13.0'], [1],[2])
pivdf = filtdf.pivot(values='Response Answer', columns='Row Name', index='Account Number')
pivdf = pivdf.apply(pd.to_numeric)
pivdf2 = pivdf
citycompile = citycompile.join(pivdf, how='outer')

# copy before imputing
unimpcitycomp = citycompile.copy()

##### Imputing Missing Values

In [None]:

# imputing categorical column NA values with no
citycompile.loc[:,citycompile.dtypes=='object'] = citycompile.loc[:,citycompile.dtypes=='object'].fillna('No').replace('Do not know', 'No')

# imputing base population with mean
citycompile['Current population'] = citycompile['Current population'].fillna(citycompile['Current population'].mean())

# dividing emission data and green jobs columns by current population
stdcolumns=['Agriculture, Forestry and Land Use – Scope 1 (V)',
           'Industrial Processes and Product Use – Scope 1 (IV)',
           'Stationary Energy: energy generation supplied to the grid – Scope 1 (I.4.4)',
           'Stationary Energy: energy use – Scope 1 (I.X.1)',
           'Stationary Energy: energy use – Scope 2 (I.X.2)',
           'Stationary Energy: energy use – Scope 3 (I.X.3)',
           'TOTAL BASIC emissions', 'TOTAL BASIC+ emissions',
           'TOTAL Scope 1 (Territorial) emissions', 'TOTAL Scope 2 emissions',
           'TOTAL Scope 3 emissions', 'Transportation – Scope 1 (II.X.1)',
           'Transportation – Scope 2 (II.X.2)',
           'Transportation – Scope 3 (II.X.3)',
           'Waste: waste generated outside the city boundary – Scope 1 (III.X.3)',
           'Waste: waste generated within the city boundary – Scope 1 (III.X.1)',
           'Waste: waste generated within the city boundary – Scope 3 (III.X.2)',
           'Green Jobs']

citycompile[stdcolumns] = citycompile[stdcolumns].div(citycompile['Current population'],axis=0)

citycompile.drop(columns = ['Current population','Land Area (sq km)', 'Projected population'], inplace=True)

# imputing emission columns with mean
impmncols = ['Agriculture, Forestry and Land Use – Scope 1 (V)',
               'Industrial Processes and Product Use – Scope 1 (IV)',
               'Stationary Energy: energy generation supplied to the grid – Scope 1 (I.4.4)',
               'Stationary Energy: energy use – Scope 1 (I.X.1)',
               'Stationary Energy: energy use – Scope 2 (I.X.2)',
               'Stationary Energy: energy use – Scope 3 (I.X.3)',
               'TOTAL BASIC emissions', 'TOTAL BASIC+ emissions',
               'TOTAL Scope 1 (Territorial) emissions', 'TOTAL Scope 2 emissions',
               'TOTAL Scope 3 emissions', 'Transportation – Scope 1 (II.X.1)',
               'Transportation – Scope 2 (II.X.2)',
               'Transportation – Scope 3 (II.X.3)',
               'Waste: waste generated outside the city boundary – Scope 1 (III.X.3)',
               'Waste: waste generated within the city boundary – Scope 1 (III.X.1)',
               'Waste: waste generated within the city boundary – Scope 3 (III.X.2)',
               'Emission Reduction Target (%)',
               'All building types (CO2 Tonnes per capita)',
               'Commerical Buildings (CO2 Tonnes per capita)',
               'Municipal Buildings (CO2 Tonnes per capita)',
               'Residential Buildings (CO2 Tonnes per capita)',
               'Buses (including BRT)', 
               'Cycling', 
               'Ferries/ River boats', 
               'Other',
               'Private motorized transport', 
               'Rail/Metro/Tram',
               'Taxis or For Hire Vehicles', 
               'Walking',
               'Waste generation per capita (kg/person/year)'
             ]

citycompile[impmncols] = citycompile[impmncols].fillna(citycompile[impmncols].mean())

# imputing 0 for na values
imp0cols = ['Green Jobs',
            'New buildings (CO2 Tonnes per capita)',
            'Emission Reduction Achieved (%)',
            'Renewable Energy (%)'            
           ]
citycompile[imp0cols] = citycompile[imp0cols].fillna(0)
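
Because so many fields are filled in, it can help to record how much of each column is imputed before clustering. A minimal sketch, using invented numbers and a hypothetical `imputed_share` helper, compares a frame before and after imputation. In this notebook, `unimpcitycomp` and `citycompile` would play those two roles for the columns they share.

```python
import numpy as np
import pandas as pd

def imputed_share(before, after):
    """Share of values per column that were missing before and filled after."""
    filled = before.isna() & after.notna()
    return filled.mean()

# Invented example: waste is reported by only one of four cities
before = pd.DataFrame({
    'Waste per capita': [400.0, np.nan, np.nan, np.nan],
    'Renewable Energy (%)': [20.0, 35.0, np.nan, 10.0],
})
after = before.fillna(before.mean())

share = imputed_share(before, after)
print(share)
```

A column that is mostly imputed with its own mean has very little variance left, so it contributes little to any distance-based clustering.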

### Clustering Categorical Features<a class="anchor" id="catclustering"></a>

Finding appropriate number of clusters.

In [None]:
# get categorical variables and try various number of clusters
citycatvars = citycompile.select_dtypes(include='object')
catcols = citycatvars.columns
cost = []
for num_clusters in list(range(1,6)):
    kmode = KModes(n_clusters=num_clusters, init = "Cao", n_init = 1)
    kmode.fit_predict(citycatvars)
    cost.append(kmode.cost_)

# plot error vs clusters
y = np.array([i for i in range(1,6,1)])
plt.plot(y,cost)
plt.title('Cost vs Number of Clusters')
plt.xlabel('Clusters')
plt.ylabel('Cost')

Observing the cost vs number of clusters plot, 3 seems to be a good number of clusters.
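
In the usual k-modes formulation, the cost being plotted is the total number of mismatches between each row and the mode of its cluster. The sketch below computes that quantity by hand for a given cluster assignment. The data and the `kmodes_cost` helper are illustrative assumptions and do not reproduce the internals of the `kmodes` library.

```python
from collections import Counter

def column_modes(rows):
    """Most frequent value in each column of a list of equal-length tuples."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def kmodes_cost(rows, labels):
    """Sum of Hamming distances between each row and its cluster's mode."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    cost = 0
    for members in clusters.values():
        mode = column_modes(members)
        cost += sum(a != b for row in members for a, b in zip(row, mode))
    return cost

# Invented answers to three yes/no style questions
rows = [('Yes', 'Yes', 'No'),
        ('Yes', 'Yes', 'Yes'),
        ('Yes', 'Yes', 'No'),
        ('No', 'No', 'No'),
        ('No', 'No', 'No')]

one_cluster = kmodes_cost(rows, [0, 0, 0, 0, 0])
two_clusters = kmodes_cost(rows, [0, 0, 0, 1, 1])
print(one_cluster, two_clusters)
```

Splitting the toy data into two groups removes most of the mismatches. The elbow in the cost curve is the point after which adding clusters stops producing reductions of that size.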

In [None]:
citycatvars = citycompile.select_dtypes(include='object')
catcols = citycatvars.columns

# define the k-modes model
km = KModes(n_clusters=3, 
            init='Cao', n_init=11)

# fit the clusters to the categorical city features
clusters = km.fit_predict(citycatvars)

# get an array of cluster modes
kmodes = km.cluster_centroids_
shape = kmodes.shape

**Categorical Features Cluster Centroids**

In [None]:
pd.DataFrame(data=kmodes, columns=catcols)

K-modes clustering showed that the most distinguishing features were **whether a Renewable Energy Target exists** and **whether a Risk and Vulnerability assessment was created**. **Business partnerships for sustainability projects** was the next most distinguishing feature. The other categorical features did not help separate the data into distinct clusters.

The K-modes clustering revealed three common tiers of cities: <br>

- Cities that have Risk Vulnerability Assessments but do not have Renewable Energy Targets or collaborate with businesses.

- Cities that intend to create Risk Vulnerability Assessments, Renewable Energy Targets and business partnerships.

- Cities that have Risk Vulnerability Assessments, have Renewable Energy Targets and collaborate with businesses on sustainability.

These clusters can be thought of as a graded system in which cities are graded as **limited, intermediate or advanced** in their progress in addressing environmental issues. Extending this idea, cities can even be scored by simply adding up the number of steps taken. For example, if a city has a renewable target and a risk assessment, it would have covered 2 of the 7 steps (represented as columns in the above dataframe). Of course, this is a rudimentary system and needs more complexity to truly represent a city's readiness for the issues at hand. Another idea is to give more weight to certain steps. For example, establishing funds to invest in sustainability projects should hold more weight than creating a risk assessment. Such a weighting would be subjective, and it would be best to seek expert opinion on how much weight each step should carry.
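
The additive and weighted scores described above can be sketched as follows. The step names, the answers and the weights are placeholders chosen for illustration, and only an exact 'Yes' is counted as a completed step. Real weights would need expert input, as noted.

```python
def readiness_score(answers, weights=None):
    """Sum of weights for steps answered 'Yes'; each step weighs 1 by default."""
    if weights is None:
        weights = {step: 1.0 for step in answers}
    return sum(weights[step] for step, answer in answers.items()
               if answer == 'Yes')

# Placeholder answers for one city (step names are illustrative)
city = {
    'Risk Vulnerability Assessment': 'Yes',
    'Renewable Target Exists': 'Yes',
    'Business collaboration': 'No',
    'Established funds': 'No',
}

simple = readiness_score(city)

# Hypothetical weights that give established funds more importance
weights = {
    'Risk Vulnerability Assessment': 1.0,
    'Renewable Target Exists': 1.0,
    'Business collaboration': 1.5,
    'Established funds': 2.0,
}
weighted = readiness_score(city, weights)
max_weighted = sum(weights.values())
print(simple, weighted, max_weighted)
```

Dividing a city's score by the maximum attainable score gives a value between 0 and 1 that stays comparable when steps or weights are revised.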

In [None]:
citycompile['Are Environmental, Social and Governance (ESG) issues incorporated into investment decisions of any of the city retirement funds?'].value_counts()

### Clustering Numerical Features<a class="anchor" id="numclustering"></a>

In [None]:
# scaling numerical features
citynumvars = citycompile.select_dtypes(exclude='object')
scaler = StandardScaler()
scaled_features = scaler.fit_transform(citynumvars)
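
For reference, standardisation rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation. The following is a minimal pure-Python sketch of that calculation on invented per-capita values. It is not the scikit-learn implementation, and `standardize` is a hypothetical helper.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sigma for v in values]

# Invented per-capita emissions for four cities
per_capita = [2.0, 4.0, 6.0, 8.0]
scaled = standardize(per_capita)
print(scaled)
```

Scaling matters here because k-means uses Euclidean distance: without it, columns measured in large units would dominate the clustering.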

In [None]:
kmeans_kwargs = {"init": "random",
                 "n_init": 10,
                 "max_iter": 300,
                 "random_state": 42,
                }
# finding appropriate number of clusters
sse = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, **kmeans_kwargs)
    kmeans.fit(scaled_features)
    sse.append(kmeans.inertia_)

# plot error vs number of clusters
plt.plot(range(1, 11), sse)
plt.title('SSE vs Number of Clusters')
plt.xticks(range(1, 11))
plt.xlabel("Number of Clusters")
plt.ylabel("SSE")
plt.show()

Observing the above plot, 2 to 3 clusters seems to be a good choice.
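
One way to make the visual elbow judgement more explicit is to look at how much each additional cluster reduces the SSE. The values below are invented to show the calculation and are not results from this notebook; `relative_drops` is a hypothetical helper.

```python
def relative_drops(sse):
    """Fractional SSE reduction gained by going from k to k+1 clusters."""
    return [(prev - cur) / prev for prev, cur in zip(sse, sse[1:])]

# Invented SSE values for k = 1..5 (not results from the notebook)
sse_example = [1000.0, 600.0, 480.0, 430.0, 400.0]
drops = relative_drops(sse_example)

for k, drop in enumerate(drops, start=2):
    print(f"k={k}: SSE falls by {drop:.1%}")
```

When the drop for the next k becomes small compared with the previous one, extra clusters are mostly splitting noise. This is the same reasoning as reading the elbow from the plot.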

In [None]:
# final model with the chosen number of clusters
kmeans = KMeans(n_clusters=2, **kmeans_kwargs)
kmeans.fit(scaled_features)

###### Numerical Features Cluster Centers

In [None]:
pd.DataFrame(kmeans.cluster_centers_, columns=citynumvars.columns)

K-means clustering revealed two common classes of cities: cities with low emissions and cities with high emissions. The emissions data was divided by each city's reported current population, so it is analogous to per capita emissions. Emissions data was the key distinguishing factor. Data on waste, renewable energy, and transportation did not play highly decisive roles in clustering. <br>
<br>
There was quite a bit of imputed data. Only 84 of 910 cities reported waste per capita, so it likely did not hold great weight in distinguishing cities (the remaining 826 cities were imputed to have 36222 kg/person/year of waste). The number of green jobs had a similar amount of missing data, with about 53 cities reporting the metric.<br>
<br>
Similar to the clustering with categorical variables, these clusters can also be thought of as grades, with the high emissions cluster as the limited-progress tier and the low emissions cluster as the advanced tier.


In [None]:
unimpcitycomp['Waste generation per capita (kg/person/year)'].count()

In [None]:
unimpcitycomp['Waste generation per capita (kg/person/year)'].mean()

In [None]:
unimpcitycomp['Green Jobs'].count()

In [None]:
unimpcitycomp['Green Jobs'].mean()

## Conclusion<a class="anchor" id="conclusion"></a>

The data, as it is, provides valuable insight into the state of cities and their progress in the ongoing battle against environmental and social issues.

The Climate Hazards and Vulnerability and Adaptation sections ask key questions that provide a comprehensive view on pressing issues for cities and their strategies. From the data we found extreme temperatures and precipitation to be most common climate hazards on city radars. Children and elderly are most vulnerable to these specific hazards. Planting trees and flood mapping are the ubiquitous strategies used to address these issues.

Furthermore, access to basic services and budget restrictions are the most common factors that prevent cities from adapting and implementing their strategies.

These insights were derived from basic exploratory data analysis on the given data (simple counting and summation for the most part). They show how cities in general are adapting and which social demographics are most at risk. However, they do not measure an individual city's preparedness or the appropriateness of its response, nor do they reveal whether the city takes social issues into account.

Clustering analysis provided some insight into the question at hand. Clustering on numerical data showed whether a city produced relatively low or relatively high emissions. These clusters were analogous to grading cities as limited or advanced in their readiness to combat climate change and reduce emissions. Some metrics, such as the number of Green Jobs, could have added insight into whether a city is taking advantage of opportunities posed by environmental issues and tackling social issues. However, as this metric is not standardised, most cities did not report the number of Green Jobs being created.

Clustering on categorical data showed three common classes of cities, based on whether they created risk assessments, created renewable energy targets and partnered with businesses for sustainability. The business partnership question in particular could shed light on cities tackling social issues, as these partnerships could increase employment. The clustering could act as a grading system to classify a city as limited, intermediate or advanced in its ability to handle environmental and social issues alike. The scoring system can be reduced to summing 'Yes' answers to precisely designed questions. The summed result could be a KPI indicating a city's effectiveness against environmental and social issues.

It is important to keep in mind that the clustering results are relative. Being classified into the advanced tier does not mean a city cannot improve. It means the city has set a precedent for other cities to follow, but it should continue pioneering until climate change is no longer a threat. This scoring system would also change as time passes and new technologies develop. Cities in the current advanced tier could fall into lower tiers if they do not keep up with new technology.

Climate change is an impending catastrophe. This study showed that climate change is on the minds of city administrations. Most cities have risk assessments to plan for the future and are already taking action. The plans being set into action are a promising foundation for a sustainable future for all. Continuing to improve the design of this KPI will help us stay on track by quantifying our progress towards that future.
