# <center>Do you have a climate plan?</center>
<center><img src='https://images.unsplash.com/photo-1466611653911-95081537e5b7?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=1350&q=80' height=400 width=1000/>
    <span>Photo by <a href="https://unsplash.com/@karsten_wuerth">Karsten Würth</a> on <a href="https://unsplash.com/">Unsplash</a></span></center>

## <a id='toc'>Table of Contents</a>
1. [Introduction](#1)
2. [Building indicators](#2)<br>
    2.1 [Indicators for corporation](#2.1)<br>
    2.2 [Indicators for cities](#2.2)<br> 
3. [KPI validation](#3)<br>    
4. [Discussion](#4)<br>
    4.1 [Intersection between cities and corporations](#4.1)<br>
    4.2 [Intersection between environmental and social issues](#4.2)<br> 
5. [Conclusion](#5)<br>

In [None]:
# Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2
import plotly.express as px  # plotly_express is bundled with plotly as plotly.express
import seaborn as sns
from sklearn.preprocessing import QuantileTransformer
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

In [None]:
# Loading databases

# A database of the cities contained in the CDP data with corrected population and location
Cities_2020 = pd.read_csv("../input/cities-2020-for-cdp/Cities_2020.csv")
Cities_2020 = Cities_2020[['Account Number', 'Organization', 'City','Country', 'CDP Region', 'Population','City Location', 'longitude', 'latitude']]

# Responses to the survey by cities
Cities_Responses_2020 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2020_Full_Cities_Dataset.csv")
Cities_Responses_2020 = Cities_Responses_2020[['Account Number','Organization', 'Country', 'CDP Region','Question Number', 'Column Number','Row Number', 'Response Answer', 'Comments']]

# Corporations that responded to the Climate Survey
Corporation_climate_2020 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Disclosing/Climate Change/2020_Corporates_Disclosing_to_CDP_Climate_Change.csv')
Corporation_climate_2020 = Corporation_climate_2020[['account_number', 'organization', 'country']]

# Corporations that responded to the Water Survey
Corporation_water_2020 = pd.read_csv('../input/cdp-unlocking-climate-solutions/Corporations/Corporations Disclosing/Water Security/2020_Corporates_Disclosing_to_CDP_Water_Security.csv')
Corporation_water_2020 = Corporation_water_2020[['account_number', 'organization', 'country']]

# Corporate Responses to the Climate Survey
Corporation_Responses_Climate_2020 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Climate Change/2020_Full_Climate_Change_Dataset.csv", low_memory=False)
Corporation_Responses_Climate_2020 = Corporation_Responses_Climate_2020[['account_number', 'organization', 'question_number','column_number', 'page_name','row_number','response_value', 'comments']]

# Corporate Responses to the Water Survey
Corporation_Responses_Water_2020 = pd.read_csv("../input/cdp-unlocking-climate-solutions/Corporations/Corporations Responses/Water Security/2020_Full_Water_Security_Dataset.csv", low_memory=False)
Corporation_Responses_Water_2020 = Corporation_Responses_Water_2020[['account_number', 'organization', 'question_number','column_number', 'page_name','row_number','response_value', 'comments']]

# Vulnerability indicators from World Bank
Vulnerability = pd.read_csv("../input/environmentequity-starterpack/Environment-Equity Datasets/Social Equity/Vulnerability/Climate Vulnerability and Readiness 2015.csv")
Vulnerability.columns = ['code_3digit', 'Population living below the national income poverty line','Climate Risk Score', 'Vulnerability score', 'Readiness score','Link to climate risk and adaptation profile', 'Droughts', 'Storms','Epidemics', 'Landslides', 'Floods', 'Wildfires']

# Country iso code
country_code = pd.read_csv("../input/country-code/country_code.csv")

In [None]:
# Functions used to transform categorical data into numerical data
def engagement_water(x):
    total = 0
    try :
        if ('Yes, our suppliers') in x :
            total = total + 2
        if ("Yes, our customers or other value chain partners") in x:
            total = total + 2
        if ("No, not currently but we intend to within the next two years") in x :
            total = total + 1
        if x == 'No, we do not engage with our value chain on water' or x== 'No Response':
            total = 0
        return total 
    except :
        return 0

def W14a_proportion(x):
    try :
        if x == 'None and we do not plan to request this from suppliers' or x == 'Unknown':
            return 0
        if x == 'None currently, but we plan to request this within the next two years':
            return 1
        if x == 'Less than 1%':
            return 2
        if x == '1-25':
            return 3
        if x == '26-50':
            return 4
        if x == '51-75':
            return 5
        if x == '76-100':
            return 6
        else:
            return 0
    except :
        return 0
    
def engagement_climate(x):
    total = 0
    try :
        if ('Yes, our suppliers') in x :
            total = total + 1
        if ("Yes, our customers") in x:
            total = total + 1
        if ("Yes, our investee companies [Financial services only]") in x :
            total = total + 1
        if ("Yes, other partners in the value chain") in x:
            total = total + 1 
        if x == 'No, we do not engage' or x== 'No Response':
            total = 0
        return total 
    except :
        return 0    
    
def convert(x):
    try :
        if x == 'Yes':
            return 3
        if x == 'In progress':
            return 2
        if x == 'Intending to incorporate in the next 2 years' or x == 'Intending to undertake in the next 2 years':
            return 1
        if x == 'Not intending to incorporate' or x == 'Not intending to undertake':
            return 0
        if x == 'Do not know':
            return 0
        else:
            return 0   
    except :
        return x
def convert_water(x):
    try :
        if x == 'Yes':
            return 3
        if x == 'In progress':
            return 2
        if x == 'Not intending to incorporate' or x == 'Not intending to undertake':
            return 1
        if x == 'Do not know':
            return 0
        else:
            return 0   
    except :
        return x
def convert_transport(x):
    try :
        if x == 'Yes':
            return 1
        if x == 'No':
            return 0
        if x == 'Do not know':
            return 0
        else:
            return 0   
    except :
        return x

def normalize(df):
    return (df - df.min()) / (df.max() - df.min())

scaler = QuantileTransformer(n_quantiles=20)

# <a id='1'>1. Introduction</a>

The [CDP](https://www.cdp.net/en) is a non-profit, charitable organization that helps corporations and cities to take action in response to the changing climate.
Each year, it brings together a set of data on the environmental measurements of thousands of cities and corporations.
Through the [CDP: Unlocking Climate Solutions competition](https://www.kaggle.com/c/cdp-unlocking-climate-solutions), the Kaggle community is asked to develop a methodology for calculating Key Performance Indicators (KPIs) related to environmental and social issues.

In this notebook, we will attempt to develop a KPI to assess the readiness of cities and corporations to deal with environmental risks and social issues.

Are cities and corporations aware of future environmental and social issues? Are they well prepared to deal with them?

# <a id='2'>2. Building indicators</a>
All performance indicators will be normalized so that they can be compared with each other.

We first introduce an indicator that measures what proportion of the survey has been completed.
For every city and corporation, the communication (or transparency) indicator is calculated as follows, where i denotes the city or corporation and j the survey.

$$KPI_{Communication_{(i,j)}} = 1 - \dfrac{\sum Nan Values_{(i,j)}}{\sum rows_{(i,j)}}$$

The communication indicator is one minus the share of unanswered rows among all the rows that the city or corporation has access to in the survey.
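
A minimal sketch of this calculation on toy data, assuming a long-format table with one row per answer slot. The column names `account` and `answer` and the helper `communication_kpi` are illustrative and are not part of the CDP files:

```python
import pandas as pd

# Hypothetical survey extract: one row per answer slot, None = left blank
responses = pd.DataFrame({
    "account": [1, 1, 1, 1, 2, 2],
    "answer": ["Yes", None, "In progress", "No", None, None],
})

def communication_kpi(df, id_col="account", answer_col="answer"):
    """Return 1 - (missing answers / total rows) for each respondent."""
    missing_share = df[answer_col].isna().groupby(df[id_col]).mean()
    return 1 - missing_share

kpi = communication_kpi(responses)
print(kpi)  # account 1 -> 0.75, account 2 -> 0.0
```

Taking the mean of the boolean "is missing" flag per respondent gives the ratio in the formula directly, so no separate row counter is needed.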

In [None]:
# KPI_Communication_and_transparency for cities

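Cities_Transparency = Cities_Responses_2020[["Account Number", "Response Answer"]].copy()  # copy to avoid SettingWithCopyWarning
Cities_Transparency['Transparency'] = Cities_Transparency.isnull().sum(axis = 1)
Cities_Transparency['N_row'] = 1
# Sum only the two numeric helper columns, not the free-text answers
Cities_Transparency = Cities_Transparency.groupby('Account Number')[['Transparency', 'N_row']].sum()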
Cities_Transparency['KPI_Communication_and_transparency_cities'] = 1 - Cities_Transparency["Transparency"] / Cities_Transparency['N_row']
Cities_Transparency=Cities_Transparency[['KPI_Communication_and_transparency_cities']]
Cities_2020 = pd.merge(Cities_2020, Cities_Transparency, how = 'outer', on = ['Account Number'])

# KPI_Communication_and_transparency for corporations on climate
Corporation_Transparency_climate = Corporation_Responses_Climate_2020[["account_number", "response_value"]].copy()  # copy to avoid SettingWithCopyWarning
Corporation_Transparency_climate['Transparency'] = Corporation_Transparency_climate.isnull().sum(axis = 1)
Corporation_Transparency_climate['N_row'] = 1
# Sum only the two numeric helper columns, not the free-text answers
Corporation_Transparency_climate = Corporation_Transparency_climate.groupby('account_number')[['Transparency', 'N_row']].sum()
Corporation_Transparency_climate['KPI_Communication_and_transparency_climate'] = 1 - Corporation_Transparency_climate["Transparency"] / Corporation_Transparency_climate['N_row']
Corporation_Transparency_climate=Corporation_Transparency_climate[['KPI_Communication_and_transparency_climate']]
Corporation_climate_2020 = pd.merge(Corporation_climate_2020, Corporation_Transparency_climate, how = 'outer', on = ['account_number'])


# KPI_Communication_and_transparency for corporations on water
Corporation_Transparency_water = Corporation_Responses_Water_2020[["account_number", "response_value"]].copy()  # copy to avoid SettingWithCopyWarning
Corporation_Transparency_water['Transparency'] = Corporation_Transparency_water.isnull().sum(axis = 1)
Corporation_Transparency_water['N_row'] = 1
# Sum only the two numeric helper columns, not the free-text answers
Corporation_Transparency_water = Corporation_Transparency_water.groupby('account_number')[['Transparency', 'N_row']].sum()
Corporation_Transparency_water['KPI_Communication_and_transparency_water'] = 1 - Corporation_Transparency_water["Transparency"] / Corporation_Transparency_water['N_row']
Corporation_Transparency_water=Corporation_Transparency_water[['KPI_Communication_and_transparency_water']]
Corporation_water_2020 = pd.merge(Corporation_water_2020, Corporation_Transparency_water, how = 'outer', on = ['account_number'])


The distribution of the indicator is shown below. The questionnaire is generally well completed by cities, while corporations show more disparity.

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(18,6))
fig.suptitle('Communication and transparency of cities and corporations')

sns.violinplot(ax=axes[0], y=Cities_2020['KPI_Communication_and_transparency_cities'], color = '#17b978')
axes[0].set_title('Cities')
axes[0].set_ylabel("Communication and Transparency")

sns.violinplot(ax=axes[1], y=Corporation_climate_2020['KPI_Communication_and_transparency_climate'], color = '#ff8264')
axes[1].set_title('Corporation for climate')
axes[1].set_ylabel("Communication and Transparency")


sns.violinplot(ax=axes[2], y=Corporation_water_2020['KPI_Communication_and_transparency_water'], color = '#53AFFF')
axes[2].set_title('Corporation for water')
axes[2].set_ylabel("Communication and Transparency")

Show = False

# <a id='2.1'>2.1 Indicators for corporation</a>

985 corporations responded to the climate survey while 295 responded to the water survey.<br>
<br>
What proportion of corporations responded to both surveys?

In [None]:
Concat_df = pd.concat([Corporation_climate_2020, Corporation_water_2020])
Group_climate = Corporation_climate_2020.shape[0]
Group_water =  Corporation_water_2020.shape[0]
Group_climate_and_water = Group_climate + Group_water - len(Concat_df['account_number'].unique())
del Concat_df
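# venn2 expects (only climate, only water, both), so the overlap is subtracted from each total
venn2(subsets = (Group_climate - Group_climate_and_water, Group_water - Group_climate_and_water, Group_climate_and_water), set_labels = ('Climate', 'Water'))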
plt.title("Number of corporations that completed the survey")

plt.show()

279 corporations completed both questionnaires.
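
As a minimal sketch of the overlap calculation, assuming each survey is represented by a set of account numbers (the IDs below are made up for illustration), the three regions of the Venn diagram and the share of respondents that answered both surveys can be derived as follows:

```python
# Hypothetical account numbers, not taken from the CDP data
climate_accounts = {101, 102, 103, 104, 105}
water_accounts = {104, 105, 106}

both = climate_accounts & water_accounts          # answered both surveys
only_climate = climate_accounts - water_accounts  # answered climate only
only_water = water_accounts - climate_accounts    # answered water only

# Share of all respondents that completed both surveys
share_both = len(both) / len(climate_accounts | water_accounts)

print(len(only_climate), len(only_water), len(both), round(share_both, 2))
```

`venn2` from `matplotlib_venn` takes its `subsets` tuple in this same order: only the first set, only the second set, then the intersection.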

# Indicators on climate
To build an indicator from the climate responses given by corporations, we are interested in the following question:

(C12.1) Do you engage with your value chain on climate-related issues?

Several options can be selected among:
* Yes, our suppliers
* Yes, our customers
* Yes, our investee companies [Financial services only]
* Yes, other partners in the value chain
* No, we do not engage

The indicator value is incremented by 1 for each option selected, except for the answer "No, we do not engage", which sets it to 0.

$$KPI_{C12.1} = \sum Options $$

Several answers contain the words "plan", "engagement" or "education".
For each of these three words, an indicator is constructed from the number of answers in the questionnaire in which the word occurs.

$$KPI_{plan} = \sum Occurrence("plan") $$
$$KPI_{education} = \sum Occurrence("education") $$
$$KPI_{engagement} = \sum Occurrence("engagement") $$

An overall indicator is constructed as the mean of these three indicators.
$$KPI_{Words} = \dfrac{KPI_{plan} + KPI_{education} + KPI_{engagement}}{3}$$

The overall climate performance indicator for corporations is constructed as follows:
$$KPI_{climate} = \dfrac{KPI_{Communication_{(i,climate)}} + KPI_{C12.1} + KPI_{Words}}{3}$$
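
The keyword indicators can be sketched on toy data as follows. The answers, the column names and the helpers `keyword_count` and `min_max` are hypothetical, and min-max scaling is used here as a simplification (the notebook code rescales the counts with a `QuantileTransformer`):

```python
import pandas as pd

# Hypothetical free-text answers, one row per answer
answers = pd.DataFrame({
    "account": [1, 1, 1, 2, 2, 3],
    "response": [
        "Supplier engagement plan",
        "Employee education programme",
        None,
        "Climate transition plan",
        "No",
        "Board oversight",
    ],
})

def keyword_count(df, word):
    """Count, per account, the answers that contain `word`."""
    hits = df["response"].str.contains(word, na=False)  # missing answers never match
    return hits.groupby(df["account"]).sum()

def min_max(series):
    """Rescale a Series to [0, 1]; a constant Series maps to zeros."""
    span = series.max() - series.min()
    if span == 0:
        return series * 0.0
    return (series - series.min()) / span

words = ["plan", "education", "engagement"]
counts = pd.DataFrame({w: keyword_count(answers, w) for w in words})

# KPI_Words: mean of the three rescaled keyword indicators
kpi_words = counts.apply(min_max).mean(axis=1)
print(kpi_words)
```

Rescaling each keyword count before averaging keeps a frequently used word from dominating the combined score.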

In [None]:
C121 = Corporation_Responses_Climate_2020[Corporation_Responses_Climate_2020['question_number'] == 'C12.1']
C121 = C121[['account_number','response_value']]
C121['response_value'] = C121['response_value'].apply(engagement_climate)
C121.columns = ['account_number', 'KPI C12.1']
Corporation_climate_2020 = Corporation_climate_2020.merge(C121, on = ['account_number'])
Corporation_climate_2020['KPI C12.1'] = normalize(Corporation_climate_2020['KPI C12.1'])

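# Named rows_engagement_climate so that the engagement_climate() helper is not overwritten when the cell is re-run
rows_engagement_climate = Corporation_Responses_Climate_2020[Corporation_Responses_Climate_2020["response_value"].str.contains('engagement') == True]
df_engagement_climate = pd.DataFrame(rows_engagement_climate.groupby('account_number').size())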
df_engagement_climate.columns=['KPI engagement']
Corporation_climate_2020 = pd.merge(Corporation_climate_2020, df_engagement_climate, how = 'outer', on = ['account_number'])
Corporation_climate_2020['KPI engagement'] = Corporation_climate_2020['KPI engagement'].fillna(0)
Corporation_climate_2020['KPI engagement'] = scaler.fit_transform(Corporation_climate_2020['KPI engagement'].values.reshape(-1,1))

plan_climate = Corporation_Responses_Climate_2020[Corporation_Responses_Climate_2020["response_value"].str.contains('plan') == True]
df_plan_climate = pd.DataFrame(plan_climate.groupby('account_number').size())
df_plan_climate.columns=['KPI plan']
Corporation_climate_2020 = pd.merge(Corporation_climate_2020, df_plan_climate, how = 'outer', on = ['account_number'])
Corporation_climate_2020['KPI plan'] = Corporation_climate_2020['KPI plan'].fillna(0)
Corporation_climate_2020['KPI plan'] = scaler.fit_transform(Corporation_climate_2020['KPI plan'].values.reshape(-1,1))

education_climate = Corporation_Responses_Climate_2020[Corporation_Responses_Climate_2020["response_value"].str.contains('education') == True]
df_education_climate = pd.DataFrame(education_climate.groupby('account_number').size())
df_education_climate.columns=['KPI education']
Corporation_climate_2020 = pd.merge(Corporation_climate_2020, df_education_climate, how = 'outer', on = ['account_number'])
Corporation_climate_2020['KPI education'] = Corporation_climate_2020['KPI education'].fillna(0)
Corporation_climate_2020['KPI education'] = scaler.fit_transform(Corporation_climate_2020['KPI education'].values.reshape(-1,1))

Corporation_climate_2020['KPI Words'] = (Corporation_climate_2020['KPI engagement'] + Corporation_climate_2020['KPI plan'] + Corporation_climate_2020['KPI education'])/3
Corporation_climate_2020['KPI climate'] = (Corporation_climate_2020['KPI_Communication_and_transparency_climate'] + Corporation_climate_2020['KPI C12.1'] + Corporation_climate_2020['KPI Words'])/3

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(18,6))
fig.suptitle('KPI for corporation on climate')

sns.violinplot(ax=axes[0], y=Corporation_climate_2020['KPI C12.1'], color = '#f8b500')

sns.violinplot(ax=axes[1], y=Corporation_climate_2020['KPI Words'], color = '#17b978')

sns.violinplot(ax=axes[2], y=Corporation_climate_2020['KPI climate'], color = '#ff8264')

Show = False

# Indicators on water security
For the water survey, KPIs are created in the same way from the following questions:
* (W1.4) Do you engage with your value chain on water-related issues?
* (W1.4a) What proportion of suppliers do you request to report on their water use, risks and/or management information and what proportion of your procurement spend does this represent?

A performance indicator based on the number of occurrences of the words "plan", "engagement" and "education" is added as well.

Transforming the categorical data into numerical data gives the following distributions for the KPIs:
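
As an illustration of the categorical-to-numerical step, the sketch below expresses the W1.4a scoring as a lookup table. The scores are copied from the `W14a_proportion` helper defined earlier; the names `W14A_SCORES`, `score_w14a` and `min_max`, and the sample answers, are hypothetical:

```python
# Ordinal scores for the W1.4a answer options (same values as W14a_proportion)
W14A_SCORES = {
    'None and we do not plan to request this from suppliers': 0,
    'Unknown': 0,
    'None currently, but we plan to request this within the next two years': 1,
    'Less than 1%': 2,
    '1-25': 3,
    '26-50': 4,
    '51-75': 5,
    '76-100': 6,
}

def score_w14a(answer):
    """Map a W1.4a answer to its ordinal score; anything unrecognised scores 0."""
    return W14A_SCORES.get(answer, 0)

def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical answers from four corporations (None = no response)
answers = ['76-100', '1-25', 'Unknown', None]
scores = [score_w14a(a) for a in answers]
print(scores)           # [6, 3, 0, 0]
print(min_max(scores))  # [1.0, 0.5, 0.0, 0.0]
```

A dictionary lookup with a default keeps the mapping in one place and treats missing or unexpected answers the same way as the original chain of `if` statements.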

In [None]:
W14 = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020['question_number'] == 'W1.4']
W14 = W14[['account_number','response_value']]
W14['response_value'] = W14['response_value'].apply(engagement_water)
W14.columns = ['account_number', 'W1.4']
Corporation_water_2020 = Corporation_water_2020.merge(W14, on = ['account_number'])
Corporation_water_2020['W1.4'] = normalize(Corporation_water_2020['W1.4'])

W14a = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020['question_number'] == 'W1.4a']
W14a1 = W14a[W14a['column_number'] == 1.0]
W14a1 = W14a1[['account_number','response_value']]
W14a1['response_value'] = W14a1['response_value'].apply(W14a_proportion)
W14a1.columns = ['account_number', '% of suppliers']
Corporation_water_2020 = Corporation_water_2020.merge(W14a1, on = ['account_number'])
Corporation_water_2020['% of suppliers'] = normalize(Corporation_water_2020['% of suppliers'])

W14a2 = W14a[W14a['column_number'] == 2.0]
W14a2 = W14a2[['account_number','response_value']]
W14a2['response_value'] = W14a2['response_value'].apply(W14a_proportion)
W14a2.columns = ['account_number', '% of total procurement spend']
Corporation_water_2020 = Corporation_water_2020.merge(W14a2, on = ['account_number'])
Corporation_water_2020['% of total procurement spend'] = normalize(Corporation_water_2020['% of total procurement spend'])

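# Named rows_engagement_water so that the engagement_water() helper is not overwritten when the cell is re-run
rows_engagement_water = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020["response_value"].str.contains('engagement') == True]
df_engagement_water = pd.DataFrame(rows_engagement_water.groupby('account_number').size())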
df_engagement_water.columns=['KPI engagement']
Corporation_water_2020 = pd.merge(Corporation_water_2020, df_engagement_water, how = 'outer', on = ['account_number'])
Corporation_water_2020['KPI engagement'] = Corporation_water_2020['KPI engagement'].fillna(0)
Corporation_water_2020['KPI engagement'] = scaler.fit_transform(Corporation_water_2020['KPI engagement'].values.reshape(-1,1))

plan_water = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020["response_value"].str.contains('plan') == True]
df_plan_water = pd.DataFrame(plan_water.groupby('account_number').size())
df_plan_water.columns=['KPI plan']
Corporation_water_2020 = pd.merge(Corporation_water_2020, df_plan_water, how = 'outer', on = ['account_number'])
Corporation_water_2020['KPI plan'] = Corporation_water_2020['KPI plan'].fillna(0)
Corporation_water_2020['KPI plan'] = scaler.fit_transform(Corporation_water_2020['KPI plan'].values.reshape(-1,1))

education_water = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020["response_value"].str.contains('education') == True]
df_education_water = pd.DataFrame(education_water.groupby('account_number').size())
df_education_water.columns=['KPI education']
Corporation_water_2020 = pd.merge(Corporation_water_2020, df_education_water, how = 'outer', on = ['account_number'])
Corporation_water_2020['KPI education'] = Corporation_water_2020['KPI education'].fillna(0)
Corporation_water_2020['KPI education'] = scaler.fit_transform(Corporation_water_2020['KPI education'].values.reshape(-1,1))


Corporation_water_2020['KPI W1.4, W1.4a'] = (Corporation_water_2020['W1.4'] + Corporation_water_2020['% of suppliers'] + Corporation_water_2020['% of total procurement spend'])/3
Corporation_water_2020['KPI Words'] = (Corporation_water_2020['KPI engagement'] + Corporation_water_2020['KPI education'] + Corporation_water_2020['KPI plan'])/3
Corporation_water_2020['KPI Water'] = (Corporation_water_2020['KPI_Communication_and_transparency_water'] + Corporation_water_2020['KPI W1.4, W1.4a'] + Corporation_water_2020['KPI Words'])/3


fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(24,6))
fig.suptitle('KPI for corporation on water')

sns.violinplot(ax=axes[0], y=Corporation_water_2020['KPI W1.4, W1.4a'], color = '#f8b500')
axes[0].set_ylabel("KPI W1.4, W1.4a")

sns.violinplot(ax=axes[1], y=Corporation_water_2020['KPI Words'], color = '#17b978')
axes[1].set_ylabel("KPI Words")

sns.violinplot(ax=axes[2], y=Corporation_water_2020['KPI Water'], color = '#53AFFF')
axes[2].set_ylabel("KPI Water")

Show = False

We then build a database with a climate and water indicator per country corresponding to the average of the indicators of the corporations that are established in the country. 
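
A minimal sketch of the country averaging on toy data. It assumes that each corporation lists its countries in one string separated by `"; "`; that separator, the column names and the scores are hypothetical (the notebook code instead matches country names as substrings of the answer):

```python
import pandas as pd

# Hypothetical corporations with a KPI and the countries where they operate
corporations = pd.DataFrame({
    "account": [1, 2, 3],
    "kpi": [0.8, 0.4, 0.6],
    "countries": ["France; Germany", "France", "Japan"],
})

# One row per (corporation, country) pair
per_country = corporations.assign(
    country=corporations["countries"].str.split("; ")
).explode("country")

# Country score = mean KPI of the corporations established there
country_kpi = per_country.groupby("country")["kpi"].mean()
print(country_kpi)
```

A corporation present in several countries contributes its score to each of them, which is also what the substring matching in the notebook does.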

In [None]:
Country_list = Cities_2020['Country'].unique()
Country_water = pd.DataFrame()
df_country_water = Corporation_Responses_Water_2020[Corporation_Responses_Water_2020['page_name'] == 'W0.3']
df_country_water = Corporation_water_2020.merge(df_country_water, on = ['account_number'])
for Country in Country_list:
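    # regex=False matches the country name literally (some names contain parentheses)
    df_country = df_country_water[df_country_water["response_value"].str.contains(Country, regex=False) == True].copy()
    df_country['Country']=Country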
    df_country=df_country[['KPI Water', 'Country' ]]
    Country_water = pd.concat([Country_water, df_country])

Country_climate = pd.DataFrame()
df_country_climate = Corporation_Responses_Climate_2020[Corporation_Responses_Climate_2020['page_name'] == 'C0.3']
df_country_climate = Corporation_climate_2020.merge(df_country_climate, on = ['account_number'])
for Country in Country_list:
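    # regex=False matches the country name literally (some names contain parentheses)
    df_country = df_country_climate[df_country_climate["response_value"].str.contains(Country, regex=False) == True].copy()
    df_country['Country']=Country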
    df_country=df_country[['KPI climate', 'Country' ]]
    Country_climate = pd.concat([Country_climate, df_country])
    
Country_water=Country_water.groupby('Country').mean()
Country_climate=Country_climate.groupby('Country').mean()

Country_CDP_2020 = pd.merge(Country_water, Country_climate, how = 'outer', on = ['Country'])
Country_CDP_2020 = Country_CDP_2020.fillna(0)

# <a id='2.2'>2.2 Indicators for cities</a>

For the questionnaire on cities, one indicator per section is constructed.

# Governance

The survey provides the following questions:

**(1.0) Does your city incorporate sustainability goals and targets (e.g. GHG reductions) into the master planning for the city?**

A performance indicator $KPI_{1.0}$ is constructed by transforming categorical data into numerical data. 
* Yes => 3
* In progress => 2
* Intending to incorporate in the next 2 years => 1
* Not intending to incorporate => 0
* Do not know => 0

**(1.0a) Please detail which goals and targets are incorporated in your city’s master plan and describe how these goals are addressed in the table below.**

The answers to this question are a selection of targets. This is why an indicator $KPI_{1.0a}$ is constructed by counting the number of answered options.

An overall indicator for governance is then defined:
$$KPI_{Governance} = \dfrac{KPI_{1.0} + KPI_{1.0a}}{2}$$

The same methodology is used for the other questions. A single-choice question is converted into a numerical score. For a multiple-choice question, the number of selected answers is counted. When a section contains several questions, the section indicator is the mean of the indicators of its questions.

* In the section Climate Hazards and Vulnerability:
> (2.0) Has a climate change risk or vulnerability assessment been undertaken for your city?
* In the section Adaptation:
> (3.2) Does your city council, or similar authority, have a published plan that addresses climate change adaptation?
* In the section City-wide Emissions:
> (4.0) Does your city have a city-wide emissions inventory to report?<br>
> (4.9) Does your city have a consumption-based inventory to measure emissions from consumption of goods and services by your residents?<br>
> (4.12) Has the city-wide GHG emissions data you are currently reporting been externally verified or audited in part or in whole?
* In the section Emissions Reduction:
> (5.5) Does your city have a climate change mitigation or energy access plan for reducing city-wide GHG emissions?
* In the section Opportunities:
> (6.0) Please indicate the opportunities your city has identified as a result of addressing climate change and describe how the city is positioning itself to take advantage of these opportunities.<br>
> (6.2) Does your city collaborate in partnership with businesses in your city on sustainability projects?
* In the section Energy:
> (8.0) Does your city have a renewable energy or electricity target?<br>
> (8.5) Does your city have a target to increase energy efficiency?
* In the section Water Security:
> (14.4) Does your city have a publicly available Water Resource Management strategy?
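
The single-choice and multiple-choice rules described above can be sketched for the governance section as follows. The toy answers to question 1.0a, the column names and the `min_max` helper are hypothetical, and min-max scaling stands in for the `QuantileTransformer` that the notebook applies to the counts:

```python
import pandas as pd

# Scores for the single-choice question 1.0 (values from the list above)
SINGLE_CHOICE_SCORES = {
    "Yes": 3,
    "In progress": 2,
    "Intending to incorporate in the next 2 years": 1,
    "Not intending to incorporate": 0,
    "Do not know": 0,
}

# Hypothetical long-format answers: one row per selected option
responses = pd.DataFrame({
    "account": [1, 1, 1, 1, 2, 2, 3],
    "question": ["1.0", "1.0a", "1.0a", "1.0a", "1.0", "1.0a", "1.0"],
    "answer": ["Yes", "Target A", "Target B", "Target C",
               "In progress", "Target A", "Do not know"],
})

def min_max(series):
    """Rescale a Series to [0, 1]; a constant Series maps to zeros."""
    span = series.max() - series.min()
    if span == 0:
        return series * 0.0
    return (series - series.min()) / span

# Single-choice question: map each answer to its score
q10 = responses[responses["question"] == "1.0"].set_index("account")["answer"]
kpi_10 = q10.map(SINGLE_CHOICE_SCORES).fillna(0)

# Multiple-choice question: count the selected options (0 if none were given)
q10a = responses[responses["question"] == "1.0a"].groupby("account")["answer"].count()
kpi_10a = q10a.reindex(kpi_10.index, fill_value=0)

# Section indicator: mean of the two rescaled question indicators
governance = (min_max(kpi_10) + min_max(kpi_10a)) / 2
print(governance)
```

Reindexing the counts on the accounts that answered question 1.0 keeps cities that selected no target in the result with a count of zero instead of dropping them.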

In [None]:
Q10 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '1.0']
Q10 = Q10[['Account Number','Response Answer']]
Q10['Response Answer'] = Q10['Response Answer'].apply(convert)
Q10.columns = ['Account Number', 'KPI 1.0']
Cities_2020 = Cities_2020.merge(Q10, on = ['Account Number'])
Cities_2020['KPI 1.0'] = normalize(Cities_2020['KPI 1.0'])

Q10a = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '1.0a']
Q10a = Q10a[['Account Number','Response Answer']]
Q10a.columns = ['Account Number', 'KPI 1.0a']
Q10a = Q10a.groupby(['Account Number']).count()
Cities_2020 = Cities_2020.merge(Q10a, on = ['Account Number'])
scaler = QuantileTransformer(n_quantiles=20)
Cities_2020['KPI 1.0a'] = scaler.fit_transform(Cities_2020['KPI 1.0a'].values.reshape(-1,1))

Cities_2020['KPI Governance'] = 0.5*Cities_2020['KPI 1.0'] + 0.5*Cities_2020['KPI 1.0a']

Q20 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '2.0']
Q20 = Q20[['Account Number','Response Answer']]
Q20['Response Answer'] = Q20['Response Answer'].apply(convert)
Q20.columns = ['Account Number', 'KPI Climate Hazards and Vulnerability']
Cities_2020 = Cities_2020.merge(Q20, on = ['Account Number'])
Cities_2020['KPI Climate Hazards and Vulnerability'] = normalize(Cities_2020['KPI Climate Hazards and Vulnerability'])

Q32 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '3.2']
Q32 = Q32[['Account Number','Response Answer']]
Q32['Response Answer'] = Q32['Response Answer'].apply(convert)
Q32.columns = ['Account Number', 'KPI Adaptation']
Cities_2020 = Cities_2020.merge(Q32, on = ['Account Number'])
Cities_2020['KPI Adaptation'] = normalize(Cities_2020['KPI Adaptation'])

Q40 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '4.0']
Q40 = Q40[['Account Number','Response Answer']]
Q40['Response Answer'] = Q40['Response Answer'].apply(convert)
Q40.columns = ['Account Number', 'KPI 4.0']
Cities_2020 = Cities_2020.merge(Q40, on = ['Account Number'])
Cities_2020['KPI 4.0'] = normalize(Cities_2020['KPI 4.0'])

Q49 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '4.9']
Q49 = Q49[Q49['Column Number'] == 1]
Q49 = Q49[['Account Number','Response Answer']]
Q49['Response Answer'] = Q49['Response Answer'].apply(convert)
Q49.columns = ['Account Number', 'KPI 4.9']
Cities_2020 = Cities_2020.merge(Q49, on = ['Account Number'])
Cities_2020['KPI 4.9'] = normalize(Cities_2020['KPI 4.9'])

Q412 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '4.12']
Q412 = Q412[['Account Number','Response Answer']]
Q412['Response Answer'] = Q412['Response Answer'].apply(convert)
Q412.columns = ['Account Number', 'KPI 4.12']
Cities_2020 = Cities_2020.merge(Q412, on = ['Account Number'])
Cities_2020['KPI 4.12'] = normalize(Cities_2020['KPI 4.12'])

Cities_2020['KPI City-wide Emissions'] = (Cities_2020['KPI 4.0'] + Cities_2020['KPI 4.9'] + Cities_2020['KPI 4.12'])/3

Q55 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '5.5']
Q55 = Q55[['Account Number','Response Answer']]
Q55['Response Answer'] = Q55['Response Answer'].apply(convert)
Q55.columns = ['Account Number', 'KPI Emissions Reduction']
Cities_2020 = Cities_2020.merge(Q55, on = ['Account Number'])
Cities_2020['KPI Emissions Reduction'] = normalize(Cities_2020['KPI Emissions Reduction'])

Q60 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '6.0']
Q60 = Q60[['Account Number','Response Answer']]
Q60.columns = ['Account Number', 'KPI 6.0']
Q60 = Q60.groupby(['Account Number']).count()
Cities_2020 = Cities_2020.merge(Q60, on = ['Account Number'])
Cities_2020['KPI 6.0'] = scaler.fit_transform(Cities_2020['KPI 6.0'].values.reshape(-1,1))

Q62 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '6.2']
Q62 = Q62[['Account Number','Response Answer']]
Q62['Response Answer'] = Q62['Response Answer'].apply(convert)
Q62.columns = ['Account Number', 'KPI 6.2']
Cities_2020 = Cities_2020.merge(Q62, on = ['Account Number'])
Cities_2020['KPI 6.2'] = normalize(Cities_2020['KPI 6.2'])

Cities_2020['KPI Opportunities'] = (Cities_2020['KPI 6.0'] + Cities_2020['KPI 6.2'])/2

Q80 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '8.0']
Q80 = Q80[['Account Number','Response Answer']]
Q80['Response Answer'] = Q80['Response Answer'].apply(convert)
Q80.columns = ['Account Number', 'KPI 8.0']
Cities_2020 = Cities_2020.merge(Q80, on = ['Account Number'])
Cities_2020['KPI 8.0'] = normalize(Cities_2020['KPI 8.0'])

Q85 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '8.5']
Q85 = Q85[['Account Number','Response Answer']]
Q85['Response Answer'] = Q85['Response Answer'].apply(convert)
Q85.columns = ['Account Number', 'KPI 8.5']
Cities_2020 = Cities_2020.merge(Q85, on = ['Account Number'])
Cities_2020['KPI 8.5'] = normalize(Cities_2020['KPI 8.5'])

Cities_2020['KPI Energy'] = (Cities_2020['KPI 8.0'] + Cities_2020['KPI 8.5'])/2

Q144 = Cities_Responses_2020[Cities_Responses_2020['Question Number'] == '14.4']
Q144 = Q144[['Account Number','Response Answer']]
Q144['Response Answer'] = Q144['Response Answer'].apply(convert_water)
Q144.columns = ['Account Number', 'KPI Water Security']
Cities_2020 = Cities_2020.merge(Q144, on = ['Account Number'])
Cities_2020['KPI Water Security'] = normalize(Cities_2020['KPI Water Security'])

# Sum of the eight section indicators (City-wide Emissions was previously replaced by a second copy of Adaptation)
Cities_2020['KPI City']= Cities_2020['KPI Governance']+Cities_2020['KPI Climate Hazards and Vulnerability']+Cities_2020['KPI Adaptation']+Cities_2020['KPI City-wide Emissions']+Cities_2020['KPI Emissions Reduction']+Cities_2020['KPI Opportunities']+Cities_2020['KPI Energy']+Cities_2020['KPI Water Security']
Cities_2020['KPI City'] = scaler.fit_transform(Cities_2020['KPI City'].values.reshape(-1,1))

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(24,6))
fig.suptitle('KPI for cities')

sns.violinplot(ax=axes[0], y=Cities_2020['KPI Governance'], color = '#229954')
sns.violinplot(ax=axes[1], y=Cities_2020['KPI Climate Hazards and Vulnerability'], color = '#27AE60')
sns.violinplot(ax=axes[2], y=Cities_2020['KPI Adaptation'], color = '#52BE80')
sns.violinplot(ax=axes[3], y=Cities_2020['KPI City-wide Emissions'], color = '#7DCEA0')

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(24,6))

sns.violinplot(ax=axes[0], y=Cities_2020['KPI Emissions Reduction'], color = '#2E86C1')
sns.violinplot(ax=axes[1], y=Cities_2020['KPI Opportunities'], color = '#3498DB')
sns.violinplot(ax=axes[2], y=Cities_2020['KPI Energy'], color = '#5DADE2')
sns.violinplot(ax=axes[3], y=Cities_2020['KPI Water Security'], color = '#85C1E9')

fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4,4))
sns.violinplot(y=Cities_2020['KPI City'], color = '#F39C12')

Show = False

In [None]:
# Round the score to two decimals for display (vectorised, and kept numeric
# so that it can still be averaged per country in the validation section)
Cities_2020['KPI City'] = Cities_2020['KPI City'].round(2)

The figure below shows the scores of the different CDP cities around the world.

In [None]:
# Figure size is set once in update_layout below
Fig_Cities = px.scatter_geo(
    Cities_2020, lat='latitude', lon='longitude', size='Population',
    color='KPI City', hover_name="City", projection="natural earth",
    hover_data={'longitude': False, 'latitude': False, 'Country': True,
                'Population': True, 'CDP Region': True, 'KPI City': True})
Fig_Cities.update_layout(
    title={
        'text': "Key Performance Indicator of CDP cities",
        'x':0.5},
    font=dict(
        family="Arial",
        size=12,
        color= "black"),
    autosize=False,
    width=800,
    height=400,
)
Fig_Cities.show()

# <a id='3'>3. KPI validation</a>

To evaluate the KPIs we built, we use World Bank indicators from the [Compiled datasets for CDP Analytics Competition](https://www.kaggle.com/seraphimstreets/environmentequity-starterpack), collected by [seraphimstreets](https://www.kaggle.com/seraphimstreets).

The indicators we are particularly interested in are vulnerability indicators:
* Population living below the national income poverty line
* Climate Risk Score
* Vulnerability score
* Readiness score

We create a correlation matrix between our main indicators and the World Bank indicators. City KPIs are averaged per country first, so that every variable is compared at country level.
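
As a minimal sketch of this step on made-up numbers: city scores are averaged per country, joined to country-level indicators, and the correlation matrix is computed on the result. All values and the `toy_` names are illustrative assumptions, not CDP or World Bank data.

```python
import pandas as pd

# Hypothetical city-level scores (not real CDP values)
toy_cities = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "D"],
    "KPI City": [0.2, 0.4, 0.5, 0.7, 0.8, 0.9],
})

# Hypothetical country-level indicators (not real World Bank values)
toy_indicators = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Readiness score": [0.30, 0.45, 0.60, 0.70],
    "Vulnerability score": [0.55, 0.50, 0.40, 0.35],
})

# Average the city scores per country so both tables share the same granularity
toy_country = toy_cities.groupby("Country", as_index=False)["KPI City"].mean()

# Join on the country name and compute pairwise Pearson correlations
toy_merged = toy_country.merge(toy_indicators, on="Country")
toy_corr = toy_merged.drop(columns="Country").corr()
print(toy_corr.round(2))
```

In this toy table the city KPI rises with the readiness score and falls with the vulnerability score, which is the kind of pattern the heatmap is meant to reveal.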

In [None]:
Country_CDP_2020_ = Country_CDP_2020.copy()

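# Average the numeric KPI columns per country (numeric_only avoids errors on text columns in recent pandas)
Country_2020 = Cities_2020.groupby('Country').mean(numeric_only=True)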
Country_CDP_2020 = Country_CDP_2020.reset_index()
Country_2020 = Country_2020.reset_index()
KPI_col = ['Country','KPI_Communication_and_transparency_cities', 'KPI 1.0', 'KPI 1.0a','KPI Governance', 'KPI Climate Hazards and Vulnerability','KPI Adaptation', 'KPI 4.0', 'KPI 4.9', 'KPI 4.12','KPI City-wide Emissions', 'KPI Emissions Reduction', 'KPI 6.0','KPI 6.2', 'KPI Opportunities', 'KPI 8.0', 'KPI 8.5', 'KPI Energy','KPI Water Security', 'KPI City']
Country_CDP_2020 = Country_CDP_2020.merge(Country_2020[KPI_col], on = ['Country'])

country_map = {'United States of America':'United States',
               'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
               'Gibraltar': None, 
               'Viet Nam': 'Vietnam', 
               'Taiwan, Greater China': 'Taiwan',
               'China, Hong Kong Special Administrative Region': None,
               'Republic of Korea': 'South Korea', 
               'United Republic of Tanzania': 'Tanzania',
               'Democratic Republic of the Congo': 'Congo, Dem. rep',
               "Côte d'Ivoire": "Cote d'Ivoire",
               'Bolivia (Plurinational State of)': 'Bolivia',
               'State of Palestine': 'Palestine'
              }
Country_CDP_2020['country_mapped'] = Country_CDP_2020['Country'].map(country_map)
Country_CDP_2020.loc[Country_CDP_2020['country_mapped'].isna(), 'country_mapped'] = Country_CDP_2020['Country']

Vulnerability = pd.merge(Vulnerability[['code_3digit','Population living below the national income poverty line','Climate Risk Score', 'Vulnerability score', 'Readiness score']], country_code[['Country_name', 'code_3digit']])
Vulnerability.columns = ['code_3digit','Population living below the national income poverty line','Climate Risk Score', 'Vulnerability score', 'Readiness score','Country']

Vulnerability_corr = pd.merge(Vulnerability, Country_CDP_2020)

fig, ax = plt.subplots(figsize=(12, 12))
eval_col = ['KPI City', 'KPI climate','KPI Water','Population living below the national income poverty line','Climate Risk Score', 'Vulnerability score', 'Readiness score']
sns.heatmap(Vulnerability_corr[eval_col].corr(), annot=True)
plt.show()

Significant correlations are found between:
* $KPI_{City}$ and Readiness score
* $KPI_{climate}$ and Vulnerability score
* $KPI_{climate}$ and Population living below the national income poverty line

Less significant correlations are found between:
* $KPI_{City}$ and Climate Risk score
* $KPI_{climate}$ and Climate Risk score
* $KPI_{Water}$ and Climate Risk score
* $KPI_{Water}$ and Vulnerability score

This matches our expectations. The city KPI tracks the Readiness score most closely, which suggests that cities are the actors best prepared for future challenges, while the corporate climate KPI tracks the Vulnerability score, which suggests that corporations are more engaged where climate risk is higher. The correlation between $KPI_{climate}$ and the population living below the national income poverty line also suggests that corporations are aware of social issues in the countries where they operate.
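
The heatmap reports correlation coefficients only, and how much weight a coefficient deserves also depends on how many countries it is computed from. One possible check, sketched here on made-up numbers with a hypothetical helper `permutation_pvalue` (not part of the original analysis), is a permutation test: shuffle one variable many times and count how often chance alone gives a correlation at least as strong as the observed one.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    # Shuffling y breaks any link with x, giving the distribution under "no association"
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_permutations + 1)

# Hypothetical country-level values (not taken from the notebook's data)
toy_kpi = [0.30, 0.60, 0.80, 0.90, 0.50, 0.70, 0.40, 0.65]
toy_readiness = [0.32, 0.48, 0.61, 0.70, 0.44, 0.58, 0.35, 0.52]

r, p = permutation_pvalue(toy_kpi, toy_readiness)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

The same helper could be applied to any pair of columns in `Vulnerability_corr` after dropping rows with missing values.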


# <a id='4'>4. Discussion</a>
# <a id='4.1'>4.1 Intersection between cities and corporations</a>

In [None]:
df = pd.merge(Country_CDP_2020_, Cities_2020, how = 'outer', on = ['Country'])
df.sort_values(by = 'CDP Region', inplace = True)
df['CDP Region']=pd.Categorical(df['CDP Region'])
 
# create data
x = df['KPI City']
y = (df['KPI climate'] + df['KPI Water'])/2
z = df['Population']/5000

# plot
plt.rcParams["legend.title_fontsize"] = "large"
plt.rcParams["font.size"] = 20
plt.rcParams["font.sans-serif"] = 'Arial'

plt.figure(figsize=(22, 10))
# plt.get_cmap replaces plt.cm.get_cmap, which was removed in recent matplotlib releases
plt.scatter(x, y, s=z, c=df['CDP Region'].cat.codes, cmap=plt.get_cmap('RdYlBu', 8), alpha=0.6, edgecolors="grey", linewidth=2)

# Add titles (main and on axis)
plt.xlabel("KPI City")
plt.ylabel("(KPI Climate + KPI Water)/2 ")
plt.title("Intersection of KPIs between cities and corporations")


# This formatter replaces each integer code with its region name.
# The codes index .cat.categories (not .unique()), and tick values must be cast to int.
formatter = plt.FuncFormatter(lambda val, loc: df['CDP Region'].cat.categories[int(val)])

# We must be sure to specify the ticks matching our target names
plt.colorbar(ticks=[0, 1, 2, 3, 4, 5, 6, 7], format=formatter);

# Set the clim so that labels are centered on each block
plt.clim(-0.5, 7.5)


# Marker area is Population/5000, so convert each legend area back to millions of inhabitants
for area in [750, 1500, 3000]:
    plt.scatter([], [], c='k', alpha=0.4, s=area, label=f"     {area * 5000 / 1e6:g} M")
plt.legend(scatterpoints=1, frameon=False, labelspacing=2, title='Population')

Show = False

The graph above suggests that environmental issues are best taken into account in African countries, followed by Southeast Asia and Oceania. The clusters are explained by the large number of cities located in the United States or in Brazil. City KPIs vary widely, whereas corporate KPIs are computed per country, so all cities of a country share the same value on the vertical axis.
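
The horizontal bands can be reproduced on a toy example. The sketch below uses made-up values and hypothetical `toy_` names: merging a country-level table with a city-level table repeats the corporate KPIs for every city of that country.

```python
import pandas as pd

# Hypothetical country-level corporate KPIs (one row per country)
toy_corporate = pd.DataFrame({
    "Country": ["Brazil", "United States of America"],
    "KPI climate": [0.62, 0.48],
    "KPI Water": [0.58, 0.52],
})

# Hypothetical city-level KPIs (several rows per country)
toy_cities = pd.DataFrame({
    "City": ["City 1", "City 2", "City 3", "City 4", "City 5"],
    "Country": ["Brazil", "Brazil", "Brazil",
                "United States of America", "United States of America"],
    "KPI City": [0.15, 0.55, 0.90, 0.35, 0.80],
})

# The merge repeats each country's corporate row for every city in that country
toy_df = toy_corporate.merge(toy_cities, how="outer", on="Country")
toy_df["corporate_mean"] = (toy_df["KPI climate"] + toy_df["KPI Water"]) / 2

# Each country has several distinct x values but a single y value
print(toy_df.groupby("Country")[["KPI City", "corporate_mean"]].nunique())
```

Countries with many responding cities therefore appear as a row of points at a single height.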

# <a id='4.2'>4.2 Intersection between environmental and social issues</a>

In [None]:
df2 = pd.merge(df, Vulnerability, how = 'outer', on = ['Country'])
df2.sort_values(by = 'CDP Region', inplace = True)
df2['CDP Region']=pd.Categorical(df2['CDP Region'])
 
# create data
x = df2['Population living below the national income poverty line']
y =  df2['KPI climate']
z = df2['Population']/5000

# plot
plt.rcParams["legend.title_fontsize"] = "large"
plt.rcParams["font.size"] = 20
plt.rcParams["font.sans-serif"] = 'Arial'

plt.figure(figsize=(22, 10))
# plt.get_cmap replaces plt.cm.get_cmap, which was removed in recent matplotlib releases
plt.scatter(x, y, s=z, c=df2['CDP Region'].cat.codes, cmap=plt.get_cmap('RdYlBu', 8), alpha=0.6, edgecolors="grey", linewidth=2)

# Add titles (main and on axis)
plt.xlabel("Population living below the national income poverty line")
plt.ylabel("KPI Climate")
plt.title("Intersection between environmental and social issues")


# This formatter replaces each integer code with its region name.
# The codes index .cat.categories (not .unique()), and tick values must be cast to int.
formatter = plt.FuncFormatter(lambda val, loc: df2['CDP Region'].cat.categories[int(val)])

# We must be sure to specify the ticks matching our target names
plt.colorbar(ticks=[0, 1, 2, 3, 4, 5, 6, 7], format=formatter);

# Set the clim so that labels are centered on each block
plt.clim(-0.5, 7.5)


# Marker area is Population/5000, so convert each legend area back to millions of inhabitants
for area in [750, 1500, 3000]:
    plt.scatter([], [], c='k', alpha=0.4, s=area, label=f"     {area * 5000 / 1e6:g} M")
plt.legend(scatterpoints=1, frameon=False, labelspacing=2, title='Population')

Show = False

The figure above confirms the previous findings. Corporations appear to be engaged in social issues: their climate KPI tends to be higher in countries with a high level of poverty. Cities and corporations that already face these global challenges today are also the ones that have prepared for them. Those that are not yet involved should draw up plans of their own.

# <a id='5'>5. Conclusion</a>

We have succeeded in building KPIs based on CDP data.
We showed that the three main indicators $KPI_{City}$, $KPI_{climate}$ and $KPI_{Water}$ assess the readiness of cities and companies to deal with environmental risks and social issues. It also appears that many cities and corporations are well aware of environmental and social issues.
In addition, the corporations with the highest scores are those operating in countries where social issues are the most pressing.

If we were to answer the title question, the answer would be as follows:

Most cities are prepared for environmental and social issues and have a plan for the future. Cities or corporations that are not yet or only minimally concerned with these issues should prepare plans in case the problems accelerate.

In this notebook, we chose to build KPIs over the whole database. It may be interesting to explore other elements of the dataset:
- Observe the evolution of the performance indicators over the available years.
- Create more accurate indicators using a specific section of the dataset.
- Analyze performance indicator values for specific cities or corporations.

# References
1. [CDP: Unlocking Climate Solutions Dataset](https://www.kaggle.com/c/cdp-unlocking-climate-solutions/data)

2. [Cities 2020 for CDP](https://www.kaggle.com/jbrans/cities-2020-for-cdp): a database of the cities contained in the CDP data with corrected population and location, by [me](https://www.kaggle.com/jbrans).

3. [Country code](https://www.kaggle.com/koki25ando/country-code): country ISO codes, by [thatdataanalyst](https://www.kaggle.com/koki25ando).

4. [Compiled datasets for CDP Analytics Competition](https://www.kaggle.com/seraphimstreets/environmentequity-starterpack): many indicators compiled by [seraphimstreets](https://www.kaggle.com/seraphimstreets), obtained through the [World Bank](https://www.worldbank.org/).

## Thank you for reading so far. Do leave a vote if you liked it ✋