# Measuring Urban Climate Equity

Brooks Jessup <br>
*University of California, Berkeley*

![Cities Today](https://cities-today.com/wp-content/uploads/2018/03/34043106024_4f811f400f_k.jpg)

# Executive Summary

This notebook analyzes the current state of climate equity in cities around the world. Based on data reported by more than 500 city governments to CDP (formerly the Carbon Disclosure Project) in 2020, I develop key performance indicators (KPIs) to measure how well cities are incorporating social equity and inclusion into their responses to climate change. I find that cities are becoming increasingly aware that climate hazards affect some of their residents disproportionately, but this growing awareness has not yet translated into a corresponding increase in action aimed at mitigating the impact on these vulnerable populations.

# Table of Contents

1. **Introduction**
2. **The CDP Data**
3. **Climate Hazards**
4. **Climate Actions**
5. **Climate Equity**
6. **Conclusion**

# 1. Introduction

In cities around the world, some people are more vulnerable than others to the growing impact of climate change. When heat waves, hurricanes, flash floods, and other types of climate hazards strike urban communities, the hardest hit are usually low-income households, minorities, the elderly, and other marginalized groups, who are both highly exposed and lacking the resources to respond effectively. To adapt to climate change in ways that are socially inclusive and equitable, cities therefore need to prioritize the protection of these vulnerable populations. However, while some city governments are starting to lead the way, most have yet to fully incorporate social equity considerations into their climate plans and actions.([1](http://wrirosscities.org/research/publication/how-tackle-climate-change-and-inequality-jointly-cities))

The issue of urban climate equity is the focus of a recent Kaggle competition, [CDP: Unlocking Climate Solutions](https://www.kaggle.com/c/cdp-unlocking-climate-solutions). The competition was hosted by [CDP](https://www.cdp.net/en), an international non-profit organization that runs a global disclosure system for cities and companies to report their environmental impacts. Through this disclosure system, CDP has amassed the most comprehensive collection of self-reported environmental data in the world. For the Kaggle competition, the organization made a large amount of its most recent data [publicly available](https://www.kaggle.com/c/cdp-unlocking-climate-solutions/data) and challenged participants to develop Key Performance Indicators (KPIs) to help cities adapt to climate change in ways that are socially equitable.

The present notebook takes up CDP’s challenge by developing a climate equity score for individual cities based on the public data. My approach focuses on measuring the *efforts* that cities are making to incorporate social equity into their climate responses, rather than the *outcomes* of such efforts. This approach is in keeping with the nature of the CDP data, which is voluntarily self-reported by city leaders without independent verification. For example, a lack of data reported on how climate hazards are negatively affecting vulnerable populations in a particular city is less likely to indicate a positive *outcome* of inclusive climate policies than to reflect a poor *effort* by that city to incorporate social equity into its climate impact assessments. Accordingly, the central question addressed by this notebook is: *How well are cities incorporating social equity into their climate adaptation efforts?*

First, the notebook discusses the scope and limitations of the CDP data in more detail. Second, it explores what the data can tell us about the climate hazards that are currently affecting cities worldwide, with specific focus on how these hazards impact vulnerable populations. Third, it examines the information that cities report about the actions they are taking in response to climate hazards, again with an emphasis on how these actions aim to protect vulnerable populations. Finally, it combines the foregoing analysis of climate hazards and actions into a composite score for each individual city and extracts insights about the current state of urban climate equity across the world. The notebook concludes with a summary of the findings and next steps for future study.

In [None]:
# Import libraries

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns

# 2. The CDP Data

CDP collects environmental data from cities through its [global disclosure system](http://www.cdp.net/en/guidance/guidance-for-cities). Every year the organization invites city governments to reply to its questionnaire using an online response system. The questions on the questionnaire are revised and updated annually. All responses are completely voluntary and there is no minimum amount of data required for submission. Cities can opt to make their responses non-public if they do not want their data to be identified in CDP’s reports and open datasets.

The present notebook focuses on the most recent data that CDP collected using its [2020 questionnaire for cities](https://www.kaggle.com/c/cdp-unlocking-climate-solutions/data). The 2020 questionnaire asks for general information about the city and more specific environmental data in 11 categories: Governance and Data Management, Climate Hazards and Vulnerability, Adaptation, City-wide Emissions, Emissions Reduction, Opportunities, Energy, Transportation, Food, Waste, and Water Security. The dataset of public responses to the 2020 questionnaire provided by CDP consists of 869,313 rows, each holding a distinct response by a city to a question on the questionnaire.
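Because each row holds a single answer, every extraction step later in this notebook filters on the same identifying columns (`Year Reported to CDP`, `Question Number`, `Column Number`, `Row Number`, `Response Answer`). The notebook inlines that filter each time; a minimal sketch of a reusable helper is shown below. The function name `get_answers` and the toy rows are hypothetical, for illustration only.

```python
import pandas as pd

def get_answers(df, question, column, year=2020):
    """Return the non-empty answers to one column of one questionnaire table.

    Hypothetical helper: assumes the long-format layout of the CDP cities
    dataset, where each row holds a single answer identified by question,
    column, and row number.
    """
    mask = (
        (df["Year Reported to CDP"] == year)
        & (df["Question Number"] == question)
        & (df["Column Number"] == column)
        & df["Response Answer"].notna()
    )
    cols = ["Account Number", "Row Number", "Response Answer"]
    # Sort by city, then by row, so answers appear in the order they were listed
    return df.loc[mask, cols].sort_values(cols[:2]).reset_index(drop=True)

# Toy long-format data (invented values, not CDP responses)
toy = pd.DataFrame({
    "Account Number": [1, 1, 1, 2],
    "Year Reported to CDP": [2020, 2020, 2020, 2020],
    "Question Number": ["2.1", "2.1", "3.0", "2.1"],
    "Column Number": [1, 1, 2, 1],
    "Row Number": [2, 1, 1, 1],
    "Response Answer": ["Drought", "Heat wave", "Tree planting", None],
})

result = get_answers(toy, "2.1", 1)
print(result)
```

On the real data, a call such as `get_answers(fc_df, "2.1", 1)` would play the role of the first filter in the hazards extraction cell.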

In [None]:
# Load the data from the 2020 Questionnaire

fc_df = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Responses/2020_Full_Cities_Dataset.csv")

print(fc_df.shape)
fc_df.head()

The 2020 CDP dataset includes public responses from 566 different cities. Each “city” is represented by the organization that responded to the questionnaire, typically the local city government. A “city” in the CDP dataset is therefore defined by administrative boundaries that may encompass more or less than the full extent of the local urban community. For example, if an urban area is governed by more than one city government, the CDP data may cover only one of those jurisdictions (e.g. "Kansas City" refers only to Kansas City, Missouri, not to the neighboring Kansas City, Kansas).

In [None]:
# Read in the general data about the cities that responded to the 2020 questionnaire

cities_df = pd.read_csv("../input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing/2020_Cities_Disclosing_to_CDP.csv")
print(cities_df.shape)
cities_df.head()

In [None]:
# In this data, "Kansas City" includes Kansas City, MO, but not Kansas City, KS.

cities_df.loc[cities_df["Organization"] == "Kansas City"]

In terms of geographic location, the cities in the dataset are spread out across all the regions of the world. However, the distribution is uneven: Europe and the Americas are much better represented than Asia and Africa.

In [None]:
# Read in the cleaned geospatial data provided by Kaggle (shabou) and merge.
# Note: The original geospatial data in "City Location" has many
# missing values and errors.

# Read in the cleaned geospatial dataset from Kaggle
city_coords = pd.read_csv("../input/cdp-challenge-cities-geolocation-data/CDP-Cities-goegraphical-coordinates.csv")

# Change the key column name to facilitate merge
city_coords.rename(columns={"Account.Number": "Account Number"}, inplace=True)

# Merge the geospatial coordinates into the cities dataframe
# (an inner join on the explicit key: cities without coordinates are dropped)
cities_df = pd.merge(cities_df, city_coords[["Account Number", "lat", "long"]],
                     on="Account Number")

# Convert to a GeoDataFrame of longitude/latitude points in WGS 84 (EPSG:4326)
cities_gdf = gpd.GeoDataFrame(cities_df,
                              geometry=gpd.points_from_xy(cities_df["long"], cities_df["lat"]),
                              crs="EPSG:4326")

# Drop the old city location column
cities_gdf.drop(columns="City Location", inplace=True)

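# Import a world map
# Note: geopandas.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this line requires geopandas < 1.0
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))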

# Remove Antarctica from the map
world = world[(world.pop_est>0) & (world.name!="Antarctica")]

# Plot the city locations on top of the world map
fig, ax = plt.subplots(figsize=(20,20))
ax.set_title("Geographical Distribution of Cities in the 2020 CDP Dataset")
ax.set_aspect('equal')
world.plot(ax=ax, color='lightgrey', edgecolor='white')
cities_gdf.plot(ax=ax, marker='o', color='red', markersize=15);
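
`pd.merge` performs an inner join by default, so any city without a match in the geolocation file is silently dropped from `cities_df`, and therefore also from the map and the regional counts below. A minimal sketch of how one might check for such losses with the `indicator` argument, shown on hypothetical toy frames standing in for `cities_df` and `city_coords`:

```python
import pandas as pd

# Hypothetical stand-ins for cities_df and city_coords (invented values)
cities = pd.DataFrame({"Account Number": [101, 102, 103],
                       "Organization": ["City A", "City B", "City C"]})
coords = pd.DataFrame({"Account Number": [101, 103],
                       "lat": [40.7, 51.5], "long": [-74.0, -0.1]})

# A left join keeps every city; indicator=True adds a "_merge" column
# recording whether each row found a match in coords
checked = pd.merge(cities, coords, on="Account Number", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "Organization"]
print(f"{len(missing)} of {len(cities)} cities lack coordinates: {list(missing)}")
```

Running the same check before the inner merge would show how many of the 566 cities, if any, are missing from the map.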

In [None]:
# Visualize the distribution of cities by region

cities_by_region = cities_df["CDP Region"].value_counts(ascending=True)

plt.barh(y = cities_by_region.keys(), width = cities_by_region.values)
plt.title('Cities by Region');

In terms of population size, the dataset includes a wide range from some of the largest cities in the world down to very small communities. Some of these small communities might not conventionally be considered cities.

# 3. Climate Hazards

This section of the notebook examines what the CDP data can tell us about the climate hazards that are currently affecting cities worldwide, with a focus on how these hazards impact vulnerable populations.

## Extraction & Transformation
Cities provided information about the climate hazards they face in their responses to question 2.1 on the 2020 questionnaire: 

> "Please list the most significant climate hazards faced by your city and indicate the probability and consequence of these hazards, as well as the expected future change in frequency and intensity. Please also select the most relevant assets or services that are affected by the climate hazard and provide a description of the impact."

I extracted the most relevant information from the responses to this part of the survey and summarized it in a new dataset with one row for each distinct hazard reported and the following fields of information about those hazards:
* **Hazard Number** (int): Identifying number for each hazard reported by a given city (e.g. if a city reported 5 hazards, they are identified as hazards 1 through 5).
* **Hazard Type** (string): Category for the type of hazard reported, selected from a list of standard categories provided by CDP (e.g. drought, fire).
* **Risk to VPs** (1/0): Whether or not the city reported that the hazard is expected to increase the risk to already vulnerable populations.
* **Vulnerable Populations** (1/0): A series of columns indicating which vulnerable populations the city identified as being affected by the hazard, selected from a standard list provided by CDP (e.g. women & girls, elderly).
* **Total VPs Affected** (int): The total number of vulnerable populations identified by the city as being affected by the hazard.
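The vulnerable-population columns and their total come from one-hot encoding the multi-select answers and then summing per hazard. A toy sketch of that pattern follows, with invented account numbers and answers. It mirrors the `get_dummies`/`groupby` steps in the next cell but is not run on the CDP data.

```python
import pandas as pd

# Toy multi-select answers: one row per selected vulnerable population (invented)
answers = pd.DataFrame({
    "Account Number": [1, 1, 1, 2],
    "Row Number": [1, 1, 2, 1],
    "Response Answer": ["Elderly", "Low-income households", "Elderly", None],
})

# One 0/1 column per answer; a row with a missing answer becomes all zeros
dummies = pd.get_dummies(answers, columns=["Response Answer"],
                         prefix="", prefix_sep="", dtype=int)

# Collapse to one row per hazard (city, row number) and add a total
per_hazard = dummies.groupby(["Account Number", "Row Number"]).sum()
per_hazard["Total VPs Affected"] = per_hazard.sum(axis=1)
print(per_hazard)
```

Passing `dtype=int` keeps the dummy columns numeric; recent pandas versions otherwise return booleans, which still sum correctly but print less clearly.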

In [None]:
# Create a new dataframe to hold the hazards reported by each city (in 2020)

# Extract the data
hazards_df = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                       (fc_df["Question Number"] == "2.1") &
                       (fc_df["Column Number"] == 1) &
                       (~fc_df["Response Answer"].isna()),
                       ["Account Number", "Organization", "Country", "CDP Region", "Year Reported to CDP", "Row Number", "Response Answer"]
                      ].sort_values(by=["Account Number", "Row Number"])

# Rename the columns
hazards_df.rename(columns={"Year Reported to CDP": "Year", "Response Answer": "Hazard"}, inplace=True)

# Add a column to indicate whether the hazards increased risk to already vulnerable populations.

# Extract the data from the survey
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "2.1") &
                 (fc_df["Column Number"] == 5) &
                 (fc_df["Response Answer"] == "Increased risk to already vulnerable populations"),
                 ["Account Number", "Row Number", "Response Answer"]
                ].groupby(["Account Number", "Row Number"])["Response Answer"].count()

# Merge the data into the df
hazards_df = pd.merge(hazards_df, data, how="left", on=["Account Number", "Row Number"])

# Give the column a more descriptive name
hazards_df.rename(columns={"Response Answer": "Risk to VPs"}, inplace=True)

# Fill in missing values with 0 and convert the entire column to integers
hazards_df["Risk to VPs"] = hazards_df["Risk to VPs"].fillna(value=0)
hazards_df["Risk to VPs"] = hazards_df["Risk to VPs"].astype(int)

# Add columns for the vulnerable populations impacted by the hazards.

# Extract the data from the survey (.copy() avoids modifying a view of fc_df)
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "2.1") &
                 (fc_df["Column Number"] == 7),
                 ["Account Number", "Row Number", "Response Answer"]
                ].copy()

# Simplify all "Other..." responses to "Other" (missing answers are left as-is)
data.loc[data["Response Answer"].str.startswith("Other", na=False),
         "Response Answer"] = "Other"

# Dummy the new column
data = pd.get_dummies(data, columns=["Response Answer"], prefix="", prefix_sep="")

# Aggregate the data for each distinct hazard
data = data.groupby(["Account Number", "Row Number"]).sum()

# Add a column for the total number of VPs affected
data["Total VPs Affected"] = data.sum(axis=1)

# Merge the data into the df
hazards_df = pd.merge(hazards_df, data, how="left", on=["Account Number", "Row Number"])

# Clean up the column names
hazards_df.rename(columns={"Row Number": "Hazard Number", "Hazard": "Hazard Type"}, inplace=True)

print(hazards_df.shape)
hazards_df.head()

## Exploratory Analysis

### How many climate hazards does each city face on average?
* The 566 cities that responded to the CDP questionnaire reported **a total of 2,703 climate hazards**. Individual cities therefore report facing **an average of between four and five climate hazards**.
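The figure of four to five hazards per city divides the total by all 566 responding cities. If some cities responded but listed no hazards, the average among cities that did report hazards is somewhat higher. A minimal sketch computing both on a toy table (all numbers invented):

```python
import pandas as pd

# Toy hazards table: one row per reported hazard (invented values)
hazards = pd.DataFrame({"Account Number": [1, 1, 1, 2, 2, 3]})
n_responding_cities = 4  # suppose city 4 responded but listed no hazards

# Number of hazards reported by each city that reported at least one
per_city = hazards.groupby("Account Number").size()

print("mean over all responding cities:", len(hazards) / n_responding_cities)
print("mean over cities reporting hazards:", per_city.mean())
print(per_city.describe())
```

On the real data, `hazards_df["Account Number"].nunique()` gives the number of cities that reported at least one hazard, which is the denominator for the second average.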

### Which types of hazards do cities most commonly face?
* The most commonly reported hazards are **rain storms, droughts, heat waves, and flash floods**. These top hazards are reported much more frequently than other types of hazards.
* The surprisingly low frequency of **monsoons** (at the very bottom of the list) appears to reflect the fact that Asian cities are underrepresented in the CDP data.

In [None]:
# Which hazards do cities most commonly face?

most_common_hazards = hazards_df.groupby("Hazard Type")["Hazard Number"].count().sort_values(ascending=True)

plt.figure(figsize=(10,10))
plt.barh(y = most_common_hazards.keys(), width = most_common_hazards.values)
plt.title('Number of Hazards Reported (2020)');

### What percentage of hazards do cities report are increasing the risk to vulnerable populations?
* Cities reported that **55.5%** of all climate hazards increase the risk to already vulnerable populations (1,500 out of 2,703).

In [None]:
# What percentage of hazards do cities report are increasing the risk to vulnerable populations?

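total = hazards_df["Risk to VPs"].count()
risk = hazards_df["Risk to VPs"].sum()
perc = round((risk/total)*100, 1)
print(f"{perc}% of hazards ({risk} of {total}) increase the risk to vulnerable populations")

plt.pie(x = [risk, total-risk], labels=["risk", "no risk"], autopct="%1.1f%%");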

### Which types of hazards most frequently affect vulnerable populations?

* Cities reported that **cyclones, heat waves, and cold waves** are the hazards that most frequently affect vulnerable populations.
* More generally, the data suggests that almost all types of hazards can and do impact vulnerable populations.

In [None]:
# Which hazards most frequently affect already vulnerable populations?

freq_impact_on_vps = hazards_df.groupby("Hazard Type")["Risk to VPs"].mean().sort_values(ascending=True)

plt.figure(figsize=(10,10))
plt.barh(y = freq_impact_on_vps.keys(), width = freq_impact_on_vps.values)
plt.title('Frequency of Impact on Vulnerable Populations (2020)');

### Which types of hazards impact the largest total number of vulnerable populations?
* Cities reported that **heat waves, rain storms, droughts, extreme hot days, and flash floods** impact the most vulnerable populations.

In [None]:
# Which hazards have the greatest total impact on vulnerable populations?

most_impact_on_vps = hazards_df.groupby("Hazard Type")["Total VPs Affected"].sum().sort_values(ascending=True)

plt.figure(figsize=(10,10))
plt.barh(y = most_impact_on_vps.keys(), width = most_impact_on_vps.values)
plt.title('Number of Vulnerable Populations Affected by Hazard Type (2020)');

### Which types of hazards impact the highest average number of vulnerable populations per incident?
* Cities report that **extreme hot days, heat waves, extreme winter conditions, and cyclones** impact the highest number of vulnerable populations per incident.

In [None]:
# Which hazards have the greatest average impact on vulnerable populations?

most_impact_on_vps = hazards_df.groupby("Hazard Type")["Total VPs Affected"].mean().sort_values(ascending=True)

plt.figure(figsize=(10,10))
plt.barh(y = most_impact_on_vps.keys(), width = most_impact_on_vps.values)
plt.title('Average Number of Vulnerable Populations Affected per Incident (2020)');

### Which vulnerable populations are most frequently affected by climate hazards overall?
* Cities report that **the elderly, low-income households, and persons living in sub-standard housing** are most frequently affected by climate hazards. This makes sense given that the hazards most commonly affecting vulnerable populations are extreme weather conditions like heat waves.

In [None]:
# Which vulnerable populations are most affected by climate hazards overall?

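# Select the VP columns by name: those between "Risk to VPs" and "Total VPs Affected"
most_affected_vps = hazards_df.loc[:, "Risk to VPs":"Total VPs Affected"].iloc[:, 1:-1].sum().sort_values(ascending=True)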

plt.figure(figsize=(10,5))
plt.barh(y = most_affected_vps.keys(), width = most_affected_vps.values)
plt.title('Number of Vulnerable Populations Affected by Hazards (2020)');

### Which vulnerable populations are most frequently affected by specific types of hazards, such as forest fires?
* Cities report that **people with chronic diseases and the elderly** are most frequently affected by forest fires. This is plausible, since smoke from forest fires is particularly harmful to people with respiratory conditions such as asthma.

In [None]:
# Which vulnerable populations does each type of hazard affect the most?
# Example 1: Forest fires

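# Select the VP columns by name: those between "Risk to VPs" and "Total VPs Affected"
forest_fires = hazards_df.loc[hazards_df["Hazard Type"] == "Wild fire > Forest fire", "Risk to VPs":"Total VPs Affected"].iloc[:, 1:-1].sum().sort_values(ascending=True)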

plt.figure(figsize=(10,5))
plt.barh(y = forest_fires.keys(), width = forest_fires.values)
plt.title('Number of Vulnerable Populations Affected by Forest Fires (2020)');

### Which vulnerable populations are most frequently affected by cyclones (hurricanes and typhoons)?
* Cities report that **the elderly and low-income households** are most frequently affected by cyclones.

In [None]:
# Which vulnerable populations does each type of hazard affect the most?
# Example 2: Cyclones (hurricanes and typhoons)

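# Select the VP columns by name: those between "Risk to VPs" and "Total VPs Affected"
cyclones = hazards_df.loc[hazards_df["Hazard Type"] == "Storm and wind > Cyclone (Hurricane / Typhoon)", "Risk to VPs":"Total VPs Affected"].iloc[:, 1:-1].sum().sort_values(ascending=True)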

plt.figure(figsize=(10,5))
plt.barh(y = cyclones.keys(), width = cyclones.values)
plt.title('Number of Vulnerable Populations Affected by Cyclones (2020)');
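
Instead of filtering one hazard type at a time, the full hazard-by-population table can be built with a single `groupby`. A minimal sketch on toy data follows, assuming (as in `hazards_df`) that the vulnerable-population columns sit between `Risk to VPs` and `Total VPs Affected`; the hazard labels and counts are invented.

```python
import pandas as pd

# Toy version of hazards_df (invented values)
hazards = pd.DataFrame({
    "Hazard Type": ["Forest fire", "Forest fire", "Cyclone"],
    "Risk to VPs": [1, 1, 0],
    "Elderly": [1, 0, 1],
    "Persons with chronic diseases": [1, 1, 0],
    "Total VPs Affected": [2, 1, 1],
})

# Label-based column slice, then drop the two boundary columns
vp_cols = hazards.loc[:, "Risk to VPs":"Total VPs Affected"].columns[1:-1]

# Rows: hazard types; columns: vulnerable populations; values: counts
matrix = hazards.groupby("Hazard Type")[list(vp_cols)].sum()
print(matrix)
```

On the real data, passing such a matrix to `sns.heatmap` would show every hazard and population pairing in one figure.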

## Summary of Climate Hazards

* The main story in the data reported by cities is that of extreme temperatures (particularly heat waves) affecting the elderly. This appears to be the kind of climate impact on vulnerable populations that cities are most aware of.
* However, other stories are buried here as well, such as the impact of forest fires on people with chronic diseases. Cities need to pay more attention to, and report on, these other connections between climate hazards and vulnerable populations.

# 4. Climate Actions

This section of the notebook examines what the CDP data can tell us about the actions that cities are taking in response to climate hazards, with a focus on how these actions aim to protect vulnerable populations.

## Extraction & Transformation

Cities provided information about the actions they are taking to mitigate climate hazards in their responses to question 3.0 on the 2020 questionnaire: 
> “Please describe the main actions you are taking to reduce the risk to, and vulnerability of, your city’s infrastructure, services, citizens, and businesses from climate change as identified in the Climate Hazards section.” 

I extracted the most relevant information from the responses to this part of the survey and summarized it in a new dataset with one row for each distinct action reported and the following fields of information about those actions:
* **Action Number** (int): Identifying number for each action reported by a given city (e.g. if a city reported 5 actions, they are identified as actions 1 through 5, even if two of these actions respond to one hazard and the other three to another hazard).
* **Action Type** (string): Category for the type of action reported, drawn from a given list of possible answers (e.g. flood mapping, community engagement). Note that one of the possible categories is “Projects and policies targeted at those most vulnerable”.
* **Hazard Type** (string): Category for the type of hazard that the action responds to. These categories are the same as those used for the hazards reported in question 2.1.
* **Poverty Reduction** (1/0): Whether or not the city reported “Poverty reduction/eradication” as a co-benefit area of the action.
* **Social Inclusion** (1/0): Whether or not the city reported “Social inclusion, social justice” as a co-benefit area of the action.
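The two co-benefit columns are built by the same filter, count, and merge sequence, repeated once per co-benefit. A minimal sketch of an alternative that flags several co-benefits in one pass with `isin` and `pd.crosstab` follows. The toy rows are invented (including the answer "Improved public health"); the two co-benefit strings match those used in the code cell below.

```python
import pandas as pd

# Toy answers to the co-benefit column: one row per selected co-benefit (invented)
cobenefits = pd.DataFrame({
    "Account Number": [1, 1, 1, 2],
    "Row Number": [1, 1, 2, 1],
    "Response Answer": ["Poverty reduction / eradication",
                        "Social inclusion, social justice",
                        "Improved public health",
                        "Social inclusion, social justice"],
})
# Toy actions table: one row per action
actions = pd.DataFrame({"Account Number": [1, 1, 2, 2],
                        "Row Number": [1, 2, 1, 2]})

# Map the answer strings to the column names used in actions_df
wanted = {"Poverty reduction / eradication": "Poverty Reduction",
          "Social inclusion, social justice": "Social Inclusion"}
flag_cols = list(wanted.values())

subset = cobenefits[cobenefits["Response Answer"].isin(list(wanted))]

# Count answers per (city, action), cap at 1, and make sure both columns exist
flags = (pd.crosstab([subset["Account Number"], subset["Row Number"]],
                     subset["Response Answer"])
         .clip(upper=1)
         .rename(columns=wanted)
         .reindex(columns=flag_cols, fill_value=0)
         .reset_index())

# Left-merge so actions without any listed co-benefit get 0
actions = actions.merge(flags, how="left", on=["Account Number", "Row Number"])
actions[flag_cols] = actions[flag_cols].fillna(0).astype(int)
print(actions)
```

Adding another co-benefit then only requires a new entry in `wanted`, rather than another copy of the extraction block.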


In [None]:
# Create a new dataframe to hold the actions reported by each city (in 2020)

# Extract the data
actions_df = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                       (fc_df["Question Number"] == "3.0") &
                       (fc_df["Column Number"] == 2) &
                       (~fc_df["Response Answer"].isna())&
                       (fc_df["Response Answer"] != "No action currently taken"),
                       ["Account Number", "Organization", "Country", "CDP Region", "Year Reported to CDP", "Row Number", "Response Answer"]
                      ].sort_values(by=["Account Number", "Row Number"])

# Simplify all "Other..." responses to "Other"
actions_df.loc[(~actions_df["Response Answer"].isna()) &
               (actions_df["Response Answer"].str.startswith("Other")),
               "Response Answer"] = "Other"

# Rename the columns
actions_df.rename(columns={"Year Reported to CDP": "Year", "Response Answer": "Action"}, inplace=True)

# Add a column for the hazards targeted by the actions.

# Extract the data from the survey
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "3.0") &
                 (fc_df["Column Number"] == 1),
                 ["Account Number", "Row Number", "Response Answer"]
                ]

# Merge the data into the df
actions_df = pd.merge(actions_df, data, how="left", on=["Account Number", "Row Number"])

# Give the column a more descriptive name
actions_df.rename(columns={"Response Answer": "Hazard"}, inplace=True)

# Add a column to indicate whether the action benefits poverty reduction.

# Extract the data from the survey
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "3.0") &
                 (fc_df["Column Number"] == 6) &
                 (fc_df["Response Answer"] == "Poverty reduction / eradication"),
                 ["Account Number", "Row Number", "Response Answer"]
                ].groupby(["Account Number", "Row Number"])["Response Answer"].count()

# Merge the data into the df
actions_df = pd.merge(actions_df, data, how="left", on=["Account Number", "Row Number"])

# Give the column a more descriptive name
actions_df.rename(columns={"Response Answer": "Poverty Reduction"}, inplace=True)

# Clean up the values in the new column
actions_df["Poverty Reduction"] = actions_df["Poverty Reduction"].fillna(value=0)
actions_df["Poverty Reduction"] = actions_df["Poverty Reduction"].astype(int)

# Add a column to indicate whether the action benefits social inclusion.

# Extract the data from the survey
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "3.0") &
                 (fc_df["Column Number"] == 6) &
                 (fc_df["Response Answer"] == "Social inclusion, social justice"),
                 ["Account Number", "Row Number", "Response Answer"]
                ].groupby(["Account Number", "Row Number"])["Response Answer"].count()

# Merge the data into the df
actions_df = pd.merge(actions_df, data, how="left", on=["Account Number", "Row Number"])

# Give the column a more descriptive name
actions_df.rename(columns={"Response Answer": "Social Inclusion"}, inplace=True)

actions_df["Social Inclusion"] = actions_df["Social Inclusion"].fillna(value=0)
actions_df["Social Inclusion"] = actions_df["Social Inclusion"].astype(int)

# Clean up the column names

actions_df.rename(columns={"Row Number": "Action Number", "Action": "Action Type", "Hazard": "Hazard Type"}, inplace=True)

print(actions_df.shape)
actions_df.head()

## Exploratory Analysis

### How many actions, on average, are cities taking in response to climate hazards?
* The 566 cities that responded to the CDP questionnaire reported a total of 2,629 climate actions. Individual cities therefore report taking an average of **four to five distinct actions, or roughly one action per reported hazard**.

### What are the most common types of actions reported by cities?
* Cities reported that **flood mapping, tree planting, and community education** were the most common types of actions taken to mitigate climate hazards.
* Note that the largest category of reported actions is **'Other'**. This may suggest that cities are developing innovative actions that do not fit into conventional categories.

In [None]:
# What are the most common types of actions that cities take to mitigate climate hazards?

most_common_actions = actions_df.groupby("Action Type")["Action Number"].count().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = most_common_actions.keys(), width = most_common_actions.values)
plt.title('Number of Actions Reported per Action Type (2020)');

### Which types of hazards did cities respond to with the most actions?
* Cities are responding most often to **rain storms, heat waves, flash floods, droughts, and extreme hot days**.
* This parallels the most common hazards faced by cities (see above).

In [None]:
# Which types of hazards had the most actions in response to them?

most_targetted_hazards = actions_df.groupby("Hazard Type")["Action Number"].count().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = most_targetted_hazards.keys(), width = most_targetted_hazards.values)
plt.title('Number of Actions Reported per Hazard Type (2020)');

### What percentage of hazard responses include actions aimed at protecting vulnerable populations?
* According to their own reports, only about **4%** of the actions that cities link to a hazard (112 out of 2,536) are projects and policies targeted at the most vulnerable.
* This is far below the 55.5% of hazards that cities report as increasing the risk to already vulnerable populations (see above), although the two figures are not directly comparable: the first is a share of actions, the second a share of hazards.
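The percentage in the next cell divides the number of vulnerable-population actions by the number of actions linked to a hazard, so it is a share of actions rather than of hazards. A minimal sketch of a hazard-level version follows: for each (city, hazard type) pair reported in question 2.1, it asks whether at least one action of type "Projects and policies targeted at those most vulnerable" responds to it. The toy rows are invented, and treating each distinct hazard type as one hazard per city is an assumption.

```python
import pandas as pd

VP_ACTION = "Projects and policies targeted at those most vulnerable"

# Toy tables (invented values): hazards reported in Q2.1, actions in Q3.0
hazards = pd.DataFrame({"Account Number": [1, 1, 2, 2],
                        "Hazard Type": ["Heat wave", "Drought", "Heat wave", "Flash flood"]})
actions = pd.DataFrame({"Account Number": [1, 1, 1, 2],
                        "Hazard Type": ["Heat wave", "Heat wave", "Drought", "Heat wave"],
                        "Action Type": [VP_ACTION, "Tree planting", "Other", "Other"]})

# One row per (city, hazard type) with at least one VP-targeted action
vp_pairs = (actions.loc[actions["Action Type"] == VP_ACTION, ["Account Number", "Hazard Type"]]
            .drop_duplicates()
            .assign(has_vp_action=1))

# Left-merge onto the hazard list so hazards with no actions at all count as 0
per_hazard = (hazards.drop_duplicates()
              .merge(vp_pairs, how="left", on=["Account Number", "Hazard Type"])
              .fillna({"has_vp_action": 0}))
share = per_hazard["has_vp_action"].mean() * 100
print(f"{share:.1f}% of hazards have at least one VP-targeted action")
```

Using the hazard list as the denominator also counts hazards that received no action at all, which a denominator drawn from `actions_df` cannot see.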

In [None]:
# What percentage of hazard responses include actions targeting the most vulnerable?


vp_actions = (actions_df["Action Type"] == "Projects and policies targeted at those most vulnerable").sum()
# Number of reported actions that are linked to a hazard type
hazard_responses = actions_df["Hazard Type"].count()
percentage = round((vp_actions/hazard_responses)*100)
print(f"{percentage}% of hazard-linked actions ({vp_actions} of {hazard_responses}) target vulnerable populations")

plt.pie(x = [vp_actions, hazard_responses-vp_actions], labels=["vp actions", "no vp actions"]);

### Which types of hazards had the most actions aimed at protecting vulnerable populations?
* Cities report that **heat waves, extreme hot days, rain storms, and flash floods** have the most actions targeting vulnerable populations.
* Although the hazards that cities respond to with actions aimed at vulnerable populations are mostly those that they report to have the greatest impact on vulnerable populations (see above), there are some significant discrepancies. **Cyclones**, for example, are the hazard type that cities report as most frequently affecting vulnerable populations, yet cities rarely respond to cyclones with actions that aim to protect these populations.
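Discrepancies like the one for cyclones can be checked systematically by lining up, for each hazard type, how often cities say it raises the risk to vulnerable populations against how many VP-targeted actions respond to it. A minimal sketch on invented numbers follows, assuming per-hazard Series like `freq_impact_on_vps` and `vp_hazards` from this notebook. The thresholds (0.5 and 10) are arbitrary illustrative choices.

```python
import pandas as pd

# Invented per-hazard summaries standing in for freq_impact_on_vps and vp_hazards
risk_freq = pd.Series({"Cyclone": 0.80, "Heat wave": 0.75, "Drought": 0.40})
vp_action_count = pd.Series({"Heat wave": 30, "Drought": 5})

# Align on hazard type; hazards with no VP-targeted actions get a count of 0
compare = pd.DataFrame({"risk_freq": risk_freq,
                        "vp_actions": vp_action_count}).fillna({"vp_actions": 0})

# Hazards that cities often link to vulnerable populations but rarely target
gaps = compare[(compare["risk_freq"] >= 0.5) & (compare["vp_actions"] < 10)]
print(gaps)
```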

In [None]:
# Which types of hazards had the most projects targeting vulnerable populations?

vp_hazards = actions_df.loc[actions_df["Action Type"] == "Projects and policies targeted at those most vulnerable"
              ].groupby(["Hazard Type"])["Action Type"].count().sort_values(ascending=True)

plt.figure(figsize=(10,10))
plt.barh(y = vp_hazards.keys(), width = vp_hazards.values)
plt.title('Number of Actions Targeting Vulnerable Populations by Hazard Type (2020)');

### Which types of actions are beneficial for poverty reduction?
* Cities report that **projects targeting vulnerable populations** benefit poverty reduction far more than any other type of action.

In [None]:
# Which actions have most benefited poverty reduction?

pov_red = actions_df.groupby("Action Type")["Poverty Reduction"].sum().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = pov_red.keys(), width = pov_red.values)
plt.title('Number of Actions Benefitting Poverty Reduction (2020)');

### Which types of actions most frequently benefit poverty reduction?
* Cities report that **economic diversification measures and promotion of low flow technologies** are types of actions that most frequently have a beneficial impact on poverty reduction.

In [None]:
# Which actions most frequently benefit poverty reduction?

freq_pov_red = actions_df.groupby("Action Type")["Poverty Reduction"].mean().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = freq_pov_red.keys(), width = freq_pov_red.values)
plt.title('Frequency of Benefit to Poverty Reduction (2020)');

### Which types of actions are beneficial to social inclusion?
* Cities report that **projects targeting vulnerable populations** are by far the most beneficial for social inclusion, followed by **community engagement and education**.
* This suggests that community engagement on climate hazards may be an important way for cities to overcome the social marginalization of vulnerable populations.

In [None]:
# Which actions have benefits for social inclusion?

soc_inc = actions_df.groupby("Action Type")["Social Inclusion"].sum().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = soc_inc.keys(), width = soc_inc.values)
plt.title('Number of Actions Benefitting Social Inclusion (2020)');

### Which types of actions most frequently benefit social inclusion?
* Cities report that **mangrove planting, sea level rise adaptation planning, and promotion of low-flow technologies** most frequently benefit social inclusion.
* However, the results for mangrove planting and sea level rise adaptation planning should be treated with caution, because in each case only one action was reported; a frequency based on a single observation says little about the action type in general.
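One way to guard against rankings driven by a single observation is to report the number of actions alongside each frequency and drop categories below a minimum count. A minimal sketch on toy data follows. The threshold of 5 is an arbitrary illustrative choice, not a statistical test.

```python
import pandas as pd

# Toy actions (invented): one mangrove action, six community engagement actions
actions = pd.DataFrame({
    "Action Type": ["Mangrove planting"] + ["Community engagement"] * 6,
    "Social Inclusion": [1, 1, 1, 0, 1, 0, 0],
})

MIN_ACTIONS = 5  # illustrative threshold

# Frequency of the co-benefit and number of actions behind it, per action type
stats = actions.groupby("Action Type")["Social Inclusion"].agg(["mean", "count"])
reliable = stats[stats["count"] >= MIN_ACTIONS].sort_values("mean", ascending=True)
print(stats)
print(reliable)
```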

In [None]:
# Which actions most frequently benefit social inclusion?

freq_soc_inc = actions_df.groupby("Action Type")["Social Inclusion"].mean().sort_values(ascending=True)

plt.figure(figsize=(10,15))
plt.barh(y = freq_soc_inc.keys(), width = freq_soc_inc.values)
plt.title('Frequency of Benefit to Social Inclusion (2020)');

## Summary of Climate Actions

* The main story here is that, according to their own reporting, cities are not focusing their climate actions on mitigating the impact on already vulnerable populations (even when they are aware that climate hazards are increasing the risk to such populations).
* A corollary is that, even when cities take actions aimed at vulnerable populations, these actions are not always responses to the hazards that most frequently affect vulnerable populations, such as cyclones and cold waves (#4 and #5). Because certain types of hazards are the most likely to affect vulnerable populations, cities' responses to those hazards should include actions to mitigate that impact.
* A secondary insight is that cities report certain types of projects as potentially more beneficial for vulnerable populations: economic diversification, low flow technologies, and retrofitting existing buildings (for poverty reduction), and community engagement/education (for social inclusion).

# 5. Climate Equity

This section of the notebook combines the foregoing data on climate hazards and actions into Key Performance Indicators (KPIs) and a composite score for each individual city, then analyzes these scores and compares them with CDP city scores to extract insights about the current state of urban climate equity.

## KPI Modeling

Key Performance Indicators (KPIs) were selected from the foregoing data on climate hazards and actions, then combined into a composite Climate Equity Score for each city. The five selected KPIs were first grouped into two intermediate categories: Awareness and Action. The KPIs in the awareness category measure how well a city identifies and evaluates the impact of climate hazards on vulnerable populations. The KPIs in the action category measure how well a city’s response to climate hazards addresses the impact on vulnerable populations. Combined, the awareness and action KPIs provide a relatively comprehensive measure of how well a city incorporates social equity into its climate adaptation efforts (i.e., the climate equity score).

*Awareness KPIs*:
* **Evaluation of Risk to VPs** (1/0): Whether or not the city has at least one climate risk and vulnerability assessment that identifies vulnerable populations. Note that some cities have multiple assessments – only one of these needs to identify vulnerable populations.
* **Affected VPs per Hazard (Normalized)** (float): Average number of vulnerable populations the city identifies per climate hazard, min-max normalized to the range [0, 1] relative to other cities in the sample.

*Action KPIs*:
* **VP Actions per Hazard (Normalized)** (float): Average number of actions targeting vulnerable populations that the city took per climate hazard, min-max normalized to the range [0, 1] relative to other cities in the sample.
* **PR Actions per Hazard (Normalized)** (float): Average number of actions beneficial to poverty reduction (PR) per climate hazard, min-max normalized to the range [0, 1] relative to other cities in the sample.
* **SI Actions per Hazard (Normalized)** (float): Average number of actions beneficial to social inclusion (SI) per climate hazard, min-max normalized to the range [0, 1] relative to other cities in the sample.
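The weights that combine these KPIs are set in the code cell that follows rather than in the prose above. Written out for a city $c$ with $H_c$ reported hazards, they amount to the following, where $E_c$ is the 0/1 Evaluation of Risk to VPs, $\tilde{v}_c$ is Affected VPs per Hazard (Normalized), the $\tilde{a}_c$ terms are the three normalized action KPIs, $\text{count}_c$ is the relevant raw count (vulnerable populations or actions), and the min and max run over all cities in the sample. The code also rounds each per-hazard average $x_c$ to one decimal before normalizing.

$$
\begin{aligned}
x_c &= \frac{\text{count}_c}{H_c}, \qquad
\tilde{x}_c = \frac{x_c - \min_{c'} x_{c'}}{\max_{c'} x_{c'} - \min_{c'} x_{c'}},\\[4pt]
\text{Awareness}_c &= 0.5\,E_c + 0.5\,\tilde{v}_c,\\
\text{Action}_c &= 0.5\,\tilde{a}^{\mathrm{VP}}_c + 0.25\,\tilde{a}^{\mathrm{PR}}_c + 0.25\,\tilde{a}^{\mathrm{SI}}_c,\\
\text{Score}_c &= 0.5\,\text{Awareness}_c + 0.5\,\text{Action}_c.
\end{aligned}
$$

Because every input lies in $[0, 1]$ and each set of weights sums to 1, the Awareness, Action, and overall scores also lie in $[0, 1]$. For example, a city with $E_c = 1$ and $\tilde{v}_c = 0.5$ has an Awareness score of $0.5 \cdot 1 + 0.5 \cdot 0.5 = 0.75$.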

In [None]:
# Create a new dataframe to hold the KPI values for each distinct city (in the 2020 survey)

# Make the new dataframe with just one row per city
equity_df = pd.DataFrame(fc_df.drop_duplicates(subset=["Account Number"], keep='last', ignore_index=True))

# Drop the columns for survey answers
equity_df.drop(equity_df.columns.difference(["Account Number", "Organization", "Country", "CDP Region"]),
               axis=1, inplace=True)

# Sort the dataframe by account number in ascending order
equity_df.sort_values(by=["Account Number"], ignore_index=True, inplace=True)

# Add a column to indicate whether the city has undertaken *at least one*
# climate change risk assessment that identifies vulnerable populations.

# Extract the data from the survey
data = fc_df.loc[(fc_df["Year Reported to CDP"] == 2020) &
                 (fc_df["Question Number"] == "2.0b") &
                 (fc_df["Column Number"] == 7) &
                 (fc_df["Response Answer"] == "Yes"),
                 ["Account Number", "Response Answer"]
                ].groupby(["Account Number"])["Response Answer"].count()

# Merge the data into the df
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Give the column a more descriptive name
equity_df.rename(columns={"Response Answer": "Evaluation of Risk to VPs"}, inplace=True)

# Clean up the column to 0s and 1s as integers
equity_df["Evaluation of Risk to VPs"] = equity_df["Evaluation of Risk to VPs"].fillna(value=0)
equity_df["Evaluation of Risk to VPs"] = equity_df["Evaluation of Risk to VPs"].astype(int)
equity_df.loc[equity_df["Evaluation of Risk to VPs"] >= 1, "Evaluation of Risk to VPs"] = 1

# Add a column for the total number of hazards reported by each city (in 2020)

# Extract the data
data = hazards_df.groupby("Account Number")["Hazard Number"].count()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Give the column a more descriptive name
equity_df.rename(columns={"Hazard Number":"Total Hazards"}, inplace=True)

# Clean up the missing values and datatype
equity_df["Total Hazards"] = equity_df["Total Hazards"].fillna(value=0)
equity_df["Total Hazards"] = equity_df["Total Hazards"].astype(int)

# Add a column for the number of hazards identified as increasing the risk to vulnerable populations

# Extract the data
data = hazards_df.groupby("Account Number")["Risk to VPs"].sum()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Give the column a more descriptive name
equity_df.rename(columns={"Risk to VPs":"Hazards Affecting VPs"}, inplace=True)

# Clean up the missing values and datatype
equity_df["Hazards Affecting VPs"] = equity_df["Hazards Affecting VPs"].fillna(value=0)
equity_df["Hazards Affecting VPs"] = equity_df["Hazards Affecting VPs"].astype(int)

# Add a column for the total number of vulnerable populations identified as affected by the hazards

# Extract the data
data = hazards_df.groupby("Account Number")["Total VPs Affected"].sum()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Clean up the missing values and datatype
equity_df["Total VPs Affected"] = equity_df["Total VPs Affected"].fillna(value=0)
equity_df["Total VPs Affected"] = equity_df["Total VPs Affected"].astype(int)

# Add a column for the average number of vulnerable populations identified per hazard

equity_df["Affected VPs per Hazard"] = round(equity_df["Total VPs Affected"] / equity_df["Total Hazards"], 1)

# Clean up rows where no hazards were reported
equity_df["Affected VPs per Hazard"] = equity_df["Affected VPs per Hazard"].fillna(value=0)

# Add a column for the NORMALIZED average number of vulnerable populations identified per hazard

equity_df["Affected VPs per Hazard (Normalized)"] = (equity_df["Affected VPs per Hazard"] - equity_df["Affected VPs per Hazard"].min()) / (equity_df["Affected VPs per Hazard"].max() - equity_df["Affected VPs per Hazard"].min())

# Add a column for overall awareness of the impact on vulnerable populations
equity_df["Awareness Score"] = (0.5 * equity_df["Evaluation of Risk to VPs"]) + (0.5 * equity_df["Affected VPs per Hazard (Normalized)"])

# Add a column for the total number of actions targeting vulnerable populations

# Extract the data
data = actions_df.loc[actions_df["Action Type"] == "Projects and policies targeted at those most vulnerable",
                      ["Account Number", "Action Type"]
                     ].groupby("Account Number")["Action Type"].count()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Clean up the missing values and datatype
equity_df["Action Type"] = equity_df["Action Type"].fillna(value=0)
equity_df["Action Type"] = equity_df["Action Type"].astype(int)

# Give the column a more descriptive name
equity_df.rename(columns={"Action Type": "Actions Targeting VPs"}, inplace=True)

# Add a column for the average number of actions targeting vulnerable populations per hazard
equity_df["VP Actions per Hazard"] = round(equity_df["Actions Targeting VPs"] / equity_df["Total Hazards"], 1)

# Clean up rows where no hazards were reported
equity_df["VP Actions per Hazard"] = equity_df["VP Actions per Hazard"].fillna(value=0)

# Identify the rows with an "inf" value (cities that reported VP-targeted actions but no hazards)
# and drop them from the dataframe
index_to_drop = equity_df.loc[np.isinf(equity_df["VP Actions per Hazard"])].index
equity_df.drop(axis=0, index=index_to_drop, inplace=True)

# Add a column for the NORMALIZED average number of actions targeting vulnerable populations per hazard
equity_df["VP Actions per Hazard (Normalized)"] = (equity_df["VP Actions per Hazard"] - equity_df["VP Actions per Hazard"].min()) / (equity_df["VP Actions per Hazard"].max() - equity_df["VP Actions per Hazard"].min())

# Add a column for the total number of actions with benefits for poverty reduction

# Extract the data
data = actions_df.groupby("Account Number")["Poverty Reduction"].sum()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Clean up the missing values and datatype
equity_df["Poverty Reduction"] = equity_df["Poverty Reduction"].fillna(value=0)
equity_df["Poverty Reduction"] = equity_df["Poverty Reduction"].astype(int)

# Give the column a more descriptive name
equity_df.rename(columns={"Poverty Reduction": "Actions Benefitting Poverty Reduction"}, inplace=True)

# Add a column for the average number of actions benefitting poverty reduction per hazard

equity_df["PR Actions per Hazard"] = round(equity_df["Actions Benefitting Poverty Reduction"] / equity_df["Total Hazards"], 1)

# Clean up rows where no hazards were reported
equity_df["PR Actions per Hazard"] = equity_df["PR Actions per Hazard"].fillna(value=0)

# Add a column for the NORMALIZED average
equity_df["PR Actions per Hazard (Normalized)"] = (equity_df["PR Actions per Hazard"] - equity_df["PR Actions per Hazard"].min()) / (equity_df["PR Actions per Hazard"].max() - equity_df["PR Actions per Hazard"].min())

# Add a column for the total number of actions with benefits for social inclusion

# Extract the data
data = actions_df.groupby("Account Number")["Social Inclusion"].sum()

# Merge the data
equity_df = pd.merge(equity_df, data, how="left", on=["Account Number"])

# Clean up the missing values and datatype
equity_df["Social Inclusion"] = equity_df["Social Inclusion"].fillna(value=0)
equity_df["Social Inclusion"] = equity_df["Social Inclusion"].astype(int)

# Give the column a more descriptive name
equity_df.rename(columns={"Social Inclusion": "Actions Benefitting Social Inclusion"}, inplace=True)

# Add a column for the average number of actions benefitting social inclusion per hazard
equity_df["SI Actions per Hazard"] = round(equity_df["Actions Benefitting Social Inclusion"] / equity_df["Total Hazards"], 1)

# Clean up rows where no hazards were reported
equity_df["SI Actions per Hazard"] = equity_df["SI Actions per Hazard"].fillna(value=0)

# Add a column for the NORMALIZED average
equity_df["SI Actions per Hazard (Normalized)"] = (equity_df["SI Actions per Hazard"] - equity_df["SI Actions per Hazard"].min()) / (equity_df["SI Actions per Hazard"].max() - equity_df["SI Actions per Hazard"].min())

# Add a column for overall action to mitigate the impact on VPs
equity_df["Action Score"] = (0.5 * equity_df["VP Actions per Hazard (Normalized)"]) + (0.25 * equity_df["PR Actions per Hazard (Normalized)"]) + (0.25 * equity_df["SI Actions per Hazard (Normalized)"])

# Add a column for overall social equity in climate response
equity_df["Overall Climate Equity Score"] = (0.5 * equity_df["Awareness Score"]) + (0.5 * equity_df["Action Score"])

# Add a column for the ranking of the overall score
ranks = equity_df["Overall Climate Equity Score"].rank(ascending=False, method="min").astype(int)
equity_df["Overall Climate Equity Rank"] = ranks

# Create a reduced version of the dataframe focusing on the KPIs and scores
kpi_df = equity_df[["Account Number",
                    "Organization",
                    "Country",
                    "CDP Region",
                    "Total Hazards",
                    "Evaluation of Risk to VPs",
                    "Affected VPs per Hazard (Normalized)",
                    "Awareness Score",
                    "VP Actions per Hazard (Normalized)",
                    "PR Actions per Hazard (Normalized)",
                    "SI Actions per Hazard (Normalized)",
                    "Action Score",
                    "Overall Climate Equity Score",
                    "Overall Climate Equity Rank"
                   ]
                  ]

print(kpi_df.shape)
kpi_df.head()
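The cell above guards against division by zero in two ways: `fillna(0)` for cities with no hazards and no actions (0/0 gives NaN), and a row drop for cities with VP-targeted actions but no hazards (x/0 gives inf). The PR and SI ratios are computed after that drop and rely on `fillna` alone, so if any remaining city had zero hazards but a nonzero PR or SI count, its ratio would be inf rather than NaN. Min-max normalization can also divide by zero if every city has the same value. A minimal sketch of helpers that make these cases explicit; the function names `per_hazard_rate` and `min_max` and the toy data are hypothetical, not part of the notebook.

```python
import numpy as np
import pandas as pd


def per_hazard_rate(counts: pd.Series, hazards: pd.Series) -> pd.Series:
    """Average count per reported hazard, rounded to one decimal as in the notebook.

    Cities with zero reported hazards get NaN (never inf), so the caller has to
    decide explicitly whether to drop or impute them.
    """
    return (counts / hazards.replace(0, np.nan)).round(1)


def min_max(series: pd.Series) -> pd.Series:
    """Rescale to [0, 1]; returns all zeros if every value is identical (avoids 0/0)."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return pd.Series(0.0, index=series.index)
    return (series - lo) / (hi - lo)


# Hypothetical toy data: the third city reported VP-targeted actions but no hazards
toy = pd.DataFrame({
    "Total Hazards": [4, 2, 0],
    "Actions Targeting VPs": [2, 0, 3],
})
toy["VP Actions per Hazard"] = per_hazard_rate(toy["Actions Targeting VPs"], toy["Total Hazards"])

# Drop cities whose rate is undefined (here, the city with no hazards), then normalize
toy = toy.dropna(subset=["VP Actions per Hazard"]).copy()
toy["VP Actions per Hazard (Normalized)"] = min_max(toy["VP Actions per Hazard"])
print(toy)
```

One design difference: this sketch drops every city with zero reported hazards, whereas the notebook keeps those that also reported no VP-targeted actions and scores them 0. Which treatment is preferable depends on whether a city with no reported hazards should count as a low scorer or as missing data.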

## Exploratory Analysis

### Which cities have the highest overall climate equity scores?
* The highest scoring cities are: **Buenos Aires, Tagum City, Leon de los Aldamas, Montecario,** etc. 
* The top 10 is dominated by cities in **Latin America**, and the United States is not well represented.

In [None]:
# Which cities have the highest overall climate equity scores?

highest_overall_scores = kpi_df.sort_values(by="Overall Climate Equity Score", ascending=False).head(20)
highest_overall_scores = highest_overall_scores.sort_values(by="Overall Climate Equity Score", ascending=True)

plt.figure(figsize=(10,8))
plt.barh(y = highest_overall_scores["Organization"], width = highest_overall_scores["Overall Climate Equity Score"])
plt.title('Cities with the Top 20 Overall Climate Equity Scores (2020)');

In [None]:
# Which cities have the highest overall climate equity scores?

kpi_df.sort_values(by="Overall Climate Equity Score", ascending=False).head(20)

### Which cities have the lowest overall climate equity scores?
* These are cities that did not report any impact on or actions targeting vulnerable populations. As a result, their scores are zero across the board.

In [None]:
# Which cities have the lowest overall climate equity scores?

kpi_df.sort_values(by="Overall Climate Equity Score", ascending=True).head()

### Which cities have the highest awareness scores?
* **Nottingham, Salem, Chihuahua, and Tagum** have the highest awareness scores.
* There are many cities with high awareness scores.
* Some cities achieved high scores by reporting only a few hazards but many affected vulnerable populations per hazard (e.g., Chihuahua).

In [None]:
# Which cities have the highest awareness scores?

highest_awareness_scores = kpi_df.sort_values(by="Awareness Score", ascending=False).head(20)
highest_awareness_scores = highest_awareness_scores.sort_values(by="Awareness Score", ascending=True)

plt.figure(figsize=(10,8))
plt.barh(y = highest_awareness_scores["Organization"], width = highest_awareness_scores["Awareness Score"])
plt.title('Cities with the Top 20 Awareness Scores (2020)');

In [None]:
# Which cities have the highest awareness scores?

equity_df.iloc[:, 0:11].sort_values(by="Awareness Score", ascending=False).head(10)

### What percentage of cities have climate impact assessments that identify vulnerable populations?
* Slightly more than half of the cities that disclosed to CDP report having climate impact assessments that identify vulnerable populations.

In [None]:
# What percentage of cities have climate impact assessments that identify vulnerable populations?

vp_assessments = equity_df.loc[equity_df["Evaluation of Risk to VPs"] == 1, "Account Number"].count()
print(f"{round((vp_assessments/565)*100,1)} percent of cities have climate impact assessments that identify vulnerable populations.")

plt.pie(x = [vp_assessments, 565-vp_assessments], labels=["VP Assessment", "No VP Assessment"]);
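The percentage above divides by a hard-coded 565. Assuming that number is meant to be the count of cities in `equity_df`, a sketch that derives the denominator from the data instead: the mean of a 0/1 column is the share of rows equal to 1, so it stays in sync if rows are dropped upstream (such as the cities removed for an infinite VP-actions ratio). The toy Series is hypothetical; whether 565 was intended to include the dropped rows is not stated.

```python
import pandas as pd

# Hypothetical toy stand-in for equity_df["Evaluation of Risk to VPs"] (1 = assessment identifies VPs)
evaluation = pd.Series([1, 0, 1, 1, 0, 1, 0, 1])

# The mean of a 0/1 indicator is the share of cities with the attribute,
# using the number of rows actually present as the denominator
share = evaluation.mean()
print(f"{share:.1%} of cities have climate impact assessments that identify vulnerable populations.")
```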

### Which cities have the highest action scores?
* **Buenos Aires, Montecario, Leon**, etc. have the highest action scores.
* Compared to the distribution of awareness scores, there are relatively few cities that achieved high action scores.
* Some cities achieved high scores by reporting a small number of hazards and a comparatively large number of actions aimed at protecting vulnerable populations (e.g., Suwon and Seoul).

In [None]:
# Which cities have the highest action scores?

highest_action_scores = kpi_df.sort_values(by="Action Score", ascending=False).head(20)
highest_action_scores = highest_action_scores.sort_values(by="Action Score", ascending=True)

plt.figure(figsize=(10,8))
plt.barh(y = highest_action_scores["Organization"], width = highest_action_scores["Action Score"])
plt.title('Cities with the Top 20 Action Scores (2020)');

In [None]:
# Which cities have the highest action scores?

equity_df.iloc[:,np.r_[0:4,5,11:21]].sort_values(by="Action Score", ascending=False).head(10)

### What is the geographic distribution of climate equity scores?

In [None]:
# Visualize the geographic distribution of climate equity scores

# Merge the geospatial coordinates into the cities dataframe
kpi_df = pd.merge(kpi_df, city_coords[["Account Number","lat", "long"]])

# Convert to a GeoDataFrame
kpi_gdf = gpd.GeoDataFrame(kpi_df, geometry=
                             gpd.points_from_xy(kpi_df['long'], kpi_df['lat']))

# Set the Coordinate Reference System
kpi_gdf.crs = "epsg:4326"

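# Import a world map
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; on newer versions,
# download the Natural Earth low-resolution countries file and load it with gpd.read_file instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))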

# Remove Antarctica from the map
world = world[(world.pop_est>0) & (world.name!="Antarctica")]

# Plot the city climate equity score on the world map
fig, ax = plt.subplots(figsize=(20,30))
ax.set_aspect('equal')
world.plot(ax=ax, color='lightgrey', edgecolor='white')
kpi_gdf.plot(ax=ax,
             column='Overall Climate Equity Score',
             cmap='RdYlGn',
             scheme='user_defined',
             classification_kwds={'bins':[.2, .4, .6, .8]},
             marker='o', 
             markersize=25,
             legend=True,
             legend_kwds={"title":"Climate Equity Score"}
            )

plt.title(label="Geographic Distribution of Climate Equity Scores",
          fontdict={"fontsize":20}
         );

### Of the 88 cities on CDP's A-List, which cities are also ranked in the top 88 climate equity scores?
* Out of the 88 cities on CDP's A-List, **only 29 cities** also ranked in the top 88 climate equity scores.
* Only 3 of the 10 cities with the highest climate equity scores are on CDP's A-List.
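The overall rank is computed with `rank(method="min")`, which gives tied cities the same, lowest rank in their group. A small sketch with hypothetical scores shows the consequence for the comparisons in this section: filtering on `rank <= N` can return more than N cities when ties straddle the cutoff, so the "top 88" is not guaranteed to contain exactly 88 cities.

```python
import pandas as pd

# Hypothetical scores: cities B and C tie for second place
scores = pd.Series([0.9, 0.7, 0.7, 0.4], index=["A", "B", "C", "D"])

# method="min": B and C both get rank 2, and D skips to rank 4
ranks = scores.rank(ascending=False, method="min").astype(int)
print(ranks)

# A "top 2" filter on rank therefore returns three cities
top_2 = ranks[ranks <= 2]
print(len(top_2))
```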

In [None]:
# Create a new dataframe for the CDP's 2020 A-List.

cities = ['Ajuntament de Barcelona', 'Auckland Council',
       'Ayuntamiento de Hermosillo', 'Ayuntamiento de Murcia',
       'Ayuntamiento de Vitoria-Gasteiz', 'BCP Council',
       'Bristol City Council', 'Bærum Kommune', 'Canberra',
       'City of Adelaide', 'City of Athens', 'City of Baltimore',
       'City of Berkeley', 'City of Berlin', 'City of Boston',
       'City of Boulder', 'City of Buenos Aires', 'City of Calgary',
       'City of Cape Town', 'City of Cleveland', 'City of Columbus',
       'City of Copenhagen', 'City of Denver', 'City of Espoo',
       'City of Eugene', 'City of Flagstaff', 'City of Hayward',
       'City of Helsinki', 'City of Lahti', 'City of Los Angeles',
       'City of Louisville, KY', 'City of Lund', 'City of Melbourne',
       'City of Miami', 'City of Paris', 'City of Park City, UT',
       'City of Philadelphia', 'City of Porto', 'City of San Antonio',
       'City of San Francisco', 'City of San José', 'City of Stockholm',
       'City of Sydney', 'City of Toronto', 'City of Turku',
       'City of Vancouver', 'City of West Palm Beach', 'City of Windsor',
       'City Örebro', 'Comune di Firenze', 'Comune di Torino',
       'Cuyahoga County', 'District of Columbia',
       'District of Saanich, BC', 'Egedal Municipality',
       'Gladsaxe Kommune', 'Gobierno Municipal de León de los Aldamas',
       'Government of Hong Kong Special Administrative Region',
       'Greater London Authority', 'Halifax Regional Municipality',
       'Helsingør Kommune / Elsinore Municipality',
       'Hoeje-Taastrup Kommune', 'Hørsholm Kommune',
       'Iskandar Regional Development Authority', 'Malmö Stad',
       'Mexico City', 'Moscow Government', 'Municipalidad de Peñalolén',
       'Municipalidad de San José', 'Municipality of Recife',
       'Município de Braga', 'Município de Águeda',
       'New Taipei City Government', 'Newcastle City Council',
       'Pingtung County Government', 'Prefeitura do Rio de Janeiro',
       'San Luis Obispo', 'Seoul Metropolitan Government',
       'Stadt Heidelberg', 'Stadt Zürich', 'Taichung City Government',
       'Tainan City Government', 'Taoyuan City Government',
       'The Local Government of Quezon City', 'Town of Breckenridge, CO',
       'Town of Vail, CO', 'Village of Park Forest, IL', 'Västervik']

regions = ['Europe', 'Southeast Asia and Oceania', 'Latin America', 'Europe',
       'Europe', 'Europe', 'Europe', 'Europe',
       'Southeast Asia and Oceania', 'Southeast Asia and Oceania',
       'Europe', 'North America', 'North America', 'Europe',
       'North America', 'North America', 'Latin America', 'North America',
       'Africa', 'North America', 'North America', 'Europe',
       'North America', 'Europe', 'North America', 'North America',
       'North America', 'Europe', 'Europe', 'North America',
       'North America', 'Europe', 'Southeast Asia and Oceania',
       'North America', 'Europe', 'North America', 'North America',
       'Europe', 'North America', 'North America', 'North America',
       'Europe', 'Southeast Asia and Oceania', 'North America', 'Europe',
       'North America', 'North America', 'North America', 'Europe',
       'Europe', 'Europe', 'North America', 'North America',
       'North America', 'Europe', 'Europe', 'Latin America', 'East Asia',
       'Europe', 'North America', 'Europe', 'Europe', 'Europe',
       'Southeast Asia and Oceania', 'Europe', 'Latin America', 'Europe',
       'Latin America', 'Latin America', 'Latin America', 'Europe',
       'Europe', 'East Asia', 'Europe', 'East Asia', 'Latin America',
       'North America', 'East Asia', 'Europe', 'Europe', 'East Asia',
       'East Asia', 'East Asia', 'Southeast Asia and Oceania',
       'North America', 'North America', 'North America', 'Europe']

a_list = {'Organization': cities, 'CDP Region': regions}
a_list = pd.DataFrame(a_list)

# Add a column to indicate A-List status
a_list["CDP A-List"] = 1

# Merge the a-list data into the kpi dataframe
kpi_df = pd.merge(kpi_df, a_list, how="left", on=["Organization", "CDP Region"])

# Clean up the missing values and datatype
kpi_df["CDP A-List"] = kpi_df["CDP A-List"].fillna(value=0)
kpi_df["CDP A-List"] = kpi_df["CDP A-List"].astype(int)

# Which cities on the A-List are also ranked in the top 88?

overlap = kpi_df.loc[(kpi_df["CDP A-List"] == 1) &
                     (kpi_df["Overall Climate Equity Rank"] <= 88)
                    ].sort_values("Overall Climate Equity Rank", ascending=True)

print("How many cities?", overlap.shape[0])
print("Which cities?",  overlap["Organization"].values.tolist())
overlap.head(10)
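The A-List flag is attached by an exact match on both `Organization` and `CDP Region`, so any A-List name spelled differently in the CDP data (accents, or prefixes such as "City of") would silently receive a flag of 0 and shift the counts above. Whether all 88 names matched in this notebook is not shown. A minimal check, sketched on hypothetical toy tables, is an outer merge with `indicator=True` that lists A-List entries with no match.

```python
import pandas as pd

# Hypothetical toy tables: kpi holds the scored cities, a_list the hand-typed A-List names
kpi = pd.DataFrame({
    "Organization": ["City of Boston", "Tagum City", "City of Paris"],
    "CDP Region": ["North America", "Southeast Asia and Oceania", "Europe"],
})
a_list = pd.DataFrame({
    "Organization": ["City of Boston", "City of Pariss"],  # deliberate typo to show a mismatch
    "CDP Region": ["North America", "Europe"],
})

# indicator=True adds a "_merge" column: "both" = matched,
# "right_only" = A-List entry that matched no city in the KPI table
check = pd.merge(kpi, a_list, how="outer", on=["Organization", "CDP Region"], indicator=True)
unmatched = check.loc[check["_merge"] == "right_only", "Organization"].tolist()
print("A-List entries with no match:", unmatched)
```

The same check applies to the earlier merge with `city_coords`: `pd.merge` defaults to an inner join, so any city without coordinates would also have been dropped from `kpi_df` before these comparisons.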

### Of the cities with the top 88 climate equity scores, which cities are NOT on CDP's A-List?
* There are **59 cities** (including 7 of the top 10), such as Tagum, Montecario, and Suwon.
* These cities' significant efforts to incorporate social equity into their climate adaptation may have been overlooked by CDP's ranking system.

In [None]:
# Which cities ranked in the top 88 are NOT on the A-List?

ranked_not_alist = kpi_df.loc[(kpi_df["Overall Climate Equity Rank"] <= 88) &
                              (kpi_df["CDP A-List"] == 0)
                             ].sort_values("Overall Climate Equity Rank", ascending=True)

print("How many cities?", ranked_not_alist.shape[0])
print("Which cities?",  ranked_not_alist["Organization"].values.tolist())
ranked_not_alist.head(10)

### Of the 88 cities on CDP's A-List, which cities are NOT ranked in the top 88 climate equity scores?
* There are **59 cities** on CDP's A-List that did not rank in the top 88 climate equity scores, including some with very low scores.
* This suggests that some of the cities on CDP's A-List are among the weakest at incorporating social equity into their climate adaptation efforts, at least as measured by the KPIs in this notebook.

In [None]:
# Which cities on the A-List were not ranked in the top 88?

alist_not_ranked = kpi_df.loc[(kpi_df["Overall Climate Equity Rank"] > 88) &
                              (kpi_df["CDP A-List"] == 1)
                             ].sort_values("Overall Climate Equity Rank", ascending=False)

print("How many cities?", alist_not_ranked.shape[0])
print("Which cities?",  alist_not_ranked["Organization"].values.tolist())
alist_not_ranked.head(10)
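The three comparisons above are cells of a single 2-by-2 table: A-List membership versus membership in the top 88 equity ranks. A sketch using `pd.crosstab` to produce all four counts at once, on hypothetical toy data with a cutoff of 3 instead of 88:

```python
import pandas as pd

# Hypothetical toy data: A-List flag and equity rank for six cities
toy = pd.DataFrame({
    "CDP A-List": [1, 1, 0, 0, 1, 0],
    "Overall Climate Equity Rank": [1, 5, 2, 3, 6, 4],
})
TOP_N = 3

# Label each city on both dimensions, then count every combination
on_alist = toy["CDP A-List"].map({1: "A-List", 0: "Not A-List"})
in_top = (toy["Overall Climate Equity Rank"] <= TOP_N).map(
    {True: f"Top {TOP_N}", False: f"Below top {TOP_N}"})
table = pd.crosstab(on_alist, in_top, rownames=["CDP status"], colnames=["Equity rank"])
print(table)
```

Applied to `kpi_df` with `TOP_N = 88`, this table would reproduce the counts from the three cells above, since it applies the same filters.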

## Summary

* The main story here is that CDP’s A-List does not adequately reflect climate equity. My score points both to cities like Tagum that are among the best in terms of equity but don’t appear on the A-List, and to cities like Philadelphia that are on the A-List but are among the worst in terms of equity.
* Another takeaway is that cities are doing better at awareness than at action. Many cities are increasingly aware of the impact of climate hazards on vulnerable populations, but relatively few are taking actions to mitigate that impact, so it is the action score that really separates the top cities from the rest.

# 6. Conclusion

This notebook has taken up the challenge of using CDP’s data to develop key performance indicators (KPIs) that can help cities adapt to climate change in ways that are socially equitable. First, I analyzed the data reported by cities about the climate hazards they face, and selected KPIs for measuring their awareness of how these hazards are affecting the most vulnerable populations within their communities. Second, I analyzed the data reported by cities about the actions they are taking to mitigate climate hazards, and selected KPIs for measuring the extent to which these actions prioritize the protection of vulnerable populations. Finally, I combined these KPIs into an overall climate equity score for each city and compared the results to those of CDP’s own scoring system.

My analysis shows that most cities in the CDP data have not thoroughly incorporated social equity and inclusion into their responses to climate change, though some progress is being made. 

Most of the progress has been in the growing awareness by city governments of the disproportionate impact that climate hazards have on their most vulnerable residents. More than half (53.5%) of the cities in the CDP data report having carried out a climate impact assessment that identifies vulnerable populations. This incorporation of social equity into the assessment process is helping to bring into sharper focus a picture of how climate hazards are affecting vulnerable populations. For example, cities provided data showing which types of climate hazards are most likely to affect which types of vulnerable populations. 

However, the growing awareness among city governments of how climate hazards are disproportionately affecting their most vulnerable residents has not yet translated into a corresponding increase in mitigation efforts that prioritize the protection of those residents. Although cities acknowledge that 55% of the climate hazards they face are increasing the risks to already vulnerable populations, they report that only 4% of their responses to those same hazards include actions aimed at protecting the most vulnerable.

Nevertheless, my analysis of the CDP data draws attention to certain cities that are leading the way toward greater urban climate equity. Cities like Buenos Aires in South America, Tagum City in Asia, and Louisville in North America (not all of which appear on CDP’s A-List) have significantly incorporated social equity and inclusion into both their assessments of local climate impacts and their efforts to mitigate those impacts. The data reported by such leading cities in climate equity point to best practices that could be adopted by their peers. For example, cities that report taking actions to protect vulnerable populations also provide evidence that these efforts can help to reduce poverty and improve social inclusion in urban communities.

The analysis in this notebook hardly exhausts the many insights to be gained from CDP’s extremely rich collection of data. First of all, I have mainly focused on the more readily quantifiable answers that cities provided on the CDP questionnaire, but they also provided substantial textual descriptions of both climate hazards and actions that could be profitably mined with natural language processing. Perhaps the most obvious next step is to incorporate the data collected by CDP in years prior to 2020, which would facilitate examining the trends in urban climate equity over time. The analysis would also surely benefit from incorporating the data from CDP’s questionnaires for corporations, which cities often partner with in tackling climate issues. However, any analysis focused primarily on the CDP data will continue to suffer from its geographical limitations until more cities in Asia, Africa, and the Middle East start to participate in the global disclosure system.