In this notebook, we develop KPIs that can help CDP assess the performance of cities and corporations on aspects that affect our environment and, in turn, have social impacts. All the KPIs use CDP data and leverage some external data.

We have restricted most of our work to US cities, so that we can explore socio-environmental issues in depth using a diverse set of external data. Occasionally we also look at European cities to compare KPI values.

**We have categorized cities into 4 segments based on population density.** Why? Because population density reflects various underlying factors and their current status: for example, industry, employment, facilities and favourable geographic location. Over the last 100 years, these factors have increased or decreased the population of cities, thus changing their population density. Resource consumption varies with population density and often has catastrophic consequences for a city and its neighbouring environment.

**We have opted for this approach for two reasons.**

**1. Along with creating KPIs, it's important to assess their performance and accuracy. Segmenting cities this way lets us do that through relevant discussions.**

**2. Measuring and observing KPI values against the population density segment first, instead of looking at the individual performance of a city, will help us generate insights on a broader scale.**



Since this is a long notebook, let us start with an abstract of the entire work.

# Abstract

The main theme of this work is to explore the impact of our energy sources on GHG emissions and water resources. Currently, our energy plants, cities and corporations generate and use power from different kinds of fuels. Some of them are renewable and some are not, and GHG emissions vary from high to low depending on the fuel. We will explore KPIs that help establish a link between a city's energy consumption and its GHG emissions in various sectors, and how that link varies across population density segments depending on the availability of renewable energy sources. We will also discuss the correlated social impacts of total emissions.

Along with this, we will see how the choice of energy source also influences water usage. Later, we will combine GHG emissions and excess water usage for energy needs to create a **'Long-term Environment Risk' measure that summarizes how adversely a city's energy consumption pattern affects the air and water reserves of the environment.**

Then, we will map **Long-term Environment Risk and CDC's Social Vulnerability Index together to categorize cities into sections, such as a 'high vulnerability-high environmental risk' category.** This will show how dire the situation is for a city in the long term.

Next, we will measure imminent environmental risk using city responses on the climate hazards they face. We will compare the combined long-term and imminent environmental risks against the adaptation efforts of the cities, to explore whether cities' responses are proportionate to the risk.

Although the above is our main focus, we will later explore additional KPIs that can help CDP assess other aspects of the survey responses, such as:

1. adaptation efforts of the cities
2. severity of impact on social aspects of the cities
3. severity of affected services in the cities
4. KPIs on food and waste related sectors.

Lastly, we will also look at some KPIs related to corporations, showing:
1. their favourability towards GHG emissions in terms of corporate expenditure, and
2. their carbon dependency, or the growth of GHG emissions in their operations.

# Data used and approach towards data usage

GHG emission doesn't happen as an isolated event. Its sources lie in food production, waste generation, transport and our day-to-day city activities. CDP's survey captures data on many of these, and we will study all these data in parallel instead of individually, so that the environmental impacts become clearer.
 
However, **to build relevant KPIs we need data that not all cities have provided in all sections. Sometimes the data we need to combine belong to different but adjacent years. Most of the data used here are at the level of the city's county or at state level. We will leverage the FIPS-based values present in many of these datasets to increase the accuracy of the derived KPIs.**
 
We will work with these data under the assumption that, at city level, they do not change drastically from one year to the next. For example, per capita income, emissions or energy consumption is unlikely to differ by a significant margin between 2018 and 2019.
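
The adjacent-year fallback described above can be sketched as follows. This is only an illustration of the idea, not the loading code used in this notebook: the helper `pick_adjacent_year`, its `max_gap` parameter and the sample values are all hypothetical.

```python
def pick_adjacent_year(values_by_year, target_year, max_gap=1):
    """Return (year, value) for the reported year closest to target_year.

    Hypothetical helper: only years within max_gap of target_year are
    considered, and on a tie the more recent year wins.
    """
    candidates = [
        (abs(year - target_year), -year, year, value)
        for year, value in values_by_year.items()
        if value is not None and abs(year - target_year) <= max_gap
    ]
    if not candidates:
        return None
    # Smallest gap first; -year breaks ties in favour of the later year.
    _, _, year, value = min(candidates)
    return year, value


# Made-up per capita figures for one city, keyed by reporting year.
reported = {2018: 41.2, 2020: 43.0}
print(pick_adjacent_year(reported, target_year=2019))  # tie between 2018 and 2020
print(pick_adjacent_year(reported, target_year=2016))  # nothing within one year
```

Capping the gap at one year keeps the substitution in line with the assumption that city-level values do not shift much between neighbouring years.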

Since we have used a lot of external data, it is necessary to give readers some details about it.

**1. USA's per capita energy consumption data, sector-wise.**

    This provides state-wise per capita energy consumption data for the residential, industrial, commercial and transport sectors.

    Source : https://www.eia.gov/state/seds/data.php?incfile=/state/seds/sep_sum/html/rank_use_capita.html&sid=US


**2. USA's water usage data for cooling needs in energy plants.**

    Plant-wise water usage for cooling needs, with details of the related fuel and state-based location.
    Source : https://www.eia.gov/electricity/data/water/ 
    

**3. USA's Water census data 2015.**
    
    The water census happens every 5 years in the USA. This is the last available census data, with details of sector-based usage and source of water at county level.

    Source : https://www.sciencebase.gov/catalog/item/5af3311be4b0da30c1b245d8


**4. Financial data of listed US companies.**

    Financial data of companies listed with the US Securities and Exchange Commission.
    Source  : https://www.sec.gov/dera/data/financial-statement-data-sets.html
    
**5. Social Vulnerability Index Data.**

    CDC provided social vulnerability index data for 2019.
    
**6. CDC Census data of 500 cities.**

    

In [None]:
from glob import glob
import pandas as pd
import numpy as np
from tqdm import tqdm
from difflib import SequenceMatcher
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from sklearn.preprocessing import normalize

import seaborn as sns
import plotly as ply
import altair as alt
import matplotlib.pyplot as plt

In [None]:
external_data_dir = '/kaggle/input/cdp-leo/External_data/'
cdp_dir = '/kaggle/input/cdp-unlocking-climate-solutions/'
data_2018 = glob(cdp_dir+'*/*/*/2018*.csv')+glob(cdp_dir+'*/*/*2018*.csv')
data_2019 = glob(cdp_dir+'*/*/*/2019*.csv')+glob(cdp_dir+'*/*/*2019*.csv')
data_2020 = glob(cdp_dir+'*/*/*/2020*.csv')+glob(cdp_dir+'*/*/*2020*.csv')

cities_response_files = glob(cdp_dir+'*/*/*Full_Cities_Dataset.csv')
cities_disclosing_files = glob(cdp_dir+'*/*/*Cities_Disclosing_to_CDP.csv')

cities_disclosing = pd.concat([pd.read_csv(file) for file in cities_disclosing_files])
cities_response = pd.concat([pd.read_csv(file) for file in cities_response_files])

corporate_climate_change_disclosing_files = glob(cdp_dir+'*/*/*/*Corporates_Disclosing_to_CDP_Climate_Change*.csv')
corporate_water_security_disclosing_files = glob(cdp_dir+'*/*/*/*Corporates_Disclosing_to_CDP_Water_Security*.csv')
corporate_climate_change_response_files = glob(cdp_dir+'*/*/*/*Full_Climate_Change_Dataset*.csv')
corporate_water_security_response_files = glob(cdp_dir+'*/*/*/*Full_Water_Security_Dataset*.csv')

svi_data = pd.read_csv(glob(cdp_dir+'*/*/*SVI*US.csv')[0])
svi_data['AREA_SQKM']= svi_data['AREA_SQMI']*2.59
census_tract_data = pd.read_csv(cdp_dir+'Supplementary Data/CDC 500 Cities Census Tract Data/500_Cities__Census_Tract-level_Data__GIS_Friendly_Format___2019_release.csv')

corporate_climate_change_disclosing = pd.concat([pd.read_csv(file) for file in corporate_climate_change_disclosing_files])
corporate_water_security_disclosing = pd.concat([pd.read_csv(file) for file in corporate_water_security_disclosing_files])
corporate_climate_change_response = pd.concat([pd.read_csv(file) for file in corporate_climate_change_response_files])
corporate_water_security_response = pd.concat([pd.read_csv(file) for file in corporate_water_security_response_files])

weightage = pd.read_csv(external_data_dir+'weightage.csv')
water_usage_data = pd.read_csv(external_data_dir+'usco2015v2.0.csv')
cooling_summary = pd.read_csv(external_data_dir+'cooling_summary_2019.csv')
corporations = pd.read_csv(external_data_dir+'climate_change_us_comps.csv')
financial_data = pd.read_csv(external_data_dir+'2020q1/num.txt',sep='\t')
tag_data = pd.read_csv(external_data_dir+'2020q1/sub.txt',sep='\t')

worldcities = pd.read_csv(external_data_dir+'worldcities.csv').rename(columns={'country':'Country'})
# Align country names in worldcities with the names used in the CDP responses.
# Assigning the result back avoids relying on an in-place replace on a column selection.
worldcities['Country'] = worldcities['Country'].replace(
    ['Bolivia','Hong Kong','Côte D’Ivoire','Congo (Kinshasa)',
     'Korea, South','Moldova','Russia','West Bank','Taiwan','United Kingdom','Tanzania',
     'United States','Venezuela','Vietnam'],
    ['Bolivia (Plurinational State of)',
     'China, Hong Kong Special Administrative Region',
     "Côte d'Ivoire",
     'Democratic Republic of the Congo',
     'Republic of Korea',
     'Republic of Moldova',
     'Russian Federation',
     'State of Palestine',
     'Taiwan, Greater China',
     'United Kingdom of Great Britain and Northern Ireland',
     'United Republic of Tanzania',
     'United States of America',
     'Venezuela (Bolivarian Republic of)',
     'Viet Nam'])




worldcities = worldcities.merge(cities_response[['Country','CDP Region']].drop_duplicates(),on='Country',how='left')
world_cities_response = pd.DataFrame(cities_response,copy=True)
cities_response = cities_response[(cities_response['Country']=='United States of America')&(cities_response['Year Reported to CDP'].isin([2019,2020]))].reset_index(drop=True)

uscities = pd.read_csv(glob(cdp_dir+'*/*/uscities.csv')[0])
cities_response['state_id'] = cities_response['Organization'].apply(lambda x: x.split(',')[-1].rstrip().lstrip() if ',' in x else '')



mapping = {}
possible_mismatch = []

# Match names from worldcities data and create mapping, based on a string similarity more than 0.5

for city in cities_response['Organization'].unique():
    if 'city' in city.lower() and 'county' not in city.lower():
        similarity = 0
        candidate = 'None'
    #     print(city)


        city1 = city.split(',')[0]
        city1 = city1.replace('City of ','')
        city1 = city1.replace('Town of ','')
    #     print(city)
    #     print(country)
        for wcity in uscities['city'].unique():
            sim = SequenceMatcher(None, city1, wcity).ratio()
    #         print(city,wcity,sim)
            if sim>similarity:
                candidate = wcity
                similarity=sim


        if similarity<0.5:
            possible_mismatch.append((city1,candidate))
        else:
            mapping[city] = candidate
        
# those cities which did not have a good match, find their locations below
mapping['Metropolitan Government of Nashville and Davidson County'] = 'Nashville'
mapping['City and County of Honolulu'] = 'Honolulu'
mapping['District of Columbia'] = 'Washington'

cities_response['Organization'] = cities_response['Organization'].replace(mapping)
cities_response = cities_response[cities_response['Organization'].isin(mapping.values())]



uscities.rename(columns={'city':'Organization','lng':'long'},inplace=True)
uscities.sort_values(by='population',ascending=False,inplace=True)
uscities.drop_duplicates(subset='Organization',keep='first',inplace=True)


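def get_state_id(x):
    # x is a row holding 'Organization' and 'state_id'. When no state could be
    # parsed from the organization name, fall back to the uscities lookup.
    org = x['Organization']
    state_id = x['state_id']
    
    if state_id=='':
        return uscities[uscities['Organization']==org]['state_id'].values[0]
    else:
        return state_id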
    
    
city_metadata = cities_response[cities_response['Question Number']=='0.6'][['Organization','Response Answer']].drop_duplicates(subset='Organization',keep='last')
city_metadata.rename(columns={'Response Answer':'area in sq.km'},inplace=True)
city_metadata.sort_values(by='Organization',inplace=True)
city_metadata = city_metadata.merge(cities_response[['Organization','state_id']].drop_duplicates(subset='Organization',keep='last'),on='Organization',how='left')
city_metadata['state_id'] = city_metadata[['Organization','state_id']].apply(lambda x: get_state_id(x),axis=1)


world_city_metadata = world_cities_response[world_cities_response['Question Number']=='0.6'][['Organization','Response Answer']].drop_duplicates(subset='Organization',keep='last')
world_city_metadata.rename(columns={'Response Answer':'area in sq.km'},inplace=True)
world_city_metadata.sort_values(by='Organization',inplace=True)


population_data = cities_response[(cities_response['Question Number']=='0.5')&(cities_response['Column Name']=='Current population')][['Organization','Response Answer']].drop_duplicates(subset='Organization',keep='last')
population_data.rename(columns={'Response Answer':'population'},inplace=True)
population_data.sort_values(by='Organization',inplace=True)

world_population_data = world_cities_response[(world_cities_response['Question Number']=='0.5')&(world_cities_response['Column Name']=='Current population')][['Organization','Response Answer']].drop_duplicates(subset='Organization',keep='last')
world_population_data.rename(columns={'Response Answer':'population'},inplace=True)
world_population_data.sort_values(by='Organization',inplace=True)


city_metadata = city_metadata.merge(population_data,on='Organization',how='left')
city_metadata['population'] = city_metadata['population'].astype('float')
city_metadata['area in sq.km'] = city_metadata['area in sq.km'].astype('float')
city_metadata['density'] = city_metadata['population']/city_metadata['area in sq.km']

world_city_metadata = world_city_metadata.merge(world_population_data,on='Organization',how='left')
world_city_metadata['population'] = world_city_metadata['population'].astype('float')
world_city_metadata['area in sq.km'] = world_city_metadata['area in sq.km'].astype('float')
world_city_metadata['density'] = world_city_metadata['population']/world_city_metadata['area in sq.km']
world_cities_response = world_cities_response.merge(world_city_metadata,on='Organization',how='left')


cities_response.drop(['state_id'],axis=1,inplace=True)
cities_response = cities_response.merge(city_metadata,on='Organization',how='left')
cities_response = cities_response.merge(uscities[['state_name','state_id','lat','long','Organization','county_fips']],how='left',on=['Organization','state_id'])



favorability = []
for org in cities_response['Organization'].unique():
    q_num='2.2'

    data = cities_response[((cities_response['Question Number']==q_num)&(cities_response['Year Reported to CDP'].isin([2020])))                            
                  &(cities_response['Organization']==org)].sort_values(by='Year Reported to CDP')[['Organization','Column Name','Response Answer','Row Number']]\
    .drop_duplicates(keep='first')\
    .sort_values(by=['Column Name','Row Number'])[['Column Name','Response Answer','Row Number']]
    
    data = data.pivot(index='Row Number',columns='Column Name',values='Response Answer')
    
#     data.pivot()
    favorability.append(data)
    
favorability = pd.concat(favorability).dropna().reset_index(drop=True)
favorability_weightage = pd.pivot_table(favorability[['Level of degree to which factor challenges/supports the adaptive capacity of your city','Factors that affect ability to adapt','Indicate if this factor either supports or challenges the ability to adapt']],index='Factors that affect ability to adapt',\
               columns='Indicate if this factor either supports or challenges the ability to adapt',\
               values='Level of degree to which factor challenges/supports the adaptive capacity of your city',\
               aggfunc='count',fill_value=0.0)

favorability_weightage.columns = ['Challenges','Supports']
favorability_weightage['Challenges'] = favorability_weightage['Challenges']/favorability_weightage['Challenges'].sum()
favorability_weightage['Supports'] = favorability_weightage['Supports']/favorability_weightage['Supports'].sum()


We will work with US cities having a population of at least 35,000. Then we will split these cities into 4 categories by population density. The population and area (converted to sq.km) of each city are collected from the city responses in the introduction section.
Below, you can see the distribution data for population density (population/sq.km). We pick the population density values at the 25th, 50th and 80th percentiles to create four density segments.

**1. Very high density: population density >= 3371/sq.km**

**2. High density: population density between 1747 and 3371/sq.km**

**3. Medium density: population density between 1148 and 1747/sq.km**

**4. Low density: population density less than 1148/sq.km**
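
As a compact illustration of the segmentation rule above, here is a minimal sketch using only the standard library. The function `density_segment` and the two example cities are hypothetical. Unlike the notebook's own `find_category`, this sketch returns `None` when the area is missing or zero instead of assigning a segment.

```python
from bisect import bisect_right

# Segment boundaries in people per sq.km, as listed above.
DENSITY_BOUNDS = [1148, 1747, 3371]
DENSITY_LABELS = ['low density', 'medium density', 'high density', 'very high density']


def density_segment(population, area_sqkm):
    """Return the density segment for a city, or None if the area is unusable."""
    if area_sqkm is None or area_sqkm <= 0:
        return None
    density = population / area_sqkm
    # bisect_right puts a value equal to a boundary into the higher segment,
    # which matches the '>=' comparisons used for the thresholds.
    return DENSITY_LABELS[bisect_right(DENSITY_BOUNDS, density)]


# Hypothetical cities as (population, area in sq.km), for illustration only.
examples = {'City A': (500_000, 120.0), 'City B': (90_000, 100.0)}
for name, (pop, area) in examples.items():
    print(name, density_segment(pop, area))
```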



In [None]:
cities_with_35k_pop = list(city_metadata[city_metadata['population']>=35000]['Organization'].values)
cities_response = cities_response[cities_response['Organization'].isin(cities_with_35k_pop)]


enlisted_cities = []
for city,state_id in zip(city_metadata['Organization'],city_metadata['state_id']):
    
    data = svi_data[svi_data['FIPS'].isin(census_tract_data[(census_tract_data['PlaceName']==city)&(census_tract_data['StateAbbr']==state_id)]['TractFIPS'].unique())][['E_TOTPOP','M_TOTPOP','AREA_SQKM']].sum()
#     print('City name : ',city)
    pop_error= ((data['E_TOTPOP']-data['M_TOTPOP'])/city_metadata[city_metadata['Organization']==city]['population'].values[0])
#     print('population error : ',pop_error)
    area_error = (data['AREA_SQKM']/city_metadata[city_metadata['Organization']==city]['area in sq.km'].values[0])
#     print('Area error : ', area_error )
    pop_upper_margin = 1.17
    pop_lower_margin = 0.83
    area_upper_margin = 1.5
    area_lower_margin = 0.5
#     print(pop_error)
    if (pop_lower_margin<pop_error) & (pop_error<pop_upper_margin) & (area_lower_margin<area_error) & (area_error<area_upper_margin):
        enlisted_cities.append(city)
        
cities_response = cities_response[cities_response['Organization'].isin(enlisted_cities)]



very_high_density_cities_ = cities_response[cities_response['density']>=3371]['Organization'].unique()
high_density_cities_ = cities_response[(cities_response['density']>=1747)&(cities_response['density']<3371)]['Organization'].unique()
medium_density_cities_ = cities_response[(cities_response['density']>=1148)&(cities_response['density']<1747)]['Organization'].unique()
low_density_cities_ = cities_response[cities_response['density']<1148]['Organization'].unique()
very_high_density_cities_ = list(cities_response[cities_response['Organization'].isin(list(very_high_density_cities_))].groupby(['Organization']).count().sort_values(by='Country',ascending=False).index.values)
high_density_cities_ = list(cities_response[cities_response['Organization'].isin(list(high_density_cities_))].groupby(['Organization']).count().sort_values(by='Country',ascending=False).index.values)
medium_density_cities_ = list(cities_response[cities_response['Organization'].isin(list(medium_density_cities_))].groupby(['Organization']).count().sort_values(by='Country',ascending=False).index.values)
low_density_cities_ = list(cities_response[cities_response['Organization'].isin(list(low_density_cities_))].groupby(['Organization']).count().sort_values(by='Country',ascending=False).index.values)


very_high_density_cities = []
high_density_cities = []
medium_density_cities = []
low_density_cities = []
limit=10
count=0
for city in low_density_cities_:
    if city in census_tract_data['PlaceName'].unique():
        low_density_cities.append(city)
        count+=1
        if count==limit:
            break
            
count=0
for city in medium_density_cities_:
    if city in census_tract_data['PlaceName'].unique():
        medium_density_cities.append(city)
        count+=1
        if count==limit:
            break
            
count=0
for city in high_density_cities_:
    if city in census_tract_data['PlaceName'].unique():
        high_density_cities.append(city)
        count+=1
        if count==limit:
            break
            
count=0
for city in very_high_density_cities_:
    if city in census_tract_data['PlaceName'].unique():
        very_high_density_cities.append(city)
        count+=1
        if count==limit:
            break   
            
all_cities = very_high_density_cities+high_density_cities+medium_density_cities+low_density_cities 

city_metadata['density'] = city_metadata['density'].fillna(1.0)
city_metadata['normalized_density'] = normalize(city_metadata['density'].values.reshape(1,-1),norm='max')[0]

def find_category(x):
    
    if x>=3371:
        return 'very high density'
    if (x<3371)&(x>=1747):
        return 'high density'  
    if (x<1747)&(x>=1148):
        return 'medium density'    
    else:
        return 'low density'

city_metadata['density_category'] = city_metadata['density'].apply(lambda x: find_category(x))
world_city_metadata['density_category'] = world_city_metadata['density'].apply(lambda x: find_category(x))
world_cities_response = world_cities_response.merge(world_city_metadata,on='Organization',how='left')

total_cities_response = pd.DataFrame(cities_response,copy=True)
total_cities_response = total_cities_response.merge(city_metadata,on='Organization',how='left')

q_num = '4.6a'
row_name = ['Total Generation of grid-supplied energy','Stationary energy > Residential buildings',
            'Stationary energy > Commercial buildings & facilities',
            'Stationary energy > Industrial buildings & facilities',  
            'Total Transport',
           'Total Stationary Energy',
            'Total Waste',
            'Stationary energy > Agriculture'
           ]

col_name = ['Direct emissions (metric tonnes CO2e)']

selected_cities = {}

for density in ['very high density','high density','medium density','low density']:
    selected_cities[density] = list(total_cities_response[(total_cities_response['density_category']==density)&\
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
                                                 (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','density_category']]\
                           .replace({'Question not applicable':np.nan,'0':np.nan})\
                           .dropna()\
                           .groupby(['Organization'])\
                           .count()\
                           .sort_values(by='Organization',ascending=False)
                           .index.values)

carbon_intensity_per_kw_consumption = {}

for density in ['very high density','high density','medium density','low density']:
    for city in selected_cities[density]:
#         for row_n in row_name:
            carbon_intensity_per_kw_consumption[city] = [ total_cities_response[(total_cities_response['density_category']==density)&\
                                                     (total_cities_response['Question Number']==q_num)&\
                                                     (total_cities_response['Column Name'].isin(col_name))&\
                                                     (total_cities_response['Row Name'].isin([row_n]))&\
                                                      (total_cities_response['Organization']==city)&\
                                                     (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                               ['Response Answer'].astype('float').values.sum()

                                                        for row_n in row_name]

ghg_from_consumption = pd.DataFrame(carbon_intensity_per_kw_consumption).T.reset_index().rename(columns={'index':'Organization'})
ghg_from_consumption = ghg_from_consumption.merge(city_metadata,on='Organization',how='left')
# ghg_from_consumption = ghg_from_consumption.merge(uscities[['Organization','state_id']],on='Organization',how='left')
ghg_from_consumption.fillna(0.0,inplace=True)
ghg_from_consumption.rename(columns={0:'Total emissions due to Generation of grid-supplied energy',
                                     1:'Total emissions from residential buildings',
                                     2:'Total emissions from commercial buildings',
                                     3:'Total emissions from industrial buildings',
                                     4:'Total emissions from transport',
                                     5:'Total emissions from Stationary Energy',
                                     6:'Total emissions from Waste',
                                     7:'Total emissions from agriculture'
                                     
                                    },inplace=True)

consumption_data_residential = pd.read_csv(external_data_dir+'Total Energy Consumption Estimates per Capita by End-Use Sector.csv')[['state_name','Residential_consumption_per_capita']]
consumption_data_commercial = pd.read_csv(external_data_dir+'Total Energy Consumption Estimates per Capita by End-Use Sector.csv')[['state_name.1','Commercial_consumption_per_capita']].rename(columns={'state_name.1':'state_name'})
consumption_data_industrial = pd.read_csv(external_data_dir+'Total Energy Consumption Estimates per Capita by End-Use Sector.csv')[['state_name.2', 'Industrial_consumption_per_capita']].rename(columns={'state_name.2':'state_name'})
consumption_data_transport = pd.read_csv(external_data_dir+'Total Energy Consumption Estimates per Capita by End-Use Sector.csv')[['state_name.3', 'Transportation_consumption_per_capita']].rename(columns={'state_name.3':'state_name'})
consumption_data_total= pd.read_csv(external_data_dir+'Total Energy Consumption Estimates per Capita by End-Use Sector.csv')[['state_name.4', 'Total_consumption_per_capita']].rename(columns={'state_name.4':'state_name'})

consumption_data = consumption_data_residential[['state_name','Residential_consumption_per_capita']]\
.merge(consumption_data_commercial[['state_name','Commercial_consumption_per_capita']],on='state_name',how='left')\
.merge(consumption_data_industrial[['state_name', 'Industrial_consumption_per_capita']],on='state_name',how='left')\
.merge(consumption_data_transport[['state_name', 'Transportation_consumption_per_capita']],on='state_name',how='left')\
.merge(consumption_data_total[['state_name', 'Total_consumption_per_capita']],on='state_name',how='left')

ghg_from_consumption = ghg_from_consumption.merge(uscities[['Organization','state_name']],on='Organization',how='left')
ghg_from_consumption = ghg_from_consumption.merge(consumption_data,on='state_name')

ghg_from_consumption = ghg_from_consumption.sort_values(by=['density_category','Total emissions due to Generation of grid-supplied energy',
       'Total emissions from residential buildings',
       'Total emissions from commercial buildings',
       'Total emissions from industrial buildings',
       'Total emissions from transport'],ascending=False).reset_index(drop=True)

# ENERGY

We will start by collecting sector-wise GHG emissions for every city.

In [None]:
ghg_from_consumption.head(10).style.set_caption('Total GHG emission sector wise')

In [None]:
ghg_from_consumption['Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita'] =\
(ghg_from_consumption['Total emissions from residential buildings']/ghg_from_consumption['population'])/ghg_from_consumption['Residential_consumption_per_capita']
                                                                    
ghg_from_consumption['Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita'] =\
(ghg_from_consumption['Total emissions from commercial buildings']/ghg_from_consumption['population'])/ghg_from_consumption['Commercial_consumption_per_capita']
                                                                    
ghg_from_consumption['Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita'] =\
(ghg_from_consumption['Total emissions from industrial buildings']/ghg_from_consumption['population'])/ghg_from_consumption['Industrial_consumption_per_capita']
                                                                    
ghg_from_consumption['Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita'] =\
(ghg_from_consumption['Total emissions from transport']/ghg_from_consumption['population'])/ghg_from_consumption['Transportation_consumption_per_capita']  

ghg_from_consumption['Residential building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km'] =\
(ghg_from_consumption['Total emissions from residential buildings']/ghg_from_consumption['population'])*ghg_from_consumption['density']/ghg_from_consumption['Residential_consumption_per_capita']
                                                                    
ghg_from_consumption['Commercial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km'] =\
(ghg_from_consumption['Total emissions from commercial buildings']/ghg_from_consumption['population'])*ghg_from_consumption['density']/ghg_from_consumption['Commercial_consumption_per_capita']
                                                                    
ghg_from_consumption['Industrial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km'] =\
(ghg_from_consumption['Total emissions from industrial buildings']/ghg_from_consumption['population'])*ghg_from_consumption['density']/ghg_from_consumption['Industrial_consumption_per_capita']
                                                                    
ghg_from_consumption['Transport emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km'] =\
(ghg_from_consumption['Total emissions from transport']/ghg_from_consumption['population'])*ghg_from_consumption['density']/ghg_from_consumption['Transportation_consumption_per_capita']    

ghg_from_consumption_kpis_per_sqkm = ghg_from_consumption.groupby(['density_category'])[['Residential building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
       'Transport emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km']].describe(percentiles=[.50]).T\
.reset_index()\
.rename(columns={'level_0':'KPI','level_1':'stats'})

ghg_from_consumption_kpis_per_capita = ghg_from_consumption.groupby(['density_category'])[['Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',]].describe(percentiles=[.50]).T\
.reset_index()\
.rename(columns={'level_0':'KPI','level_1':'stats'})

# GHG emission/million Btu energy consumption KPI

Here, we measure the sector-wise GHG generation rate against the energy consumption in that same sector. In CDP surveys, GHG generation is captured in metric tonnes of CO2 equivalent (CO2e). We gather the per capita energy consumption of that sector in the state the city belongs to. Then, using population density, we approximate the energy consumption of that sector in every sq.km of the city, as well as per capita.

The per capita energy consumption data that we have is measured in million Btu (British thermal units).

**Source of data :** We have sector-wise per capita energy consumption data for each state in the USA, and city-level details of energy consumption in the corresponding sectors from CDP, along with population values.

So, a KPI like **'Residential building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km' means** :

**In every sq.km of the city,**

**how much GHG, in metric tonnes of CO2e, is generated,**

**for every 1 million Btu of energy consumed,**

**in the residential buildings.**

For the 'industrial building' sector, there is only one sample in the medium density category. Hence, we summarize the carbon intensity KPI for industrial buildings without it.

Do note, we have considered only direct emissions here.


***Carbon Intensity / Million Btu Energy Consumption KPI for Sector X = Per capita GHG emission in sector X (metric tonnes of CO2e) / Per capita energy consumption in sector X (million Btu).***


We multiply this with population density to arrive at carbon intensity per sq.km.
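As a rough illustration of the formula above, here is a minimal sketch in plain Python. The function names and the city figures are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the CDP data.

```python
def carbon_intensity_per_mmbtu(sector_emissions_tco2e, population,
                               consumption_per_capita_mmbtu):
    """Per capita emissions divided by per capita energy consumption."""
    emissions_per_capita = sector_emissions_tco2e / population
    return emissions_per_capita / consumption_per_capita_mmbtu


def carbon_intensity_per_sqkm(sector_emissions_tco2e, population,
                              consumption_per_capita_mmbtu, density_per_sqkm):
    """Scale the per capita KPI by population density (people per sq.km)."""
    per_capita = carbon_intensity_per_mmbtu(
        sector_emissions_tco2e, population, consumption_per_capita_mmbtu)
    return per_capita * density_per_sqkm


# Hypothetical city: 1,000,000 tCO2e from a sector, 500,000 residents,
# 80 million Btu consumed per capita in that sector, 2,000 people per sq.km.
per_capita_kpi = carbon_intensity_per_mmbtu(1_000_000, 500_000, 80.0)
per_sqkm_kpi = carbon_intensity_per_sqkm(1_000_000, 500_000, 80.0, 2_000)
print(per_capita_kpi, per_sqkm_kpi)
```

Because the per sq.km figure is just the per capita figure times density, denser cities will always score higher on it, which is worth keeping in mind when reading the tables.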

## How is this Carbon intensity/million Btu energy consumption KPI helpful?

Other metrics like 'total emission' or 'emission per capita' help us understand how large the overall emission is, or how much emission a person's activities generate on average, but they skip a critical aspect: the source of energy. Depending on the source, GHG generation for an equal amount of energy consumed can be higher or lower. That's why the term 'clean energy' is so widely used nowadays. It simply means that GHG generation is far lower than with other sources of energy. But how do we measure that? That's where KPIs like these come into the picture.

Since we are talking in terms of a city's consumption, the carbon intensity for each million Btu of energy consumed in a particular sector can tell us about a city's efforts towards, or adoption of, renewable or clean energy. For an equal amount of energy consumed, a lower per capita carbon intensity is a good indication that the city has cleaner sources of energy.



In [None]:
cm = sns.light_palette("orangered", as_cmap=True)
ghg_from_consumption[['density_category', 'Organization', 'Residential building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
       'Industrial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
       'Transport emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km',
]].style.background_gradient(cmap=cm,axis=1).set_caption('carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km')

In [None]:
cm = sns.light_palette("orangered", as_cmap=True)
ghg_from_consumption[['density_category', 'Organization', 'Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
]].style.background_gradient(cmap=cm,axis=1).set_caption('carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita')

## What can we say about transport section from the above data?

Well, it looks pretty bad. In both total emission and per capita emission terms, transport comes out on top of every other sector. So it's fair to say that focusing on transport-related emissions should be a top priority for cities. We will look into this further when we examine the transport section of the survey.

## But why are the figures in transport so high, compared to other sectors? 

Since these KPIs relate the source of energy to emissions, we must point to the primary source of energy in transport: fossil fuels, i.e. petroleum or other hydrocarbon-based fuels. While other sectors can resort to some form of clean energy, such as solar or hydro, not enough has been done to change the picture in the transport sector. On top of that, city life and its economy are hugely dependent on mobility.


## Why are both the per capita and per sq.km Carbon intensity/million Btu KPIs important?

**Per capita KPI** creates a level playing field for cities from every density segment, thus helping with **inter-density-segment comparison**. That's why, in the section below, where we do the density-segment based study with the per capita KPI, we find that cities with medium and low densities are in worse condition than the others.

**Per sq.km KPI** helps with **intra-density-segment comparison**: within a particular density segment, how cities are doing compared to each other.





Let's look at the study below, where we compare how cities from different density segments are doing on each of these KPIs.

The per sq.km KPI tells us little in this summary, as cities with higher density will naturally rank above the others, and that is what the first table below reflects. However, it can tell you how San Francisco is doing compared to another highly dense city like New York or Washington.
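The segment-level comparison described above boils down to grouping cities by density category and taking the mean and median of a KPI. Here is a minimal sketch in plain Python; the helper name and the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_segment(rows):
    """rows: iterable of (density_category, kpi_value) pairs."""
    grouped = defaultdict(list)
    for segment, value in rows:
        grouped[segment].append(value)
    # '50%' mirrors the label that pandas' describe() uses for the median
    return {segment: {"mean": mean(values), "50%": median(values)}
            for segment, values in grouped.items()}


# Hypothetical per capita KPI values, not taken from the CDP data
sample_rows = [
    ("low density", 0.09),
    ("low density", 0.05),
    ("low density", 0.07),
    ("very high density", 0.02),
    ("very high density", 0.04),
]
segment_summary = summarize_by_segment(sample_rows)
for segment, stats in segment_summary.items():
    print(segment, stats)
```

Looking at the mean and the median side by side is a cheap way to spot a segment whose average is pulled up by one or two outlier cities.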

In [None]:
ghg_from_consumption_kpis_per_sqkm[ghg_from_consumption_kpis_per_sqkm['stats'].isin(['mean','50%'])]\
.style.highlight_max(color = 'orangered', axis = 1)

However, per capita KPIs show an interesting picture.

1. **Low density cities are doing worst in the transport sector.** Could this be because of a lack of mass transport systems?

2. In the commercial building sector, low density cities are far behind, in a good way and understandably so.
3. In the residential building sector, carbon intensity/million Btu energy consumption per capita is almost equal for all segments.

**4. 'Carbon intensity/energy consumption per capita KPIs' also show that the very high density segment is doing fairly well in all sectors, except the industrial sector, where it is understandably the highest.**

In [None]:
ghg_from_consumption_kpis_per_capita[ghg_from_consumption_kpis_per_capita['stats'].isin(['mean','50%'])]\
.style.highlight_max(color = 'orangered', axis = 1)\
.highlight_min(color = 'lightgreen', axis = 1)

In [None]:
# since there is only 1 value from medium density category, we are not considering it over here

ghg_from_consumption_industrial_kpi_per_sqkm = ghg_from_consumption[(ghg_from_consumption['Industrial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km']>0)&\
                                                          (ghg_from_consumption['density_category']!='medium density')]\
.groupby(['density_category'])[['Industrial building emissions - carbon intensity in metric tonnes of CO2e, for each million Btu of energy consumed, in every sq.km']]\
.describe(percentiles=[.50]).T\
.reset_index()\
.rename(columns={'level_0':'KPI','level_1':'stats'})

ghg_from_consumption_industrial_kpi_per_capita = ghg_from_consumption[(ghg_from_consumption['Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita']>0)&\
                                                          (ghg_from_consumption['density_category']!='medium density')]\
.groupby(['density_category'])[['Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita']]\
.describe(percentiles=[.50]).T\
.reset_index()\
.rename(columns={'level_0':'KPI','level_1':'stats'})


ghg_from_consumption_industrial_kpi_per_sqkm[ghg_from_consumption_industrial_kpi_per_sqkm['stats'].isin(['mean','50%'])]\
.style.highlight_max(color = 'orangered', axis = 1)

In [None]:
ghg_from_consumption_industrial_kpi_per_capita[ghg_from_consumption_industrial_kpi_per_capita['stats'].isin(['mean','50%'])]\
.style.highlight_max(color = 'orangered', axis = 1)

In [None]:
q_num = '8.1'
col_name = ['Coal', 
'Gas', 
'Oil', 
'Nuclear', 
'Hydro', 
'Biomass', 
'Wind', 
'Geothermal', 
'Solar',
'Other sources'            
]

row_name = ['Electricity source','Percent']

selected_cities = {}

for density in ['very high density','high density','medium density','low density']:
    selected_cities[density] = list(total_cities_response[(total_cities_response['density_category']==density)&\
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
                                                 (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','density_category']]\
                           .replace({'Question not applicable':np.nan,'0':np.nan})\
                           .dropna()\
                           .groupby(['Organization'])\
                           .count()\
                           .sort_values(by='Organization',ascending=False)
                           .index.values)


## Direct GHG emission per capita, excluding generation because of grid-supplied energy

= total direct GHG emission by city / city population. It's a widely used KPI and will be helpful to CDP as well.

## Direct GHG emission per sq.km

= total direct GHG emission by city / city area.
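Both definitions are simple ratios. A minimal sketch follows, using a hypothetical helper name and made-up figures for a single city.

```python
def direct_emissions_kpis(total_direct_emissions_tco2e, population, area_sqkm):
    """Return direct emissions per capita and per sq.km (metric tonnes CO2e)."""
    return {
        "per_capita": total_direct_emissions_tco2e / population,
        "per_sqkm": total_direct_emissions_tco2e / area_sqkm,
    }


# Hypothetical city: 4,000,000 tCO2e, 400,000 residents, 200 sq.km
example_kpis = direct_emissions_kpis(4_000_000, 400_000, 200.0)
print(example_kpis)
```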

GHG emissions per capita, viewed by density segment, show that medium density cities emit the most, with low density cities nearly on par with them.


In [None]:
q_num = '4.6a'
row_name = ['Total Emissions (excluding generation of grid-supplied energy)'
           ]

col_name = ['Direct emissions (metric tonnes CO2e)']


scope1_scope3_emissions = total_cities_response[
#     (total_cities_response['density_category']==density)&\
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
                                                 (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','density_category','Column Name','Row Name']]\
                           .replace({'Question not applicable':np.nan,'0':np.nan})\
                           .dropna()\
                           .sort_values(by='Organization').pivot(index='Organization',columns='Row Name',values='Response Answer')\
                           .dropna().reset_index() 

scope1_scope3_emissions = scope1_scope3_emissions.merge(city_metadata,on='Organization',how='left')
scope1_scope3_emissions['Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)'] = scope1_scope3_emissions['Total Emissions (excluding generation of grid-supplied energy)'].astype('float')/scope1_scope3_emissions['population'].astype('float')
scope1_scope3_emissions['Direct emissions per sq.km(metric tonnes CO2e),excluding generation of grid-supplied energy)'] = scope1_scope3_emissions['Total Emissions (excluding generation of grid-supplied energy)'].astype('float')/scope1_scope3_emissions['area in sq.km'].astype('float')
# scope1_scope3_emissions['Total Generation of grid-supplied energy per capita'] = scope1_scope3_emissions['Total Generation of grid-supplied energy'].astype('float')/scope1_scope3_emissions['population'].astype('float')


scope1_scope3_emissions.sort_values(by='Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)',ascending=False).head(10)

In [None]:
scope1_scope3_emissions.groupby(['density_category'])['Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)'].describe(percentiles=[.50])\
.style.apply(lambda x: ['None','None','None','None','background-color: lightyellow','None'],axis=1)\
.set_caption('Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)')

## How can this total GHG emissions per capita help?

Let's look at the asthma data we have from the CDC's 500 Cities dataset. Between GHG emissions per capita and the percentage of asthma patients in the city population, there is a positive correlation, shown by the trendline. This is by no means conclusive evidence. However, below we can observe some striking and coherent facts in data coming from two different sources.

For example, **the city of Cleveland, at the top right corner of the graph, has unusually high GHG emissions per capita: 22.12 metric tonnes CO2e. This is what we have from CDP data.**

**At the same time, from the CDC data, we can gather that 12.5% of Cleveland's population suffers from asthma. That's 3.4 percentage points above the median of 9.1% and sits at the 95th percentile. It is the second highest among the cities that responded to CDP with GHG emissions data.**

The same goes for Providence, Louisville and Pittsburgh.

So, GHG emission per capita could be an indicator of many underlying, brewing problems and health hazards.
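The CDC prevalence figures come at census-tract level, so a city-level percentage has to be aggregated from its tracts. One reasonable way to do that, sketched here as an assumption rather than as the exact method used in this notebook, is a population-weighted average. The helper name and tract values are hypothetical.

```python
def weighted_prevalence(tracts):
    """tracts: list of (prevalence_percent, tract_population) pairs."""
    total_population = sum(population for _, population in tracts)
    if total_population == 0:
        return float("nan")  # no population data, so no estimate
    weighted_sum = sum(prevalence * population
                       for prevalence, population in tracts)
    return weighted_sum / total_population


# Hypothetical tracts of one city: (asthma prevalence in %, population)
hypothetical_tracts = [(12.0, 3000), (9.0, 5000), (10.0, 2000)]
city_estimate = weighted_prevalence(hypothetical_tracts)
print(round(city_estimate, 2))
```

Weighting by population keeps a small tract with an extreme prevalence from dominating the city-level figure.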

In [None]:
scope1_scope3_emissions['estimated percentage of asthma patients'] = 0.0

for idx in scope1_scope3_emissions.index.values:
    org = scope1_scope3_emissions['Organization'].iloc[idx]
    state_id = scope1_scope3_emissions['state_id'].iloc[idx]
    # census tracts of this city, and the SVI population of those tracts
    city_tracts = census_tract_data[(census_tract_data['PlaceName']==org)&(census_tract_data['StateAbbr']==state_id)]
    tract_population = svi_data[svi_data['FIPS'].isin(city_tracts['TractFIPS'].unique())]['E_TOTPOP']
    # NOTE: .values[0] takes only the first tract's prevalence, so the population
    # weights cancel out and the estimate equals that single tract's value
    perc_asthma_patients = (city_tracts['CASTHMA_CrudePrev'].values[0]*tract_population).values.sum()/tract_population.sum()
    scope1_scope3_emissions.at[idx,'estimated percentage of asthma patients']=perc_asthma_patients


scope1_scope3_emissions[['estimated percentage of asthma patients','Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)']].describe(percentiles=[.5,.75,.95]).drop(['count','min','max','std'])\
.style.set_caption('Percentage of asthma patients vs Direct emissions per capita')

In [None]:
fig = px.scatter(scope1_scope3_emissions,y='estimated percentage of asthma patients',
    x='Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)',
                 text='Organization',
    trendline="ols")

fig.update_layout(
        template="plotly_dark",
        title = dict(text='Percentage of asthma affected population and GHG emissions per capita',x=0.5,y=.97),
        height=300,
        width=1000,
        font_color="rgb(199,233,180)"
        )

fig.show()


Furthermore, this estimated percentage of asthma-affected population is barely positively correlated with population and has a good degree of positive correlation with direct emissions per capita. But it is negatively correlated with density!

In [None]:
scope1_scope3_emissions[['estimated percentage of asthma patients','population','density','Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)']].corr().iloc[0:1]

Intuitively, it may seem that populations in higher density areas are more susceptible to asthma. However, the statistics show that lower density areas are expected to have a slightly greater percentage of affected people than the rest. This is in line with the per capita GHG emissions data from CDP that we saw previously, where low density cities showed considerably higher per capita GHG emissions. Remember that these two datasets come from two different sources, yet they point to related information.
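For readers who want to see what the correlation figures above measure, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The three short series are invented and perfectly linear, purely to show a positive and a negative case; they are not the notebook's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values for four cities
emissions_per_capita = [2.0, 4.0, 6.0, 8.0]
asthma_percentage = [8.5, 9.0, 9.5, 10.0]
density = [4000, 3000, 2000, 1000]

corr_emissions = pearson(emissions_per_capita, asthma_percentage)
corr_density = pearson(density, asthma_percentage)
print(corr_emissions, corr_density)
```

A coefficient near +1 or -1 only says the two series move together linearly; it says nothing about which one drives the other.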

### Estimated percentage of asthma patients in cities - density segment wise

In [None]:
scope1_scope3_emissions.groupby(['density_category'])['estimated percentage of asthma patients'].describe()

Here are the details of the low density cities for your reference.

In [None]:
scope1_scope3_emissions[scope1_scope3_emissions['density_category']=='low density']

# Source of Energy

Now that we have looked at the GHG emission from different sectors and its relation with energy consumption,
let's look into the cities' sources of energy, especially renewable energy.

## Renewable Energy Percentage
**'Renewable Energy Percentage' can be a good KPI to understand a city's efforts to address GHG emission.** Note that the calculation below also counts nuclear in this percentage, so strictly speaking it measures the share of low-carbon sources.
The summary below, based on density segments, shows that **very high density cities are ahead** of cities from other segments by a good margin. ***This explains why, in the carbon intensity/energy consumption per capita KPIs, the very high density segment was doing better than the rest.***


On the other hand, in cities from ***low and medium density segments, use of gas and oil is considerably higher compared to high density and very high density cities.***
This explains why, in **the transport sector's carbon intensity/energy consumption per capita KPI, the low density segment showed much higher emission.**

At the same time, medium density cities on average get 28% of their energy from renewable sources, which is a little more than half of what the high and very high density segments use.
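To make the KPI concrete, here is a minimal sketch that sums the shares of a city's energy mix. The source grouping mirrors the notebook's own calculation, which counts nuclear together with the renewable sources; the helper name and the mix itself are hypothetical.

```python
# Same grouping as the notebook's 'Renewable Energy percentage' column
RENEWABLE_SOURCES = ("Biomass", "Geothermal", "Hydro", "Nuclear", "Solar", "Wind")
FOSSIL_SOURCES = ("Coal", "Gas", "Oil")


def energy_shares(mix):
    """mix: dict of source name -> percentage of the city's energy."""
    renewable = sum(mix.get(source, 0.0) for source in RENEWABLE_SOURCES)
    fossil = sum(mix.get(source, 0.0) for source in FOSSIL_SOURCES)
    return renewable, fossil


# Hypothetical city mix, in percent
hypothetical_mix = {"Coal": 30.0, "Gas": 25.0, "Oil": 5.0, "Nuclear": 10.0,
                    "Hydro": 15.0, "Solar": 10.0, "Wind": 5.0}
renewable_pct, fossil_pct = energy_shares(hypothetical_mix)
print(renewable_pct, fossil_pct)
```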



In [None]:
q_num = '8.1'
col_name = ['Coal', 
'Gas', 
'Oil', 
'Nuclear', 
'Hydro', 
'Biomass', 
'Wind', 
'Geothermal', 
'Solar',
'Other sources'            
]

row_name = ['Electricity source','Percent']

source_of_energy = total_cities_response[(total_cities_response['Question Number']==q_num)&\
                      (total_cities_response['Column Name'].isin(col_name))&\
                      (total_cities_response['Row Name'].isin(row_name))&\
                      (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','Row Name','Column Name']].fillna(0.0).pivot(index='Organization',values='Response Answer',columns='Column Name')

for col in source_of_energy.columns.values:
        source_of_energy[col] = source_of_energy[col].astype('float')

        
source_of_energy.reset_index(inplace=True)

source_of_energy.drop([6,32,40,45,48],inplace=True)


source_of_energy = source_of_energy.merge(city_metadata,on='Organization',how='left')

# Note: 'Nuclear' is summed in here as well, although it is low-carbon rather than renewable
source_of_energy['Renewable Energy percentage'] = source_of_energy[['Biomass','Geothermal','Hydro','Nuclear','Solar','Wind']].sum(axis=1)

source_of_energy[source_of_energy['Renewable Energy percentage']!=0.0].groupby(['density_category'])[['Biomass', 'Geothermal', 'Hydro',
       'Nuclear', 'Solar', 'Wind','Other sources','Renewable Energy percentage']].mean()\
.style.highlight_max(color = 'lightgreen', axis = 0)\
.highlight_min(color = 'orangered', axis = 0).set_caption('Renewable energy percentage in total source of energy')

In [None]:
source_of_energy[source_of_energy['Renewable Energy percentage']!=0.0].groupby(['density_category'])[['Coal', 'Gas','Oil']].mean()\
.style.highlight_min(color = 'lightgreen', axis = 0)\
.highlight_max(color = 'orangered', axis = 0).set_caption('Non-renewable Energy percentage across density-segments')

In [None]:
source_of_energy.head(10).style.set_caption('Source of energy for cities')

# Evaluation of Carbon Intensity/energy consumption per capita KPIs
### Source of energy generation stats vs GHG generation from energy consumption stats 

Let's look at individual city performances. The tables below show the values for 'Salt Lake City' and 'San Francisco': the sector-wise carbon intensity/energy consumption per capita KPIs, along with 'Renewable Energy percentage'.

**A gap of 65 percentage points in the renewable energy percentage of total energy production is reflected in the Carbon Intensity/energy consumption (CI/EC from here on) per capita KPIs. If you look at the individual sources of energy, you will further notice that the largest source of energy production for Salt Lake City is coal, making up 59%. On the other hand, San Francisco gets 18% of its energy from nuclear sources.**

**This makes San Francisco a far more energy efficient city than Salt Lake City, despite having 10x the population density.**

In [None]:
city_metadata[city_metadata['Organization'].isin(['Salt Lake City','San Francisco'])]

In [None]:
ghg_from_consumption[ghg_from_consumption['Organization'].isin(['Salt Lake City','San Francisco'])][[ 'Organization', 'Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
]].set_index('Organization').T\
.style.highlight_min(color = 'lightgreen', axis = 1)


In [None]:
source_of_energy[source_of_energy['Organization'].isin(['Salt Lake City','San Francisco'])][['Organization','Renewable Energy percentage','Biomass', 'Geothermal', 'Hydro',
       'Nuclear', 'Other sources', 'Solar', 'Wind']].set_index('Organization').T\
.style.highlight_max(color = 'lightgreen', axis = 1)


In [None]:
source_of_energy[source_of_energy['Organization'].isin(['Salt Lake City','San Francisco'])][['Organization','Coal','Oil','Gas']].set_index('Organization').T\
.style.highlight_max(color = 'orangered', axis = 1)

Let's do some PCA on the energy data and CI/EC per capita data we have gathered so far, and make a scatter plot based on the principal components.

**For the 1st principal component, the features that explain the variance in the data relate to usage of renewable energy, with the maximum variance explained by usage of geothermal energy.**

**For the 2nd principal component, it is usage of fossil fuel energy, with the maximum variance explained by usage of oil.**

This puts Seattle and Providence in two opposite corners, with Seattle being more energy efficient, even though both are very high density cities. It also creates a cluster of cities including San Francisco, Oakland and Fremont, which are higher in renewable energy usage and rely less on oil and gas.
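To show how one can tell which feature dominates a principal component, here is a small self-contained sketch. It does not use scikit-learn: it finds the first component by power iteration on the covariance matrix, which is a stand-in technique chosen to keep the example dependency-free. The feature names are borrowed from the energy columns, but the numbers are invented and no column normalization is applied.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector (PC1 loadings) by power iteration."""
    d = len(cov)
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in product) ** 0.5
        vector = [v / norm for v in product]
    return vector


# Hypothetical shares (%) for four cities; Geothermal varies the most
features = ["Geothermal", "Oil", "Solar"]
rows = [[0.0, 5.0, 10.0], [20.0, 6.0, 11.0], [40.0, 5.0, 9.0], [60.0, 4.0, 10.0]]

loadings = first_component(covariance_matrix(rows))
dominant_feature = features[max(range(len(features)),
                                key=lambda j: abs(loadings[j]))]
print(dominant_feature, [round(v, 3) for v in loadings])
```

The feature with the largest absolute loading is the one that contributes most to that component, which is the sense in which a component can be said to be about geothermal or oil usage.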

In [None]:
source_of_energy[source_of_energy['Organization'].isin(['Seattle','Providence','San Francisco','Salt Lake City'])][['Organization','Renewable Energy percentage','Biomass', 'Geothermal', 'Hydro',
       'Nuclear', 'Other sources', 'Solar', 'Wind','Coal','Oil','Gas']].set_index('Organization').T\
.style.highlight_max(color = 'lightgreen', axis = 1)\
.set_caption('Energy source in percentages across some of the cities')

In [None]:
ghg_from_consumption

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize  # used below to scale each column before PCA

pca = PCA(n_components=2)

data_to_cluster = ghg_from_consumption[[ 'Organization', 'Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
]]\
.merge(source_of_energy[['Organization','Renewable Energy percentage','Biomass', 'Coal', 'Gas', 'Geothermal', 'Hydro',
       'Nuclear', 'Oil', 'Other sources', 'Solar', 'Wind']],on='Organization',how='left').set_index('Organization').fillna(0.0)


data_to_cluster.drop(['New York','Santa Monica'],inplace=True)

pca_values = pd.DataFrame(pca.fit_transform(normalize(data_to_cluster.values,axis=0)),columns=['pca_1','pca_2'])
# reuse the index of the frame fed to PCA so the labels always match its rows
pca_values['Organization'] = data_to_cluster.index.values


fig = px.scatter(pca_values,y='pca_1',
    x='pca_2',
    text='Organization',         
    )


fig.update_layout(
        template="plotly_dark",
        title = dict(text='PCA of cities by energy source data and GHG generation from per capita energy consumption data',x=0.5,y=.97),
        xaxis_title="Principal component representing - Usage of oil & other fossil fuel energy - high to low -->",
        yaxis_title="Principal component representing - Usage of renewable energy",    
        height=500,
        width=1200,
        font_color="rgb(199,233,180)",
        shapes=[
                dict(
                  type= 'line',
                  yref= 'paper', y0= 0, y1= 1,  # span the full plot height: vertical reference line at x = 0
                  xref= 'x', x0= 0, x1= 0,
                 
                )
            ]
        )

fig.show()

This is how it looks for the other cities. The City of Alameda's official site broadly confirms what the city has reported below.

In [None]:
source_of_energy.sort_values(by='Renewable Energy percentage',ascending=False)[['density_category','Organization',
                                       'Renewable Energy percentage','Biomass', 'Coal', 'Gas', 'Geothermal', 'Hydro',
       'Nuclear', 'Oil', 'Other sources', 'Solar', 'Wind' ]].head(15)\
.style.set_caption('Source of energy for cities')

A simple heatmap of the correlation between the CI/EC per capita KPIs and the percentages of energy sources shows how negatively correlated the KPIs are with renewable sources.

In [None]:
ghg_consumption_renewability = ghg_from_consumption[['Organization','Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
]]\
.merge(source_of_energy[['Organization','Renewable Energy percentage','Biomass', 'Coal', 'Gas', 'Geothermal', 'Hydro',
       'Nuclear', 'Oil', 'Other sources', 'Solar', 'Wind']],on='Organization',how='left')


plt.figure(figsize=(10,10))
sns.heatmap(ghg_consumption_renewability[['Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Industrial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
        'Renewable Energy percentage','Coal','Gas','Solar','Hydro'
                             ]].corr())

Now that we have worked with the cities' source-of-energy data, we will extend this work to the water usage aspect.


# Water usage in energy plants

Before we dive into preparing the KPIs, let's understand the context first. The picture below from USGS shows water usage across different sectors, as per the 2015 water census data. It shows the volume of water withdrawn and delivered to power plants for cooling purposes.

![wss-wuse-pipe-diagram-2015.png](https://prd-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/styles/full_width/public/thumbnails/image/wss-wuse-pipe-diagram-2015.png)

## Why is this crucial?

The picture above shows that 40% of all the water withdrawn in the USA goes to energy plants, mostly for cooling needs. This tells us how large the water requirement of this sector is.

Apart from surface water, a good share of this water is groundwater. Cities are crucially dependent on groundwater, and cities and power plants often share the same surface water sources too. Huge water withdrawal from these sources can become critical in the future, as groundwater for most cities around the world is depleting fast and surface sources are also drying up.

## What can we gain out of it?

Cooling needs for energy plants vary, based on three factors.
1. Fuel used.
2. Reusability of water in the energy plant.
3. Size of the plant.


In the section below, we will see how renewable energy sources have lower water needs. If plants are replaced with renewable sources, or if cities switch to a renewable energy source, it can result in a significant amount of water saving, which can be used to meet other needs in the future.

So, we will try to come up with a KPI that estimates how much excess water is being withdrawn from surrounding water sources and could be saved if energy generation switched to a renewable source.

## How do we do it?

Cities are powered by plants that differ in their water requirements. We will first calculate the 'water withdrawal intensity (m3/MWh)'. This tells us how many cubic meters of water are withdrawn for the plant to produce 1 megawatt-hour of power.

We have the ratio of non-renewable energy source usage from CDP data, which we used in the previous section.
We multiply this ratio by the water withdrawal intensity to get, for 1 megawatt-hour of power production, how much water is used for non-renewable power production. This amount of water withdrawal can be reduced. We convert this to litres/MWh and name it:

## Estimated reducable water withdrawal (cubic meters/MWh) KPI
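As a rough sketch of the idea, the KPI can be read as a share-weighted sum of withdrawal intensities over the non-renewable fuels. The function name, the fuel shares and the intensity values below are all hypothetical, and the exact aggregation used later in this notebook may differ.

```python
LITRES_PER_CUBIC_METRE = 1000.0


def reducible_withdrawal_m3_per_mwh(non_renewable_share_pct, intensity_m3_per_mwh):
    """Weight each fuel's withdrawal intensity by its share of the city's energy.

    non_renewable_share_pct: dict of fuel -> share of the energy mix, in percent
    intensity_m3_per_mwh:    dict of fuel -> water withdrawn per MWh, in m3
    """
    return sum(share / 100.0 * intensity_m3_per_mwh.get(fuel, 0.0)
               for fuel, share in non_renewable_share_pct.items())


# Hypothetical city mix and hypothetical state-level intensities
hypothetical_shares = {"Coal": 20.0, "Gas": 30.0, "Oil": 0.0}
hypothetical_intensity = {"Coal": 100.0, "Gas": 40.0, "Oil": 150.0}

kpi_m3 = reducible_withdrawal_m3_per_mwh(hypothetical_shares, hypothetical_intensity)
kpi_litres = kpi_m3 * LITRES_PER_CUBIC_METRE
print(kpi_m3, kpi_litres)
```

In this made-up case, coal contributes 20 m3 and gas 12 m3, so up to 32 m3 (32,000 litres) of withdrawal per MWh is tied to non-renewable generation.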

## What else can be done?

We will further check, from the water census data of 2015, how much water is available per capita in each city. Here, the census data is at county level. We will fetch only a few of them and try to understand by how much the reducible water above can increase the water availability.

Below we can see how, depending on the source of energy, **water withdrawal intensity, measured in cubic meters of water withdrawn for every megawatt-hour (MWh) of power produced,** varies considerably. Coal, oil and gas based power generation shows a considerably higher water requirement. **Especially for petroleum based power generation, it is much higher than the rest. On the other hand, for biomass, solar and nuclear, the water requirement is considerably lower.** For wind based power generation, there was no data on water withdrawal. It is perhaps safe to assume that there is no water requirement. If so, wind can be most helpful for coastal cities, and we have a lot of them.
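The unit conversion behind the intensity figure is worth spelling out: one million US gallons is about 3785.41178 cubic meters, so dividing withdrawal (in million gallons) by generation (in MWh) and multiplying by that factor gives m3/MWh. A minimal sketch, with hypothetical helper names and input values:

```python
CUBIC_METRES_PER_MILLION_GALLONS = 3785.41178


def parse_number(text):
    """Turn strings such as '1,000' or '50 000' into floats."""
    return float(text.replace(" ", "").replace(",", ""))


def withdrawal_intensity_m3_per_mwh(withdrawal_million_gallons, net_generation_mwh):
    """Cubic meters of water withdrawn per MWh of net generation."""
    if net_generation_mwh == 0:
        # No generation reported: return 0.0 rather than dividing by zero,
        # similar in spirit to replacing infinite ratios with 0.0
        return 0.0
    return (CUBIC_METRES_PER_MILLION_GALLONS
            * withdrawal_million_gallons / net_generation_mwh)


# Hypothetical plant: 1,000 million gallons withdrawn for 50,000 MWh
example_intensity = withdrawal_intensity_m3_per_mwh(
    parse_number("1,000"), parse_number("50 000"))
print(round(example_intensity, 4))
```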



In [None]:
def change_format(x):
#     print(x)
    return pd.to_numeric(x.replace(' ','').replace(',',''))


cooling_summary['Water Withdrawal Volume (Million Gallons)'] = cooling_summary['Water Withdrawal Volume (Million Gallons)'].apply(lambda x: change_format(x))
cooling_summary['Net Generation (MWh)'] = cooling_summary['Net Generation from Steam Turbines (MWh)'].apply(lambda x: change_format(x))

water_withdrawal_data = cooling_summary.groupby(['State','Generator Primary Technology'])[['Water Withdrawal Volume (Million Gallons)','Net Generation (MWh)']].sum().reset_index()
water_withdrawal_data['water withdrawal intensity (m3/MWh)'] = 3785.41178*(water_withdrawal_data['Water Withdrawal Volume (Million Gallons)']/water_withdrawal_data['Net Generation (MWh)']).replace(np.inf,0.0)

temp = water_withdrawal_data.groupby(['Generator Primary Technology'])['water withdrawal intensity (m3/MWh)'].describe(percentiles=[0.5]).reset_index().rename(columns={'50%':'water withdrawal intensity (m3/MWh)'})
temp[['Generator Primary Technology','water withdrawal intensity (m3/MWh)']].style.set_caption('Median water withdrawal intensity (m3/MWh) for different energy sources').background_gradient(cmap=cm,axis=0)

In [None]:
water_withdrawal_data.head(15).style.set_caption('Water withdrawal data - state & plant wise')

This is how the values for the KPI 'Estimated reducable water withdrawal (m3/MWh)' look.
For Cambridge, we have a value of 27367.66 litres/MWh. This means that for every MWh of power the city currently uses, at most 27367.66 litres of water can be saved by switching to renewable energy sources. We say 'at most' because renewable energy sources also have some water requirement.

In [None]:
# estimate reduction in water withdrawal for cooling needs of energy plant

# get the state and source of non-renewable energy : coal,gas,oil and the share of production  
source_of_energy['Estimated reducable water withdrawal (m3/MWh)'] = 0.0
source_of_energy['Net water withdrawal intensity (m3/MWh)'] = 0.0

for idx in source_of_energy.index.values:
    
    net_water_withdrawal_intensity_per_MWh = 0.0
    # idx is an index label, so use label-based lookups (consistent with .at below)
    org = source_of_energy.loc[idx,'Organization']
    state_id = source_of_energy.loc[idx,'state_id']
    # share of each fuel in the city's energy mix
    fuel_cols = ['Coal','Oil','Gas','Solar','Nuclear','Geothermal','Biomass','Other sources']
    power_ratio = source_of_energy.loc[idx,fuel_cols]/source_of_energy.loc[idx,fuel_cols].sum()

    net_water_withdrawal_intensity_per_MWh = 3785.41178*water_withdrawal_data[water_withdrawal_data['State']==state_id]['Water Withdrawal Volume (Million Gallons)'].sum()/water_withdrawal_data[water_withdrawal_data['State']==state_id]['Net Generation (MWh)'].sum()
    net_water_withdrawal_intensity_per_MWh = 0.0 if np.isnan(net_water_withdrawal_intensity_per_MWh) else net_water_withdrawal_intensity_per_MWh
    
    net_water_withdrawal_intensity_per_MWh_for_coal = net_water_withdrawal_intensity_per_MWh*power_ratio['Coal']
    net_water_withdrawal_intensity_per_MWh_for_gas = net_water_withdrawal_intensity_per_MWh*power_ratio['Gas']
    net_water_withdrawal_intensity_per_MWh_for_oil = net_water_withdrawal_intensity_per_MWh*power_ratio['Oil']
    
    source_of_energy.at[idx,'Net water withdrawal intensity (m3/MWh)'] = net_water_withdrawal_intensity_per_MWh
    source_of_energy.at[idx,'Estimated reducable water withdrawal (m3/MWh)'] = net_water_withdrawal_intensity_per_MWh_for_coal+net_water_withdrawal_intensity_per_MWh_for_oil+net_water_withdrawal_intensity_per_MWh_for_gas

source_of_energy['Estimated reducable water withdrawal (litres/MWh)'] = 0.0    
source_of_energy['Estimated reducable water withdrawal (litres/MWh)'] = source_of_energy['Estimated reducable water withdrawal (m3/MWh)']*1000

source_of_energy[source_of_energy['Estimated reducable water withdrawal (litres/MWh)']>0][['Organization', 
       'state_id','Oil','Coal','Gas',
       'density_category', 'Renewable Energy percentage',
       'Estimated reducable water withdrawal (m3/MWh)',
       'Net water withdrawal intensity (m3/MWh)',
       'Estimated reducable water withdrawal (litres/MWh)',
       ]].head(8)
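
The row-by-row loop above can also be written with a `groupby` and a `map`. The sketch below shows that idea on small hypothetical stand-ins for `water_withdrawal_data` and `source_of_energy`; the city names and all numbers are invented, and only four fuel columns are used to keep it short.

```python
import pandas as pd

M3_PER_MILLION_GALLONS = 3785.41178

# Hypothetical stand-ins for water_withdrawal_data and source_of_energy
plants = pd.DataFrame({
    'State': ['TX', 'TX', 'CA'],
    'Water Withdrawal Volume (Million Gallons)': [40_000.0, 10_000.0, 5_000.0],
    'Net Generation (MWh)': [1_500_000.0, 500_000.0, 1_000_000.0],
})
cities = pd.DataFrame({
    'Organization': ['City A', 'City B', 'City C'],
    'state_id': ['TX', 'CA', 'NY'],  # NY has no plant rows in this toy data
    'Coal': [30.0, 0.0, 10.0], 'Oil': [0.0, 5.0, 0.0],
    'Gas': [40.0, 45.0, 50.0], 'Solar': [30.0, 50.0, 40.0],
})

# State-level intensity: total withdrawal divided by total generation
state_totals = plants.groupby('State')[
    ['Water Withdrawal Volume (Million Gallons)', 'Net Generation (MWh)']
].sum()
state_intensity = M3_PER_MILLION_GALLONS * state_totals.iloc[:, 0] / state_totals.iloc[:, 1]

# Share of coal, oil and gas in each city's energy mix
fuels = ['Coal', 'Oil', 'Gas', 'Solar']
fossil_share = cities[['Coal', 'Oil', 'Gas']].sum(axis=1) / cities[fuels].sum(axis=1)

# States without plant data get an intensity of 0, mirroring the NaN handling in the loop
net_col = 'Net water withdrawal intensity (m3/MWh)'
kpi_col = 'Estimated reducable water withdrawal (m3/MWh)'
cities[net_col] = cities['state_id'].map(state_intensity).fillna(0.0)
cities[kpi_col] = cities[net_col] * fossil_share
print(cities[['Organization', net_col, kpi_col]])
```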

Now, to understand how this KPI can help, we will look at the 2015 water census data. This data is at county level, so it gives us more accurate values for each city's per capita water availability in 2015. The 2015 population is included in the census data.
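
The unit handling in the next cell can be sketched as follows. It assumes, as the conversion in that cell implies, that the census volumes are in million gallons per day and the population is in thousands. `annual_m3_per_capita` is a name made up for this example and the county figures are hypothetical.

```python
M3_PER_MILLION_GALLONS = 3785.41178

def annual_m3_per_capita(mgal_per_day, population_thousands):
    """Convert a daily volume in million gallons into cubic meters per person per year."""
    m3_per_day = M3_PER_MILLION_GALLONS * mgal_per_day
    return m3_per_day / (population_thousands * 1000) * 365

# Hypothetical county: 100 Mgal/d summed over the four categories, 500,000 residents
per_capita = annual_m3_per_capita(100.0, 500.0)
print(round(per_capita, 2))
```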

In [None]:


def get_name(x):
    # words that describe the administrative unit rather than the place itself
    stop_words = ['municipality','&','municipio','district','council','city','county','govern','commune','of','the','census','area','borough',]
    # drop whole words only, so that substrings of other words are left untouched
    words = [w for w in x.lower().split() if w not in stop_words]
    # capitalize the first letter of every remaining word
    return ' '.join([w[0].upper()+w[1:] for w in words])

water_usage_data['COUNTY'] = water_usage_data['COUNTY'].apply(lambda x: get_name(x))
water_usage_data.rename(columns={'COUNTY':'Organization','STATE':'state_id'},inplace=True)

source_of_energy = source_of_energy.merge(water_usage_data[['Organization','state_id','TP-TotPop','DO-WDelv ','PS-Wtotl','IN-Wtotl','LI-WFrTo']],on=['Organization','state_id'],how='left').fillna(0.0)
source_of_energy['Approx. annual water availability per capita 2015 (m3)'] = (((3785.41178*(source_of_energy['DO-WDelv ']+source_of_energy['PS-Wtotl']+source_of_energy['IN-Wtotl']+source_of_energy['LI-WFrTo']))/(source_of_energy['TP-TotPop']*1000))*365).fillna(0.0)
water_availability = source_of_energy[source_of_energy['Approx. annual water availability per capita 2015 (m3)']>0.0][['Organization','state_id','density_category','Approx. annual water availability per capita 2015 (m3)','Estimated reducable water withdrawal (m3/MWh)']]
water_availability[['Organization','state_id','density_category','Approx. annual water availability per capita 2015 (m3)']]\
.style.set_caption('Annual water availability as per water census data 2015')

Let's look at the water issues reported by cities, along with annual water availability and the estimated reducible water volume. We can see that almost all cities report increased water stress, and Dallas and Sacramento have reported drought situations.

The next table gives an overall summary of the water related issues reported by US cities in every density segment.

In [None]:
q_num='14.2a'
col_name = ['Water security risk drivers']

temp = pd.pivot_table(total_cities_response[(total_cities_response['Organization'].isin(['Alameda', 'Anchorage', 'Austin', 'Dallas', 'Denver', 'Honolulu',
       'Los Angeles', 'Philadelphia', 'Providence', 'Sacramento',
       'San Francisco']))&\
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
#                                                  (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','density_category']],
index='Organization',columns='Response Answer',aggfunc='count',fill_value=0)\
.reset_index()
temp.columns = ['Organization','Declining water quality',
        'Drought',
        'Energy supply issues',
        'Inadequate or ageing water supply infrastructure',
        'Increased water demand',
        'Increased water stress',
        'Question not applicable']

temp.merge(source_of_energy[source_of_energy['Approx. annual water availability per capita 2015 (m3)']>0.0][['Organization','state_id','Approx. annual water availability per capita 2015 (m3)','Estimated reducable water withdrawal (m3/MWh)']],on=['Organization'],how='left')\
.style.set_caption('Water related issues present in corresponding city')

In [None]:
temp2 = total_cities_response[
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
#                                                  (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Response Answer','density_category']]

# temp2 = pd.crosstab(temp2['Response Answer'],temp2['density_category']).style.set_caption('test')
pd.crosstab(temp2['Response Answer'],temp2['density_category'])


We will try to see now, how the previously calculated 'Estimated reducable water withdrawal (m3/MWh)' KPI can help. For Dallas, we have the following values:

**population** : 1345047.0

**Annual energy consumption** : 720000 MWh (source : official website of the City of Dallas Office of Environmental Quality & Sustainability http://greendallas.net/energy/city-energy/#:~:text=The%20City%20of%20Dallas%20uses,year%20(Source%3A%20EPA) )


**Estimated reducable water withdrawal (m3/MWh)** : 104.259507

**Approx. annual water availability per capita 2015 (m3)** : 303.392441

So, the **maximum increment in per capita water availability from reducing the water requirement** is:

(Annual energy consumption * Estimated reducable water withdrawal (m3/MWh)) / population = 55.81 m3 per capita per year

**How much of an increment is this on top of the 2015 water availability value, as a percentage?** (55.81/303.392441) * 100 = 18.4% 
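
The arithmetic for Dallas can be re-checked with a few lines of Python. The inputs are the four values listed above; the variable names are chosen only for this check.

```python
population = 1345047.0
annual_energy_mwh = 720000
reducible_m3_per_mwh = 104.259507
availability_2015_m3 = 303.392441

# water that could be freed per person per year, in cubic meters
increment_m3_per_capita = annual_energy_mwh * reducible_m3_per_mwh / population
# the same increment as a percentage of the 2015 per capita availability
increment_pct = 100 * increment_m3_per_capita / availability_2015_m3

print(round(increment_m3_per_capita, 2), round(increment_pct, 1))
```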

This not only has the potential to solve the water availability problems of cities to a good extent, it can also help cities to recover depleting groundwater levels. Most importantly, all of this can be done just by changing the way we generate energy.

**But do note, in this discussion, we are not looking at the water stress level in and around the city, as the approach towards water usage cannot be reactive anymore. Our cities should be more proactive and water conserving in action, instead of looking at how much water is available in natural reserves.**

## Long-term Environment Risk Indices (LERI)

So, we now have some KPIs for cities that tell us the following:

1. Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita
2. Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita
3. Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita
4. Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy) 
5. Estimated reducable water withdrawal (m3/MWh)

The emissions KPIs tell us how bad the emission scenario in the city is. The reducable water withdrawal KPI tells us how much excess water is currently being withdrawn.

So, can we combine this information into one index that presents the overall scenario and tells us how adversely the air and the water reserves of the cities are being affected? Since this damage to the environment is unlikely to be resolved in a short period of time and can have long-term effects, we call this the Long-term Environment Risk.

For both the emissions KPIs and the reducable water KPI, a higher value is worse.

So, we create 'Long-term Environment Risk  1' in the following way:

Normalize each variable (divide it by its maximum across cities) and then

**<u>Long-term Environment Risk  1  = Square root of ((Residential building emissions per capita KPI)^2 + (Commercial building emissions per capita KPI)^2 + (Transport emissions per capita KPI)^2 + (Estimated reducable water withdrawal (cubic meters/MWh))^2)</u>**

This basically gives the length of the vector, i.e. the L2 distance from the origin, in a 4-dimensional vector space.

But we cannot visualize this. So, only for convenience of visualization, we will create 

**<u>Long-term Environment Risk  2  = Square root of ((Direct emissions per capita (metric tonnes CO2e))^2 + (Estimated reducable water withdrawal (cubic meters/MWh))^2)</u>**
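
A minimal sketch of the index on made-up numbers is shown below. The short column names and the KPI values are hypothetical. Dividing each column by its maximum is intended to mirror the `norm='max', axis=0` option used in the next cell, which for non-negative values gives the same scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical KPI values for three cities (not taken from CDP data)
kpis = pd.DataFrame({
    'residential': [0.2, 0.4, 0.1],
    'commercial': [0.3, 0.6, 0.3],
    'transport': [1.0, 0.5, 0.25],
    'reducible_water': [50.0, 100.0, 0.0],
}, index=['City A', 'City B', 'City C'])

# scale every KPI to the 0..1 range by its maximum across cities
scaled = kpis / kpis.max(axis=0)
# length of each city's 4-dimensional KPI vector
leri_1 = np.sqrt((scaled ** 2).sum(axis=1))
print(leri_1.round(3))
```

With four KPIs scaled to at most 1, the index lies between 0 and 2, and a larger value means a higher long-term risk.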


In [None]:
aei1 = source_of_energy[['Organization','state_id','density_category','Estimated reducable water withdrawal (m3/MWh)']].merge(ghg_from_consumption[['Organization','state_id',
                                              'Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita'                                
                                                                              ]],on=['Organization','state_id'],how='left').dropna()

aei2 = source_of_energy[['Organization','state_id','density_category','Estimated reducable water withdrawal (m3/MWh)']].merge(scope1_scope3_emissions[['Organization','state_id',
                                                                     'Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)'
                                                                              ]],on=['Organization','state_id'],how='left').dropna()


aei1['Long-term Environment Risk  1'] = ((normalize(aei1[['Estimated reducable water withdrawal (m3/MWh)','Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita']],norm='max',axis=0))**2).sum(axis=1)**0.5

aei1[['Estimated reducable water withdrawal (m3/MWh)','Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita']] = normalize(aei1[['Estimated reducable water withdrawal (m3/MWh)','Residential building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Commercial building emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita',
       'Transport emissions - carbon intensity in metric tonnes of CO2e generated, for each million Btu of energy consumed, per capita']],norm='max',axis=0)


aei2['Long-term Environment Risk 2'] = ((normalize(aei2[['Estimated reducable water withdrawal (m3/MWh)','Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)']],norm='max',axis=0))**2).sum(axis=1)**0.5
aei2[['Estimated reducable water withdrawal (m3/MWh)','Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)']] = (normalize(aei2[['Estimated reducable water withdrawal (m3/MWh)','Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)']],norm='max',axis=0))



fig = px.scatter(aei2[aei2['Estimated reducable water withdrawal (m3/MWh)']>0],y='Estimated reducable water withdrawal (m3/MWh)',
    x='Direct emissions per capita(metric tonnes CO2e),excluding generation of grid-supplied energy)',
                 text='Organization',
#                  color='state_id'
                 
    )

fig.update_layout(
        template="plotly_dark",
        title = dict(text='Long-term Environment Risk  2',x=0.5,y=.97),
        height=600,
        width=1000,
        
        font_color="rgb(199,233,180)",
        shapes=[
            dict(opacity=0.3,line_dash='dot',fillcolor='rgb(199,233,180)',
              type= 'rect',
              y0= 0, y1= 0.5,   # green rectangle: lower halves of both axes
              x0= 0, x1= 0.5
                 ),
            dict(opacity=0.3,line_dash='dot',fillcolor='rgb(215,48,39)',
              type= 'rect',
              y0= 0.5, y1= 1,   # red rectangle: upper halves of both axes
              x0= 0.5, x1= 1
                 ),
            ]
        )

fig.show()


The plot above shows where the cities stand. The plot area is split into 4 quadrants, two of which are highlighted. The bottom-left green quadrant contains the cities that are less adverse towards the environment, and the upper-right red quadrant contains the cities that are highly adverse towards it.

# Socio-Environmental Risk monitor

While the Long-term Environment Risk index (LERI) can show how harmful or friendly a city is towards its environment, it shows only half of the picture. CDC's Social Vulnerability Index (SVI) has the other half. As CDC mentions, the Social Vulnerability Index measures how well prepared a community is to recover from a natural disaster or hazard. The risks captured by both of these KPIs are long term in nature and cannot be changed soon. So, we combine the Long-term Environment Risk and SVI to create the Socio-Environmental Risk Monitor.

## Why do we need it?

A city that has a high adversarial score, yet is well prepared to cope with climate hazards, may not require immediate attention. However, a city that has high vulnerability according to SVI and a high adversarial score in LERI is exposing its population to a greater disaster, as the population in these areas has a lower chance of recovering from it.

The Socio-Environmental Risk Monitor can therefore give us a view of both the social risk front and the environmental risk front.

## How do we calculate it?

We have the LERI score at city level. However, SVI works differently. SVI is created by considering:
1. Socio-economic status
2. Household composition & disability
3. Minority status and language
4. Housing type & transportation

Its total score is the sum of all the flags, where a flag is raised for every individual indicator whose value is above the 90th percentile.
The values are reported at CDC's census FIPS level. In this notebook, we take the FIPS level SVI scores that fall within a city, weight them by the FIPS population and divide by the city's total population to get a derived SVI score at city level. We use this derived SVI score with the LERI 1 score.

<u>**derived SVI score = Sum of (SVI score at each of the city's relevant FIPS units * FIPS population) / city's total population**</u>

<u>**Socio-Environmental Risk Monitor = Square root of ((Derived SVI score)^2 + (LERI 1 score)^2)**</u>
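
The two formulas above can be sketched on a toy example. The three tracts and the LERI 1 value below are hypothetical, and the sketch applies the formulas exactly as written, whereas the next cell additionally normalizes both scores across cities before combining them.

```python
import numpy as np
import pandas as pd

# Hypothetical census tracts of one city: SVI flag totals and tract populations
tracts = pd.DataFrame({'F_TOTAL': [0, 2, 5], 'E_TOTPOP': [4000, 3000, 3000]})

# population-weighted average of the tract-level SVI scores
derived_svi = (tracts['F_TOTAL'] * tracts['E_TOTPOP']).sum() / tracts['E_TOTPOP'].sum()

leri_1 = 0.9  # hypothetical LERI 1 score of the same city
risk_monitor = np.sqrt(derived_svi ** 2 + leri_1 ** 2)
print(round(derived_svi, 2), round(risk_monitor, 3))
```

Weighting by population means a heavily flagged tract matters more when many people live in it.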


In [None]:
aei1.reset_index(drop=True,inplace=True)
aei1['derived SVI score'] = 0.0

for idx in aei1.index.values:
    org = aei1['Organization'].iloc[idx]
    state_id = aei1['state_id'].iloc[idx]
    # census tracts (FIPS) that belong to this city
    city_tracts = census_tract_data[(census_tract_data['PlaceName']==org)&(census_tract_data['StateAbbr']==state_id)]['TractFIPS'].unique()
    # keep only the tracts of this city whose flag total is not negative
    city_svi = svi_data[svi_data['FIPS'].isin(city_tracts)&(svi_data['F_TOTAL']>-1)]
    # population-weighted average of the tract-level SVI flag totals
    derived_svi_score = (city_svi['F_TOTAL']*city_svi['E_TOTPOP']).values.sum()/city_svi['E_TOTPOP'].sum()
    aei1.at[idx,'derived SVI score'] = derived_svi_score

# aei1[['Long-term Environment Risk  1','derived SVI score']] = normalize(aei1[['Long-term Environment Risk  1','derived SVI score']],axis=0)    
aei1['Socio-Environmental Risk Monitor'] = ((normalize(aei1[['Long-term Environment Risk  1','derived SVI score']],axis=0)**2).sum(axis=1))**0.5
fig = px.scatter(aei1,y='derived SVI score',
    x='Long-term Environment Risk  1',
                 text='Organization',
#                  color='state_id'
                 
    )

fig.update_layout(
        template="plotly_dark",
        title = dict(text='Socio-Environmental Risk Monitor',x=0.5,y=.97),
        height=600,
        width=1000,
        
        font_color="rgb(199,233,180)",
        shapes=[
            dict(opacity=0.3,line_dash='dot',fillcolor='rgb(199,233,180)',
              type= 'rect',
              y0= 0, y1= 1.88,   # green rectangle: low SVI and low LERI (hard-coded thresholds)
              x0= 0, x1= 0.73
                 ),
            dict(opacity=0.3,line_dash='dot',fillcolor='rgb(215,48,39)',
              type= 'rect',
              y0= 1.88, y1= 3.97,   # red rectangle: high SVI and high LERI (hard-coded thresholds)
              x0= 0.73, x1= 1.41
                 ),
            ]
        )

fig.show()


So the plot above shows where cities stand in terms of the Socio-Environmental Risk Monitor. **<u>This value can help policy makers and administrators to identify high risk cities in the red zone and prioritize actions for them.</u>**
    

The bottom-left green quadrant shows the cities with values below the 50th percentile in both factors. These cities have low vulnerability and low Long-term Environment Risk. 
    

**<u>The upper-right red quadrant shows cities that have high vulnerability and high Long-term Environment Risk. These cities have a higher chance of facing climate hazards and the related social issues, and yet have a population with relatively less strength and capability to recover. This can be driven by unemployment, a higher percentage of aging population and population below the poverty level.</u>**
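
The zones described above can also be assigned programmatically. The sketch below assumes the thresholds are the medians of the two scores, as the text states, while the plot hard-codes its own values. The scores, the `zone` helper and the 'mixed' label are hypothetical and only serve this example.

```python
import pandas as pd

# Hypothetical scores for four cities
scores = pd.DataFrame({
    'Organization': ['City A', 'City B', 'City C', 'City D'],
    'Long-term Environment Risk  1': [0.4, 0.9, 0.6, 1.2],
    'derived SVI score': [1.0, 3.0, 2.5, 1.5],
})

# 50th percentile of each factor is used as the quadrant boundary
x_mid = scores['Long-term Environment Risk  1'].median()
y_mid = scores['derived SVI score'].median()

def zone(row):
    high_env = row['Long-term Environment Risk  1'] > x_mid
    high_svi = row['derived SVI score'] > y_mid
    if high_env and high_svi:
        return 'red'      # high vulnerability and high long-term risk
    if not high_env and not high_svi:
        return 'green'    # low on both factors
    return 'mixed'        # high on only one factor

scores['zone'] = scores.apply(zone, axis=1)
print(scores[['Organization', 'zone']])
```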

In [None]:
aei1[['density_category','Organization','Socio-Environmental Risk Monitor','derived SVI score','Long-term Environment Risk  1']].sort_values(by='Socio-Environmental Risk Monitor',ascending=False).head(10)\
.style.set_caption('Socio-Environmental Risk Monitor')

In [None]:
temp = aei1[['density_category','Organization','Socio-Environmental Risk Monitor']]\
.groupby(['density_category'])['Socio-Environmental Risk Monitor'].describe().reset_index().sort_values(by='mean')

fig = px.bar(temp, x='density_category', y='mean')
fig.update_layout(title_text='Average Socio-Environmental Risk monitor value across density segments ',
        height=400,
        width=900,)

fig.show()
temp[['density_category','mean','50%']].style.set_caption('Socio-Environmental Risk monitor value across density segments')

The stats above show that very high density cities have a higher degree of socio-environmental risk than the other segments. The risk in low density cities is no different from that in high density cities.

#  Imminent Environmental Risk KPI 

Imminent Environmental Risk measures the climate related hazards reported by a city, using the frequency of each hazard (the share of all reporting cities that face it), the magnitude of the hazard and the current probability of the hazard.


Imminent Environmental Risk = 

sum of (Frequency of the hazard  *  Magnitude of the hazard  *  Probability of the hazard) over all hazards reported by the city

It's important to separate imminent or acute environmental risk from long-term or chronic environmental risk, because the two affect the population in different ways.
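
A small sketch of this sum on hypothetical responses is given below. The level-to-number mapping is the one used in the next cell, and the third factor is taken to be the reported probability of the hazard, as in that cell. The cities and hazards are invented.

```python
import pandas as pd

# Numeric values for the categorical answers (same mapping as in the notebook cell)
LEVELS = {'High': 1.0, 'Medium High': 0.8, 'Medium': 0.6, 'Medium Low': 0.4, 'Low': 0.2,
          'Does not currently impact the city': 0.0, 'Do not know': 0.0}

# Hypothetical hazard reports
reports = pd.DataFrame({
    'Organization': ['City A', 'City A', 'City B'],
    'Hazard': ['Flood', 'Drought', 'Flood'],
    'Hazard Magnitude': ['High', 'Medium', 'Low'],
    'Hazard Probability': ['Medium High', 'Medium', 'High'],
})

# frequency = share of reporting cities that face the hazard
n_cities = reports['Organization'].nunique()
frequency = reports.groupby('Hazard')['Organization'].nunique() / n_cities

reports['risk'] = (reports['Hazard'].map(frequency)
                   * reports['Hazard Magnitude'].map(LEVELS)
                   * reports['Hazard Probability'].map(LEVELS))
imminent_risk = reports.groupby('Organization')['risk'].sum()
print(imminent_risk)
```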

In [None]:
# let's build a common framework
# we need a question number - against year
# we need the country - US/UK
location_values = cities_response[['lat','long','Organization']].drop_duplicates()

data = cities_response[(((cities_response['Question Number']=='2.1')&(cities_response['Year Reported to CDP'].isin([2019,2020])))
               |((cities_response['Question Number']=='2.2a')&(cities_response['Year Reported to CDP']==2018)))
                &(cities_response['Column Name']=='Climate Hazards')
               ]\
[['Response Answer','CDP Region','Organization','Year Reported to CDP','Country','Column Name','Row Number']].drop_duplicates().dropna().reset_index(drop=True)

data['Sub-Response'] = data['Response Answer'].apply(lambda x: x.split('>')[1] if '>' in x else ' None')
data['Response Answer'] = data['Response Answer'].apply(lambda x: x.split('>')[0] if '>' in x else x)

temp = data[['Response Answer','Sub-Response']].groupby(['Response Answer'])['Sub-Response'].value_counts()
temp = pd.DataFrame(temp).rename(columns={'Sub-Response':'Count'}).reset_index()
temp['Sub-Response'] = temp['Sub-Response'].apply(lambda x: x[1:]) 

common = list(set(temp['Response Answer'].values).intersection(set(temp['Sub-Response'].values)))

hazard_mapping = {}
for hazard in common:
#     print(hazard,'  ',temp[temp['Sub-Response']==hazard]['Response Answer'].values[0])
    hazard_mapping[hazard] = temp[temp['Sub-Response']==hazard]['Response Answer'].values[0]
    
data['Response Answer'] = data['Response Answer'].replace(hazard_mapping)
climate_hazard = data

hazard_probability = cities_response[(((cities_response['Question Number']=='2.1')&(cities_response['Year Reported to CDP'].isin([2020])))
               )
                &(cities_response['Column Name'].isin(['Current probability of hazard']))
               ]\
[['Response Answer','CDP Region','Organization','Year Reported to CDP','Country','Column Name','Row Number']].sort_values(by='Year Reported to CDP',ascending=False).drop_duplicates().dropna().reset_index(drop=True)

hazard_magnitude = cities_response[(((cities_response['Question Number']=='2.1')&(cities_response['Year Reported to CDP'].isin([2020])))
               )
                &(cities_response['Column Name'].isin(['Current magnitude of hazard']))
               ]\
[['Response Answer','CDP Region','Organization','Year Reported to CDP','Country','Column Name','Row Number']].sort_values(by='Year Reported to CDP',ascending=False).drop_duplicates().dropna().reset_index(drop=True)



climate_hazard_info = pd.pivot_table(climate_hazard[['Year Reported to CDP','CDP Region','Country','Organization','Response Answer','Column Name']].drop_duplicates(subset=['Year Reported to CDP','CDP Region','Country','Organization','Response Answer']),
                                     index=['Year Reported to CDP','CDP Region','Country','Organization'],
                                     columns='Response Answer',aggfunc='count',fill_value=0.0)
climate_hazard_info.reset_index(inplace=True)
climate_hazard_info.columns = ['Year Reported to CDP','CDP Region','Country',
       'Organization','Biological hazards ',
       'Chemical change ',
       'Extreme Precipitation ',
       'Extreme cold temperature ',
       'Extreme hot temperature ',
       'Flood and sea level rise ',
       'Mass movement ',
       'Storm and wind ',
       'Water Scarcity ', 'Wild fire ']

# climate_hazard_info.drop(['No Answer'],axis=1,inplace=True)
CDP_enlisted_hazards = list(climate_hazard['Response Answer'].unique())


climate_hazard_info['Total no. of hazards faced'] = climate_hazard_info[CDP_enlisted_hazards].sum(axis=1)
climate_hazard_info['percentage of CDP enlisted climate hazards faced'] = np.round(100*(climate_hazard_info['Total no. of hazards faced']/len(climate_hazard['Response Answer'].unique())),4)


# climate_hazard_summary = get_regional_summary(climate_hazard_info,'percentage of CDP enlisted climate hazards faced')

climate_hazard_info = climate_hazard_info.merge(location_values,on='Organization',how='left')



climate_hazard_last_status = climate_hazard_info.sort_values(by=['Year Reported to CDP','Organization']).drop_duplicates(subset=['Organization'],keep='last')

hazards_temp = (climate_hazard_last_status[CDP_enlisted_hazards].sum(axis=0)/climate_hazard_last_status.shape[0]).sort_values(ascending=False)*100
hazards_temp_trace = go.Bar(x=hazards_temp.index.values, y=hazards_temp.values,
             showlegend=True,text=np.round(hazards_temp.values,2),textposition='auto'
             )

climate_hazard = climate_hazard.rename(columns={'Response Answer':'Hazard'})\
.merge(hazard_probability.rename(columns={'Response Answer':'Hazard Probability'}),on=['Organization','Row Number'],how='left')\
.merge(hazard_magnitude.rename(columns={'Response Answer':'Hazard Magnitude'}),on=['Organization','Row Number'],how='left')\
.drop(['CDP Region_x','CDP Region_y','Country_y','Country_x','Year Reported to CDP_y','Year Reported to CDP_x','Column Name','Column Name_x','Column Name_y'],axis=1)

climate_hazard.replace(dict(hazards_temp/100),inplace=True)
climate_hazard.replace({'Medium High':0.8, 'Medium':0.6, 'High':1, 'Medium Low':0.4, 'Low':0.2,
       'Does not currently impact the city':0.0, 'Do not know':0.0},inplace=True)


climate_hazard['Imminent Risk indicator'] = climate_hazard['Hazard']*climate_hazard['Hazard Magnitude']*climate_hazard['Hazard Probability']
climate_hazard = climate_hazard.groupby(['Organization'])['Imminent Risk indicator'].sum().reset_index()
climate_hazard = climate_hazard.merge(location_values,on='Organization',how='left')
climate_hazard['text'] = climate_hazard[['Organization','Imminent Risk indicator']].\
apply(lambda x: 'Organization : '+str(x['Organization'])+', Severity: '+str(x['Imminent Risk indicator']),axis=1)

smb = go.Scattermapbox(name='Severity map',
        lon = climate_hazard['long'],
        lat = climate_hazard['lat'],
        text = climate_hazard["Imminent Risk indicator"],
        mode = 'markers',
#         locationmode='USA-states',
        hovertext=climate_hazard['text'],
        marker = dict(
#             sizemin = 5,
#             sizemode='area',
            size = 10,
            opacity = 0.8,
            reversescale = False,
            autocolorscale = False,
            colorscale = 'Reds',
            cmin = 0,
            color = climate_hazard["Imminent Risk indicator"],
            cmax = climate_hazard["Imminent Risk indicator"].max(),
            colorbar_title="Severity scale",
            colorbar_thickness=10,
            colorbar_title_side='right',
            colorbar_len=.3,
            colorbar_xanchor='left',
            colorbar_yanchor='bottom',
            colorbar_y=0.7
        ))

fig = make_subplots(
    rows=2, cols=2,
    specs=[
           [{"type": "scattermapbox",'colspan':2},None],
#            [{"type": "histogram2d"}],
           [{"type": "bar",'colspan':2},None],

          ],
    subplot_titles=['Imminent Environmental Risk map',
                    'Percentage of cities facing the climate hazard',
                    
                    ],
#     y_title='Exploration of climate hazards across the world'.upper(),
    vertical_spacing=0.14,
    row_heights=[0.75,0.25],
        
#     column_widths=[1,1,1]
    
                    )

fig.add_trace(smb,row=1,col=1)
fig.update_layout(
        mapbox_style="carto-positron",
#         title = dict(text='% of CDP enlisted services affected',x=0.5,y=.97),
        height=900,
        width=800,
        hovermode='closest',
        mapbox=dict(
        bearing=0,
        center=go.layout.mapbox.Center(
            lat=35,
            lon=-95
        ),
        pitch=0,
        zoom=3
    )
        )

fig.update_layout(
    template="plotly_dark",
    margin=dict(r=25, t=25, b=140, l=100),
    showlegend=False,
    bargroupgap=0.25,
    bargap=0.25
)


fig.add_trace(hazards_temp_trace,row=2,col=1)

fig.update_traces(opacity=0.65)

fig.update_layout(font_color="rgb(199,233,180)",font_family='Arial')

fig.show()

Now that we have explored the risk, let's see the response of the cities in terms of adaptation.

# ADAPTATION

#### Adaptation KPIs sums up the adaptation efforts being made, by the city. It individually measures different aspects of these efforts. We sum up all of these KPIs in an index at the end, to give a high level overview of these efforts by the cities.

In this section we depeloped smaller KPIs based on maximum and detailed use of CDP data only. KPI scores are calculated using custom assigned weightages, shown below. 

KPIs are calculated in three ways. 

1. Count: number of parameters the city reported in the corresponding section.
2. Sum: sum of the weightages of the parameters the city reported in the corresponding section.
3. Ratio: sum of weightages/maximum possible value. For example,

status score of the projects = sum of weightages of all reported statuses/(No. of projects * weightage of final status)

This tells us how far the projects have progressed, on a normalized scale.
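As a minimal sketch of the 'Ratio' calculation, the snippet below scores three projects against a small weightage table. The status labels and weights are hypothetical and only illustrate the arithmetic; the notebook reads the real ones from the `weightage` table.

```python
# Hypothetical weights per project status (illustrative values only).
status_weightage = {
    'Scoping': 1,
    'Pre-feasibility study': 2,
    'Implementation': 4,
    'Operation': 5,  # final status, carries the maximum weight
}

def ratio_score(reported_statuses, weights):
    """Sum of weights of the reported statuses / (no. of projects * max weight)."""
    # Statuses without a weight are skipped, as in the notebook's lookup.
    values = [weights[s] for s in reported_statuses if s in weights]
    if not values:
        return 0.0
    return sum(values) / (len(values) * max(weights.values()))

reported = ['Scoping', 'Implementation', 'Operation']
score = ratio_score(reported, status_weightage)
print(round(score, 3))
```

A score of 1.0 would mean every reported project has reached the final status; lower values mean the projects are, on average, at earlier stages.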

Overall, we considered and calculated the following:

1. **'actions_taken_score'**: Count of actions taken
2. **'status_of_action_score'**: sum of weightages of all reported statuses/(No. of projects * weightage of final status)

Current overall status of all the projects. Calculated in the 'Ratio' way.

3. **'co_benefit_score'**: Count of co-beneficiaries involved in these actions
4. **'sector_coverage_score'**: We assigned weights to each sector based on importance. For example, water, energy and transport have higher weightage than others. Calculated as a sum of weightages.

5. **'finance_status_score'**: sum of weightages of all reported finance statuses/(No. of projects * weightage of final status)

Status of financing of the projects. Calculated in the 'Ratio' way.

6. **'funding_diversity_score'**: Count of funding sources
7. **'total_funding'**: Total funding as disclosed by the city.
8. **'total_govt_funding'**: Total govt. funding as disclosed by the city.
9. **'funding_reliability_strength'**: total_govt_funding/total_funding
10. **'funding_dependency_score'**: Total cost provided by the majority funding source/total funding

11. **'funding_to_per_capita_income_ratio_per_sqkm'**: Funding per sq.km of the city/Income per sq.km
12. **'funding_to_per_capita_income_ratio_per_head'**: per capita funding/per capita income

Per capita income and per sq.km income are derived from SVI data, using the FIPS codes of the census tracts within a city.


13. **'adaptation_favorability_score'**: This score shows how favorable the conditions are for a city, based on the factors it reports as either challenging or supporting its ability to adapt.
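To make the favorability score more concrete, here is a small sketch of how the score of one city could be assembled. The level weights are the ones defined in the notebook's `level_weightage` dictionary; the factor names and the per-factor weights in `factor_weightage` are hypothetical stand-ins for the `favorability_weightage` table.

```python
# Level weights as defined in the notebook's `level_weightage` dictionary.
level_weightage = {
    'Significantly challenges': -1, 'Moderately challenges': -0.6, 'Somewhat challenges': -0.3,
    'Significantly supports': 1, 'Moderately supports': 0.6, 'Somewhat supports': 0.3,
}

# Hypothetical per-factor weights (assumption for illustration only).
factor_weightage = {
    'Access to basic services': {'Challenges': 1.0, 'Supports': 0.5},
    'Budgetary capacity': {'Challenges': 0.8, 'Supports': 1.0},
}

# One row per reported factor: (factor, direction, level)
responses = [
    ('Access to basic services', 'Supports', 'Moderately supports'),
    ('Budgetary capacity', 'Challenges', 'Significantly challenges'),
]

def favorability_score(rows):
    """Sum over factors of (signed level weight * factor weight for the reported direction)."""
    return sum(level_weightage[level] * factor_weightage[factor][direction]
               for factor, direction, level in rows)

city_score = favorability_score(responses)
print(round(city_score, 2))
```

Challenging factors carry negative level weights, so a negative total indicates that reported challenges outweigh reported support.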

In [None]:
weightage.head(20).style.set_caption('Weightage to adaptation related values')

In [None]:
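def get_score(type_,sub_type,section,score_type='ratio'):
    # The reported values are read from a module-level variable named after the section,
    # e.g. section 'Status of action' -> global variable `Status_of_action`.
    var_ = '_'.join(section.split(' '))

    # weightage rows that belong to this type / sub-type / section
    section_weightage = weightage[(weightage['Type']==type_)&(weightage['Sub-type']==sub_type)\
                                  &(weightage['Section']==section)]

    values = []
    for val in globals()[var_]:
        match = section_weightage[section_weightage['Parameter'].isin([val])]['Weightage'].values
        # reported values without a weightage entry are skipped
        if len(match)>0:
            values.append(match[0])

    if score_type=='ratio':
        # normalize by the best possible outcome: every project at the maximum weightage
        score = np.sum(values)/(len(values)*section_weightage['Weightage'].max())
    elif score_type=='sum':
        score = np.sum(values)
    else:
        raise ValueError("score_type must be 'ratio' or 'sum'")

    return score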


def get_count(q_num,years,col_name,org):
    
    return cities_response[((cities_response['Question Number']==q_num)&(cities_response['Year Reported to CDP'].isin(years)))
                      &(cities_response['Column Name']==col_name)&(cities_response['Organization']==org)
                      ].sort_values(by=['Year Reported to CDP','Column Number','Row Number'],ascending=False)[['Response Answer','Row Number']].drop_duplicates(keep='first')\
    ['Response Answer'].dropna().unique().shape[0]


def get_values(q_num,years,col_name,org):
    
    return cities_response[((cities_response['Question Number']==q_num)&(cities_response['Year Reported to CDP'].isin(years)))
                      &(cities_response['Column Name']==col_name)&(cities_response['Organization']==org)].sort_values(by=['Year Reported to CDP','Column Number','Row Number'],ascending=False)[['Response Answer','Row Number']].drop_duplicates(keep='first')['Response Answer'].dropna().values



adaptation_action_kpi_recorder = {}
kpis =  ['actions_taken_score','status_of_action_score','co_benefit_score',\
               'sector_coverage_score','finance_status_score','funding_diversity_score','total_funding','total_govt_funding',
               'funding_reliability_strength','funding_dependency_score','funding_to_per_capita_income_ratio_per_sqkm',
               'funding_to_per_capita_income_ratio_per_head'
               ]

for org,state_id in zip(city_metadata[city_metadata['Organization'].isin(all_cities)]['Organization'].values,city_metadata[city_metadata['Organization'].isin(all_cities)]['state_id'].values):
    
    # No. of actions taken


    q_num='3.0'
    type_='Adaptation'
    sub_type='Adaptation-action'

    years=[2019,2020]
    col_name='Action title'

    actions_taken_score = get_count(q_num,years,col_name,org)
    
    # Status of action - with weightage
    col_name='Status of action'
    Status_of_action = get_values(q_num,years,col_name,org)

    section=col_name
    status_of_action_score = get_score(type_,sub_type,section)

    col_name = 'Co-benefit area'
    co_benefit_score = get_count(q_num,years,col_name,org)

    # Action Coverage KPI : Sectors affected of the corresponding cities/Sectors covered under adaptation plan - with weightage
    col_name='Sectors/areas adaptation action applies to'
    Sectors_areas_adaptation_action_applies_to = get_values(q_num,years,col_name,org)

    section='Sectors areas adaptation action applies to'
    sector_coverage_score = get_score(type_,sub_type,section,score_type='sum')

    # Finance Status - with weightage

    col_name='Finance status'
    Finance_status = get_values(q_num,years,col_name,org)

    section=col_name
    finance_status_score = get_score(type_,sub_type,section,score_type='ratio')

    # Funding diversity KPI : No. of Majority funding sources 

    col_name='Majority funding source'
    funding_diversity_score = get_count(q_num,years,col_name,org)


    # Funding reliability strength - share of reliable funding source such as Government funding in total funding 
    col_name = 'Total cost of the project (currency)'
    years=[2020]
    total_funding = get_values(q_num,years,col_name,org).astype('int').sum()


    col_name = 'Total cost provided by the local government (currency)'
    total_govt_funding = get_values(q_num,years,col_name,org).astype('int').sum()

    if total_funding=='No Answer':
        total_funding=0.0
    if total_govt_funding=='No Answer':
        total_govt_funding=0.0

    if (total_funding!=0.0) & (total_govt_funding!=0.0):    
        funding_reliability_strength = float(total_govt_funding)/float(total_funding)
    else:
        funding_reliability_strength = 0.0


    # Funding dependency score - how much of total funding comes from the majority funding source   

    col_name = 'Total cost provided by the majority funding source (currency)'
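    majority_source_funding = get_values(q_num,years,col_name,org).astype('int').sum()
    # guard against cities that disclosed no total funding (avoids a division by zero)
    funding_dependency_score = majority_source_funding/total_funding if total_funding!=0 else 0.0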

    # Growth in overall funding from previous year

#     years=[2019]
#     total_funding_prev_year = get_values(q_num,years,col_name,org).astype('int').sum()


#     growth_in_funding = (total_funding/total_funding_prev_year)-1 if (total_funding_prev_year!=0.0) else 0.0


    # Funding to per capita income ratio : Funding per sq.km/Avg. income per sq.km 

    ## first find funding per sq.km

    funding_per_sqkm = total_funding/city_metadata[city_metadata['Organization']==org]['area in sq.km'].values[0]
    funding_per_head = total_funding/city_metadata[city_metadata['Organization']==org]['population'].values[0]

    ## Then find per capita income of the city from SVI data
    tract_fips = census_tract_data[(census_tract_data['PlaceName']==org)&(census_tract_data['StateAbbr']==state_id)]['TractFIPS'].unique()
    city_svi = svi_data[svi_data['FIPS'].isin(tract_fips)]
    # population-weighted mean of the tract-level per capita income (E_PCI)
    per_capita_income_of_city = (city_svi['E_PCI']*city_svi['E_TOTPOP']).values.sum()/city_svi['E_TOTPOP'].sum()

    ## multiply per capita income with density and you get avg. income per sq.km
    avg_income_per_sqkm = (per_capita_income_of_city*city_metadata[city_metadata['Organization']==org]['density'].values[0])


    
    funding_to_per_capita_income_ratio_per_head = funding_per_head/per_capita_income_of_city 
    funding_to_per_capita_income_ratio_per_sqkm = funding_to_per_capita_income_ratio_per_head*city_metadata[city_metadata['Organization']==org]['density'].values[0]
    
    # Record KPI values:
    
    adaptation_action_kpi_recorder[org] = [globals()[kpi] for kpi in kpis]
    

## Per sq.km funding to per sq.km income ratio

We can interpret this as **the amount that comes back to a sq.km of the city as funding, measured against the average income of a sq.km of the city.**


For the City of Cincinnati, the funding to per capita income ratio per sq.km is 14.08. So we can say that **for every 1000 USD earned in a sq.km area, 14.08 USD come back to the city as environmental funding.**


## Per capita funding to per capita income ratio

This KPI is almost the same as the one above. It captures the funding to income ratio on a per head basis.

For the City of Cincinnati, the funding to per capita income ratio per head is approximately 0.009. So we can say that **for every 1000 USD earned per person, 9 USD are spent as environmental funding.**

#### We judge the funding amount against per capita income, instead of looking at per capita funding alone, to see whether there is any disparity between the wealth generation capacity of a locality and the funding it receives. This could help to ensure social equity while addressing environmental issues.

Average income data has been collected from the CDC's Social Vulnerability Index (SVI) dataset.
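The per head ratio can be illustrated with a small worked example. All numbers below are made up for illustration (they are not Cincinnati's actual figures); the sketch only mirrors the steps used in the notebook: a population-weighted per capita income across census tracts, funding per head, and their ratio.

```python
# Hypothetical tract-level records for one city: (per capita income in USD, population).
tracts = [(20000.0, 1000), (40000.0, 3000)]

total_population = sum(pop for _, pop in tracts)

# Population-weighted per capita income of the city
per_capita_income = sum(pci * pop for pci, pop in tracts) / total_population

total_funding = 1_260_000.0  # assumed disclosed funding in USD

funding_per_head = total_funding / total_population
ratio_per_head = funding_per_head / per_capita_income

print(per_capita_income)
print(round(ratio_per_head * 1000, 2), 'USD of funding per 1000 USD earned per person')
```

Multiplying the ratio by 1000 gives the "USD per 1000 USD earned" reading used in the interpretation above.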

In [None]:
adaptation_action_kpi = pd.DataFrame(adaptation_action_kpi_recorder).T
adaptation_action_kpi.columns = kpis
adaptation_action_kpi.fillna(0.0,inplace=True)
adaptation_action_kpi.reset_index(inplace=True)
adaptation_action_kpi.rename(columns={'index':'Organization'},inplace=True)


adaptation_action_kpi[['Organization', 'funding_to_per_capita_income_ratio_per_sqkm',
       'funding_to_per_capita_income_ratio_per_head', 'actions_taken_score', 'status_of_action_score',
       'co_benefit_score', 'sector_coverage_score', 'finance_status_score',
       'funding_diversity_score', 'total_funding', 'total_govt_funding',
       'funding_reliability_strength', 'funding_dependency_score',       
       ]][adaptation_action_kpi['funding_to_per_capita_income_ratio_per_head']>0].sort_values(by='funding_to_per_capita_income_ratio_per_head',ascending=True).head(10).style.set_caption('Adaptation KPIs')

In [None]:
# Define weightage for levels in both support and challenge

level_weightage = {'Significantly challenges':-1,
'Moderately challenges':-0.6,
'Somewhat challenges':-0.3,
'Significantly supports':1,
'Moderately supports':0.6,
'Somewhat supports':0.3,
}

In [None]:

favorability_scores = {}


for org in all_cities:
    q_num='2.2'
#     print(org)
    data = cities_response[((cities_response['Question Number']==q_num)&(cities_response['Year Reported to CDP'].isin([2020])))                            
                  &(cities_response['Organization']==org)].sort_values(by='Year Reported to CDP')[['Organization','Column Name','Response Answer','Row Number']]\
    .drop_duplicates(keep='first')\
    .sort_values(by=['Column Name','Row Number'])[['Column Name','Response Answer','Row Number']]
    
    data = data.pivot(index='Row Number',columns='Column Name',values='Response Answer').iloc[:,:3]
    data.columns = ['Factors that affect ability to adapt','Indicate if this factor either supports or challenges the ability to adapt','Level of degree to which factor challenges/supports the adaptive capacity of your city']
    data = data.merge(favorability_weightage,on='Factors that affect ability to adapt',how='left').dropna()
    data.replace(level_weightage,inplace=True)
    
    data['factor_related_favorability_score'] = 0.0
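    # level weightage (numeric after the replace above) times the factor weightage for the reported direction;
    # .iloc is used because positional access with [] on a labelled row is deprecated in recent pandas
    data['factor_related_favorability_score'] = data[['Indicate if this factor either supports or challenges the ability to adapt','Level of degree to which factor challenges/supports the adaptive capacity of your city','Challenges','Supports']]\
    .apply(lambda x: x.iloc[1]*x.iloc[2] if x.iloc[0]=='Challenges' else x.iloc[1]*x.iloc[3],axis=1)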
    
    favorability_scores[org] = data['factor_related_favorability_score'].sum()
    
    
# include this favorability score for each city in adaptation kpi dataframe. Include density category of the cities.

adaptation_action_kpi = adaptation_action_kpi.merge(pd.DataFrame(favorability_scores.values(),favorability_scores.keys()).reset_index()\
.rename(columns={'index':'Organization',0:'adaptation_favorability_score'}),on='Organization',how='left')

adaptation_action_kpi = adaptation_action_kpi.merge(city_metadata[['Organization','density_category']],on='Organization',how='left')    

## Observations on KPI values for cities with different densities

Many of the selected cities have not disclosed funding related amounts. So we leave those out, keep only the relevant ones and look at some of the statistics.

**Following the median values we can say, cities with high densities seem to be getting more funding per person compared to per capita income.** However, when we switch to **funding per sq. km., very high density cities are getting twice the amount** compared to low density cities, while medium density and high density cities are lagging way behind. We will see later how this disparity affects the overall performance of the cities towards adaptation.

In [None]:
adaptation_action_kpi[adaptation_action_kpi['funding_to_per_capita_income_ratio_per_head']>0].groupby(['density_category'])['funding_to_per_capita_income_ratio_per_head'].describe()\
.style.set_caption('funding to per capita income ratio per head')

In [None]:
adaptation_action_kpi[adaptation_action_kpi['funding_to_per_capita_income_ratio_per_sqkm']>0].groupby(['density_category'])['funding_to_per_capita_income_ratio_per_sqkm'].describe()\
.style.set_caption('funding to per capita income ratio per sq.km')

Let's have a look at the top 20 cities, which have given more information than the rest, and at the basic statistics coming out of them.

## Further observations on KPI values for cities with different densities

1. Cities with very high density are ahead of high and medium density cities in terms of number of projects.


2. In terms of project execution, low density cities are behind by a margin of 3-5%.


3. Co-benefit scores are higher in medium density cities.

4. In terms of financing and getting diverse sources of funding, very high density cities are doing understandably better.

5. Very high and high density cities are less reliant on govt. funding, and low density cities are the most dependent on it.

6. Low density cities face a higher degree of challenges and receive higher per capita funding relative to per capita income.

In [None]:
adaptation_action_kpi_stats = adaptation_action_kpi[adaptation_action_kpi['Organization'].isin(['Washington', 'Miami', 'Tempe', 'Louisville', 'Phoenix',
       'West Palm Beach', 'Providence', 'Indianapolis', 'Denver',
       'Ann Arbor', 'New Bedford', 'Oakland', 'Somerville',
       'Grand Rapids', 'Hayward', 'Boston', 'Santa Monica', 'Cleveland',
       'Nashville', 'Pittsburgh'])].groupby(['density_category']).describe(percentiles=[.5]).T.reset_index()\
                                .rename(columns={'level_0':'KPI','level_1':'stat'})
adaptation_action_kpi_stats = adaptation_action_kpi_stats[adaptation_action_kpi_stats['stat'].isin(['mean','50%'])]



adaptation_action_kpi_stats[(adaptation_action_kpi_stats['stat']=='mean')&(adaptation_action_kpi_stats['KPI'].isin([
       'actions_taken_score', 'status_of_action_score',
       'co_benefit_score', 'sector_coverage_score',
       'finance_status_score', 'funding_diversity_score',
       'funding_reliability_strength',
       'funding_to_per_capita_income_ratio_per_sqkm',
       'funding_to_per_capita_income_ratio_per_head',
       ]))].style.highlight_max(color = 'lightgreen', axis = 1).set_caption('Adaptation KPIs across density segments with maximum values highlighted')

In [None]:
adaptation_action_kpi_stats[(adaptation_action_kpi_stats['stat']=='mean')&(adaptation_action_kpi_stats['KPI'].isin([
        'adaptation_favorability_score',      
]))].style.highlight_min(color = 'orangered', axis = 1).set_caption('adaptation favorability score across density segments')

In [None]:
plt.figure(figsize=(8,8))
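# correlate numeric KPI columns only ('Organization' and 'density_category' are text)
sns.heatmap(adaptation_action_kpi.select_dtypes(include='number').corr())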

# Observations from heatmap:

1. **'actions_taken_score'** has a high positive correlation with the **'funding_to_per_capita_income_ratio'** KPIs, which is consistent with the expectation that **more funding results in more projects.**


2. **'status_of_action_score'** is highly correlated with **'finance_status_score'**. This can be interpreted as: **as the financing of the projects gets more secure, the projects also advance towards becoming operational.**


3. **'co_benefit_score'** has its highest positive correlation with **'sector_coverage_score'**, which makes sense: **as more sectors are covered, the scope to include more co-beneficiaries also goes up.**


4. **'funding_diversity_score'** is positively correlated with **'finance_status_score' and 'co_benefit_score'**. While **higher diversity helping to advance the finance status of a project** is understandable, a positive correlation with 'co_benefit_score' might mean that **different sources of funding perhaps also come with a need for commitment towards a more diverse implementation of the same project**. The same issues affecting different sectors might bring different sources of funding to work together.


5. **'funding_reliability_strength' and 'funding_dependency_score'** are positively correlated. While reliability strength measures the reliance on funding sources that are safer and more reliable, the dependency score measures how much the project depends on its biggest funding source. **Government funding, for example, is mostly the biggest source of funding for low density areas**, as evident from the previous section. So a positive correlation between 'funding_reliability_strength' and 'funding_dependency_score' is explainable.



### Let's see if we can sum all of this up in one KPI or index that gives a comparative picture of the adaptation work done in a city.

#### First, we select the following KPIs.
actions_taken_score, status_of_action_score,co_benefit_score, sector_coverage_score,
finance_status_score, funding_diversity_score,funding_reliability_strength,funding_dependency_score,
funding_to_per_capita_income_ratio_per_sqkm,funding_to_per_capita_income_ratio_per_head,adaptation_favorability_score

#### Then we normalize them and assign a weightage to each KPI, based on our observations above. Since funding and the number of projects are the most important driving factors, we give them more weightage.

However, since many cities have chosen not to disclose the amount of funding received, we create indices both with and without the funding KPIs.

**adaptation_index_with_funding** = actions_taken_score * 0.15+
       status_of_action_score * 0.1+
       co_benefit_score * 0.05+ 
       sector_coverage_score * 0.08+
       finance_status_score * 0.08+ 
       funding_diversity_score * 0.05+
       funding_reliability_strength * 0.05+
       funding_dependency_score * 0.04+
       funding_to_per_capita_income_ratio_per_sqkm * 0.15+
       funding_to_per_capita_income_ratio_per_head * 0.15+
       adaptation_favorability_score * 0.1
       
       
**adaptation_index_without_funding** = actions_taken_score * 0.2+
       status_of_action_score * 0.15+
       co_benefit_score * 0.1+ 
       sector_coverage_score * 0.13+
       finance_status_score * 0.13+ 
       funding_diversity_score * 0.1+
       funding_reliability_strength * 0.05+
       funding_dependency_score * 0.04+
       adaptation_favorability_score * 0.1
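The weighted sum can be sketched in plain Python as follows. The weights are the 'without funding' weights listed above; the two cities and their KPI values are hypothetical. The normalization step here divides each count-type KPI by the largest absolute value across cities, which is an assumption about what a max-norm scaling does and not necessarily identical to the library call used in the notebook.

```python
# Weights of the index without funding KPIs, as listed above.
weights = {
    'actions_taken_score': 0.2, 'status_of_action_score': 0.15,
    'co_benefit_score': 0.1, 'sector_coverage_score': 0.13,
    'finance_status_score': 0.13, 'funding_diversity_score': 0.1,
    'funding_reliability_strength': 0.05, 'funding_dependency_score': 0.04,
    'adaptation_favorability_score': 0.1,
}

# KPIs that are not already on a 0-1 scale and therefore get normalized.
to_normalize = ['actions_taken_score', 'co_benefit_score', 'sector_coverage_score',
                'funding_diversity_score', 'adaptation_favorability_score']

# Hypothetical raw KPI values for two cities, in the same key order as `weights`.
raw = {
    'City A': dict(zip(weights, [10, 0.8, 4, 6, 0.5, 2, 0.5, 0.5, 2.0])),
    'City B': dict(zip(weights, [5, 0.4, 2, 3, 1.0, 1, 1.0, 1.0, -1.0])),
}

def max_normalize(rows, keys):
    """Divide each selected KPI by its largest absolute value across all cities."""
    out = {city: dict(vals) for city, vals in rows.items()}
    for key in keys:
        peak = max(abs(vals[key]) for vals in rows.values())
        if peak != 0:
            for city in out:
                out[city][key] = rows[city][key] / peak
    return out

def adaptation_index(kpi_values, kpi_weights):
    """Weighted sum of the KPI values."""
    return sum(kpi_values[k] * w for k, w in kpi_weights.items())

normalized = max_normalize(raw, to_normalize)
index = {city: adaptation_index(vals, weights) for city, vals in normalized.items()}
print({city: round(value, 3) for city, value in index.items()})
```

Because both sets of weights sum to 1 and every normalized KPI is at most 1, the index itself cannot exceed 1, which makes cities comparable on a common scale.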

In [None]:
kpi_weightage_with_funding = {'actions_taken_score':0.15,
       'status_of_action_score':0.1,
       'co_benefit_score':0.05, 
       'sector_coverage_score':0.08,
       'finance_status_score':0.08, 
       'funding_diversity_score':0.05,
       'funding_reliability_strength':0.05,
       'funding_dependency_score':0.04,
       'funding_to_per_capita_income_ratio_per_sqkm':0.15,
       'funding_to_per_capita_income_ratio_per_head':0.15,
       'adaptation_favorability_score':0.1}

kpi_weightage_without_funding = {'actions_taken_score':0.2,
       'status_of_action_score':0.15,
       'co_benefit_score':0.1, 
       'sector_coverage_score':0.13,
       'finance_status_score':0.13, 
       'funding_diversity_score':0.1,
       'funding_reliability_strength':0.05,
       'funding_dependency_score':0.04,
       'adaptation_favorability_score':0.1}

selected_kpis_with_funding = ['actions_taken_score', 'status_of_action_score',
       'co_benefit_score', 'sector_coverage_score',
       'finance_status_score', 'funding_diversity_score','funding_reliability_strength',
       'funding_dependency_score',
       'funding_to_per_capita_income_ratio_per_sqkm',
       'funding_to_per_capita_income_ratio_per_head',
       'adaptation_favorability_score']

selected_kpis_without_funding = ['actions_taken_score', 'status_of_action_score',
       'co_benefit_score', 'sector_coverage_score',
       'finance_status_score', 'funding_diversity_score','funding_reliability_strength',
       'funding_dependency_score',
       'adaptation_favorability_score']

kpis_to_be_normalized = ['actions_taken_score','co_benefit_score','sector_coverage_score','funding_diversity_score',
                        'funding_to_per_capita_income_ratio_per_sqkm','adaptation_favorability_score'                      
                        ]

adaptation_action_kpi[kpis_to_be_normalized] = normalize(adaptation_action_kpi[kpis_to_be_normalized],axis=0,norm='max')

In [None]:
adaptation_action_kpi['adaptation_index_with_funding'] = np.dot(adaptation_action_kpi[selected_kpis_with_funding],pd.DataFrame(kpi_weightage_with_funding.values(),kpi_weightage_with_funding.keys())).reshape(-1,1)
adaptation_action_kpi['adaptation_index_without_funding'] = np.dot(adaptation_action_kpi[selected_kpis_without_funding],pd.DataFrame(kpi_weightage_without_funding.values(),kpi_weightage_without_funding.keys())).reshape(-1,1)

### Interpretation of adaptation index

A higher score on the adaptation index means a city is performing better towards its adaptation related goals.
We have selected only the top 50% to avoid low scoring cities. A low score based on the CDP survey response does not necessarily mean a poorly performing city. It could be because of non-disclosure of certain data, such as funding, which results in no score on important funding KPIs.

A density based look at the mean 'adaptation index' score of the top 50% cities shows that **cities with low density are ahead of the rest** when funding KPIs are considered. Otherwise, high density cities are doing better than the other segments.



In [None]:
adaptation_action_kpi[adaptation_action_kpi['adaptation_index_with_funding']!=0]\
.sort_values(by='adaptation_index_with_funding',ascending=False)[:20]\
.groupby(['density_category'])['adaptation_index_with_funding'].describe().style.set_caption('Adaptation index considering funding')

In [None]:
adaptation_action_kpi[adaptation_action_kpi['adaptation_index_without_funding']!=0]\
.sort_values(by='adaptation_index_without_funding',ascending=False)[:20]\
.groupby(['density_category'])['adaptation_index_without_funding'].describe().style.set_caption('Adaptation index without considering funding')

In [None]:
adaptation_action_kpi[['Organization','adaptation_index_with_funding','adaptation_index_without_funding','density_category']].sort_values(by='adaptation_index_with_funding',ascending=False).reset_index(drop=True).head(20)\
.style.set_caption('Adaptation indices of US cities')

# Response to Risk KPI

Having a detailed adaptation index gives us an opportunity to measure a city's response to the environmental risks we calculated before.

The Response to Risk KPI shows whether a city's adaptation efforts are proportionate, considering social vulnerability risk as well as both long term and imminent environmental risk.

**Response to risk = adaptation_index_without_funding of the city/(Socio-Environmental Risk of the city+Imminent Environmental Risk)**

Below we can see that most of the cities follow the ideal line marked in the plot. However, cities like Oakland and Cincinnati show low adaptation effort compared to the risk they are facing.

***The Response to Risk KPI considers adaptation effort, social vulnerability, long term and imminent environmental risk, and shows whether the city authority's actions are taking these factors into account or not.***
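A minimal sketch of the formula, using hypothetical index and risk values for two made-up cities (the zero-risk guard is an addition for the sketch, not something the notebook does):

```python
import math

def response_to_risk(adaptation_index, socio_env_risk, imminent_risk):
    """Adaptation index divided by the cumulated (socio-environmental + imminent) risk."""
    cumulated_risk = socio_env_risk + imminent_risk
    if cumulated_risk == 0:
        return math.nan  # the ratio is undefined when no risk is recorded
    return adaptation_index / cumulated_risk

# Hypothetical values: (adaptation index, socio-environmental risk, imminent risk)
cities = {
    'City A': (0.4, 3.0, 5.0),
    'City B': (0.2, 6.0, 4.0),
}

scores = {name: response_to_risk(*values) for name, values in cities.items()}

# Lowest response first, i.e. least adaptation effort relative to the risk faced
for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 3))
```

In this toy example City B faces more cumulated risk with a lower adaptation index, so it ranks as the weaker responder.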

In [None]:
response_to_risk = adaptation_action_kpi[['Organization','adaptation_index_without_funding','adaptation_index_with_funding']]\
.merge(aei1[['Organization','Socio-Environmental Risk Monitor']],on='Organization').merge(climate_hazard[['Organization','Imminent Risk indicator']],on='Organization',how='left')

response_to_risk['Cumulated risk'] = (response_to_risk['Socio-Environmental Risk Monitor']+response_to_risk['Imminent Risk indicator'])
response_to_risk['response to risk'] = response_to_risk['adaptation_index_without_funding']/(response_to_risk['Socio-Environmental Risk Monitor']+response_to_risk['Imminent Risk indicator'])
response_to_risk.sort_values(by='response to risk',ascending=True).head(10)\
.style.set_caption('Response to Risk - sorted in low response to high response')

In [None]:
fig = px.scatter(response_to_risk,y='adaptation_index_without_funding',
    x='Cumulated risk',
                 text='Organization',
#                trendline='ols'
                 
    )

fig.update_layout(
        template="plotly_dark",
        title = dict(text='Response to Risk',x=0.5,y=.97),
        height=600,
        width=1000,
        
        font_color="rgb(199,233,180)",
        shapes=[
            dict(opacity=0.3,line_dash='dot',fillcolor='rgb(199,233,180)',
              type= 'line',
              y0= 0.15, y1= 0.5,   # dotted diagonal reference ('ideal') line from (0.5, 0.15) to (14, 0.5)
              x0= 0.5, x1= 14
                 ),
#             dict(opacity=0.3,line_dash='dot',fillcolor='rgb(215,48,39)',
#               type= 'rect',
#               y0= 1.88, y1= 3.97,   # adding a horizontal line at Y = 1
#               x0= 0.73, x1= 1.41
#                  ),
            ]
        )

fig.show()


# Severity of impact on services

Severity of impact on services is calculated as the percentage of CDP enlisted city services or assets observed to be impacted in the city.

This shows how badly a city's operations are affected.
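As a small illustration of the percentage, the sketch below uses a hypothetical, shortened list of services. The notebook's own list is built from the most frequently reported answers, and it counts reported rows per service, whereas this sketch counts each distinct service once.

```python
# Hypothetical subset of CDP enlisted services (illustration only).
cdp_enlisted_services = ['Energy', 'Transport', 'Water supply & sanitation',
                         'Public health', 'Residential']

def severity_of_impact(reported_services, enlisted_services):
    """Percentage of enlisted services that the city reported as impacted."""
    impacted = set(reported_services) & set(enlisted_services)
    return round(100 * len(impacted) / len(enlisted_services), 2)

# 'Energy' is reported twice and 'Tourism' is not in the enlisted subset.
reported = ['Energy', 'Transport', 'Energy', 'Tourism']
severity = severity_of_impact(reported, cdp_enlisted_services)
print(severity)
```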

In [None]:
# select the data - by setting question no. and year based conditions

# climate hazard related responses
# 2018 response options are not in sync with 2019 and 2020.
data = cities_response[((cities_response['Question Number']=='2.1')&(cities_response['Year Reported to CDP'].isin([2019,2020])))
               |((cities_response['Question Number']=='2.2a')&(cities_response['Year Reported to CDP']==2018))
               ][['Year Reported to CDP',
                  'CDP Region',
                  'Column Name',
                  'Country',
                  'Question Name',
                  'Organization',
                  'Response Answer']].drop_duplicates().reset_index(drop=True)

# assign the result back instead of calling an in-place replace on a selected column
data['Column Name'] = data['Column Name'].replace(['Current consequence of hazard','Magnitude of impact'],'Current magnitude of hazard')
data['Question Name'] = 'Most significant climate hazards faced by city'
CDP_enlisted_major_impacts = list(data[data['Column Name']=='Social impact of hazard overall']['Response Answer'].value_counts()[:30].index.values[:11])
social_impact_info = pd.pivot_table(data[['Year Reported to CDP','CDP Region','Country','Organization','Response Answer','Column Name']][(data['Column Name']=='Social impact of hazard overall')&(data['Response Answer'].isin(CDP_enlisted_major_impacts))],index=['Organization','CDP Region','Country','Year Reported to CDP'],columns='Response Answer',aggfunc='count',fill_value=0.0)
social_impact_info['Total no. of impacted social aspects'] = social_impact_info.sum(axis=1)
social_impact_info.reset_index(inplace=True)
social_impact_info['percentage of CDP enlisted social aspects impacted'] = (100*social_impact_info['Total no. of impacted social aspects']/len(CDP_enlisted_major_impacts)).round(2)


# build a new DataFrame instead of modifying a slice of `data` in place
affected_services = data[data['Column Name'].isin(['Most relevant assets / services affected overall',
                               'Top three assets/ services affected' 
                              ])].dropna().sort_values(by=['Year Reported to CDP','Organization'])
CDP_enlisted_sectors = list(affected_services['Response Answer'].value_counts()[:17].index.values)

affected_services_info = pd.pivot_table(affected_services[['Year Reported to CDP','CDP Region','Country','Organization','Response Answer','Column Name']][affected_services['Response Answer'].isin(CDP_enlisted_sectors)],index=['Year Reported to CDP','CDP Region','Country','Organization'],columns='Response Answer',aggfunc='count',fill_value=0.0)
affected_services_info['Total no. of impacted services'] = affected_services_info.sum(axis=1)
affected_services_info.reset_index(inplace=True)
affected_services_info['percentage of CDP enlisted services impacted'] = np.round(100*(affected_services_info['Total no. of impacted services']/len(CDP_enlisted_sectors)),4)

def get_regional_summary(info,feature):
    info = info.groupby(['CDP Region','Year Reported to CDP'])[feature].describe()
    return info.reset_index()

social_impact_summary = get_regional_summary(social_impact_info,'percentage of CDP enlisted social aspects impacted')
affected_services_summary = get_regional_summary(affected_services_info,'percentage of CDP enlisted services impacted')
affected_services_info.columns = ['Year Reported to CDP', 'CDP Region', 'Country', 
       'Organization', 'Commercial',
       'Education',
       'Emergency services', 'Energy',
       'Environment, biodiversity, forestry',
       'Food & agriculture',
       'Industrial',
       'Information & communications technology',
       'Land use planning',
       'Law & order', 'Public health',
       'Residential',
       'Society / community & culture',
       'Tourism', 'Transport',
       'Waste management',
       'Water supply & sanitation',
       'Total no. of impacted services',
       'percentage of CDP enlisted services impacted']

affected_services_info = affected_services_info.merge(cities_response[['lat','long','Organization']].drop_duplicates(),on='Organization',how='left')

affected_services_info['text'] = affected_services_info[['Organization','Country','percentage of CDP enlisted services impacted']].\
apply(lambda x: 'Country : '+x.iloc[1]+', Organization : '+x.iloc[0]+', % of impact: '+str(x.iloc[2]),axis=1)

affected_services_last_status = affected_services_info.sort_values(by=['Year Reported to CDP','Organization']).drop_duplicates(subset=['Organization'],keep='last')

smb = go.Scattermapbox(name='severity map',
        lon = affected_services_last_status['long'],
        lat = affected_services_last_status['lat'],
        text = affected_services_last_status["percentage of CDP enlisted services impacted"],
        mode = 'markers',
#         locationmode='USA-states',
        hovertext=affected_services_last_status['text'],
        marker = dict(
#             sizemin = 5,
#             sizemode='area',
            size = 10,
            opacity = 0.8,
            reversescale = False,
            autocolorscale = False,
            colorscale = 'Reds',
            cmin = 0,
            color = affected_services_last_status["percentage of CDP enlisted services impacted"],
            cmax = affected_services_last_status["percentage of CDP enlisted services impacted"].max(),
            colorbar_title="Impact% - Severity scale",
            colorbar_thickness=10,
            colorbar_title_side='right',
            colorbar_len=.3,
            colorbar_xanchor='left',
            colorbar_yanchor='bottom',
            colorbar_y=0.7
        ))


services_temp = (affected_services_last_status[CDP_enlisted_sectors].sum(axis=0)/affected_services_last_status.shape[0]).sort_values(ascending=False)*100
services_temp_trace = go.Bar(x=services_temp.index.values, y=services_temp.values,
             showlegend=True,text=np.round(services_temp.values,2),textposition='auto'
             )

services_temp2 = affected_services_last_status.groupby(['CDP Region'])[CDP_enlisted_sectors].sum()/affected_services_last_status.groupby(['CDP Region'])[CDP_enlisted_sectors].count()



fig = make_subplots(
    rows=2, cols=2,
    specs=[
           [{"type": "scattermapbox",'colspan':2},None],
#            [{"type": "histogram2d"}],
           [{"type": "bar",'colspan':2},None],
          ],
    subplot_titles=['Severity map of impacted services',
                    'Service-wise analysis : % of cities having the corresponding sector affected',
                    ],
#     y_title='Exploration of affected services across the world'.upper(),
    vertical_spacing=0.14,
    row_heights=[0.75,0.25],
        
#     column_widths=[1,1,1]
    
                    )

fig.add_trace(smb,row=1,col=1)
fig.update_layout(
        mapbox_style="carto-positron",
#         title = dict(text='% of CDP enlisted services affected',x=0.5,y=.97),
        height=900,
        width=800,
        hovermode='closest',
        mapbox=dict(
        bearing=0,
        center=go.layout.mapbox.Center(
            lat=35,
            lon=-95
        ),
        pitch=0,
        zoom=3
    )
        )

fig.update_layout(
    template="plotly_dark",
    margin=dict(r=25, t=25, b=140, l=100),
    showlegend=False,
    bargroupgap=0.25,
    bargap=0.25
)


fig.add_trace(services_temp_trace,row=2,col=1)

fig.update_traces(opacity=0.65)

fig.update_layout(font_color="rgb(199,233,180)",font_family='Arial')

fig.show()

# Severity of social impact

Severity of social impact is calculated as percentage of CDP enlisted social impacts observed in the city. 
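
As a minimal sketch of that calculation, assume a long-format table with one row per (city, reported impact). The city names and the shortened `ASPECTS` list below are made up for illustration; the notebook itself works with the full CDP list.

```python
import pandas as pd

# Hypothetical subset of tracked social aspects (illustration only).
ASPECTS = [
    "Population displacement",
    "Loss of traditional jobs",
    "Increased resource demand",
    "Increased conflict and/or crime",
]

# Toy responses: one row per (city, reported aspect).
responses = pd.DataFrame({
    "Organization": ["City A", "City A", "City B", "City B", "City B"],
    "Response Answer": [
        "Population displacement", "Loss of traditional jobs",
        "Population displacement", "Increased resource demand",
        "Increased conflict and/or crime",
    ],
})

# One 0/1 indicator column per aspect, one row per city.
flags = (
    pd.crosstab(responses["Organization"], responses["Response Answer"])
    .reindex(columns=ASPECTS, fill_value=0)
    .clip(upper=1)
)

# Severity = share of tracked aspects that the city reported, in percent.
severity = 100 * flags.sum(axis=1) / len(ASPECTS)
print(severity)
```

Clipping the counts to 1 keeps a city that reports the same aspect twice from being counted twice.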

In [None]:


social_impact_info.columns = ['Organization', 'CDP Region','Country',
       'Year Reported to CDP','Fluctuating socio-economic conditions',
        'Increased conflict and/or crime',
        'Increased demand for healthcare services',
        'Increased demand for public services',
        'Increased incidence and prevalence of disease and illness',
        'Increased resource demand',
        'Increased risk to already vulnerable populations',
        'Loss of tax base to support public services',
        'Loss of traditional jobs',
        'Migration from rural areas to cities',
        'Population displacement',
        'Total no. of impacted social aspects',
        'percentage of CDP enlisted social aspects impacted']

social_impact_info = social_impact_info.merge(cities_response[['lat','long','Organization']].drop_duplicates(),on='Organization',how='left')
social_impact_info['text'] = social_impact_info[['Organization','Country','percentage of CDP enlisted social aspects impacted']].\
apply(lambda x: 'Country : '+x['Country']+', Organization : '+x['Organization']+', % of impact: '+str(x['percentage of CDP enlisted social aspects impacted']),axis=1)


social_impact_last_status = social_impact_info.sort_values(by=['Year Reported to CDP','Organization']).drop_duplicates(subset=['Organization'],keep='last')
social_impact_last_status[CDP_enlisted_major_impacts].sum(axis=0)/social_impact_last_status.shape[0]

smb = go.Scattermapbox(name='Severity map',
        lon = social_impact_last_status['long'],
        lat = social_impact_last_status['lat'],
        text = social_impact_last_status["percentage of CDP enlisted social aspects impacted"],
        mode = 'markers',
#         locationmode='USA-states',
        hovertext=social_impact_last_status['text'],
        marker = dict(
#             sizemin = 5,
#             sizemode='area',
            size = 10,
            opacity = 0.8,
            reversescale = False,
            autocolorscale = False,
            colorscale = 'Reds',
            cmin = 0,
            color = social_impact_last_status["percentage of CDP enlisted social aspects impacted"],
            cmax = social_impact_last_status["percentage of CDP enlisted social aspects impacted"].max(),
            colorbar_title="Impact% - Severity scale",
            colorbar_thickness=10,
            colorbar_title_side='right',
            colorbar_len=.3,
            colorbar_xanchor='left',
            colorbar_yanchor='bottom',
            colorbar_y=0.7
        ))
 

aspects_temp = (social_impact_last_status[CDP_enlisted_major_impacts].sum(axis=0)/social_impact_last_status.shape[0]).sort_values(ascending=False)*100
aspects_temp_trace = go.Bar(x=aspects_temp.index.values, y=aspects_temp.values,
             showlegend=True,text=np.round(aspects_temp.values,2),textposition='auto'
             )

services_temp2 = social_impact_last_status.groupby(['CDP Region'])[CDP_enlisted_major_impacts].sum()/social_impact_last_status.groupby(['CDP Region'])[CDP_enlisted_major_impacts].count()



fig = make_subplots(
    rows=2, cols=2,
    specs=[
           [{"type": "scattermapbox",'colspan':2},None],
#            [{"type": "histogram2d"}],
           [{"type": "bar",'colspan':2},None],
          ],
    subplot_titles=['Severity map of impacted social aspects',                    
                    'Aspect-wise analysis : % of cities having the corresponding social aspect affected',
                    ],
#     y_title='Exploration of social impacts across the world'.upper(),
    vertical_spacing=0.14,
    row_heights=[0.75,0.25],
        
#     column_widths=[1,1,1]
    
                    )

fig.add_trace(smb,row=1,col=1)

fig.update_layout(
        mapbox_style="carto-positron",
#         title = dict(text='% of CDP enlisted services affected',x=0.5,y=.97),
        height=900,
        width=800,
        hovermode='closest',
        mapbox=dict(
        bearing=0,
        center=go.layout.mapbox.Center(
            lat=35,
            lon=-95
        ),
        pitch=0,
        zoom=3
    )
        )

fig.update_layout(
    template="plotly_dark",
    margin=dict(r=25, t=25, b=140, l=100),
    showlegend=False,
    bargroupgap=0.25,
    bargap=0.25
)


fig.add_trace(aspects_temp_trace,row=2,col=1)


# fig.append_trace(sun1,row=4,col=2)


# fig.update_histogram(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.65)

fig.update_layout(font_color="rgb(199,233,180)",font_family='Arial')

fig.show()

In [None]:
affected_services_info = affected_services_info.merge(city_metadata,on='Organization',how='left')
affected_services_info.groupby(['density_category','Year Reported to CDP'])['percentage of CDP enlisted services impacted'].describe(percentiles=[0.5])\
.style.set_caption('Severity of impacted services across density segments')

In [None]:
social_impact_info = social_impact_info.merge(city_metadata,on='Organization',how='left')
social_impact_info.groupby(['density_category','Year Reported to CDP'])['percentage of CDP enlisted social aspects impacted'].describe(percentiles=[0.5])\
.style.set_caption('Social impact severity across density segments')

## Observation 

Percentages of both social impact and affected services are on the rise across all segments. However, low density cities are comparatively less affected than other segments.


# Transport

In the previous sections, we found that transport is a big pain point, when it comes to GHG emissions.
Let's explore further. 

In [None]:
q_num = '10.1'
row_name = ['Mode','Mode Share'
           ]

col_name = ['Private motorized transport', 
'Rail/Metro/Tram', 
'Buses (including BRT)', 
'Ferries/ River boats', 
'Walking', 
'Cycling', 
'Taxis or For Hire Vehicles']

# selected_cities = {}

# for density in ['very high density','high density','medium density','low density']:
#     selected_cities[density] = list(total_cities_response[
#                                                  (total_cities_response['Question Number']==q_num)&\
#                                                  (total_cities_response['Column Name'].isin(col_name))&\
#                                                  (total_cities_response['Row Name'].isin(row_name))&\
#                                                  (total_cities_response['Year Reported to CDP'].isin([2020]))]\
#                            [['Organization','Response Answer','density_category','CDP Region','Country']]\
#                            .replace({'Question not applicable':np.nan,'0':np.nan})\
#                            .dropna()\
#                            .groupby(['Organization'])\
#                            .count()\
#                            .sort_values(by='Organization',ascending=False)
#                            .index.values)

transport_mode_perc = {}

for density in ['very high density','high density','medium density','low density']:
    for city in selected_cities[density]:
#         for row_n in row_name:
            transport_mode_perc[city] = [ total_cities_response[(total_cities_response['density_category']==density)&\
                                                     (total_cities_response['Question Number']==q_num)&\
                                                     (total_cities_response['Column Name'].isin(col_name))&\
                                                     (total_cities_response['Row Name'].isin([row_n]))&\
                                                      (total_cities_response['Organization']==city)&\
                                                     (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                               ['Response Answer'].astype('float').values.sum()

                                                        for row_n in row_name]
    
    
transport_mode = total_cities_response[
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
#                                                  (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','Column Name','density_category']]\
.replace({'Question not applicable':np.nan,'0':np.nan})\
.dropna()\
.sort_values(by=['Organization','Column Name'])\
.pivot(index='Organization',columns='Column Name',values='Response Answer')\
.fillna(0.0)\
.reset_index()\
.merge(city_metadata,on='Organization',how='left')

transport_mode['Mass transport mode usage in %'] = transport_mode[['Buses (including BRT)','Rail/Metro/Tram','Ferries/ River boats']].astype('float').sum(axis=1)
transport_mode['Non-motorized mode usage in %'] = transport_mode[['Cycling','Walking']].astype('float').sum(axis=1)
transport_mode['Private vehicle based commute mode usage in %'] = transport_mode[['Private motorized transport','Taxis or For Hire Vehicles']].astype('float').sum(axis=1)

world_transport_mode = world_cities_response[(world_cities_response['CDP Region']=='Europe')&\
                                                 (world_cities_response['Question Number']==q_num)&\
                                                 (world_cities_response['Column Name'].isin(col_name))&\
#                                                  (world_cities_response['Row Name'].isin(row_name))&\
                                                 (world_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','Column Name','density_category']]\
.replace({'Question not applicable':np.nan,'0':np.nan})\
.dropna()\
.sort_values(by=['Organization','Column Name'])\
.pivot(index='Organization',columns='Column Name',values='Response Answer')\
.fillna(0.0)\
.reset_index()\
.merge(world_city_metadata,on='Organization',how='left')

world_transport_mode['Mass transport mode usage in %'] = world_transport_mode[['Buses (including BRT)','Rail/Metro/Tram','Ferries/ River boats']].astype('float').sum(axis=1)
world_transport_mode['Non-motorized mode usage in %'] = world_transport_mode[['Cycling','Walking']].astype('float').sum(axis=1)
world_transport_mode['Private vehicle based commute mode usage in %'] = world_transport_mode[['Private motorized transport','Taxis or For Hire Vehicles']].astype('float').sum(axis=1)

Here, we will create KPIs that summarize transport mode usage by the population of the cities.

We will group it into 3 KPIs.

### 1. Private vehicle based commute mode usage in %
### 2. Mass transport mode usage in %
### 3. Non-motorized mode usage in %

In the table below, the mode share shows that higher density goes with a stronger mass transport system. In low density cities only 4% of passengers commute by mass transport, and an almost equal share walks or cycles. This is consistent with our findings on the carbon intensity/million Btu per capita KPI values for the low density segment, and it answers the question we raised there:

<b> Low density cities are doing terribly in the transport section. Could this be because of a lack of mass transport systems? 
</b>

It appears so, although there are a few exceptions in all categories. 
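
To make the grouping concrete, here is a small sketch of how the three KPIs can be derived from per-mode shares. The two cities and their numbers are invented for illustration; only the column names and KPI names follow the notebook.

```python
import pandas as pd

# Hypothetical mode shares (%) for two made-up cities.
mode_share = pd.DataFrame({
    "Organization": ["City A", "City B"],
    "Private motorized transport": [80.0, 40.0],
    "Taxis or For Hire Vehicles": [2.0, 5.0],
    "Buses (including BRT)": [5.0, 15.0],
    "Rail/Metro/Tram": [3.0, 20.0],
    "Ferries/ River boats": [0.0, 1.0],
    "Walking": [7.0, 12.0],
    "Cycling": [3.0, 7.0],
}).set_index("Organization")

# Each KPI is the sum of the mode shares that belong to it.
KPI_GROUPS = {
    "Private vehicle based commute mode usage in %": [
        "Private motorized transport", "Taxis or For Hire Vehicles"],
    "Mass transport mode usage in %": [
        "Buses (including BRT)", "Rail/Metro/Tram", "Ferries/ River boats"],
    "Non-motorized mode usage in %": ["Walking", "Cycling"],
}

kpis = pd.DataFrame(
    {name: mode_share[cols].sum(axis=1) for name, cols in KPI_GROUPS.items()}
)
print(kpis)
```

Because every mode belongs to exactly one group, the three KPIs of a city add up to the total of its reported shares, which is a quick sanity check on the input data.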


In [None]:
transport_mode_summary = transport_mode.groupby(['density_category'])[['Private vehicle based commute mode usage in %',
                                             'Mass transport mode usage in %',
                                             'Non-motorized mode usage in %' 
                                             ]].describe(percentiles=[0.50]).T\
.reset_index()\
.rename(columns={'level_0':'Transport Mode','level_1':'stat'})

transport_mode_summary[transport_mode_summary['stat'].isin(['50%','mean'])].style.set_caption('Percentage of population availing different transport modes in USA')

But how does this fare against European cities? **European cities are well ahead**, it seems. **Private vehicle usage is lower and almost 10-12% more citizens rely on mass transport systems, even in low density cities, where the US is lagging far behind.**

'Non-motorized mode usage in %' is higher across all segments. That's a **clear indication of strong citizen awareness and engagement**. Some of the cities that top the chart are Paris, Berlin, Rotterdam and Copenhagen.

If we look at BIKEWAYS/SQ.KM AREA KPI, 

**<u>Paris scores 700/105 = 6.67 km/sq.km (based on 2015 data),</u>**

**<u>Rotterdam scores 600/325 = 1.85 km/sq.km,</u>**

**<u>Copenhagen scores 500/88 = 5.68km/sq.km</u>**

source: Data from Google
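
The KPI is simply bikeway length divided by city area. A minimal sketch, using the lengths (km) and areas (sq.km) quoted above as inputs rather than independently verified figures:

```python
# Inputs copied from the figures quoted in the text above.
bikeways_km = {"Paris": 700, "Rotterdam": 600, "Copenhagen": 500}
area_sq_km = {"Paris": 105, "Rotterdam": 325, "Copenhagen": 88}

# BIKEWAYS/SQ.KM AREA KPI: km of bikeway per sq.km of city area.
bikeway_density = {
    city: round(bikeways_km[city] / area_sq_km[city], 2)
    for city in bikeways_km
}

for city, value in sorted(bikeway_density.items(), key=lambda kv: -kv[1]):
    print(f"{city}: {value} km/sq.km")
```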

In [None]:
world_transport_mode_summary = world_transport_mode.groupby(['density_category'])[['Private vehicle based commute mode usage in %',
                                             'Mass transport mode usage in %',
                                             'Non-motorized mode usage in %' 
                                             ]].describe(percentiles=[0.50]).T\
.reset_index()\
.rename(columns={'level_0':'Transport Mode','level_1':'stat'})

world_transport_mode_summary[world_transport_mode_summary['stat'].isin(['50%','mean'])]

Here are the top 10 cycling and walking friendly cities in Europe.

In [None]:
world_transport_mode.sort_values(by=['Non-motorized mode usage in %'],ascending=False)[['Organization','Mass transport mode usage in %','Non-motorized mode usage in %','Private vehicle based commute mode usage in %']].head(10)\
.style.apply(lambda x: ['None','None','background-color: lightgreen','None'],axis=1)

Let's move on to 
# FOOD & WASTE

The reason for studying food and waste together hardly needs stating. But since we are looking at US data, it is even more significant. Why? In the United States, food waste is estimated at 30-40 percent of the food supply, and 21.6% of the total solid waste generated in the US is food waste.

source : https://www.epa.gov/sites/production/files/2020-11/documents/2018_tables_and_figures_fnl_508.pdf


But this is not a US-specific problem. A study by the Food and Agriculture Organization (FAO) of the United Nations estimates that if 'global food waste' were a country, it would be the third highest emitter of greenhouse gases after the US and China.

So a **KPI like**
## Estimated amount of recyclable/compostable food waste (tonnes/year)
= 0.216 × solid waste generated in the city (reported in waste section 13.0)

can help city councils in the US to:

**a. monitor and reduce food waste by enforcing policies,**

**b. bring down the GHG emissions caused by solid waste disposal or incineration/burning of waste, and**

**c. recycle or compost this waste efficiently to produce renewable energy.**


Thus, it will help the energy recovery process. 
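
A minimal sketch of the two waste KPIs, assuming the 21.6% food-waste share quoted above applies uniformly to every city (a simplification) and using invented city figures:

```python
import pandas as pd

# Share of food waste in total solid waste, as quoted above for the US.
FOOD_WASTE_SHARE = 0.216

# Hypothetical cities; tonnages and populations are made up.
waste = pd.DataFrame({
    "Organization": ["City A", "City B"],
    "Solid waste generated (tonnes/year)": [500_000.0, 120_000.0],
    "population": [1_000_000, 150_000],
})

# KPI 1: estimated recyclable/compostable food waste.
waste["Estimated amount of recyclable/compostable food waste (tonnes/year)"] = (
    waste["Solid waste generated (tonnes/year)"] * FOOD_WASTE_SHARE
)

# KPI 2: solid waste generated per capita.
waste["Solid waste generated per capita (tonnes/year)"] = (
    waste["Solid waste generated (tonnes/year)"] / waste["population"]
)
print(waste)
```

A national average hides local differences, so the first KPI is best read as a rough estimate for planning rather than a measured value.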


Below, we can also see '<b>Solid waste generated per capita</b>' KPI values of the cities. 

In [None]:
q_num = '13.0'
col_name = [
    'Amount of waste generated (tonnes/year)'
           ]

row_name = ['Amount']

solid_waste_data = total_cities_response[
#                       (total_cities_response['Organization'].isin(orgs))&\
#                                                  (total_cities_response['density_category']==density)&\
                                                 (total_cities_response['Question Number']==q_num)&\
                                                 (total_cities_response['Column Name'].isin(col_name))&\
#                                                  (total_cities_response['Row Name'].isin(row_name))&\
                                                 (total_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer']]\
                            .replace({'Question not applicable':np.nan,'0':np.nan})\
                            .dropna()\
                            .rename(columns={'Response Answer':'Solid waste generated (tonnes/year)'}).reset_index()

solid_waste_data.drop([29],inplace=True)
solid_waste_data = solid_waste_data.merge(city_metadata[['Organization','population','density_category']],on='Organization',how='left')
solid_waste_data['Estimated amount of recyclable/compostable food waste (tonnes/year)'] = solid_waste_data['Solid waste generated (tonnes/year)'].astype('float')*.216
solid_waste_data['Solid waste generated per capita (tonnes/year)'] = solid_waste_data['Solid waste generated (tonnes/year)'].astype('float')/solid_waste_data['population']
solid_waste_data.head()

Per capita solid waste generation in very high density cities is almost twice that of medium density cities.
Surprisingly, though, low density cities also produce significant per capita solid waste.

In [None]:
solid_waste_data.groupby(['density_category'])[['Solid waste generated per capita (tonnes/year)']].median()

What about sustainability in food policies? We measure it by counting how many of the following CDP questions a city has already acted on.

1. Do you incentivise fresh fruit/vegetables vendor locations?
2. Do you subsidise fresh fruits and vegetables?
3. Do you tax/ban higher carbon foods (meat, dairy, ultra-processed)?
4. Do you use regulatory mechanisms that limit advertising of higher carbon foods (meat, dairy, ultra-processed)?
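
One way to turn these four answers into a score is to treat "Yes" as 1 and everything else as 0, then average across the questions. The sketch below assumes that scoring rule and uses two invented cities:

```python
import pandas as pd

QUESTIONS = [
    "Do you incentivise fresh fruit/vegetables vendor locations?",
    "Do you subsidise fresh fruits and vegetables?",
    "Do you tax/ban higher carbon foods (meat, dairy, ultra-processed)?",
    "Do you use regulatory mechanisms that limit advertising of higher "
    "carbon foods (meat, dairy, ultra-processed)?",
]

# Hypothetical answers for two made-up cities.
answers = pd.DataFrame(
    [["Yes", "Yes", "No", "No"],
     ["Yes", "No", "No", "Do not know"]],
    index=["City A", "City B"],
    columns=QUESTIONS,
)

# "Yes" counts as 1; "No" and "Do not know" count as 0.
acted = answers.eq("Yes").astype(int)

# Score = fraction of the four policy categories the city has acted on.
score = acted.mean(axis=1)
print(score)
```

Treating "Do not know" as 0 is a conservative choice: a city only gets credit for policies it has confirmed.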

## sustainability in food policy KPI

**The sustainability_score_on_food_policy shows that lower density cities have a higher sustainability score, based on the limited responses CDP has received from the Europe and North America regions.** 

**<u>Most cities are incentivising fresh fruit/vegetables vendor locations, while preferring not to tax or ban higher carbon foods (meat, dairy, ultra-processed).</u>**




In [None]:
q_num = '12.4'
col_name = ['Action implemented']

row_name = ['Amount']

sustainability_score_on_food_policy = world_cities_response[
                      (world_cities_response['CDP Region'].isin(['Europe','North America']))&\
#                                                  (world_cities_response['density_category']==density)&\
                                                 (world_cities_response['Question Number']==q_num)&\
                                                 (world_cities_response['Column Name'].isin(col_name))&\
#                                                  (world_cities_response['Row Name'].isin(row_name))&\
                                                 (world_cities_response['Year Reported to CDP'].isin([2020]))]\
                           [['Organization','Response Answer','Column Name','Row Name']]\
                        .replace({'Question not applicable':np.nan,'0':np.nan})\
                           .dropna()\
                           .pivot(index='Organization',values='Response Answer',columns='Row Name')\
                           .dropna().reset_index()\
                           .replace({'No':0,'Do not know':0,'Yes':1}) 

sustainability_score_on_food_policy['sustainability_score_on_food_policy'] = sustainability_score_on_food_policy[['Do you incentivise fresh fruit/vegetables vendor locations?',
       'Do you subsidise fresh fruits and vegetables?',
       'Do you tax/ban higher carbon foods (meat, dairy, ultra-processed)?',
       'Do you use regulatory mechanisms that limit advertising of higher carbon foods (meat, dairy, ultra-processed)?']].mean(axis=1)  
sustainability_score_on_food_policy = sustainability_score_on_food_policy.merge(world_cities_response[['Organization','density_category','CDP Region']].drop_duplicates(),on='Organization',how='left')

sustainability_score_on_food_policy[sustainability_score_on_food_policy['sustainability_score_on_food_policy']>0].groupby(['density_category'])['sustainability_score_on_food_policy']\
.mean().sort_values(ascending=False)\
.plot.barh(title='Sustainability score on food policies across density segments');

In [None]:

temp = sustainability_score_on_food_policy[['Do you incentivise fresh fruit/vegetables vendor locations?',
       'Do you subsidise fresh fruits and vegetables?',
       'Do you tax/ban higher carbon foods (meat, dairy, ultra-processed)?',
       'Do you use regulatory mechanisms that limit advertising of higher carbon foods (meat, dairy, ultra-processed)?']].sum(axis=0)/sustainability_score_on_food_policy.shape[0]
temp = pd.DataFrame(temp).reset_index().rename(columns={0:'values'})
fig = px.bar(temp, x='index', y='values')
fig.update_layout(title_text='Share of cities opting for food sustainability policies')
fig.show()


# Corporate Data

Since we are focusing on US data, here too we will narrow down to companies that report on their US operations.
In 2020, 356 companies responded to the climate change questionnaire and 69 companies responded to the water security questionnaire. From those 356 companies, we have selected 135 that provided adequate answers to the questions.

Here we will focus on two KPIs that show how corporations are doing in terms of carbon usage in their operations and how far their expenditure favours carbon emissions.


First we will look at industry-wise scope 1 emissions.

In [None]:
# ws_us_comps = corporate_water_security_response[(corporate_water_security_response['survey_year']==2020)&(corporate_water_security_response['question_number']=='W0.3')&(corporate_water_security_response['response_value']=='United States of America')]['organization'].unique()

cc_us_comps = corporate_climate_change_response[(corporate_climate_change_response['survey_year']==2020)&(corporate_climate_change_response['question_number']=='C0.3')&(corporate_climate_change_response['response_value']=='United States of America')]['organization'].unique()

response_count = pd.DataFrame(pd.DataFrame(corporate_climate_change_response[(corporate_climate_change_response['organization'].isin(cc_us_comps))&\
                                               (corporate_climate_change_response['question_number'].isin(['C4.3b','C6.5','C7.5','C7.6b','C7.3b','C8.2d']))]\
             .groupby(['organization',"module_name","question_number"]).size()).reset_index().groupby(['organization'])[0].sum()).rename(columns={0:'count'}).reset_index().sort_values(by='count',ascending=False)

cc_us_comps = response_count[response_count['count']>125]['organization'].values
corporate_climate_change_response = corporate_climate_change_response[(corporate_climate_change_response['survey_year']==2020)&(corporate_climate_change_response['organization'].isin(cc_us_comps))]
cc_us_comps = corporate_climate_change_disclosing[(corporate_climate_change_disclosing['survey_year']==2020)&(corporate_climate_change_disclosing['organization'].isin(cc_us_comps))][['organization','activities']].drop_duplicates()

cc_us_comps = cc_us_comps.merge(corporations[['organization','category']],on='organization',how='left')
corporate_climate_change_response = corporate_climate_change_response.merge(cc_us_comps,on='organization',how='left')



q_num = 'C6.1'
row_name = ['Reporting year']
col_name = ['C6.1_C1Gross global Scope 1 emissions (metric tons CO2e)']

scope1_emission = corporate_climate_change_response[(corporate_climate_change_response['question_number']==q_num)&\
                                  (corporate_climate_change_response['column_name'].isin(col_name))&\
                                  (corporate_climate_change_response['row_name'].isin(row_name))
                                 ][['organization','row_name','column_name','response_value']]\
.replace('0.0',np.nan)\
.dropna()\
.pivot(index='organization',values='response_value',columns='row_name')\
.reset_index()\
.merge(cc_us_comps,on='organization',how='left')\
.rename(columns={'Reporting year':'Gross scope 1 emission (metric tons CO2e)'})

scope1_emission['Gross scope 1 emission (metric tons CO2e)'] = scope1_emission['Gross scope 1 emission (metric tons CO2e)'].astype('float')


q_num = 'C6.3'
row_name = ['Reporting year']
col_name = ['C6.3_C1Scope 2, location-based']

scope2_emission_location_based = corporate_climate_change_response[(corporate_climate_change_response['question_number']==q_num)&\
                                  (corporate_climate_change_response['column_name'].isin(col_name))&\
                                  (corporate_climate_change_response['row_name'].isin(row_name))
                                 ][['organization','row_name','column_name','response_value']]\
.replace('0.0',np.nan)\
.dropna()\
.pivot(index='organization',values='response_value',columns='row_name')\
.reset_index()\
.merge(cc_us_comps,on='organization',how='left')\
.rename(columns={'Reporting year':'Gross location based scope 2 emission (metric tons CO2e)'})

scope2_emission_location_based['Gross location based scope 2 emission (metric tons CO2e)'] = scope2_emission_location_based['Gross location based scope 2 emission (metric tons CO2e)'].astype('float')



q_num = 'C6.3'
row_name = ['Reporting year']
col_name = ['C6.3_C2Scope 2, market-based (if applicable)']

scope2_emission_market_based = corporate_climate_change_response[(corporate_climate_change_response['question_number']==q_num)&\
                                  (corporate_climate_change_response['column_name'].isin(col_name))&\
                                  (corporate_climate_change_response['row_name'].isin(row_name))
                                 ][['organization','row_name','column_name','response_value']]\
.replace('0.0',np.nan)\
.dropna()\
.pivot(index='organization',values='response_value',columns='row_name')\
.reset_index()\
.merge(cc_us_comps,on='organization',how='left')\
.rename(columns={'Reporting year':'Gross market based scope 2 emission (metric tons CO2e)'})

scope2_emission_market_based['Gross market based scope 2 emission (metric tons CO2e)'] = scope2_emission_market_based['Gross market based scope 2 emission (metric tons CO2e)'].astype('float')



q_num = 'C6.5'
row_name = ['Purchased goods and services']
col_name = ['C6.5_C2Metric tonnes CO2e']

scope3_emission = corporate_climate_change_response[(corporate_climate_change_response['question_number']==q_num)&\
                                  (corporate_climate_change_response['column_name'].isin(col_name))
#                                   (corporate_climate_change_response['row_name'].isin(row_name))
                                 ][['organization','row_name','column_name','response_value']]\
.replace('0.0',np.nan)\
.dropna()\
.pivot(index='organization',values='response_value',columns='row_name')\
.reset_index()\
.merge(cc_us_comps,on='organization',how='left')\
.fillna(0.0)
# .rename(columns={'Purchased goods and services':'Gross scope 3 emission (metric tons CO2e)'}).fillna(0.0)

# scope3_emission['Gross scope 3 emission (metric tons CO2e)'] = scope3_emission['Gross scope 3 emission (metric tons CO2e)'].astype('float')


scope3_category = ['Business travel', 'Capital goods',
       'Downstream leased assets',
       'Downstream transportation and distribution', 'Employee commuting',
       'End of life treatment of sold products', 'Franchises',
       'Fuel-and-energy-related activities (not included in Scope 1 or 2)',
       'Investments', 'Other (downstream)', 'Other (upstream)',
       'Processing of sold products', 'Purchased goods and services',
       'Upstream leased assets', 'Upstream transportation and distribution',
       'Use of sold products', 'Waste generated in operations']

scope3_emission[scope3_category] = scope3_emission[scope3_category].astype('float')
scope3_emission['Gross scope 3 emission (metric tons CO2e)'] = scope3_emission[scope3_category].sum(axis=1)


temp = []

for s3_category in scope3_category:
    temp.append(pd.DataFrame(scope3_emission[scope3_emission[s3_category]>0].groupby(['category'])[s3_category].mean()).T.reset_index())
    
    
scope3_data_category_wise = pd.concat(temp).rename(columns={'level_0':'scope3_category','level_1':'stat','index':'scope3_category'}).fillna(0.0).T
scope3_data_category_wise.columns=scope3_data_category_wise.iloc[0]
scope3_data_category_wise.drop(['scope3_category'],inplace=True)


scope3_data_category_wise['Sum_of_scope3_category_wise_average_emissions'] = scope3_data_category_wise[scope3_category].sum(axis=1)



temp = scope1_emission.groupby(['category'])['Gross scope 1 emission (metric tons CO2e)'].describe(percentiles=[0.5]).sort_values(by=['mean'],ascending=False).reset_index().rename(columns={'mean':'scope 1 avg. emission'})

temp.head(10).style.set_caption('Industry-wise gross scope 1 emission (metric tons CO2e)')

In [None]:
temp1 = scope3_emission.groupby(['category'])['Gross scope 3 emission (metric tons CO2e)'].describe(percentiles=[0.5]).sort_values(by=['mean'],ascending=False).reset_index().rename(columns={'mean':'scope 3 avg. emission'})
temp2 = scope2_emission_market_based.groupby(['category'])['Gross market based scope 2 emission (metric tons CO2e)'].describe(percentiles=[0.5]).sort_values(by=['mean'],ascending=False).reset_index().rename(columns={'mean':'scope 2 avg. emission'})
temp3 = temp.merge(temp1,on='category').merge(temp2,on='category')
temp3[['category', 'scope 1 avg. emission', 
       'scope 2 avg. emission',  'scope 3 avg. emission']].style.set_caption('Emissions in scope1,2 and 3 across industry categories')

In [None]:
sum_ghg = temp3[['scope 1 avg. emission','scope 2 avg. emission','scope 3 avg. emission']].sum(axis=1).values
temp3['scope 1 avg. emission'] = temp3['scope 1 avg. emission']/sum_ghg
temp3['scope 2 avg. emission'] = temp3['scope 2 avg. emission']/sum_ghg
temp3['scope 3 avg. emission'] = temp3['scope 3 avg. emission']/sum_ghg
category=temp3.category.values

fig = go.Figure(data=[
    go.Bar(name='Scope 1 avg. emission', x=category, y=temp3['scope 1 avg. emission']),
    go.Bar(name='Scope 2 avg. emission', x=category, y=temp3['scope 2 avg. emission']),
    go.Bar(name='Scope 3 avg. emission', x=category, y=temp3['scope 3 avg. emission']),
])
# Change the bar mode
fig.update_layout(barmode='stack',title={
        'text': "Share of GHG emission in scope1,2 and 3 across industries",
#         'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

The plot above shows how a corporation's associated GHG emissions are distributed across the three scopes. Depending on the nature of the industry, emissions in one scope can be significantly greater than in the others. Although corporations are not directly responsible for emissions outside their own operations, they do support emission-favouring products and processes by purchasing or using them.
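
The shares in the stacked bars come from dividing each scope's average emission by the sum of the three averages for that industry. A minimal sketch of that normalization follows; the industry names and numbers are made up for illustration and are not CDP data.

```python
import pandas as pd

# Hypothetical average emissions per industry (metric tons CO2e); not CDP data
avg = pd.DataFrame(
    {
        "scope 1 avg. emission": [800.0, 50.0],
        "scope 2 avg. emission": [150.0, 150.0],
        "scope 3 avg. emission": [50.0, 800.0],
    },
    index=["Power generation", "Retail"],
)

# Divide every row by its own total so the three scopes sum to 1 per industry
shares = avg.div(avg.sum(axis=1), axis=0)
print(shares.round(2))
```

Normalizing per row makes industries of very different size comparable, at the cost of hiding the absolute magnitude of their emissions.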


# KPI - GHG emission favourability by operational cost paid

In this KPI, we measure a **corporation's carbon favourability by calculating how much GHG it has generated, directly or indirectly, for every 1000 USD it has spent**. Beyond scope 1 and 2, corporate expenditure on high-emission upstream or downstream scope 3 activities is one of the root causes that make the chain difficult to break. It aggravates the problem and encourages a trade environment that favours GHG emissions. With this KPI, we will see how severe the problem is.


We take expenditure data, reported under the tags `CostsAndExpenses` or `CostOfGoodsAndServicesSold`, from the financial reports filed with the US Securities and Exchange Commission (SEC).
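
Linking a CDP respondent to its SEC filings requires matching organization names across the two sources, and the names rarely agree character for character. The notebook does this with `difflib.SequenceMatcher` and a similarity threshold of 0.8. The sketch below shows the same idea on made-up names; the helper `best_match` and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate.

    The candidate is None when the best score is below the threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        # compare case-insensitively, as the notebook does
        score = SequenceMatcher(None, name.lower(), cand.lower()).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical filer names, for illustration only
sec_names = ["ACME ENERGY CORP", "GLOBEX INC", "INITECH HOLDINGS"]

match, score = best_match("Acme Energy Corp.", sec_names)
no_match, low_score = best_match("Umbrella Pharmaceuticals", sec_names)
print(match, round(score, 2))
print(no_match)
```

A threshold trades coverage for precision: names that fall below it are better reviewed by hand than mapped automatically.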

**GHG favourability by operational cost = (Total GHG emissions in scope 1, 2 and 3, in metric tons CO2e / Total expenditure in USD) × 1000**

As the table below shows, energy plants, especially those running on coal and gas, emit as much as <b><u>19 metric tonnes of CO2e for every 1000 USD spent on operations.</u></b>

This may result in cheaper energy in the short term, but its environmental side effects are alarming.
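
As a worked illustration of the KPI arithmetic, here is a minimal sketch. The function name `ghg_per_1000_usd` and the input figures are hypothetical and chosen only to make the calculation easy to follow.

```python
def ghg_per_1000_usd(scope1, scope2, scope3, total_cost_usd):
    """Metric tons CO2e emitted per 1000 USD of expenditure."""
    if total_cost_usd <= 0:
        raise ValueError("total_cost_usd must be positive")
    return (scope1 + scope2 + scope3) / total_cost_usd * 1000


# Hypothetical corporation: 250,000 t CO2e in total on 50 million USD of costs
kpi = ghg_per_1000_usd(200_000, 30_000, 20_000, 50_000_000)
print(round(kpi, 3))
```

Here the corporation emits roughly 5 metric tons of CO2e for every 1000 USD it spends.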

In [None]:
corp_desc = corporate_climate_change_response[['organization','response_value']].sort_values(by='response_value',ascending=True).drop_duplicates(keep='first').rename(columns={'response_value':'description'})

corp_desc['nyse_listing'] = corp_desc['description'].apply(lambda x: 'nyse' in str(x).lower())
corp_desc = corp_desc.merge(cc_us_comps,on='organization',how='left')


financial_data = financial_data.merge(tag_data[['adsh','name']].drop_duplicates(),on='adsh',how='left')

mapping = {}
possible_mismatch = []


for corp in corp_desc['organization'].unique():
    similarity = 0
    candidate = 'None'
    # find the SEC filer name most similar to the CDP organization name
    for wcorp in tag_data['name'].unique():
        sim = SequenceMatcher(None, corp.lower(), wcorp.lower()).ratio()
        if sim > similarity:
            candidate = wcorp
            similarity = sim

    # accept the match only when the similarity ratio is at least 0.8
    if similarity < 0.8:
        possible_mismatch.append((corp, candidate))
    else:
        mapping[corp] = candidate
            
            
            

cost_corps = []

for corp in list(mapping.keys())[:-1]:
    # collect the tags the matched filer reported for the full year (4 quarters) ending 2019-12-31
    corp_tags = financial_data[(financial_data['name']==mapping[corp])&\
                               (financial_data['qtrs']==4)&\
                               (financial_data['ddate']==20191231)]['tag'].unique()

    # keep the corporation if it reports either of the two cost tags
    for tag in corp_tags:
        if tag.lower() in ('costsandexpenses', 'costofgoodsandservicessold'):
            cost_corps.append(corp)
            break

missing = []
for corp in cost_corps:
    
    if corp not in scope1_emission.organization.values:
#         print(corp)
        missing.append(corp)
    elif corp not in scope2_emission_location_based.organization.values:
#         print(corp)
        missing.append(corp)
    elif corp not in scope3_emission.organization.values:
#         print(corp)  
        missing.append(corp)
        
selected_corps = set(cost_corps)-set(missing)        


# compute the KPI for every selected corporation
ghg_favourability_kpi = {}

for corp in list(selected_corps):
    corp_financials = financial_data[financial_data['name']==mapping[corp]]
    corp_tags = corp_financials['tag'].unique()

    # prefer cost & expense data; if not available, fall back to COGS
    if 'CostsAndExpenses' in corp_tags:
        cost_tag = 'CostsAndExpenses'
    elif 'CostOfGoodsAndServicesSold' in corp_tags:
        cost_tag = 'CostOfGoodsAndServicesSold'
    else:
        continue

    # full-year (4 quarters) value of the chosen tag for the period ending 2019-12-31
    est_cost = corp_financials[(corp_financials['tag']==cost_tag)&\
                               (corp_financials['qtrs']==4)&\
                               (corp_financials['ddate']==20191231)]['value'].values[0]

    # get scope 1, scope 2, scope 3 emission data
    scope1_em = scope1_emission['Gross scope 1 emission (metric tons CO2e)'][scope1_emission['organization']==corp].values[0]
    scope2_em = scope2_emission_location_based['Gross location based scope 2 emission (metric tons CO2e)'][scope2_emission_location_based['organization']==corp].values[0]
    scope3_em = scope3_emission['Gross scope 3 emission (metric tons CO2e)'][scope3_emission['organization']==corp].values[0]

    total_emission = scope1_em+scope2_em+scope3_em
    # metric tons CO2e per 1000 USD spent
    ghg_favourability = (total_emission/est_cost)*1000
    ghg_favourability_kpi[corp] = {'GHG favourability by operational cost':ghg_favourability,'estimated cost & expenses':est_cost,'total emission':total_emission}

    
    
    
    
ghg_favourability = pd.DataFrame(ghg_favourability_kpi).T.sort_values(by='GHG favourability by operational cost',ascending=False).reset_index().rename(columns={'index':'organization'})\
.merge(cc_us_comps[['organization','activities']],on='organization',how='left') 



ghg_favourability.head(10).style.set_caption('GHG emission in metric tonnes of CO2e for 1000 USD spent in operation')

In [None]:
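# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the replacement
sns.histplot(ghg_favourability['GHG favourability by operational cost'], kde=True, stat='density');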

# Carbon Dependency KPI

Carbon dependency is the growth of carbon usage, measured as scope 1 GHG emissions, relative to the previous year.
It shows whether a corporation is making a sustained effort to bring down emissions in its own operations. A value below 1 means emissions fell compared with the previous year; a value above 1 means they grew.

**Carbon dependency = Scope 1 GHG emissions (metric tons CO2e) in the reporting year / Scope 1 GHG emissions (metric tons CO2e) in the previous year**


As the plot below reveals, carbon dependency is decreasing steadily for some corporations, but for many, no sustained effort is visible.
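
The year-over-year ratio can be sketched on a small table as follows. The corporations, column names and figures are hypothetical; the real CDP row names may differ (for example, they may carry trailing spaces).

```python
import numpy as np
import pandas as pd

# Hypothetical scope 1 emissions (metric tons CO2e), oldest year first
emissions = pd.DataFrame(
    {
        "Past year 2": [1200.0, 500.0],
        "Past year 1": [1000.0, 500.0],
        "Reporting year": [900.0, 600.0],
    },
    index=["Corp A", "Corp B"],
)

# Ratio of each year to the year before it; the first column has no predecessor
dependency = emissions.div(emissions.shift(axis=1)).iloc[:, 1:]
# A zero in the previous year would give inf, so treat it as missing
dependency = dependency.replace([np.inf, -np.inf], np.nan)
print(dependency.round(2))
```

In this toy table Corp A stays below 1 in both years, which indicates a sustained reduction, while Corp B moves above 1 in the reporting year.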

In [None]:
# carbon dependency


q_num = 'C6.1'
row_name = ['Reporting year','Past year 1']
col_name = ['C6.1_C1Gross global Scope 1 emissions (metric tons CO2e)']

carbon_dependency = corporate_climate_change_response[(corporate_climate_change_response['question_number']==q_num)&\
                                  (corporate_climate_change_response['column_name'].isin(col_name))
                                 ][['organization','row_name','column_name','response_value']]\
.replace('0.0',np.nan)\
.dropna()\
.pivot(index='organization',values='response_value',columns='row_name')\
.fillna(0.0)\
.reset_index()\
.merge(cc_us_comps[['organization','activities','category']],on='organization',how='left')
carbon_dependency['Reporting year'] = carbon_dependency['Reporting year'].astype('float')
carbon_dependency['Past year 1 '] = carbon_dependency['Past year 1 '].astype('float')
carbon_dependency['Past year 2'] = carbon_dependency['Past year 2'].astype('float')
carbon_dependency['Past year 3'] = carbon_dependency['Past year 3'].astype('float')

carbon_dependency['carbon dependency: Reporting year'] = carbon_dependency['Reporting year']/carbon_dependency['Past year 1 '] 
carbon_dependency['carbon dependency: Reporting year - 1'] = carbon_dependency['Past year 1 ']/carbon_dependency['Past year 2'] 
carbon_dependency['carbon dependency: Reporting year - 2'] = carbon_dependency['Past year 2']/carbon_dependency['Past year 3']

# a zero in the previous year gives inf (or NaN for 0/0); drop those rows
carbon_dependency.replace(np.inf,np.nan,inplace=True)
carbon_dependency.dropna(axis=0,inplace=True)
# scope1_emission['Gross scope 1 emission (metric tons CO2e)'] = scope1_emission['Gross scope 1 emission (metric tons CO2e)'].astype('float')

carbon_dependency.drop(['category'],axis=1).reset_index(drop=True).sort_values(by='carbon dependency: Reporting year',ascending=False).head(10)\
.style.set_caption('Carbon Dependency of the companies over last 3 years')

In [None]:
temp = carbon_dependency[['organization','carbon dependency: Reporting year','carbon dependency: Reporting year - 1','carbon dependency: Reporting year - 2']]\
.melt(id_vars=['organization'],value_vars=['carbon dependency: Reporting year','carbon dependency: Reporting year - 1','carbon dependency: Reporting year - 2'])\
.rename(columns={'variable':'Year','value':'Carbon Dependency'}).sort_values(by='Year',ascending=False)

fig = go.Figure()

# one trace per corporation (the melted frame repeats each organization once per year)
for org in temp['organization'].unique():
    org_rows = temp[temp['organization']==org]
    fig.add_trace(go.Scatter(y=org_rows['Carbon Dependency'], x=org_rows['Year'],mode='lines+markers',name=org))

fig.update_layout(
    title={
        'text': "Carbon Dependency of corporations over last 3 years",
#         'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
    
fig.show()

# Conclusion


**Last century's solutions are this century's problems.** Perhaps that's the best way to describe the environmental issues we are facing today.
In this notebook, we have explored how current energy generation methods and fuels are damaging the two things most essential for our survival: air and water.

The KPIs we developed can shed light on the present and future dangers that our cities and corporations are going to face together. We have also discussed how the adaptation efforts reported by cities can be summarized in one index and compared against risk-mapping KPIs. This gives us insight into how well or poorly cities are responding to the risks they face. We have also tried to assess, through KPIs, how much water can be recovered, and have shown that it could be enough to ensure that no drought-like conditions occur in a city or its surroundings.

We have also shown how to create a KPI that measures the sustained effort of corporations to reduce GHG emissions, and developed a KPI that measures a corporation's favourability towards GHG emissions in terms of its expenditure. With more data, we believe these KPIs can be developed further to explore deeper and broader aspects.

# References 

* Volker H. Hoffmann and Timo Busch, *Corporate Carbon Performance Indicators: Carbon Intensity, Dependency, Exposure, and Risk*