# Table of Contents
* [Dataset overview](#data_overview)
* [Focuses of this notebook](#fo)
* [Import libraries](#import)
* [Data](#data)
    * [Cities disclosing - df_cd](#ci_di)
    * [Cities responses - df_cr](#ci_re)
    * [Corporations disclosing to climate change - df1](#co_di_cc)
    * [Corporations responses to climate change - df2](#co_re_cc)
* [Exploratory Data Analysis](#eda)
    * [Viet Nam](#vi_vi)
    * [Cities responses on climate change, what's in there?](#ci_re_cc)
    * [Viet Nam cities responses on climate change](#vi_ci_cc)
        * [Ho Chi Minh City answers on energy sector](#sg_en)
    * [Corporations responses to climate change](#co_cc)
        * [Companies with climate change activities in Viet Nam](#co_cc_vn)
            * [Energy questions](#109vn)

<a id="data_overview"></a>
<center><h1>Dataset overview</h1></center>

### Cities
- Cities disclosing: general information about the cities for the three years 2018, 2019 and 2020
- Cities responses: questions answered by the cities in 2018, 2019 and 2020

### Corporations
- Corporations disclosing to climate change or water security: general information about the corporations for 2018, 2019 and 2020
- Corporations responses to climate change: questions on climate change issues answered by corporations in 2018, 2019 and 2020
- Corporations responses to water security: questions on water security issues answered by corporations in 2018, 2019 and 2020


<a id="fo"></a>
<center><h1>Focuses of this notebook</h1></center>

### Responses of cities in Viet Nam
* Ho Chi Minh City on energy sector
 
### Companies with climate change activities in Viet Nam
 


<a id="import"></a>
<center><h1>Import libraries</h1></center>

In [None]:
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# to use plotly offline
from plotly.offline import iplot
import plotly.express as px


import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')

# to ignore warnings
import warnings
warnings.filterwarnings('ignore')

# for Markdown printing
from IPython.display import Markdown, display
def printmd(string, color=None):
    colorstr = "<span style='color:{}'>{}</span>".format(color, string)
    display(Markdown(colorstr))

<a id="data"></a>
<center><h1>Data</h1></center>

In [None]:
data_path = '/kaggle/input/cdp-unlocking-climate-solutions'

<a id="ci_di"></a>
## Cities disclosing - df_cd

In [None]:
Cities_Disclosing_2018 = pd.read_csv(data_path+'/Cities/Cities Disclosing/2018_Cities_Disclosing_to_CDP.csv')
Cities_Disclosing_2019 = pd.read_csv(data_path+'/Cities/Cities Disclosing/2019_Cities_Disclosing_to_CDP.csv')
Cities_Disclosing_2020 = pd.read_csv(data_path+'/Cities/Cities Disclosing/2020_Cities_Disclosing_to_CDP.csv')

df_cd = pd.concat([Cities_Disclosing_2018,Cities_Disclosing_2019,Cities_Disclosing_2020], axis=0, ignore_index=True)

<a id="ci_re"></a>
## Cities responses - df_cr

In [None]:
Cities_Responses_2018 = pd.read_csv(data_path+'/Cities/Cities Responses/2018_Full_Cities_Dataset.csv')
Cities_Responses_2019 = pd.read_csv(data_path+'/Cities/Cities Responses/2019_Full_Cities_Dataset.csv')
Cities_Responses_2020 = pd.read_csv(data_path+'/Cities/Cities Responses/2020_Full_Cities_Dataset.csv')

df_cr = pd.concat([Cities_Responses_2018,Cities_Responses_2019,Cities_Responses_2020], axis=0, ignore_index=True)

<a id="co_di_cc"></a>
## Corporations disclosing to climate change - df1

In [None]:
Corporates_Disclosing_to_CDP_Climate_Change_2018 = pd.read_csv(data_path+'/Corporations/Corporations Disclosing/Climate Change/2018_Corporates_Disclosing_to_CDP_Climate_Change.csv')
Corporates_Disclosing_to_CDP_Climate_Change_2019 = pd.read_csv(data_path+'/Corporations/Corporations Disclosing/Climate Change/2019_Corporates_Disclosing_to_CDP_Climate_Change.csv')
Corporates_Disclosing_to_CDP_Climate_Change_2020 = pd.read_csv(data_path+'/Corporations/Corporations Disclosing/Climate Change/2020_Corporates_Disclosing_to_CDP_Climate_Change.csv')

df1 = pd.concat([Corporates_Disclosing_to_CDP_Climate_Change_2018,Corporates_Disclosing_to_CDP_Climate_Change_2019,Corporates_Disclosing_to_CDP_Climate_Change_2020], axis=0, ignore_index=True)

<a id="co_re_cc"></a>
## Corporations responses to climate change - df2

In [None]:
Corporations_Responses_to_Climate_Change_2018 = pd.read_csv(data_path+'/Corporations/Corporations Responses/Climate Change/2018_Full_Climate_Change_Dataset.csv')
Corporations_Responses_to_Climate_Change_2019 = pd.read_csv(data_path+'/Corporations/Corporations Responses/Climate Change/2019_Full_Climate_Change_Dataset.csv')
Corporations_Responses_to_Climate_Change_2020 = pd.read_csv(data_path+'/Corporations/Corporations Responses/Climate Change/2020_Full_Climate_Change_Dataset.csv')

df2 = pd.concat([Corporations_Responses_to_Climate_Change_2018,Corporations_Responses_to_Climate_Change_2019,Corporations_Responses_to_Climate_Change_2020], axis=0, ignore_index=True)
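
The four loading cells above share one pattern: read one CSV per year, then stack the frames. A minimal sketch of a helper for that pattern follows. `load_years` and its `path_template` parameter are hypothetical names introduced here, and the template is assumed to contain a `{year}` placeholder.

```python
import pandas as pd

def load_years(path_template, years=(2018, 2019, 2020)):
    """Read one CSV per year and stack them into a single DataFrame.

    `path_template` is a hypothetical format string with a `{year}` placeholder.
    """
    frames = [pd.read_csv(path_template.format(year=year)) for year in years]
    # axis is passed by keyword; recent pandas versions reject a positional axis
    return pd.concat(frames, axis=0, ignore_index=True)
```

With the dataset paths used above, a call could look like `load_years(data_path + '/Cities/Cities Disclosing/{year}_Cities_Disclosing_to_CDP.csv')`.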

<a id="eda"></a>
<center><h1> Exploratory Data Analysis</h1></center>

<a id="vi_vi"></a>
## Viet Nam, Vietnam?
There are inconsistencies in how my country's name is written. In Vietnamese, it is two separate words, so it should be Viet Nam

In [None]:
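def clean_text_round1(text):
    '''Map the spellings Vietnam, VietNam and Viet nam to Viet Nam.

    DataFrame.apply passes each column in as a pandas Series, so each
    replace below swaps cells whose whole value equals the old spelling.
    '''
    text = text.replace("Vietnam",  "Viet Nam")
    text = text.replace("VietNam",  "Viet Nam")
    text = text.replace("Viet nam", "Viet Nam")

    return text

round1 = clean_text_round1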

In [None]:
# DataFrame.apply already returns a DataFrame, so no extra wrapping is needed
df_cd = df_cd.apply(round1)
df_cr = df_cr.apply(round1)
df1 = df1.apply(round1)
df2 = df2.apply(round1)
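
The cleaning above only rewrites cells whose entire value is one of the three spellings. A minimal sketch of a substring-level alternative follows, assuming a toy frame in place of the CDP data. `normalize_viet_nam`, `VIET_NAM_PATTERN` and the toy values are illustrative and not part of the notebook.

```python
import re
import pandas as pd

# Matches Vietnam, VietNam, Viet nam and Viet Nam
VIET_NAM_PATTERN = re.compile(r'Viet\s?[Nn]am')

def normalize_viet_nam(frame):
    """Return a copy where every spelling variant inside a string cell reads 'Viet Nam'."""
    def fix(value):
        # Leave numbers, None and NaN untouched
        return VIET_NAM_PATTERN.sub('Viet Nam', value) if isinstance(value, str) else value
    return frame.apply(lambda column: column.map(fix))

# Toy data standing in for the CDP frames
toy = pd.DataFrame({
    'Country': ['Vietnam', 'Viet nam', 'France'],
    'response_value': ['Thailand; VietNam', 'Operations in Vietnam', None],
})
cleaned = normalize_viet_nam(toy)
```

Mapping cell by cell is slower than a whole-column replace, but it leaves non-string cells as they are.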

## Viet Nam cities disclosing

In [None]:
df_cd.loc[df_cd.Country == 'Viet Nam']

<a id="ci_re_cc"></a>
## Cities responses on climate change, what's in there?
The cities responses contain a lot of information about different fields (energy, GHG emissions data, water supply, ...). The field of each answer is stored in the `Section` column

In [None]:
printmd('All sections in cities responses on climate change', color="#255483")
print(df_cr['Section'].unique())
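
To see how much each section holds, the distinct questions can be counted per section. This is a small sketch on toy data: the `Section` and `Question Name` column names come from the notebook, while the rows are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for df_cr
toy_cr = pd.DataFrame({
    'Section': ['Energy', 'Energy', 'Energy', 'Water Supply'],
    'Question Name': ['Q-a', 'Q-a', 'Q-b', 'Q-c'],
})

# Number of distinct questions in each section, largest first
questions_per_section = (
    toy_cr.groupby('Section')['Question Name']
          .nunique()
          .sort_values(ascending=False)
)
```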

<a id="vi_ci_cc"></a>
## Viet Nam cities responses on climate change

In [None]:
tmp = df_cr.loc[df_cr.Country =='Viet Nam'].Organization.value_counts().to_frame('')
fig = px.bar(tmp,color_discrete_sequence=["#67a9cf"])

fig.update_layout(
    title = dict(text='Number of responses from Viet Nam cities'),
    font  = dict(family="Calibri",size=13,color="RebeccaPurple"),
    xaxis = dict(title='',tickmode='auto'),
    yaxis = dict(title='Count',tickmode='auto'),
#     legend = dict(yanchor="middle",y=0,xanchor="center",x=1),
    width=900, height=500)

<a id="sg_en"></a>
### Ho Chi Minh City answers on energy questions
- All answers are from 2018
- Energy mix: only some energy types have a percentage value:

| Energy type| Percentage| 
|---|:---:| 
| Biomass  |8.1|
| Coal  |27.5|
| Gas  |1.9|
| Oil  |47|

- Question 3 refers to a 2017 decision by the Prime Minister: grid-connected solar electricity is purchased at VND 2,086 per kWh (excluding VAT, equivalent to 9.35 cents / kWh) <br>
***Please note that:***
    - This price has changed several times since then
    - Electricity of Vietnam (EVN) is a state-owned company that covers almost every stage from generation to retail.
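
As a quick check on the energy mix table, the reported shares can be added up. The sketch below assumes the four percentages share a common 100% base, which the responses do not state explicitly.

```python
import pandas as pd

# Percentages copied from the energy mix table
energy_mix = pd.Series(
    {'Biomass': 8.1, 'Coal': 27.5, 'Gas': 1.9, 'Oil': 47.0},
    name='Percentage',
)

reported = round(energy_mix.sum(), 1)      # share covered by the four types
not_reported = round(100 - reported, 1)    # remainder, under the 100% assumption
```

Under that assumption, the four types cover 84.5% of the mix and 15.5% has no reported value.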

In [None]:
hcm_energy = df_cr.loc[(df_cr.Organization =='Ho Chi Minh City') & (df_cr['Section'] =='Energy')]
answers_columns = ['Year Reported to CDP','Column Name','Response Answer',]
for i,q in enumerate(hcm_energy['Question Name'].unique()):
    statement = str(i+1)+ '. ' + q
    printmd(statement, color="#255483")
    display(hcm_energy[answers_columns].loc[hcm_energy['Question Name'] ==q].reset_index(drop=True))

<a id="co_cc"></a>
## Corporations responses to climate change
Like the cities responses, companies answered a lot of questions. One of them is `Question number C0.3: Select the countries/regions for which you will be supplying data.` <br>This is where we can check whether a company has climate action in a given country <br>
=> ***df2_countries***: dataframe that contains all the answers to question C0.3


In [None]:
df2_countries = df2[['account_number','organization','response_value']].loc[df2.question_number =='C0.3'].reset_index(drop=True)

print(df2_countries.shape)
df2_countries.head()

Another column is `module_name`, which indicates the field a question belongs to (energy, emissions, ...)<br>
A lot of questions belong to the module `C8. Energy`

In [None]:
df2.module_name.value_counts()

<a id="co_cc_vn"></a>
## Companies with climate change activities in Viet Nam

109 companies declare that they have climate change activity in Viet Nam. We will look at the information they provided, especially on the energy sector

`vn_corps`: dataframe of all companies that have climate change activity in Viet Nam

In [None]:
# Rows whose C0.3 answer mentions Viet Nam (missing answers become the string 'nan' and never match)
in_viet_nam = df2_countries['response_value'].astype(str).str.contains('Viet Nam', regex=False)

vn_corps = df2_countries.loc[in_viet_nam, ['account_number','organization']].drop_duplicates()
vn_corps.reset_index(drop=True,inplace=True)

statement = "There are " + str(vn_corps.shape[0]) + ' companies that might have climate change action in Viet Nam'
printmd(statement, color="#255483")
vn_corps.head()

### Responses from those 109 companies, particularly on energy questions

`df2_vn`: all answers from the 109 companies that have activity in Viet Nam

`df2_vn_en`: energy answers from those 109 companies

In [None]:
# Keep every df2 row whose account_number belongs to one of the Viet Nam companies
df2_vn = df2.loc[df2['account_number'].isin(vn_corps['account_number'])]
df2_vn.head()

In [None]:
# Work on an explicit copy because some answers are filled in place later
df2_vn_en = df2_vn.loc[df2_vn['module_name'] =='C8. Energy'].copy()
df2_vn_en.head()

<a id="109vn"></a>
#### Energy questions
There are 8 questions in the energy module

| question_number| question_unique_reference| 
|---|:---:| 
| C8.1  |What percentage of your total operational spend in the reporting year was on energy? |  
| C8.2  |Select which energy-related activities your organization has undertaken. |  
| C8.2a |Report your organization energy consumption totals (excluding feedstocks) in MWh. | 
| C8.2b |Select the applications of your organization consumption of fuel. | 
| C8.2c |State how much fuel in MWh your organization has consumed (excluding feedstocks) by fuel type.| 
| C8.2d |List the average emission factors of the fuels reported in C8.2c.|
| C8.2e |Provide details on the electricity, heat, steam, and cooling your organization has generated and consumed in the reporting year.|
| C8.2f |Provide details on the electricity, heat, steam and/or cooling amounts that were accounted for at a low-carbon emission factor in the market-based Scope 2 figure reported in C6.3|

**`C8.1`**

Percentage of total operational spend in the reporting year that was on energy

In [None]:
df2_vn_en.loc[(df2_vn_en['question_number']=='C8.1')]['response_value'].value_counts()

Fill empty answers with `Don't know`

In [None]:
is_c81 = df2_vn_en['question_number']=='C8.1'
# Fill only the missing answers of the C8.1 rows and leave the other columns untouched
df2_vn_en.loc[is_c81, 'response_value'] = df2_vn_en.loc[is_c81, 'response_value'].fillna("Don't know")
df2_vn_en.loc[is_c81, 'response_value'].value_counts()
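
If the goal is only to see how many answers are missing, the data does not have to be modified. A minimal sketch on toy rows follows: the column names come from the notebook, while the values are placeholders.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for df2_vn_en
toy_en = pd.DataFrame({
    'question_number': ['C8.1', 'C8.1', 'C8.1', 'C8.2'],
    'response_value': ['answer A', np.nan, np.nan, 'Yes'],
})

is_c81 = toy_en['question_number'] == 'C8.1'
# dropna=False keeps missing answers as their own NaN bucket in the counts
counts_with_missing = toy_en.loc[is_c81, 'response_value'].value_counts(dropna=False)
```

The NaN bucket then plays the role of the `Don't know` label without overwriting any cell.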

2020 answers to question `C8.1`

In [None]:
tmp = df2_vn_en.loc[(df2_vn_en['question_number']=='C8.1') & (df2_vn_en['survey_year'] == 2020)].response_value.value_counts().to_frame('')
fig = px.bar(tmp,color_discrete_sequence=["#67a9cf"])

fig.update_layout(
    title = dict(text='Percentage of total operational spend in the reporting year that was on energy, 2020'),
    font  = dict(family="Calibri",size=13,color="RebeccaPurple"),
    xaxis = dict(title='',tickmode='auto'),
    yaxis = dict(title='Number of companies',tickmode='auto'),
    
    width=900, height=600)

**`C8.2`**

Select which energy-related activities your organization has undertaken. This is a Yes/No question, answered for each of the rows below

| row_number| row_name| 
|---|:---:| 
| 1  |Consumption of fuel (excluding feedstocks)|
| 2  |Consumption of purchased or acquired electricity|
| 3  |Consumption of purchased or acquired heat|
| 4  |Consumption of purchased or acquired steam|
| 5  |Consumption of purchased or acquired cooling|
| 6  |Generation of electricity, heat, steam, or cooling|

In [None]:
tmp = df2_vn_en.loc[df2_vn_en['question_number']=='C8.2']
for i in tmp['row_name'].unique():
    tmp.loc[tmp['row_name'] ==i].response_value.value_counts().iplot(kind='bar',yTitle='Number of companies', 
                                                                     linecolor='black',color='#67a9cf',bargap=0.8,
                                                                     title=i)
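
The six charts can also be condensed into one table with a cross-tabulation of activity against answer. This is a sketch on toy data: `row_name` and `response_value` are the notebook's columns, while the answers below are invented for illustration.

```python
import pandas as pd

# Toy rows standing in for the C8.2 subset of df2_vn_en
toy_c82 = pd.DataFrame({
    'row_name': ['Consumption of fuel (excluding feedstocks)'] * 3
              + ['Consumption of purchased or acquired steam'] * 3,
    'response_value': ['Yes', 'Yes', 'No', 'No', 'No', 'Yes'],
})

# One row per activity, one column per answer, cells hold the number of companies
summary = pd.crosstab(toy_c82['row_name'], toy_c82['response_value'])
```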