# **A. ANALYSIS BACKGROUND**

As many people have experienced, the earth has felt much hotter recently. According to an article by Greenpeace Indonesia, the earth’s temperature has increased significantly. Cities in Southeast Asia, for example, reached 37 °C to 50 °C in mid-April 2023.


This situation makes many people wonder what could be causing it. This independent research therefore uses data analytics to describe a possible cause of the rise in the earth’s temperature. To do so, it analyzes greenhouse gas emissions data for Southeast Asian countries.

The aim of this research is to describe which types of emissions increased over 1990 - 2020, what their sources are, and which country produced the highest emissions, and to diagnose the cause of the emission increase.

# **B. PROBLEM STATEMENTS**

**Descriptive Analysis**

1. How did the greenhouse gas emission level change in Southeast Asia in 1990 - 2020?
2. Which substances were emitted between 1990 - 2020?
3. What are the emission sources?
4. Which country in Southeast Asia produced the most greenhouse gas?

---

**Diagnostic Analysis**

What caused the emission increase?
*   **Hypothesis**: population size affects the amount of greenhouse gas emissions


# **C. DATA PREPARATION**

In [1]:
import pandas as pd #to manage the dataset
import plotly.express as px #to help in data visualization
import numpy as np #to calculate correlation coefficient

**1. Reviewing Columns of the Dataset**

In [2]:
df=pd.read_csv('/content/drive/MyDrive/Colab Notebooks/World CO2 Emission/owid-co2-data.csv')
df.columns

Index(['iso_code', 'country', 'year', 'co2', 'consumption_co2',
       'co2_growth_prct', 'co2_growth_abs', 'trade_co2', 'co2_per_capita',
       'consumption_co2_per_capita', 'share_global_co2', 'cumulative_co2',
       'share_global_cumulative_co2', 'co2_per_gdp', 'consumption_co2_per_gdp',
       'co2_per_unit_energy', 'coal_co2', 'cement_co2', 'flaring_co2',
       'gas_co2', 'oil_co2', 'other_industry_co2', 'cement_co2_per_capita',
       'coal_co2_per_capita', 'flaring_co2_per_capita', 'gas_co2_per_capita',
       'oil_co2_per_capita', 'other_co2_per_capita', 'trade_co2_share',
       'share_global_cement_co2', 'share_global_coal_co2',
       'share_global_flaring_co2', 'share_global_gas_co2',
       'share_global_oil_co2', 'share_global_other_co2',
       'cumulative_cement_co2', 'cumulative_coal_co2',
       'cumulative_flaring_co2', 'cumulative_gas_co2', 'cumulative_oil_co2',
       'cumulative_other_co2', 'share_global_cumulative_cement_co2',
       'share_global_cumulative_c

**2. Selecting Range of Data and Handling the missing values**

Base: `'country', 'year'`

Greenhouse Gas (GHG) Elements: `'co2', 'total_ghg', 'methane', 'nitrous_oxide'`

CO2 Source: `'coal_co2', 'cement_co2', 'flaring_co2', 'gas_co2', 'oil_co2', 'other_industry_co2'`

Cause of the GHG increase: `'population', 'gdp'` (only `'population'` is selected in the code below)

In [3]:
#selecting columns
df1 = df.loc[:,['country', 'year', 'co2', 'total_ghg', 'methane','nitrous_oxide', 'coal_co2', 'cement_co2', 
                'flaring_co2', 'gas_co2', 'oil_co2', 'other_industry_co2', 'population']]

#selecting countries
df2 = df1[df1['country'].isin(['Indonesia','Malaysia','Brunei','Singapore','Vietnam','Cambodia','Philippines',
                               'Laos','Thailand','Myanmar'])]

#Reviewing missing values
df2.isnull().sum()

country                 0
year                    0
co2                     7
total_ghg             710
methane               710
nitrous_oxide         710
coal_co2              223
cement_co2            450
flaring_co2           778
gas_co2               527
oil_co2               157
other_industry_co2    980
population              0
dtype: int64

Due to the high number of missing values across different columns, the year range is restricted to 1990 and later in order to gain more representative insights.

In [4]:
#selecting the years from 1990 onward
df3 = df2[df2['year']>=1990]
df3.isnull().sum()

country                 0
year                    0
co2                     0
total_ghg              40
methane                40
nitrous_oxide          40
coal_co2               57
cement_co2             81
flaring_co2           190
gas_co2                71
oil_co2                 0
other_industry_co2    310
population              0
dtype: int64

The missing values still exist, but all of the columns will still be used to describe and visualize the data in general.
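The effect of the year cutoff on missing values can also be expressed as a share of rows rather than a raw count. The following is a minimal sketch, assuming a small made-up frame `toy` in place of `df2`; the helper name `missing_share` is hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df2; the values are made up for illustration.
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [1980, 1990, 2000, 1980, 1990, 2000],
    'co2': [np.nan, 1.0, 2.0, np.nan, 3.0, 4.0],
    'total_ghg': [np.nan, np.nan, 5.0, np.nan, 6.0, 7.0],
})

def missing_share(frame, start_year):
    """Fraction of missing values per data column for rows with year >= start_year."""
    subset = frame[frame['year'] >= start_year]
    return subset.drop(columns=['country', 'year']).isnull().mean()

share_all = missing_share(toy, 1980)    # before applying the cutoff
share_1990 = missing_share(toy, 1990)   # after applying the cutoff
print(share_all)
print(share_1990)
```

Comparing the two results column by column shows how much of the missing data the cutoff removes and which columns remain incomplete.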

# **D. DATA VISUALIZATION & INSIGHTS**

## **D.1. DESCRIPTIVE ANALYSIS**

**a. Annual Greenhouse Gas Emission in Southeast Asia**

In [5]:
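#grouping the dataset by year to simplify the visualization
#numeric_only=True excludes the text column 'country' from the sum (recent pandas versions no longer drop it automatically)
df4 = df3.groupby(['year']).sum(numeric_only=True).reset_index()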

#Visualization of greenhouse gas emission substances in 1990 - 2020
GHG_yearly = px.line(df4, x = 'year', y = ['total_ghg','co2','methane','nitrous_oxide'], title = 'GHG Emission Level Changes from 1990 - 2020 in ASEAN Region')
GHG_yearly.update_layout(height = 400, width = 700, yaxis_title = 'GHG Emission (million tonnes)')
GHG_yearly



1. Total greenhouse gas emissions (blue line) fluctuated, yet showed an increasing trend from 1990 - 2016.
2. CO2 emissions (orange line) increased continuously.
3. Methane and nitrous oxide emissions (green & purple lines) stayed roughly constant from 1990 - 2016.
4. Total greenhouse gas, methane, and nitrous oxide drop to 0 from 2017 onward. This is not a real decline: the values after 2016 are missing, and the sum treats missing values as 0.
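The drop to 0 after 2016 comes from how pandas sums groups that contain only missing values. The following is a minimal sketch, assuming a small made-up frame `toy` in place of `df3`; it uses the `min_count` parameter of `sum` so that a year with no data stays missing instead of becoming 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df3; the values are made up for illustration.
toy = pd.DataFrame({
    'country': ['A', 'B', 'A', 'B'],
    'year': [2016, 2016, 2017, 2017],
    'co2': [1.0, 2.0, 3.0, 4.0],
    'total_ghg': [5.0, 6.0, np.nan, np.nan],
})

# Default behaviour: a group whose values are all missing sums to 0.
default_sum = toy.groupby('year').sum(numeric_only=True)

# min_count=1 requires at least one non-missing value, otherwise the result is NaN.
safe_sum = toy.groupby('year').sum(numeric_only=True, min_count=1)

print(default_sum)
print(safe_sum)
```

Plotly leaves a gap in a line where the value is NaN, so the chart would end at the last year with data. Note that `min_count=1` still sums years where only some countries report, so partial years remain understated.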

As there is no missing value in CO2 data, let’s take a deeper look at CO2 emissions:

In [6]:
# Visualization of CO2 emission substances in 1990 - 2020
CO2_yearly = px.line(df4, x = 'year', y = 'co2', title = 'CO2 Level Changes from 1990 - 2020 in ASEAN Region')
CO2_yearly.update_layout(height = 400, width = 700, yaxis_title = 'CO2 (million tonnes)')
CO2_yearly

The data showed a steadily increasing trend from 1990 - 2020. The peak occurred in 2019, when CO2 emissions were approximately 1,700 million tonnes (1.7 billion tonnes).
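The peak year can also be read from the data directly instead of from the chart. The following is a minimal sketch, assuming a small made-up frame `yearly` with the same `year` and `co2` columns as `df4`; the numbers are illustrative and are not the dataset's values.

```python
import pandas as pd

# Hypothetical yearly totals standing in for df4; the values are made up for illustration.
yearly = pd.DataFrame({
    'year': [2018, 2019, 2020],
    'co2': [10.0, 12.0, 11.0],
})

# idxmax returns the index label of the largest value; loc fetches that row.
peak_row = yearly.loc[yearly['co2'].idxmax()]
peak_year = int(peak_row['year'])
peak_value = float(peak_row['co2'])

print('peak year =', peak_year, '| CO2 (million tonnes) =', peak_value)
```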

**b. Greenhouse gas emission by countries**

In [7]:
GHG_total = px.pie(df3, names='country', values ='total_ghg', title = 'GhG Emission of ASEAN Countries')
GHG_total.update_layout(height = 400, width = 400)
GHG_total

Indonesia’s greenhouse gas emissions are significantly higher than those of the other ASEAN countries. Singapore, Laos, and Brunei, on the other hand, produced the least greenhouse gas.
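The shares shown in the pie chart can be computed explicitly, which makes the ranking easier to quote. The following is a minimal sketch, assuming a small made-up frame `toy` in place of `df3`; missing values are skipped by `sum`, so a country with fewer reported years contributes less.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df3; the values are made up for illustration.
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [2015, 2016, 2015, 2016],
    'total_ghg': [30.0, 30.0, 20.0, np.nan],
})

# Total emissions per country over all years, then each country's share in percent.
totals = toy.groupby('country')['total_ghg'].sum()
shares = (totals / totals.sum() * 100).sort_values(ascending=False)

print(shares.round(1))
```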

**c. Greenhouse Gas Substances**

In [8]:
#grouping the dataset by country
groupby_country = df3.groupby(['country']).sum().reset_index()

#Visualization of Greenhouse Gas Substances by Countries
GHG_cmp = px.bar(groupby_country, x='country',y=['co2','methane','nitrous_oxide'], 
             title = 'GhG Emission Substances of ASEAN Countries', barmode = 'group')
GHG_cmp.update_layout(height = 400, width = 700, yaxis_title = 'Emission (million tonnes)')
GHG_cmp

Looking more closely at the individual greenhouse gases, CO2 is the gas emitted in the largest amount by most of the ASEAN countries, with Indonesia as the largest contributor. Nitrous oxide, on the other hand, is emitted in the smallest amount.

**d. Greenhouse Gas Sources**

Unfortunately, the dataset doesn't provide information about the sources of all greenhouse gases. To get representative insights about the sources, we will use the data on CO2 sources.

In [9]:
GHG_src = px.bar(groupby_country, x='country',y=['coal_co2','oil_co2','cement_co2','gas_co2','flaring_co2','other_industry_co2'], 
             title = 'CO2 Emission Sources of ASEAN Countries', barmode = 'group')
GHG_src.update_layout(height = 400, width = 700, yaxis_title = 'Emission (million tonnes)')
GHG_src

For most of the countries, oil and gas were the largest sources of CO2. Flaring and other industries were the smallest sources.
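The largest and smallest source per country can be listed directly rather than read off the grouped bars. The following is a minimal sketch, assuming a small made-up frame `toy` shaped like `groupby_country` with only three of the source columns; the values are illustrative.

```python
import pandas as pd

# Hypothetical per-country totals standing in for groupby_country; values are made up.
toy = pd.DataFrame({
    'country': ['A', 'B'],
    'coal_co2': [50.0, 5.0],
    'oil_co2': [30.0, 40.0],
    'gas_co2': [10.0, 20.0],
})

sources = ['coal_co2', 'oil_co2', 'gas_co2']
by_country = toy.set_index('country')[sources]

# idxmax / idxmin along axis=1 return the column name of the largest / smallest source per row.
largest = by_country.idxmax(axis=1)
smallest = by_country.idxmin(axis=1)

print(pd.DataFrame({'largest_source': largest, 'smallest_source': smallest}))
```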

## **D.2. DIAGNOSTIC ANALYSIS**

In this section, we will analyse the cause of the increase in greenhouse gas emissions. First, we will take a look at the population data of Southeast Asian countries.

In [10]:
asean_ppl = px.bar(df3, x = 'country', y = 'population', title = 'Population of ASEAN Countries', color = 'year')
asean_ppl.update_layout(height = 400, width = 800, yaxis_title = 'Population')
asean_ppl.show()

Indonesia not only produced the highest greenhouse gas emissions, as the previous charts show, but also has the largest population of all the ASEAN countries, as seen in this chart. This indicates that there might be **a correlation** between **population size** and **greenhouse gas emission level**.


Creating scatter plots and calculating the correlation coefficients between population and each greenhouse gas:

In [11]:
#removing null values in 'total_ghg' column
data_ppl_ghg = df3[pd.notnull(df3['total_ghg'])]

#Scatter Plot of Population vs GHG Emissions
ppl_ghg = px.scatter(data_ppl_ghg, x = 'population', y = 'total_ghg', title = 'Population vs GHG Emissions', color = 'country')
ppl_ghg.update_layout(height = 400, width = 700, yaxis_title = 'GHG Emission (million tonnes)')
ppl_ghg.show()

#Calculating the Correlation Coefficient
corr = np.corrcoef(data_ppl_ghg['population'], data_ppl_ghg['total_ghg'])
print('correlation coefficient =',corr[0,1])

correlation coefficient = 0.9026476195558496


In [15]:
#removing null values in 'nitrous_oxide' column
data_ppl_n2o = df3[pd.notnull(df3['nitrous_oxide'])]

#Scatter Plot of Population vs N2O Emissions
ppl_n2o = px.scatter(data_ppl_n2o, x = 'population', y = 'nitrous_oxide', title = 'Population vs N2O Emissions', color = 'country')
ppl_n2o.update_layout(height = 400, width = 700, yaxis_title = 'N2O Emission (million tonnes)')
ppl_n2o.show()

#Calculating the Correlation Coefficient
corr = np.corrcoef(data_ppl_n2o['population'], data_ppl_n2o['nitrous_oxide'])
print('correlation coefficient =',corr[0,1])

correlation coefficient = 0.943667099174946


In [18]:
#removing null values in 'methane' column
data_ppl_mth = df3[pd.notnull(df3['methane'])]

#Scatter Plot of Population vs Methane Emissions
ppl_mth = px.scatter(data_ppl_mth, x = 'population', y = 'methane', title = 'Population vs Methane Emissions', color = 'country')
ppl_mth.update_layout(height = 400, width = 700, yaxis_title = 'Methane Emission (million tonnes)')
ppl_mth.show()

#Calculating the Correlation Coefficient
corr = np.corrcoef(data_ppl_mth['population'], data_ppl_mth['methane'])
print('correlation coefficient =',corr[0,1])

correlation coefficient = 0.9262826130794158


In [19]:
#Scatter Plot of Population vs CO2 Emissions
ppl_co2 = px.scatter(df3, x = 'population', y = 'co2', title = 'Population vs CO2 Emissions', color = 'country')
ppl_co2.update_layout(height = 400, width = 700, yaxis_title = 'CO2 Emission (million tonnes)')
ppl_co2.show()

#Calculating the Correlation Coefficient
corr = np.corrcoef(df3['population'], df3['co2'])
print('correlation coefficient =',corr[0,1])

correlation coefficient = 0.8078021396132222
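The four cells above repeat the same steps: drop the rows where the gas is missing, then compute the correlation with population. The following is a minimal sketch of a helper that does both, assuming a small made-up frame `toy` in place of `df3`; the function name `population_correlation` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df3; the values are made up for illustration.
toy = pd.DataFrame({
    'population': [1.0, 2.0, 3.0, 4.0],
    'co2': [2.0, 4.0, 6.0, 8.0],
    'methane': [1.0, np.nan, 3.0, 5.0],
})

def population_correlation(frame, column):
    """Pearson correlation between population and one gas, ignoring rows with missing values."""
    valid = frame.dropna(subset=['population', column])
    return np.corrcoef(valid['population'], valid[column])[0, 1]

# One loop replaces the repeated cells.
results = {gas: population_correlation(toy, gas) for gas in ['co2', 'methane']}
for gas, value in results.items():
    print(gas, 'correlation coefficient =', value)
```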


All of the coefficients are close to 1.00 (between 0.81 and 0.94), which means that there is a strong positive correlation between population size and each greenhouse gas. This supports the hypothesis that population size affects the emission level.


In addition, as the scatter plots show, most of the data points spread from the bottom left to the upper right, indicating a strong positive correlation between population and greenhouse gas emissions.
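The coefficients above pool all countries and years into one sample, so they mix differences between countries with changes within a country over time. As a complementary check, the correlation can also be computed per country. The following is a minimal sketch on made-up data, constructed so that the pooled and per-country results differ; it says nothing about what the real dataset would show.

```python
import pandas as pd

# Hypothetical toy data: two countries of very different size; values are made up.
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'population': [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    'co2': [3.0, 2.0, 1.0, 30.0, 29.0, 28.0],
})

# Pooled Pearson correlation across all rows (what the cells above compute).
pooled = toy['population'].corr(toy['co2'])

# Pearson correlation computed separately inside each country.
within = {
    name: group['population'].corr(group['co2'])
    for name, group in toy.groupby('country')
}

print('pooled =', pooled)
print('within =', within)
```

If the per-country coefficients on the real data are also strongly positive, that strengthens the reading that emissions rise as each country's population grows.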

# **E. CONCLUSION**

1. In Southeast Asia, total greenhouse gas emissions (data available through 2016) and CO2 emissions showed an increasing trend over 1990 - 2020.
2. CO2 was the greenhouse gas emitted in the largest amount by most of the Southeast Asian countries. Nitrous oxide, on the other hand, was emitted in the smallest amount.
3. For most of the countries, oil and gas were the largest sources of CO2. Flaring and other industries were the smallest sources.
4. Indonesia is the country with the highest greenhouse gas emissions.
5. The dataset shows a high correlation coefficient between population and greenhouse gas emissions, which indicates that population size strongly affects the greenhouse gas emission level.
