# *The road to Emissions Regression:* Identifying Challenges and Opportunities Towards Climate Mitigation


**Members**: 

ESQUILLO, Vince

QUIRAPAS, Andre 

SAGRIT, Rina

VILLALON, Marife

## Introduction

Climate change is one of the most pressing problems in the world. In 2019, CO2 emissions reached 36.7 billion tons, their highest level in 20 years (Ritchie & Roser, 2020). The
world is currently 1.0°C warmer than pre-industrial levels, and is on
track to become 1.5°C warmer between 2032 and 2050 (IPCC, 2018). It is
therefore imperative that countries around the world reduce their greenhouse gas (GHG)
emissions to mitigate the risks associated with climate change. However, mitigating
humanity’s impact on the climate is challenging because it is inherently tied to
economic, demographic, and technological issues. For instance, there are fears that
decarbonization may slow down the economic growth of nations. This is illustrated by Zhao
et al. (2016), who argued that carbon emissions from China and the United States were
largely attributable to the growing demand for manufacturing activities, especially in textile
products, electrical machinery, and transport equipment. Thus, to prescribe proper
solutions for climate change mitigation, policymakers need to understand how
economic, technological, and demographic factors may conflict with the global
goal of decarbonization.

Hence, this project will attempt to do the following: 

1. Analyze the relationship between CO2 emissions and different factors using
regression models in machine learning
2. Evaluate and compare different regression models using a common set of performance metrics
3. Identify which factors contribute to increasing CO2 emissions

It is worth noting that this project is a **Regression Task**. 

## Data and Features

The **dependent variable** is the volume of **Kyoto GHG emissions**. The **independent variables**
are **GDP per capita**, **population**, **per capita energy consumption** from **fossil fuels**,
**renewables**, and **nuclear energy**, and **cumulative production** from five selected
carbon-intensive sectors.

Note that **separate modelling is done for the G7 and Developing 7 countries**, so the project comprises two modelling efforts.
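
A minimal sketch of how one table could be split into the two modelling datasets is shown below. The column names (`group`, `ghg_emissions`, `gdp_per_capita`), the placeholder country `Country X`, and all values are hypothetical and stand in for the project's real data.

```python
import pandas as pd

# Toy rows with made-up values; every column name here is hypothetical.
data = pd.DataFrame({
    "country": ["Japan", "Japan", "Country X", "Country X"],
    "group": ["G7", "G7", "Dev7", "Dev7"],
    "year": [2000, 2001, 2000, 2001],
    "ghg_emissions": [1.0, 1.1, 0.4, 0.5],
    "gdp_per_capita": [10.0, 10.5, 2.0, 2.2],
})

target = "ghg_emissions"
features = ["gdp_per_capita"]

# Build one (X, y) pair per country group so each group is modelled separately.
datasets = {}
for group_name, frame in data.groupby("group"):
    datasets[group_name] = (frame[features], frame[target])

for group_name, (X, y) in datasets.items():
    print(group_name, X.shape, y.shape)
```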

## List of Requirements

The requirements for this project are as follows:

``` 
Python implementation: CPython
Python version       : 3.9.7
IPython version      : 7.28.0

pandas: 1.3.4
numpy : 1.20.3
matplotlib: 3.4.3
seaborn   : 0.11.2
sklearn   : 1.0
scipy     : 1.7.1
linearmodels: 4.24
shap        : 0.40.0 
```


For ease of use, the following text can be saved as a `.yml` file (for example, `environment.yml`) and used to create a conda environment with `conda env create -f environment.yml`:

```
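name: ghg_project
channels:
    # assumption: conda-forge is used so that linearmodels and shap resolve
    - conda-forge
dependencies:
    - python=3.9.7
    - ipython=7.28.0
    - pandas=1.3.4
    - numpy=1.20.3
    - matplotlib=3.4.3
    - seaborn=0.11.2
    # the conda package is named scikit-learn; sklearn is only the import name
    - scikit-learn=1.0
    - scipy=1.7.1
    - linearmodels=4.24
    - shap=0.40.0
```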

The project is submitted with its full directory structure so that the notebooks can be run as-is.

## Notebook File Guide

- `01_data_wrangling.ipynb` contains the data wrangling process of the project. 

    > This is where countries and features of interest were extracted from the datasets, and the data were all converted to *tidy format*.<br><br>
    
- `02_eda.ipynb` contains the exploratory data analysis of the project.

    > This notebook explores the differences between the G7 and Developing 7 countries. It contains both Quantitative-Quantitative and Categorical-Quantitative analyses.<br><br>

- `03_preprocessing.ipynb` contains data imputation and standard scaling.

    > This notebook contains the rationale for each imputation method, and investigations into the nature of the missing data. <br><br>
    

- `04a_g7_modelling.ipynb` and `04b_dev7_modelling.ipynb` are the modelling and insights notebooks.

    > These notebooks contain an econometric model, machine learning approaches with feature selection, grid search, and model interpretation with `SHAP`.
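
The preprocessing and tuning steps listed above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the notebooks' code: the data are synthetic, and median imputation, the `Ridge` model, and the `alpha` grid are placeholders for whatever methods the notebooks actually justify.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 made-up features, noisy linear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of entries

# Imputation and scaling live inside the pipeline, so they are re-fit on
# each training fold and never see the held-out fold.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # placeholder strategy
    ("scale", StandardScaler()),
    ("model", Ridge()),                            # placeholder model
])

# Hypothetical grid over the regularisation strength only.
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Keeping the imputer and scaler inside the pipeline means the grid search evaluates the whole preprocessing-plus-model chain, which avoids leaking information from validation folds into the fitted statistics.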


## Sources

IPCC. (2018). Summary for policymakers. In Global warming of 1.5°C (pp. 3-26). Geneva,
Switzerland: World Meteorological Organization.

Ritchie, H., & Roser, M. (2020). CO₂ and greenhouse gas emissions. Our World in Data.

Zhao, Y., Wang, S., Zhang, Z., Liu, Y., & Ahmad, A. (2016). Driving factors of carbon
emissions embodied in China–US trade: a structural decomposition analysis. Journal
of Cleaner Production, 131, 678-689.