# Mini-Project Report - Analyzing and Recognizing Leading Countries in CO2 Emission Reduction

#### Project Group: Helmi Karesti, Janne Penttala, Salla-Mari Uitto
#### GitHub Repository: [https://github.com/karhelmi/intro-ds-miniproject](https://github.com/karhelmi/intro-ds-miniproject)

## Deciding on a Project Idea

Our project team was formed on Slack before the first exercise session, and we started to narrow down ideas during that session. We toyed with topics such as stock markets, pets in cities and many others. In the end, all of us were interested in sustainability, so climate change was chosen as an inspiration. From there, we narrowed down the topic by first deciding to analyze CO2 emissions. Then we brainstormed on how to make it truly interesting and came up with the idea of finding out which countries had reduced their emissions the most. Finally, we decided on our topic: Analyzing and Recognizing Leading Countries in CO2 Emission Reduction.

## Motivation and Added Value

Our approach in analyzing countries that have reduced their CO2 emissions effectively highlights positive achievements and encourages the sharing of best practices in the fight against climate change. The thought behind all of this is that the world needs more positive reinforcement and hope.

The target groups of our project are political decision makers, environmental organizations and researchers. These are the groups that aim to find ways to combat climate change: our project aids them in their pursuit by giving examples of countries that have succeeded in reducing CO2 emissions. Our project also introduces measures that have led to the reduction of CO2 emissions. This will aid our target group in decision making.

## Data Collection

Data about CO2 emissions is widely available on the Internet. We compared the data on different sites to each other to make sure that the numbers didn't differ from each other. In the end we decided to use [Global Carbon Atlas](https://globalcarbonatlas.org/) for fetching CO2 emission data for every country. The fetched data is stored in a .csv file. This data is used to find the countries that have reduced their CO2 emissions the most. 

Another aspect in our project is to find what has affected the CO2 emission reduction in the selected countries. For this, we chose to use [Our World in Data](https://ourworldindata.org/). This choice was fairly easy, because we need a lot of different factors to analyze in our project and this single source can provide this for us. The choice on what to analyze was made by thorough evaluation as a group. For example, renewable energy was chosen as one factor because it is generally known to reduce CO2 emissions. This data is also fetched as a .csv file.

As our project progressed, we encountered some problems with the data we had initially planned to use. At the beginning, we intended to analyze the change between the years 2011 and 2021. However, once we had written our code, we quickly realized that not all countries have data available up to the year 2021 for our chosen factors. As a team, we discussed different possibilities for handling this: one simple option was to change the time frame to something else, like 2008-2018, and another was to choose only those factors and countries for which we have the needed data. Some thought was also put into filling the missing values by using linear regression or other means of predicting the values, but due to our tight schedule this was dismissed rather quickly: we wanted to keep our focus on analyzing the factors and not get too sidetracked. In the end, we decided to use variables and countries that have complete data between the years 2011 and 2021. With these limitations in mind, we decided to include the following 10 variables: meat production, life expectancy, GDP per capita, percentage of internet users (in the population of the country), human development index (HDI), human rights index, population, nuclear energy, energy usage per capita and renewable energy.
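
The completeness check described above could look roughly like the following. This is only a minimal sketch, not the code from our repository: the long-format layout, the column names (`country`, `year`, `value`), the function name and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

# Made-up long-format data for one factor: one row per (country, year).
# "Country A" covers 2011-2021, "Country B" stops at 2019.
data = pd.DataFrame({
    "country": ["Country A"] * 11 + ["Country B"] * 9,
    "year": list(range(2011, 2022)) + list(range(2011, 2020)),
    "value": [1.0] * 20,
})


def countries_with_full_coverage(df, start=2011, end=2021):
    """Return the countries that have a non-missing value for every year in [start, end]."""
    n_required = end - start + 1
    # Keep only rows inside the time frame that actually carry a value.
    in_range = df[df["year"].between(start, end) & df["value"].notna()]
    # Count the distinct years available for each country.
    years_per_country = in_range.groupby("country")["year"].nunique()
    return sorted(years_per_country[years_per_country == n_required].index)


complete = countries_with_full_coverage(data)
print(complete)
```

Running the same check for every factor and intersecting the results would give the set of countries that can be analyzed with all variables.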


## Preprocessing

Our project has two sub-tasks: the first is to find the countries that have reduced their CO2 emissions, and the second is to analyze the factors that have helped these countries to reduce them. Even without looking at the data, we knew that we would have to merge data from several data sources or files to be able to compare and analyze the CO2 emission reduction and the factors for each country.

Before starting to write the code, we looked at the csv data in Excel: this way we got a rough idea about what we were up against. For example, we noticed that in addition to each country, our CO2 emission data file contained CO2 emissions also for each continent.

### Code

From the beginning it was clear that we would work with Python. Another option would have been R, but overall Python was the more attractive choice. Our team did not have much experience with it, but we all wanted to learn more. With Python, it was obvious that we would be using pandas and NumPy to help us process the data and Matplotlib for the visualization.

At first, we started with implementing code that gives us a list of countries and their CO2 emission reduction percentage. We read the csv file to a pandas dataframe and continued to process it in that format. We had to rename and re-arrange columns and rows, change data types and finally calculate the reduction in CO2 emissions. We calculated the reduction in three different units: in MtCO2, in kgCO2/GDP and in tCO2/person.
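
As an illustration of this step, the sketch below computes the percentage change between 2011 and 2021 and ranks the countries by it. It is a simplified example under assumptions: the wide table layout, the placeholder names and the numbers are invented for the example and are not taken from the Global Carbon Atlas file.

```python
import pandas as pd

# Made-up emissions table in MtCO2: one row per country or region, one column per year.
emissions = pd.DataFrame(
    {"2011": [60.0, 20.0, 100.0, 5000.0], "2021": [45.0, 22.0, 90.0, 4500.0]},
    index=["Country A", "Country B", "Country C", "Continent X"],
)

# Aggregate rows such as continents are dropped so that only countries are ranked.
aggregates = ["Continent X"]
countries = emissions.drop(index=aggregates)

# Percentage change from 2011 to 2021; negative values mean a reduction.
change_pct = (countries["2021"] - countries["2011"]) / countries["2011"] * 100

# Ascending sort puts the largest reductions first.
ranking = change_pct.sort_values()
print(ranking.round(1))
```

The same calculation applies unchanged to the other two units (kgCO2/GDP and tCO2/person), since only the input table differs.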

Next, we had to decide how we wanted to analyze the factors that have led to a country reducing its CO2 emissions. This turned out to be the trickiest part of our whole project, and we spent a lot of time just discussing different possibilities. Finally, we settled on creating a pandas dataframe for each country and adding the CO2 emissions and all the factors to that country-specific dataframe as separate columns.

At first, we implemented code that did all of this for one country, Finland. Once that case worked, we started to refactor the code and make functions for separate tasks to make it easier to reuse the code. First, the data frame for a country is created with its CO2 emissions. Then, the frame is filled with all of our chosen variables. Again, we had to wrangle the data in different ways: we set indexes, changed data types and took data from another frame to our country-specific frame. After doing these steps, we have a dataframe for a country with all of the variables we will be analyzing and the CO2 emissions for each year. Based on these dataframes, we create linear regression models, calculate selected statistical measures and create various plots visualizing the data and results. Code is available in the GitHub repository (link available at the beginning of this report).
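
The idea of a country-specific dataframe can be sketched as follows. The function name `build_country_frame`, the table layout (countries as rows, integer years as columns) and the sample values are hypothetical and chosen for illustration; the actual functions in the repository may be organized differently.

```python
import pandas as pd

YEARS = list(range(2011, 2022))


def build_country_frame(country, emissions, factors):
    """Combine the CO2 emissions and factor values of one country into a year-indexed frame.

    `emissions` and every table in `factors` are assumed to have countries as
    rows and integer years as columns.
    """
    frame = pd.DataFrame(index=pd.Index(YEARS, name="year"))
    # Select the country's row and align it to the chosen years.
    frame["co2"] = emissions.loc[country].reindex(frame.index)
    for name, table in factors.items():
        frame[name] = table.loc[country].reindex(frame.index)
    return frame


# Made-up input tables for a single placeholder country.
emissions = pd.DataFrame(
    [[50.0 - i for i in range(11)]], index=["Country A"], columns=YEARS
)
factors = {
    "renewables_share": pd.DataFrame(
        [[30.0 + i for i in range(11)]], index=["Country A"], columns=YEARS
    ),
}

frame = build_country_frame("Country A", emissions, factors)
print(frame.head())
```

Keeping one frame per country means that each factor column can be paired directly with the `co2` column when fitting the regressions.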

## Learning Task and Approach

To analyze factors that have possibly aided countries in reducing their CO2 emissions, we decided to use linear regression. Linear regression is a fairly simple model that can be used to analyze relationships between variables. In our project, we use (simple) linear regression to analyze the relationship between CO2 emissions and our chosen factors. 

We also calculate the R-Squared value for each of our simple linear regressions. This statistical measure tells us how well the linear regression model fits the data: it ranges from 0 to 1, and the closer the value is to 1, the better the model fits the data. In our project, this value is used to help analyze the impact of different factors on the CO2 emission reduction. In addition to the R-Squared values, we calculated the slope of each linear regression to see whether the relationship is negative or positive.

Towards the end of the project, we also calculated the p-value for our linear regressions, which helps us to understand which factors are statistically significant. On top of these statistical values, we need domain knowledge to make correct conclusions. In this case this would be knowledge about climate change.
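
The three measures discussed above (slope, R-Squared and p-value) can be obtained from a single call, for example with `scipy.stats.linregress`. The sketch below rests on assumptions: this report does not state which library was used for the regressions, the factor is assumed to be the explanatory variable with CO2 emissions as the response, the 0.05 significance level is a conventional choice, and the numbers are made up.

```python
from scipy.stats import linregress

# Made-up yearly values for one country, 2011-2021.
renewables_share = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
co2 = [50.2, 48.3, 47.1, 45.4, 44.2, 42.3, 41.1, 39.6, 38.0, 36.7, 35.1]


def regress_on_co2(factor_values, co2_values, alpha=0.05):
    """Fit a simple linear regression of CO2 emissions on one factor."""
    result = linregress(factor_values, co2_values)
    return {
        "slope": result.slope,               # the sign gives the direction of the relationship
        "r_squared": result.rvalue ** 2,     # share of variance explained by the fitted line
        "p_value": result.pvalue,            # tests the null hypothesis that the slope is zero
        "significant": result.pvalue < alpha,
    }


stats = regress_on_co2(renewables_share, co2)
print(stats)
```

With only eleven yearly observations per country, the p-values should be read with caution, which is one reason why domain knowledge is needed alongside these numbers.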

## Results and Visualizations

We organized all countries in order based on how much they have reduced their CO2 emissions. Figure 1 shows the countries with the biggest reductions and the countries which we analyzed in more detail are marked with dark green.

(figure to be added later)

The countries for further analysis are Estonia, Bosnia and Herzegovina, Greece, Serbia, Finland, Denmark, Malta, Sweden, Montenegro, Hong Kong, Luxembourg, Slovenia and Portugal. All of these countries have reduced their CO2 emissions more than 20% between the years 2011 and 2021. The countries with the biggest reduction, Aruba and Curaçao, were left out of our study due to them being small islands.

For each selected country, we created plots that visualize the linear regression for each of our factors. Figure 2 displays one example of these. Since we have ten factors that we analyzed for each country, each country had ten plots.

(figure to be added later)

In the final analysis, we collected the R-squared values, slope values and p-values into an Excel sheet (available in GitHub) and used them to draw our final conclusions. The plots (Figure 2) had issues with the axes not starting from zero, which gives an impression of a bigger change than there truly is.

Based on our analysis, we divided our factors into two groups: those that had an impact on CO2 emission reductions and those that need further analysis. In total, four factors have an effect: share of renewables, energy usage per capita, HDI and GDP per capita.

Share of renewables has a negative relationship with CO2 emissions in all countries, so increasing the share of renewable energy reduces CO2 emissions. For most countries, the R-Squared values suggest that our model fits the data well, and the p-values indicate that the finding is statistically significant. Another clear relationship was detected between energy usage per capita and CO2 emissions. In this case, the relationship is positive: the more energy is consumed per capita, the higher the CO2 emissions. High R-Squared values and statistically significant p-values support this.

For GDP per capita, the relationship is negative in most countries, and in several countries it is statistically significant based on the p-value. It seems that economic growth is possible even while reducing CO2 emissions. For HDI, the relationship is negative for all countries, which means that as HDI increases, CO2 emissions decrease. However, we recognize that many aspects affect both GDP and HDI, and thus these topics would be interesting to study in more detail.

Further analysis would be needed for the other six factors that we analyzed, which are meat production, life expectancy, percentage of internet users, human rights index, population and nuclear energy. For meat production, our results are statistically significant for only four countries, and even in those countries meat production varies from year to year. However, based on other research, meat production is a significant contributor to CO2 emissions [?]. The same goes for internet usage - other research suggests that it is a notable source of CO2 emissions [?], but our analysis did not find a significant relationship there.

For life expectancy, our analysis suggests that it could be a motivator for reducing CO2 emissions. However, a more interesting approach would be to compare countries with low life expectancy to countries with high life expectancy. For the human rights index, the changes are quite small in the countries that we observed, so we do not draw any conclusions about it, especially since our analysis suggests that as the human rights index decreases, CO2 emissions decrease as well. For population, it seems that it is possible to reduce CO2 emissions even while the population grows.

The last factor that we categorized as “needs further analysis” is nuclear energy. Only three of the 13 countries we chose produce nuclear energy, and our results are not statistically significant for those countries. To analyze this properly, we would need to include more countries that produce nuclear energy, or find data on how the share of nuclear energy that a country uses, including what it gets from other countries, has changed over time.

### Conclusions

Based on our project work, we suggest that our target group look into the countries that have reduced their CO2 emissions. Our work indicates that using renewable energy and reducing energy usage reduce CO2 emissions efficiently. Both are actions that countries can implement to reduce their CO2 emissions. Based on our study, reducing CO2 emissions is possible while the economy and HDI grow, so tackling climate change from this perspective does not mean giving these up.

## Final Thoughts and Future Steps

Overall, our project went well, and we did not have to change our initial idea. From the start, we worked on the project on a weekly basis and we did not have to hurry to finish this. As this was overall a great experience for us, we are quite sad that this is only a mini project. Within this topic, there is a lot to analyze and discover.

We did not use Jupyter Notebook in our project, and towards the end we realized that it might have been a useful tool. With more time, our code could use some refactoring to make it easily reusable. Missing value handling could also be a place for more development, as well as making it possible to handle different time periods.

In addition to the factors we have analyzed in this project, it would be interesting to analyze even more factors, like transport and agriculture. Some of our current factors also provide many possibilities for further analysis, as we discussed previously. Other greenhouse gases could also be taken into account.