# Dataset and Preprocessing

## Dataset 1: Greenhouse gas emissions in the Netherlands

This dataset contains emissions of the greenhouse gases carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O) in the Netherlands from 1990 through 2017. The data is organized by emission source, such as the pharmaceutical industry, households, agriculture, and fishery. Emissions are measured in million kilograms and are ratio values. The dataset can be used to calculate the increase in greenhouse gas emissions and to compare it against the emissions of other countries. It was taken from <a href='https://www.kaggle.com/datasets/janheindejong/greenhouse-gas-emissions-in-the-netherlands?select=IPCC_emissions.csv'>Kaggle</a>. Because the file is semicolon-delimited, it has to be read with a semicolon as the separator to be usable.



In [1]:
import pandas as pd

df_greenhouse = pd.read_csv("../datasets/IPCC_emissions.csv", sep=';')
df_greenhouse.head()

Unnamed: 0,ID,Bronnen,Perioden,CO2_1,CH4_2,N2O_3
0,0,T001176,1990JJ00,163120,1278.17,59.49
1,1,T001176,1995JJ00,173520,1192.41,59.84
2,2,T001176,2000JJ00,172290,975.64,53.01
3,3,T001176,2001JJ00,177390,949.16,49.71
4,4,T001176,2002JJ00,176670,904.27,47.01
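
As a rough illustration of how the increase in emissions could be calculated, the sketch below rebuilds the five preview rows shown above, parses a year out of `Perioden`, and expresses CO2 relative to the earliest year per source. It assumes that the first four characters of `Perioden` are always the year, as they are in the preview. The helper names `add_year` and `pct_change_since_first` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Rows copied from the df_greenhouse.head() preview above.
sample = pd.DataFrame({
    "Bronnen": ["T001176"] * 5,
    "Perioden": ["1990JJ00", "1995JJ00", "2000JJ00", "2001JJ00", "2002JJ00"],
    "CO2_1": [163120, 173520, 172290, 177390, 176670],
    "CH4_2": [1278.17, 1192.41, 975.64, 949.16, 904.27],
    "N2O_3": [59.49, 59.84, 53.01, 49.71, 47.01],
})


def add_year(df):
    """Return a copy with an integer `year` column parsed from `Perioden`."""
    out = df.copy()
    # Assumption: the first four characters of `Perioden` are the year.
    out["year"] = out["Perioden"].str[:4].astype(int)
    return out


def pct_change_since_first(df, column):
    """Percent change of `column` relative to the earliest year, per source."""
    ordered = df.sort_values(["Bronnen", "year"])
    # Within each source, the first row after sorting is the earliest year.
    baseline = ordered.groupby("Bronnen")[column].transform("first")
    return (ordered[column] / baseline - 1.0) * 100.0


sample = add_year(sample)
sample["CO2_change_pct"] = pct_change_since_first(sample, "CO2_1")
print(sample[["year", "CO2_1", "CO2_change_pct"]])
```

The same two helpers could be applied to the full `df_greenhouse` frame, since the baseline is computed separately for each value of `Bronnen`.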


## Dataset 2: Greenhouse gas emissions around the world

This dataset contains greenhouse gas emissions for various countries, such as Australia and France, and for certain regions, such as the European Union, from 1990 through 2014. The gases covered include, among others, nitrous oxide, carbon dioxide, sulphur hexafluoride, and methane, along with a total across all greenhouse gases. Which gas a row refers to is recorded in the `category` column, which holds categorical values. The emission values themselves are ratio values and are measured in million kilograms. The dataset was taken from <a href='https://www.kaggle.com/datasets/unitednations/international-greenhouse-gas-emissions'>Kaggle</a> and did not need to be cleaned up.

In [2]:
df_gas_emission = pd.read_csv("../datasets/greenhouse_gas_inventory_data_data.csv")
df_gas_emission.head()

Unnamed: 0,country_or_area,year,value,category
0,Australia,2014,393126.946994,carbon_dioxide_co2_emissions_without_land_use_...
1,Australia,2013,396913.93653,carbon_dioxide_co2_emissions_without_land_use_...
2,Australia,2012,406462.847704,carbon_dioxide_co2_emissions_without_land_use_...
3,Australia,2011,403705.528314,carbon_dioxide_co2_emissions_without_land_use_...
4,Australia,2010,406200.993184,carbon_dioxide_co2_emissions_without_land_use_...
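
Because this dataset stores one row per country, year, and category, comparing countries usually means selecting one category and reshaping it. The sketch below shows one possible way to do that with the five preview rows above. The category label is truncated in the preview, so a placeholder name is used instead of the real one, and the helper `emissions_by_country` is hypothetical.

```python
import pandas as pd

# Rows copied from the df_gas_emission.head() preview above. The category label
# is truncated in that preview, so a placeholder stands in for it here.
CATEGORY = "co2_without_land_use"  # hypothetical placeholder, not the real label

sample = pd.DataFrame({
    "country_or_area": ["Australia"] * 5,
    "year": [2014, 2013, 2012, 2011, 2010],
    "value": [393126.946994, 396913.93653, 406462.847704,
              403705.528314, 406200.993184],
    "category": [CATEGORY] * 5,
})


def emissions_by_country(df, category):
    """Return a year x country table of `value` for one emission category."""
    subset = df[df["category"] == category]
    wide = subset.pivot(index="year", columns="country_or_area", values="value")
    # Sort so that the years run in ascending order.
    return wide.sort_index()


wide = emissions_by_country(sample, CATEGORY)
print(wide)
```

With the full `df_gas_emission` frame, the resulting table would have one column per country or area, which makes it straightforward to plot or to line up against the Dutch figures.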


## Dataset 3: Statistics per Country

This dataset contains a range of statistics for nearly every country in the United Nations in 2017. Among its 50 statistics are population, GDP, the agricultural share of the economy, internet use, and emission estimates. The unit of each variable is given in its column name, for example number, percent, or square kilometer, and the variables are nominal, ordinal, or ratio values. The dataset was not preprocessed before it was used and can be found on <a href='https://www.kaggle.com/datasets/sudalairajkumar/undata-country-profiles?select=kiva_country_profile_variables.csv'>Kaggle</a>.

In [3]:
country_profile = pd.read_csv("../datasets/country_profile_variables.csv")
country_profile.head()

Unnamed: 0,country,Region,Surface area (km2),Population in thousands (2017),"Population density (per km2, 2017)","Sex ratio (m per 100 f, 2017)",GDP: Gross domestic product (million current US$),"GDP growth rate (annual %, const. 2005 prices)",GDP per capita (current US$),Economy: Agriculture (% of GVA),...,Mobile-cellular subscriptions (per 100 inhabitants).1,Individuals using the Internet (per 100 inhabitants),Threatened species (number),Forested area (% of land area),CO2 emission estimates (million tons/tons per capita),"Energy production, primary (Petajoules)",Energy supply per capita (Gigajoules),"Pop. using improved drinking water (urban/rural, %)","Pop. using improved sanitation facilities (urban/rural, %)",Net Official Development Assist. received (% of GNI)
0,Afghanistan,SouthernAsia,652864,35530,54.4,106.3,20270,-2.4,623.2,23.3,...,8.3,42,2.1,9.8/0.3,63,5,78.2/47.0,45.1/27.0,21.43,-99
1,Albania,SouthernEurope,28748,2930,106.9,101.9,11541,2.6,3984.2,22.4,...,63.3,130,28.2,5.7/2.0,84,36,94.9/95.2,95.5/90.2,2.96,-99
2,Algeria,NorthernAfrica,2381741,41318,17.3,102.0,164779,3.8,4154.1,12.2,...,38.2,135,0.8,145.4/3.7,5900,55,84.3/81.8,89.8/82.2,0.05,-99
3,American Samoa,Polynesia,199,56,278.2,103.6,-99,-99.0,-99.0,-99.0,...,-99.0,92,87.9,-99,-99,-99,100.0/100.0,62.5/62.5,-99.0,-99
4,Andorra,SouthernEurope,468,77,163.8,102.3,2812,0.8,39896.4,0.5,...,96.9,13,34.0,0.5/6.4,1,119,100.0/100.0,100.0/100.0,-99.0,-99


In [4]:
kiva_country_profile = pd.read_csv("../datasets/kiva_country_profile_variables.csv")
kiva_country_profile.head()

Unnamed: 0,country,Region,Surface area (km2),Population in thousands (2017),"Population density (per km2, 2017)","Sex ratio (m per 100 f, 2017)",GDP: Gross domestic product (million current US$),"GDP growth rate (annual %, const. 2005 prices)",GDP per capita (current US$),Economy: Agriculture (% of GVA),...,Mobile-cellular subscriptions (per 100 inhabitants).1,Individuals using the Internet (per 100 inhabitants),Threatened species (number),Forested area (% of land area),CO2 emission estimates (million tons/tons per capita),"Energy production, primary (Petajoules)",Energy supply per capita (Gigajoules),"Pop. using improved drinking water (urban/rural, %)","Pop. using improved sanitation facilities (urban/rural, %)",Net Official Development Assist. received (% of GNI)
0,Afghanistan,SouthernAsia,652864,35530,54.4,106.3,20270,-2.4,623.2,23.3,...,8.3,42,2.1,9.8/0.3,63,5,78.2/47.0,45.1/27.0,21.43,-99
1,Albania,SouthernEurope,28748,2930,106.9,101.9,11541,2.6,3984.2,22.4,...,63.3,130,28.2,5.7/2.0,84,36,94.9/95.2,95.5/90.2,2.96,-99
2,Armenia,WesternAsia,29743,2930,102.9,88.8,10529,3.0,3489.1,19.0,...,58.2,114,11.7,5.5/1.8,48,46,100.0/100.0,96.2/78.2,3.17,-99
3,Azerbaijan,WesternAsia,86600,9828,118.9,99.3,53049,0.7,5438.7,6.7,...,77.0,97,13.5,37.5/3.9,2459,61,94.7/77.8,91.6/86.6,0.14,-99
4,Belize,CentralAmerica,22966,375,16.4,99.2,1721,1.2,4789.4,14.6,...,41.6,117,60.1,0.5/1.4,9,36,98.9/100.0,93.5/88.2,1.68,-99
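
The previews above show the value -99 in several fields, for example the GDP columns for American Samoa. If -99 is a missing-value marker (an assumption; the dataset description here does not say so), it would distort any average or correlation computed on the raw columns. The sketch below, using three rows and three columns copied from the `country_profile.head()` preview, shows one way such a marker could be masked before analysis. The helper `mask_missing` is hypothetical.

```python
import numpy as np
import pandas as pd

# Three rows and a few columns copied from the country_profile.head() preview.
sample = pd.DataFrame({
    "country": ["Afghanistan", "American Samoa", "Andorra"],
    "Surface area (km2)": [652864, 199, 468],
    "GDP: Gross domestic product (million current US$)": [20270, -99, 2812],
    "GDP per capita (current US$)": [623.2, -99.0, 39896.4],
})


def mask_missing(df, sentinel=-99):
    """Return a copy in which cells equal to `sentinel` become NaN."""
    # Assumption: -99 is a missing-value marker, not a measured value.
    # Only exact matches are replaced, so genuine negatives such as a
    # GDP growth rate of -2.4 are left untouched.
    return df.replace(sentinel, np.nan)


cleaned = mask_missing(sample)
print(cleaned.isna().sum())
```

The original frame is left unchanged, so the raw values remain available if -99 turns out to be meaningful for some column.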
