# Datasets

This notebook presents the datasets we have created, which can be reused in other analyses. If you have not done so already, install the required libraries with pip.

In [None]:
!pip install -r requirements.txt

In [2]:
import pandas as pd

## Regions

The dataset is stored in `data/regions.csv`. It lists all countries with their ISO 3166 alpha-3 codes and the several regional classifications we use for the visualizations, in both English and Czech. A few small territories are excluded from the regional classifications. The continents do not follow the usual definition exactly: Russia is kept in its own category because it spans both Europe and Asia.
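As a minimal sketch of how the excluded territories could be found, the snippet below assumes that they are the rows whose regional columns are empty, as the Åland Islands row in the preview below suggests. The inline CSV is a stand-in for `data/regions.csv` with only a subset of its columns, so the example runs without the file.

```python
import io

import pandas as pd

# Two rows copied from the preview below (subset of columns only);
# this inline CSV stands in for data/regions.csv.
sample = io.StringIO(
    "code,name_en,region_A_en,continent_en\n"
    "ABW,Aruba,South/Latin America,South and Latin America\n"
    "ALA,Åland Islands,,\n"
)
regions = pd.read_csv(sample)

# Assumption: a missing value in a regional column marks an excluded territory.
unclassified = regions[regions["region_A_en"].isna()]
print(unclassified["code"].tolist())
```

The same filter should work on the full file once `sample` is replaced by the path to `regions.csv`.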

In [9]:
pd.read_csv('../data/regions.csv').head()

Unnamed: 0,code,name_en,name_cz,region_A_en,region_A_cz,region_B_en,region_B_cz,region_C_en,region_C_cz,region_WB_en,continent_en,continent_cz
0,ABW,Aruba,Aruba,South/Latin America,Jižní Amerika,South/Latin America (others),Jižní a Střední Amerika (ostatní),South/Latin America (others),Jižní a Střední Amerika (ostatní),Latin America & Caribbean,South and Latin America,Jižní a Střední Amerika
1,AFG,Afghanistan,Afghánistán,Asia & Pacific (except China and India),Asie a Oceánie (mimo Čínu a Indii),Asia B,Asie B,Asia (others),Asie (ostatní),South Asia,Asia,Asie
2,AGO,Angola,Angola,Africa,Afrika,Africa (others),Afrika (ostatní),Africa (others),Afrika (ostatní),Sub-Saharan Africa,Africa,Afrika
3,AIA,Anguilla,Anguilla,South/Latin America,Jižní Amerika,South/Latin America (others),Jižní a Střední Amerika (ostatní),South/Latin America (others),Jižní a Střední Amerika (ostatní),,South and Latin America,Jižní a Střední Amerika
4,ALA,Åland Islands,Alandy,,,,,,,,,


## EU Emissions 2018

The dataset is stored in `data/eu-emissions-2018.csv` and contains:
- `pop` = total population (as of January 2018)
- `ghg` = greenhouse gas emissions (CO2 equivalent) in 2018, in million tonnes
- `ghg_per_capita` = per capita emissions in 2018, in tonnes of CO2eq per capita

The dataset covers the EU28 countries (including the United Kingdom). The values were sourced from [Eurostat](https://ec.europa.eu/eurostat/home?) via the Python [eurostat package](https://pypi.org/project/eurostat/). More information about how the dataset was created can be found in the [eu-emissions-per-capita notebook](eu-emissions-per-capita.ipynb).
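The three columns can be cross-checked against each other. The sketch below assumes that `ghg_per_capita` is simply `ghg` converted from million tonnes to tonnes and divided by `pop`; the actual derivation is in the notebook linked above. The numbers are copied from the Austria row of the preview below.

```python
# Values copied from the Austria row of the preview below.
pop = 8_822_267   # population in January 2018
ghg = 81.50147    # emissions in million tonnes of CO2eq

# Assumed relation: million tonnes -> tonnes, then divide by population.
ghg_per_capita = ghg * 1e6 / pop
print(f"{ghg_per_capita:.3f} tonnes CO2eq per capita")
```

The result agrees with the `ghg_per_capita` value listed for Austria to the precision shown.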

In [6]:
pd.read_csv('../data/eu-emissions-2018.csv').head()

Unnamed: 0,code,country,pop,ghg,ghg_per_capita
0,AT,Austria,8822267,81.50147,9.238155
1,BE,Belgium,11398589,123.6429,10.847211
2,BG,Bulgaria,7050034,58.59507,8.311317
3,CY,Cyprus,864236,9.85631,11.404651
4,CZ,Czechia,10610055,129.38768,12.194817


## World Emissions 2015

The following datasets combine greenhouse gas emissions, the size of the economy and population for all countries. All values are for 2015, as newer emission data are not available for all countries.
- population and GDP data are taken from the World Bank (GDP is expressed in constant 2017 international dollars)
- emission data are from the EDGAR database, [EDGAR v5.0](https://edgar.jrc.ec.europa.eu/overview.php?v=50_GHG)

A detailed tutorial on how the data is obtained and the dataset created is provided in the [emission-intensity notebook](emission-intensity.ipynb).

### Country level

The dataset is stored in `data/world-emissions-2015.csv`; all data is for the year 2015:
- `code`: ISO 3166 alpha-3 country code
- `region` and `continent`: regional divisions that we use to visualize the data
- `pop`: country population
- `gdp`: GDP in constant 2017 international dollars
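- `ghg`: greenhouse gas emissions (CO2, CH4 and N2O), expressed as CO2 equivalent; judging by the sample values below (for example about 32,756 for Afghanistan), the unit in this file appears to be thousand tonnes, not the million tonnes used in the regional dataset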

In [14]:
pd.read_csv('../data/world-emissions-2015.csv').head()

Unnamed: 0,code,region,continent,pop,gdp,ghg
0,ABW,South/Latin America (others),South and Latin America,104341,3889424000.0,1000.182387
1,AFG,Asia B,Asia,34413603,76148730000.0,32756.475563
2,AGO,Africa (others),Africa,27884381,224090400000.0,85591.020029
3,ALB,Europe B,Europe,2880703,34931210000.0,8985.266754
4,ARE,Middle East A,Asia,9262900,604115200000.0,249667.673122


### Aggregated at regional level

The dataset is stored in `data/world-emissions-2015-regions.csv`; all data is for the year 2015:
- `region`: unit of aggregation
- `continent`: the continent a region belongs to (note that Russia is included as a standalone "continent" for the visualization)
- `pop`: region population
- `gdp`: GDP in constant 2017 international dollars
- `ghg`: greenhouse gas emissions, includes CO2, CH4 and N2O, expressed as million tonnes of CO2 equivalent (CO2eq)

It also contains relative columns derived from the ones above:
- `gdp_per_capita = gdp / pop`: GDP per capita
- `ghg_per_capita = 1e6 * ghg / pop`: emissions per capita, expressed as tonnes of CO2eq per capita
- `ghg_per_gdp = 1e12 * ghg / gdp`: emissions per GDP, expressed as grams of CO2eq per international dollar
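The formulas above can be checked on a single row. The helper `relative_columns` below is a hypothetical name introduced for illustration, not part of the project code. The inputs are taken from the "Africa (others)" row of the preview below, where `gdp` is shown rounded, so the results match the stored columns only approximately.

```python
def relative_columns(pop, gdp, ghg):
    """Return the three derived values for one row.

    pop: population
    gdp: GDP in constant 2017 international dollars
    ghg: emissions in million tonnes of CO2eq
    """
    return {
        "gdp_per_capita": gdp / pop,
        "ghg_per_capita": 1e6 * ghg / pop,  # million tonnes -> tonnes
        "ghg_per_gdp": 1e12 * ghg / gdp,    # million tonnes -> grams
    }


# "Africa (others)" row from the preview below (gdp is rounded in the preview).
row = relative_columns(pop=716_729_098, gdp=2_675_477_000_000.0, ghg=1486.669621)
for name, value in row.items():
    print(f"{name}: {value:.2f}")
```

The factors `1e6` and `1e12` only convert million tonnes to tonnes and to grams respectively, so each ratio comes out in the unit stated in the list above.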

In [15]:
pd.read_csv('../data/world-emissions-2015-regions.csv').head()

Unnamed: 0,region,pop,gdp,ghg,continent,gdp_per_capita,ghg_per_capita,ghg_per_gdp
0,Africa (others),716729098,2675477000000.0,1486.669621,Africa,3732.898828,2.074242,555.665216
1,Asia A,39236975,1304715000000.0,437.545114,Asia,33252.17475,11.151347,335.356916
2,Asia B,776411709,4028749000000.0,1814.282901,Asia,5188.934081,2.336754,450.334042
3,Australia and New Zealand,28411695,1329625000000.0,752.833166,Australia and New Zealand,46798.519564,26.497299,566.199504
4,Brazil,204471769,3079188000000.0,1266.355849,South and Latin America,15059.234368,6.193304,411.262881
