# PRIMAP-hist data preparation

This Jupyter notebook sets out a method for preparing PRIMAP-hist data for plotting. It can also serve as a guide and a basis for setting up other input datasets.

The dataset can be viewed and accessed at:
https://www.pik-potsdam.de/paris-reality-check/primap-hist/

Gütschow, J.; Jeffery, L.; Gieseke, R. (2019): The PRIMAP-hist national historical emissions time series (1850-2016). v2.0. GFZ Data Services. https://doi.org/10.5880/pik.2019.001

The downloaded data should be stored in the 'input-data' folder of this repository. PRIMAP-hist is updated roughly once per year. The user should therefore check for updates at the above link and download new data when available. If this is done, the new filenames will need to be added below. 

------
Possible improvements:
* unit conversion
* warn before overwriting files and/or automate file naming. Currently it is possible to give a file a poor name. This would require a look-up for category names, though.
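
The unit conversion listed above could be sketched roughly as follows. This is only an illustration: `convert_gg_to_mt` is a hypothetical helper that is not part of `gst_tools`, and it assumes that the year columns are the ones whose labels consist only of digits (for example `'1990'`) and that the `unit` column contains strings such as `Gg CO2 / yr`.

```python
import pandas as pd

# Hypothetical helper, not part of gst_tools.
GG_PER_MT = 1000.0  # 1 Mt = 1000 Gg


def convert_gg_to_mt(df):
    """Return a copy of df with year columns converted from Gg to Mt."""
    out = df.copy()
    # assumption: year columns are those whose labels are all digits, e.g. '1990'
    year_cols = [c for c in out.columns if str(c).isdigit()]
    out[year_cols] = out[year_cols] / GG_PER_MT
    # relabel the unit so that it matches the converted values
    out['unit'] = out['unit'].str.replace('Gg', 'Mt', regex=False)
    return out


# toy data, invented purely for illustration
example = pd.DataFrame({'country': ['AFG', 'ALB'],
                        'unit': ['Gg CO2 / yr', 'Gg CO2 / yr'],
                        '1990': [2500.0, 7000.0],
                        '1991': [2400.0, 4000.0]})
converted = convert_gg_to_mt(example)
print(converted)
```

Working on a copy leaves the input untouched, so the conversion can be applied just before plotting without changing the processed data that is written to file.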


In [8]:
# import modules

# system
import sys
import os

# data handling
import pandas as pd
import numpy as np

# open climate data packages
from countrygroups import UNFCCC, EUROPEAN_UNION, ANNEX_ONE, NON_ANNEX_ONE
from shortcountrynames import to_name

# global stocktake tools
import gst_tools.gst_utils as utils

In [9]:
from a_parameters import *

In [10]:
# EPO
# Please check that the parameters are correct before proceeding

variable_name_to_display, proc_data_fname, source_name = utils.get_primap_variable_and_and_file_name(gas_names[raw_entity], raw_sector, raw_scenario, version)

print('This is the origin of the data to be plotted:\n' + source_name)
print('This will be the name of the variable displayed on the plot:\n' + variable_name_to_display)
print('This will be the name of the processed data file:\n' + proc_data_fname)

This is the origin of the data to be plotted:
PRIMAP-histcr_v2.3.1
This will be the name of the variable displayed on the plot:
Total CO2 emissions (excl. LULUCF)
This will be the name of the processed data file:
PRIMAP-histcr_v2.3.1_Total_CO2_emissions_(excl._LULUCF).csv


In [11]:
## EPO You will process and plot the following data: ##

print('[1/10] Includes extrapolated data: '+str(include_extrapolated_data))
print('[2/10] Gas to plot: ' + gas_names[raw_entity])
print('[3/10] Sector to plot: ' + sector_names[raw_sector])

print('[4/10] Name of the raw data file: ' + raw_data_file)
print('[5/10] Version of dataset: ' + version)

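# EPO TO DO: Find better terms for the types of data
if raw_scenario == 'HISTCR':
    print('[6/10] Type of data: Country-reported')
elif raw_scenario == 'HISTTP':
    print('[6/10] Type of data: Third-party')
else:
    # fall back to the raw scenario code so that item 6 is never silently skipped
    print('[6/10] Type of data: ' + str(raw_scenario))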

print('[7/10] The following countries will be plotted:')
print(needed_countries)

print('[8/10] The following years will be plotted:')
print(years_of_interest)

print('[9/10] The dataset will be processed to start from the following year: ' + str(start_year))

if save_opt:
    print('[10/10] You chose to save the plots as files.')
else:
    print('[10/10] You chose not to save the plots as files.')

[1/10] Includes extrapolated data: True
[2/10] Gas to plot: CO2
[3/10] Sector to plot: Total (excl.LULUCF)
[4/10] Name of the raw data file: Guetschow-et-al-2021-PRIMAP-hist_v2.3.1_20-Sep_2021.csv
[5/10] Version of dataset: 2.3.1
[6/10] Type of data: Country-reported
[7/10] The following countries will be plotted:
['AFG', 'AGO', 'ALB', 'AND', 'ARE', 'ARG', 'ARM', 'ATG', 'AUS', 'AUT', 'AZE', 'BDI', 'BEL', 'BEN', 'BFA', 'BGD', 'BGR', 'BHR', 'BHS', 'BIH', 'BLR', 'BLZ', 'BOL', 'BRA', 'BRB', 'BRN', 'BTN', 'BWA', 'CAF', 'CAN', 'CHE', 'CHL', 'CHN', 'CIV', 'CMR', 'COD', 'COG', 'COK', 'COL', 'COM', 'CPV', 'CRI', 'CUB', 'CYP', 'CZE', 'DEU', 'DJI', 'DMA', 'DNK', 'DOM', 'DZA', 'ECU', 'EGY', 'ERI', 'ESP', 'EST', 'ETH', 'EUU', 'FIN', 'FJI', 'FRA', 'FSM', 'GAB', 'GBR', 'GEO', 'GHA', 'GIN', 'GMB', 'GNB', 'GNQ', 'GRC', 'GRD', 'GTM', 'GUY', 'HND', 'HRV', 'HTI', 'HUN', 'IDN', 'IND', 'IRL', 'IRN', 'IRQ', 'ISL', 'ISR', 'ITA', 'JAM', 'JOR', 'JPN', 'KAZ', 'KEN', 'KGZ', 'KHM', 'KIR', 'KNA', 'KOR', 'KWT', 'LAO

In [12]:
## Here we process the dataset

# get the data
raw_data_folder = 'input-data'
fname = os.path.join(raw_data_folder, raw_data_file)
print('reading ' + fname)
raw_data = pd.read_csv(fname)

# LTN: rename columns so that they match the column names expected by these tools
raw_data.rename(columns={'scenario (PRIMAP-hist)': 'scenario',
                         'area (ISO3)': 'country',
                         'category (IPCC2006_PRIMAP)': 'category'}, inplace=True)

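# EPO: rename the European Union so that it can be found with the filter.
# Assign the result back rather than using inplace=True on a selected column,
# which is not guaranteed to modify raw_data in recent pandas versions.
raw_data['country'] = raw_data['country'].replace({'EU27BX': 'EUU'})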

# reduce to only the desired variable (one per output file)
new_data = raw_data.loc[(raw_data['entity'] == raw_entity) &
                        (raw_data['scenario'] == raw_scenario) &
                        (raw_data['category'] == raw_sector)
                        ]

# EPO
if len(new_data.index) == 0:
    print('There is no data for the gas, sector, and data source specified. Please provide new parameters.')

else:
    # Louise
    # reduce the countries or regions to only those desired
    new_data = new_data.loc[new_data['country'].isin(needed_countries)]

    if len(new_data.index) == 0:
        print('There is no data for the countries specified. Please provide different countries.')
    else:
        # tell the user if any of the needed countries are missing and, if yes, which ones:
        missing_countries = list(set(needed_countries) - set(new_data['country'].unique()))
        if missing_countries:
            print('Not all countries requested were available in the raw data. You are missing the following:')
            for country in missing_countries:
                print('   ' + to_name(country))
            print('---------')

        # reduce to only required years
        new_data = utils.change_first_year(new_data, start_year)

        # rename columns to follow conventions
        new_data = new_data.rename(columns={'entity': 'variable'})

        # make sure 'variable' contains all necessary information
        new_data['variable'] = variable_name_to_display

        # label the source
        new_data['source'] = source_name

        new_data = utils.check_column_order(new_data)

        # check! # EPO
        print('These are the 10 first rows of the processed data:')
        print(new_data.head(10))

reading input-data\Guetschow-et-al-2021-PRIMAP-hist_v2.3.1_20-Sep_2021.csv
Not all countries requested were available in the raw data. You are missing the following:
   Palestine
---------
First year of data available is now 1990
Last year of data available is 2019
These are the 10 first rows of the processed data:
  category country scenario                source         unit  \
0   M.0.EL     AFG   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
1   M.0.EL     AGO   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
2   M.0.EL     ALB   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
3   M.0.EL     AND   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
4   M.0.EL     ARE   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
5   M.0.EL     ARG   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
6   M.0.EL     ARM   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
7   M.0.EL     ATG   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
8   M.0.EL     AUS   HISTCR  PRIMAP-histcr_v2.3.1  Gg CO2 / yr   
9   M.0.EL     AUT   HI
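
The implementation of `utils.change_first_year` is not shown in this notebook. As a rough illustration of what trimming to a start year could look like, here is a minimal sketch on toy data. `trim_to_start_year` is a hypothetical stand-in, not the `gst_tools` function, and it assumes that year columns are labelled with digit-only strings such as `'1990'`.

```python
import pandas as pd


def trim_to_start_year(df, start_year):
    """Hypothetical stand-in: drop year columns earlier than start_year."""
    # assumption: year columns are those whose labels are all digits
    year_cols = [c for c in df.columns if str(c).isdigit()]
    too_early = [c for c in year_cols if int(c) < start_year]
    trimmed = df.drop(columns=too_early)
    kept = sorted(int(c) for c in year_cols if int(c) >= start_year)
    if kept:
        print('First year of data available is now ' + str(kept[0]))
        print('Last year of data available is ' + str(kept[-1]))
    return trimmed


# toy data, invented purely for illustration
toy = pd.DataFrame({'country': ['AFG', 'AGO'],
                    '1989': [1.0, 2.0],
                    '1990': [1.5, 2.5],
                    '1991': [2.0, 3.0]})
trimmed = trim_to_start_year(toy, 1990)
print(list(trimmed.columns))
```

Non-year columns such as `country` are left in place because only digit-only labels are treated as years.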

In [13]:
## write the data to file

"""
First ensure that the required columns are in the data. Only 'country' and 'unit'
are checked below; the year columns and 'variable' are not verified. If both are
present, proceed to write the data to file.
"""

if 'country' not in new_data.columns or 'unit' not in new_data.columns:

    print('Missing required information! Please check your input data and processing!')

else:

    # the output file name (proc_data_fname) was built above from the source and variable names
    fullfname_out = os.path.join('proc-data', proc_data_fname)

    # make sure the output folder exists
    os.makedirs('proc-data', exist_ok=True)

    # write to csv in proc data folder
    new_data.to_csv(fullfname_out, index=False)

    # celebrate success 
    print('Processed data written to file! - ' + fullfname_out)


Processed data written to file! - proc-data\PRIMAP-histcr_v2.3.1_Total_CO2_emissions_(excl._LULUCF).csv
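
The cell above silently replaces any existing file of the same name. One way the overwrite warning mentioned under "Possible improvements" might be approached is sketched below. `safe_output_path` and its `overwrite` flag are hypothetical and not part of `gst_tools`; the demonstration runs in a temporary folder so that no real data are touched.

```python
import os
import tempfile


def safe_output_path(folder, fname, overwrite=False):
    """Hypothetical helper: build the output path and refuse to clobber a file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, fname)
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(path + ' already exists; pass overwrite=True to replace it.')
    return path


# demonstration in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'proc-data')
    first_path = safe_output_path(folder, 'example.csv')
    with open(first_path, 'w') as f:
        f.write('country,unit\n')
    try:
        # the file now exists, so a second request without overwrite=True fails
        safe_output_path(folder, 'example.csv')
        blocked = False
    except FileExistsError:
        blocked = True
    replaced_path = safe_output_path(folder, 'example.csv', overwrite=True)

print('Second write blocked:', blocked)
```

In the notebook, the returned path could be passed to `new_data.to_csv(...)` in place of `fullfname_out`, with the exception caught and reported to the user.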


**Test ground below**