# Collective progress

**Example notebook for creating anonymised, collective information on progress**

* Before running this notebook, you need to prepare the data you want to assess. To do so, please use the notebook - ""
* For testing, some example data is available in the folder "proc-data".
* Enter the name of the file that you wish to use in the USER INPUT cell; after that, you can run the full notebook with minimal changes.

In [3]:
# import modules

# system 
import re
import os

# calculation
import pandas as pd
import numpy as np

# plotting
%matplotlib inline
import seaborn
import matplotlib

# global stocktake tools
from gst_tools.make_plots import make_histogram
import gst_tools.gst_utils as utils


## LJ notes...

The code below has been tested with
* 'UN-population-data-2017.csv'
* 'PRIMAP-hist_v2.0_Energy-CO2.csv'

TODOs
* clearly define 'proc-data' file format
* decide on final plots
* automatically include the source in plots somehow
 

In [9]:
# USER INPUT

# First, choose which file you want to plot the data for
#data_file_name = 'UN-population-data-2017.csv'
#data_file_name = 'PRIMAP-hist_v2.0_Energy-CO2.csv'
data_file_name = 'PRIMAP-hist_v2.0_KyotoGHG-AR4-total-excl-LU.csv'
#data_file_name = 'PRIMAP-hist_UN-2017_calc__CO2-per-population.csv'
#data_file_name = 'WDI2017_GDP-PPP.csv'

# Second, choose which years you are interested in analysing
years_of_interest = ['1990', '2000', '2014']

# set the following to True if plots should be saved. If False, plots will be shown on screen.
save_opt = False

In [10]:
# DATA READING AND PREP

# read the data from file 
fname_in = os.path.join('proc-data', data_file_name)
data = pd.read_csv(fname_in)

# Check the data format
if not utils.verify_data_format(data):
    print('WARNING: The data is not correctly formatted! Please check before continuing!')

# extract the key information
variable = data['variable'].unique()[0]
unit = data['unit'].unique()[0]

# tidy up for next steps
data_years = utils.set_countries_as_index(data)
# keep only the year columns that have data for every country
data_years = data_years.dropna(axis=1, how='any')

# uncomment the line below to display the data
#data_years
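
The 'proc-data' file format is still listed as a TODO above, so the following is only a rough sketch of what the cell appears to rely on: a `country` column, `variable` and `unit` columns, and one column per year. The function `looks_like_proc_data` and the example values are hypothetical; this is not the implementation of `utils.verify_data_format`.

```python
import pandas as pd


def looks_like_proc_data(df: pd.DataFrame) -> bool:
    """Hypothetical format check, inferred from how the notebook uses the data."""
    required = {'country', 'variable', 'unit'}
    if not required.issubset(df.columns):
        return False
    # assume year columns are four-digit strings such as '1990'
    year_cols = [c for c in df.columns if str(c).isdigit() and len(str(c)) == 4]
    return len(year_cols) > 0


# made-up example data, not taken from any real dataset
example = pd.DataFrame({
    'country': ['AAA', 'BBB'],
    'variable': ['demo', 'demo'],
    'unit': ['kt', 'kt'],
    '1990': [1.0, 2.0],
    '2000': [1.5, None],
})

ok = looks_like_proc_data(example)

# mimic the tidy-up step: countries as index, years as columns
years_only = example.set_index('country').drop(columns=['variable', 'unit'])
# how='any' drops a whole year as soon as one country has no value for it
complete = years_only.dropna(axis=1, how='any')
print(ok, list(complete.columns))
```

Note that `dropna(axis=1, how='any')` removes a year for all countries if a single country is missing a value, which may explain why some files yield only a few year columns.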

In [13]:
# Plot 1 - make a histogram of absolute data

for selected_year in years_of_interest:
    make_histogram(data_years[selected_year], variable, unit, remove_outliers=True, save_plot=save_opt)


-----------
Identifying and removing outliers
lower outliers are:
Series([], Name: 1990, dtype: float64)
upper outliers are: 
country
AUS     423000.0
BRA     606000.0
CAN     607000.0
CHN    3600000.0
DEU    1260000.0
ESP     294000.0
FRA     554000.0
GBR     814000.0
IDN     354000.0
IND    1150000.0
IRN     382000.0
ITA     525000.0
JPN    1270000.0
KAZ     341000.0
KOR     316000.0
MEX     423000.0
NGA     302000.0
POL     477000.0
RUS    3780000.0
UKR     952000.0
USA    6510000.0
ZAF     374000.0
Name: 1990, dtype: float64
---
-----------
Identifying and removing outliers
lower outliers are:
Series([], Name: 2000, dtype: float64)
upper outliers are: 
country
AUS     488000.0
BRA     803000.0
CAN     736000.0
CHN    4890000.0
DEU    1060000.0
ESP     393000.0
FRA     558000.0
GBR     726000.0
IDN     525000.0
IND    1640000.0
IRN     525000.0
ITA     560000.0
JPN    1380000.0
KOR     517000.0
MEX     550000.0
POL     397000.0
RUS    2280000.0
SAU     345000.0
TUR     301000.0
UKR 
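
The `kTuk` argument used later in this notebook suggests that `make_histogram` identifies outliers with Tukey's fences, but the source of `gst_tools` is not shown here, so that is an assumption. A minimal sketch of the technique follows; the function name `split_outliers` and the default `k=1.5` (the conventional choice) are illustrative only.

```python
import pandas as pd


def split_outliers(series: pd.Series, k: float = 1.5):
    """Split a series into kept values, lower outliers and upper outliers
    using Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    lower = series[series < lower_fence]
    upper = series[series > upper_fence]
    kept = series[(series >= lower_fence) & (series <= upper_fence)]
    return kept, lower, upper


# made-up values with one obviously large entry
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
                   index=list('ABCDEF'), name='demo')
kept, lower, upper = split_outliers(values)
print('lower outliers are:', list(lower.index))
print('upper outliers are:', list(upper.index))
```

For strongly skewed data such as national emissions totals, a rule of this kind flags many of the largest emitters as upper outliers, which is consistent with the output above. A larger `k` widens the fences and removes fewer countries.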

In [8]:
# Plot 2 - trends

# Calculate trends and define plotting params    
# TODO - improve description here. 
trends, rolling_trends, trends_unit = utils.calculate_trends(data_years, num_years_trend=5)
trends_variable = 'Annual average change in ' + variable

# plot the trend in the final year
make_histogram(rolling_trends.iloc[:,-1], trends_variable, trends_unit, save_plot=save_opt)


Averaging trend over 5 years.
bins set to range(-15, 15)
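
How `utils.calculate_trends` works internally is not shown in this notebook. One plausible reading of "Averaging trend over 5 years" is the mean year-on-year percentage change over a rolling window, sketched below. The function `rolling_percent_trend` and the data are hypothetical, and the real function may compute the trend differently.

```python
import pandas as pd


def rolling_percent_trend(df: pd.DataFrame, num_years: int = 5) -> pd.DataFrame:
    """Mean annual percentage change over a rolling window.
    Expects countries as rows and years as columns."""
    by_year = df.T  # years as rows, so changes run down the rows
    annual_change = by_year.pct_change(fill_method=None) * 100
    rolling = annual_change.rolling(window=num_years).mean()
    return rolling.T  # back to countries as rows


years = [str(y) for y in range(2000, 2007)]
demo = pd.DataFrame(
    [[100 * 1.1 ** i for i in range(7)],  # grows 10 % every year
     [50.0] * 7],                         # stays constant
    index=pd.Index(['AAA', 'BBB'], name='country'),
    columns=years,
)

trend = rolling_percent_trend(demo, num_years=5)
final_year_trend = trend.iloc[:, -1]
print(final_year_trend)
```

With a five-year window, the first five year columns of the result are empty (NaN) because five annual changes are needed before an average can be formed. That is one reason to plot only the final column, as the cell above does.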


In [12]:
# Plot 3 - change since year X

# run calculations
df_abs_diff_1990, df_perc_diff_1990 = utils.calculate_diff_since_yearX(data_years, '1990')
df_abs_diff_2005, df_perc_diff_2005 = utils.calculate_diff_since_yearX(data_years, '2005')

# make plots
# TODO - titles currently missing necessary information here!
for selected_year in years_of_interest:
    make_histogram(df_perc_diff_1990[selected_year], "change since 1990", "%", remove_outliers=True, kTuk=3, save_plot=save_opt)

make_histogram(df_perc_diff_2005.iloc[:,-1], "change since 2005", '%', remove_outliers=False, save_plot=save_opt)
make_histogram(df_perc_diff_2005.iloc[:,-1], "change since 2005", '%', remove_outliers=True, save_plot=save_opt)


Calculating difference compared to 1990
Calculating difference compared to 2005
---------
All values in the series are the same! Exiting plotting routine for change since 1990
---------
-----------
Identifying and removing outliers
lower outliers are:
Series([], Name: 2000, dtype: float64)
upper outliers are: 
country
BOL     188.439306
GNQ    2801.477833
NER     164.248705
SYC     191.517857
TLS     165.700483
ZWE     155.434783
Name: 2000, dtype: float64
---
bins set to range(-156, 156, 12)
-----------
Identifying and removing outliers
lower outliers are:
Series([], Name: 2014, dtype: float64)
upper outliers are: 
country
GNQ    5909.852217
MDV     584.466019
QAT     538.297872
Name: 2014, dtype: float64
---
bins set to range(-444, 444, 37)
bins set to range(-210, 210, 14)
-----------
Identifying and removing outliers
lower outliers are:
Series([], Name: 2016, dtype: float64)
upper outliers are: 
country
NIU    192.957746
SGP    187.234043
Name: 2016, dtype: float64
---
bins set to r
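
The change relative to a base year can be sketched as follows. This is an illustration of the arithmetic under the assumption that the data has countries as rows and years as columns; `diff_since` is a hypothetical stand-in, not the code of `utils.calculate_diff_since_yearX`.

```python
import pandas as pd


def diff_since(df: pd.DataFrame, base_year: str):
    """Absolute and percentage difference of every year relative to base_year."""
    base = df[base_year]
    abs_diff = df.sub(base, axis=0)             # value - base, per country
    perc_diff = abs_diff.div(base, axis=0) * 100
    return abs_diff, perc_diff


# made-up example data
demo = pd.DataFrame(
    {'1990': [100.0, 50.0], '2000': [120.0, 25.0]},
    index=pd.Index(['AAA', 'BBB'], name='country'),
)

abs_diff, perc_diff = diff_since(demo, '1990')
print(perc_diff)
```

The base-year column of the percentage result is zero for every country. This matches the message "All values in the series are the same!" in the output above: the loop over `years_of_interest` includes 1990 itself, so the first histogram is skipped.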

## Below here is code for testing and debugging!

In [None]:
# read the data from file 
fname_in = os.path.join('proc-data', data_file_name)
data = pd.read_csv(fname_in)

data.columns

In [5]:
data_years

country,2008,2009,2010,2011
AFG,3.076000e+10,3.723000e+10,4.037000e+10,4.284000e+10
AGO,1.129000e+11,1.157000e+11,1.196000e+11,1.243000e+11
ALB,2.342000e+10,2.421000e+10,2.511000e+10,2.575000e+10
ARE,4.311000e+11,4.085000e+11,4.151000e+11,4.368000e+11
ARG,6.464000e+11,6.081000e+11,6.697000e+11,7.099000e+11
ARM,1.908000e+10,1.638000e+10,1.674000e+10,1.753000e+10
ATG,1.907000e+09,1.678000e+09,1.558000e+09,1.530000e+09
AUS,7.617000e+11,7.755000e+11,7.912000e+11,8.100000e+11
AUT,3.197000e+11,3.076000e+11,3.135000e+11,3.223000e+11
AZE,1.093000e+11,1.196000e+11,1.254000e+11,1.255000e+11
