# Visualisation and data analysis

In this notebook we load the `Xy` data, which contains 3972 rows. Each row corresponds to one run: 4 scenarios times 993 SOWs. Each row holds a few dozen variables: "input" variables, which describe the SOW and the scenario, and "output" variables, which describe the outputs of the simulation at 2050 for that run.

We create some visualisations. Many more are available in the original TIMES papers.

In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [10]:
import matplotlib
matplotlib.rcParams['pdf.fonttype'] = 42 # helps produce ACM-compliant figures
matplotlib.rcParams['ps.fonttype'] = 42

In [11]:
Xy = pd.read_csv("../outputs/data_Xy.csv", index_col=0)
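
The row count quoted above can be checked against the design: if every scenario is run for every SOW exactly once, the number of rows equals the number of scenarios times the number of SOWs (here 4 × 993 = 3972). A minimal sketch, using a small synthetic frame as a stand-in for `Xy`; the helper name `check_full_factorial` and the scenario labels other than `BASE_SSP2` are hypothetical:

```python
import pandas as pd

def check_full_factorial(df, scenario_col="Scenario", sow_col="SOW"):
    """Return True if every (scenario, SOW) pair occurs exactly once."""
    n_scenarios = df[scenario_col].nunique()
    n_sows = df[sow_col].nunique()
    # No pair may be repeated, and the row count must match the full grid.
    no_duplicates = not df.duplicated([scenario_col, sow_col]).any()
    return no_duplicates and len(df) == n_scenarios * n_sows

# Synthetic stand-in: 4 scenarios x 5 SOWs (the real data has 993 SOWs).
toy = pd.DataFrame(
    [(s, w) for s in ["BASE_SSP2", "S2", "S3", "S4"] for w in range(1, 6)],
    columns=["Scenario", "SOW"],
)
print(check_full_factorial(toy))            # complete grid
print(check_full_factorial(toy.iloc[:-1]))  # one run missing
```

On the real data the same call would be `check_full_factorial(Xy)`.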

In [12]:
Xy.columns

Index(['Scenario', 'SOW', 'Temp_Limit', 'Delay', 'GDP', 'Pop',
       'Other_ESD_Drivers', 'SDR', 'Elast_ESD_Driver', 'Elast_ESD_Price',
       'CO2_Storage_Poten', 'Wind_Poten', 'Solar_Poten', 'Biomass_Poten',
       'Oil_Gas_Poten', 'Solar_PV_Inv_Cost', 'Wind_Inv_Cost',
       'Bioenergy_CCS_Inv_Cost', 'Other_Tech_Cost', 'Forcing', 'Land_Sinks',
       'Clim_Sens', 'Year', 'Temp_Change', 'Rad_Forcing', 'CO2_Conc',
       'CH4_Conc', 'N20_Conc', 'Carbon', 'CO2eq', 'Marg_CO2_Cost', 'GSupply',
       'GSupply_Bioenergy', 'GSupply_Fossil', 'GSupply_Geothermal',
       'GSupply_Nuclear', 'GSupply_Solar', 'GSupply_Tidal_Waves',
       'GSupply_Wind', 'GConsumption', 'GConsumption_Fossil',
       'GConsumption_Nuclear', 'GConsumption_Renewable', 'GCost'],
      dtype='object')

In [13]:
Xy.head()

Unnamed: 0,Scenario,SOW,Temp_Limit,Delay,GDP,Pop,Other_ESD_Drivers,SDR,Elast_ESD_Driver,Elast_ESD_Price,...,GSupply_Geothermal,GSupply_Nuclear,GSupply_Solar,GSupply_Tidal_Waves,GSupply_Wind,GConsumption,GConsumption_Fossil,GConsumption_Nuclear,GConsumption_Renewable,GCost
0,BASE_SSP2,1,8.0,0,1.02329,1.00928,1.0631,3.70267,1.01302,1.21686,...,1.907061,4.167604,5.115586,2.424417,5.161512,812.092289,539.893683,43.075844,229.122762,24.788425
1,BASE_SSP2,2,8.0,0,1.03673,1.04802,1.09227,3.12322,0.94558,0.968287,...,1.90606,4.102991,4.179275,2.396469,4.874241,767.103047,487.040394,42.410591,237.652063,23.041618
2,BASE_SSP2,3,8.0,0,1.1054,1.0364,1.06594,4.67141,1.05255,1.11485,...,1.931578,4.383345,5.704254,2.508125,6.629379,891.257378,629.014809,45.297113,216.945456,27.872416
3,BASE_SSP2,4,8.0,0,1.1909,1.06919,1.06402,5.18396,1.0442,0.87273,...,1.937528,4.148554,6.287348,2.4755,5.821404,934.818295,642.963633,42.879712,248.974949,29.159699
4,BASE_SSP2,5,8.0,0,1.00481,0.997576,1.12753,3.46723,1.08794,1.1188,...,1.915827,4.226128,5.810294,2.516766,6.502266,874.573547,592.973597,43.678409,237.92154,26.175837


In [None]:
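import ast
with open('../data/column_name_renames.txt') as f:
    renames = ast.literal_eval(f.read())  # safer than eval for a dict literal
renames_inv = {short: long for long, short in renames.items()}  # short name -> long name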

In [22]:
renames

{'Scenario': 'Scenario',
 'SOW': 'SOW',
 'Temperature Limit': 'Temp_Limit',
 'Delayed Action': 'Delay',
 'GDP': 'GDP',
 'Population': 'Pop',
 'Other Energy Service Demand Drivers': 'Other_ESD_Drivers',
 'Social discount rate': 'SDR',
 'Elasticity of energy service demand to its own driver': 'Elast_ESD_Driver',
 'Elasticity of energy service demand to its own price': 'Elast_ESD_Price',
 'CO2 Storage Potential': 'CO2_Storage_Poten',
 'Wind Potential': 'Wind_Poten',
 'Solar Potential': 'Solar_Poten',
 'Biomass Potential': 'Biomass_Poten',
 'Oil & Gas Potential': 'Oil_Gas_Poten',
 'Solar PV  Investment Cost': 'Solar_PV_Inv_Cost',
 'Wind Investment Cost': 'Wind_Inv_Cost',
 'Bioenergy with CCS Specific Investment Cost': 'Bioenergy_CCS_Inv_Cost',
 'Other technologies costs': 'Other_Tech_Cost',
 'Forcing of non-energy emissions': 'Forcing',
 'Land Use, Land Use Change and Forestry Sinks': 'Land_Sinks',
 'Climate Sensitivity (in oC)': 'Clim_Sens',
 'Year': 'Year',
 'Temperature change (oC)': 'T

In [23]:
renames_inv

{'Scenario': 'Scenario',
 'SOW': 'SOW',
 'Temp_Limit': 'Temperature Limit',
 'Delay': 'Delayed Action',
 'GDP': 'GDP',
 'Pop': 'Population',
 'Other_ESD_Drivers': 'Other Energy Service Demand Drivers',
 'SDR': 'Social discount rate',
 'Elast_ESD_Driver': 'Elasticity of energy service demand to its own driver',
 'Elast_ESD_Price': 'Elasticity of energy service demand to its own price',
 'CO2_Storage_Poten': 'CO2 Storage Potential',
 'Wind_Poten': 'Wind Potential',
 'Solar_Poten': 'Solar Potential',
 'Biomass_Poten': 'Biomass Potential',
 'Oil_Gas_Poten': 'Oil & Gas Potential',
 'Solar_PV_Inv_Cost': 'Solar PV  Investment Cost',
 'Wind_Inv_Cost': 'Wind Investment Cost',
 'Bioenergy_CCS_Inv_Cost': 'Bioenergy with CCS Specific Investment Cost',
 'Other_Tech_Cost': 'Other technologies costs',
 'Forcing': 'Forcing of non-energy emissions',
 'Land_Sinks': 'Land Use, Land Use Change and Forestry Sinks',
 'Clim_Sens': 'Climate Sensitivity (in oC)',
 'Year': 'Year',
 'Temp_Change': 'Temperature c

In [25]:
for col in Xy.columns:
    print(f'{col}: {renames_inv[col]}: distinct values {len(set(Xy[col].values))}. NaNs {Xy[col].isna().sum()}')

Scenario: Scenario: distinct values 4. NaNs 0
SOW: SOW: distinct values 993. NaNs 0
Temp_Limit: Temperature Limit: distinct values 3. NaNs 0
Delay: Delayed Action: distinct values 3. NaNs 0
GDP: GDP: distinct values 993. NaNs 0
Pop: Population: distinct values 993. NaNs 0
Other_ESD_Drivers: Other Energy Service Demand Drivers: distinct values 993. NaNs 0
SDR: Social discount rate: distinct values 993. NaNs 0
Elast_ESD_Driver: Elasticity of energy service demand to its own driver: distinct values 993. NaNs 0
Elast_ESD_Price: Elasticity of energy service demand to its own price: distinct values 993. NaNs 0
CO2_Storage_Poten: CO2 Storage Potential: distinct values 993. NaNs 0
Wind_Poten: Wind Potential: distinct values 993. NaNs 0
Solar_Poten: Solar Potential: distinct values 993. NaNs 0
Biomass_Poten: Biomass Potential: distinct values 993. NaNs 0
Oil_Gas_Poten: Oil & Gas Potential: distinct values 992. NaNs 0
Solar_PV_Inv_Cost: Solar PV  Investment Cost: distinct values 993. NaNs 0
Wind
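
The same per-column summary can be collected into a single table, which is easier to sort and filter than printed lines. A sketch on a toy frame (the column values are made up for illustration); `nunique` and `isna` are standard pandas methods:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Scenario": ["A", "A", "B", "B"],
    "GDP": [1.0, 1.1, 1.0, np.nan],
})
summary = pd.DataFrame({
    "distinct": toy.nunique(dropna=True),  # NaN is not counted as a value
    "nans": toy.isna().sum(),
})
print(summary)
```

Sorting `summary` by `distinct` would quickly surface columns such as `Oil_Gas_Poten`, which has one fewer distinct value than the number of SOWs.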

# Data Visualisation and Exploration

## Which scenarios are the worst news?



In [17]:
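yvars = ['GCost', 'CO2eq', 'GConsumption', 'GSupply']
xvars = ['SDR', 'Clim_Sens', 'Pop', 'GDP']
# One strip plot per (output, input) pair: scenarios on the x-axis,
# the output on the y-axis, points coloured by the input variable.
for y in yvars:
    for x in xvars:
        sns.stripplot(data=Xy, x='Scenario', y=y, hue=x)  # data= works across seaborn versions
        plt.savefig(f'../outputs/dotplot_{y}_v_{x}_v_scenarios.pdf')
        plt.close()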
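
A numeric summary can complement the strip plots when ranking scenarios. The sketch below computes per-scenario medians of the output variables on a synthetic stand-in for `Xy`. The values and the second scenario label are invented for illustration, and whether a high or a low value counts as bad news depends on the variable:

```python
import pandas as pd

toy = pd.DataFrame({
    "Scenario": ["BASE_SSP2", "BASE_SSP2", "OTHER", "OTHER"],
    "GCost": [24.0, 28.0, 30.0, 34.0],
    "CO2eq": [50.0, 60.0, 20.0, 30.0],
})
yvars = ["GCost", "CO2eq"]
# Median of each output variable within each scenario
medians = toy.groupby("Scenario")[yvars].median()
print(medians.sort_values("GCost", ascending=False))
```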


## Pair plots

Next we'll look at grids of scatter plots:

1. Inputs versus outputs. We see few input variables are directly related to outputs.

2. Input variables versus each other. We see no relationships at all, because all of the input variables are just sampled.

3. Output variables versus each other. We see very interesting relationships.
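
The visual impressions in the list above can be cross-checked with rank correlations, computed within a single scenario so that the scenario-encoding variables do not interfere. A sketch on synthetic data; the generated numbers and the dependence of `GCost` on `GDP` are assumptions made up for the example, not properties of the real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    "GDP": rng.uniform(1.0, 1.2, n),  # independently sampled inputs
    "Pop": rng.uniform(1.0, 1.1, n),
})
# Hypothetical output driven by GDP plus a little noise
toy["GCost"] = 20 * toy["GDP"] + rng.normal(0, 0.1, n)

corr = toy.corr(method="spearman")
print(corr.round(2))
```

Independently sampled inputs should show correlations near zero, while an output that depends on an input shows a correlation well away from zero.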

An important point to note is that there are four discrete scenarios, so `Delayed Action` and `Temperature Limit` take only a few discrete values (three distinct values each in this dataset). These two variables are numeric in type but effectively categorical: they are not sampled over a range. As a result, the plots show interesting-looking effects that arise only because every data point belongs to one scenario or another, never in between. Those effects are, in a sense, not real.
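
One way to confirm that these two variables merely encode the scenario is to count how many distinct values each takes within every scenario; a count of 1 everywhere means the variable is constant per scenario. A sketch with a synthetic stand-in for `Xy` (only the `BASE_SSP2` values of `Temp_Limit` and `Delay` come from the table shown earlier; the other label and all remaining values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "Scenario": ["BASE_SSP2", "BASE_SSP2", "HYPOTHETICAL", "HYPOTHETICAL"],
    "Temp_Limit": [8.0, 8.0, 2.0, 2.0],
    "Delay": [0, 0, 10, 10],
    "GDP": [1.02, 1.04, 1.02, 1.04],
})
# Number of distinct values of each variable within each scenario
per_scenario = toy.groupby("Scenario")[["Temp_Limit", "Delay", "GDP"]].nunique()
print(per_scenario)

# Columns that are constant within every scenario
constant = per_scenario.columns[(per_scenario == 1).all()]
print(list(constant))
```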


In [28]:
x_vars = ['Temperature Limit', 'Delayed Action', 'GDP', 'Population',
       'Other Energy Service Demand Drivers', 'Social discount rate',
       'Elasticity of energy service demand to its own driver',
       'Elasticity of energy service demand to its own price',
       'CO2 Storage Potential', 'Wind Potential', 'Solar Potential',
       'Biomass Potential', 'Oil & Gas Potential', 'Solar PV  Investment Cost',
       'Wind Investment Cost', 'Bioenergy with CCS Specific Investment Cost',
       'Other technologies costs', 'Forcing of non-energy emissions',
       'Land Use, Land Use Change and Forestry Sinks',
       'Climate Sensitivity (in oC)'
       ]

y_vars = ['Temperature change (oC)', 'Radiative Forcing (W/sqm)',
       'CO2 concentrations (PPM)', 'CH4 Concentrations (ppb)',
       'N2O concentrations (ppb)', 'Carbon (Gt)', 'CO2 (Gt-eq)',
       'Marginal CO2 cost (USD/t)', 'Global Total Electricity Supply (EJ/yr.)',
       'Global Electricty Supply from Bioenergy (EJ/yr.)',
       'Global Electricity Supply from Fossil Fuels (EJ/yr.)',
       'Global Electricity Supply from Geothermal (EJ/yr.)',
       'Global Electricity Supply from Nuclear (EJ/yr.)',
       'Global Electricity Supply from Solar (EJ/yr.)',
       'Global Electricity Supply from Tidal/Waves (EJ/yr.)',
       'Global Electricity Supply from Wind (EJ/yr.)',
       'Global Total Primary Energy Consumption (EJ/yr.)',
       'Global Primary Energy Consumption of Fossil Fuels (EJ/yr.)',
       'Global Primary Energy Consumption of Nuclear energy (EJ/yr.)',
       'Global Primary Energy Consumption of Renewable energy (EJ/yr.)',
       'Annual Total Global Energy System Cost (Trillion USD)']
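
These long names are the keys of `renames`, whereas the columns of `Xy` use the short names. To select columns by long name, the frame would first need relabelling, for example with `renames_inv`. A sketch with a two-entry excerpt of the mapping and a toy frame whose values are made up:

```python
import pandas as pd

renames = {"Population": "Pop", "Social discount rate": "SDR"}  # excerpt
# Inverting is only safe if the short names are unique.
assert len(set(renames.values())) == len(renames)
renames_inv = {short: long for long, short in renames.items()}

toy = pd.DataFrame({"Pop": [1.01, 1.05], "SDR": [3.7, 3.1]})
toy_long = toy.rename(columns=renames_inv)
print(list(toy_long.columns))  # ['Population', 'Social discount rate']
```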


In [26]:
x_vars_small = ['SDR', 'Clim_Sens', 'Pop', 'GDP']
y_vars_small = ['GSupply', 'CO2eq', 'GConsumption', 'GCost']

We run all of these and save the results rather than displaying them in the notebook. The plots are a bit large and unwieldy, so saving them as external files is better. Saving as PNG (rather than PDF) is the right choice here, because the number of individual dots would make the PDFs themselves unwieldy.
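
A sketch of the trade-off, assuming a non-interactive backend and a temporary output directory (the point count, `dpi` value and file names are arbitrary choices for the example). A PNG stores pixels, so its size is bounded by the image resolution, while a vector PDF records the markers individually:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 20_000))

fig, ax = plt.subplots()
ax.scatter(x, y, s=2)

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "scatter.png")
pdf_path = os.path.join(out_dir, "scatter.pdf")
fig.savefig(png_path, dpi=150)  # raster: size depends on resolution
fig.savefig(pdf_path)           # vector: size can grow with the number of markers
plt.close(fig)

# Compare the resulting file sizes
for path in (png_path, pdf_path):
    print(os.path.basename(path), os.path.getsize(path), "bytes")
```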

In [27]:
for scenario in Xy['Scenario'].unique():
    Xy_sub = Xy[Xy['Scenario'] == scenario]
    # inputs versus outputs
    sns.pairplot(Xy_sub, x_vars=x_vars_small, y_vars=y_vars_small)
    plt.savefig(f'../outputs/pairplot_Xy_{scenario}_subset.png')
    plt.close()
    # inputs versus inputs: only the scenario-encoding variables actually vary
    # between scenarios in the X-X case, but we will make the plots anyway
    sns.pairplot(Xy_sub, x_vars=x_vars_small, y_vars=x_vars_small)
    plt.savefig(f'../outputs/pairplot_XX_{scenario}_subset.png')
    plt.close()
    # outputs versus outputs, restricted to the current scenario (Xy_sub, not the full Xy)
    sns.pairplot(Xy_sub, x_vars=y_vars_small, y_vars=y_vars_small)
    plt.savefig(f'../outputs/pairplot_yy_{scenario}_subset.png')
    plt.close()

In [30]:
# Emit the output variable names as LaTeX \item lines
for y_var in y_vars:
    print(f'\\item {y_var}')

\item Temperature change (oC)
\item Radiative Forcing (W/sqm)
\item CO2 concentrations (PPM)
\item CH4 Concentrations (ppb)
\item N2O concentrations (ppb)
\item Carbon (Gt)
\item CO2 (Gt-eq)
\item Marginal CO2 cost (USD/t)
\item Global Total Electricity Supply (EJ/yr.)
\item Global Electricty Supply from Bioenergy (EJ/yr.)
\item Global Electricity Supply from Fossil Fuels (EJ/yr.)
\item Global Electricity Supply from Geothermal (EJ/yr.)
\item Global Electricity Supply from Nuclear (EJ/yr.)
\item Global Electricity Supply from Solar (EJ/yr.)
\item Global Electricity Supply from Tidal/Waves (EJ/yr.)
\item Global Electricity Supply from Wind (EJ/yr.)
\item Global Total Primary Energy Consumption (EJ/yr.)
\item Global Primary Energy Consumption of Fossil Fuels (EJ/yr.)
\item Global Primary Energy Consumption of Nuclear energy (EJ/yr.)
\item Global Primary Energy Consumption of Renewable energy (EJ/yr.)
\item Annual Total Global Energy System Cost (Trillion USD)
