# ScmDataFrame

This notebook gives an overview of the capabilities of OpenSCM's `ScmDataFrame` class.
This class is a wrapper around the *pandas* [DataFrame class](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), a high-level data structure and data analysis tool.
By wrapping a DataFrame, we can make the most of the tools provided by *pandas* whilst adding our own requirements on top.

## Imports

In [1]:
import traceback

from pint.errors import DimensionalityError

from openscm.scmdataframe import ScmDataFrame

## Loading data

An `ScmDataFrame` can read many different data types and be loaded in many different ways.
For a full explanation, see the docstring of `ScmDataFrame`'s `__init__` method.

In [2]:
# uncomment the line below and then run the cell to see the docstring
# ScmDataFrame.__init__?

Here we load data from a file.

*Note:* here we load RCP26 emissions data. This originally came from http://www.pik-potsdam.de/~mmalte/rcps/ and has since been re-written, using the [pymagicc](https://github.com/openclimatedata/pymagicc) library, into a format which can be read by OpenSCM. We are not currently planning on importing Pymagicc's readers into OpenSCM by default. Please raise an issue if you would like us to consider doing so.

In [3]:
rcp26 = ScmDataFrame("rcp26_emissions.csv")

## Timeseries

`ScmDataFrame` is ideally suited to working with timeseries data.
Indeed, it always stores data in 'wide' format, with each row representing one timeseries and metadata being contained in the row labels.
The `timeseries` method allows you to easily get the data back in this format as a *pandas* `DataFrame`.

In [4]:
rcp26.timeseries().head()

model,scenario,region,variable,unit,climate_model,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|BC,Mt BC / yr,unspecified,0.0,0.106998,0.133383,0.159847,0.186393,0.213024,0.239742,0.26655,0.29345,0.320446,...,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578
IMAGE,RCP26,World,Emissions|C2F6,kt C2F6 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857
IMAGE,RCP26,World,Emissions|C6F14,kt C6F14 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887
IMAGE,RCP26,World,Emissions|CCl4,kt CCl4 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
IMAGE,RCP26,World,Emissions|CF4,kt CF4 / yr,unspecified,0.010763,0.010752,0.010748,0.010744,0.01074,0.010736,0.010731,0.010727,0.010723,0.010719,...,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092


In [5]:
type(rcp26.timeseries())

pandas.core.frame.DataFrame
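
To make the 'wide' layout concrete, here is a minimal pure-Python sketch of pivoting long records (one row per time point) into one entry per timeseries, keyed by its metadata. It only illustrates the layout and is not how `ScmDataFrame` is implemented. The `to_wide` helper and the record structure are assumptions made for this example, and years are written as plain integers for brevity. The values are the first two time points of two rows from the table above.

```python
from collections import defaultdict

# Long-format records: (model, scenario, region, variable, unit, year, value).
# This record structure is an assumption made for illustration only.
long_records = [
    ("IMAGE", "RCP26", "World", "Emissions|BC", "Mt BC / yr", 1765, 0.0),
    ("IMAGE", "RCP26", "World", "Emissions|BC", "Mt BC / yr", 1766, 0.106998),
    ("IMAGE", "RCP26", "World", "Emissions|CF4", "kt CF4 / yr", 1765, 0.010763),
    ("IMAGE", "RCP26", "World", "Emissions|CF4", "kt CF4 / yr", 1766, 0.010752),
]


def to_wide(records):
    """Pivot long records into {metadata tuple: {year: value}}."""
    wide = defaultdict(dict)
    for *meta, year, value in records:
        # The metadata tuple plays the role of the row label
        wide[tuple(meta)][year] = value
    return dict(wide)


wide = to_wide(long_records)
for meta, series in wide.items():
    print(meta, series)
```

Each key corresponds to one row label and each inner dictionary to that row's values across the time columns.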

## Unit conversion

The OpenSCM package has an inbuilt unit registry and uses the [Pint](https://github.com/hgrecco/pint) library to handle unit conversion. 

Calling the `convert_unit` method of an `ScmDataFrame` returns a new `ScmDataFrame` instance with converted units.

In [6]:
rcp26.filter(variable="Emissions|BC").timeseries()

model,scenario,region,variable,unit,climate_model,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|BC,Mt BC / yr,unspecified,0.0,0.106998,0.133383,0.159847,0.186393,0.213024,0.239742,0.26655,0.29345,0.320446,...,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578


In [7]:
rcp26.filter(variable="Emissions|BC").convert_unit("kg BC / day").timeseries()

model,scenario,region,variable,unit,climate_model,unit_context,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|BC,kg BC / day,unspecified,,0.0,292950.815533,365189.456325,437645.952558,510327.012109,583239.644025,656390.966871,729788.373001,803439.473804,877352.099701,...,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0,9193352.0
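
The arithmetic behind this conversion can be checked by hand. The sketch below is plain Python and does not use OpenSCM or Pint. The year length of 365.24219 days is an assumption chosen because it is consistent with the output above; the registry may define the year slightly differently. The function name is hypothetical.

```python
# Assumption: one year is taken as 365.24219 days (roughly a tropical year).
# The unit registry used by OpenSCM may define the year slightly differently.
DAYS_PER_YEAR = 365.24219
KG_PER_MT = 1e9  # 1 Mt = 1e6 t = 1e9 kg


def mt_per_yr_to_kg_per_day(value):
    """Convert a flux from Mt / yr to kg / day."""
    return value * KG_PER_MT / DAYS_PER_YEAR


# Final value of the Emissions|BC timeseries shown above (3.3578 Mt BC / yr)
print(mt_per_yr_to_kg_per_day(3.3578))
```

Under that assumption the result is roughly 9.19 million kg BC / day, in line with the final column of the converted output.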


Note that you must filter your data first, as the unit conversion is applied to all available variables. If you do not, you will receive a `DimensionalityError`.

In [8]:
try:
    rcp26.convert_unit("kg BC / day").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)

Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'C * gigametric_ton / a' ([carbon] * [mass] / [time]) to 'BC * kilogram / day' ([black_carbon] * [mass] / [time])


Having said this, thanks to Pint's idea of contexts, we are able to trivially convert to CO<sub>2</sub> equivalent units (as long as we restrict our conversion to variables which have a CO<sub>2</sub> equivalent).

In [11]:
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).timeseries()

model,scenario,region,variable,unit,climate_model,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|CH4,Mt CH4 / yr,unspecified,0.0,1.963262,2.436448,2.911105,3.387278,3.865015,4.344362,4.825372,5.308094,5.792582,...,142.0527,142.0527,142.0527,142.0527,142.0527,142.0527,142.0527,142.0527,142.0527,142.0527
IMAGE,RCP26,World,Emissions|CO2|MAGICC AFOLU,Gt C / yr,unspecified,0.0,0.005338,0.010677,0.016015,0.021353,0.026691,0.03203,0.037368,0.042706,0.048045,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
IMAGE,RCP26,World,Emissions|CO2|MAGICC Fossil and Industrial,Gt C / yr,unspecified,0.003,0.003,0.003,0.003,0.003,0.003,0.004,0.004,0.004,0.004,...,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308,-0.9308
IMAGE,RCP26,World,Emissions|N2O,Mt N2ON / yr,unspecified,0.0,0.005191,0.010117,0.015043,0.019969,0.024896,0.029822,0.03475,0.039677,0.044605,...,5.2823,5.2823,5.2823,5.2823,5.2823,5.2823,5.2823,5.2823,5.2823,5.2823


In [13]:
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", 
    context="AR4GWP100"
).timeseries()

model,scenario,region,variable,unit,climate_model,unit_context,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|CH4,Mt CO2 / yr,unspecified,AR4GWP100,0.0,49.081547,60.911202,72.777625,84.681955,96.625365,108.609062,120.634295,132.702345,144.81454,...,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175,3551.3175
IMAGE,RCP26,World,Emissions|CO2|MAGICC AFOLU,Mt CO2 / yr,unspecified,AR4GWP100,0.0,19.573753,39.147508,58.72126,78.295012,97.868767,117.442519,137.016271,156.590027,176.163779,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
IMAGE,RCP26,World,Emissions|CO2|MAGICC Fossil and Industrial,Mt CO2 / yr,unspecified,AR4GWP100,11.0,11.0,11.0,11.0,11.0,11.0,14.666666,14.666666,14.666666,14.666666,...,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333,-3412.933333
IMAGE,RCP26,World,Emissions|N2O,Mt CO2 / yr,unspecified,AR4GWP100,0.0,2.430911,4.737559,7.04433,9.351227,11.658254,13.965417,16.272717,18.580161,20.887751,...,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629,2473.625629
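
The numbers in this output can be reproduced with simple arithmetic. The sketch below is plain Python and does not use OpenSCM or Pint. The factors are assumptions for this example rather than values read from OpenSCM's registry: the IPCC AR4 100-year global warming potentials (25 for CH4, 298 for N2O) and the mass ratios 44/12 (CO2 per unit carbon) and 44/28 (N2O per unit nitrogen in N2O). All function names are hypothetical.

```python
# Assumed conversion factors (not read from OpenSCM's unit registry)
C_TO_CO2 = 44 / 12  # mass of CO2 per unit mass of carbon
N2ON_TO_N2O = 44 / 28  # mass of N2O per unit mass of nitrogen in N2O
AR4_GWP100 = {"CH4": 25, "N2O": 298}  # IPCC AR4 100-year GWPs


def ch4_to_co2eq(mt_ch4):
    """Convert Mt CH4 / yr to Mt CO2-equivalent / yr."""
    return mt_ch4 * AR4_GWP100["CH4"]


def n2on_to_co2eq(mt_n2on):
    """Convert Mt N2O-N / yr to Mt CO2-equivalent / yr."""
    return mt_n2on * N2ON_TO_N2O * AR4_GWP100["N2O"]


def gtc_to_mtco2(gt_c):
    """Convert Gt C / yr to Mt CO2 / yr (no GWP needed, only a mass ratio)."""
    return gt_c * 1000 * C_TO_CO2


# Final-year values from the unconverted table above
print(ch4_to_co2eq(142.0527))
print(n2on_to_co2eq(5.2823))
print(gtc_to_mtco2(-0.9308))
```

With these factors the results agree with the final column of the converted output to the precision displayed. Note that the CO<sub>2</sub> rows only need the carbon to CO<sub>2</sub> mass ratio, whereas CH4 and N2O depend on the chosen GWP context.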


Without the context, a `DimensionalityError` is once again raised. The cell below also omits the filter, so the error reported is for black carbon.

In [9]:
try:
    rcp26.convert_unit("Mt CO2 / yr").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)

Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'CO2 * megametric_ton / a' ([carbon] * [mass] / [time])


## Interpolation

TODO: write this, see https://github.com/openclimatedata/openscm/issues/157

Data have to be stored on the same time axis.
If you want to interpolate, you can do so trivially.
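
Until this section is written, the idea can be sketched in plain Python. The example below linearly interpolates one timeseries onto a new time axis. It does not show OpenSCM's interpolation API: the `interpolate_linear` function is hypothetical, linear interpolation is only one possible choice, and years are written as plain numbers for brevity. The values are the first three points of the `Emissions|BC` timeseries.

```python
from bisect import bisect_right


def interpolate_linear(times, values, new_times):
    """Linearly interpolate (times, values) onto new_times.

    times must be sorted ascending with at least two points, and every
    entry of new_times must lie within [times[0], times[-1]].
    """
    result = []
    for t in new_times:
        if not times[0] <= t <= times[-1]:
            raise ValueError(f"{t} is outside the source time axis")
        # Index of the right-hand neighbour, clamped so t == times[-1] works
        hi = min(bisect_right(times, t), len(times) - 1)
        lo = hi - 1
        weight = (t - times[lo]) / (times[hi] - times[lo])
        result.append(values[lo] + weight * (values[hi] - values[lo]))
    return result


years = [1765, 1766, 1767]
bc = [0.0, 0.106998, 0.133383]  # Mt BC / yr
print(interpolate_linear(years, bc, [1765.5, 1766.5]))
```

Points outside the source time axis raise an error rather than being extrapolated, which keeps the sketch's behaviour explicit.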