# ScmDataFrame

In this notebook we give an overview of the capabilities of OpenSCM's `ScmDataFrame` class.
This class is a wrapper around *pandas*' [DataFrame class](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), a high-level data structure and data analysis tool.
By wrapping a DataFrame, we are able to make the most of the tools provided by *pandas* whilst adding our own requirements on top.
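
The wrapping idea can be illustrated with a minimal sketch. This is not OpenSCM's implementation: the class name `CheckedFrame` and the list `REQUIRED_META` are hypothetical, chosen only to show how a wrapper can impose its own requirements while still exposing the *pandas* tools underneath.

```python
import pandas as pd

# Assumed set of required metadata columns, for illustration only.
REQUIRED_META = ["model", "scenario", "region", "variable", "unit"]


class CheckedFrame:
    """Hypothetical wrapper that holds a DataFrame and enforces metadata columns."""

    def __init__(self, data):
        missing = [col for col in REQUIRED_META if col not in data.columns]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        self._data = data.copy()

    def __getattr__(self, name):
        # Only called when normal lookup fails, so anything the wrapper
        # does not define itself is delegated to the wrapped DataFrame.
        if name == "_data":
            raise AttributeError(name)
        return getattr(self._data, name)


frame = CheckedFrame(
    pd.DataFrame(
        {
            "model": ["a_model"],
            "scenario": ["a_scenario"],
            "region": ["World"],
            "variable": ["Emissions|CO2"],
            "unit": ["Gt C / yr"],
            "2010": [9.0],
        }
    )
)
print(frame.shape)  # delegated to the wrapped DataFrame
```

Composition is used here rather than subclassing so that the wrapper controls which data is accepted, while attribute delegation keeps the wrapped object's methods available.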

TODOs:

- provide link to what an IAMC-format file looks like in ScmDataFrame init docstring

## Imports

In [1]:
from openscm.scmdataframe import ScmDataFrame

## Loading data

`ScmDataFrame`s can read many different data types and be loaded in many different ways.
For a full explanation, see the docstring of `ScmDataFrame`'s `__init__` method.

In [4]:
# uncomment the line below and then run the cell to see the docstring
# ScmDataFrame.__init__?

Here we load data from a file.

*Note:* here we load RCP26 emissions data. This originally came from http://www.pik-potsdam.de/~mmalte/rcps/ and has since been rewritten into a format which can be read by OpenSCM using the [pymagicc](https://github.com/openclimatedata/pymagicc) library. We are not currently planning on importing Pymagicc's readers into OpenSCM by default. Please raise an issue if you would like us to consider doing so.

In [5]:
rcp26 = ScmDataFrame("rcp26_emissions.csv")

## Timeseries

`ScmDataFrame` is ideally suited to working with timeseries data.
Indeed, it always stores data in 'wide' format, with each row representing one timeseries and metadata being contained in the row labels.
The `timeseries` method allows you to easily get the data back in this format as a *pandas* `DataFrame`.
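
To make the 'wide' layout concrete, here is a small sketch built with plain *pandas* only. The metadata names follow the row labels used in this notebook, but the values are made up for illustration and the construction below is an assumption about the layout, not the way `ScmDataFrame` builds its data internally.

```python
import pandas as pd

# Hypothetical metadata and values, chosen only to illustrate the layout.
meta_names = ["model", "scenario", "region", "variable", "unit"]
index = pd.MultiIndex.from_tuples(
    [
        ("a_model", "a_scenario", "World", "Emissions|CO2", "Gt C / yr"),
        ("a_model", "a_scenario", "World", "Emissions|CH4", "Mt CH4 / yr"),
    ],
    names=meta_names,
)
times = pd.to_datetime(["2010-01-01", "2020-01-01", "2030-01-01"])

# One row per timeseries, one column per time point.
wide = pd.DataFrame(
    [[9.0, 10.0, 8.0], [300.0, 320.0, 310.0]],
    index=index,
    columns=pd.Index(times, name="time"),
)

# Selecting on a metadata level returns the matching timeseries.
co2 = wide.xs("Emissions|CO2", level="variable").iloc[0]
print(wide.shape)
print(co2.tolist())
```

Because the metadata lives in the row labels, every value in the body of the frame is numeric, which keeps operations across time points straightforward.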

In [6]:
rcp26.timeseries().head()

model,scenario,region,variable,unit,climate_model,1765-01-01 00:00:00,1766-01-01 00:00:00,1767-01-01 00:00:00,1768-01-01 00:00:00,1769-01-01 00:00:00,1770-01-01 00:00:00,1771-01-01 00:00:00,1772-01-01 00:00:00,1773-01-01 00:00:00,1774-01-01 00:00:00,...,2491-01-01 00:00:00,2492-01-01 00:00:00,2493-01-01 00:00:00,2494-01-01 00:00:00,2495-01-01 00:00:00,2496-01-01 00:00:00,2497-01-01 00:00:00,2498-01-01 00:00:00,2499-01-01 00:00:00,2500-01-01 00:00:00
IMAGE,RCP26,World,Emissions|BC,Mt BC / yr,unspecified,0.0,0.106998,0.133383,0.159847,0.186393,0.213024,0.239742,0.26655,0.29345,0.320446,...,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578,3.3578
IMAGE,RCP26,World,Emissions|C2F6,kt C2F6 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857,0.0857
IMAGE,RCP26,World,Emissions|C6F14,kt C6F14 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887,0.0887
IMAGE,RCP26,World,Emissions|CCl4,kt CCl4 / yr,unspecified,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
IMAGE,RCP26,World,Emissions|CF4,kt CF4 / yr,unspecified,0.010763,0.010752,0.010748,0.010744,0.01074,0.010736,0.010731,0.010727,0.010723,0.010719,...,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092,1.092


In [7]:
type(rcp26.timeseries())

pandas.core.frame.DataFrame

## Interpolation

All timeseries in an `ScmDataFrame` have to be stored on the same time axis.
If you need the data at other time points, interpolating onto them is trivial.
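
As a rough illustration of what interpolating wide-format timeseries onto new time points involves, the sketch below uses plain *pandas* and NumPy. The helper names `to_days` and `interpolate_wide` are hypothetical and the data is made up; this is not `ScmDataFrame`'s own interpolation API, and it assumes linear interpolation on a time axis that is sorted in increasing order.

```python
import numpy as np
import pandas as pd


def to_days(times):
    """Convert datetimes to float days since 1970-01-01."""
    idx = pd.DatetimeIndex(times)
    return ((idx - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)).to_numpy()


def interpolate_wide(wide, target_times):
    """Linearly interpolate each row of a wide frame onto target_times."""
    source = to_days(wide.columns)  # must be increasing for np.interp
    target = to_days(target_times)
    values = np.vstack([np.interp(target, source, row) for row in wide.to_numpy()])
    return pd.DataFrame(
        values,
        index=wide.index,
        columns=pd.DatetimeIndex(target_times, name="time"),
    )


# Hypothetical single timeseries with two time points.
wide = pd.DataFrame(
    [[0.0, 731.0]],
    index=pd.Index(["Emissions|CO2"], name="variable"),
    columns=pd.Index(pd.to_datetime(["2000-01-01", "2002-01-01"]), name="time"),
)
result = interpolate_wide(wide, pd.to_datetime(["2001-01-01"]))
print(result)
```

All rows share one source time axis, so the conversion from datetimes to numbers happens once and the same target points are applied to every timeseries.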