# ScmRun

*Suggestions for update:* add examples of handling of timeseries interpolation plus how the guessing works

In this notebook we give an overview of the capabilities of scmdata's `ScmRun` class.
This class is a wrapper around `scmdata.timeseries.TimeSeries` objects, themselves a wrapper around [the `xarray.DataArray` class](https://xarray.pydata.org/en/stable/data-structures.html#dataarray).
The wrappers allow us to make the most of the tools provided by [xarray](https://xarray.pydata.org/en/stable/why-xarray.html) and [pandas](https://pandas.pydata.org/pandas-docs/stable) whilst adding our own requirements on top.

## Imports

```python
# NBVAL_IGNORE_OUTPUT
import traceback

from pint.errors import DimensionalityError

from scmdata import ScmRun
```

## Loading data

`ScmRun` objects can be created from many different data types and in many different ways.
For a full explanation, see the docstring of `ScmRun`'s `__init__` method.

```python
print(ScmRun.__init__.__doc__)
```

```
Initialize the container with timeseries data.

Parameters
----------
data: Union[ScmDataFrame, ScmRun, IamDataFrame, pd.DataFrame, np.ndarray, str]
    If a :class`ScmDataFrame` or :class`ScmRun` object is provided, then a new
    :obj`ScmRun` is created with a copy of the values and metadata from :obj`data`.

    A :class`pd.DataFrame with IAMC-format data columns (the result
    from :func`ScmRun.timeseries()`) can be provided without any additional
    :obj:`columns` and :obj:`index` information.

    If a numpy array of timeseries data is provided, :obj:`columns` and :obj:`index`
    must also be specified.
    The shape of the numpy array should be ```(n_times, n_series)``` where `n_times`
     is the number of timesteps and `n_series` is the number of time series.

    If a string is passed, data will be attempted to be read from file. Currently,
    reading f
```
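To make the `(n_times, n_series)` layout concrete, here is a minimal pure-Python sketch (not scmdata's implementation) of how such an array, an index with one time per row and one metadata record per series fit together. The helper `split_into_timeseries` and all metadata values are hypothetical, and scmdata's own `columns` argument may be structured differently (this sketch uses a list of dicts).

```python
# Hypothetical data in the (n_times, n_series) layout: 3 timesteps x 2 series
values = [
    [1.0, 10.0],  # values at 2000
    [2.0, 20.0],  # values at 2010
    [3.0, 30.0],  # values at 2020
]
index = [2000, 2010, 2020]  # one time per row of `values`

# One (made-up) metadata record per series, i.e. per column of `values`
columns = [
    {"model": "a_model", "scenario": "scen_a", "region": "World",
     "variable": "Emissions|CO2", "unit": "Gt C / yr"},
    {"model": "a_model", "scenario": "scen_b", "region": "World",
     "variable": "Emissions|CO2", "unit": "Gt C / yr"},
]


def split_into_timeseries(values, index, columns):
    """Pair each column of `values` with its metadata (hypothetical helper)."""
    n_times = len(values)
    n_series = len(values[0]) if values else 0
    if n_times != len(index):
        raise ValueError("`index` needs one time per row of `values`")
    if n_series != len(columns):
        raise ValueError("`columns` needs one record per column of `values`")
    # Column i of the array holds the i-th timeseries
    return [
        (meta, {t: row[i] for t, row in zip(index, values)})
        for i, meta in enumerate(columns)
    ]


for meta, series in split_into_timeseries(values, index, columns):
    print(meta["scenario"], series)
```

The shape checks capture the contract stated in the docstring: rows are timesteps, columns are timeseries.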

Here we load data from a file.

*Note:* here we load RCP26 emissions data. This data originally came from http://www.pik-potsdam.de/~mmalte/rcps/ and has since been rewritten, using the [pymagicc](https://github.com/openclimatedata/pymagicc) library, into a format which scmdata can read. We are not currently planning to import pymagicc's readers into scmdata by default; please raise an issue [here](https://github.com/openscm/scmdata/issues) if you would like us to consider doing so.

```python
rcp26 = ScmRun("rcp26_emissions.csv", lowercase_cols=True)
```

## Timeseries

`ScmRun` is ideally suited to working with timeseries data.
The `timeseries` method allows you to easily get the data back in wide format as a *pandas* `DataFrame`.
Here 'wide' format means that each timeseries is a row, with its metadata in the row labels (index) and one column per timestep.
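As an illustration of the wide format described above, the sketch below uses plain pandas (not scmdata) to reshape made-up long-format data, with one row per (timeseries, time) pair, into one row per timeseries. The data and column names are hypothetical.

```python
import pandas as pd

# Made-up long-format data: one row per (timeseries, time) pair
long_df = pd.DataFrame(
    {
        "variable": ["Emissions|CO2"] * 3 + ["Emissions|CH4"] * 3,
        "unit": ["Gt C / yr"] * 3 + ["Mt CH4 / yr"] * 3,
        "time": [2000, 2010, 2020] * 2,
        "value": [7.0, 8.0, 9.0, 300.0, 310.0, 320.0],
    }
)

# Wide format: move the metadata and time into the index, then unstack
# "time" so that each time becomes a column and each timeseries a row
wide = long_df.set_index(["variable", "unit", "time"])["value"].unstack("time")
print(wide)
```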

```python
rcp26.timeseries().head()
```

| model | scenario | region | variable | unit | 2000-01-01 00:00:00 | 2000-01-01 00:00:06 | 2000-01-01 00:00:12 | 2000-01-01 00:00:18 | 2000-01-01 00:00:24 | 2000-01-01 00:00:30 | 2000-01-01 00:00:36 | 2001-01-01 00:00:00 | 2001-01-01 00:00:06 | 2001-01-01 00:00:12 | ... | 1890-01-01 00:00:00 | 1891-01-01 00:00:00 | 1892-01-01 00:00:00 | 1893-01-01 00:00:00 | 1894-01-01 00:00:00 | 1895-01-01 00:00:00 | 1896-01-01 00:00:00 | 1897-01-01 00:00:00 | 1898-01-01 00:00:00 | 1899-01-01 00:00:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|BC | Mt BC / yr | 4.375663 | 7.8048 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 4.417385 | 7.8945 | 3.3578 | ... | 4.056354 | 4.088285 | 4.120215 | 4.152146 | 4.184077 | 4.216008 | 4.247939 | 4.27987 | 4.311801 | 4.343732 |
| IMAGE | RCP26 | World | Emissions\|C2F6 | kt C2F6 / yr | 0.050576 | 2.3749 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.053238 | 2.4345 | 0.0857 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.959119 |
| IMAGE | RCP26 | World | Emissions\|C6F14 | kt C6F14 / yr | 0.0 | 0.4624 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0 | 0.4651 | 0.0887 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| IMAGE | RCP26 | World | Emissions\|CCl4 | kt CCl4 / yr | 8.473 | 74.132 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.592 | 65.195 | 0.0 | ... | 5.485 | 5.731 | 6.01 | 6.267 | 6.529 | 6.824 | 7.151 | 7.432 | 7.798 | 8.119 |
| IMAGE | RCP26 | World | Emissions\|CF4 | kt CF4 / yr | 0.010377 | 12.0001 | 1.092 | 1.092 | 1.092 | 1.092 | 1.092 | 0.01038 | 11.925 | 1.092 | ... | 0.010368 | 0.010369 | 0.01037 | 0.010371 | 0.010372 | 0.010373 | 0.010374 | 0.010374 | 0.010375 | 0.010376 |


```python
type(rcp26.timeseries())
```

```
pandas.core.frame.DataFrame
```

## Unit conversion

The scmdata package uses the [OpenSCM-Units](https://openscm-units.readthedocs.io/) unit registry and uses the [Pint](https://github.com/hgrecco/pint) library to handle unit conversion.

Calling the `convert_unit` method of an `ScmRun` returns a new `ScmRun` instance with converted units.

```python
rcp26.filter(variable="Emissions|BC").timeseries()
```

| model | scenario | region | variable | unit | 2000-01-01 00:00:00 | 2000-01-01 00:00:06 | 2000-01-01 00:00:12 | 2000-01-01 00:00:18 | 2000-01-01 00:00:24 | 2000-01-01 00:00:30 | 2000-01-01 00:00:36 | 2001-01-01 00:00:00 | 2001-01-01 00:00:06 | 2001-01-01 00:00:12 | ... | 1890-01-01 00:00:00 | 1891-01-01 00:00:00 | 1892-01-01 00:00:00 | 1893-01-01 00:00:00 | 1894-01-01 00:00:00 | 1895-01-01 00:00:00 | 1896-01-01 00:00:00 | 1897-01-01 00:00:00 | 1898-01-01 00:00:00 | 1899-01-01 00:00:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|BC | Mt BC / yr | 4.375663 | 7.8048 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 4.417385 | 7.8945 | 3.3578 | ... | 4.056354 | 4.088285 | 4.120215 | 4.152146 | 4.184077 | 4.216008 | 4.247939 | 4.27987 | 4.311801 | 4.343732 |


```python
rcp26.filter(variable="Emissions|BC").convert_unit("kg BC / day").timeseries()
```

| model | scenario | region | variable | unit | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2068-01-01 00:00:12 | 2068-01-01 00:00:18 | 2068-01-01 00:00:24 | 2068-01-01 00:00:30 | 2069-01-01 00:00:00 | 2069-01-01 00:00:06 | 2069-01-01 00:00:12 | 2069-01-01 00:00:18 | 2069-01-01 00:00:24 | 2069-01-01 00:00:30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|BC | kg BC / day | 0.0 | 292944.558522 | 365181.6564 | 437636.605065 | 510316.112252 | 583227.186858 | 656376.947296 | 729772.785763 | 803422.313484 | 877333.360712 | ... | 9193155.0 | 9193155.0 | 9193155.0 | 9193155.0 | 15121580.0 | 10928620.0 | 9193155.0 | 9193155.0 | 9193155.0 | 9193155.0 |
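The converted values can be checked by hand. Assuming a year of 365.25 days (an assumption that is consistent with the output above, where the constant 3.3578 Mt BC / yr reappears as about 9193155 kg BC / day), converting from Mt / yr to kg / day multiplies by 10<sup>9</sup> / 365.25. The constants and the function name below are illustrative only and not part of scmdata.

```python
KG_PER_MT = 1e9  # 1 Mt = 10^9 kg
DAYS_PER_YEAR = 365.25  # assumption: a Julian year


def mt_per_yr_to_kg_per_day(value):
    """Convert a rate in Mt / yr to kg / day (illustrative helper)."""
    return value * KG_PER_MT / DAYS_PER_YEAR


print(mt_per_yr_to_kg_per_day(3.3578))  # approximately 9193155.37
```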


Note that you must filter your data first, because the unit conversion is applied to every timeseries. If you do not, you will receive a `DimensionalityError`.

```python
try:
    rcp26.convert_unit("kg BC / day").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
```

```
Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'C * gigametric_ton / a' ([carbon] * [mass] / [time]) to 'BC * kilogram / day' ([black_carbon] * [mass] / [time])
```
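To see why this fails, here is a toy model of dimensional checking in plain Python, in the spirit of (but much simpler than) Pint: each unit carries a scale factor and a set of dimension exponents, and conversion is only allowed when the dimensions match exactly. The unit table, the exception class and the 365.25-day year are assumptions made for illustration; the dimension names are borrowed from the error message above.

```python
class ToyDimensionalityError(Exception):
    """Toy stand-in for Pint's DimensionalityError."""


# Toy unit table: (scale relative to kg / day, dimension exponents).
# Assumes 1 yr = 365.25 days; dimension names are borrowed from the error above.
UNITS = {
    "kg BC / day": (1.0, {"[black_carbon]": 1, "[mass]": 1, "[time]": -1}),
    "Mt BC / yr": (1e9 / 365.25, {"[black_carbon]": 1, "[mass]": 1, "[time]": -1}),
    "Gt C / yr": (1e12 / 365.25, {"[carbon]": 1, "[mass]": 1, "[time]": -1}),
}


def convert(value, from_unit, to_unit):
    """Rescale `value`, refusing to convert between different dimensions."""
    from_scale, from_dims = UNITS[from_unit]
    to_scale, to_dims = UNITS[to_unit]
    if from_dims != to_dims:
        # A carbon mass can't become a black carbon mass just by rescaling
        raise ToyDimensionalityError(
            f"Cannot convert from {from_unit!r} {from_dims} to {to_unit!r} {to_dims}"
        )
    return value * from_scale / to_scale


print(convert(3.3578, "Mt BC / yr", "kg BC / day"))  # works: same dimensions
try:
    convert(1.0, "Gt C / yr", "kg BC / day")
except ToyDimensionalityError as exc:
    print(exc)  # fails: [carbon] vs [black_carbon]
```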


Having said this, thanks to Pint's contexts, we are able to trivially convert to CO<sub>2</sub>-equivalent units by passing a `context` to `convert_unit` (as long as we restrict our conversion to variables which have a CO<sub>2</sub> equivalent).

```python
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).timeseries()
```

| model | scenario | region | variable | unit | 2000-01-01 00:00:00 | 2000-01-01 00:00:06 | 2000-01-01 00:00:12 | 2000-01-01 00:00:18 | 2000-01-01 00:00:24 | 2000-01-01 00:00:30 | 2000-01-01 00:00:36 | 2001-01-01 00:00:00 | 2001-01-01 00:00:06 | 2001-01-01 00:00:12 | ... | 1890-01-01 00:00:00 | 1891-01-01 00:00:00 | 1892-01-01 00:00:00 | 1893-01-01 00:00:00 | 1894-01-01 00:00:00 | 1895-01-01 00:00:00 | 1896-01-01 00:00:00 | 1897-01-01 00:00:00 | 1898-01-01 00:00:00 | 1899-01-01 00:00:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|CH4 | Mt CH4 / yr | 123.73905 | 300.2069 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 124.8415 | 303.4092 | 142.0527 | ... | 110.16635 | 111.52362 | 112.88089 | 114.23816 | 115.59543 | 116.9527 | 118.30997 | 119.66724 | 121.02451 | 122.38178 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC AFOLU | Gt C / yr | 0.653206 | 1.1488 | 0.5113 | 0.0 | 0.0 | 0.0 | 0.0 | 0.702943 | 1.132 | 0.490848 | ... | 0.611754 | 0.610249 | 0.623191 | 0.624403 | 0.640367 | 0.644419 | 0.646278 | 0.649564 | 0.651001 | 0.652204 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC Fossil and Industrial | Gt C / yr | 0.534 | 6.735 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | 0.552 | 6.8959 | -0.9308 | ... | 0.356 | 0.372 | 0.374 | 0.37 | 0.383 | 0.406 | 0.419 | 0.44 | 0.465 | 0.507 |
| IMAGE | RCP26 | World | Emissions\|N2O | Mt N2ON / yr | 0.932522 | 7.4566 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 0.944667 | 7.503 | 5.2823 | ... | 0.845525 | 0.850982 | 0.85742 | 0.864758 | 0.872945 | 0.881882 | 0.891418 | 0.901402 | 0.911681 | 0.922105 |
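The patterns passed to `filter` above are wildcard patterns. As a rough illustration only, Python's standard `fnmatch` module selects the same four variables from a subset of the variable names in this dataset. scmdata's own matching rules may differ in details, so treat this as an approximation rather than its implementation.

```python
from fnmatch import fnmatchcase

# A subset of the variable names in the RCP26 data
variables = [
    "Emissions|BC",
    "Emissions|CH3Br",
    "Emissions|CH4",
    "Emissions|CO",
    "Emissions|CO2|MAGICC AFOLU",
    "Emissions|CO2|MAGICC Fossil and Industrial",
    "Emissions|N2O",
    "Emissions|NOx",
]
patterns = ["*CO2*", "*CH4*", "*N2O*"]

# Keep a variable if it matches any pattern; in fnmatch "*" matches any
# characters, including the "|" separator
selected = [v for v in variables if any(fnmatchcase(v, p) for p in patterns)]
print(selected)
```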


```python
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
).timeseries()
```

| model | scenario | region | variable | unit | unit_context | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2068-01-01 00:00:12 | 2068-01-01 00:00:18 | 2068-01-01 00:00:24 | 2068-01-01 00:00:30 | 2069-01-01 00:00:00 | 2069-01-01 00:00:06 | 2069-01-01 00:00:12 | 2069-01-01 00:00:18 | 2069-01-01 00:00:24 | 2069-01-01 00:00:30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|CH4 | Mt CO2 / yr | AR4GWP100 | 0.0 | 49.081547 | 60.911202 | 72.777625 | 84.681955 | 96.625365 | 108.609062 | 120.634295 | 132.702345 | 144.81454 | ... | 3551.3175 | 3551.3175 | 3551.3175 | 3551.3175 | 6572.785 | 4045.609 | 3551.3175 | 3551.3175 | 3551.3175 | 3551.3175 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC AFOLU | Mt CO2 / yr | AR4GWP100 | 0.0 | 19.573752 | 39.147508 | 58.72126 | 78.295012 | 97.868767 | 117.442519 | 137.016271 | 156.590027 | 176.163779 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 4540.755633 | 2001.633333 | 0.0 | 0.0 | 0.0 | 0.0 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC Fossil and Industrial | Mt CO2 / yr | AR4GWP100 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 14.666667 | 14.666667 | 14.666667 | 14.666667 | ... | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | 13933.333333 | 902.183333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 |
| IMAGE | RCP26 | World | Emissions\|N2O | Mt CO2 / yr | AR4GWP100 | 0.0 | 2.430911 | 4.737559 | 7.04433 | 9.351227 | 11.658254 | 13.965417 | 16.272717 | 18.580161 | 20.887751 | ... | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2708.953717 | 2687.398057 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 |
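The numbers in this table can be reproduced by hand, which helps to show what the context does. The sketch below assumes AR4 100-year GWPs of 25 for CH<sub>4</sub> and 298 for N<sub>2</sub>O, converts carbon to CO<sub>2</sub> with the molar-mass ratio 44/12, and assumes that `N2ON` denotes N<sub>2</sub>O expressed as nitrogen mass (converted back with 44/28). Under these assumptions it reproduces, for example, 142.0527 Mt CH4 / yr → 3551.3175 Mt CO2 / yr and 5.2823 Mt N2ON / yr → 2473.625629 Mt CO2 / yr from the tables above. The dictionary and function names are illustrative only.

```python
AR4_GWP100 = {"CH4": 25, "N2O": 298}  # assumed AR4 100-year GWPs
CO2_PER_C = 44.0 / 12.0  # molar mass ratio CO2 / C
N2O_PER_N2ON = 44.0 / 28.0  # molar mass ratio N2O / N2 (assumed meaning of N2ON)


def mt_ch4_to_mt_co2eq(mt_ch4):
    # Weight the CH4 mass by its GWP
    return mt_ch4 * AR4_GWP100["CH4"]


def mt_n2on_to_mt_co2eq(mt_n2on):
    # First express the mass as N2O, then weight it by its GWP
    return mt_n2on * N2O_PER_N2ON * AR4_GWP100["N2O"]


def gt_c_to_mt_co2(gt_c):
    # 1 Gt = 1000 Mt; carbon mass -> CO2 mass (no GWP weighting needed)
    return gt_c * 1000.0 * CO2_PER_C


print(mt_ch4_to_mt_co2eq(142.0527))  # approximately 3551.3175
print(mt_n2on_to_mt_co2eq(5.2823))  # approximately 2473.625629
print(gt_c_to_mt_co2(-0.9308))  # approximately -3412.933333
```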


Without the context, a `DimensionalityError` is once again raised.

```python
try:
    rcp26.convert_unit("Mt CO2 / yr").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
```

```
Traceback (most recent call last):
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'CO2 * megametric_ton / a' ([carbon] * [mass] / [time])
```


In addition, when we do a conversion with a context, the context is automatically recorded in a `unit_context` metadata column. This ensures we can't accidentally use a different context for further conversions.

```python
ar4gwp100_converted = rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
)
ar4gwp100_converted.timeseries()
```

| model | scenario | region | variable | unit | unit_context | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2068-01-01 00:00:12 | 2068-01-01 00:00:18 | 2068-01-01 00:00:24 | 2068-01-01 00:00:30 | 2069-01-01 00:00:00 | 2069-01-01 00:00:06 | 2069-01-01 00:00:12 | 2069-01-01 00:00:18 | 2069-01-01 00:00:24 | 2069-01-01 00:00:30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMAGE | RCP26 | World | Emissions\|CH4 | Mt CO2 / yr | AR4GWP100 | 0.0 | 49.081547 | 60.911202 | 72.777625 | 84.681955 | 96.625365 | 108.609062 | 120.634295 | 132.702345 | 144.81454 | ... | 3551.3175 | 3551.3175 | 3551.3175 | 3551.3175 | 6572.785 | 4045.609 | 3551.3175 | 3551.3175 | 3551.3175 | 3551.3175 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC AFOLU | Mt CO2 / yr | AR4GWP100 | 0.0 | 19.573752 | 39.147508 | 58.72126 | 78.295012 | 97.868767 | 117.442519 | 137.016271 | 156.590027 | 176.163779 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 4540.755633 | 2001.633333 | 0.0 | 0.0 | 0.0 | 0.0 |
| IMAGE | RCP26 | World | Emissions\|CO2\|MAGICC Fossil and Industrial | Mt CO2 / yr | AR4GWP100 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 11.0 | 14.666667 | 14.666667 | 14.666667 | 14.666667 | ... | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | 13933.333333 | 902.183333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 |
| IMAGE | RCP26 | World | Emissions\|N2O | Mt CO2 / yr | AR4GWP100 | 0.0 | 2.430911 | 4.737559 | 7.04433 | 9.351227 | 11.658254 | 13.965417 | 16.272717 | 18.580161 | 20.887751 | ... | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2708.953717 | 2687.398057 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 |


Trying to convert without a context, or with a different context, raises an error.

```python
try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
```

```
Traceback (most recent call last):
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `None`, drop `unit_context` metadata before doing conversion
```


```python
try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr", context="AR5GWP100")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
```

```
Traceback (most recent call last):
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `AR5GWP100`, drop `unit_context` metadata before doing conversion
```
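The guard shown above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not scmdata's code: metadata is modelled as a list of dicts, and `check_unit_context` is a hypothetical helper whose error message mirrors the one printed above. Following the error message's advice, dropping the `unit_context` metadata (here, removing the key) lets a conversion with a new context through.

```python
def check_unit_context(records, context):
    """Raise if `context` differs from a context already used (hypothetical helper).

    `records` is a list of metadata dicts, one per timeseries. A record without
    a "unit_context" key has not been converted with a context yet.
    """
    existing = sorted({r["unit_context"] for r in records if "unit_context" in r})
    if existing and existing != [context]:
        raise ValueError(
            f"Existing unit conversion context(s), `{existing}`, doesn't match "
            f"input context, `{context}`, drop `unit_context` metadata before "
            "doing conversion"
        )


converted = [
    {"variable": "Emissions|CH4", "unit": "Mt CO2 / yr", "unit_context": "AR4GWP100"},
    {"variable": "Emissions|N2O", "unit": "Mt CO2 / yr", "unit_context": "AR4GWP100"},
]
check_unit_context(converted, "AR4GWP100")  # same context: no error
for bad_context in (None, "AR5GWP100"):
    try:
        check_unit_context(converted, bad_context)
    except ValueError as exc:
        print(exc)

# Dropping the context metadata (removing the key here) allows a new context
stripped = [{k: v for k, v in r.items() if k != "unit_context"} for r in converted]
check_unit_context(stripped, "AR5GWP100")  # no error
```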


## Convenience methods

Below we showcase a few convenience methods of `ScmRun`. These will grow over time; please open a pull request adding more where they are useful!

### get_unique_meta

This method returns the unique metadata values in an `ScmRun`. Here we show how it can be useful; check out its docstring for full details.

By itself, it doesn't do anything special: it just returns the unique metadata values as a list.

```python
rcp26.get_unique_meta("variable")
```

```
['Emissions|BC',
 'Emissions|C2F6',
 'Emissions|C6F14',
 'Emissions|CCl4',
 'Emissions|CF4',
 'Emissions|CFC11',
 'Emissions|CFC113',
 'Emissions|CFC114',
 'Emissions|CFC115',
 'Emissions|CFC12',
 'Emissions|CH3Br',
 'Emissions|CH3CCl3',
 'Emissions|CH3Cl',
 'Emissions|CH4',
 'Emissions|CO',
 'Emissions|CO2|MAGICC AFOLU',
 'Emissions|CO2|MAGICC Fossil and Industrial',
 'Emissions|HCFC141b',
 'Emissions|HCFC142b',
 'Emissions|HCFC22',
 'Emissions|HFC125',
 'Emissions|HFC134a',
 'Emissions|HFC143a',
 'Emissions|HFC227ea',
 'Emissions|HFC23',
 'Emissions|HFC245fa',
 'Emissions|HFC32',
 'Emissions|HFC4310',
 'Emissions|Halon1202',
 'Emissions|Halon1211',
 'Emissions|Halon1301',
 'Emissions|Halon2402',
 'Emissions|N2O',
 'Emissions|NH3',
 'Emissions|NMVOC',
 'Emissions|NOx',
 'Emissions|OC',
 'Emissions|SF6',
 'Emissions|SOx']
```

However, it becomes more useful when you expect there to be only one unique metadata value. In such a case, use the `no_duplicates` argument: you then get the single value as its native type (not a list), and an error is raised if there is more than one value.

```python
rcp26.get_unique_meta("model", no_duplicates=True)
```

```
'IMAGE'
```

```python
try:
    rcp26.get_unique_meta("unit", no_duplicates=True)
except ValueError:
    traceback.print_exc(limit=0, chain=False)
```

```
Traceback (most recent call last):
ValueError: `unit` column is not unique (found values: ['Mt BC / yr', 'kt C2F6 / yr', 'kt C6F14 / yr', 'kt CCl4 / yr', 'kt CF4 / yr', 'kt CFC11 / yr', 'kt CFC113 / yr', 'kt CFC114 / yr', 'kt CFC115 / yr', 'kt CFC12 / yr', 'kt CH3Br / yr', 'kt CH3CCl3 / yr', 'kt CH3Cl / yr', 'Mt CH4 / yr', 'Mt CO / yr', 'Gt C / yr', 'kt HCFC141b / yr', 'kt HCFC142b / yr', 'kt HCFC22 / yr', 'kt HFC125 / yr', 'kt HFC134a / yr', 'kt HFC143a / yr', 'kt HFC227ea / yr', 'kt HFC23 / yr', 'kt HFC245fa / yr', 'kt HFC32 / yr', 'kt HFC4310 / yr', 'kt Halon1202 / yr', 'kt Halon1211 / yr', 'kt Halon1301 / yr', 'kt Halon2402 / yr', 'Mt N2ON / yr', 'Mt N / yr', 'Mt NMVOC / yr', 'Mt OC / yr', 'kt SF6 / yr', 'Mt S / yr'])
```
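A plain-Python sketch of the behaviour shown above, under the assumption that metadata is held as a list of dicts (which is not how scmdata stores it). Values are returned in order of first appearance, which is consistent with the order of the units in the error message above, and `no_duplicates=True` either returns the single value or raises. The function mirrors the method's name but is hypothetical.

```python
def get_unique_meta(records, meta, no_duplicates=False):
    """Unique values of one metadata key, in order of first appearance.

    Hypothetical re-implementation for illustration; `records` is a list of
    metadata dicts, one per timeseries.
    """
    # dict.fromkeys drops repeats while keeping first-seen order
    values = list(dict.fromkeys(r[meta] for r in records))
    if not no_duplicates:
        return values
    if len(values) > 1:
        raise ValueError(f"`{meta}` column is not unique (found values: {values})")
    return values[0]


records = [
    {"model": "IMAGE", "variable": "Emissions|BC", "unit": "Mt BC / yr"},
    {"model": "IMAGE", "variable": "Emissions|CO2|MAGICC AFOLU", "unit": "Gt C / yr"},
    {"model": "IMAGE", "variable": "Emissions|CO2|MAGICC Fossil and Industrial", "unit": "Gt C / yr"},
]
print(get_unique_meta(records, "unit"))  # ['Mt BC / yr', 'Gt C / yr']
print(get_unique_meta(records, "model", no_duplicates=True))  # IMAGE
```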
