# Input module examples

This module is still a work in progress, but it is getting there. The dataframes it returns are designed to reflect MAGICC's underlying data input format. Once unit support via pint is available (waiting on https://github.com/hgrecco/pint/pull/684), we will be able to use `df.quantify()` and `df.dequantify()` to add units to, or remove them from, our dataframes and to do unit conversions quickly.

In [13]:
from os.path import join
from pprint import pprint

import pandas as pd
import pymagicc
from pymagicc.input import MAGICCInput

import expectexception

In [2]:
MAGICC6_DIR = join("..", "pymagicc", "MAGICC6", "run")

## Read files

In [3]:
mdata = MAGICCInput()
mdata.read(MAGICC6_DIR, "HISTRCP_CO2I_EMIS.IN")
mdata.head()

VARIABLE,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS
TODO,SET,SET,SET,SET,SET
UNITS,GtC,GtC,GtC,GtC,GtC
REGION,R5OECD,R5REF,R5ASIA,R5MAF,R5LAM
YEAR,Unnamed: 1_level_4,Unnamed: 2_level_4,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4
1765,0.003,0.0,0.0,0.0,0.0
1766,0.003,0.0,0.0,0.0,0.0
1767,0.003,0.0,0.0,0.0,0.0
1768,0.003,0.0,0.0,0.0,0.0
1769,0.003,0.0,0.0,0.0,0.0


In [4]:
pprint(mdata.metadata)

{'contact': 'Base year emissions inventories: Steve Smith (ssmith@pnl.gov) and '
            'Jean-Francois Lamarque (Jean-Francois.Lamarque@noaa.gov); RCP '
            '3-PD (IMAGE): Detlef van Vuuren (detlef.vanvuuren@pbl.nl); RCP '
            '4.5 (MiniCAM): Allison Thomson (Allison.Thomson@pnl.gov); RCP 6.0 '
            '(AIM): Toshihiko Masui (masui@nies.go.jp); RCP 8.5 (MESSAGE): '
            'Keywan Riahi (riahi@iiasa.ac.at); Concentrations & Forcing '
            'compilation: Malte Meinshausen (malte.meinshausen@pik-potsdam.de)',
 'data': 'Historical fossil&industrial CO2 (CO2I) Emissions '
         '(HISTRCP_CO2I_EMIS)',
 'header': 'RCPTOOL - MAGICC 6.X DATA FILE \n'
           'VERSION:   ALPHA - FIRST DRAFT - PRIMAP xls file written on 02 Sep '
           '2009, 17:04:37 \n'
           ' \n'
           ' \n'
           'DATA:  Historical fossil&industrial CO2 (CO2I) Emissions '
           '(HISTRCP_CO2I_EMIS) \n'
           'SOURCE:  RCP data as presented on '
         

## Accessors

`MAGICCInput` is built to make accessing data as simple as possible. Because the keys in each of our headers are unique, users can select data simply by passing a list of labels, without specifying which header level each label belongs to (essentially, we hide away all the fiddly pandas details).

Of course the underlying dataframe is always accessible via the `.df` attribute.
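To make the idea of order-independent selection concrete, here is a minimal sketch using plain pandas. It is not `MAGICCInput`'s actual implementation: `select_by_labels` and `toy_df` are hypothetical names, the values are made up, and the header is simplified to four levels.

```python
import pandas as pd

# Toy frame loosely mimicking the layout shown in this notebook (made-up values).
columns = pd.MultiIndex.from_tuples(
    [
        ("CO2I_EMIS", "SET", "GtC", "R5OECD"),
        ("CO2I_EMIS", "SET", "GtC", "R5ASIA"),
    ],
    names=["VARIABLE", "TODO", "UNITS", "REGION"],
)
toy_df = pd.DataFrame(
    [[0.003, 0.0], [0.003, 0.0], [0.004, 0.001]],
    index=pd.Index([1765, 1766, 1767], name="YEAR"),
    columns=columns,
)


def select_by_labels(df, *labels):
    """Hypothetical helper: keep rows/columns matching every label, in any order."""
    # Labels found in the row index select years; the rest select columns.
    row_labels = [label for label in labels if label in df.index]
    col_labels = [label for label in labels if label not in df.index]

    # A column is kept only if its key tuple contains every column label.
    keep = [all(label in key for label in col_labels) for key in df.columns]
    if not any(keep):
        raise KeyError(labels)

    selected = df.loc[:, keep]
    return selected.loc[row_labels] if row_labels else selected


asia_first = select_by_labels(toy_df, "R5ASIA", "CO2I_EMIS")
variable_first = select_by_labels(toy_df, "CO2I_EMIS", "R5ASIA")
```

This only works because no label appears in more than one header level; if the same string could be, say, both a unit and a region, the lookup would be ambiguous.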

In [5]:
mdata['CO2I_EMIS'].head(2)

VARIABLE,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS
TODO,SET,SET,SET,SET,SET
UNITS,GtC,GtC,GtC,GtC,GtC
REGION,R5OECD,R5REF,R5ASIA,R5MAF,R5LAM
YEAR,Unnamed: 1_level_4,Unnamed: 2_level_4,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4
1765,0.003,0.0,0.0,0.0,0.0
1766,0.003,0.0,0.0,0.0,0.0


In [6]:
mdata['R5ASIA', 'CO2I_EMIS'].head(2)

VARIABLE,CO2I_EMIS
TODO,SET
UNITS,GtC
REGION,R5ASIA
YEAR,Unnamed: 1_level_4
1765,0.0
1766,0.0


In [7]:
# changing order doesn't matter
mdata['CO2I_EMIS', 'R5ASIA'].head(2)

VARIABLE,CO2I_EMIS
TODO,SET
UNITS,GtC
REGION,R5ASIA
YEAR,Unnamed: 1_level_4
1765,0.0
1766,0.0


In [8]:
%%expect_exception KeyError
# the above doesn't work on the raw dataframe
# because you need to specify all the levels 
# in the right order
mdata.df['R5ASIA', 'CO2I_EMIS']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-8-b7d9f3057b51> in <module>()
      2 # because you need to specify all the levels
      3 # in the right order
----> 4 mdata.df['R5ASIA', 'CO2I_EMIS']

~/Documents/AGCEC/MCastle/pymagicc/venv/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2684             return self._getitem_frame(key)
   2685         elif is_mi_columns:
-> 2686
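For comparison, the sketch below shows what a raw pandas dataframe with multi-level columns does accept. The frame `raw_df` is a made-up stand-in (four header levels, invented values), not the real `mdata.df`.

```python
import pandas as pd

# Made-up stand-in for a dataframe with multi-level column headers.
columns = pd.MultiIndex.from_tuples(
    [
        ("CO2I_EMIS", "SET", "GtC", "R5OECD"),
        ("CO2I_EMIS", "SET", "GtC", "R5ASIA"),
    ],
    names=["VARIABLE", "TODO", "UNITS", "REGION"],
)
raw_df = pd.DataFrame(
    [[0.003, 0.0], [0.003, 0.0]],
    index=pd.Index([1765, 1766], name="YEAR"),
    columns=columns,
)

# A full key with every level in header order returns a single column (a Series).
full_key = raw_df[("CO2I_EMIS", "SET", "GtC", "R5ASIA")]

# A cross-section selects on one named level, wherever it sits in the header.
by_region = raw_df.xs("R5ASIA", axis=1, level="REGION", drop_level=False)

# A partial key in the wrong order fails, as in the traceback above.
try:
    raw_df["R5ASIA", "CO2I_EMIS"]
    wrong_order_failed = False
except KeyError:
    wrong_order_failed = True
```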

In [9]:
# can also put index values in
mdata[1956, 'CO2I_EMIS', 'R5ASIA'].head(2)

VARIABLE,CO2I_EMIS
TODO,SET
UNITS,GtC
REGION,R5ASIA
YEAR,Unnamed: 1_level_4
1956,0.110425


In [10]:
%%expect_exception TypeError
# slicing is not supported yet (one for the to-do list)
mdata[1956:1970, 'CO2I_EMIS', 'R5ASIA'].head(2)

TypeError: (slice(1956, 1970, None), 'CO2I_EMIS', 'R5ASIA')
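Until slicing is supported, one possible workaround is to slice the underlying dataframe directly. The sketch below assumes the rows are indexed by integer year, as the outputs in this notebook suggest; `toy_df` and its values are made up for illustration.

```python
import pandas as pd

# Made-up stand-in: integer years as the row index, multi-level column headers.
columns = pd.MultiIndex.from_tuples(
    [
        ("CO2I_EMIS", "SET", "GtC", "R5OECD"),
        ("CO2I_EMIS", "SET", "GtC", "R5ASIA"),
    ],
    names=["VARIABLE", "TODO", "UNITS", "REGION"],
)
toy_df = pd.DataFrame(
    [[float(year - 1955), 0.0] for year in range(1955, 1972)],
    index=pd.Index(range(1955, 1972), name="YEAR"),
    columns=columns,
)

# .loc slices by label, so both end years are included.
years = toy_df.loc[1956:1970]

# Then narrow down to one region using the named column level.
asia_years = years.xs("R5ASIA", axis=1, level="REGION", drop_level=False)
```

With the real object, the equivalent first step would presumably be `mdata.df.loc[1956:1970]`.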

In [11]:
mdata[1998]

VARIABLE,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS,CO2I_EMIS
TODO,SET,SET,SET,SET,SET
UNITS,GtC,GtC,GtC,GtC,GtC
REGION,R5OECD,R5REF,R5ASIA,R5MAF,R5LAM
YEAR,Unnamed: 1_level_4,Unnamed: 2_level_4,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4
1998,3.184738,0.853781,1.648214,0.522509,0.378757


## Writing files

Once your data is in the format shown above, writing a file takes a single call.

In [12]:
mdata.write("HISTEXAMPLE_CO2I_EMIS.IN")

AssertionError: 

However, note that the format to write in is determined by the filename. Hence you can't just use any old filename; it has to follow MAGICC's (somewhat cryptic) internal conventions.

In [None]:
%%expect_exception ValueError
mdata.write("histexample.txt")
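The general idea of choosing a file format from the filename can be sketched as follows. This is purely illustrative: `WRITERS_BY_SUFFIX` and `pick_writer` are hypothetical, and the suffix table is an assumption rather than pymagicc's actual set of rules.

```python
# Hypothetical suffix-to-format table: these entries are illustrative
# assumptions, not pymagicc's actual rules.
WRITERS_BY_SUFFIX = {
    "_EMIS.IN": "emissions",
    "_CONC.IN": "concentrations",
}


def pick_writer(filename):
    """Return the format name implied by the filename, or raise ValueError."""
    for suffix, format_name in WRITERS_BY_SUFFIX.items():
        if filename.endswith(suffix):
            return format_name
    raise ValueError(f"Cannot determine file format from {filename!r}")


emissions_format = pick_writer("HISTEXAMPLE_CO2I_EMIS.IN")
```

A filename such as `histexample.txt` matches no suffix in the table, so this sketch raises a `ValueError` for it.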