# experiment_explore.ipynb

## Purpose
Explore the [experiment](https://github.com/darothen/experiment) package, by [Daniel Rothenberg](https://github.com/darothen).

## History
- 2017-04-24 - Benjamin S. Grandey (benjamin@smart.mit.edu). Using version 0.0.1.dev-dbc0b45 of experiment.

In [1]:
import experiment
import numpy as np
import os

experiment.__version__

'0.0.1.dev-dbc0b45'

## Run unit tests

In [2]:
# Run tests on path to local clone of repository - CHANGE IF NECESSARY
!pytest $HOME/github/experiment  

platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: /Users/grandey/github/experiment, inifile:
collected 8 items

../../experiment/experiment/test/test_experiment.py ........



## Apply experiment to test data

In [3]:
# Location of test data
data_dir = os.path.expandvars('$HOME/github/experiment/experiment/test/data/sample/')

In [4]:
# Cases - based on sample.yaml in data_dir
param1 = experiment.Case('param1', 'Parameter 1', ['a', 'b', 'c'])
param2 = experiment.Case('param2', 'Parameter 2', ['1', '2', '3'])
param3 = experiment.Case('param3', 'Parameter 3', ['alpha', 'beta'])

In [5]:
# Experiment - based on sample.yaml
exp1 = experiment.Experiment(name='test_data_experiment',
                             cases=[param1, param2, param3],
                             data_dir=data_dir,
                             case_path='{param1}_{param2}',
                             output_prefix='{param1}.{param2}.{param3}.',
                             output_suffix='.tape.nc',
                             timeseries=True,  # timeseries=True needed to avoid NotImplementedError
                             validate_data=False)  

In [6]:
# Try loading 'precip' data
data_dict1 = exp1.load('precip')
print(type(data_dict1))
data_dict1

<class 'dict'>


{case(param1='a', param2='1', param3='alpha'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='a', param2='1', param3='beta'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='a', param2='2', param3='alpha'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='a', param2='2', param3='beta'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='a', param2='3', param3='alpha'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='a', param2='3', param3='beta'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='b', param2='1', param3='alpha'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 case(param1='b', param2='1', param3='beta'): <xarray.Dataset>
 Dimensions:  ()
 Data variables:
     precip   float64 nan,
 cas

**Comment**: I have not been able to work out why `load` returns dimensionless NaN values here rather than data arrays.
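
One way to narrow this down might be to check whether the files implied by the templates actually exist on disk. The sketch below assumes, without confirmation from the experiment source code, that files are laid out as `data_dir/case_path/output_prefix + variable + output_suffix`, which is at least consistent with the layout that works for my own data later in this notebook. `expected_paths` is a hypothetical helper, not part of experiment.

```python
import itertools
import os


def expected_paths(data_dir, case_path, output_prefix, output_suffix,
                   cases, variable):
    """Yield (case_values, path) for every combination of case values.

    Hypothetical helper: the path layout is an assumption, not taken from
    the experiment source code.
    """
    names = list(cases)
    for combo in itertools.product(*(cases[name] for name in names)):
        values = dict(zip(names, combo))
        # str.format ignores unused keyword arguments, so every template
        # can be given the full set of case values.
        filename = output_prefix.format(**values) + variable + output_suffix
        yield values, os.path.join(data_dir, case_path.format(**values), filename)


# Same case values and templates as exp1 above
sample_cases = {'param1': ['a', 'b', 'c'],
                'param2': ['1', '2', '3'],
                'param3': ['alpha', 'beta']}
sample_dir = os.path.expandvars('$HOME/github/experiment/experiment/test/data/sample/')

paths = list(expected_paths(sample_dir, '{param1}_{param2}',
                            '{param1}.{param2}.{param3}.', '.tape.nc',
                            sample_cases, 'precip'))

# Report which of the expected files are present
for values, path in paths:
    status = 'found' if os.path.isfile(path) else 'MISSING'
    print(status, path)
```

If every file is reported as found, the NaNs are more likely to come from how the files are read than from where they are looked up.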

## Apply experiment to some of my own data
To explore experiment further, I will use NetCDF output data from some CESM simulations. These data are not (yet) publicly available. The data have the following directory / naming structure:
```
{variable}/p16a_F_{no_or_only}{region}_{year}.{variable}.nc
```
where {variable} is the variable name (e.g. "PRECC"), {no_or_only} is either "No" or "Only", {region} is a region code (e.g. "Eur"), and {year} is either 1970 or 2000.
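
As a quick check on this naming structure, the sketch below parses a relative path back into its components with a regular expression. The pattern and the helper `parse_relative_path` are illustrative assumptions: in particular, it assumes that region codes are purely alphabetic and that variable names contain only letters, digits and underscores.

```python
import re

# Mirrors {variable}/p16a_F_{no_or_only}{region}_{year}.{variable}.nc
# The backreference (?P=variable) requires the directory and the file name
# to refer to the same variable.
PATH_PATTERN = re.compile(
    r'^(?P<variable>\w+)/p16a_F_(?P<no_or_only>No|Only)(?P<region>[A-Za-z]+)'
    r'_(?P<year>1970|2000)\.(?P=variable)\.nc$'
)


def parse_relative_path(rel_path):
    """Return the components of rel_path as a dict, or None if it does not match."""
    match = PATH_PATTERN.match(rel_path)
    return match.groupdict() if match else None


parsed = parse_relative_path('PRECC/p16a_F_NoEur_1970.PRECC.nc')
print(parsed)
# {'variable': 'PRECC', 'no_or_only': 'No', 'region': 'Eur', 'year': '1970'}
```

The components that vary between files (here `region` and `year`, with `no_or_only` fixed to "No") are the ones that map onto `Case` objects below.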

In [7]:
# Location of input data - CHANGE IF NECESSARY
in_dir = os.path.expandvars('$HOME/data/drafts/draft2017a_region_rfp_data/cesm_out/')

In [8]:
# Try just two Cases to keep things simple
region = experiment.Case('region', 'Region Code', ['Eur', 'NAm', 'SEAs'])
year = experiment.Case('year', 'Year', ['1970', '2000'])
exp2 = experiment.Experiment(name='two_Cases',
                             cases=[region, year],
                             data_dir=in_dir,
                             case_path='PRECC',
                             output_prefix='p16a_F_No{region}_{year}.',
                             output_suffix='.nc',
                             timeseries=True,  # timeseries=True needed to avoid NotImplementedError
                             validate_data=False)

In [9]:
# Load data
data_dict2 = exp2.load('PRECC')
data_dict2

{case(region='Eur', year='1970'): <xarray.Dataset>
 Dimensions:    (lat: 96, lon: 144, nb2: 2, time: 384)
 Coordinates:
   * lon        (lon) float64 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 22.5 ...
   * lat        (lat) float64 -90.0 -88.11 -86.21 -84.32 -82.42 -80.53 -78.63 ...
   * time       (time) float64 31.0 59.0 90.0 120.0 151.0 181.0 212.0 243.0 ...
 Dimensions without coordinates: nb2
 Data variables:
     time_bnds  (time, nb2) float64 0.0 31.0 31.0 59.0 59.0 90.0 90.0 120.0 ...
     PRECC      (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
 Attributes:
     CDI:              Climate Data Interface version 1.5.6 (http://code.zmaw....
     Conventions:      CF-1.0
     history:          Thu Dec 29 14:54:06 2016: cdo mergetime /somerset/grand...
     case:             p16a_F_NoEur_1970
     title:            UNSET
     logname:          benjamin
     host:             std0807
     Version:          $Name$
     revision_Id:      $Id$
     initial_file:     /s

**Comment**: experiment appears to have successfully loaded the data.

In [10]:
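# Print the unweighted mean over all dimensions for each case
# (data.mean() averages every data variable, including time_bnds)
for key, data in data_dict2.items():
    print(key)
    print(data.mean())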

case(region='Eur', year='1970')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.676e-08
case(region='Eur', year='2000')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.673e-08
case(region='NAm', year='1970')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.68e-08
case(region='NAm', year='2000')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.673e-08
case(region='SEAs', year='1970')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.677e-08
case(region='SEAs', year='2000')
<xarray.Dataset>
Dimensions:    ()
Data variables:
    time_bnds  float64 5.839e+03
    PRECC      float64 1.675e-08
