# Object-Oriented Demo

---

This is a demo of the class-based features of `climpred`, which are still in development.

Author: Riley X. Brady (riley.brady@colorado.edu)

Date Last Updated: April 26th, 2019

## Object Types

There are two types of objects that can be used in `climpred`, and both are built on top of `xarray`'s `Dataset` and `DataArray` objects.

1. `ReferenceEnsemble`: This is a climate prediction ensemble that is initialized from a "reference." In other words, this can be a forecasting system initialized from observations, a reanalysis, a hindcast/reconstruction, etc.
1. `PerfectModelEnsemble`: This is a climate prediction ensemble that is initialized from a control run, using the "perfect model" framework.

Both the `ReferenceEnsemble` and `PerfectModelEnsemble` are sub-classes of a `PredictionEnsemble`.
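
Here is a minimal sketch of how such a hierarchy could be organized. It is hypothetical and is not `climpred`'s source. Only the class names and the `add_reference`/`add_control` method names come from this notebook. Plain dicts stand in for xarray objects, and the attribute names (`initialized`, `references`, `control`) are assumptions.

```python
# Hypothetical sketch of the class layout described above; this is NOT
# climpred's source. Plain dicts stand in for xarray Datasets.


class PredictionEnsemble:
    """Base class holding the initialized prediction ensemble."""

    def __init__(self, initialized):
        self.initialized = initialized


class ReferenceEnsemble(PredictionEnsemble):
    """Ensemble initialized from a reference (observations, reanalysis, ...)."""

    def __init__(self, initialized):
        super().__init__(initialized)
        self.references = {}  # label -> reference data

    def add_reference(self, ref, name):
        self.references[name] = ref


class PerfectModelEnsemble(PredictionEnsemble):
    """Ensemble initialized from a control run (perfect-model framework)."""

    def __init__(self, initialized):
        super().__init__(initialized)
        self.control = None

    def add_control(self, control):
        self.control = control


# Usage: both subclasses share the base-class storage for the initialized data.
dp = ReferenceEnsemble({'SST': [0.02, 0.10]})
dp.add_reference({'SST': [-0.02, -0.03]}, 'FOSI')
pm = PerfectModelEnsemble({'tos': [0.5, 0.4]})
pm.add_control({'tos': [0.1, 0.2]})
```

In this layout, the shared base class holds what both ensemble types have in common. Each subclass adds only the verification data its framework needs.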

In [1]:
import numpy as np
import xarray as xr
import climpred as cp
import matplotlib.pyplot as plt
%matplotlib inline
import proplot as plot

# Reference Ensemble

Here I will use the Community Earth System Model Decadal Prediction Large Ensemble (CESM-DPLE) to demonstrate the `ReferenceEnsemble` capabilities. CESM-DPLE is initialized from a Forced Ocean Sea-Ice (FOSI) reconstruction simulation on November 1st of each year from 1955-2017.

I will also use the FOSI output as a reference for potential predictability of the CESM-DPLE. Lastly, I will load in ERSST observations for a skill assessment of CESM-DPLE.

## References

Details on the CESM-DPLE experimental setup:

1. Yeager, S. G., et al. "Predicting near-term changes in the Earth System: A large ensemble of initialized decadal prediction simulations using the Community Earth System Model." Bulletin of the American Meteorological Society 99.9 (2018): 1867-1886.

Details on the ERSST observations:

2. NOAA Extended Reconstructed Sea Surface Temperature (ERSST) v5: https://www.ncdc.noaa.gov/data-access/marineocean-data/extended-reconstructed-sea-surface-temperature-ersst-v5

Here we load the sample data included in `climpred`.

In [9]:
def _load_dple():
    dple = xr.open_dataset('../sample_data/prediction/' +
                           'CESM-DP-LE.SST.annmean.anom.nc')['anom']
    dple = dple.sel(S=slice(1955, 2015))
    dple = dple.mean('M')
    dple = dple.rename({'S': 'initialization', 'L': 'time', })
    # detrend
    dple = cp.stats.xr_rm_trend(dple, dim='initialization')
    dple.name = 'SST'
    return dple

def _load_fosi():
    fosi = xr.open_dataset('../sample_data/prediction/' +
                           'g.e11_LENS.GECOIAF.T62_g16.009.pop.h.SST.024901-031612.nc')['SST']
    fosi = fosi.sel(time=slice('1955', '2015'))
    fosi = fosi.groupby('time.year').mean('time')
    fosi = fosi.rename({'year': 'initialization'})
    # move to anomaly space
    fosi = fosi - fosi.sel(initialization=slice(1964, 2014)).mean('initialization')
    # detrend
    fosi = cp.stats.xr_rm_trend(fosi, dim='initialization')
    fosi.name = 'SST'
    return fosi

def _load_data():
    data = xr.open_dataset('../sample_data/prediction/' +
                           'ERSSTv4.global.mean.nc')['sst']
    data = data.rename({'year': 'initialization'})
    # move to anomaly space
    data = data - data.sel(initialization=slice(1964, 2014)).mean('initialization')
    # detrend
    data = cp.stats.xr_rm_trend(data, dim='initialization')
    data.name = 'SST'
    return data

dple = _load_dple().to_dataset()
fosi = _load_fosi().to_dataset()
data = _load_data().to_dataset()
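
Each loader moves its series into anomaly space relative to the 1964-2014 mean. It then detrends along `initialization` with `cp.stats.xr_rm_trend`. The sketch below reproduces those two steps in pure Python on a single time series. It assumes the detrending removes a least-squares linear fit, since `xr_rm_trend`'s internals are not reproduced here. The helper names `to_anomaly` and `remove_linear_trend` are hypothetical.

```python
# Minimal sketch of the loaders' preprocessing, assuming a least-squares
# linear detrend. Helper names are hypothetical; not climpred's code.


def to_anomaly(years, values, start=1964, end=2014):
    """Subtract the mean over the inclusive [start, end] baseline."""
    base = [v for y, v in zip(years, values) if start <= y <= end]
    clim = sum(base) / len(base)
    return [v - clim for v in values]


def remove_linear_trend(x, y):
    """Return the residuals of y after removing its least-squares line in x."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ym - slope * xm
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


years = list(range(1955, 2016))
# Synthetic series: a linear trend plus a year-to-year wiggle.
sst = [0.02 * (y - 1955) + 0.1 * (-1) ** y for y in years]
detrended = remove_linear_trend(years, to_anomaly(years, sst))
```

The baseline is inclusive at both ends, matching how `.sel(initialization=slice(1964, 2014))` selects labels in xarray.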

In [None]:
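# Temporary check that the 'm2r' (member-to-reference) comparison works.
# Build a synthetic 10-member ensemble by adding uniform noise to the
# ensemble-mean SST. The result is stored under a new name so that `dple`
# (used in the cells below) is not overwritten. The original initialization
# years and lead times are kept so the ensemble still aligns with FOSI.
members = []
for _ in range(10):
    noise = np.random.rand(*dple.SST.shape)
    members.append(dple.SST + noise)
dple_members = xr.concat(members, dim='member')
dple_members['member'] = np.arange(1, 11)
dple_members = dple_members.to_dataset(name='SST')

In [None]:
dp = cp.ReferenceEnsemble(dple_members)
print(dp)

In [None]:
dp.add_reference(fosi, 'FOSI')

In [None]:
dp.compute_skill(comparison='m2r', metric='mae')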
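
The 'm2r' comparison scores individual members against the reference, rather than the ensemble mean. The sketch below aggregates this in one plausible way. It assumes the per-member MAE scores are simply averaged; `climpred` may pool or aggregate members differently. It also shows why the result can differ sharply from an ensemble-mean comparison. Members scattered symmetrically around the reference each carry error, but their mean does not.

```python
# Hypothetical 'm2r' (member-to-reference) MAE at a single lead time.
# Assumption: each member is scored against the reference over all
# initializations, and the per-member scores are averaged.


def mae(forecast, reference):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(f - r) for f, r in zip(forecast, reference)) / len(reference)


def m2r_mae(members, reference):
    """members: one forecast sequence per member, aligned with reference."""
    scores = [mae(member, reference) for member in members]
    return sum(scores) / len(scores)


reference = [0.0, 0.0, 0.0]
members = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
ensemble_mean = [sum(vals) / len(vals) for vals in zip(*members)]
print(m2r_mae(members, reference))    # 1.0: each member is off by 1
print(mae(ensemble_mean, reference))  # 0.0: the errors cancel in the mean
```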

A `ReferenceEnsemble` object is created simply by passing in the initialized prediction ensemble. References can then be added to it after it is created.

In [10]:
dp = cp.ReferenceEnsemble(dple)
print(dp)

<climpred.ReferenceEnsemble>
Initialized Ensemble:
    SST      (initialization, time) float64 0.02316 0.09983 ... 0.09395 0.0865
References:
    None
Uninitialized:
    None


For a `ReferenceEnsemble` object, you can add multiple references. Here, we will add the reconstruction (FOSI) and the observations (ERSST). Several checks come into play under the hood. They ensure that dimensions follow `climpred`'s naming conventions (e.g., `initialization` and `time`), and that each reference matches all of the initialized ensemble's dimensions except for `time`.
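
The checks described above could look something like the following hypothetical sketch. The function name `check_reference` and its exact rules are illustrative assumptions, not `climpred`'s implementation. Dimensions are represented as plain `{name: size}` dicts rather than xarray objects. The sketch compares only dimension names, not lengths.

```python
# Hypothetical sketch of the dimension checks described above. The function
# name and rules are illustrative assumptions, not climpred's implementation.
# Dimensions are passed as {name: size} dicts.


def check_reference(init_dims, ref_dims, required=('initialization', 'time'),
                    exempt=('time',)):
    """Raise ValueError if dims are misnamed or the reference lacks a dim."""
    for dim in required:
        if dim not in init_dims:
            raise ValueError(f"initialized ensemble lacks dimension '{dim}'")
    for dim in init_dims:
        if dim in exempt:
            continue  # references carry no lead-time dimension
        if dim not in ref_dims:
            raise ValueError(f"reference lacks dimension '{dim}'")


# Passes: the reference shares every initialized dimension except 'time'.
check_reference({'initialization': 61, 'time': 10}, {'initialization': 61})
```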

Both the references and the initialized ensemble can also hold multiple variables.

In [11]:
# dummy/repetitive data to show ability to work with multiple
# variables.
dple['SALT'] = _load_dple() + np.random.rand(dple.SST.shape[0],
                                             dple.SST.shape[1])
fosi['SALT'] = _load_fosi() + np.random.rand(fosi.SST.shape[0])

In [14]:
dp = cp.ReferenceEnsemble(dple)

In [15]:
# Attach data associated with the initialized ensemble, each under its
# own label: the reconstruction (FOSI) and the observations (ERSST).
# Other data sources, control runs, etc. could be added the same way.
dp.add_reference(fosi, 'FOSI')
dp.add_reference(data, 'ERSST')
print(dp)

<climpred.ReferenceEnsemble>
Initialized Ensemble:
    SST      (initialization, time) float64 0.02316 0.09983 ... 0.09395 0.0865
    SALT     (initialization, time) float64 0.1694 0.5292 ... 0.4581 0.3289
FOSI:
    SST      (initialization) float64 -0.02241 -0.031 0.1348 ... 0.02867 0.144
    SALT     (initialization) float64 0.1649 0.5076 0.1449 ... 0.271 1.005
ERSST:
    SST      (initialization) float64 -0.06196 -0.02328 ... 0.07206 0.1659
Uninitialized:
    None


Now we can apply our functions to the `ReferenceEnsemble` object. You can call `compute_skill` in two different ways:

1. Declaring a single reference to compute skill against. This returns a single `xr.Dataset` with the skill results for that reference.

2. Running compute_skill with no arguments, which computes skill for all available references. This returns a dictionary with each of your reference results.
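
The sketch below shows one way a method can branch on an optional reference name to support both options. `skill_by_reference` and `total_abs_error` are hypothetical stand-ins, not `climpred` functions, and plain lists stand in for xarray objects.

```python
# Hypothetical sketch of the two calling conventions. `metric` stands in
# for whatever score is computed (e.g., a correlation or MAE).


def skill_by_reference(initialized, references, metric, refname=None):
    """Score against one named reference, or against all of them."""
    if refname is not None:
        return metric(initialized, references[refname])
    return {name: metric(initialized, ref) for name, ref in references.items()}


def total_abs_error(forecast, reference):
    return sum(abs(f - r) for f, r in zip(forecast, reference))


refs = {'FOSI': [1.0, 2.0], 'ERSST': [0.0, 0.0]}
init = [1.0, 2.0]
one = skill_by_reference(init, refs, total_abs_error, refname='FOSI')  # 0.0
every = skill_by_reference(init, refs, total_abs_error)  # {'FOSI': 0.0, 'ERSST': 3.0}
```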

In [None]:
# Here, we only want a skill computation for FOSI.
# This automatically compares FOSI to the main initialized
# ensemble.
skill = dp.compute_skill(refname='FOSI')
print(skill)

In [None]:
# Skill computation for FOSI, but with different metric.
dp.compute_skill('FOSI', metric='mae')

In [None]:
# Option (2): compute skill against all references at once.
# By default, this computes the Pearson correlation (pearson_r) of the
# ensemble mean against each reference.
skill = dp.compute_skill()
print(skill)
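
The sketch below illustrates what this default computes. It scores an ensemble-mean forecast against a reference with the Pearson correlation, separately at each lead. It is a minimal pure-Python sketch that makes an assumption not taken from `climpred`: the forecast at lead `k` from initialization `i` is verified against the reference value at index `i + k`, with initializations and reference years sharing the same indexing.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    cov = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - xm) ** 2 for a in x))
    sy = math.sqrt(sum((b - ym) ** 2 for b in y))
    return cov / (sx * sy)


def acc_by_lead(forecast, reference):
    """forecast[i][k - 1]: ensemble-mean forecast from init i at lead k.

    Assumption: lead k from init i verifies against reference[i + k].
    Returns one correlation per lead.
    """
    skill = []
    for k in range(1, len(forecast[0]) + 1):
        pairs = [(row[k - 1], reference[i + k])
                 for i, row in enumerate(forecast) if i + k < len(reference)]
        fc, obs = zip(*pairs)
        skill.append(pearson_r(fc, obs))
    return skill


reference = [math.sin(0.7 * t) for t in range(30)]
# A "perfect" forecast reproduces the verifying reference value exactly,
# so each lead's correlation is 1 (up to rounding).
perfect = [[reference[i + k] for k in range(1, 4)] for i in range(20)]
lead_skill = acc_by_lead(perfect, reference)
```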

In [None]:
print(skill['FOSI'])

In [None]:
print(skill['ERSST'])

We can also compute a persistence forecast. Persistence is computed only for the references, since it is built from each reference's own time series.
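
This sketch uses the common definition of a persistence forecast, which is an assumption here because `compute_persistence`'s internals are not shown. Under that definition, the forecast for year t + k is simply the reference value at year t. Its correlation skill is therefore the lag-k autocorrelation of the reference. A minimal pure-Python sketch:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    cov = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - xm) ** 2 for a in x))
    sy = math.sqrt(sum((b - ym) ** 2 for b in y))
    return cov / (sx * sy)


def persistence_skill(series, nlags):
    """Correlation skill of a persistence forecast for lags 1..nlags.

    Forecasting year t + k with the value at year t makes the skill at
    lag k equal to the lag-k autocorrelation of the series.
    """
    return [pearson_r(series[:-k], series[k:]) for k in range(1, nlags + 1)]


# Synthetic annual series standing in for a reference such as FOSI.
series = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(61)]
skill = persistence_skill(series, nlags=8)
```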

In [None]:
# Persistence for both FOSI and ERSST, but only out to 8 lags.
persist = dp.compute_persistence(nlags=8)

In [None]:
print(persist['FOSI'])

In [None]:
print(persist['ERSST'])

## Plots from our Results

In [None]:
plot.rc.small = 11
plot.rc.large = 13
varname = 'SST'

f, ax = plot.subplots(aspect=3, axwidth='10cm', legend='b')

p1, = ax.plot(persist['FOSI'].time, persist['FOSI'][varname],
              '-ok', markersize=6, label='persistence forecast')
p2, = ax.plot(skill['FOSI'].time, skill['FOSI'][varname],
              '-or', markersize=6, label='initialized forecast')

ax.format(ylim=[-0.5, 1], xlim=[0.5, 10.5], xlabel='lead year',
          ylabel='anomaly correlation \n coefficient',
          title='Detrended Global SST Forecast')

f.bottompanel.legend([p1, p2])

## Uninitialized testing

In [None]:
cesmLENS = xr.open_dataset('../sample_data/prediction/CESM-LE.global_mean.SST.1955-2015.nc')
cesmLENS = cesmLENS.rename({'time': 'initialization'})

In [None]:
dp.add_uninitialized(cesmLENS)

In [None]:
dp

In [None]:
dp.compute_uninitialized('FOSI', metric='mae')

In [None]:
dp.compute_uninitialized('ERSST')

In [None]:
dp.compute_uninitialized()

# Perfect Model Ensemble

In [None]:
ds = xr.open_dataset('../sample_data/prediction/PM_MPI-ESM-LR_ds.nc')
control = xr.open_dataset('../sample_data/prediction/PM_MPI-ESM-LR_control.nc')

# Work with a single variable: global-mean 'tos' (sea surface
# temperature) for the 'ym' period.
var = 'tos'
area = 'global'
period = 'ym'
ds = ds[var].sel(area=area, period=period)
control = control[var].sel(area=area, period=period)

In [None]:
pm = cp.PerfectModelEnsemble(ds)
pm.add_control(control)

In [None]:
print(pm)

In [None]:
def _plot_skill(ax, result, color='k', linestyle='-', marker='o', 
                markersize=6, linewidth=2, **kwargs):
    p = ax.plot(result.time, result, color=color, linestyle=linestyle, 
                marker=marker, markersize=markersize, linewidth=linewidth, 
                **kwargs)
    return p

In [None]:
ip = pm.compute_skill(comparison='m2e')
persist = pm.compute_persistence()

In [None]:
plot.rc.small = 8
f, ax = plot.subplots(axwidth=4, aspect=5)
i = _plot_skill(ax, ip['tos'], color='r', label='initialized forecast')
p = _plot_skill(ax, persist['tos'], color='gray', linestyle='--', 
                label='persistence forecast')
ax.format(ylim=[-0.75, 1], xlim=[0.5, 10.5])

**NOTE**: Everything looks reasonable here except for the persistence forecast, which still appears to be off.

In [None]:
plot.rc.small = 12
plot.rc.large = 12
for c in ['e2c', 'm2c', 'm2e', 'm2m']:
    pm.compute_skill(comparison=c).tos.plot(label=c)
pm.compute_persistence(nlags=20).tos.plot(label='persistence', ls=':')
plt.ylabel('ACC')
plt.xticks(np.arange(1, 21))
plt.title('Different forecast-reference comparisons for pearson_r \n'
          'lead to systematically different magnitudes of skill')
plt.legend()
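
The plot shows that the choice of comparison changes the magnitude of the correlation skill. The sketch below is a hypothetical pure-Python version of the four comparisons at a single lead. Its assumptions are not taken from `climpred`'s source:

- `ens[i][m]` is member `m`'s forecast from initialization `i`.
- Member 0 plays the role of the control for `m2c` and `e2c`.
- `m2e` verifies each member against the mean of the remaining members.
- All forecast-verification pairs are pooled into a single Pearson correlation.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    cov = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - xm) ** 2 for a in x))
    sy = math.sqrt(sum((b - ym) ** 2 for b in y))
    return cov / (sx * sy)


def _mean(values):
    return sum(values) / len(values)


def compare(ens, how):
    """Pooled Pearson r for one perfect-model comparison at a single lead.

    ens[i][m] is member m's forecast from initialization i (hypothetical
    layout). Member 0 stands in for the control in 'm2c' and 'e2c'.
    """
    fc, vf = [], []  # pooled forecasts and their verification values
    for row in ens:
        n = len(row)
        if how == 'm2m':  # each member verified against every other member
            for v in range(n):
                for f in range(n):
                    if f != v:
                        fc.append(row[f])
                        vf.append(row[v])
        elif how == 'm2e':  # mean of the remaining members vs. each member
            for v in range(n):
                fc.append(_mean(row[:v] + row[v + 1:]))
                vf.append(row[v])
        elif how == 'm2c':  # each other member vs. the control member
            for f in range(1, n):
                fc.append(row[f])
                vf.append(row[0])
        elif how == 'e2c':  # mean of the other members vs. the control member
            fc.append(_mean(row[1:]))
            vf.append(row[0])
        else:
            raise ValueError(f"unknown comparison '{how}'")
    return pearson_r(fc, vf)


# Synthetic ensemble: a shared predictable signal plus member-specific noise.
random.seed(0)
ens = [[math.sin(0.5 * i) + random.gauss(0, 0.5) for _ in range(4)]
       for i in range(40)]
for how in ('e2c', 'm2c', 'm2e', 'm2m'):
    print(how, round(compare(ens, how), 3))
```

Averaging over members damps the unpredictable noise in the forecast. On the same data, the ensemble-mean comparisons (`m2e`, `e2c`) would therefore typically score higher than the single-member ones (`m2m`, `m2c`). This is one plausible reason the curves in the plot differ systematically.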

In [None]:
ds3d = xr.open_dataset('../sample_data/prediction/PM_MPI-ESM-LR_ds3d.nc')
control3d = xr.open_dataset('../sample_data/prediction/PM_MPI-ESM-LR_control3d.nc')

In [None]:
pm = cp.PerfectModelEnsemble(ds3d['tos'])
pm.add_control(control3d['tos'])
print(pm)

In [None]:
skill3d = pm.compute_skill(comparison='m2e')

In [None]:
skill3d.tos.T.plot(col='time', robust=True, yincrease=False)