# Scikit-downscale: an open source Python package for scalable climate downscaling

Joseph Hamman (jhamman@ucar.edu) and Julia Kent (jkent@ucar.edu)

NCAR, United States of America

ECAHM 2020 ID: 143

Climate data from Earth System Models are increasingly being used to study the impacts of climate change on a broad range of biogeophysical (forest fires, fisheries, etc.) and human systems (reservoir operations, urban heat waves, etc.). Before these data can be used to study many of these systems, post-processing steps commonly referred to as bias correction and statistical downscaling must be performed. “Bias correction” is used to correct persistent biases in climate model output and “statistical downscaling” is used to increase the spatiotemporal resolution of the model output (e.g. from 1 deg to 1/16th deg grid boxes). For our purposes, we’ll refer to both parts as “downscaling”.

In the past few decades, the applications community has developed a plethora of downscaling methods. Many of these methods are ad-hoc collections of post-processing routines, while others target very specific applications. The proliferation of downscaling methods has left the climate applications community with an overwhelming body of research to sort through, with little synthesis to guide method selection or applicability.

Motivated by the pressing socio-environmental challenges of climate change – and with the learnings from previous downscaling efforts in mind – we have begun working on a community-centered open framework for climate downscaling: scikit-downscale. We believe that the community will benefit from the presence of a well-designed open source downscaling toolbox with standard interfaces alongside a repository of benchmark data to test and evaluate new and existing downscaling methods.

In this notebook, we provide an overview of the scikit-downscale project, detailing how it can be used to downscale a range of surface climate variables such as air temperature and precipitation. We also highlight how the scikit-downscale framework is being used to compare existing methods and how it can be extended to support the development of new downscaling methods.


[Insert some sort of figure here, probably showing a “typical” workflow]

## Scikit-downscale
Scikit-downscale is a new Python project that we have been developing over the past few months. In it, we are building a collection of existing downscaling methods within a common framework. Key features of Scikit-downscale are:

- A high-level interface modeled after the popular fit / predict pattern found in many machine learning packages (Scikit-learn, TensorFlow, etc.),
- Xarray data structures and utilities for handling multi-dimensional datasets and parallelization,
- A common interface for pointwise and spatial (or global) downscaling models, and
- Extensibility, allowing the creation of new downscaling methods through composition.

Below is an example implementation of a Scikit-downscale workflow that uses the BCSD method:

```python
from skdownscale.pointwise_models import PointWiseDownscaler
from skdownscale.models.bcsd import BCSDTemperature

# da_temp_train: xarray.DataArray (monthly)
# da_temp_obs: xarray.DataArray (monthly)
# da_temp_obs_daily: xarray.DataArray (daily)
# da_temp_predict: xarray.DataArray (monthly)

# create a model
bcsd_model = PointWiseDownscaler(BCSDTemperature(), dim='time')

# train the model (fit / predict pattern)
bcsd_model.fit(da_temp_train, da_temp_obs)

# predict with the model  (downscaled_temp: xr.DataArray)
downscaled_temp = bcsd_model.predict(da_temp_predict)
```

## Pointwise Models
We define pointwise methods as those that only use local information during the downscaling process. They can often be represented as a linear model and applied repeatedly across the entire study domain. Examples of existing pointwise methods are:

- BCSD_[Temperature, Precipitation]: Wood et al 2002
- ARRM: Stoner et al 2012
- (Hybrid) Delta Method
- GARD: https://github.com/NCAR/GARD

Because pointwise methods can be written as a stand-alone linear model, Scikit-downscale implements these models as a Scikit-learn LinearModel or Pipeline. By building directly on Scikit-learn, we inherit a well-defined model API and the ability to interoperate with a robust ecosystem of utilities for model evaluation and optimization (e.g. grid search). Perhaps more importantly, this structure also allows us to compare methods at a high level of granularity (a single spatial point) before deploying them on large-domain problems.
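
To illustrate what building on Scikit-learn buys us, here is a minimal sketch of a single-point model written as a Pipeline and tuned with a grid search. The data are synthetic, and the choice of `StandardScaler` plus `Ridge` is an assumption made for illustration; it is not one of the downscaling methods listed above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic single-point data (hypothetical): a coarse "model" series
# and a biased, noisy "observed" series.
rng = np.random.default_rng(0)
X = rng.normal(loc=15.0, scale=8.0, size=(500, 1))
y = 0.8 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=500)

# A pointwise model expressed as a scikit-learn Pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("regress", Ridge()),
])

# Grid search over the regularization strength with 5-fold cross-validation.
alphas = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(pipeline, param_grid={"regress__alpha": alphas}, cv=5)
search.fit(X, y)

best_alpha = search.best_params_["regress__alpha"]
score = search.score(X, y)  # R^2 of the refit best estimator
print(best_alpha, round(score, 3))
```

Because the model is an ordinary estimator, the same search could be repeated at any single point before scaling up to a full domain.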

In the example above, we demonstrated the use of the PointWiseDownscaler. We use this class to wrap pointwise models, allowing training and prediction with multidimensional Xarray objects.
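
The idea of applying a pointwise model across a multidimensional array can be sketched with `xarray.apply_ufunc`. This is only an illustration of the concept under stated assumptions, not the PointWiseDownscaler implementation: the helper `fit_predict_point`, the synthetic arrays, and the use of `LinearRegression` are all hypothetical.

```python
import numpy as np
import xarray as xr
from sklearn.linear_model import LinearRegression


def fit_predict_point(x_train, y_train, x_predict):
    """Fit a linear model on one grid point's time series, then predict."""
    model = LinearRegression()
    model.fit(x_train.reshape(-1, 1), y_train)
    return model.predict(x_predict.reshape(-1, 1))


# Synthetic (time, lat, lon) arrays; the observations are an exact linear
# function of the training data so the result is easy to check.
rng = np.random.default_rng(0)
dims = ("time", "lat", "lon")
shape = (120, 3, 4)
da_train = xr.DataArray(rng.normal(size=shape), dims=dims)
da_obs = 2.0 * da_train + 1.0
da_predict = xr.DataArray(rng.normal(size=shape), dims=dims)

# vectorize=True loops the 1-D function over every (lat, lon) point.
downscaled = xr.apply_ufunc(
    fit_predict_point,
    da_train,
    da_obs,
    da_predict,
    input_core_dims=[["time"], ["time"], ["time"]],
    output_core_dims=[["time"]],
    vectorize=True,
)
# apply_ufunc moves core dimensions to the end; restore the original order.
downscaled = downscaled.transpose(*dims)
```

In this sketch all three inputs share one `time` dimension of equal length; predicting over a different period would require giving that period its own dimension name.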

## Spatial Models
Spatial models are a second class of downscaling methods that use information from the full study domain to form relationships between observations and ESM data. Scikit-downscale implements these models as a SpatialDownscaler. Beyond providing fit and predict methods that accept Xarray objects, the internal layout of these methods is intentionally unspecified. We are currently working on wrapping a few popular spatial downscaling models such as:

- MACA
- LOCA

## Benchmark Applications
It's likely that one of the reasons we haven’t seen strong consensus develop around particular downscaling methodologies is the absence of widely available benchmark applications to test methods against each other. We haven’t solved this problem, but we are motivated to work across the community to develop new benchmark applications.

Data

Metrics

## Example #1: GARD

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import scipy

import numpy as np
import pandas as pd
import xarray as xr

from skdownscale.pointwise_models import PureAnalog, AnalogRegression

In [None]:
# open a small dataset for training
training = xr.open_zarr('../data/downscale_test_data.zarr.zip', group='training')
training

In [None]:
# open a small dataset of observations (targets)
targets = xr.open_zarr('../data/downscale_test_data.zarr.zip', group='targets')
targets

In [None]:
# extract 1 point of training data for precipitation and temperature 
X_temp = training.isel(point=0).to_dataframe()[['T2max']] - 273.15
X_pcp = training.isel(point=0).to_dataframe()[['PREC_TOT']] * 24
display(X_temp.head(), X_pcp.head())

In [None]:
# extract 1 point of target data for precipitation and temperature 
y_temp = targets.isel(point=0).to_dataframe()[['Tmax']]
y_pcp = targets.isel(point=0).to_dataframe()[['Prec']]
display(y_temp.head(), y_pcp.head())

In [None]:
# Fit/predict using the PureAnalog class
for kind in ['best_analog', 'sample_analogs', 'weight_analogs', 'mean_analogs']:
    pure_analog = PureAnalog(kind=kind, n_analogs=10)
    pure_analog.fit(X_temp[:1000], y_temp[:1000])
    out = pure_analog.predict(X_temp[1000:])

    plt.plot(out[:300], label=kind)

In [None]:
# Fit/predict using the AnalogRegression class
analog_reg = AnalogRegression(n_analogs=100)
analog_reg.fit(X_temp[:1000], y_temp[:1000])
out = analog_reg.predict(X_temp[1000:])
plt.plot(out[:300], label='AnalogRegression')
plt.legend()
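
The analog idea behind the cells above can be sketched in a few lines of NumPy: for each new predictor value, find the closest values in the training record and reuse the observations from those days. This is a simplified illustration, not the PureAnalog implementation; the function `analog_predict` is hypothetical, and the meaning given to `best_analog` and `mean_analogs` is assumed from their names.

```python
import numpy as np


def analog_predict(X_train, y_train, X_new, n_analogs=10, kind="mean_analogs"):
    """Predict y for each value in X_new from its closest analogs in X_train."""
    X_train = np.asarray(X_train, dtype=float).ravel()
    y_train = np.asarray(y_train, dtype=float).ravel()
    X_new = np.asarray(X_new, dtype=float).ravel()

    # Distance from every new value to every training value: (n_new, n_train).
    dist = np.abs(X_new[:, None] - X_train[None, :])
    # Indices of the n_analogs closest training samples, nearest first.
    nearest = np.argsort(dist, axis=1)[:, :n_analogs]

    if kind == "best_analog":
        # Reuse the observation from the single closest sample.
        return y_train[nearest[:, 0]]
    if kind == "mean_analogs":
        # Average the observations from all selected analogs.
        return y_train[nearest].mean(axis=1)
    raise ValueError(f"unsupported kind: {kind}")


X_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
best = analog_predict(X_demo, y_demo, [2.1], n_analogs=3, kind="best_analog")
mean = analog_predict(X_demo, y_demo, [2.1], n_analogs=3, kind="mean_analogs")
print(best, mean)
```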

## Example #2: BCSD

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import scipy.stats

import numpy as np
import pandas as pd
import xarray as xr

from skdownscale.pointwise_models import BcsdTemperature, BcsdPrecipitation

In [None]:
# utilities for plotting cdfs
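def plot_cdf(ax=None, **kwargs):
    # draw on the given axes, or fall back to the current axes
    if ax is not None:
        plt.sca(ax)
    else:
        ax = plt.gca()

    # one empirical CDF per keyword argument, labeled by the keyword name
    for label, X in kwargs.items():
        vals = np.sort(X, axis=0)
        pp = scipy.stats.mstats.plotting_positions(vals)
        ax.plot(pp, vals, label=label)
    ax.legend()
    return ax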


def plot_cdf_by_month(**kwargs):
    # one panel per calendar month
    fig, axes = plt.subplots(4, 3, sharex=True, sharey=False, figsize=(12, 8))

    for label, X in kwargs.items():
        for month, ax in zip(range(1, 13), axes.flat):
            vals = np.sort(X[X.index.month == month], axis=0)
            pp = scipy.stats.mstats.plotting_positions(vals)
            ax.plot(pp, vals, label=label)
            ax.set_title(month)
    # a single legend on the last panel
    axes.flat[-1].legend()
    return axes
    

In [None]:
# open a small dataset for training
training = xr.open_zarr('../data/downscale_test_data.zarr.zip', group='training')
training

In [None]:
# open a small dataset of observations (targets)
targets = xr.open_zarr('../data/downscale_test_data.zarr.zip', group='targets')
targets

In [None]:
# extract 1 point of training data for precipitation and temperature 
X_temp = training.isel(point=0).to_dataframe()[['T2max']].resample('MS').mean() - 273.15
X_pcp = training.isel(point=0).to_dataframe()[['PREC_TOT']].resample('MS').sum() * 24
display(X_temp.head(), X_pcp.head())

In [None]:
# extract 1 point of target data for precipitation and temperature 
y_temp = targets.isel(point=0).to_dataframe()[['Tmax']].resample('MS').mean()
y_pcp = targets.isel(point=0).to_dataframe()[['Prec']].resample('MS').sum()
display(y_temp.head(), y_pcp.head())

In [None]:
# Fit/predict the BCSD Temperature model
bcsd_temp = BcsdTemperature()
bcsd_temp.fit(X_temp, y_temp)
out = bcsd_temp.predict(X_temp) + X_temp
plot_cdf(X=X_temp, y=y_temp, out=out)
out.plot()

In [None]:
plot_cdf_by_month(X=X_temp, y=y_temp, out=out)

In [None]:
# Fit/predict the BCSD Precipitation model
bcsd_pcp = BcsdPrecipitation()
bcsd_pcp.fit(X_pcp, y_pcp)
out = bcsd_pcp.predict(X_pcp) * X_pcp  
plot_cdf(X=X_pcp, y=y_pcp, out=out)

In [None]:
plot_cdf_by_month(X=X_pcp, y=y_pcp, out=out)
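
The CDF plots above compare distributions because the bias-correction step of BCSD is based on quantile mapping: a model value is replaced by the observed value that has the same non-exceedance probability. Below is a minimal empirical sketch on synthetic data. The function `quantile_map` is hypothetical and omits details of the BcsdTemperature and BcsdPrecipitation classes, such as any grouping by month.

```python
import numpy as np


def quantile_map(x_model_train, y_obs_train, x_model_new):
    """Map model values onto the observed distribution by matching quantiles."""
    x_sorted = np.sort(np.asarray(x_model_train, dtype=float))
    y_sorted = np.sort(np.asarray(y_obs_train, dtype=float))

    # Plotting positions (non-exceedance probabilities) for each sample.
    p_model = (np.arange(1, x_sorted.size + 1) - 0.5) / x_sorted.size
    p_obs = (np.arange(1, y_sorted.size + 1) - 0.5) / y_sorted.size

    # Step 1: locate each new model value in the model CDF.
    p_new = np.interp(x_model_new, x_sorted, p_model)
    # Step 2: read off the observed value at the same probability.
    return np.interp(p_new, p_obs, y_sorted)


rng = np.random.default_rng(0)
obs = rng.normal(loc=10.0, scale=2.0, size=2000)
model = rng.normal(loc=13.0, scale=3.0, size=2000)  # too warm, too variable

corrected = quantile_map(model, obs, model)
print(round(corrected.mean(), 2), round(corrected.std(), 2))
```

Note that `np.interp` clamps values outside the training range to the end points, so this sketch does not extrapolate beyond the observed extremes.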

## Example #3: Z-score

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import scipy

import numpy as np
import pandas as pd
import xarray as xr

from skdownscale.pointwise_models import ZScoreRegressor

In [None]:
# open a small dataset
data = xr.open_zarr('../data/downscale_test_data.zarr.zip', group='UAS')
data

In [None]:
# select one point

In [None]:
# align time
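def _cfnoleap_to_datetime(da):
    # convert the cftime (noleap calendar) index to a pandas DatetimeIndex
    datetimeindex = da.indexes['time'].to_datetimeindex()
    ds = da
    # attach the converted times and make them the indexing dimension
    ds['time_dt'] = ('time', datetimeindex)
    ds = ds.swap_dims({'time': 'time_dt'})
    assert len(da.time) == len(ds.time_dt)
    return ds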

In [None]:
# regroup models by time
def _regroup_models_bytime(ds_meas, ds_hist_dt, ds_rcp_dt):
    t0_meas = ds_meas.time[0]
    tn_meas = ds_meas.time[-1]
    t0_fut = tn_meas.values + np.timedelta64(1, 'D')
    
    ds_past = ds_hist_dt.sel(time_dt = slice(t0_meas, tn_meas))
    ds_past = ds_past.swap_dims({'time_dt':'time'})
    
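    ds_fut_pt1 = ds_hist_dt.sel(time_dt=slice(t0_fut, None))
    # `var` (the name of the variable being corrected) is not defined in this
    # cell and must be set in an earlier one
    ds_fut = xr.concat([ds_fut_pt1[var], ds_rcp_dt[var]], 'time_dt')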
    ds_fut = ds_fut.swap_dims({'time_dt':'time'})
    return ds_past, ds_fut

ds_past, ds_fut = _regroup_models_bytime(ds_meas_noleap, ds_hist_dt, ds_rcp85_dt)

In [None]:
# bias correction using ZScoreRegressor

In [None]:
# visualize the correction
from scipy import stats

def gaus(mean, std, doy):
    # Gaussian pdf for one day of year at the first grid point
    a = mean.sel(day=doy)
    mu = a.isel(lon=0, lat=0)

    b = std.sel(day=doy)
    sigma = b.isel(lon=0, lat=0)

    x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
    y = stats.norm.pdf(x, mu, sigma)
    return x, y

fut_typ_mean, fut_typ_std, fut_typ_zscore = _calc_stats(ds_fut, window_width)
fut_typ_mean_bc = fut_typ_mean + shift
fut_typ_std_bc = fut_typ_std * scale

doy=20
plt.figure()
x,y = gaus(hist_mean[var], hist_std[var], doy)
plt.plot(x, y, 'orange', label = 'historical model')
x,y = gaus(meas_mean[var], meas_std[var], doy)
plt.plot(x, y, 'red', label = 'measured')
x,y = gaus(fut_typ_mean, fut_typ_std, doy)
plt.plot(x, y, 'blue', label = 'raw future model')
x,y = gaus(fut_typ_mean_bc[var], fut_typ_std_bc[var], doy)
plt.plot(x, y, 'green', label = 'corrected future model')
plt.legend()
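
The cell above uses `shift` and `scale` without showing where they come from. The sketch below shows one plausible way to compute them, assuming `shift` is the difference between the measured and historical-model means and `scale` is the ratio of their standard deviations. It ignores the day-of-year window used above, and the function `zscore_correct` is hypothetical rather than part of ZScoreRegressor.

```python
import numpy as np


def zscore_correct(hist, meas, fut):
    """Shift and scale a future series using historical model vs. measured stats."""
    shift = meas.mean() - hist.mean()   # assumed definition of `shift`
    scale = meas.std() / hist.std()     # assumed definition of `scale`

    fut_mean, fut_std = fut.mean(), fut.std()
    # Standardize the future series, then rebuild it with corrected moments.
    fut_z = (fut - fut_mean) / fut_std
    return fut_z * (fut_std * scale) + (fut_mean + shift)


rng = np.random.default_rng(0)
meas = rng.normal(5.0, 1.0, 5000)   # synthetic measurements
hist = rng.normal(7.0, 2.0, 5000)   # synthetic historical model (biased)
fut = rng.normal(9.0, 2.0, 5000)    # synthetic future model

corrected = zscore_correct(hist, meas, fut)
print(round(corrected.mean(), 2), round(corrected.std(), 2))
```

Under these assumptions the corrected series keeps the model's projected change in the mean while taking on the measured variability.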

## Wrapper

## Call for Participation
This effort is just getting started. With the recent release of CMIP6, we expect a surge of interest in downscaled climate data. There are clear opportunities for involvement from climate impacts practitioners, computer scientists with an interest in machine learning for climate applications, and climate scientists alike. Please reach out if you are interested in participating in any way.