# Train ML model to correct predictions of week 3-4 & 5-6

This notebook creates a machine learning model `ML_model` that predicts weeks 3-4 & 5-6 from `S2S` weeks 3-4 & 5-6 forecasts and compares the result to `CPC` observations for the [`s2s-ai-challenge`](https://s2s-ai-challenge.github.io/).

# Synopsis

## Method: `ML-based mean bias reduction`

- calculate the ML-based bias from the 2000-2019 deterministic ensemble mean hindcasts
- remove that ML-based bias from the 2020 deterministic ensemble mean forecast
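
The two steps above can be illustrated with a minimal sketch. It assumes toy NumPy arrays instead of the challenge data, and a plain per-grid-cell mean stands in for the ANN that this notebook actually trains. All names (`obs_hindcast`, `true_bias`, `fct_new`) are hypothetical.

```python
import numpy as np

# Toy data (assumption): 5 hindcast dates x 3 grid cells, values in degrees C.
rng = np.random.default_rng(0)
obs_hindcast = rng.normal(10.0, 2.0, size=(5, 3))
true_bias = np.array([1.5, -0.5, 0.0])        # hypothetical per-cell model bias
fct_hindcast = obs_hindcast + true_bias       # biased ensemble-mean hindcasts

# Step 1: estimate the mean bias per grid cell from the hindcast period.
bias = (fct_hindcast - obs_hindcast).mean(axis=0)

# Step 2: remove that bias from a new (e.g. 2020) ensemble-mean forecast.
fct_new = np.array([12.0, 8.0, 10.0])
fct_corrected = fct_new - bias
print(fct_corrected)
```

In the notebook the constant bias is replaced by a learned function of the forecast anomaly, location, and week of year.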

## Data used

type: renku datasets

Training-input for Machine Learning model:
- hindcasts of models:
    - ECMWF: `ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr`

Forecast-input for Machine Learning model:
- real-time 2020 forecasts of models:
    - ECMWF: `ecmwf_forecast-input_2020_biweekly_deterministic.zarr`

Compare Machine Learning model forecast against ground truth:
- `CPC` observations:
    - `hindcast-like-observations_2000-2019_biweekly_deterministic.zarr`
    - `forecast-like-observations_2020_biweekly_deterministic.zarr`

## Resources used
for training, details in reproducibility

- platform: renku
- memory: 8 GB
- processors: 2 CPU
- storage required: 10 GB

## Safeguards

All points have to be [x] checked. If not, your submission is invalid.

Changes to the code after submissions are not possible, as the `commit` before the `tag` will be reviewed.
(Only in exceptions and if previous effort in reproducibility can be found, it may be allowed to improve readability and reproducibility after November 1st 2021.)

### Safeguards to prevent [overfitting](https://en.wikipedia.org/wiki/Overfitting?wprov=sfti1) 

If the organizers suspect overfitting, your contribution can be disqualified.

  - [x] We did not use 2020 observations in training (explicit overfitting and cheating)
  - [x] We did not repeatedly verify our model on 2020 observations and incrementally improve our RPSS (implicit overfitting)
  - [x] We provide RPSS scores for the training period with script `print_RPS_per_year`, see in section 6.3 `predict`.
  - [x] We tried our best to prevent [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)?wprov=sfti1).
  - [x] We honor the `train-validate-test` [split principle](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets). This means that the hindcast data is split into `train` and `validate`, whereas `test` is withheld.
  - [x] We did not use `test` explicitly in training or implicitly in incrementally adjusting parameters.
  - [x] We considered [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)).

### Safeguards for Reproducibility
The notebook/code must be independently reproducible from scratch by the organizers (after the competition); if this is not possible, no prize is awarded.
  - [x] All training data is publicly available (no pre-trained private neural networks, as they are not reproducible for us)
  - [x] Code is well documented, readable and reproducible.
  - [x] Code to reproduce training and predictions is preferred to run within a day on the described architecture. If the training takes longer than a day, please justify why this is needed. Please do not submit training pipelines that take weeks to train.

# Todos to improve template

This is just a demo.

- [ ] use multiple predictor variables and two predicted variables
- [ ] for both `lead_time`s in one go
- [ ] consider seasonality, for now all `forecast_time` months are mixed
- [ ] make probabilistic predictions with `category` dim, for now works deterministic

# Description of this notebook

* ANN that takes ensemble mean as input and returns a post-processed version of the input field (also a field, no terciles)
* ANN output is used by make_probabilistic to create tercile probabilities. In particular, each ensemble member is individually fed to the ANN, resulting in a post-processed ensemble. The post-processed ensemble is used to compute the tercile probabilities using make_probabilistic.
* additional features (week of year, lat/lon): no clear improvement in loss
* separate function for removing the annual cycle
* investigate annual cycle that is removed in the pre-processing
* standardization is done after preprocessing (always using train data)
* investigates what the model did learn: Model adjusts the fct to the obs (likely accounting for different representations of orography), though most of this adjustment is already done by the pre-/post-processing.
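
To make the step from a post-processed ensemble to tercile probabilities concrete, here is a minimal sketch for a single grid cell. It is not the implementation of `make_probabilistic` from `scripts`. It only assumes that each probability is the fraction of ensemble members falling in the corresponding category. The function name and the edge values are hypothetical.

```python
import numpy as np

def tercile_probabilities(ensemble, lower_edge, upper_edge):
    """Fraction of members below, between, and above two tercile edges."""
    ensemble = np.asarray(ensemble, dtype=float)
    below = np.mean(ensemble < lower_edge)
    above = np.mean(ensemble > upper_edge)
    near = 1.0 - below - above
    return np.array([below, near, above])

# Five hypothetical post-processed members for one grid cell (degrees C).
members = [9.0, 10.5, 11.0, 12.5, 13.0]
probs = tercile_probabilities(members, lower_edge=10.0, upper_edge=12.0)
print(probs)  # approximately [0.2, 0.4, 0.4]
```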

# Imports

In [None]:
from tensorflow.keras.layers import Input, Dense, Flatten
from tensorflow.keras.models import Sequential
import tensorflow.keras as keras

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import xarray as xr
xr.set_options(display_style='text')



from dask.utils import format_bytes
import xskillscore as xs

%matplotlib inline 
#for figures

#for prediction
from scripts import make_probabilistic
from scripts import add_valid_time_from_forecast_reference_time_and_lead_time
from scripts import skill_by_year
from scripts import add_year_week_coords

In [None]:
cache_path = '../template/data' #if you change this you also have to adjust the git lfs pull paths

# Get training data

preprocessing of input data may be done in separate notebook/script

## Hindcast

get weekly initialized hindcasts

In [None]:
# preprocessed as renku dataset
!git lfs pull ../template/data/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr

In [None]:
hind_2000_2019 = xr.open_zarr(f'{cache_path}/ecmwf_hindcast-input_2000-2019_biweekly_deterministic.zarr', consolidated=True)

## Observations
corresponding to hindcasts

In [None]:
# preprocessed as renku dataset
!git lfs pull ../template/data/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr

In [None]:
obs_2000_2019 = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_deterministic.zarr', consolidated=True)#[v]

terciled

In [None]:
!git lfs pull ../template/data/hindcast-like-observations_2000-2019_biweekly_terciled.zarr

In [None]:
obs_2000_2019_terciled = xr.open_zarr(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_terciled.zarr', consolidated=True)

### Select region

to make life easier for the beginning

In [None]:
lat = slice(90,0)
lon = slice(0,90)

In [None]:
hind_2000_2019 = hind_2000_2019.sel(longitude = lon, latitude = lat)
obs_2000_2019 = obs_2000_2019.sel(longitude = lon, latitude = lat)
obs_2000_2019_terciled = obs_2000_2019_terciled.sel(longitude = lon, latitude = lat)

The 2020 data contains the same number of NaNs as the 2018 and 2019 data.
Grid cells with partly missing values in the observations are harder to predict, so validating on 2018 and 2019 leads to a lower validation loss but a high training loss. Make sure the same NaNs are present in all years. Differences in loss also stem from different standardization: removing an annual cycle estimated from only two years makes the problem much easier than removing one estimated from 18 years.

## Train Validation split

In [None]:
# time is the forecast_time
time_train_start,time_train_end='2000','2017' # train
time_valid_start,time_valid_end='2018','2019' # valid

## Weatherbench

based on [Weatherbench](https://github.com/pangeo-data/WeatherBench/blob/master/quickstart.ipynb)

In [None]:
# run once only and don't commit
#!git clone https://github.com/pangeo-data/WeatherBench/

In [None]:
#import sys
#sys.path.insert(1, 'WeatherBench')
#from WeatherBench.src.train_nn import PeriodicConv2D, create_predictions#DataGenerator, 

### define some vars

In [None]:
v='t2m'
bs=32

https://s2s-ai-challenge.github.io/

We deal with two fundamentally different variables here: 
- Total precipitation is precipitation flux pr accumulated over lead_time until valid_time and therefore describes a point observation. 
- 2m temperature is averaged over lead_time(valid_time) and therefore describes an average observation. 

The submission file data model unifies both approaches and assigns 14 days for week 3-4 and 28 days for week 5-6 marking the first day of the biweekly aggregate.
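
As a small worked example of this convention, the sketch below computes the first day of each biweekly aggregate as `forecast_time + lead_time`. The forecast date is chosen arbitrarily for illustration, and the dictionary names are hypothetical.

```python
import numpy as np

forecast_time = np.datetime64("2020-01-02")   # example initialization date
lead_times = {
    "week 3-4": np.timedelta64(14, "D"),
    "week 5-6": np.timedelta64(28, "D"),
}

# valid_time marks the first day of the biweekly aggregate.
valid_times = {name: forecast_time + delta for name, delta in lead_times.items()}
for name, valid_time in valid_times.items():
    print(name, valid_time)
```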

In [None]:
# 2 bi-weekly `lead_time`: week 3-4
lead = hind_2000_2019.lead_time[0]

### create datasets

### Masking

define mask to have the same missing values at all forecast_times
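
The idea behind the mask can be sketched with plain NumPy on a toy array, assuming dimensions `(forecast_time, grid cell)`. A mean that does not skip NaNs is NaN wherever at least one time step is missing, so applying it blanks those cells for all times.

```python
import numpy as np

# Toy observations (assumption): 3 forecast times x 3 grid cells.
obs = np.array([[1.0, 2.0, np.nan],
                [1.5, np.nan, np.nan],
                [0.5, 2.5, 3.0]])

# 1 where data exists, NaN where it is missing.
valid = np.where(np.isnan(obs), np.nan, 1.0)

# np.mean propagates NaN (like skipna=False in xarray), so a cell is NaN
# in the mask if it is missing at any forecast time.
mask = valid.mean(axis=0)

# Apply the mask: cells with any missing value become NaN at all times.
masked = np.where(np.isnan(mask), np.nan, obs)
print(mask)
```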

In [None]:
mask = xr.where(obs_2000_2019.notnull(),1,np.nan).mean('forecast_time', skipna = False)
mask

In [None]:
mask.t2m.sel(lead_time = lead).plot()

In [None]:
#validation
fct_valid = hind_2000_2019.sel(forecast_time=slice(time_valid_start,time_valid_end))[v].mean('realization')
verif_valid = obs_2000_2019.sel(forecast_time=slice(time_valid_start,time_valid_end))[v]

fct_valid = fct_valid.where(mask[v].notnull())
verif_valid = verif_valid.where(mask[v].notnull())

In [None]:
#train:
#uses only ensemble mean so far
fct_train = hind_2000_2019.sel(forecast_time=slice(time_train_start,time_train_end))[v].mean('realization')
verif_train = obs_2000_2019.sel(forecast_time=slice(time_train_start,time_train_end))[v]

fct_train = fct_train.where(mask[v].notnull())
verif_train = verif_train.where(mask[v].notnull())

In [None]:
#orange: number of missing values without masking, blue: with masking
fct_nans = xr.where(np.isnan(fct_train), 1, 0)
fct_nans.sum(('lead_time', 'latitude', 'longitude')).plot()
obs_nans = xr.where(np.isnan(obs_2000_2019), 1, 0)
obs_nans.t2m.sum(('lead_time', 'latitude', 'longitude')).plot()
#mask does what it should

### Annual cycle

#### obs

In [None]:
#plotting takes some time
ds = verif_train.sel(lead_time = lead)
ds_train = verif_train.sel(lead_time = lead)

ds = add_year_week_coords(ds)
ds_train = add_year_week_coords(ds_train)

if 'realization' in ds.coords:
    ens_mean = ds_train.mean('realization')
else:
    ens_mean = ds_train

annual_cycle = ens_mean.groupby('week').mean(['forecast_time'])
annual_cycle = annual_cycle.stack(z = ('latitude','longitude'))

#reset_index to avoid error message
annual_cycle = annual_cycle.reset_index("z")
#https://github.com/pydata/xarray/pull/3938
annual_cycle.plot(hue='z', x="week", add_legend = False);

#### ensemble forecasts

In [None]:
ds = fct_train.sel(lead_time = lead)
ds_train = fct_train.sel(lead_time = lead)

ds = add_year_week_coords(ds)
ds_train = add_year_week_coords(ds_train)

if 'realization' in ds.coords:
    ens_mean = ds_train.mean('realization')
else:
    ens_mean = ds_train

annual_cycle = ens_mean.groupby('week').mean(['forecast_time'])
stacked = annual_cycle.stack(z = ('latitude','longitude'))

stacked = stacked.reset_index("z")
stacked.plot(hue='z', x="week", add_legend = False);

In [None]:
#with removed annual cycle: example year 2000

ds_stand = (ds - annual_cycle)
ds_stand = ds_stand.sel({'week' : ds.coords['week']})
ds_stand = ds_stand.drop(['week','year','valid_time'])

ds_stand = ds_stand.sel(forecast_time = '2000')

ds_stand = ds_stand.stack(z = ('latitude','longitude'))
ds_stand = ds_stand.reset_index("z")
ds_stand.plot(hue='z', x="forecast_time", add_legend = False);

### Preprocessing
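
Before the xarray-based functions, here is a stripped-down NumPy sketch of the two preprocessing ideas for a single grid cell: subtracting a weekly climatology estimated from training data only, and encoding the week of year with a cosine. The helper names are hypothetical, and the sketch assumes every week in `weeks` also occurs in `train_weeks`.

```python
import numpy as np

def remove_weekly_climatology(values, weeks, train_values, train_weeks):
    """Subtract the per-week mean of the training data from `values`."""
    anomalies = np.empty_like(values, dtype=float)
    for i, week in enumerate(weeks):
        anomalies[i] = values[i] - train_values[train_weeks == week].mean()
    return anomalies

def week_feature(weeks):
    """Cosine encoding of the week of year, same formula as in `ann_preprocess`."""
    return np.cos(2 * np.pi / 53 * (np.asarray(weeks) + 53 / 2))

train_weeks = np.array([1, 2, 1, 2])
train_values = np.array([0.0, 4.0, 2.0, 6.0])   # week 1 mean = 1, week 2 mean = 5
valid_weeks = np.array([1, 2])
valid_values = np.array([3.0, 5.0])

# Validation data is always de-seasonalized with the training climatology.
anoms = remove_weekly_climatology(valid_values, valid_weeks, train_values, train_weeks)
print(anoms, week_feature(valid_weeks))
```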

In [None]:
def rm_annualcycle(ds, ds_train):
    #remove annual cycle for each location 
    
    ds = add_year_week_coords(ds)
    ds_train = add_year_week_coords(ds_train)
    
    if 'realization' in ds_train.coords:
        ens_mean = ds_train.mean('realization')
    else:
        ens_mean = ds_train

    ds_stand = ds - ens_mean.groupby('week').mean(['forecast_time'])#always use train to remove annual cycle

    ds_stand = ds_stand.sel({'week' : ds.coords['week']})
    ds_stand_ = ds_stand.drop(['week','year'])
 
    return ds_stand_

In [None]:
def ann_preprocess(ds, ds_train, v,lead):
    ds = ds.sel(lead_time = lead)
    ds_train = ds_train.sel(lead_time = lead)
 
    ds = rm_annualcycle(ds, ds_train)
        
    #provide time feature
    week = add_year_week_coords(ds)
    week_ = np.cos(2*np.pi/53*(week.week +53/2))
    week_ = week_.drop(['week','year'])
    week_ = week_.expand_dims({'longitude': ds.longitude, 'latitude': ds.latitude})
    
    var = ds.drop(['week','year']).to_dataset(name = '{}'.format(v))
    week_ = week_.to_dataset(name = 'week')
    combined = xr.combine_by_coords([var, week_])

    df = combined.to_dataframe()
    df = df.drop(['lead_time','valid_time'], axis =1).reset_index()
    df = df.dropna(axis = 0)
    
    df_ref = df
    
    df = df.drop(['forecast_time'], axis = 1)
    return df, df_ref

In [None]:
def ann_preprocess_label(ds,ds_train,v,lead):
    ds = ds.sel(lead_time = lead)
    ds_train = ds_train.sel(lead_time = lead)
    
    ds = rm_annualcycle(ds, ds_train)
    
    df = ds.to_dataframe()
    df = df.drop(['lead_time','valid_time'], axis =1).reset_index()
    
    df = df.dropna(axis = 0)
    
    df=df.drop(['forecast_time','latitude','longitude'], axis = 1)
    return df

In [None]:
#define dataframes
df_verif_train = ann_preprocess_label(verif_train, verif_train, v, lead)
df_fct_train, df_fct_train_ref = ann_preprocess(fct_train, fct_train, v, lead)
df_verif_valid = ann_preprocess_label(verif_valid, verif_train, v, lead)
df_fct_valid, df_fct_valid_ref = ann_preprocess(fct_valid, fct_train, v, lead)

In [None]:
print(df_verif_train.mean(axis = 0))
print(df_verif_train.std(axis = 0))
print(df_fct_train.mean(axis = 0))
print(df_fct_train.std(axis = 0))

In [None]:
print(df_verif_valid.mean(axis = 0))
print(df_verif_valid.std(axis = 0))
print(df_fct_valid.mean(axis = 0))
print(df_fct_valid.std(axis = 0))

In [None]:
#standardize input and output

mean_verif_train = df_verif_train.mean(axis = 0)
std_verif_train = df_verif_train.std(axis = 0)
mean_fct_train = df_fct_train.mean(axis = 0)
std_fct_train = df_fct_train.std(axis = 0)

#validation set using train mean and std
df_verif_valid = (df_verif_valid - mean_verif_train)/std_verif_train
df_fct_valid   = (df_fct_valid - mean_fct_train)/std_fct_train

df_verif_train = (df_verif_train - mean_verif_train)/std_verif_train
df_fct_train   = (df_fct_train - mean_fct_train)/std_fct_train

In [None]:
df_verif_train

In [None]:
df_fct_train

### ANN
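
As a sanity check on the architecture defined in the next cell (4 inputs, two hidden layers of 100 ReLU units, 1 linear output), the following NumPy sketch reproduces the forward pass with random weights and counts the trainable parameters. It is not the Keras model itself. The four inputs are assumed to be latitude, longitude, the anomaly of `v`, and the week feature, as produced by `ann_preprocess`.

```python
import numpy as np

layer_sizes = [4, 100, 100, 1]
rng = np.random.default_rng(0)

# One weight matrix and one bias vector per dense layer (random, untrained).
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Dense -> ReLU -> Dense -> ReLU -> Dense (linear output)."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:      # no activation on the output layer
            x = np.maximum(x, 0.0)
    return x

n_params = sum(w.size + b.size for w, b in zip(weights, biases))
out = forward(np.zeros((8, 4)))       # batch of 8 samples with 4 features
print(n_params, out.shape)            # 10701 (8, 1)
```

The parameter count (500 + 10100 + 101) should match the total reported by `ann.summary()`.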

In [None]:
from tensorflow.keras.layers import *
from tensorflow.keras.regularizers import l2, l1

ann = keras.models.Sequential([
    Dense(100, input_shape=(4,), activation='relu'),
    #Dropout(0.2),
    Dense(100,  activation='relu'),
    #Activation('softmax')
    Dense(1,activation='linear'),
])

In [None]:
ann.summary()

In [None]:
ann.compile(loss='mse', optimizer=keras.optimizers.Adam(1e-4))

In [None]:
import warnings
warnings.simplefilter("ignore")

In [None]:
ann.fit(df_fct_train, df_verif_train, batch_size = 1024, epochs=5, validation_data=(df_fct_valid, df_verif_valid))

### Predict

In [None]:
def postprocess_output(output, df_ref, ds_input, verif_train, v, mean_verif_train, std_verif_train):
    
    #add columns
    output['latitude'] = df_ref.latitude.values
    output['longitude'] = df_ref.longitude.values
    output['forecast_time'] = df_ref.forecast_time.values
    
    output[v] = output[v]*std_verif_train[v]  + mean_verif_train[v]#undo standardization 
    #should be done using train statistics
    
    #create MultiIndex
    output = output.pivot_table(values = v, index = ['latitude','longitude','forecast_time'])
    
    #convert to dataset
    xr_output = xr.Dataset.from_dataframe(output)
    
    #retain the complete coords    
    temp = ds_input.sel(lead_time = lead).drop(['valid_time','lead_time'])
    temp = temp.to_dataset(name = 'zeros')
    merged = xr.merge([xr_output, temp])
    merged = merged.drop('zeros')

    #add annual cycle
    annual_cycle = add_year_week_coords(verif_train)
    if 'realization' in verif_train.coords:
        annual_cycle = annual_cycle.groupby('week').mean(['forecast_time']).mean('realization')  
    else:
        annual_cycle = annual_cycle.groupby('week').mean(['forecast_time'])

    pred = add_year_week_coords(merged)
    pred = pred + annual_cycle.sel(lead_time = lead)
    pred = pred.sel({'week' : merged.coords['week']})
    
    return pred

In [None]:
def ml_member(data, fct_train, v, lead, df_verif_train, verif_train, ann, 
              mean_verif_train, std_verif_train, mean_fct_train,std_fct_train):

    #preprocess i.e convert to pandas
    df_real, df_ref = ann_preprocess(data,fct_train, v, lead)

    #standardize using train data
    df_real  = (df_real - mean_fct_train)/std_fct_train
    
    #predict plus add column headers
    pred = pd.DataFrame(ann.predict(df_real), columns = df_verif_train.columns)
    #convert back to xarray and add annual cycle
    pred__ = postprocess_output(pred, df_ref, data, verif_train, v, mean_verif_train, std_verif_train)
    
    return pred__
    

In [None]:
#load validation forecasts (as ensemble and not ensemble mean, model is still trained on ens mean)
test = hind_2000_2019.sel(forecast_time=slice(time_valid_start,time_valid_end))[v]


In [None]:
test

In [None]:
# predict for all ensemble members
preds = []
for i in test.realization.values:
    pred = ml_member(test.sel(realization = i).drop('realization'), 
                     fct_train, v, lead, df_verif_train, verif_train, ann,
                     mean_verif_train, std_verif_train, mean_fct_train,std_fct_train)
    pred = pred.assign_coords(realization = i)
    preds.append(pred)
    
preds = xr.concat(preds, dim = 'realization')

In [None]:
preds

### Make probability forecast

In [None]:
!git lfs pull ../template/data/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc
tercile_edges = xr.open_dataset(f'{cache_path}/hindcast-like-observations_2000-2019_biweekly_tercile-edges.nc')

In [None]:
tercile_edges = tercile_edges.sel(longitude = lon, latitude = lat)
tercile_edges

In [None]:
#duplicate the prediction along lead_time so that make_probabilistic works
#make_probabilistic expects a two-element lead_time dimension, so the same prediction is also assigned to the second lead time
preds__ = preds.reset_coords('lead_time').drop('lead_time')
preds_1 = preds__.assign_coords(lead_time = test.lead_time[0])
preds_2 = preds__.assign_coords(lead_time = test.lead_time[1])
    
preds_new = xr.concat([preds_1, preds_2], dim = 'lead_time')
preds_new

In [None]:
obs_preds = make_probabilistic(preds_new.isel(forecast_time = 0).t2m, tercile_edges)
obs_preds.t2m

In [None]:
obs_preds.isel(lead_time = 0)['t2m'].where(mask[v].sel(lead_time = lead).notnull()).plot(col = 'category')

In [None]:
obs_2000_2019_terciled.sel(forecast_time = '2018-01-02', lead_time = lead).t2m.plot(col = 'category', figsize=(10, 6),cbar_kwargs={'orientation': 'horizontal'})

In [None]:
obs_2000_2019_terciled.sel(forecast_time = '2018-01-02')

### prob forecast of test

In [None]:
test_raw = make_probabilistic(test, tercile_edges)
test_raw

In [None]:
test_raw.isel(lead_time = 0).isel(forecast_time = 0)['t2m'].where(mask[v].sel(lead_time = lead).notnull()).plot(col = 'category')

### prob forecast of test passed through pipeline without ml model
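
Without the ANN, the pre- and post-processing reduce to an affine map per value: remove the forecast annual cycle, standardize with forecast statistics, de-standardize with observation statistics, and add the observed annual cycle. The scalar sketch below illustrates this under the assumption that all statistics are plain numbers. The function and argument names are hypothetical.

```python
import numpy as np

def pipeline_without_ml(x, clim_fct, mean_fct, std_fct, clim_obs, mean_obs, std_obs):
    """Affine transformation applied when the ANN is skipped."""
    # forecast anomaly, standardized with training forecast statistics
    z = (x - clim_fct - mean_fct) / std_fct
    # de-standardized with training observation statistics, obs annual cycle added
    return z * std_obs + mean_obs + clim_obs

# Different forecast and observation statistics shift and rescale the input.
y = pipeline_without_ml(12.0, clim_fct=10.0, mean_fct=0.0, std_fct=2.0,
                        clim_obs=8.0, mean_obs=0.5, std_obs=3.0)

# With identical statistics on both sides the pipeline returns the input.
y_same = pipeline_without_ml(12.0, 10.0, 0.0, 2.0, 10.0, 0.0, 2.0)
print(y, y_same)
```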

In [None]:
###shows how to undo all the transformations
### i.e. returns the input

def ml_member_without_ml(data, fct_train, v, lead, df_verif_train, verif_train, ann, mean_verif_train, std_verif_train):

    #preprocess i.e convert to pandas
    df_real, df_ref = ann_preprocess(data,fct_train, v, lead)#remove ann cycle fct #orig pipeline
    
    #standardize using train fct data (mean_fct_train and std_fct_train are taken from the notebook's global scope)
    df_real  = (df_real - mean_fct_train)/std_fct_train #orig pipeline
    
    #predict plus add column headers
    #pred = df_real #pred = pd.DataFrame(ann.predict(df_real), columns = df_verif_train.columns)
    
    #pred = pred*std_fct_train + mean_fct_train #undo forecast standardization
    
    #pred = (pred - mean_verif_train)/std_verif_train # do obs stand
    
    pred = df_real
    #convert back to xarray and add annual cycle
    pred_ = postprocess_output(pred, df_ref, data, verif_train, v, mean_verif_train, std_verif_train)# orig pipeline
    #undo obs stand
    #add ann obs
    
    #pred_ = rm_annualcycle(pred_, verif_train.sel(lead_time = lead))


    #add annual cycle
    #annual_cycle = add_year_week_coords(fct_train)
    #if 'realization' in verif_train.coords:
    #    annual_cycle = annual_cycle.groupby('week').mean(['forecast_time']).mean('realization')  
    #else:
    #    annual_cycle = annual_cycle.groupby('week').mean(['forecast_time'])

    #pred__ = add_year_week_coords(pred_)
    #pred__ = pred__ + annual_cycle.sel(lead_time = lead)
    #pred__ = pred__.sel({'week' : pred_.coords['week']})
    
    
     #   pred = add_year_week_coords(merged)
    #pred = pred + annual_cycle.sel(lead_time = lead)
    #pred = pred.sel({'week' : merged.coords['week']})
    
    pred__ = pred_
    return pred__

In [None]:
preds_raw = []
for i in test.realization.values:
    pred_raw = ml_member_without_ml(test.sel(realization = i).drop('realization'), 
                           fct_train, v, lead, df_verif_train, verif_train, ann, mean_verif_train, std_verif_train)
    pred_raw = pred_raw.assign_coords(realization = i)
    preds_raw.append(pred_raw)
    
preds_raw = xr.concat(preds_raw, dim = 'realization')

In [None]:
###this is the transformation that the ANN has to learn at least

(preds_raw.isel(forecast_time = 0).t2m - test.where(mask[v].notnull()).isel(forecast_time = 0).sel(lead_time = lead)).plot(col = 'realization', col_wrap = 4)

In [None]:
fct_time = slice(0 +53,52 + 53,4)

In [None]:
#raw fct input minus obs

(test.isel(forecast_time = fct_time).sel(lead_time = lead).mean('realization') - verif_valid.where(mask[v].notnull()).isel(forecast_time = fct_time).sel(lead_time = lead)).plot(col = 'forecast_time', col_wrap = 5, cmap='RdBu_r', vmin=-15, vmax=15)

In [None]:
##the ml model does indeed remove local forecast biases, although it is not able to capture all biases
#biases caused by topography are mostly removed by the pre- and post-processing (see below), ml model contributes not much 

#ml transformed input minus obs
(preds.isel(forecast_time = fct_time).t2m.mean('realization') - verif_valid.where(mask[v].notnull()).isel(forecast_time = fct_time).sel(lead_time = lead)).plot(col = 'forecast_time', col_wrap = 5,cmap='RdBu_r', vmin=-15, vmax=15)

In [None]:
#transformed input minus obs
(preds_raw.isel(forecast_time = fct_time).t2m.mean('realization') - verif_valid.where(mask[v].notnull()).isel(forecast_time = fct_time).sel(lead_time = lead)).plot(col = 'forecast_time', col_wrap = 5,cmap='RdBu_r', vmin=-15, vmax=15)

In [None]:
preds_1_raw = preds_raw.assign_coords(lead_time = test.lead_time[0])
preds_2_raw = preds_raw.assign_coords(lead_time = test.lead_time[1])
    
preds_new_raw = xr.concat([preds_1_raw, preds_2_raw], dim = 'lead_time')

In [None]:
prob_preds_raw = make_probabilistic(preds_new_raw.isel(forecast_time = 0).t2m, tercile_edges)

In [None]:
prob_preds_raw.isel(lead_time = 0)['t2m'].where(mask[v].sel(lead_time = lead).notnull()).plot(col = 'category')

## Compute RPSS
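
For orientation, the sketch below computes the ranked probability score (RPS) and the skill score against climatology for a single forecast with plain NumPy. It follows the usual definition (sum of squared differences of the cumulative category probabilities) and is meant as an illustration, not as a replacement for `xs.rps`. The example probabilities are made up.

```python
import numpy as np

def rps(fct_probs, obs_probs):
    """Ranked probability score: squared distance between cumulative distributions."""
    fct_cdf = np.cumsum(fct_probs, axis=-1)
    obs_cdf = np.cumsum(obs_probs, axis=-1)
    return np.sum((fct_cdf - obs_cdf) ** 2, axis=-1)

clim = np.array([1 / 3, 1 / 3, 1 / 3])   # climatological tercile probabilities
obs = np.array([0.0, 0.0, 1.0])          # observed category: above normal

good_fct = np.array([0.2, 0.3, 0.5])     # leans towards the observed category
bad_fct = np.array([0.5, 0.3, 0.2])      # leans towards the wrong category

rps_clim = rps(clim, obs)
rpss_good = 1 - rps(good_fct, obs) / rps_clim
rpss_bad = 1 - rps(bad_fct, obs) / rps_clim
print(rpss_good, rpss_bad)
```

A positive RPSS means the forecast beats climatology, and a negative RPSS means it is worse.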

In [None]:
def skill_by_year_single(prediction, terciled_obs):
    """version of skill_by_year adjusted to one var and one lead time and flexible validation period"""
    fct_p = prediction
    obs_p = terciled_obs


    # climatology
    clim_p = xr.DataArray([1/3, 1/3, 1/3], dims='category', coords={'category':['below normal', 'near normal', 'above normal']}).to_dataset(name='tp')
    clim_p['t2m'] = clim_p['tp']

    clim_p = clim_p[v]

    ## RPSS
    # rps_ML
    rps_ML = xs.rps(obs_p, fct_p, category_edges=None, dim=[], input_distributions='p').compute()
    # rps_clim
    rps_clim = xs.rps(obs_p, clim_p, category_edges=None, dim=[], input_distributions='p').compute()

    # rpss
    rpss = 1 - (rps_ML / rps_clim)

    # https://renkulab.io/gitlab/aaron.spring/s2s-ai-challenge-template/-/issues/7

    # penalize
    penalize = obs_p.where(fct_p!=1, other=-10).mean('category')
    rpss = rpss.where(penalize!=0, other=-10)

    # clip
    rpss = rpss.clip(-10, 1)

    # average over all forecasts
    rpss_year = rpss.groupby('forecast_time.year').mean()

    # weighted area mean
    weights = np.cos(np.deg2rad(np.abs(rpss_year.latitude)))
    # spatially weighted score averaged over lead_times and variables to one single value
    scores = rpss_year.sel(latitude=slice(None, -60)).weighted(weights).mean('latitude').mean('longitude')
    #scores = scores.to_array().mean(['lead_time', 'variable'])

    return scores.to_dataframe('RPSS') 

### RPSS of ML post-processing

In [None]:
skill_by_year_single(obs_preds[v].sel(lead_time = lead), 
                     obs_2000_2019_terciled.sel(forecast_time=slice(time_valid_start,time_valid_end))[v].sel(lead_time = lead))

In [None]:
# RPSS in the order of -0.3

### RPSS of pre/post-processing without ML

In [None]:
skill_by_year_single(prob_preds_raw.sel(lead_time = lead).t2m, 
                     obs_2000_2019_terciled.sel(forecast_time=slice(time_valid_start,time_valid_end))[v].sel(lead_time = lead))

In [None]:
# RPSS in the order of -0.45

### RPSS of raw ensemble

In [None]:
skill_by_year_single(test_raw.sel(lead_time = lead).t2m, 
                     obs_2000_2019_terciled.sel(forecast_time=slice(time_valid_start,time_valid_end))[v].sel(lead_time = lead))

In [None]:
# RPSS in the order of -0.635

#### Post-processed ensemble achieved higher RPSS, but is still below baseline.
Post-processing with the ML model was the most successful, but the simple pre-/post-processing (removing the local annual cycle and standardizing the features) also improved the RPSS compared to the raw ensemble.
Here, only the climatology (1/3, 1/3, 1/3) is used as a baseline. The RPSS is always negative, which means that all approaches perform worse than climatology.