# EasyUQ

This notebook demonstrates the EasyUQ method on simulated data. For smooth EasyUQ, we search for the "optimal" degrees of freedom (df) and bandwidth (h). Setting df = None selects a Gaussian kernel; otherwise df can be 2, 3, 4, 5, 10, or 20, which selects a Student's t kernel with that many degrees of freedom. The bandwidth is fitted by optimizing the log score (Blends method).

Approaches to find the optimal pair of df and h:

    1. One-fit approach
    2. Cross-validation approach (with cv = 3 and leave-one-out cv)
    3. Using a Gaussian kernel and Silverman's rule of thumb for the bandwidth
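To make the roles of df and h concrete, here is a minimal sketch of a kernel-smoothed predictive density and the resulting log score. It assumes the step-function CDF is given by its jump locations (`points`) and jump sizes (`weights`), and that the log score is negatively oriented (lower is better). The function names are hypothetical and this is not the code in `helper_functions`.

```python
import numpy as np
from scipy import stats


def smooth_density(y, points, weights, h, df=None):
    """Kernel-smoothed density of a discrete distribution, evaluated at y.

    points, weights: jump locations and jump sizes of one step-function CDF.
    df=None selects a Gaussian kernel, otherwise a Student's t kernel with df
    degrees of freedom. (Hypothetical helper, for illustration only.)
    """
    z = (y - np.asarray(points, dtype=float)) / h
    kernel = stats.norm.pdf(z) if df is None else stats.t.pdf(z, df)
    return float(np.sum(np.asarray(weights, dtype=float) * kernel) / h)


def log_score(y, points, weights, h, df=None):
    """Negative log of the smoothed density at the observation y."""
    return -np.log(smooth_density(y, points, weights, h, df))


# Toy predictive distribution with three jumps (made-up numbers).
points = [1.0, 2.0, 4.0]
weights = [0.2, 0.5, 0.3]
for df in (None, 2, 10):
    print(df, log_score(2.5, points, weights, h=0.8, df=df))
```

A small h keeps the density close to the original jumps, while a large h spreads the mass out; the heavier tails of a low-df Student's t kernel penalize outlying observations less severely than the Gaussian kernel does.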

In [1]:
import numpy as np
import pandas as pd
import random
from isodisreg import idr
from statsmodels.nonparametric.bandwidths import bw_silverman
from helper_functions import llscore, smooth_crps, optimize_paras_onefit, optimize_paras_cvfit

Simulation from (2.5) in the EasyUQ paper:

In [2]:
# Simulate data
n = 1000
forecast = np.random.uniform(low=0.0, high=10.0, size=n)
y_true = np.random.gamma(shape=np.sqrt(forecast), scale=np.minimum(np.maximum(forecast, 2), 8), size=n)
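The cell above draws from the global NumPy random state without a seed, so the numbers printed further down will differ from run to run. A minimal sketch of a reproducible variant, assuming NumPy's `Generator` API; the function name `simulate` and the default seed are hypothetical:

```python
import numpy as np


def simulate(n, seed=0):
    """Draw (forecast, y_true) pairs from the same gamma model as above."""
    rng = np.random.default_rng(seed)
    forecast = rng.uniform(low=0.0, high=10.0, size=n)
    shape = np.sqrt(forecast)
    scale = np.clip(forecast, 2.0, 8.0)  # same as min(max(forecast, 2), 8)
    y_true = rng.gamma(shape=shape, scale=scale, size=n)
    return forecast, y_true


forecast_demo, y_demo = simulate(1000, seed=42)
print(forecast_demo[:3], y_demo[:3])
```

Calling `simulate` twice with the same seed returns identical arrays, which makes the reported scores comparable across runs.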

Split the data into training and test sets:

In [3]:
# Divide into train and test: 80 / 20, using the shuffled indices
split_ix = int(np.floor(n * 0.8))
ix_data = np.arange(n)
random.shuffle(ix_data)
ix_train, ix_test = ix_data[:split_ix], ix_data[split_ix:]
forecast_train = forecast[ix_train]
forecast_test = forecast[ix_test]
y_true_train = y_true[ix_train]
y_true_test = y_true[ix_test]

Fit IDR on the training data. The predicted CDF is a step function.

In [4]:
# Fit EasyUQ
fitted_idr = idr(y_true_train, pd.DataFrame({"fore": forecast_train}, columns=["fore"]))
preds_train = fitted_idr.predict(pd.DataFrame({"fore": forecast_train}, columns=["fore"]))
preds_test = fitted_idr.predict(pd.DataFrame({"fore": forecast_test}, columns=["fore"]))
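Because each predicted CDF is a step function, its CRPS has a closed form in terms of the jump locations and jump sizes: CRPS(F, y) = E|X - y| - 0.5 E|X - X'|. The sketch below evaluates that formula with plain NumPy. It is an illustration under the assumption that a prediction is described by `points` and `weights` (hypothetical names), and it is not how `isodisreg` computes the score internally.

```python
import numpy as np


def step_cdf(x, points, weights):
    """Evaluate the right-continuous step-function CDF at x."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights[points <= x]))


def discrete_crps(y, points, weights):
    """CRPS of a discrete distribution: E|X - y| - 0.5 * E|X - X'|."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    term1 = np.sum(weights * np.abs(points - y))
    pairwise = np.abs(points[:, None] - points[None, :])
    term2 = 0.5 * np.sum(np.outer(weights, weights) * pairwise)
    return float(term1 - term2)


# Toy prediction with equal mass on 0 and 1 (made-up numbers).
print(step_cdf(0.5, [0.0, 1.0], [0.5, 0.5]))
print(discrete_crps(0.0, [0.0, 1.0], [0.5, 0.5]))
```

For a point mass the formula reduces to the absolute error, which is a quick sanity check on any CRPS implementation.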

Smooth the IDR output as proposed in the EasyUQ paper. There are several approaches to finding the best pair of df and h, and they trade off predictive performance against computing time.

In [5]:
# Smoothing with ONE-FIT:
ll_train, h, df = optimize_paras_onefit(preds_train, y_true_train)

ll = llscore(preds_test, y_true_test, h, df)
crps_scdf = smooth_crps(preds_test, y_true_test, h, df)
crps_idr = np.mean(preds_test.crps(y_true_test))

print('One-fit df = %s' % df, 'and h = %f' % h)  # %s also handles df = None
print('CRPS IDR: %f' % crps_idr)
print('CRPS smooth: %f' % crps_scdf)
print('Log score: %f' % ll)

One-fit df = 2 and h = 0.827369
CRPS IDR: 4.381993
CRPS smooth: 4.377549
Log score: 3.193146


In [6]:
# Smoothing with cross validation:
ll_train, h, df = optimize_paras_cvfit(forecast_train, y_true_train, cv = 3)

ll = llscore(preds_test, y_true_test, h, df)
crps_scdf = smooth_crps(preds_test, y_true_test, h, df)
crps_idr = np.mean(preds_test.crps(y_true_test))

print('CV-fit df = %s' % df, 'and h = %f' % h)  # %s also handles df = None
print('CRPS IDR: %f' % crps_idr)
print('CRPS smooth: %f' % crps_scdf)
print('Log score: %f' % ll)

CV-fit df = 2 and h = 1.204223
CRPS IDR: 4.381993
CRPS smooth: 4.382789
Log score: 3.202911


In [7]:
# Smoothing with leave-one-out cross validation:
ll_train, h, df = optimize_paras_cvfit(forecast_train, y_true_train, cv = len(y_true_train))

ll = llscore(preds_test, y_true_test, h, df)
crps_scdf = smooth_crps(preds_test, y_true_test, h, df)
crps_idr = np.mean(preds_test.crps(y_true_test))

print('LOO-fit df = %s' % df, 'and h = %f' % h)  # %s also handles df = None
print('CRPS IDR: %f' % crps_idr)
print('CRPS smooth: %f' % crps_scdf)
print('Log score: %f' % ll)

LOO-fit df = 2 and h = 0.917030
CRPS IDR: 4.381993
CRPS smooth: 4.378388
Log score: 3.193401


In [8]:
# Smoothing using Gaussian kernel and Silverman's rule of thumb:
rot = bw_silverman(y_true_train)

ll = llscore(preds_test, y_true_test, h=rot, df=None)
crps_scdf = smooth_crps(preds_test, y_true_test, h=rot, df=None)
crps_idr = np.mean(preds_test.crps(y_true_test))

print('Set df = None and Silverman ROT = %f' %rot)
print('CRPS IDR: %f' % crps_idr)
print('CRPS smooth: %f' % crps_scdf)
print('Log score: %f' % ll)

Set df = None and Silverman ROT = 2.744865
CRPS IDR: 4.381993
CRPS smooth: 4.404581
Log score: 3.453448
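For reference, Silverman's rule of thumb can be written in a few lines of NumPy. The sketch below uses the common form 0.9 * min(sd, IQR / 1.349) * n^(-1/5); the function name is hypothetical, and exact agreement with `bw_silverman` (for example in how percentiles are interpolated) is an assumption rather than something checked here.

```python
import numpy as np


def silverman_rot(x):
    """Silverman's rule-of-thumb bandwidth for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    std = np.std(x, ddof=1)  # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    iqr = (q75 - q25) / 1.349  # IQR rescaled to the sd of a normal
    spread = min(std, iqr) if iqr > 0 else std
    return 0.9 * spread * n ** (-0.2)


# Toy sample (made-up numbers).
print(silverman_rot(np.arange(1.0, 11.0)))
```

Note that the rule depends only on the marginal spread of the training outcomes and never looks at the forecasts or the scores. That may be one reason the bandwidth it returns above is larger than the fitted ones and gives a worse log score.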

