# Holt-Winters Demo

Holt-Winters is a time-series analysis technique used both to forecast future values of a series and to smooth it exponentially, weighting past observations so that their influence decays exponentially with age. It does this by modeling three components of the data: level, trend, and seasonality.
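To make the three components concrete, here is a minimal NumPy sketch of the additive Holt-Winters recursions with fixed smoothing parameters `alpha` (level), `beta` (trend), and `gamma` (season). It is an illustration only, not cuML's or statsmodels' implementation. Both libraries estimate the parameters by optimization rather than taking them as inputs, and they may initialize the state differently. The function name `holt_winters_additive` and the two-season initialization are choices made for this sketch.

```python
import numpy as np


def holt_winters_additive(y, m, alpha=0.5, beta=0.1, gamma=0.1, h=1):
    """Additive Holt-Winters (triple exponential smoothing), fixed parameters.

    Updates for t >= m, with season length m:
        level_t  = alpha*(y_t - s_{t-m}) + (1-alpha)*(level_{t-1} + b_{t-1})
        b_t      = beta*(level_t - level_{t-1}) + (1-beta)*b_{t-1}
        s_t      = gamma*(y_t - level_t) + (1-gamma)*s_{t-m}
    Forecast k steps ahead: level_T + k*b_T + (latest seasonal term for that phase).
    Returns h forecasts past the end of y.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initial trend: average per-step change between the first two seasons.
    b = (y[m:2 * m].mean() - y[:m].mean()) / m
    # Initial level, placed at t = m - 1 (the end of the first season).
    level = y[:m].mean() + b * (m - 1) / 2
    # Initial seasonal offsets: first-season values minus the trend line.
    season = list(y[:m] - (level - b * (m - 1 - np.arange(m))))

    for t in range(m, len(y)):
        s_prev = season[t - m]
        prev_level = level
        # Each update blends the new observation with the previous estimate,
        # so older observations receive geometrically decaying weight.
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + b)
        b = beta * (level - prev_level) + (1 - beta) * b
        season.append(gamma * (y[t] - level) + (1 - gamma) * s_prev)

    n = len(y)
    # Extrapolate level + trend and reuse the most recent seasonal offset
    # for the matching phase of the cycle.
    return np.array([level + k * b + season[n - m + (k - 1) % m]
                     for k in range(1, h + 1)])
```

On a noise-free series built from a linear trend plus a zero-mean repeating pattern, this initialization recovers the components exactly, so the forecasts continue the series without error whatever the smoothing parameters are.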

The model accepts array-like input, either on the host (NumPy arrays) or on the device (Numba device arrays or any object implementing `__cuda_array_interface__`), as well as cuDF DataFrames.

For information on cuDF, refer to the cuDF documentation: https://docs.rapids.ai/api/cudf/stable

For information on cuML's Holt-Winters implementation: https://rapidsai.github.io/projects/cuml/en/stable/api.html#holtwinters

```python
import numpy as np
import pandas as pd
import cudf as gd

from sklearn.metrics import r2_score

# GPU (cuML) and CPU (statsmodels) Holt-Winters implementations
from cuml.tsa.holtwinters import ExponentialSmoothing as cuES
from statsmodels.tsa.holtwinters import ExponentialSmoothing as smES
```

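```python
import warnings

# Suppress library warnings (e.g. statsmodels convergence warnings) to keep the output readable.
warnings.filterwarnings('ignore')
```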

## Define Parameters

```python
n_series = 500   # number of independent time series
n_samples = 750  # length of each series
```

## Generate Data

To create a dataset on which to run Holt-Winters, we build additive time series by generating trend, seasonal, and noise components and summing them. Below we define the generator `get_timeseries_components`, which returns a tuple of three components, each of length `fs`: a linear trend (slope `m`, intercept `b`), a sinusoidal season (`f` cycles over the `fs` samples, amplitude `amp`), and Gaussian noise (standard deviation `scale`). Adding the three parts gives a complete series on which to run Holt-Winters; the per-series randomness comes from the parameters drawn in the generation loop, plus the noise term.

### Host

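```python
def trend(x, m=1, b=0):
    # Linear trend with slope m and intercept b.
    return m*x + b

def sine_season(x, fs=100, f=2, amp=1):
    # Sinusoid completing f full cycles over fs samples.
    return amp * np.sin(2*np.pi*f * (x/fs))

def normal_noise(scale=1, size=1):
    # Zero-mean Gaussian noise with standard deviation `scale`.
    return np.random.normal(scale=scale, size=size)
```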

```python
def get_timeseries_components(fs=100, f=4, m=1, b=0, amp=1, scale=1):
    # Return (trend, season, noise), each of length fs; their sum is one series.
    x = np.arange(fs)
    t = trend(x, m, b)
    s = sine_season(x, fs, f, amp)
    n = normal_noise(scale, fs)
    return (t, s, n)
```
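Because `sine_season` completes `f` cycles over `fs` samples, its period is `fs / f` samples. With `n_samples = 750` and `f = 4` that is 187.5 samples, so the `seasonal_periods=int(n_samples/4)` value passed to both models below (187) is an integer approximation. The standalone check below, which repeats `sine_season` so it runs on its own, compares a shift by the true period with a shift by the rounded one.

```python
import numpy as np


def sine_season(x, fs=100, f=2, amp=1):
    # Same seasonal component as in the notebook.
    return amp * np.sin(2 * np.pi * f * (x / fs))


fs, f = 750, 4
x = np.arange(fs)
true_period = fs / f              # 187.5 samples per cycle
seasonal_periods = int(fs / f)    # 187, the integer season length given to the models

s = sine_season(x, fs, f)
# Shifting by the true period reproduces the seasonal curve (up to rounding)...
exact_gap = np.max(np.abs(sine_season(x + true_period, fs, f) - s))
# ...while the integer period leaves a half-sample phase error per cycle.
approx_gap = np.max(np.abs(sine_season(x + seasonal_periods, fs, f) - s))
print(true_period, seasonal_periods, exact_gap, approx_gap)
```

The mismatch is small relative to the unit amplitude bound used here, but it is one reason neither model can fit the seasonal component perfectly.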

```python
%%time

np.random.seed(100)

# Hold out the last 20% of each series as the test window.
tt_split = int(0.8*n_samples)
n_preds = n_samples - tt_split
train_pdf = pd.DataFrame()
test_pdf = pd.DataFrame()

# Each column is one series with its own random trend, amplitude, and noise level.
for i in range(n_series):
    t, s, n = get_timeseries_components(fs=n_samples,
                                        f=4,
                                        m=np.random.uniform(-2, 2),
                                        b=np.random.uniform(-3, 3),
                                        amp=np.random.uniform(-1, 1),
                                        scale=np.random.uniform(1, 3))
    time_series = t + s + n
    train_pdf[i] = time_series[:tt_split]
    test_pdf[i] = time_series[tt_split:]
```

### GPU

```python
%%time

# Copy the host DataFrame to GPU memory.
train_gdf = gd.from_pandas(train_pdf)
```

## Statsmodels Model

smES requires the time series `endog` to be one-dimensional. To forecast out-of-sample values for multiple series (here, every column of the DataFrame), we therefore have to iterate over the columns, initializing, fitting, and forecasting a separate model for each one. Each series' forecast is stored as one row of `sm_preds`.

### Fit / Forecast

```python
%%time

# One row of forecasts per series.
sm_preds = np.zeros((n_series, n_preds))

# statsmodels fits one univariate series at a time, so loop over the columns.
for i, col in enumerate(train_pdf.columns):
    sm = smES(train_pdf[col], seasonal_periods=int(n_samples/4), seasonal='add')
    sm_preds[i] = sm.fit().forecast(n_preds)
```

## cuML Model

cuES, on the other hand, accepts multi-dimensional input such as a cudf.DataFrame. Given an entire DataFrame, it initializes, fits, and forecasts every series simultaneously. The forecasts are returned as a cudf.DataFrame with one column per series; converting it to a NumPy array and transposing it gives the same layout as `sm_preds`, with one row per series.
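cuML's CUDA implementation is not shown here. As a rough CPU analogy (an assumption for illustration, not cuML's code), the same additive Holt-Winters recursions can be vectorized across series: keep one series per column, as in the cudf.DataFrame, and update every column at each time step with array operations. The Python loop then runs over time steps only, never over series. The function name `batch_holt_winters_additive` and the fixed smoothing parameters are hypothetical; cuES fits its parameters.

```python
import numpy as np


def batch_holt_winters_additive(Y, m, alpha=0.5, beta=0.1, gamma=0.1, h=1):
    """Additive Holt-Winters applied to many series at once (fixed parameters).

    Y has shape (n_obs, n_series), one series per column. Returns forecasts
    with shape (h, n_series).
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    if n < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Per-series initial trend and level (level placed at t = m - 1).
    b = (Y[m:2 * m].mean(axis=0) - Y[:m].mean(axis=0)) / m
    level = Y[:m].mean(axis=0) + b * (m - 1) / 2
    # Steps back from t = m - 1, as a column so it broadcasts over series.
    offsets = (m - 1 - np.arange(m))[:, None]

    season = np.empty_like(Y)
    season[:m] = Y[:m] - (level - b * offsets)

    for t in range(m, n):
        # Each line updates all series in one vectorized operation.
        prev_level = level
        level = alpha * (Y[t] - season[t - m]) + (1 - alpha) * (level + b)
        b = beta * (level - prev_level) + (1 - beta) * b
        season[t] = gamma * (Y[t] - level) + (1 - gamma) * season[t - m]

    k = np.arange(1, h + 1)[:, None]               # horizons as a column
    latest_season = season[n - m + (k.ravel() - 1) % m]  # shape (h, n_series)
    return level + k * b + latest_season
```

The result has one column per series; transposing it with `.T` gives one row per series, the same layout as `sm_preds`.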

### Fit

```python
%%time

# A single model object fits all n_series columns as one batch.
cu = cuES(train_gdf, seasonal_periods=int(n_samples/4), seasonal='add', ts_num=n_series)
cu.fit()
```

### Forecast

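```python
%%time

# forecast() returns one column per series, shape (n_preds, n_series);
# convert to NumPy and transpose to (n_series, n_preds) to match sm_preds.
cu_preds = cu.forecast(n_preds).to_pandas().to_numpy().T
```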

## Evaluate Results

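```python
# r2_score scores each column separately, so pass arrays with one series per
# column (shape (n_preds, n_series)) and keep the per-series scores.
test_arr = test_pdf.to_numpy()

cu_r2_scores = r2_score(test_arr, cu_preds.T, multioutput='raw_values')
sm_r2_scores = r2_score(test_arr, sm_preds.T, multioutput='raw_values')

print("Average cuES r2 score: %s" % np.mean(cu_r2_scores))
print("Average smES r2 score: %s" % np.mean(sm_r2_scores))
```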
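For reference, R² for a single series is 1 − SS_res / SS_tot, where SS_tot is measured around that series' own mean over the forecast window. The helper below (`per_series_r2`, a name introduced here) computes it for each row of an `(n_series, n_preds)` array in plain NumPy. scikit-learn's `r2_score` yields the same numbers with `multioutput='raw_values'` when each series is a column rather than a row.

```python
import numpy as np


def per_series_r2(y_true, y_pred):
    """R^2 computed separately for each row (one series per row)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares of the forecast, per series.
    ss_res = ((y_true - y_pred) ** 2).sum(axis=1)
    # Total sum of squares around each series' own mean, per series.
    ss_tot = ((y_true - y_true.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - ss_res / ss_tot


scores = per_series_r2([[1, 2, 3], [2, 4, 6]],
                       [[1, 2, 3], [2, 4, 4]])
print(scores)
```

Averaging these per-series scores answers "how well does each forecast track its own series over time", which is different from scoring each time step across all series.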