# Holt-Winters Demo

Holt-Winters is a time-series technique used both to forecast future values of a series and to smooth it exponentially, weighting historical observations so that their influence decreases exponentially with age. It models three components of the data: level, trend, and seasonality.
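
To make the three components concrete, here is a minimal pure-Python sketch of the additive Holt-Winters recurrences. It is an illustration only, not the cuML or statsmodels implementation: the function name `holt_winters_additive`, the fixed smoothing parameters `alpha`, `beta`, and `gamma`, and the naive initialisation from the first two seasons are all assumptions of this sketch, whereas the libraries fit their parameters to the data.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.1, horizon=1):
    """Additive Holt-Winters sketch: return `horizon` forecasts past the end of y.

    y is a list of floats and m is the season length in samples.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")

    # Naive initialisation from the first two seasons (an assumption of this sketch).
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    trend = (mean2 - mean1) / m          # average change per time step
    mid = (m - 1) / 2                    # mean1 describes the season midpoint
    season = [y[i] - (mean1 + (i - mid) * trend) for i in range(m)]
    level = mean1 + mid * trend          # level at time m - 1

    for t in range(m, len(y)):
        s_old = season[t % m]
        prev_level, prev_trend = level, trend
        # Level: deseasonalised observation blended with the one-step projection.
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        # Trend: latest change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonality: detrended observation blended with last season's value.
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old

    last = len(y) - 1
    return [level + h * trend + season[(last + h) % m] for h in range(1, horizon + 1)]


# Noise-free toy series: linear trend plus a repeating 4-step pattern.
pattern = [1.0, -2.0, 0.5, 0.5]
series = [2.0 + 0.5 * t + pattern[t % 4] for t in range(20)]
forecast = holt_winters_additive(series, m=4, horizon=4)
expected = [2.0 + 0.5 * t + pattern[t % 4] for t in range(20, 24)]
```

On this noise-free toy series the forecast should match `expected` up to floating-point rounding, because a series that is exactly linear plus periodic is a fixed point of the updates. Noisy data such as the series generated below will not be reproduced exactly.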

The model accepts array-like inputs, either on the host (NumPy arrays) or on the device (Numba arrays or any object that implements `__cuda_array_interface__`), as well as cuDF DataFrames.

For information on cuDF, refer to the cuDF documentation: https://rapidsai.github.io/projects/cudf/en/latest/

For information on cuML's Holt-Winters implementation, refer to the cuML API documentation: https://rapidsai.github.io/projects/cuml/en/latest/api.html

In [1]:
import numpy as np
import pandas as pd
import cudf as gd

from sklearn.metrics import r2_score

from cuml.tsa import ExponentialSmoothing as cuES
from statsmodels.tsa.holtwinters import ExponentialSmoothing as smES

In [2]:
import warnings
warnings.filterwarnings('ignore')

## Define Parameters

In [3]:
n_series = 500
n_samples = 750

## Generate Data

### Host

In [4]:
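def trend(x, m=1, b=0):
    # Linear trend with slope m and intercept b.
    return m*x + b

def sine_season(x, fs=100, f=2, amp=1):
    # Sine wave that completes f cycles over fs samples, with amplitude amp.
    return amp * np.sin(2*np.pi*f * (x/fs))

def normal_noise(scale=1, size=1):
    # Zero-mean Gaussian noise with standard deviation scale.
    return np.random.normal(scale=scale, size=size)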

In [5]:
def get_timeseries_components(fs=100, f=4, m=1, b=0, amp=1, scale=1):
    # Build the trend, seasonal, and noise components over fs time steps.
    x = np.arange(fs)
    t = trend(x, m, b)
    s = sine_season(x, fs, f, amp)
    n = normal_noise(scale, fs)
    return (t, s, n)

In [6]:
%%time

np.random.seed(100)

# Train on the first 80% of each series and forecast the remaining 20%.
tt_split = int(0.8*n_samples)
n_preds = n_samples - tt_split
train_pdf = pd.DataFrame()
test_pdf = pd.DataFrame()

for i in range(n_series):
    t, s, n = get_timeseries_components(fs=n_samples,
                                        f=4,
                                        m=np.random.uniform(-2, 2),
                                        b=np.random.uniform(-3, 3),
                                        amp=np.random.uniform(-1, 1),
                                        scale=np.random.uniform(1, 3))
    time_series = t + s + n
    train_pdf[i] = time_series[:tt_split]
    test_pdf[i] = time_series[tt_split:]

CPU times: user 428 ms, sys: 8 ms, total: 436 ms
Wall time: 435 ms


### GPU

In [7]:
%%time

train_gdf = gd.from_pandas(train_pdf)

CPU times: user 1.1 s, sys: 888 ms, total: 1.99 s
Wall time: 2.01 s


## Statsmodels Model

### Fit / Forecast

In [8]:
%%time

sm_preds = np.zeros((n_series, n_preds))

# statsmodels fits one series at a time, so loop over the columns.
for i, col in enumerate(train_pdf.columns):
    model = smES(train_pdf[col], seasonal_periods=int(n_samples/4), seasonal='add')
    fitted = model.fit()
    sm_preds[i] = fitted.forecast(n_preds)

CPU times: user 28.2 s, sys: 40 ms, total: 28.2 s
Wall time: 28.2 s
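
Both models receive `seasonal_periods=int(n_samples/4)`. This follows from the data generation: the seasonal component is a sine that completes `f=4` cycles across `fs=n_samples` points. The short check below works through that arithmetic using only the standard library. The names `exact_period`, `phase_error_per_season`, and `repeat_gap` are introduced here for illustration and do not appear in the notebook.

```python
import math

N_SAMPLES = 750  # samples per series, as defined in the notebook
CYCLES = 4       # cycles of the sine component across one full series

exact_period = N_SAMPLES / CYCLES      # true period of the sine, in samples
seasonal_periods = int(N_SAMPLES / 4)  # truncated value passed to both models

def sine_season(x, fs=N_SAMPLES, f=CYCLES, amp=1.0):
    # Same formula as the notebook's sine_season, using math instead of NumPy.
    return amp * math.sin(2 * math.pi * f * (x / fs))

# Fraction of a cycle lost per season by truncating the period to an integer.
phase_error_per_season = (exact_period - seasonal_periods) / exact_period

# The sine repeats after two exact periods, which is a whole number of samples.
repeat_gap = abs(sine_season(10) - sine_season(10 + 2 * exact_period))
```

With these settings the true period is 187.5 samples, while the models are given 187, so the configured season is half a sample shorter than the generated one. Whether that mismatch affects the forecasts noticeably is not measured in this notebook.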


## cuML Model

### Fit

In [9]:
%%time

cu = cuES(train_gdf, seasonal_periods=int(n_samples/4), seasonal='add', ts_num=n_series)
cu.fit()

CPU times: user 1.24 s, sys: 652 ms, total: 1.9 s
Wall time: 1.89 s


### Forecast

In [10]:
%%time

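# Copy the forecasts to the host as a NumPy array with one row per series.
# to_pandas().values stands in for the older as_matrix() call.
cu_preds = cu.forecast(n_preds).to_pandas().values.transpose()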

CPU times: user 420 ms, sys: 8 ms, total: 428 ms
Wall time: 427 ms


## Evaluate Results

In [11]:
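# Rows are series and columns are forecast steps.
test_arr = test_pdf.values.transpose()

# With its default settings r2_score returns a single averaged scalar,
# so np.mean below leaves the value unchanged.
cu_r2_scores = r2_score(test_arr, cu_preds)
sm_r2_scores = r2_score(test_arr, sm_preds)

print("Average cuES r2 score: " + str(np.mean(cu_r2_scores)))
print("Average smES r2 score: " + str(np.mean(sm_r2_scores)))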

Average cuES r2 score: 0.9999772591651792
Average smES r2 score: 0.9997857846562314
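
scikit-learn's `r2_score` treats rows as samples and columns as outputs, and by default averages over the outputs. With arrays shaped `(n_series, n_preds)`, the scores above therefore appear to compare the series against each other at each forecast step, rather than scoring each series over its own forecast horizon. A per-series score can be sketched in plain Python as follows. The helper `r2_per_series` is hypothetical and assumes that no series is constant over the test window. Passing the transposed arrays to `r2_score` with `multioutput='raw_values'` should give the same per-series values, although that variant was not run here.

```python
def r2_per_series(y_true, y_pred):
    """Return one R^2 value per row, where each row is one time series."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        mean_true = sum(true_row) / len(true_row)
        # Residual sum of squares: error of the forecast.
        ss_res = sum((t - p) ** 2 for t, p in zip(true_row, pred_row))
        # Total sum of squares: error of always predicting the series mean.
        ss_tot = sum((t - mean_true) ** 2 for t in true_row)
        scores.append(1.0 - ss_res / ss_tot)
    return scores


# Toy check: a perfect forecast and a forecast equal to the series mean.
toy_true = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
toy_pred = [[1.0, 2.0, 3.0, 4.0], [25.0, 25.0, 25.0, 25.0]]
toy_scores = r2_per_series(toy_true, toy_pred)  # [1.0, 0.0]
```

In the notebook this could be called as `r2_per_series(test_arr, cu_preds)` and `r2_per_series(test_arr, sm_preds)`. The resulting values will generally differ from the averages printed above.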
