# Time Series Forecast - Exponential Smoothing

### Kumar Rahul

Forecasting the demand for services or products supports better short-term and long-term planning. In this case, we look at warranty-related issues reported on a particular brand of two-wheeler. The data is a monthly roll-up of approximately half a million issues reported by customers over a four-year period. 
We will use the claim forecasting data in this exercise. Refer to **Exhibit 1** for the feature list. Use the data to answer the questions below.

1.	Load the time series dataset in Jupyter Notebook using pandas.
2.	Split the data into a training set and a test set. Use a walk-forward validation strategy for model building and evaluation.
3.	Given recent claims, what is the expected claim for the next time period? Build a model with `statsmodels` to forecast the amount claimed in the next time step.
4.	How do you interpret the model outcome? Report the model performance on the walk-forward validation set.

**Exhibit 1**

|Sl. No.|Name of Variable|Variable Description|
|----------|------------|---------------|
|1	|date	|Date of Claim|
|2	|rate	|Amount claimed|
|3	|item	|Number of claims|



In [None]:
# import the libraries used in this notebook
import warnings
from math import sqrt

import numpy as np
import pandas as pd
from numpy import array
from matplotlib import pyplot

import statsmodels.api as sm
from statsmodels.tsa.holtwinters import ExponentialSmoothing, SimpleExpSmoothing, Holt

In [None]:
# read the monthly claims file, parsing 'date' (day-first format) as the index
monthly_raw_df = pd.read_csv('./data/data_monthly.csv', sep=',', header=0,
                             index_col=['date'],
                             parse_dates=['date'], dayfirst=True)

In [None]:
monthly_raw_df.sort_index(inplace=True)
monthly_raw_df.info()

The series is truncated to the period from 1 March 2014 to 31 May 2017, because the data at either end was not captured for the full cycle.

In [None]:
monthly_filter_df = monthly_raw_df.filter(['rate'], axis =1)
monthly_filter_df['rate'] = monthly_filter_df['rate'].map(lambda x:str(x).replace(',', '')).astype(float)

In [None]:
monthly_filter_df = monthly_filter_df[(monthly_filter_df.index >='2014-03-01') & 
                                      (monthly_filter_df.index <= '2017-05-31')]

monthly_filter_df.info()

## Problem Framing


We will use the data to explore one specific question:

**Given recent claims, what is the expected claim for the next time period?**

A plot of the original data is shown below:

In [None]:
pyplot.figure(figsize = (18, 5))
pyplot.plot(monthly_filter_df, 'b-')
pyplot.title('Monthly amount claimed over a 3 year period')

## Train and Test Sets
We hold out the final 10 months of data for evaluating models and use all earlier months for training.

The function split_filter_df() below splits the monthly data into train and test sets.

The split point is a row offset counted back from the end of the series, so the chronological order of the observations is preserved.

In [None]:
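def split_filter_df(data):
    # hold out the last 10 observations (months) as the test set
    split_point = len(data) - 10
    # copy the slices so that columns added later do not touch the original frame
    train, test = data[0:split_point].copy(), data[split_point:].copy()
    return train, test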

In [None]:
# split the data into train and test sets
train, test = split_filter_df(monthly_filter_df)

In [None]:
# validate train data
print(train.shape)
train.head()

In [None]:
# validate test data
print(test.shape)
test.head()

## Evaluation Metric

Both Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) are reported. Because RMSE squares each error before averaging, it penalises large forecast errors more heavily than an absolute-error measure such as MAPE.

The functions evaluate_forecasts_rmse() and evaluate_forecasts_mape() are used to evaluate model performance.
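
As a cross-check on the loop-based functions in this section, the same two scores can be written in vectorised form. The sketch below is illustrative only: the helper names `rmse` and `mape` and the toy numbers are assumptions, not part of the notebook's data or code.

```python
import numpy as np

def rmse(actual, predicted):
    # root of the mean squared difference between actual and predicted values
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    # mean absolute error relative to the actual value, returned as a fraction
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

# toy values, made up for illustration
actual_values = [100.0, 200.0, 400.0]
predicted_values = [110.0, 180.0, 400.0]

rmse_score = rmse(actual_values, predicted_values)
mape_score = mape(actual_values, predicted_values)
print('RMSE=%.4f MAPE=%.5f' % (rmse_score, mape_score))
```

Note that MAPE divides by the actual value, so it is undefined when an actual value is zero.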

In [None]:
# compute the RMSE for a frame whose first column holds the actual values
# and whose second column holds the predictions
def evaluate_forecasts_rmse(actual):
    score_rmse = 0
    se = 0
    for i in range(actual.shape[0]):
        # accumulate the squared error for each month
        se += (actual.iloc[i,0] - actual.iloc[i,1])**2
    # take the root of the mean squared error
    score_rmse = sqrt(se/actual.shape[0])
    return score_rmse

In [None]:
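# compute the MAPE (as a fraction, not a percentage) for a frame whose first column
# holds the actual values and whose second column holds the predictions
def evaluate_forecasts_mape(actual):
    score_mape = 0
    ape = 0
    for i in range(actual.shape[0]):
        # accumulate the absolute percentage error for each month
        ape += np.abs(((actual.iloc[i,0] - actual.iloc[i,1])/actual.iloc[i,0]))
    # average over the number of months
    score_mape = (ape)/actual.shape[0]
    return actual, score_mape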

## Develop Basic Model

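The model developed here is ExponentialSmoothing (TES). The three exponential smoothing variants and their statsmodels classes are:

* SES (single exponential smoothing) - SimpleExpSmoothing
* DES (double exponential smoothing) - Holt
* TES (triple exponential smoothing) - ExponentialSmoothing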
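
To make the idea behind the simplest of these variants concrete, here is a minimal sketch of the single exponential smoothing recursion, in which the level is updated as `alpha * observation + (1 - alpha) * previous level` and the one-step forecast is the last level. The function name `ses_one_step` is hypothetical, and initialising the level with the first observation is an assumption; statsmodels may initialise differently, so its results need not match this sketch.

```python
def ses_one_step(series, alpha):
    # assumption: start the level at the first observation
    level = series[0]
    for y in series[1:]:
        # blend the new observation with the previous level
        level = alpha * y + (1 - alpha) * level
    # the one-step-ahead forecast is the final smoothed level
    return level

# toy series, made up for illustration
demo_series = [10.0, 12.0, 14.0]
forecast = ses_one_step(demo_series, alpha=0.5)
print(forecast)
```

A larger `alpha` puts more weight on recent observations. DES adds a second smoothed component for trend, and TES adds a third for seasonality.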

### Walk-Forward validation
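
Walk-forward validation forecasts one step ahead, then adds the true observation to the history before forecasting the next step. The sketch below shows the loop on a toy series with a naive "last value" forecaster. The names `walk_forward` and `naive_forecast` and the numbers are illustrative assumptions; the notebook's own implementation is `evaluate_model()`.

```python
def walk_forward(train_values, test_values, forecast_func):
    # history grows by one true observation after every forecast
    history = list(train_values)
    predictions = []
    for observed in test_values:
        # forecast the next step using only data seen so far
        predictions.append(forecast_func(history))
        # reveal the true value before moving to the next step
        history.append(observed)
    return predictions

def naive_forecast(history):
    # stand-in model: predict that the next value equals the last one
    return history[-1]

# toy data, made up for illustration
toy_train = [5.0, 6.0, 7.0]
toy_test = [8.0, 9.0]
toy_predictions = walk_forward(toy_train, toy_test, naive_forecast)
print(toy_predictions)
```

Each prediction uses only observations that precede it, which mirrors how the model would be used in practice.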

In [None]:
# evaluate a single model with walk-forward validation
def evaluate_model(model_func, train, test,alpha,beta,gamma,season):
    # history holds the observations available at each forecast origin
    history = train.filter(['rate'], axis = 1)
    predictions = list()
    for i in range(len(test)):
        # forecast the next month from the history so far
        yhat_sequence = model_func(history['rate'].values, alpha, beta, gamma, season)
        # store the prediction
        predictions.append(yhat_sequence)
        # add the real observation to the history before forecasting the following month
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
        history = pd.concat([history, test.iloc[[i]]])
    predictions = array(predictions)
    test['prediction'] = predictions
    # score the predictions against the actual values
    actual, score_mape = evaluate_forecasts_mape(test[:])
    score_rmse = evaluate_forecasts_rmse(test[:])
    return actual, score_mape, score_rmse

In [None]:
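# one-step-ahead forecast from an exponential smoothing model
def exp_forecast(history, alpha, beta, gamma, season):
    # define the model with a 12-month seasonal cycle; no trend component is specified,
    # so beta only takes effect once a trend is added (see the Exercise)
    model = ExponentialSmoothing(history, seasonal = season, seasonal_periods=12)
    # fit the model using the given smoothing parameters
    # (smoothing_trend replaces the smoothing_slope keyword of older statsmodels releases)
    model_fit = model.fit(smoothing_level=alpha, smoothing_trend=beta, smoothing_seasonal = gamma, optimized=True)
    # forecast the next time step
    yhat = model_fit.forecast(steps=1)[0]
    return yhat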

In [None]:
def del_column(test):
    # drop the 'prediction' column left over from a previous run, if present
    if 'prediction' in test.columns:
        test.drop('prediction', axis = 1, inplace=True)

In [None]:
# define the names and functions of the models to be evaluated
models = dict()
models['ExponentialSmoothing'] = exp_forecast

import itertools
alpha = [0.2,0.4,0.6, 0.8]
beta  = [0.4,0.5, 0.6]
gamma = [0.1, 0.5]
season = ['add','mul']

grid_values = list(itertools.product(alpha, beta, gamma,season))
warnings.filterwarnings("ignore")

# evaluate each model over the grid and keep the combination with the lowest RMSE
for name, func in models.items():
    best_mape,best_rmse, best_alpha, best_beta, best_gamma, best_season = float("inf"), float("inf"),float("inf"),float("inf"),float("inf"), str()
    
    for i in range(0, len(grid_values)):
        alpha = grid_values[i][0]
        beta = grid_values[i][1]
        gamma = grid_values[i][2]
        season = grid_values[i][3]
        del_column(test)
        try:
            actual, score_mape, score_rmse= evaluate_model(func, train, test,alpha, beta, gamma,season)
            if score_rmse < best_rmse:
                best_rmse, best_mape, best_alpha, best_beta, best_gamma, best_season = score_rmse, score_mape, alpha, beta, gamma,season
            print('alpha = %.2f beta = %.2f gamma = %.2f MAPE=%.5f RMSE=%.4f Model=%s' % (alpha, beta, gamma,score_mape,score_rmse, season))
        except Exception:
            # skip parameter combinations for which the model cannot be fitted
            continue
    # report the best combination, including its seasonal type
    print('Best alpha = %.2f beta = %.2f gamma = %.2f MAPE=%.5f RMSE=%.5f Model = %s' % (best_alpha, best_beta, best_gamma,best_mape, best_rmse, best_season))
    

### Use the best parameter to run the model

In [None]:
models = dict()
models['ExponentialSmoothing'] = exp_forecast

# re-run the model with the best parameters from the grid search
for name, func in models.items():
    alpha = 0.8
    beta = 0.4
    gamma = 0.1
    season = 'mul'
    del_column(test)
    actual, score_mape, score_rmse= evaluate_model(func, train, test,alpha, beta, gamma,season)
    # report the scores of this run rather than the values stored during the grid search
    print('MAPE=%.5f RMSE=%.5f' % (score_mape, score_rmse))

In [None]:
actual

In [None]:
#actual.to_csv('holts_winter_monthly_forecast.csv')

## Exercise

1. Trend is also a parameter in `ExponentialSmoothing` model. However, we have not defined that in grid_values.  Modify the code to include trend in the grid search for ExponentialSmoothing Model.
2. Develop SES and Holt (DES) model by modifying the code in this notebook. 
3. Compare the MAPE for SES, DES and TES model. Which model will you go ahead with?