# Time Series Forecast - Additive or Multiplicative Model

### Kumar Rahul

Forecasting the demand for services or products leads to better short-term and long-term planning. In this case, we look at the warranty-related issues reported on a particular brand of two-wheeler. The data is a monthly roll-up of approximately half a million issues reported by customers over a four-year period. 
We will use the claim forecasting data in this exercise. Refer to **Exhibit 1** to understand the feature list. Use the data to answer the questions below.

1.	Load the time series dataset in Jupyter Notebook using pandas.
2.	Plot the time series data to visualize trend and seasonality in the data.
3.	Decompose the claim data to report Trend, Seasonality, and irregular component. Find the Seasonality window using the claim data.

**Exhibit 1**

|Sl. No.|Name of Variable|Variable Description|
|----------|------------|---------------|
|1	|date	|Date of Claim|
|2	|rate	|Amount claimed|
|3	|item	|Number of claims|



In [None]:
# imports used for loading, cleaning and modelling the data
import warnings
from math import sqrt

import numpy as np
import pandas as pd
from numpy import array, isnan, nan, split
from pandas import to_numeric
from matplotlib import pyplot
from sklearn.metrics import mean_squared_error

import statsmodels.api as sm
import statsmodels.tsa.holtwinters as hw
from statsmodels.tsa.holtwinters import ExponentialSmoothing, SimpleExpSmoothing, Holt

In [None]:
monthly_raw_df = pd.read_csv('./data/data_monthly.csv', sep=',', header=0, infer_datetime_format=True, 
                             index_col=['date'], 
                             parse_dates= ['date'],dayfirst=True)

In [None]:
monthly_raw_df.sort_index(inplace=True)
monthly_raw_df.info()

The first and last periods in the data are truncated below, as they were not captured for the full cycle.

In [None]:
monthly_filter_df = monthly_raw_df.filter(['rate'], axis =1)
monthly_filter_df['rate'] = monthly_filter_df['rate'].map(lambda x:str(x).replace(',', '')).astype(float)

In [None]:
monthly_filter_df = monthly_filter_df[(monthly_filter_df.index >='2014-03-01') & 
                                      (monthly_filter_df.index <= '2017-05-31')]

monthly_filter_df.info()

## Problem Framing


We will use the data to explore a very specific question; that is:

**Given recent claims, what is the expected claim for the next time period?**

A plot of the original data is shown below:

In [None]:
pyplot.figure(figsize = (18, 5))
pyplot.plot(monthly_filter_df, 'b-')
pyplot.title('Monthly amount claimed over a 3 year period')

## Additive or Multiplicative model

Given the data, how do we know whether an additive model or a multiplicative model applies?

> ### Additive Model
Y_Predicted = **trend** + **seasonal** + **resid**


> ### Multiplicative Model
Y_Predicted = **trend** * **seasonal** * **resid**
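
To make the two forms concrete, here is a minimal sketch on synthetic data. All numbers are made up for illustration and are not taken from the claim data, and the helper name `yearly_swing` is hypothetical. In the additive series the seasonal swing keeps the same size as the trend grows; in the multiplicative series the swing scales with the trend.

```python
import numpy as np

# Synthetic components (assumed values, for illustration only)
months = np.arange(36)                                      # three years of monthly points
trend = 100.0 + 5.0 * months                                # steadily rising level
season_add = 20.0 * np.sin(2 * np.pi * months / 12)         # swing in absolute units
season_mult = 1.0 + 0.2 * np.sin(2 * np.pi * months / 12)   # swing as a ratio

y_additive = trend + season_add             # residual taken as 0
y_multiplicative = trend * season_mult      # residual taken as 1

def yearly_swing(y):
    # Peak-to-trough range inside each 12-month block
    blocks = y.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

swing_add = yearly_swing(y_additive)
swing_mult = yearly_swing(y_multiplicative)
print("additive swing per year:      ", swing_add)
print("multiplicative swing per year:", swing_mult)
```

A swing that stays roughly constant from year to year hints at an additive structure, while a swing that widens with the level hints at a multiplicative one.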


To answer this, we first have to decompose the data into trend, seasonal and noise components. We can do this with the `seasonal_decompose` function from `statsmodels.api.tsa`, but we will build the decomposition manually for understanding.

## Manual Method - Trend, Seasonality and Residual

### Trend Component

* **Trends**: The consistent (long-term) upward or downward movement of peaks in the data. A visual inspection of the monthly claims shows spikes that grow with time, suggesting that the average claim has gone up over time.

One way to identify the trend is to first identify the seasonality window.

* **Seasonality window**: The idea is to find a window size `s` such that, when a rolling average is computed around each time point `p` (over roughly `p - s/2` to `p + s/2`), the zigzag motion in the time series smooths out. The rolling average over the window `s` smooths out noise and seasonality, so what remains is the trend.
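
As a rough illustration of this idea, the sketch below uses a synthetic series (a straight-line trend plus a 12-month cycle; all values are assumptions, not the claim data) and measures how wiggly the centered rolling mean remains for a few candidate windows. The helper name `roughness` is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumed values): linear trend plus a 12-month cycle
t = np.arange(48)
series = pd.Series(50.0 + 2.0 * t + 15.0 * np.sin(2 * np.pi * t / 12))

def roughness(window):
    # Spread of the month-to-month changes in the centered rolling mean.
    # A pure linear trend would give a spread of (numerically) zero.
    smooth = series.rolling(window=window, center=True).mean().dropna()
    return smooth.diff().dropna().std()

for window in (3, 6, 12):
    print(window, roughness(window))
```

When the window matches the length of the cycle, the seasonal swings average out within every window and the rolling mean follows the trend alone.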

The function below helps identify the trend by plotting rolling statistics for a candidate seasonality window.

In [None]:
def plot_rolling_stats(data, roll):
    # Determine rolling statistics
    rolmean = data.rolling(window=roll, center = True).mean()
    rolstd = data.rolling(window=roll, center = True).std()

    #Plot rolling statistics:
    pyplot.figure(figsize = (18, 5))
    orig = pyplot.plot(data, color='blue',label='Original')
    mean = pyplot.plot(rolmean, color='red', label='Rolling Mean')
    std = pyplot.plot(rolstd, color='black', label = 'Rolling Std')
    pyplot.legend(loc='best')
    pyplot.title('Rolling Mean & Standard Deviation')
    #pyplot.show(block=False)

In [None]:
decompose_df = monthly_filter_df.copy()
plot_rolling_stats(decompose_df,12)

In [None]:
# select the 'rate' column explicitly so the cell still works if it is re-run
decompose_df['trend'] = decompose_df['rate'].rolling(window=12, center = True).mean()
decompose_df = decompose_df.dropna()
decompose_df.head()

### Seasonal Component

* **Seasonality**: Seasonality, measured in terms of a seasonality index, is the fluctuation from the trend that occurs within a defined time period (seasons, quarters, months, days of the week, time intervals within a day, etc.).

    > * Removing the trend from the original data leaves the seasonal component (plus noise). 
         1. For an additive model, detrended data = original data - trend. 
         2. For a multiplicative model, detrended data = original data / trend.


PS: When using the pre-defined function (`seasonal_decompose`), the trend can be extrapolated to fill the missing values at the top and bottom (lost to smoothing over the seasonality window) using least-squares extrapolation. This is controlled by the `extrapolate_trend` argument. 
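
The end-point filling mentioned above can be sketched with plain NumPy. This is only an illustration of the idea (fit a straight line by least squares to the nearest available trend values and extend it into the gaps), not the library's actual implementation. The helper name `extrapolate_ends`, the value of `npoints` and the sample values are all assumptions.

```python
import numpy as np

def extrapolate_ends(trend, npoints):
    """Fill leading/trailing NaNs with a least-squares line fitted through
    the nearest `npoints` valid values at each end (hypothetical helper)."""
    trend = np.asarray(trend, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(trend))
    first, last = valid[0], valid[-1]

    # Front: fit on the first npoints valid values, extend backwards
    x = np.arange(first, first + npoints)
    slope, intercept = np.polyfit(x, trend[x], 1)
    head = np.arange(0, first)
    trend[head] = slope * head + intercept

    # Back: fit on the last npoints valid values, extend forwards
    x = np.arange(last - npoints + 1, last + 1)
    slope, intercept = np.polyfit(x, trend[x], 1)
    tail = np.arange(last + 1, len(trend))
    trend[tail] = slope * tail + intercept
    return trend

# A centered 12-point window leaves roughly 6 NaNs at each end (made-up values)
raw = np.array([np.nan] * 6 + [10.0 + 2.0 * i for i in range(6, 30)] + [np.nan] * 6)
filled = extrapolate_ends(raw, npoints=12)
print(filled[:3], filled[-3:])
```

Because the made-up trend is exactly linear, the filled values continue the same line; on real data the extrapolated ends are only an approximation.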
         
Since we do not know whether we should build the time series model using an additive or a multiplicative approach, let us detrend the data using both techniques.

In [None]:
decompose_df['detrend_a'] =  decompose_df['rate'] - decompose_df['trend']
decompose_df['detrend_m'] =  decompose_df['rate']/decompose_df['trend']
decompose_df.head()

Add the month column to `decompose_df`. This will be used in the merge operation later on.

In [None]:
decompose_df['month'] = pd.DatetimeIndex(decompose_df.index).month
decompose_df.head()

### Seasonality (continued)

>        4. We will calculate the period average now. The period average is the mean of the detrended data at a defined frequency (seasonality window). Frequency is an integer value that gives the number of periods per cycle (e.g. 12 for monthly). NaNs are ignored in the mean. 
         5. For an additive model, seasonality index = period average - (mean of period averages)
         6. For a multiplicative model, seasonality index = period average / (mean of period averages)
         7. The seasonal component is obtained by filling/tiling the seasonality index across all time periods.
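
Steps 4 to 7 can be traced on a tiny example. The sketch below uses made-up detrended values for two years of quarterly data (period of 4, chosen only to keep the numbers short); the variable names are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up detrended values for 2 years of quarterly data (period = 4)
period = 4
detrended_add = pd.Series([5.0, -3.0, 1.0, -1.0, 7.0, -5.0, 3.0, -3.0])
detrended_mult = pd.Series([1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.0])
position = np.arange(len(detrended_add)) % period   # season label 0..3

# Step 4: period average = mean of the detrended values for each season
period_avg_a = detrended_add.groupby(position).mean()
period_avg_m = detrended_mult.groupby(position).mean()

# Step 5 (additive): subtract the mean so the indices sum to zero
seasonal_index = period_avg_a - period_avg_a.mean()
# Step 6 (multiplicative): divide by the mean so the indices average to one
seasonal_index_m = period_avg_m / period_avg_m.mean()

# Step 7: tile the index across every time period
seasonal = np.tile(seasonal_index.to_numpy(), len(detrended_add) // period)
print(seasonal_index.to_list())
print(seasonal_index_m.to_list())
print(seasonal)
```

Centering the additive index on zero and the multiplicative index on one keeps the seasonal component from absorbing part of the overall level, which belongs to the trend.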

#### Additive Seasonal Component

In [None]:
period_average_additive = decompose_df.groupby(decompose_df.index.month)['detrend_a'].mean()
overall_avg_additive = period_average_additive.mean()

In [None]:
seasonal_component = pd.DataFrame(period_average_additive-overall_avg_additive)
#print(seasonal_component)

seasonal_component['month'] = (seasonal_component.index)
seasonal_component.head()

In [None]:
seasonal_component.rename(index = str, columns={'detrend_a': 'seasonality_a'},inplace=True)
seasonal_component.head()

#### Multiplicative Seasonal Component

In [None]:
period_average_mult = decompose_df.groupby(decompose_df.index.month)['detrend_m'].mean()
period_average_mult

In [None]:
overall_avg_mult = period_average_mult.mean()

In [None]:
seasonal_component['seasonality_m'] = (period_average_mult/overall_avg_mult).values
#print(seasonal_component)


seasonal_component.head()

In [None]:
merge_df=pd.merge(decompose_df,seasonal_component, on = 'month',how='left')
#merge.shape
merge_df.index = decompose_df.index
merge_df.head()

### Residual

>        8. For an additive model, resid = original data - seasonal - trend
         9. For a multiplicative model, resid = original data / seasonal / trend

         Source: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/seasonal.py
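
The two residual formulas can be sanity-checked on a handful of made-up numbers: whatever the residual turns out to be, recombining trend, seasonal and residual must give back the observed series. The arrays below are assumptions for illustration only.

```python
import numpy as np

# Made-up components for 8 periods (illustration only)
trend = np.array([100., 102., 104., 106., 108., 110., 112., 114.])
seasonal_a = np.tile([5.5, -4.5, 1.5, -2.5], 2)      # additive index, sums to 0
seasonal_m = np.tile([1.05, 0.95, 1.02, 0.98], 2)    # multiplicative index, mean 1
observed = np.array([106., 97., 106., 103., 114., 106., 113., 112.])

# Step 8: additive residual is what is left after subtracting both components
resid_a = observed - seasonal_a - trend
# Step 9: multiplicative residual is what is left after dividing by both
resid_m = observed / seasonal_m / trend

# Recombining the three components should reproduce the observed series
rebuilt_a = trend + seasonal_a + resid_a
rebuilt_m = trend * seasonal_m * resid_m
print(resid_a)
print(resid_m)
print(np.allclose(rebuilt_a, observed), np.allclose(rebuilt_m, observed))
```

Additive residuals scatter around 0 and multiplicative residuals around 1, so the two sets are on different scales and should not be compared by raw magnitude.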


In [None]:
# residual = original data with both the trend and the seasonal component removed
merge_df['residual_a'] = merge_df['rate'] - merge_df['trend'] - merge_df['seasonality_a']
merge_df['residual_m'] = merge_df['rate'] / merge_df['trend'] / merge_df['seasonality_m']

merge_df.head()

### Check the additive model

In [None]:
# trend + seasonal + residual should reproduce the original 'rate' column
merge_df['check_a'] = merge_df['trend'] + merge_df['seasonality_a'] + merge_df['residual_a']

merge_df.head()

In [None]:
decomposition = sm.tsa.seasonal_decompose(monthly_filter_df, model='additive') #,freq = decomfreq)

In [None]:
decomposition.seasonal.head(10)

### Check the multiplicative model

In [None]:
# trend * seasonal * residual should reproduce the original 'rate' column
merge_df['check_m'] = merge_df['trend'] * merge_df['seasonality_m'] * merge_df['residual_m']

merge_df.head()

### Select Model

We are going to check how much correlation between data points is still encoded within the residuals. This can be done using the autocorrelation function (ACF). We can use the `acf` function from `statsmodels.tsa.stattools` for the calculation.


Since some of the correlations can be negative, we square each one before summing and select the model type with the smallest sum of squared correlation values.
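
The order of the two operations matters. The sketch below computes a plain NumPy sample autocorrelation (a stand-in written for illustration, not the library's code; `acf_values` and `sum_sq_acf` are hypothetical names) on a made-up residual series that alternates in sign, and compares the sum of squares with the square of the sum.

```python
import numpy as np

def acf_values(x, nlags):
    # Sample autocorrelation for lags 0..nlags (plain NumPy sketch)
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[: len(x) - k] * x[k:]) / denom
                     for k in range(nlags + 1)])

def sum_sq_acf(x, nlags=10):
    # Square each correlation first, then add, so signs cannot cancel
    return np.sum(np.square(acf_values(x, nlags)))

# Made-up residuals that flip sign every period: strongly autocorrelated
alternating = np.array([1.0, -1.0] * 12)

sum_of_squares = sum_sq_acf(alternating)
square_of_sum = np.square(np.sum(acf_values(alternating, 10)))
print(sum_of_squares, square_of_sum)
```

For this series the correlations alternate between large positive and large negative values, so adding them before squaring lets them cancel and understates how much structure is left in the residuals.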

In [None]:
import statsmodels.tsa.stattools as sts

In [None]:
def ssacf(x):
    # sum of squared autocorrelations (square first so negative values do not cancel)
    return np.sum(np.square(sts.acf(x)))

In [None]:
def com_ssacf(add,mult):
    if ssacf(add) <= ssacf(mult):
        model_type = 'additive'
    else:
        model_type = 'multiplicative'
    return model_type    

In [None]:
com_ssacf(merge_df.residual_a,merge_df.residual_m)

### Exercise - Use seasonal_decompose method

Decompose the series into trend, seasonal and residual components using the `seasonal_decompose` method and repeat the steps carried out in the section above (Manual Method).

> 1. What differences do you observe between the `merge_df` created in the section above and the one created using the `seasonal_decompose` method?
2. Which model will you choose to build the time series model: additive or multiplicative?

A sample step is provided to give you a head start on the exercise.

In [None]:
decomposition = sm.tsa.seasonal_decompose(monthly_filter_df, model='additive') #,freq = decomfreq)

dplot = decomposition.plot()
dplot.set_figwidth(20)
dplot.set_figheight(10)
dplot.suptitle('Decomposition of time series - monthly claim data')
pyplot.show()