In [1]:

import numpy as np
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from statsmodels.tsa.seasonal import seasonal_decompose

import warnings
warnings.filterwarnings('ignore')

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [2]:
!pip install calplot
!pip install statsmodels

In [3]:
from matplotlib import rcParams
import calplot
from sklearn.metrics import mean_squared_error
import statsmodels.api as sm
from math import sqrt
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

import matplotlib.pyplot as plt

In [4]:
train_df = pd.read_csv('/kaggle/input/daily-climate-time-series-data/DailyDelhiClimateTrain.csv')
test_df = pd.read_csv('/kaggle/input/daily-climate-time-series-data/DailyDelhiClimateTest.csv')

In [5]:
print('Train data shape:', train_df.shape)
print('Test data shape:', test_df.shape)

In [6]:
train_df.head()

In [7]:
# convert date to datetime
train_df['date'] = pd.to_datetime(train_df['date'])
test_df['date'] = pd.to_datetime(test_df['date'])

In [8]:
#set date as index
train_df = train_df.set_index('date')
test_df = test_df.set_index('date')

In [9]:
train_df['meantemp'].plot(figsize=(12,4))

The data shows clear yearly seasonality: the same pattern repeats every year.
Now let's plot the ETS decomposition of the data.
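
To show what an additive decomposition does, here is a minimal from-scratch sketch on a synthetic series (not the Delhi data). It follows the classical moving-average recipe and is only meant to illustrate the idea; it is not the `statsmodels` implementation, and the function name `additive_decompose` and the odd-period restriction are assumptions made for brevity.

```python
def additive_decompose(series, period):
    """Split a series into trend + seasonal + residual (odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(series)

    # Trend: centered moving average, undefined (None) near the edges.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = sum(window) / period

    # Seasonal: average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center the seasonal effects around zero
    seasonal = [means[t % period] - offset for t in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic series: a straight line plus a repeating 3-step pattern.
pattern = [-1.0, 0.0, 1.0]
series = [float(t) + pattern[t % 3] for t in range(12)]

trend, seasonal, residual = additive_decompose(series, period=3)
print(trend[1:4], seasonal[:3])
```

On this toy input the trend recovers the straight line and the seasonal part recovers the repeating pattern, which is the same kind of split the plot in the next cell shows for the temperature data.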

In [10]:
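# the seasonal period is inferred from the daily index (7, i.e. weekly);
# pass period=365 to isolate the yearly cycle instead
ets_result = seasonal_decompose(train_df['meantemp'], model='additive')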
rcParams['figure.figsize'] = (12,5)
ets_result.plot();

In [11]:
calplot.calplot(train_df['meantemp'], suptitle='Daily mean temperature in Delhi, by year')

As the heatmap shows, May and June have the highest temperatures because that is summer in India.

In this notebook, we explore several smoothing techniques that can be applied to time series data.

Methods covered:
1. Naive Approach
2. Simple Average
3. Moving Average
4. Simple Exponential Smoothing
5. Holt's Method (Double Exponential Smoothing)
6. Holt-Winters Method (Triple Exponential Smoothing)

## Naive Approach

The naive approach simply uses the most recent observed value as the forecast for all future dates.

$$\hat{y}_{t+1}=y_{t}$$

This approach works well when recent target values are stable.
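
To make the formula and the error metric concrete, here is a minimal from-scratch sketch on made-up values (not the Delhi data). The helper names `naive_forecast` and `rmse` are illustrative only; the notebook cells below use `pandas` and `sklearn` for the same steps.

```python
from math import sqrt


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy values chosen for illustration.
history = [20.0, 21.0, 19.0, 22.0]
actual_future = [23.0, 21.0, 22.0]

forecast = naive_forecast(history, len(actual_future))
error = rmse(actual_future, forecast)
print(forecast, round(error, 4))
```

The forecast is a flat line at the last training value, so the error grows as soon as the future values drift away from it.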

In [12]:
# use the last observed training value as the forecast for every test date
test_df['naive'] = train_df['meantemp'].iloc[-1]

In [13]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['naive'], label='Naive Forecast')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.show()

In [14]:
rms = sqrt(mean_squared_error(test_df.meantemp, test_df.naive))
print(rms)

## Simple Average

This method takes the average of all past target values and uses that average as the forecast for future dates.

$$\hat{y}_{t+1}=\frac{1}{t}\sum_{i=1}^{t}y_{i}$$

In [15]:
test_df['SA'] = train_df['meantemp'].mean()

In [16]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['SA'], label='SA Forecast')
plt.legend(loc='best')
plt.title("Simple Average Forecast")
plt.show()

In [17]:
rms = sqrt(mean_squared_error(test_df.meantemp, test_df.SA))
print(rms)

## Moving Average

The previous method averages all past data, which is not appropriate when the behaviour of the series has changed suddenly. As an improvement, we average the target variable over only the last few time periods.

$$\hat{y}_{t+1}=\frac{1}{p}(y_{t}+y_{t-1}+y_{t-2}+\cdots+y_{t-p+1})$$
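
The difference between averaging everything and averaging only a recent window is easiest to see on a series with a sudden level shift. The sketch below uses a made-up series and illustrative helper names (`simple_average_forecast`, `moving_average_forecast`); the window size of 4 is an arbitrary choice for the example.

```python
def simple_average_forecast(history):
    """Forecast with the mean of all past values."""
    return sum(history) / len(history)


def moving_average_forecast(history, window):
    """Forecast with the mean of the last `window` values only."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Toy series: the level jumps from 10 to 20 two thirds of the way through.
history = [10.0] * 8 + [20.0] * 4

sa = simple_average_forecast(history)
ma = moving_average_forecast(history, window=4)
print(round(sa, 3), ma)
```

The simple average is still pulled toward the old level, while the moving average has already adapted to the new one.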

In [18]:
train_df['7-SMA'] = train_df['meantemp'].rolling(window=7).mean()

In [19]:
train_df[['meantemp','7-SMA']].plot(figsize=(15,5))

In [20]:
test_df['SMA'] = train_df['meantemp'].rolling(window=7).mean().iloc[-1]

In [21]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['SMA'], label='SMA Forecast')
plt.legend(loc='best')
plt.title("Simple Moving Average Forecast")
plt.show()

## Simple Exponential Smoothing

In this method, forecasts are computed as a weighted average in which the weights decrease exponentially as observations come from further in the past. The smallest weight is associated with the oldest observation.

$$\hat{y}_{t+1|t} = \alpha y_{t}+\alpha(1-\alpha)y_{t-1}+\alpha(1-\alpha)^2y_{t-2}+\cdots$$

where $0 \le \alpha \le 1$ is the smoothing parameter.

The above equation can be rewritten recursively as:

$$\hat{y}_{t+1|t} = \alpha y_{t}+(1-\alpha)\hat{y}_{t|t-1}$$
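
The sketch below checks on a toy series that the weighted-sum form and the recursive form give the same forecast. It assumes the recursion is initialised with the first observation, which is one common convention and may differ from what `statsmodels` does internally; the function names are illustrative.

```python
def ses_recursive(series, alpha):
    """One-step-ahead forecast via the recursive form."""
    level = series[0]  # assumed initial level: the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ses_weighted_sum(series, alpha):
    """Same forecast written as an explicit exponentially weighted sum."""
    n = len(series)
    # Weight alpha * (1 - alpha)^k goes to the k-th most recent observation.
    total = sum(
        alpha * (1 - alpha) ** k * series[n - 1 - k] for k in range(n - 1)
    )
    # The remaining weight falls on the initial level.
    total += (1 - alpha) ** (n - 1) * series[0]
    return total


series = [20.0, 22.0, 21.0, 25.0, 24.0]  # toy values
alpha = 2 / (7 + 1)                      # same span-to-alpha rule as below

rec = ses_recursive(series, alpha)
ws = ses_weighted_sum(series, alpha)
print(rec, ws)
```

Setting `alpha` to 1 reduces the method to the naive forecast, since all the weight then sits on the latest observation.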

In [22]:
# smoothing level equivalent to a 7-day span (same conversion pandas uses for ewm(span=...))
span = 7
alpha = 2 / (span + 1)

In [23]:
freq = pd.infer_freq(train_df.index)
train_df.index.freq = freq
test_df.index.freq = freq

In [24]:
model = SimpleExpSmoothing(train_df['meantemp'])
fitted_model = model.fit(smoothing_level=alpha, optimized=False)
train_df['SES7'] = fitted_model.fittedvalues

In [25]:
test_df['SES7'] = list(fitted_model.forecast(len(test_df)))

In [26]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['SES7'], label='SES7 Forecast')
plt.legend(loc='best')
plt.title("SES7 Forecast")
plt.show()

## Double Exponential Smoothing

The previous methods fail to capture other contributing factors such as trend and seasonality.

Holt's method extends simple exponential smoothing with a trend component, and the Holt-Winters method additionally captures seasonality. The full Holt-Winters method comprises a forecast equation and three smoothing equations:

| Symbol      | Component   | Smoothing Parameter |
| ----------- | ----------- | ------------------- |
| $l_{t}$     | Level       | $\alpha$            |
| $b_{t}$     | Trend       | $\beta$             |
| $s_{t}$     | Seasonality | $\gamma$            |


For double exponential smoothing, which uses only the level and the trend, the equations are:

$$l_{t} = (1-\alpha)(l_{t-1}+b_{t-1})+\alpha x_{t}$$
$$b_{t} = (1-\beta)b_{t-1}+\beta(l_{t}-l_{t-1})$$
$$\hat{y}_{t+1} = l_{t}+b_{t}$$
$$\hat{y}_{t+h} = l_{t}+hb_{t}$$

where $h$ is the number of periods into the future.

The method is called double exponential smoothing because it applies exponential smoothing twice: once to the level (with $\alpha$) and once to the trend (with $\beta$).
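
Here is a minimal from-scratch sketch of Holt's linear method on a toy series. It assumes the standard form in which the level update includes the previous trend, and it initialises the level with the first value and the trend with the first difference; both the initialisation and the name `holt_forecast` are assumptions for illustration, not what `statsmodels` does internally.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with additive level and trend."""
    level = series[0]               # assumed initial level
    trend = series[1] - series[0]   # assumed initial trend
    for x in series[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * x + (1 - alpha) * (previous_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# A perfectly linear toy series: the forecast should continue the line.
series = [2.0, 4.0, 6.0, 8.0, 10.0]
forecast = holt_forecast(series, alpha=0.5, beta=0.5, horizon=3)
print(forecast)
```

Unlike the flat forecasts of the earlier methods, the forecast here keeps rising with the estimated slope.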

In [27]:
# double exponential smoothing additive model
des_model = ExponentialSmoothing(train_df['meantemp'], trend='add')
fitted_des_model = des_model.fit()
train_df['DES_add_7'] = fitted_des_model.fittedvalues

In [28]:
test_df['DESAdd7'] = list(fitted_des_model.forecast(len(test_df)))

In [29]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['DESAdd7'], label='DESAdd7')
plt.legend(loc='best')
plt.title("DES_add_7 Forecast")
plt.show()

In [30]:
test_df.head()

## Triple Exponential Smoothing

With triple exponential smoothing, we introduce a smoothing factor $\gamma$ that addresses seasonality. With an additive seasonal component $s_{t}$, as used in the model below, the equations are:

$$l_{t} = (1-\alpha)(l_{t-1}+b_{t-1})+\alpha(x_{t}-s_{t-L})$$
$$b_{t} = (1-\beta)b_{t-1}+\beta(l_{t}-l_{t-1})$$
$$s_{t} = (1-\gamma)s_{t-L}+\gamma(x_{t}-l_{t-1}-b_{t-1})$$
$$\hat{y}_{t+1} = l_{t}+b_{t}+s_{t+1-L}$$
$$\hat{y}_{t+m} = l_{t}+mb_{t}+s_{t-L+1+(m-1)\bmod L}$$

where $L$ is the length of the seasonal cycle (the number of periods per season) and $m$ is the number of periods into the future.
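
The additive variant can be sketched from scratch as below, on a toy series with a repeating pattern. The initialisation (level from the mean of the first season, trend from the difference between the first two seasons, seasonal effects from the first season) is a simple assumption chosen for the example, and the function name is illustrative; `statsmodels` estimates these quantities differently.

```python
def holt_winters_additive(series, alpha, beta, gamma, season_length, horizon):
    """Forecast with additive level, trend and seasonal components."""
    L = season_length
    first, second = series[:L], series[L:2 * L]
    level = sum(first) / L                        # assumed initial level
    trend = (sum(second) - sum(first)) / (L * L)  # assumed initial trend
    seasonals = [x - level for x in first]        # assumed initial seasonal effects

    for t in range(L, len(series)):
        x = series[t]
        prev_level, prev_trend = level, trend
        prev_seasonal = seasonals[t % L]          # effect from one season ago
        level = alpha * (x - prev_seasonal) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals[t % L] = (
            gamma * (x - prev_level - prev_trend) + (1 - gamma) * prev_seasonal
        )

    n = len(series)
    # Step m ahead lands on time index n - 1 + m, so reuse that seasonal slot.
    return [
        level + m * trend + seasonals[(n - 1 + m) % L]
        for m in range(1, horizon + 1)
    ]


# Toy series: a 3-step pattern repeated four times, with no trend.
series = [1.0, 2.0, 3.0] * 4
forecast = holt_winters_additive(
    series, alpha=0.5, beta=0.5, gamma=0.5, season_length=3, horizon=3
)
print(forecast)
```

Because the toy series is perfectly periodic, the forecast simply repeats the pattern for the next cycle.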

In [31]:
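# triple exponential smoothing with additive trend and additive seasonality
# note: seasonal_periods=7 models a weekly cycle, not the yearly pattern seen earlier
tes_add_model = ExponentialSmoothing(train_df['meantemp'], trend='add', seasonal='add', seasonal_periods=7)
fitted_tes_add_model = tes_add_model.fit()
train_df['TES_add_7'] = fitted_tes_add_model.fittedvalues

In [32]:
test_df['TES_add_7'] = list(fitted_tes_add_model.forecast(len(test_df)))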

In [33]:
plt.figure(figsize=(12,5))
plt.plot(train_df.index, train_df['meantemp'], label='Train')
plt.plot(test_df.index,test_df['meantemp'], label='Test')
plt.plot(test_df.index,test_df['TES_add_7'], label='TES_add_7')
plt.legend(loc='best')
plt.title("TES_add_7 Forecast")
plt.show()