## Discovering Patterns in the Russian Housing Market for Analysis and Prediction
### 3804ICT Assignment Part I | Forecasting | Trimester 2, 2019

Joshua Russell (s5057545) | joshua.russell2@griffithuni.edu.au


Joshua Mitchell (s5055278) | joshua.mitchell4@griffithuni.edu.au


Hayden Flatley (s5088623) | hayden.flatley@griffithuni.edu.au

### Introduction

> Introduce the algorithm/technique in pretty good detail

What is forecasting?

We want to look at how the market will behave in the future so that we have the opportunity to capitalise on the market's future state. 

Why we use exponential smoothing instead of a moving average (our data is dated and spans multiple years)

To do this we need to capture patterns within the time series that take place over both long and short time periods. A moving average only takes the overall trend of the data into account, so it is only suitable for applications with little volatility and no seasonal pattern. Our data records the day, month and year of each house transaction and spans multiple years, so we expect to find and analyse seasonal behaviour.
For this we use the Holt-Winters method, also known as triple exponential smoothing, which can model both trend and seasonality.

Simple exponential smoothing

We start with single exponential smoothing, a form of weighted moving average in which the influence of past points decreases exponentially. Alpha is the smoothing coefficient that sets the rate at which the influence of historical values decays: a greater alpha places more weight on the most recent observation, so older values decay faster. The basic equation is:
\begin{equation*}
L_t = \alpha y_{t-1} + (1 - \alpha)L_{t-1}
\end{equation*}
If we expand the recursion, it becomes a weighted sum of all previous observations, with weights that decay geometrically:
\begin{equation*}
L_t = \alpha\sum_{i=1}^{t-2}(1-\alpha)^{i-1}y_{t-i} + (1-\alpha)^{t-2}L_2
\end{equation*}
From these equations we can see that each value relies on actual data points to calculate the next point in the series. This is why single exponential smoothing gives poor results when forecasting more than one step ahead: once the observations run out, we hold the last calculated value constant (known as a flat prediction) for all points further into the future. This is also why it is referred to as the level component, since it offsets our values by a constant amount from the x-axis. SES assumes that the time series has no underlying trend or seasonal pattern.
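
The recursion and the flat prediction can be sketched in a few lines of numpy. This is a minimal illustration rather than the notebook's implementation: the function name `simple_exp_smoothing` and the choice to initialise the first level with the first observation are assumptions made here.

```python
import numpy as np

def simple_exp_smoothing(y, alpha, horizon=0):
    """Sketch of SES following L_t = alpha * y_{t-1} + (1 - alpha) * L_{t-1}.

    Returns (level, forecast). level[k] is the one-step-ahead prediction
    for the point that follows y[k]; forecast repeats the last level.
    """
    y = np.asarray(y, dtype=float)
    level = np.empty(len(y))
    level[0] = y[0]  # assumption: the first level equals the first observation
    for k in range(1, len(y)):
        level[k] = alpha * y[k] + (1 - alpha) * level[k - 1]
    # Beyond the data there are no new observations, so the forecast is flat
    forecast = np.full(horizon, level[-1])
    return level, forecast

level, forecast = simple_exp_smoothing([10, 12, 14], alpha=0.5, horizon=3)
print(level)     # smoothed one-step-ahead values
print(forecast)  # the same value repeated for every future step
```

Every entry of `forecast` is identical, which is the flat prediction described above.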

Double exponential smoothing

Double exponential smoothing introduces the ability to model trend with a new equation and modifies the existing level equation as follows:

\begin{equation*}
L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + T_{t-1})
\end{equation*}
\begin{equation*}
T_t = \beta(L_t - L_{t-1}) + (1 - \beta)T_{t-1}
\end{equation*}

The level equation now includes the values of the previous level and trend components at t - 1. The resulting prediction is the sum of the level and trend components:
\begin{equation*}\hat y_{t+h} = L_t + h T_t\end{equation*}
The term h is a whole number of steps to predict ahead of our current position in time t, heading in the direction of our estimated gradient. The second equation is an estimate of the trend, or gradient, which emphasises more recent points by exponentially decaying the influence of historic points, as in the single exponential smoothing equation. Now that we have multiple components, we can also choose how to combine them. In an additive model we assume the trend component stays constant as the level changes, while in a multiplicative model the trend changes with the level. We can also add damping to reduce the influence of the trend as we predict further into the future, which is useful when we don't expect the growth seen at the end of our data to continue (phi is our damping parameter).

\begin{equation*}\text{Additive: } \hat y_{t+h} = L_t + h T_t \end{equation*}
\begin{equation*}\text{Additive damped: } \hat y_{t+h} = L_t + (\phi + \phi^2 + \dots + \phi^h) T_t\end{equation*}
\begin{equation*}\text{Multiplicative: } \hat y_{t+h} = L_t T_t^h \end{equation*}
\begin{equation*}\text{Multiplicative damped: } \hat y_{t+h} = L_t T_t^{(\phi + \phi^2 + \dots + \phi^h)}\end{equation*}
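
To make the four forecast variants concrete, the sketch below evaluates them for horizons 1 to h given a final level and trend. The function name `trend_forecast` and its arguments are hypothetical names chosen for illustration. Setting phi to 1 recovers the undamped forms, since the damping sum then equals the horizon.

```python
import numpy as np

def trend_forecast(level, trend, h, kind="additive", phi=1.0):
    """Forecasts for horizons 1..h from a final level and trend (sketch)."""
    steps = np.arange(1, h + 1)
    # Running sum phi + phi^2 + ... + phi^k for each horizon k
    damp = np.cumsum(phi ** steps)
    if kind == "additive":
        return level + damp * trend
    if kind == "multiplicative":
        return level * trend ** damp
    raise ValueError("kind must be 'additive' or 'multiplicative'")

print(trend_forecast(100.0, 2.0, 3))                         # additive
print(trend_forecast(100.0, 2.0, 3, phi=0.5))                # additive damped
print(trend_forecast(100.0, 1.1, 2, kind="multiplicative"))  # multiplicative
```

With damping, each extra step adds a smaller share of the trend, so the forecast flattens out instead of growing indefinitely.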

We must also consider how to estimate the initial level and trend components, as they have no past values to draw from. Our implementation simply uses the first data point as the initial level, and the average difference between the first 3 points as the initial trend.
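
A minimal sketch of additive double exponential smoothing with this initialisation is shown below. It is an illustration under stated assumptions, not the notebook's final code: the name `holt_linear` is hypothetical, and the level update blends each observation with the previous level plus the previous trend.

```python
import numpy as np

def holt_linear(y, alpha, beta, horizon=1):
    """Additive double exponential smoothing (sketch).

    Initial level: the first observation.
    Initial trend: the mean difference between the first 3 points.
    Returns forecasts for 1..horizon steps past the end of y.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < 3:
        raise ValueError("need at least 3 points to initialise the trend")
    level = y[0]
    trend = np.mean(np.diff(y[:3]))
    for obs in y[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step prediction
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    steps = np.arange(1, horizon + 1)
    return level + steps * trend

# A perfectly linear series is continued exactly, whatever alpha and beta are
print(holt_linear([1, 3, 5, 7, 9], alpha=0.3, beta=0.2, horizon=3))
```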

Holt-Winters

Finally, we must add a seasonality component to model patterns that repeat after a fixed number of time steps (the seasonal period). Our data is set over multiple years, from 2011 to 2016.

Additive and multiplicative seasonal models

An additive model assumes that the size of the seasonal effect stays constant, while a multiplicative model assumes that the seasonal effect is proportional to the local deseasonalised mean level.
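
For reference, a common textbook form of the additive Holt-Winters recursions is given below. It is a sketch of the general method and may differ in detail from the implementation used later in this notebook. The symbols m (number of time steps in one season), S_t (seasonal component) and gamma (seasonal smoothing parameter) are notation assumed here, and the forecast is written for horizons h no longer than one season.

\begin{equation*}
L_t = \alpha (y_t - S_{t-m}) + (1 - \alpha)(L_{t-1} + T_{t-1})
\end{equation*}
\begin{equation*}
T_t = \beta(L_t - L_{t-1}) + (1 - \beta)T_{t-1}
\end{equation*}
\begin{equation*}
S_t = \gamma (y_t - L_t) + (1 - \gamma)S_{t-m}
\end{equation*}
\begin{equation*}
\hat y_{t+h} = L_t + h T_t + S_{t+h-m}
\end{equation*}

In the multiplicative seasonal form, the seasonal term is divided out instead of subtracted, and it scales the forecast:

\begin{equation*}
L_t = \alpha \frac{y_t}{S_{t-m}} + (1 - \alpha)(L_{t-1} + T_{t-1})
\end{equation*}
\begin{equation*}
S_t = \gamma \frac{y_t}{L_t} + (1 - \gamma)S_{t-m}
\end{equation*}
\begin{equation*}
\hat y_{t+h} = (L_t + h T_t) S_{t+h-m}
\end{equation*}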
    
We train the model by adjusting the parameters alpha, beta and gamma, choosing the values that minimise the mean squared error.
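
One simple way to do this is a grid search over candidate parameter values, scoring each combination by its one-step-ahead mean squared error. The sketch below does so for alpha and beta on the additive trend model only, to stay short; gamma would be searched the same way once a seasonal component is included. The function names, the grid of values and the toy series are all assumptions made for illustration.

```python
import itertools
import numpy as np

def holt_one_step_mse(y, alpha, beta):
    """Mean squared one-step-ahead error of additive double smoothing."""
    y = np.asarray(y, dtype=float)
    level = y[0]
    trend = np.mean(np.diff(y[:3]))
    sq_errors = []
    for obs in y[1:]:
        prediction = level + trend  # made before the observation is seen
        sq_errors.append((obs - prediction) ** 2)
        prev_level = level
        level = alpha * obs + (1 - alpha) * prediction
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return float(np.mean(sq_errors))

def grid_search(y, grid=np.linspace(0.1, 0.9, 9)):
    """Return ((alpha, beta), mse) with the lowest error on the grid."""
    best = min(itertools.product(grid, grid),
               key=lambda params: holt_one_step_mse(y, *params))
    return best, holt_one_step_mse(y, *best)

toy_series = [10, 12, 11, 15, 14, 18, 17, 21]  # made-up values, not the dataset
(best_alpha, best_beta), best_mse = grid_search(toy_series)
print(best_alpha, best_beta, best_mse)
```

A finer grid or a numerical optimiser could replace the exhaustive search; the grid is used here because it is easy to inspect.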

What information does it provide?

How is it useful?


### Preprocessing

> Import library modules, load datasets, preprocess datasets for task

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

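# Load the training and test sets (paths are relative to this notebook)
train_df = pd.read_csv("../Data/train.csv")
test_df = pd.read_csv("../Data/test.csv")

# Stack both sets and keep only the columns needed for the time series
df = pd.concat([train_df, test_df], sort=False)
df = df[["timestamp", "price_doc"]]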

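In [10]:
# Parse the timestamp strings into datetimes without overwriting the DataFrame
df["timestamp"] = pd.to_datetime(df["timestamp"])
df.dtypes

timestamp    datetime64[ns]
price_doc           float64
dtype: object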

### Library Solution

> Implement algorithm/technique using libraries

### Manual Solution
> Implement algorithm/technique manually (numpy is okay)

### Results and Metrics
> Provide experimental results on the dataset. Provide a comparison between the library solution and the manual solution. Graphs and tables and text. 