# Forecasting Video Game Sales

## The Business Problem

You recently started working as a supply chain analyst for a company that creates and sells video games. Businesses need to order supplies accurately to meet the demand of their customers: overestimating demand leads to bloated inventory and high costs, while underestimating it means many valued customers won't get the products they want. Your manager has tasked you with forecasting monthly sales in order to synchronize supply with demand, support decisions that will help build a competitive infrastructure, and measure company performance. As the supply chain analyst, you are assigned to help your manager run the numbers through a time series forecasting model.

You’ve been asked to provide a forecast for the next 4 months of sales and report your findings.

## Plan Your Analysis

#### 1. Does the dataset meet the criteria of a time series dataset? Make sure to explore all four key characteristics of time series data.

Initial exploration showed a complete series that exhibits all four key characteristics of time series data. The series covers a continuous time interval, its measurements are sequential across that interval, consecutive measurements are equally spaced (one month apart), and each time unit within the interval has at most one data point.
The data consist of monthly sales from 2008 through September 2013.
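The four criteria can also be checked in code. The sketch below is illustrative only. It assumes the records are available as hypothetical `(year, month, sales)` tuples in the order they were recorded, and the function name `check_monthly_series` is made up for this example.

```python
from typing import Dict, List, Tuple


def check_monthly_series(rows: List[Tuple[int, int, float]]) -> Dict[str, bool]:
    """Check the four time series criteria for (year, month, value) rows."""
    # Convert each (year, month) to a running month index so spacing is easy to compare.
    idx = [year * 12 + (month - 1) for year, month, _ in rows]
    steps = [b - a for a, b in zip(idx, idx[1:])]
    return {
        # Sequential: every record comes after the one before it.
        "sequential": all(s > 0 for s in steps),
        # Equal spacing: consecutive records are exactly one month apart.
        "equally_spaced": all(s == 1 for s in steps),
        # At most one data point per month.
        "unique_periods": len(idx) == len(set(idx)),
        # Continuous interval: no month is missing between the first and last record.
        "continuous": len(idx) > 0 and len(set(idx)) == max(idx) - min(idx) + 1,
    }
```

For a complete monthly series, all four flags come back `True`. A skipped month makes `equally_spaced` and `continuous` fail, and a duplicated month makes `unique_periods` fail.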

#### 2. Which records should be used as the holdout sample?

Before the predictive models were built, the last 4 records (2013-06 to 2013-09) were set aside as a holdout sample. The models are fitted without these records. Their forecasts for those four months are then compared against the actual values to check forecast accuracy.
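Here is a minimal sketch of the split. It assumes the sales figures are a plain Python list in chronological order, and the helper `split_holdout` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_holdout(values: Sequence[float], holdout: int = 4) -> Tuple[List[float], List[float]]:
    """Split a chronologically ordered series into (training, holdout).

    The last `holdout` records are reserved for checking forecast accuracy;
    models are fitted on the training portion only.
    """
    if not 0 < holdout < len(values):
        raise ValueError("holdout must be between 1 and len(values) - 1")
    return list(values[:-holdout]), list(values[-holdout:])
```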

## Determine Trend, Seasonal, and Error components

#### 1. What are the trend, seasonality, and error of the time series? Show how you were able to determine the components using time series plots. Include the graphs.

The time series plot shows an upward trend, with a spike in sales that recurs at the end of each year. This regularly occurring spike indicates seasonality in the series. There are no patterns within the series suggesting cyclicity.

<img src="img/ts1.png" width="300"/>

<div align="center">
    Figure 1 - Time Series Plot
</div>

The decomposition plot breaks the time series down into its three components: trend, seasonal, and error. Together these components make up the series, and they help confirm what the initial time series plot suggested.

<img src="img/dc1.png" width="300"/>

<div align="center">
    Figure 2 - Decomposition Plot showing data, seasonal, trend and the error
</div>
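The original analysis tool produced the decomposition in Figure 2, and that tool may use a different method (such as STL). To illustrate the idea only, the sketch below implements a classical multiplicative decomposition in plain Python. The function name is hypothetical, and its output will not exactly match Figure 2.

```python
from typing import List, Optional, Sequence, Tuple


def classical_multiplicative_decompose(
    y: Sequence[float], m: int = 12
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split y into trend, seasonal and remainder so that y = trend * seasonal * remainder.

    Assumes an even season length m (12 for monthly data) and at least two full seasons.
    Trend and remainder are None for the first and last m/2 points.
    """
    n = len(y)
    if m % 2 or n < 2 * m:
        raise ValueError("need an even m and at least 2 * m observations")
    half = m // 2
    # Trend: centred 2 x m moving average (half weight on the two end points).
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        w = y[t - half : t + half + 1]
        trend[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / m
    # Seasonal: average the ratio y / trend for each position in the season.
    ratios: List[List[float]] = [[] for _ in range(m)]
    for t, tr in enumerate(trend):
        if tr is not None:
            ratios[t % m].append(y[t] / tr)
    raw = [sum(r) / len(r) for r in ratios]
    # Normalise so the seasonal indices average to 1 over a full season.
    scale = sum(raw) / m
    seasonal = [raw[t % m] / scale for t in range(n)]
    # Remainder (error): whatever trend and seasonality do not explain.
    remainder = [
        y[t] / (tr * seasonal[t]) if tr is not None else None
        for t, tr in enumerate(trend)
    ]
    return trend, seasonal, remainder
```

In a multiplicative decomposition, the seasonal indices are ratios around 1. An index of 1.3 for December would mean December sales run about 30% above trend.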

The trend component confirms the upward trend.
The seasonal component shows that the annual spike in sales changes slightly in magnitude from year to year. Because the series is seasonal, any ARIMA model used for the analysis will need seasonal differencing.
Because the seasonal swings change in magnitude, any ETS model will use a multiplicative seasonal component.
The error component fluctuates between larger and smaller errors as the series progresses. Since these fluctuations are not consistent in magnitude, any ETS model will apply the error multiplicatively.

## Build your Models

#### 1. What are the model terms for ETS? Explain why you chose those terms. Describe the in-sample errors. Use at least RMSE and MASE when examining results

The decomposition plot provides the information needed to choose the terms of the ETS model.
The trend is roughly linear, so it is modeled additively.
The seasonality changes in magnitude each year, so it is modeled multiplicatively.
The error changes in magnitude as the series goes along, so it is also modeled multiplicatively.
This gives an **ETS(M, A, M)** model (error, trend, seasonality).
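The sketch below makes the ETS(M, A, M) terms concrete. It runs the model's state-space recursions (multiplicative error, additive trend, multiplicative seasonality) using smoothing parameters and initial states that the caller supplies. It does not estimate those values, so it is not a substitute for the fitted model in this report. In practice, a library such as statsmodels (`ETSModel` with `error="mul"`, `trend="add"`, `seasonal="mul"`, `seasonal_periods=12`) would handle estimation. The function name `ets_mam` is hypothetical.

```python
from typing import List, Sequence, Tuple


def ets_mam(
    y: Sequence[float],
    alpha: float,
    beta: float,
    gamma: float,
    level0: float,
    trend0: float,
    season0: Sequence[float],
    horizon: int = 4,
) -> Tuple[List[float], List[float]]:
    """Run the ETS(M, A, M) recursions and return (one-step fitted values, forecasts).

    alpha, beta, gamma and the initial states are given, not estimated.
    season0 holds one starting factor per season (12 for monthly data).
    """
    m = len(season0)
    level, trend = level0, trend0
    season = list(season0)  # season[t % m] holds the factor used for observation t
    fitted: List[float] = []
    for t, obs in enumerate(y):
        s = season[t % m]
        base = level + trend            # additive trend
        yhat = base * s                 # multiplicative seasonality
        eps = (obs - yhat) / yhat       # multiplicative (relative) error
        fitted.append(yhat)
        level = base * (1 + alpha * eps)
        trend = trend + beta * base * eps
        season[t % m] = s * (1 + gamma * eps)
    n = len(y)
    # Point forecasts: extend the trend linearly and reuse the latest factor for each season.
    forecasts = [
        (level + h * trend) * season[(n + h - 1) % m] for h in range(1, horizon + 1)
    ]
    return fitted, forecasts
```

With a multiplicative error, each update reacts to the percentage miss rather than the absolute miss. This matches the observation that the errors grow and shrink along with the level of the series.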

**In-sample Error Measures:**

The two key measures are the RMSE and the MASE. The RMSE expresses the typical size of the in-sample one-step errors in units sold: the ETS model's fitted values miss actual monthly sales by roughly 33,000 units. The MASE scales the errors by those of a naive benchmark forecast, which makes it comparable across models.
At 0.36, the MASE is well below 1.00, the point at which a model does no better than the naive benchmark, which indicates a fairly strong fit.
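Both measures can be computed as shown below. This is a sketch: the helper names are hypothetical. Tools also differ in whether MASE is scaled by a plain naive forecast (`m=1`) or a seasonal naive forecast (`m=12` for monthly data), so exact values may not match the report.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the data (units sold)."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def mase(actual: Sequence[float], predicted: Sequence[float],
         train: Sequence[float], m: int = 1) -> float:
    """Mean absolute scaled error.

    The model's MAE is divided by the in-sample MAE of a naive forecast that
    repeats the value from m periods earlier (m=1 naive, m=12 seasonal naive).
    Values below 1 mean the model beats that benchmark on average.
    """
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    naive = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    return mae / (sum(naive) / len(naive))
```

For in-sample MASE, pass the training series as both `actual` and `train`, and pass the model's fitted values as `predicted`.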

#### 2. What are the model terms for ARIMA? Explain why you chose those terms. Graph the Auto-Correlation Function (ACF) and Partial Autocorrelation Function Plots (PACF) for the time series and seasonal component and use these graphs to justify choosing your model terms. Describe the in-sample errors. Use at least RMSE and MASE when examining results. Regraph ACF and PACF for both the Time Series and Seasonal Difference and include these graphs in your answer and show that the graphs have no autocorrelated lag anymore.

Since the time series has a seasonal component, a seasonal ARIMA(p, d, q)(P, D, Q)[m] model, with m = 12 for monthly data, is used for forecasting.
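For reference, a seasonal ARIMA model can be written with the backshift operator B, where B y_t = y_{t-1} and B^m y_t = y_{t-m}. The sign convention below writes the moving average polynomials as 1 + θ_1 B + ..., which follows one common textbook form. Some software uses the opposite sign for MA terms.

```math
\begin{aligned}
\phi_p(B)\,\Phi_P(B^{m})\,(1 - B)^{d}(1 - B^{m})^{D}\,y_t &= \theta_q(B)\,\Theta_Q(B^{m})\,\varepsilon_t,\\
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^{p},\\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^{q}.
\end{aligned}
```

The seasonal polynomials Φ_P and Θ_Q have the same form but in powers of B^m. In this notation, p, d and q are the nonseasonal AR order, number of differences and MA order, while P, D and Q are their seasonal counterparts.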

**Time Series ACF and PACF:**

The ACF decays slowly toward 0, with spikes at the seasonal lags (12, 24, ...). Because the serial correlation is high and follows the seasonal pattern, the series needs to be seasonally differenced.

<img src="img/ts2.png" width="500"/>

<div align="center">
    Figure 3 - Time Series ACF and PACF
</div>
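The significance lines on ACF plots are commonly drawn at about ±1.96/√n, where n is the number of observations. Below is a minimal plain-Python sketch of the sample ACF and that rule of thumb. The function names are hypothetical, and plotting tools may use a slightly different bound.

```python
import math
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x: Sequence[float], max_lag: int) -> List[int]:
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, r in enumerate(acf(x, max_lag), start=1) if abs(r) > bound]
```

For a trending, seasonal series like this one, `acf` values stay large for many lags and bump up again near lags 12 and 24. That slow decay is the pattern described above.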

**Seasonal Difference ACF and PACF:**

The ACF and PACF of the seasonally differenced series look similar to those of the undifferenced series, with only slightly weaker correlation. Further differencing is needed to remove the remaining correlation.

<img src="img/sd1.png" width="500"/>

<div align="center">
    Figure 4 - Seasonal Difference ACF and PACF
</div>
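Both differencing steps can be expressed with one helper. The following is a minimal sketch, assuming the monthly sales are a plain list; the function names are hypothetical.

```python
from typing import List, Sequence


def difference(x: Sequence[float], lag: int = 1) -> List[float]:
    """Return x[t] - x[t - lag]; the first `lag` observations are lost."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def seasonal_first_difference(x: Sequence[float], m: int = 12) -> List[float]:
    """Seasonal difference (lag m) followed by an ordinary first difference (lag 1)."""
    return difference(difference(x, m), 1)
```

Seasonal differencing drops 12 observations and the first difference drops one more. The seasonal first difference of n monthly values therefore has n - 13 values.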

**Seasonal First Difference:**

Taking a first difference of the seasonally differenced series (the seasonal first difference) removes most of the significant lags from the ACF and PACF, so no further differencing is needed. The remaining correlation can be handled with autoregressive and moving average terms, and the differencing orders are d = 1 and D = 1.

The ACF shows a strong negative correlation at lag 1, which also appears at lag 1 of the PACF (the first partial autocorrelation always equals the first autocorrelation). Because the ACF has only one significant lag and then cuts off, an MA(1) term is suggested. The seasonal lags (12, 24, etc.) in the ACF and PACF show no significant correlation, so no seasonal autoregressive or moving average terms are needed.

<img src="img/sd2.png" width="500"/>

<div align="center">
    Figure 5 - Seasonal First Difference ACF and PACF
</div>
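The MA(1) reading follows the usual Box-Jenkins rule of thumb. For a pure MA(q) process, the ACF cuts off after lag q while the PACF tails off. For a pure AR(p) process, the reverse holds. The sketch below computes the PACF from the sample ACF with the Durbin-Levinson recursion. It is a plain-Python illustration with hypothetical function names, not the routine used to draw Figure 5.

```python
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def pacf(x: Sequence[float], max_lag: int) -> List[float]:
    """Partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    rho = [1.0] + acf(x, max_lag)  # rho[k] is the autocorrelation at lag k
    phi: List[float] = []          # AR coefficients of the previous order's fit
    out: List[float] = []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the coefficients to those of the order-k fit.
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
    return out
```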

Therefore, the ARIMA model is **ARIMA(0, 1, 1)(0, 1, 0)[12]**.
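The chosen model can be written out with the backshift operator B (B y_t = y_{t-1}). This uses the MA sign convention 1 + θ_1 B, which is an assumption, since some software flips the sign. Below are the model and its expanded form, where θ_1 is the estimated MA(1) coefficient:

```math
\begin{aligned}
(1 - B)(1 - B^{12})\,y_t &= (1 + \theta_1 B)\,\varepsilon_t\\
y_t &= y_{t-1} + y_{t-12} - y_{t-13} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
\end{aligned}
```

Each forecast therefore starts from last month's value and adds the change between the same pair of months a year earlier. It then corrects by a fraction θ_1 of last month's forecast error.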

**In-sample Error Measures:**

The ACF and PACF of the residuals of the ARIMA(0, 1, 1)(0, 1, 0)[12] model show no significantly correlated lags, so there is no need to add further AR or MA terms.

<img src="img/ac1.png" width="300"/>

<div align="center">
    Figure 6 - Sufficient ARIMA model
</div>

The RMSE of the ARIMA model is about 37,000 units, so its fitted values miss actual monthly sales by roughly that amount.
Its MASE of 0.36 is also well below 1.00, meaning its in-sample errors are much smaller than those of the naive benchmark forecast.

## Forecast

#### 1. Which model did you choose? Justify your answer by showing: in-sample error measurements and forecast error measurements against the holdout sample.

The two models' in-sample RMSE and MASE are very similar. The ETS model has the lower RMSE, but only by a few thousand units (about 33,000 versus 37,000).

Further investigation shows that the ARIMA model has a lower MAPE and ME than the ETS model. The lower MAPE means that, on average, the ARIMA model misses actual sales by a smaller percentage. The ME measures bias, that is, whether forecasts tend to run high or low.
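Below is a sketch of the two measures. It uses the convention error = actual - forecast; some tools use the opposite sign, which flips the sign of ME. The function names are hypothetical.

```python
from typing import Sequence


def mean_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """ME: average of (actual - forecast). Its sign shows bias, not accuracy."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAPE: average absolute error as a percentage of actual sales (actuals must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Positive and negative errors cancel in the ME. A model can therefore have an ME near zero and still miss by a lot, which is why MAPE (or RMSE) is read alongside it.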

When the models' forecasts are compared against the holdout sample, the ARIMA model performs better on just about every accuracy measure.

Therefore, the ARIMA model is used to produce the forecast.

<img src="img/am1.png" width="300"/>

<div align="center">
    Figure 7 - Accuracy Measures to Compare the ETS vs. the ARIMA model
</div>
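The holdout comparison can be sketched as follows. Each model's four forecasts are scored against the four held-out actuals, and the sketch counts which model wins each measure. The function names and the "smallest absolute value wins" rule are assumptions for illustration; for ME, this rule treats the value closest to zero as best. The report's own numbers are in Figure 7.

```python
import math
from typing import Dict, Sequence


def holdout_scores(actual: Sequence[float],
                   forecasts: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Score each model's holdout forecasts (error = actual - forecast)."""
    scores: Dict[str, Dict[str, float]] = {}
    for name, fc in forecasts.items():
        errors = [a - f for a, f in zip(actual, fc)]
        n = len(errors)
        scores[name] = {
            "ME": sum(errors) / n,
            "RMSE": math.sqrt(sum(e * e for e in errors) / n),
            "MAE": sum(abs(e) for e in errors) / n,
            "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        }
    return scores


def count_wins(scores: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """For each model, count the measures on which it has the smallest absolute value."""
    models = list(scores)
    wins = {name: 0 for name in models}
    for metric in scores[models[0]]:
        best = min(models, key=lambda name: abs(scores[name][metric]))  # ties go to the first model
        wins[best] += 1
    return wins
```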

#### 2. What is the forecast for the next four periods? Graph the results using 95% and 80% confidence intervals.

Forecast results for the next four months, with 80% and 95% prediction (confidence) intervals:

<img src="img/fc1.png" width="500"/>

<div align="center">
    Figure 8 - Forecast Results
</div>

<img src="img/fc2.png" width="500"/>

<div align="center">
    Figure 9 - Forecast Chart
</div>
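The original tool produced the intervals in Figures 8 and 9. As an illustration only, the sketch below shows how normal-theory 80% and 95% prediction intervals can be computed for an ARIMA(0, 1, 1)(0, 1, 0)[12] model. It makes three simplifying assumptions, so its numbers will differ somewhat from the report:

- the MA coefficient `theta` is already known (it is not estimated here);
- residuals are computed conditionally, treating residuals before month 13 as zero;
- the residual variance is estimated as a simple mean of squared residuals.

For this model, the h-step forecast variance is σ²(1 + (h - 1)(1 + θ)²) for h ≤ 12, which is why the intervals widen with the horizon. The function name is hypothetical.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def forecast_arima_011_010_12(
    y: Sequence[float],
    theta: float,
    horizon: int = 4,
    levels: Sequence[float] = (0.80, 0.95),
) -> Tuple[List[float], Dict[float, List[Tuple[float, float]]]]:
    """Point forecasts and normal prediction intervals for ARIMA(0,1,1)(0,1,0)[12]."""
    m = 12
    n = len(y)
    if not 1 <= horizon <= m or n < m + 2:
        raise ValueError("need 1 <= horizon <= 12 and at least 14 observations")
    # Conditional residuals: y_t = y_{t-1} + y_{t-12} - y_{t-13} + e_t + theta * e_{t-1}.
    eps = [0.0] * n
    for t in range(m + 1, n):
        pred = y[t - 1] + y[t - m] - y[t - m - 1] + theta * eps[t - 1]
        eps[t] = y[t] - pred
    resid = eps[m + 1 :]
    sigma2 = sum(e * e for e in resid) / len(resid)  # simple residual variance estimate
    ext, ext_eps = list(y), list(eps)
    points: List[float] = []
    for _ in range(horizon):
        t = len(ext)
        pred = ext[t - 1] + ext[t - m] - ext[t - m - 1] + theta * ext_eps[t - 1]
        ext.append(pred)
        ext_eps.append(0.0)  # future shocks have expectation zero
        points.append(pred)
    intervals: Dict[float, List[Tuple[float, float]]] = {}
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for 80%, 1.96 for 95%
        bands = []
        for h, p in enumerate(points, start=1):
            # psi weights are 1, 1 + theta, 1 + theta, ... for h <= 12.
            half = z * (sigma2 * (1 + (h - 1) * (1 + theta) ** 2)) ** 0.5
            bands.append((p - half, p + half))
        intervals[level] = bands
    return points, intervals
```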