# Time Series 1

### September 14 | Week 9 Day 1
### Instructor: Monique Wong


## Agenda

1. What is a time series?
2. Common techniques: "statistical" and supervised machine learning
3. Statistical techniques:
    - Decomposition
    - Concept of stationarity
    - Box-Jenkins / ARIMA methods
    
Very little code today - the goal is to build intuition.

## What is a time series?
- Data points that are ordered chronologically (each observation is tied to a point in time)
- We can use the time dimension to make better predictions and forecasts
- Same idea as before:
    - We're trying to find a model that fits the data well
    - This can be done with statistical techniques or traditional machine learning

## Common methods

- Traditionally, the field of "time series" is about finding the best **statistical model** that fits the data
    - This is what we'll focus on today and tomorrow
    - The techniques we'll cover are **univariate**
        - We are using past data of a particular variable to forecast the future data of the same variable
        - E.g., using sales data from the last 7 years to forecast sales for the next 3 months
- You can also deal with time series with **supervised learning**
    - We have to make sure that our train/test split "follows the rules of time": the model is trained only on data that comes before the data it is tested on
    - Our features are past data at different time steps
    - Additional features can be added for **multivariate** use cases
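The supervised-learning setup above can be sketched in plain Python: past values become the feature columns, and the split keeps the last rows as the test set instead of shuffling. This is a minimal sketch; the sales numbers are hypothetical, and `make_lag_features` and `time_ordered_split` are illustrative helper names, not library functions.

```python
def make_lag_features(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values right before y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # past values, oldest first
        y.append(series[t])             # value to predict
    return X, y


def time_ordered_split(X, y, test_size):
    """Hold out the last `test_size` rows as the test set (no shuffling)."""
    cut = len(X) - test_size
    return X[:cut], X[cut:], y[:cut], y[cut:]


sales = [10, 12, 13, 15, 14, 16, 18, 19, 21, 20]  # hypothetical monthly sales
X, y = make_lag_features(sales, n_lags=3)
X_train, X_test, y_train, y_test = time_ordered_split(X, y, test_size=2)
print(X[0], y[0])      # [10, 12, 13] 15
print(X_test, y_test)  # [[16, 18, 19], [18, 19, 21]] [21, 20]
```

Any regression model can then be trained on `X_train` and evaluated on `X_test`. For cross-validation, scikit-learn's `TimeSeriesSplit` applies the same "train on the past, test on the future" idea across several folds.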

## What are we trying to predict?
- **Point estimate:** e.g., exactly how many units will be sold
- **Prediction interval** (often loosely called a confidence interval): a 50%, 75%, or 90% range expected to contain the actual value with that probability
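A common simplifying assumption is that forecast errors are normally distributed with standard deviation $\sigma$. Under that assumption, an interval is the point estimate $\pm z\sigma$, where $z$ depends on the chosen level. Here is a minimal sketch with hypothetical numbers (a 120-unit forecast and $\sigma = 10$ are made up):

```python
from statistics import NormalDist


def prediction_interval(point, sigma, level):
    """Central interval point +/- z * sigma, assuming normal forecast errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. level=0.9 -> z ≈ 1.645
    return point - z * sigma, point + z * sigma


point_forecast = 120.0  # hypothetical units sold next month
sigma = 10.0            # hypothetical std. dev. of forecast errors
for level in (0.50, 0.75, 0.90):
    lo, hi = prediction_interval(point_forecast, sigma, level)
    print(f"{level:.0%}: {lo:.1f} to {hi:.1f}")
```

Higher confidence levels give wider ranges around the same point estimate.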

**Discussion:** What is more helpful? What are some use cases for predicting a range?

## Time Series Methods: Decomposition
- Aims to describe the trend and seasonality in the data

| Component | Definition |
|:- |:------- |
| Trend | Long-term increase or decrease in the data, does not have to be linear |
| Seasonal | Pattern caused by seasonal factors such as the time of year or the day of the week; always of a fixed and known frequency |
| Cyclic | Rises and falls that are not of a fixed frequency; usually lasting at least 2 years |

Let's look at some pictures.

## Discussion: what components do we see here?

<img src='imgs/trends-1.png'>

## Discussion: what components do we see?

| Chart | Seasonality | Cyclic | Trend |
|:-|:---|:---|:---|
| Monthly housing sales | Strong, within the year | Strong, 6-10 year cycles | No apparent trend |
| US treasury bill contracts | None | None apparent | Obvious downward trend; with a longer series, this may turn out to be part of a long cycle |
| Australian quarterly electricity production | Strong | No evidence | Strong increasing trend |
| Daily change in the Google closing stock price | None | None | None |

## The key idea: we can decompose each component

### 1. Additive model

$$ y_t = S_t + T_t + R_t$$


### 2. Multiplicative model

$$ y_t = S_t \times T_t \times R_t$$

Where: 
- $S_t$ is the seasonal component
- $T_t$ is the trend-cycle component
- $R_t$ is the remainder component (whatever is left)

### How to pick? 
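- Whatever fits best. A common rule of thumb:
    - **Additive** if the size of the seasonal fluctuations stays roughly constant as the level of the series changes
    - **Multiplicative** if the seasonal fluctuations grow or shrink in proportion to the level of the series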
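The two models are closely linked. When all values are positive, taking logs turns a multiplicative decomposition into an additive one, so additive tools can be reused on $\log y_t$:

$$ y_t = S_t \times T_t \times R_t \quad\Longrightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t $$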

## Decomposition: easier to see in pictures...

### Explaining the trend-cycle component
<img src='imgs/elecequip-trend-1.png'>

## Decomposition: easier to see in pictures...

### The decomposition itself
<img src='imgs/elecequip-stl-1.png'>

## Decomposition: final notes


### Briefly, the math
- Mathematically, there are different ways to do the decomposition
- They give slightly different results
- Some examples: Classical, X11, SEATS, STL
- For more: https://otexts.com/fpp2/classical-decomposition.html
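The sketch below shows the simplest of these methods, the classical additive decomposition, in NumPy. Its steps are: estimate the trend-cycle with a centred moving average, average the detrended values for each season, and treat what is left as the remainder. The function name and the synthetic series are illustrative assumptions, and the series is assumed to start at the first season.

```python
import numpy as np


def classical_additive_decomposition(y, m):
    """Split y into (trend, seasonal, remainder) for seasonal period m.

    Trend and remainder are NaN at the ends, where the moving average
    is undefined.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # 1. Trend-cycle: centred moving average (2 x m MA if m is even, m MA if odd).
    if m % 2 == 0:
        weights = np.r_[0.5, np.ones(m - 1), 0.5] / m
    else:
        weights = np.ones(m) / m
    half = m // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # 2. Seasonal: average the detrended values for each season position.
    detrended = y - trend
    season_means = np.array([np.nanmean(detrended[i::m]) for i in range(m)])
    season_means -= season_means.mean()    # seasonal effects sum to zero
    seasonal = np.resize(season_means, n)  # repeat the pattern along y
    # 3. Remainder: whatever is left.
    remainder = y - trend - seasonal
    return trend, seasonal, remainder


# Synthetic example: linear trend plus a period-4 seasonal pattern, no noise.
t = np.arange(48)
y = 0.5 * t + np.tile([5.0, -3.0, -2.0, 0.0], 12)
trend, seasonal, remainder = classical_additive_decomposition(y, m=4)
```

On this noise-free series the method recovers the trend `0.5 * t` and the pattern `[5, -3, -2, 0]`, leaving a remainder of (numerically) zero. For real data, statsmodels provides `seasonal_decompose` and `STL` in `statsmodels.tsa.seasonal`.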


### Forecasting
- The idea here is simple
- Forecasts are the sum (additive model) or product (multiplicative model) of the forecast seasonal component and "everything else"
- The seasonal component is usually assumed to be unchanging (or changing very slowly), so it is forecast by repeating its last observed values
- "Everything else", also called the seasonally adjusted component, is modeled some other way
    - Many methods here: Holt, ARIMA
    - We'll cover ARIMA
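For an additive model, the forecasting steps above can be sketched in plain Python. The seasonally adjusted part here is forecast with a random walk with drift, a simple stand-in chosen for illustration in place of Holt or ARIMA. The seasonal part repeats the last observed season. The helper name and numbers are hypothetical.

```python
def decomposition_forecast(seas_adj, seasonal, m, h):
    """Additive-model forecast h steps ahead.

    seas_adj: seasonally adjusted history (y_t - S_t)
    seasonal: seasonal component at the same time points
    m:        seasonal period
    """
    # Seasonally adjusted part: random walk with drift (stand-in for Holt/ARIMA).
    n = len(seas_adj)
    drift = (seas_adj[-1] - seas_adj[0]) / (n - 1)
    adj_forecast = [seas_adj[-1] + drift * k for k in range(1, h + 1)]
    # Seasonal part: assumed unchanging, so repeat the last full season.
    last_season = seasonal[-m:]
    seas_forecast = [last_season[(k - 1) % m] for k in range(1, h + 1)]
    # Additive model: add the two parts (a multiplicative model would multiply).
    return [a + s for a, s in zip(adj_forecast, seas_forecast)]


seasonal = [2.0, -1.0, -1.0] * 4          # hypothetical period-3 pattern
seas_adj = [10.0 + t for t in range(12)]  # hypothetical adjusted series
forecast = decomposition_forecast(seas_adj, seasonal, m=3, h=4)
print(forecast)  # [24.0, 22.0, 23.0, 27.0]
```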



## Break

We'll reconvene in ...

## Time Series Methods: ARIMA models (Box-Jenkins)
- Aims to describe the autocorrelations in the data
- **Autocorrelations**: fancy word for how each data point is correlated with the data points before it (the correlation of a series with lagged copies of itself)
- The (non-seasonal) ARIMA models covered here cannot capture a seasonal component - **the seasonal component must be taken out first**
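To make "autocorrelation" concrete, here is a minimal NumPy sketch of the sample autocorrelation at lag $k$. It computes $r_k = \sum_t (y_t - \bar{y})(y_{t-k} - \bar{y}) / \sum_t (y_t - \bar{y})^2$. The function name and toy series are illustrative. In practice, `statsmodels.tsa.stattools.acf` computes the same quantity.

```python
import numpy as np


def acf(y, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    # Pair each value with the value k steps earlier.
    return np.array([np.sum(dev[k:] * dev[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])


alternating = [1.0, -1.0] * 50  # flips sign every step
trending = np.arange(100)       # steadily increasing
print(acf(alternating, 2))      # lag 0 is 1, lag 1 ≈ -0.99, lag 2 ≈ 0.98
print(acf(trending, 1))         # lag 1 close to 1: neighbours are very similar
```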


### Before we explain the two parts of ARIMA (AR and MA), we need to understand stationarity...
- AR (autoregressive) and MA (moving average) models only work with stationary data
- The "I" stands for integrated: this is the differencing needed to turn seasonally adjusted data into stationary data


## Concept of stationarity

This is the first thing we need to understand well. In essence:
> A stationary time series is one whose properties do not depend on the time at which the series is observed

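In less formal words:
- There is no trend and no seasonality; the series looks roughly horizontal, with constant variance
- Cyclic behavior without trend or seasonality is allowed, because the cycles are not of a fixed length
- White noise is the simplest stationary series, but stationary data can still be autocorrelated (which is exactly what AR and MA models exploit)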
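To see the definition in action, here is a small simulation, assuming standard normal noise and a fixed random seed. It compares white noise with a random walk, $y_t = y_{t-1} + \varepsilon_t$. Across many simulated paths, white noise has the same variance at every time point. The random walk's variance grows with $t$, so its properties depend on when it is observed, which makes it non-stationary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 200

# White noise: independent draws with the same mean and variance at every t.
noise = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
# Random walk: running sum of the white noise, y_t = y_{t-1} + e_t.
walk = np.cumsum(noise, axis=1)

# Variance across paths at an early (t=9) and a late (t=199) time point.
for name, series in [("white noise", noise), ("random walk", walk)]:
    print(name, series[:, 9].var().round(2), series[:, 199].var().round(2))
```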

Let's take a look at some pictures.

## Which of these are stationary?
<img src='imgs/stationary-1.png'>

## Which of these are stationary - discussion

a. Google stock price for 200 consecutive days - long-term trend

b. Daily change in the Google stock price for 200 consecutive days - **could be stationary**

c. Annual number of strikes in the US - long-term trend

d. Monthly sales of new one-family houses sold in the US - some seasonality

e. Annual price of a dozen eggs in the US (constant dollars) - long-term trend

f. Monthly total of pigs slaughtered in Victoria, Australia - long-term trend

g. Annual total of lynx trapped in the McKenzie River district of north-west Canada - **could be stationary**

h. Monthly Australian beer production - some seasonality

i. Monthly Australian electricity production - some seasonality, long-term trend, increasing variance

Source: https://otexts.com/fpp2/stationarity.html


## Getting to stationary data

### At a high level:
- The data minus seasonal component = seasonally adjusted data
- Seasonally adjusted data with differencing = stationary data (hopefully)
- Now, we can use ARMA (notice, no I) models to model the stationary data

### What is differencing?
- Subtracting the previous data point from the current one, for every data point in the series

$$ y_{t}' = y_t - y_{t-1}$$ 

More mathematical details can be found here: https://otexts.com/fpp2/stationarity.html
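Differencing is a one-liner in NumPy. The minimal sketch below applies it once and twice to a quadratic trend, showing why $d$ can be larger than 1. It also shows that differencing a random walk gives back its white-noise steps. The `lag` parameter is an illustrative addition: `lag=m` gives seasonal differencing, $y_t - y_{t-m}$.

```python
import numpy as np


def difference(y, lag=1):
    """y'_t = y_t - y_{t-lag}; the result is `lag` points shorter than y."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


quadratic = np.arange(6) ** 2  # values 0, 1, 4, 9, 16, 25
once = difference(quadratic)   # values 1, 3, 5, 7, 9: still trending
twice = difference(once)       # values 2, 2, 2, 2: constant

rng = np.random.default_rng(1)
steps = rng.normal(size=100)   # white noise
walk = np.cumsum(steps)        # random walk (non-stationary)
recovered = difference(walk)   # equals steps[1:] up to rounding
```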


## Getting back to ARIMA: AR models
- Stands for autoregressive
- In essence, we are:
> forecasting future variables using a linear combination of past values of the variable
- Mathematically speaking:

$$ y_t = c + {\phi}_1 y_{t-1} + {\phi}_2 y_{t-2} + ... + {\phi}_p y_{t-p} + {\varepsilon}_t $$
- AR models are parametrized as $AR(p)$, where $p$ is called the "order" or the number of time steps to "learn from"
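- When used to model stationary data, the $\phi$ values are restricted: for $AR(1)$, $-1 < \phi_1 < 1$; for higher orders the conditions are more complicated
    - They can be viewed as "weights" for how much each previous time step influences the current time step
- $c$ is a constant; it is not the mean itself, but it sets the mean: a stationary $AR(p)$ process has mean $c / (1 - \phi_1 - ... - \phi_p)$
- ${\varepsilon}_t$ is white noise with mean 0 and constant variance $\sigma^2$ (often assumed to be normally distributed)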
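Here is a minimal NumPy sketch that simulates an $AR(p)$ process and recovers $c$ and the $\phi$ values by regressing $y_t$ on its $p$ previous values. Least squares is a simple stand-in chosen here; libraries such as statsmodels typically use maximum likelihood instead. The function names, parameter values, and seed are illustrative assumptions.

```python
import numpy as np


def simulate_ar(c, phis, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t."""
    rng = np.random.default_rng(seed)
    p = len(phis)
    y = np.zeros(n + burn_in)
    e = rng.normal(0.0, sigma, size=n + burn_in)
    for t in range(p, n + burn_in):
        # y[t-1], ..., y[t-p] paired with phi_1, ..., phi_p
        y[t] = c + np.dot(phis, y[t - p:t][::-1]) + e[t]
    return y[burn_in:]  # drop the start-up period


def fit_ar_least_squares(y, p):
    """Estimate (c, [phi_1..phi_p]) by regressing y_t on its p lags."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    lags = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(n - p), lags])  # intercept column for c
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]


y = simulate_ar(c=2.0, phis=[0.6], n=5000)
c_hat, phi_hat = fit_ar_least_squares(y, p=1)
print(c_hat, phi_hat, y.mean())  # roughly 2, [0.6], and 2 / (1 - 0.6) = 5
```

The sample mean lands near $c / (1 - \phi_1) = 5$, not near $c = 2$, which is why $c$ should not be read as the mean of the series.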

## AR models - some pictures

<img src='imgs/arp-1.png'>

## Second part: MA models
- Stands for moving average
- In essence, we are:
> forecasting future variables using past forecast errors in a regression-like way
- Mathematically speaking:

$$ y_t = c + {\varepsilon}_t + \theta_1{\varepsilon}_{t-1} + \theta_2{\varepsilon}_{t-2} + ... + \theta_q{\varepsilon}_{t-q}$$
- MA models are parametrized as $MA(q)$, where $q$ is the order or time steps to "learn from"
- MA models are always stationary; the restriction $-1 < \theta_1 < 1$ (for $MA(1)$) is needed for *invertibility* instead, and the $\theta$ values can again be viewed as weights
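A useful property of an $MA(q)$ process is that its autocorrelation is zero beyond lag $q$, because $y_t$ and $y_{t-k}$ share no error terms once $k > q$. Here is a minimal NumPy simulation of an $MA(1)$ with $\theta_1 = 0.8$ (illustrative values and seed). The theoretical lag-1 autocorrelation is $\theta_1 / (1 + \theta_1^2) \approx 0.49$.

```python
import numpy as np


def simulate_ma(c, thetas, n, sigma=1.0, seed=0):
    """Simulate y_t = c + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(thetas)
    e = rng.normal(0.0, sigma, size=n + q)
    y = c + e[q:]                        # c + e_t
    for k, theta in enumerate(thetas, start=1):
        y = y + theta * e[q - k:n + q - k]  # add theta_k * e_{t-k}
    return y


def lag_corr(y, k):
    """Correlation between the series and itself shifted by k steps."""
    return np.corrcoef(y[k:], y[:-k])[0, 1]


y = simulate_ma(c=0.0, thetas=[0.8], n=20000)
print([round(lag_corr(y, k), 2) for k in (1, 2, 3)])  # ≈ 0.49, then ≈ 0
```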

## MA models - some pictures

<img src='imgs/maq-1.png'>

## Putting it all together: ARIMA

- The idea: by combining AR and MA terms, after differencing, we can find a good fit for the seasonally adjusted data
- An ARIMA model has 3 parameters:

$$ \text{ARIMA}(p, d, q) $$

- $p$ is the order for the AR component
- $d$ is the degree of differencing (how many times the data is differenced)
- $q$ is the order for the MA component
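To show how the three parameters fit together, here is a minimal NumPy sketch of an $\text{ARIMA}(1, 1, 0)$ forecast. It differences once ($d = 1$), fits an AR(1) with an intercept to the differences by least squares ($p = 1$), and uses no MA terms ($q = 0$). It then forecasts the differences and adds them back up to return to the original level. The function name, the least-squares fit, and the simulated data are illustrative choices. In practice one would typically use a library model such as statsmodels' `statsmodels.tsa.arima.model.ARIMA` with `order=(p, d, q)`.

```python
import numpy as np


def arima_110_forecast(y, h):
    """Forecast h steps with ARIMA(1,1,0) fitted by least squares.

    d=1: model the first differences w_t = y_t - y_{t-1}
    p=1: w_t = c + phi * w_{t-1} + e_t
    q=0: no MA terms
    """
    y = np.asarray(y, dtype=float)
    w = np.diff(y)                                         # "I": difference once
    X = np.column_stack([np.ones(len(w) - 1), w[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, w[1:], rcond=None)   # "AR": regress on lag 1
    # Forecast the differences recursively...
    w_fc, last = [], w[-1]
    for _ in range(h):
        last = c + phi * last
        w_fc.append(last)
    # ...then undo the differencing by adding them up from the last level.
    return y[-1] + np.cumsum(w_fc)


# Simulated data: differences follow an AR(1) with mean 1.0 / (1 - 0.5) = 2.
rng = np.random.default_rng(0)
w = np.zeros(5000)
for t in range(1, 5000):
    w[t] = 1.0 + 0.5 * w[t - 1] + rng.normal()
y = 100 + np.cumsum(w)  # non-stationary levels
forecast = arima_110_forecast(y, h=50)
```

At long horizons, the forecast grows by roughly 2 per step, matching the mean of the simulated differences.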

### Forecasting
- It's about finding the right parameters to fit the seasonally-adjusted data the best
- Then, we add back the seasonal component (or multiply by it, for a multiplicative model) to make a prediction

## Supervised Learning (not covered in Compass)

- I sometimes find this much easier to understand (though it offers much less explainability)
- We don't have time to cover it, but check out the following resource

Resource: https://github.com/TomasBeuzen/machine-learning-tutorials/blob/master/ml-timeseries/notebooks/supervised_time_series_intro.ipynb