# Time Series Analysis
### PyCon 2017
### Author: Aileen Nielsen

# Self Correlation

## Autocorrelation Function
- Used to help identify possible structures of time series data
- Gives a sense of how different points in time relate to each other in a way explained by temporal distance
- Need to have a standard frequency of data, not sparse
### Problem that can arise
- Does not account for periodic or seasonal data
- Does not account for trending

### ACF plotting
- Sample 0 will always have correlation of 1 (with itself)
- Boundaries of significance
- **Periodic data will not have a good lag cut off**
### Problem that can arise
- Does not account for trending
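
A minimal sketch of the sample ACF, assuming NumPy and a made-up noisy sine wave. The `sample_acf` helper, the example data, and the `1.96 / sqrt(n)` white-noise bounds are illustrative choices, not taken from the talk.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:len(x) - lag]) / denom
        for lag in range(max_lag + 1)
    ])

# Hypothetical example data: a noisy sine wave with a period of 12 samples.
rng = np.random.default_rng(0)
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(t.size)

acf_values = sample_acf(series, max_lag=36)

# Approximate 95% significance bounds under a white-noise null hypothesis.
bound = 1.96 / np.sqrt(series.size)
```

Lag 0 is exactly 1. Because the series is periodic, the correlations at lags 12, 24, and 36 stay well outside the bounds instead of dying out, which is the missing lag cut-off noted above.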

## Partial Autocorrelation Function
- “gives the partial correlation of a time series with its own lagged values, controlling for the values of time series at all shorter lags”
### Why would this be useful?
- Does not recycle correlation of shorter time period, like autocorrelation does
- better accounts for periodicity
- **Takes out the shorter correlation**, then looks for additional correlations
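
One way to see what "controlling for shorter lags" means is to compute the PACF by regression: for lag k, regress the series on its previous k values and keep only the coefficient on the longest lag. The sketch below assumes NumPy; `pacf_ols` and the simulated series are hypothetical illustrations, not the talk's code.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """Partial autocorrelation for lags 1..max_lag via least-squares regressions."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    out = []
    for k in range(1, max_lag + 1):
        # Columns are x_{t-1}, ..., x_{t-k}; the target is x_t.
        lagged = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        target = x[k:]
        coefs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
        out.append(coefs[-1])  # coefficient on the longest lag
    return np.array(out)

# Hypothetical data: each value depends only on the one before it (coefficient 0.7).
rng = np.random.default_rng(1)
n, phi = 2000, 0.7
noise = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + noise[t]

pacf_values = pacf_ols(x, max_lag=5)
```

For this series the plain autocorrelation decays slowly across many lags, while the partial autocorrelation is large at lag 1 and close to zero afterwards, since the longer lags add nothing once lag 1 is accounted for.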

# Pre-Prediction Munging & Stationarity  
- Most time series are not naturally stationary
    - contain seasonality
    - contain trending
- Standard statistical methods require stationary data for forecasting

## Stationarity
- Need to remove the trend and seasonal elements before forecasting
- Most data in the real world shows trends and seasonality
- Most models require data that shows neither of these properties to say something interesting 

Elements of Stationarity:
1. Constant Mean
    - Value is not drifting over time
2. Constant Variance
    - Unpredictability is not changing over time
3. Constant Autocorrelation
    - Structure is not changing over time
### Caveat
- Not realistic in the real world but can be true locally

### Handling non-stationarity
#### Remove Trending
1. Take a diff or log diff to take out trending
2. Take a moving average (with window function) and subtract it out
    - computationally expensive
    - different value subtracted at every point, so difficult to explain, re-transform, etc.
3. Take linear regression
    - less complex than moving average, b/c only 1 variable to re-transform
    - less effective than moving average, b/c will not always fully detrend
        - may require a second-level transform, which is non-optimal
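
The three detrending options above can be sketched with NumPy as follows. The data is a made-up linear trend plus noise, and the window width of 11 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
series = 5.0 + 0.5 * t + rng.standard_normal(t.size)  # hypothetical trend plus noise

# 1. First difference: removes a linear trend, leaving its slope as the mean.
diffed = np.diff(series)

# 2. Subtract a centered moving average (the window width is an arbitrary choice).
window = 11
kernel = np.ones(window) / window
moving_avg = np.convolve(series, kernel, mode="valid")
half = window // 2
ma_detrended = series[half:-half] - moving_avg

# 3. Subtract a fitted straight line; only slope and intercept need to be stored.
slope, intercept = np.polyfit(t, series, deg=1)
lin_detrended = series - (slope * t + intercept)
```

Note the trade-off from the list: option 3 can be undone with just two numbers, while option 2 needs the whole `moving_avg` array and also loses `half` points at each end.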

#### Remove Seasonality 
Seasonality can be additive or multiplicative.
- In the real world, mostly see multiplicative
1. Average de-trended values for specific season
    - **simplest**
2. Use 'loess' method (locally weighted scatterplot smoothing)
    - **most common**
    - Window of specified width is placed over the data
    - A weighted regression line or curve is fitted to the data, with points closest to center of curve having greatest weight
    - Weighting is reduced on points farthest from regression line/curve and calculation is rerun several times
    - This yields one point on the loess curve
    - Helps reduce impact of outlier points 
    - **Computationally taxing**, but typically how seasonality is dealt with
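
A minimal sketch of the simplest option (averaging de-trended values per season), assuming NumPy, an additive seasonal pattern, and a known period of 12. The data is made up. For multiplicative seasonality, one common variation is to apply the same steps to the log of the series.

```python
import numpy as np

rng = np.random.default_rng(3)
period = 12
n_cycles = 10
t = np.arange(period * n_cycles)

# Hypothetical series that is assumed to be de-trended already.
seasonal_true = 2.0 * np.sin(2 * np.pi * t / period)
detrended = seasonal_true + 0.2 * rng.standard_normal(t.size)

# Average every value that falls in the same position within the cycle.
season_index = t % period
seasonal_means = np.array(
    [detrended[season_index == s].mean() for s in range(period)]
)

# Additive adjustment: subtract each point's seasonal mean.
deseasonalized = detrended - seasonal_means[season_index]
```

Only `period` numbers (`seasonal_means`) are needed to add the seasonality back, which keeps the transform easy to reverse.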

#### Remove increasing variance
1. Power transformation
2. Log transformation
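
To illustrate why a log transform helps, the sketch below builds a hypothetical series whose noise grows with its level, then compares the spread around the trend in the first and second half before and after taking logs. It assumes NumPy and strictly positive data.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(1, 401)

# Hypothetical series with multiplicative noise: spread grows with the level.
level = np.exp(0.01 * t)
series = level * np.exp(0.1 * rng.standard_normal(t.size))

logged = np.log(series)    # log transform (requires strictly positive data)
rooted = np.sqrt(series)   # a power transform with exponent 0.5; squaring inverts it

def half_spreads(values, trend):
    """Standard deviation around the trend in the first and second half."""
    resid = values - trend
    mid = resid.size // 2
    return resid[:mid].std(), resid[mid:].std()

raw_first, raw_second = half_spreads(series, level)
log_first, log_second = half_spreads(logged, 0.01 * t)
```

On the raw scale the second half is several times noisier than the first; on the log scale the two halves have roughly the same spread. Both transforms are one to one, so `np.exp(logged)` recovers the original values.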

#### Remove autocorrelation
**DON'T**, at least typically don't try and do this
- it means you have fundamentally different processes and shouldn't remove them

#### Most Important Factors to Consider when transforming for stationarity
- **Try and use transformations that are 1 to 1**
    - That way it's easy to get back to original solution space
- It is okay to have multiple transforms if you can keep track of it, AND it is necessary

### Validate Stationarity
Standard practice is to use the Dickey-Fuller Test
- Tests the null hypothesis that a unit root is present in an autoregressive model
> Y_t = ρ*Y_t-1 + u_t
test whether ρ = 1
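- The test gives back several values (test statistic, p-value, critical values) to help you assess significance with standard p-value reasoning; a small p-value rejects the unit-root null, which is evidence for stationarity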
- **Basic Intuition: ρ having unit value means it's not stationary** 


# Forecasting
Once we have stationarity, we can move on to forecasting (at least for statistical methods)

## Moving Average (MA)
- Defined as having the form:
> X_t = μ + ε_t + θ_1*ε_t-1 +…+ θ_q*ε_t-q
>
> where μ is the mean of the series, θ_1…θ_q are parameters with θ_q ≠ 0, and ε_t is a white-noise error term
- This is a stationary process regardless of values of θ
- Consider an MA(1) process (centered at 0):
> X_t = ε_t + θ_1*ε_t-1
- **Essentially MA says you oscillate around a mean, μ, with error**
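
A minimal simulation of an MA(1) process, assuming NumPy; the values of μ and θ are arbitrary. It checks two textbook properties: the series hovers around μ, and its autocorrelation is θ/(1 + θ²) at lag 1 and zero beyond that.

```python
import numpy as np

rng = np.random.default_rng(6)
n, mu, theta = 5000, 10.0, 0.6   # arbitrary illustrative values
eps = rng.standard_normal(n + 1)

# X_t = mu + eps_t + theta * eps_{t-1}
x = mu + eps[1:] + theta * eps[:-1]

def autocorr(series, lag):
    """Sample autocorrelation at a single lag (lag must be >= 1)."""
    centered = series - series.mean()
    return np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2)

lag1 = autocorr(x, 1)   # theory: theta / (1 + theta**2)
lag2 = autocorr(x, 2)   # theory: 0
```

The sharp drop to zero after lag q is what makes the ACF useful for picking the order of an MA model.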

## Autoregressive Process (AR)
- Defined as having the form:
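> X_t = φ_1*X_t-1 +…+ φ_p*X_t-p + ε_t
- For AR(1), this is a stationary process if abs(φ_1) < 1; for higher orders, the condition is that all roots of the characteristic polynomial lie outside the unit circle
- Consider an AR(1) process:
> X_t = φ_1*X_t-1 + ε_t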
- **Essentially AR says your past value has something to do with your present value**
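
A minimal AR(1) simulation, assuming NumPy and an arbitrary coefficient of 0.8. It recovers the coefficient by regressing each value on the previous one and compares the sample variance with the stationary value 1/(1 − φ²) that holds for unit-variance noise.

```python
import numpy as np

def simulate_ar1(phi, n, rng):
    """Simulate X_t = phi * X_{t-1} + eps_t with standard normal noise."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

rng = np.random.default_rng(7)
stationary = simulate_ar1(0.8, 5000, rng)

# Least-squares estimate of phi from regressing X_t on X_{t-1} (no intercept).
phi_hat = (np.dot(stationary[1:], stationary[:-1])
           / np.dot(stationary[:-1], stationary[:-1]))

# For |phi| < 1 the variance settles near 1 / (1 - phi**2) for unit-variance noise.
theoretical_var = 1.0 / (1.0 - 0.8 ** 2)
```

As φ approaches 1 the stationary variance blows up, which matches the unit-root intuition from the Dickey-Fuller section.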

## ARIMA Model (AKA Box Jenkins)
- AR = autoregressive terms
- I = differencing
    - **Just detrends the data**
    - Optional 
- MA = moving average
- Hence specified as (autoregressive terms, differencing terms, moving average terms)
- **Note that we don't have to take out trends in our data using ARIMA, since the model can do it for us**
    - **BUT** you have to specify the differencing
    
### Summary of ARIMA
ARIMA model: ‘the most general class of models for forecasting a time series which can be made to be stationary’
- Statistical properties (mean, variance) constant over time
- ‘its short-term random time patterns always look the same in a statistical sense’
- Autocorrelation function & power spectrum remain constant over time
- Ok to do non-linear transformations to get there
- ARIMA model can be viewed as a combination of signal and noise
- Extrapolate the signal to obtain forecasts

### Applying the Appropriate ARIMA Model
- Need to determine what ARIMA model to use
- Use plot of the data, the ACF, and the PACF
- With the plot of the data: look for trend (linear or otherwise) & determine whether to transform data
- Most software will use a maximum likelihood estimation to determine appropriate ARIMA parameters

#### Simplified
1. Use PACF for AR model diagnostics
2. Use ACF for MA model diagnostics

### Drawback of ARIMA
When thinking about the model...
- Number of timestamps able to forecast is limited by the number of lags used
    - X lags == X forecast duration
        - Otherwise **ERROR WILL COMPOUND and ERROR BOUNDS WILL BE HUGE**
    - **Do not give in to just increasing the order of the model, b/c it can lead to overfitting**
- Also keep in mind that you assume a stationary mean, so don't know the ground truth
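
The compounding error can be made concrete for an AR(1) model with known parameters, where the forecast and its error variance have closed forms. The sketch assumes NumPy, a series centered at 0, and arbitrary values for φ, σ, and the last observation.

```python
import numpy as np

phi, sigma = 0.8, 1.0     # assumed AR(1) coefficient and noise standard deviation
last_value = 3.0          # hypothetical last observed value (series centered at 0)
horizons = np.arange(1, 11)

# Point forecast h steps ahead: phi**h * last_value, which decays toward the mean.
point_forecast = phi ** horizons * last_value

# Forecast error variance: sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))).
error_var = sigma ** 2 * (1 - phi ** (2 * horizons)) / (1 - phi ** 2)
half_width = 1.96 * np.sqrt(error_var)   # approximate 95% interval half-width
```

Each extra step adds another noise term to the error, so the interval widens with the horizon while the point forecast drifts back to the assumed stationary mean.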
    
### Application
#### General Steps
1. remove trending
    - remember that you can try accounting for it with the ARIMA I (differencing) term
2. remove seasonality
    - sometimes okay to proceed with seasonality just to get a feeling for the data and initial models
3. find the AR order from the partial autocorrelation (PACF)
    - use your knowledge of the data to interpret
    - try and be parsimonious
4. find the MA order from the autocorrelation (ACF)
    - use your knowledge of the data to interpret
    - try and be parsimonious

### What if you don't take out seasonality?
#### Use a seasonal ARIMA
Decompose model into two components
1. seasonal
    - has its own AR and MA terms
2. non-seasonal
    - has its own AR and MA terms

Resource links: 
1. https://otexts.com/fpp2/seasonal-arima.html
2. https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

#### And another model on top of ARIMA

### Confidence intervals for ARIMA
https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.conf_int.html