# Modern Time Series Analysis
### SciPy 2019 Tutorial
### Author: Aileen Nielsen

import inspect
import statsmodels.api as sm

# Box-Jenkins ARIMA Modeling (background)
- Developed in the early to mid 20th century
- AR, MA, etc. fall in the same category
- Successful, and still quite close to cutting-edge performance
- Excellent performance on **small datasets**

**When using modern time series analysis, always check that you're getting more than what you can get with ARIMA-type models. OCCAM'S RAZOR**
- ARIMA is actually a pretty high bar

### ARIMA formula
A nonseasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
- p is the number of autoregressive terms,
- d is the number of nonseasonal differences needed for stationarity, and
- q is the number of lagged forecast errors in the prediction equation.
> $\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$
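
To make the formula concrete, here is a minimal sketch of the one-step prediction equation in plain Python. The function name `arma_one_step` and all coefficient values are made up for illustration; for d > 0, the `y` values would be the differenced series.

```python
def arma_one_step(mu, phi, theta, y_hist, e_hist):
    """One-step forecast following the prediction equation above.

    phi[i] multiplies y_{t-1-i}; theta[j] multiplies e_{t-1-j}.
    y_hist and e_hist are ordered oldest to newest.
    """
    ar_part = sum(p * y_hist[-1 - i] for i, p in enumerate(phi))
    ma_part = sum(q * e_hist[-1 - j] for j, q in enumerate(theta))
    return mu + ar_part - ma_part


# Hypothetical ARIMA(2, 0, 1) coefficients and recent history.
y_hat = arma_one_step(
    mu=1.0,
    phi=[0.5, 0.2],
    theta=[0.3],
    y_hist=[4.0, 2.0],  # y_{t-2}, y_{t-1}
    e_hist=[1.0],       # e_{t-1}
)
print(y_hat)
```

In practice the coefficients are estimated from data rather than set by hand; this only shows how p and q control how many lagged terms enter the forecast.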

### What's Missing from ARIMA
- Not especially intuitive
- No way to build in our underlying understanding of how the system works
    - random walk element
    - cyclical element
    - external regressors
- Some systems cycle more slowly or more stochastically than an ARIMA model can easily describe
    - Good performance on small datasets
        - 7-day forecasting with daily data
    - Typically bad performance on large datasets
        - 7-day forecasting with hourly or minute data
        - Hourly forecasting with minute data

# State Space Models 

## Structural time series
- Can be expressed in ARIMA form
- Fit via maximum likelihood/Kalman filter
- Largely developed in econometrics
- Offer insights into underlying structure
- Also possible to inject Bayesian analysis via priors on parameters 
- Easier to understand than ARIMA
    - **Can describe leveling, seasonality, and error, VISUALLY**
- Used in original Apollo missions
    - At each time, t, only have to update your state, not all timestamps


### What do they offer?
- **Filtering**: the distribution of the current state at time t given all previous measurements up to and including time t
    - Example: Kalman filter
- **Prediction**: the distribution of the future state at time t+k given all previous measurements up to and including time t
- **Smoothing**: the distribution of the state at time k given all past and future measurements from 0 to T (the last time)
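
As a rough illustration of filtering and prediction, here is a sketch of a Kalman filter for the simplest structural model, a local level (random walk plus noise). The function names, the noise variances `q` and `r`, and the constant test series are assumptions chosen for the example; smoothing would need an extra backward pass that is not shown.

```python
def kalman_local_level(ys, q, r, m0=0.0, p0=1e6):
    """Filter a local level model: x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q and r are the variances of w_t and v_t. Returns the filtered
    (mean, variance) of the state after each observation.
    """
    m, p = m0, p0
    filtered = []
    for y in ys:
        # Predict: the level is a random walk, so only the variance grows.
        m_pred, p_pred = m, p + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        filtered.append((m, p))
    return filtered


def predict_ahead(m, p, q, k):
    """State distribution k steps ahead, given the filtered state at time t."""
    return m, p + k * q


observations = [5.0] * 20  # toy data: a constant signal
filtered = kalman_local_level(observations, q=0.1, r=1.0)
last_mean, last_var = filtered[-1]
pred_mean, pred_var = predict_ahead(last_mean, last_var, q=0.1, k=3)
print(last_mean, last_var, pred_var)
```

Each step only touches the previous state estimate `(m, p)`, never the full history, which is why the update is cheap.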

### Components
- State
- Measurement
- Error

### Use cases
**Need some sort of hypothesis about dynamics of your system**
- Good use case: rocket trajectory
    - Why? Have physics to tie hypothesis to
- Bad use case: blanket stock forecast
    - Why? No underlying hypothesis 
    
### Evaluating state space models
Use the Akaike information criterion (AIC) to compare candidate models; lower is better.
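
For reference, AIC is 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. A small sketch, with made-up log likelihoods and parameter counts standing in for two fitted models:

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log likelihood, parameter count) pairs, for illustration only.
candidates = {
    "local level": (-152.3, 2),
    "local linear trend": (-150.9, 3),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The penalty term means an extra parameter has to buy at least one unit of log likelihood to be worth keeping.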


### Applying Structural Models
Use: 

statsmodels ~ [UnobservedComponents](https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.structural.UnobservedComponents.html)

- Allows you to plug and play with various structural components
    - level
    - trend
    - seasonal
    - cycle
    - autoregressive
    - and much more
    
    
### When to use structural models
Good for exploratory data analysis.

**Be cautious about using them in production**
- i.e., don't fly an airplane or treat cancer with one
- Because there are so many knobs to turn
- ARIMA has a similar problem

Can use them if you move to a Bayesian model
- Priors give you stronger inputs
- [CausalImpact (Python port of Google's package)](https://github.com/dafiti/causalimpact)

## Hidden Markov Models (HMMs)
- State space model: observations are an indicator of underlying state
- Markov process: past doesn’t matter if present status is known
- Parameter estimation: Baum-Welch algorithm
- Smoothing/state labeling: Viterbi algorithm
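
For state labeling, a compact sketch of the Viterbi algorithm for a discrete-observation HMM. The two-state transition matrix `A`, emission matrix `B`, and initial distribution `pi` below are made-up values for the example.

```python
import math


def viterbi(obs, A, B, pi):
    """Most likely hidden state path for a discrete-observation HMM."""
    n = len(pi)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[i] = best log probability of any path ending in state i.
    score = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, pointers, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            pointers.append(best)
            score.append(prev[best] + log(A[best][j]) + log(B[j][o]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical "sticky" two-state model with fairly reliable observations.
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
path = viterbi([0, 0, 0, 1, 1, 1], A, B, pi)
print(path)
```

Working in log space avoids numerical underflow on long sequences.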

#### Baum-Welch Algorithm for Determining Parameters
- Expectation-maximization parameter estimation:
    - Initialize parameters (with informative priors or randomly)
    - EM iterations:
        - Compute the expectation of the log likelihood given the data
        - Choose the parameters that maximize the log likelihood expectation
    - Exit when desired convergence is reached
- Guaranteed that the likelihood does not decrease with any iteration
- BUT may converge to a local maximum rather than the global maximum
- BUT can overfit the data

##### Baum-Welch Algorithm Details
Solve for the following:
1. A ~ transition probability matrix
    - from X_t-1 to X_t, how likely is the state to change
2. B ~ emission probabilities: the probability of seeing a value Y given a particular state X
    - how likely is a particular Y assuming a particular underlying state
3. pi ~ initial state distribution (priors)
    - how likely are you to be at a particular starting point

The forward (alpha) and backward (beta) probabilities are intermediate quantities computed in the expectation step; they are not the same as A and B.
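
The EM loop can be sketched in plain Python for a discrete-observation HMM. Here `A` is the transition matrix, `B` the emission matrix, and `pi` the initial state distribution; the function names, starting values, and observation sequence are assumptions for illustration, and the scaled forward-backward pass is one common way to keep the numbers stable.

```python
import math


def forward_backward(obs, A, B, pi):
    """Scaled forward (alpha) and backward (beta) passes plus log likelihood."""
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            row = [pi[j] * B[j][obs[0]] for j in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        scales.append(sum(row))
        alpha.append([a / scales[t] for a in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scales[t + 1]
    return alpha, beta, scales, sum(math.log(c) for c in scales)


def baum_welch_step(obs, A, B, pi):
    """One EM iteration. Returns updated (A, B, pi) and the log likelihood
    of the parameters that were passed in."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales, loglik = forward_backward(obs, A, B, pi)
    # E-step: gamma[t][i] = P(state i at time t | all observations).
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    # xi_sum[i][j] = expected number of i -> j transitions.
    xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   / scales[t + 1] for t in range(T - 1))
               for j in range(n)] for i in range(n)]
    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0][:]
    new_A = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, loglik


# Toy observation sequence and arbitrary starting guesses.
obs = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.2, 0.8]]
pi = [0.5, 0.5]
logliks = []
for _ in range(15):
    A, B, pi, loglik = baum_welch_step(obs, A, B, pi)
    logliks.append(loglik)
print(logliks[0], logliks[-1])
```

Rerunning with different starting guesses is a cheap way to see the local-maximum issue: the final log likelihood can differ between runs.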

##### Problems with Baum-Welch Algorithm or other HMM
- Again, many knobs to turn
- Improves with every cycle but may only find a local maximum
- Often need good domain knowledge to set boundaries

##### Pros of HMM
- Can account for sudden state changes, unlike ARIMA and structural state space models

# Machine Learning for Time Series
General-purpose models applied to time series need feature generation to account for the temporal aspect. Traditional time series methods, by contrast, can get by with just the univariate or multivariate data (no features).

## Feature Generation
### Examples
- min
- max
- number of peaks
- median
- mean
- etc.
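
A small sketch of computing these summary features for one series, using only the standard library. The peak definition (strictly greater than both neighbors) is an assumption; real data usually needs something more robust.

```python
from statistics import mean, median


def count_peaks(values):
    """Count points strictly greater than both immediate neighbors."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )


def summary_features(values):
    """Collapse a series into a flat feature dictionary."""
    return {
        "min": min(values),
        "max": max(values),
        "n_peaks": count_peaks(values),
        "median": median(values),
        "mean": mean(values),
    }


features = summary_features([1, 3, 2, 5, 4])
print(features)
```

Each series becomes one row of features, which is the shape that tree-based models and other general ML methods expect.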
### Difficulties
- With long datasets, feature generation can get tough and computationally expensive
- Peaks aren't always intuitive for computers
### How To
- Well-studied area
- [Catch22 canonical set](https://link.springer.com/article/10.1007/s10618-019-00647-x) is a good guide **FOR GENERAL TIME SERIES, WITHOUT DOMAIN KNOWLEDGE**
- Can almost always do better than Catch22 with custom features if you have domain knowledge
    - i.e., with EKG data, there are specific features of EKG readings
    
#### Checking usability
- Always check whether your features are useful

## Time Series Classification and Forecasting

### Trees

### Random Forest

### xgboost

### Clustering

### Time series clustering
- Surprisingly difficult
    - Conceptually
    - Computational costs
    - Pitfall: Euclidean distance
- Used across many disciplines
    - Medicine
    - Finance
    - Chemistry
    - Etc
    
### Dynamic time warping
- Computationally intense
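
A bare-bones dynamic time warping distance, as a sketch of why it helps where Euclidean distance misleads. The absolute-difference cost and the toy series are assumptions; the nested loop also shows where the cost comes from, since it fills an n-by-m table for every pair of series.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same bump, shifted by one time step.
series_a = [0, 1, 2, 1, 0, 0]
series_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(series_a, series_b), euclidean_distance(series_a, series_b))
```

The two series have the same shape, so DTW aligns them at zero cost while Euclidean distance treats the shift as a real difference.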

### Forecasting
Unlike statistical time series methods, ML forecasting does not require stationarity, but it can help

#### Comparison with statistical and state space models
- Don't need to model the temporal aspect of the data directly
- Instead create features for every forecast at timestamp t
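
One simple way to build per-timestamp features is a sliding window of lagged values. This is a sketch with a hypothetical helper `make_lag_features`; richer features (rolling statistics, calendar variables) follow the same pattern.

```python
def make_lag_features(series, n_lags):
    """Turn a series into (X, y) rows: the previous n_lags values predict the next."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # features known before time t
        y.append(series[t])             # target at time t
    return X, y


X, y = make_lag_features([1, 2, 3, 4, 5], n_lags=2)
print(X, y)
```

The resulting rows can be fed to any tabular regressor. Only values from before time t go into each row, so the target never leaks into its own features.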

#### Scoring
With time series forecasting, no single metric will give you the full picture
- see example from NB 3: Trees for Classification and Prediction
    - RMSE looks equivalent for last two models, but major difference in actual fit
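
A toy illustration of the same point, with made-up numbers rather than the notebook's models: two forecasts with identical RMSE but very different error patterns, which a second metric such as MAE exposes.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [0.0, 0.0, 0.0, 0.0]
forecast_a = [1.0, 1.0, 1.0, 1.0]  # off by a little everywhere
forecast_b = [2.0, 0.0, 0.0, 0.0]  # exact except for one large miss

print(rmse(actual, forecast_a), rmse(actual, forecast_b))
print(mae(actual, forecast_a), mae(actual, forecast_b))
```

Plotting forecasts against actuals alongside the metrics is usually the quickest way to catch this kind of difference.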

# Deep Learning for Time Series
No need to go into details
## Model Options 
### RNN
#### GRU
#### LSTM
### CNN
### LSTNet

# Other Options / Outlook

1. Look into automated forecasting with tech company open source libraries like Prophet

2. The future is combining machine learning and statistical approaches, so it is good to have the stats background