# Chapter 1

### Time Series

- Correlation = square root of R-squared of a simple linear regression (the sign of the correlation matches the sign of the slope)
- Correlation between time series:
    - if 2 different stocks are both trending, their price levels show high correlation even if they do not follow the same pattern
    - Correct way : find the correlation between the stock returns instead (eg: correlation between the daily percentage changes of the two stocks)
- Predicting future points using regression : dependent time series = independent time series * slope + intercept + error
- Auto-correlation:
    - correlation of a time series with a lagged copy of itself
    - an "echo": every point in the series is related to points in its past
    - eg: in the series 1,2,3,4,5,6,7 the second number = first number + 1, the third number = second number + 1, and so on; the same relation holds for all points
    - Negative autocorrelation = mean reverting
        - stocks have historically negative autocorrelation over weeks
        - strategy to make money : buy what has gone down -> sell what has gone up
    - Positive autocorrelation = momentum
        - commodities and currencies have historically positive autocorrelation over months
        - strategy to make money : buy what has been going up -> sell what has been going down
    - An autocorrelation graph (ACF plot)
        - shows the autocorrelation at each lag (lag 0 is always 1), i.e. how many past points (lags) are useful for predicting the future
        - helps choose a suitable model for prediction
- White Noise
    - constant mean over time
    - constant variance over time
    - 0 autocorrelation at all lags
    - Gaussian White Noise : white noise whose values follow a Gaussian (normal) distribution, so the histogram shows a bell curve
- Random Walk and White noise
    - Stock prices are commonly modeled as a random walk, so the return (gain or percent change) is white noise (today's price - yesterday's price = noise)
    - You cannot forecast a random walk. The best guess : today's price is the same as yesterday's price
    - random walk with drift : today's price = yesterday's price + drift (mean change) + noise
    - So, although we cannot forecast a random walk, we can guess the direction of the walk from the value of the drift
    - How do we check whether a series is a random walk?
        - Dickey-Fuller Test : tests the null hypothesis that a series is a random walk (a small p-value rejects the random walk)
        - Augmented Dickey-Fuller Test : the same test with more than one lag included on the right-hand side (the augmentation)
- Stationarity
    - Strong stationarity : Entire distribution of data is time invariant
    - Weak stationarity : mean, variance and autocorrelation of data are time invariant
    - stationary data is easier to model because it needs fewer parameters
    - non-stationary data is harder to model because it needs many parameters (new parameters for each point in time)
    - eg: stock price is non-stationary. reason : its mean and variance change over time (today's price level differs from the level 10 years from now)
    - eg: white noise is stationary. reason : mean, variance and autocorrelation are the same whether computed on 100 or 1000 data points
    - non-stationary to stationary : may require several transformations like:
        1. log transformation
        2. take the difference between the current value and a lagged value of the series (the right lag = look at the ACF graph)
    - Regression model:
        1. AR model :
            - Theory : the next value retains some information from the previous value
            - today's value = mean + co-efficient * yesterday's value + error (same form as y = mx + c)
            - co-efficient = phi. Negative phi = mean reversion, positive phi = momentum
            - phi = 1 for random walk (non-stationary, high autocorrelation) , phi = 0 for white noise (no autocorrelation) , -1 < phi < +1 for stationary series
            - autocorrelation decays exponentially : autocorrelation at lag k = phi^k
        2. MA model :
            - today's value = mean + today's error + co-efficient (theta) * yesterday's error
            - for an MA model with one lag, autocorrelation is non-zero at lag 1 only
- Partial auto-correlation : 
    - incremental benefit of adding another lag
    - quantifies how significant adding the n-th lag is when lags 1 to (n-1) are already in the model
- Information Criteria : measure goodness of fit with a penalty on the number of parameters in the model. The best model has the lowest AIC and/or BIC among its peers.

```

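import matplotlib.pyplot as plt
# Assumed to exist already: df (DataFrame with a numeric column 'num_col') and data (Series with a DatetimeIndex)
df['num_col'].autocorr() # lag-1 autocorrelation value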
# Plot ACF and PACF graph
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(df['num_col'], lags=20, alpha=0.05) # alpha = 1 - confidence level (0.05 draws a 95% band)
plot_pacf(df['num_col'], lags= 20, alpha=0.05)
from statsmodels.tsa.stattools import acf
acf(df['num_col']) # See acf values
# White noise
import numpy as np
noise = np.random.normal(loc=0, scale=1, size=500)
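# Augmented Dickey-Fuller test; null hypothesis : the series is a random walk
from statsmodels.tsa.stattools import adfuller
adf_stat, p_value, *_ = adfuller(df['num_col'])
print(p_value) # p-value < 0.05 -> reject the random walk hypothesis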
# Pure AR Series generation
from statsmodels.tsa.arima_process import ArmaProcess
phi = 0.9
ar = np.array([1, -phi])
ma = np.array([1])
AR_object = ArmaProcess(ar, ma)
simulated_data = AR_object.generate_sample(nsample=1000)
plt.plot(simulated_data)
# ARIMA modeling 
from statsmodels.tsa.arima.model import ARIMA
model = ARIMA(data, order=(1,0,0)) # order = (AR lags, differencing, MA lags)
result = model.fit()
forecast = result.get_forecast(steps=50) # Forecast next 50 values (see forecast.predicted_mean)
print(result.summary())
print(result.params) # Returns constant mu, and co-efficient phi
print(result.aic, result.bic) # AIC and BIC values of the model
# Plotting forecast
from statsmodels.graphics.tsaplots import plot_predict
fig, ax = plt.subplots()
data.plot(ax=ax)
plot_predict(result, start='2012-09-27', end='2012-10-06', alpha=0.05, ax=ax)
plt.show()
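# Visualize best model : AR order on X axis and AIC, BIC on Y axis
ar_values = range(1, 7) # candidate AR orders (an assumed range)
fits = [ARIMA(data, order=(p, 0, 0)).fit() for p in ar_values]
plt.plot(ar_values, [fit.aic for fit in fits], label='AIC', marker='o')
plt.plot(ar_values, [fit.bic for fit in fits], label='BIC', marker='o')
plt.legend()
plt.show()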

```
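A minimal sketch of an MA model with one lag, assuming the usual definition: today's value = mean + today's error + theta * yesterday's error. The values of `theta`, `mean`, `n` and the seed are made up for illustration, and `sample_autocorr` is a hypothetical helper, not a library function. Under that definition the lag-1 autocorrelation should be close to theta / (1 + theta^2) and the higher lags close to 0.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.6   # assumed MA coefficient
mean = 0.0    # assumed mean of the series
n = 20000

errors = rng.normal(size=n + 1)
# MA(1): today's value = mean + today's error + theta * yesterday's error
ma_series = mean + errors[1:] + theta * errors[:-1]

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

theoretical_lag1 = theta / (1 + theta**2)
sample_acf = [sample_autocorr(ma_series, k) for k in range(1, 5)]

print("theoretical lag-1 autocorrelation:", round(theoretical_lag1, 3))
print("sample autocorrelation, lags 1-4:", np.round(sample_acf, 3))
```

The cut-off after lag 1 is what separates an MA(1) series from an AR(1) series, whose autocorrelation fades gradually.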

### Correlation between values vs Correlation between percent changes


<center><img src="images/01.01.png"  style="width: 400px, height: 300px;"/></center>
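The same effect can be reproduced with simulated data. The sketch below assumes two made-up "stocks" whose daily returns are independent but share a positive drift; the drift, volatility and seed are illustrative values, not taken from real prices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Two independent daily return series that share only a positive drift.
returns_a = rng.normal(loc=0.002, scale=0.01, size=n)
returns_b = rng.normal(loc=0.002, scale=0.01, size=n)

# Prices are compounded returns, so both price series trend upward.
prices = pd.DataFrame({
    "a": 100 * np.cumprod(1 + returns_a),
    "b": 100 * np.cumprod(1 + returns_b),
})

corr_levels = prices["a"].corr(prices["b"])
corr_returns = prices["a"].pct_change().corr(prices["b"].pct_change())

print(f"correlation of price levels : {corr_levels:.2f}")
print(f"correlation of daily returns: {corr_returns:.2f}")
```

The price levels look strongly related only because both trend; the percent changes show that the two series are in fact unrelated.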


### Positive and Negative Autocorrelation

<center><img src="images/01.02.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/01.03.png"  style="width: 400px, height: 300px;"/></center>
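A small sketch of both cases, assuming an AR series with one lag and no constant term. `simulate_ar1` is a hypothetical helper, and the phi values (+0.7 and -0.7) are chosen only to make the sign of the autocorrelation obvious.

```python
import numpy as np
import pandas as pd

def simulate_ar1(phi, n=2000, seed=0):
    """Simulate value[t] = phi * value[t-1] + noise[t] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n)
    values = np.zeros(n)
    for t in range(1, n):
        values[t] = phi * values[t - 1] + noise[t]
    return pd.Series(values)

momentum = simulate_ar1(phi=0.7)         # positive autocorrelation
mean_reverting = simulate_ar1(phi=-0.7)  # negative autocorrelation

lag1_momentum = momentum.autocorr(lag=1)
lag1_mean_reverting = mean_reverting.autocorr(lag=1)

print(f"lag-1 autocorrelation, phi=+0.7: {lag1_momentum:.2f}")
print(f"lag-1 autocorrelation, phi=-0.7: {lag1_mean_reverting:.2f}")
```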


# Chapter 2

### Autocorrelation Examples

<center><img src="images/02.01.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.02.png"  style="width: 400px, height: 300px;"/></center>


### White Noise : A perfect example of stationary time series

<center><img src="images/02.03.png"  style="width: 400px, height: 300px;"/></center>
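The three properties of white noise (constant mean, constant variance, zero autocorrelation) can be checked numerically. This is a rough sketch on simulated Gaussian noise; `sample_autocorr` is a hypothetical helper, and splitting the series into two halves is just one simple way to look at "constant over time".

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=5000)

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

first_half, second_half = noise[:2500], noise[2500:]
mean_gap = abs(first_half.mean() - second_half.mean())
variance_gap = abs(first_half.var() - second_half.var())
autocorrelations = [sample_autocorr(noise, k) for k in range(1, 6)]

print(f"difference in mean between halves    : {mean_gap:.3f}")
print(f"difference in variance between halves: {variance_gap:.3f}")
print("autocorrelation at lags 1-5:", np.round(autocorrelations, 3))
```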


### Random Walk and Dickey-Fuller Test

<center><img src="images/02.04.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.05.png"  style="width: 400px, height: 300px;"/></center>


### Non-stationary time series

<center><img src="images/02.06.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.07.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.08.png"  style="width: 400px, height: 300px;"/></center>


### Transformation : non-stationary to stationary

<center><img src="images/02.09.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.10.png"  style="width: 400px, height: 300px;"/></center>
<center><img src="images/02.11.png"  style="width: 400px, height: 300px;"/></center>
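The two transformations can be sketched on a simulated price series that grows exponentially. The growth rate, volatility and seed are assumptions, and `half_means` is a hypothetical helper that compares the mean of the first and second half of a series as a crude stationarity check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 1000

# Exponentially growing prices: the log price is a random walk with drift.
log_returns = rng.normal(loc=0.005, scale=0.02, size=n)
prices = pd.Series(100 * np.exp(np.cumsum(log_returns)))

# Step 1: log transformation. Step 2: difference with the previous value.
transformed = np.log(prices).diff().dropna()

def half_means(series):
    """Mean of the first and second half of a series (hypothetical helper)."""
    middle = len(series) // 2
    return series.iloc[:middle].mean(), series.iloc[middle:].mean()

raw_first, raw_second = half_means(prices)
new_first, new_second = half_means(transformed)

print(f"raw prices : mean {raw_first:.1f} (first half) vs {raw_second:.1f} (second half)")
print(f"transformed: mean {new_first:.4f} (first half) vs {new_second:.4f} (second half)")
```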


# Chapter 3

### AR series with different phi

<center><img src="images/03.01.png"  style="width: 400px, height: 300px;"/></center>

### Effect of phi on Autocorrelation

<center><img src="images/03.02.png"  style="width: 400px, height: 300px;"/></center>
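For an AR series with one lag, the autocorrelation at lag k is phi^k. The sketch below compares that formula with the sample autocorrelation of a simulated series; phi = 0.8, the series length and the seed are assumed values, and `sample_autocorr` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
phi = 0.8
n = 20000

# Simulate value[t] = phi * value[t-1] + noise[t]
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

lags = range(1, 6)
theoretical = [phi**k for k in lags]
observed = [sample_autocorr(series, k) for k in lags]

for k, expected, measured in zip(lags, theoretical, observed):
    print(f"lag {k}: phi^k = {expected:.3f}, sample = {measured:.3f}")
```

With a negative phi the same formula makes the autocorrelation alternate in sign from one lag to the next.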


### AR model with multiple lags

<center><img src="images/03.03.png"  style="width: 400px, height: 300px;"/></center>


### PACF of AR with different lags

<center><img src="images/03.04.png"  style="width: 400px, height: 300px;"/></center>
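The partial autocorrelation at lag k can be read as the last coefficient when the series is regressed on its first k lags. The sketch below computes it that way with plain least squares; `pacf_ols` is a hypothetical helper written for illustration (in practice `plot_pacf` from `statsmodels` draws the same quantity), and the simulated AR coefficients are assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5000
phi1, phi2 = 0.6, -0.3  # assumed coefficients of an AR model with 2 lags

noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(2, n):
    series[t] = phi1 * series[t - 1] + phi2 * series[t - 2] + noise[t]

def pacf_ols(x, max_lag):
    """Partial autocorrelation via least squares (hypothetical helper)."""
    centered = np.asarray(x, dtype=float) - np.mean(x)
    values = []
    for k in range(1, max_lag + 1):
        target = centered[k:]
        # Column j holds the series lagged by j steps, for j = 1..k.
        lagged = np.column_stack(
            [centered[k - j: len(centered) - j] for j in range(1, k + 1)]
        )
        coefficients, *_ = np.linalg.lstsq(lagged, target, rcond=None)
        values.append(coefficients[-1])  # incremental effect of the k-th lag
    return np.array(values)

partial = pacf_ols(series, max_lag=5)
print("partial autocorrelation, lags 1-5:", np.round(partial, 3))
```

For a series with two true lags, the values at lags 1 and 2 stand out and the later lags stay near zero, which is how the PACF plot suggests the AR order.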
