# Trend

```python
import numpy as np
import pandas as pd
from plotnine import *
```


## Deterministic trend
- can be described by a well-defined mathematical function of time

#### linear
$$ trend = a + b \cdot time $$
- a: intercept (the value of the trend at time 0)
- b: slope, the expected change between consecutive periods
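To illustrate the two parameters, the sketch below builds a linear trend from made-up values of $a$ and $b$. These values are assumptions, not estimates from any data. It then checks that consecutive differences equal $b$ and recovers both parameters with `np.polyfit`.

```python
import numpy as np

# Hypothetical intercept a and slope b (illustrative values only)
a_true, b_true = 2.0, 0.5
time = np.arange(1, 21)            # time index 1, 2, ..., 20
trend = a_true + b_true * time     # deterministic linear trend

# Consecutive differences of a linear trend are constant and equal to b
step = np.diff(trend)

# Recover a and b by least squares; for deg=1 np.polyfit returns [slope, intercept]
b_hat, a_hat = np.polyfit(time, trend, deg=1)
print(step[:3], a_hat, b_hat)
```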

#### exponential
$$ trend = e^{a + b \cdot time} $$

- can be made linear by taking the logarithm of both sides
$$ \log(trend) = \log(e^{a + b \cdot time}) = a + b \cdot time $$

- many economic measures, such as GDP, grow roughly exponentially, which is why they are usually modelled on the log scale, e.g. log(GDP)
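The sketch below illustrates the log transformation using hypothetical values a = 1 and b = 0.03. The raw exponential trend has period-to-period changes that keep growing. Its logarithm is a straight line, and a linear fit recovers b and a.

```python
import numpy as np

# Hypothetical parameters of an exponential trend e^(a + b*time)
a_true, b_true = 1.0, 0.03
time = np.arange(1, 61)
trend = np.exp(a_true + b_true * time)

# The raw trend is curved: its period-to-period changes keep increasing
growing_steps = np.diff(trend)

# After taking logs the trend is linear, so a straight-line fit recovers a and b
log_trend = np.log(trend)
b_hat, a_hat = np.polyfit(time, log_trend, deg=1)

# b is also the continuously compounded growth rate: trend[t+1] / trend[t] = e^b
ratio = trend[1:] / trend[:-1]
```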

#### trend-stationary time series
- a time series is trend stationary if it has a deterministic trend and its deviations from that trend are stationary
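Here is a minimal simulation that makes the definition concrete: a linear trend plus stationary AR(1) noise. The intercept, slope, AR coefficient and seed are all assumed values. Regressing on time and subtracting the fitted line (detrending) leaves only the stationary part.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
time = np.arange(1, n + 1)

# Hypothetical trend-stationary series: deterministic linear trend + stationary AR(1) noise
eps = rng.normal(scale=1.0, size=n)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.5 * noise[t - 1] + eps[t]
y = 10.0 + 0.2 * time + noise

# Remove the deterministic trend: fit y on time by least squares and keep the residuals
slope, intercept = np.polyfit(time, y, deg=1)
detrended = y - (intercept + slope * time)
```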


```python
import os
os.getcwd()  # check the working directory; the data path below is relative to it
```

```
'/Users/matejuhrin/repo/ThinkBayes2/ts'
```

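```python
import numpy as np
import pandas as pd


# https://github.com/vcerqueira/blog/blob/main/data/gdp-countries.csv
series = pd.read_csv('data/gdp-countries.csv')['United States']
# annual observations indexed by year-end dates starting 1959-12-31
# ('Y' is the year-end alias; pandas >= 2.2 prefers 'YE')
series.index = pd.date_range(start='12/31/1959', periods=len(series), freq='Y')
```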


```python
gdp = series.dropna()
log_gdp = np.log(gdp).dropna()  # the log turns exponential growth into a linear trend
```

```python
# deterministic time index 1, 2, ..., n used as an exogenous regressor
linear_trend = np.arange(1, len(log_gdp) + 1)
linear_trend
```

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61])
```

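- `endog`: the endogenous variable being modelled (here `log_gdp`)
- `exog`: exogenous regressors added to the model (here the deterministic `linear_trend`)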

```python
from statsmodels.tsa.arima.model import ARIMA

# ARIMA(2, 0, 1) on log GDP with the linear time trend as an exogenous regressor
arima = ARIMA(endog=log_gdp, order=(2, 0, 1), exog=linear_trend)
result = arima.fit()
result.summary()
```



| Field | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | United States | No. Observations | 61 |
| Model | ARIMA(2, 0, 1) | Log Likelihood | 148.405 |
| Date | Fri, 11 Aug 2023 | AIC | -284.81 |
| Time | 23:04:26 | BIC | -272.145 |
| Sample | 12-31-1959 - 12-31-2019 | HQIC | -279.846 |
| Covariance Type | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 27.4105 | 0.287 | 95.651 | 0.000 | 26.849 | 27.972 |
| x1 | 0.0502 | 0.013 | 3.970 | 0.000 | 0.025 | 0.075 |
| ar.L1 | 1.9835 | 0.025 | 78.386 | 0.000 | 1.934 | 2.033 |
| ar.L2 | -0.9868 | 0.026 | -38.161 | 0.000 | -1.037 | -0.936 |
| ma.L1 | -0.8343 | 0.125 | -6.668 | 0.000 | -1.080 | -0.589 |
| sigma2 | 0.0004 | 9.17e-05 | 4.321 | 0.000 | 0.000 | 0.001 |

`x1` is the coefficient on the exogenous `linear_trend`.

| Diagnostic | Value | Diagnostic | Value |
|---|---|---|---|
| Ljung-Box (L1) (Q) | 1.47 | Jarque-Bera (JB) | 14.62 |
| Prob(Q) | 0.23 | Prob(JB) | 0.0 |
| Heteroskedasticity (H) | 1.32 | Skew | -1.07 |
| Prob(H) (two-sided) | 0.55 | Kurtosis | 4.09 |
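As an extra check (not part of the original analysis), the estimated AR coefficients can be turned into the roots of the AR polynomial $1 - \phi_1 z - \phi_2 z^2$. Stationarity of the AR part requires all roots to lie outside the unit circle. Here they lie only just outside it, which hints that the series behaves almost like one with a unit root even after the linear trend is included.

```python
import numpy as np

# AR coefficients copied from the ARIMA summary (ar.L1 and ar.L2)
phi1, phi2 = 1.9835, -0.9868

# AR polynomial 1 - phi1*z - phi2*z^2; np.roots expects coefficients from the highest power down
roots = np.roots([-phi2, -phi1, 1.0])
moduli = np.abs(roots)

# Sum of AR coefficients: a value of exactly 1 would indicate a unit root
ar_sum = phi1 + phi2
print(roots, moduli, ar_sum)
```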


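## Stochastic trends
- the trend changes randomly over time instead of following a fixed function
- time series with stochastic trends are difference stationary: differencing them (possibly more than once) makes them stationary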

```python
# random walk: cumulative sum of random +1/-1 steps, a simple stochastic trend
rw = np.cumsum(np.random.choice([-1, 1], size=1000))
```
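The sketch below shows why differencing works for a random walk. The seed and sizes are arbitrary choices. The first difference returns exactly the random ±1 steps, which are stationary. The level's variance, in contrast, grows with time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Random walk y_t = y_{t-1} + e_t with e_t = +1 or -1
steps = rng.choice([-1, 1], size=1000)
rw = np.cumsum(steps)

# First differences recover the i.i.d. steps (from the second one onward)
rw_diff = np.diff(rw)

# With +/-1 steps Var(y_t) = t, so the spread across many simulated walks grows with t
walks = np.cumsum(rng.choice([-1, 1], size=(2000, 100)), axis=1)
var_at_10, var_at_100 = walks[:, 9].var(), walks[:, 99].var()
```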

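### Testing for a stochastic trend
- Augmented Dickey-Fuller (ADF) test, `adfuller` in statsmodels
- H0: the series has a unit root, i.e. it is non-stationary
- H1: the series has no unit root, i.e. it is stationary (trend stationary when a trend term is included, as with `regression='ct'`)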

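```python
from statsmodels.tsa.stattools import adfuller

# regression='ct': include a constant and a linear trend in the test regression
p_adfuller = adfuller(x=log_gdp.dropna(), regression='ct')[1]  # element [1] is the p-value
p_adfuller
```

```
1.0
```

With a p-value of 1.0 the unit-root null cannot be rejected, so log GDP appears to have a stochastic trend.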

```python
help(adfuller)
```

```
Help on function adfuller in module statsmodels.tsa.stattools:

adfuller(x, maxlag: 'int | None' = None, regression='c', autolag='AIC', store=False, regresults=False)
    Augmented Dickey-Fuller unit root test.
    
    The Augmented Dickey-Fuller test can be used to test for a unit root in a
    univariate process in the presence of serial correlation.
    
    Parameters
    ----------
    x : array_like, 1d
        The data series to test.
    maxlag : {None, int}
        Maximum lag which is included in test, default value of
        12*(nobs/100)^{1/4} is used when ``None``.
    regression : {"c","ct","ctt","n"}
        Constant and trend order to include in regression.
    
        * "c" : constant only (default).
        * "ct" : constant and trend.
        * "ctt" : constant, and linear and quadratic trend.
        * "n" : no constant, no trend.
    
    autolag : {"AIC", "BIC", "t-stat", None}
        Method to use when automatically determining the lag length among the
```

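### KPSS test
- the hypotheses are reversed compared to the ADF test
- H0: the series is stationary (trend stationary with `regression='ct'`)
- H1: the series has a unit root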

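```python
from statsmodels.tsa.stattools import kpss

pval_kpss = kpss(log_gdp.dropna(), regression='ct')[1]  # element [1] is the p-value
pval_kpss
```

statsmodels emits an `InterpolationWarning` here: the test statistic lies outside the range of its p-value look-up table, so the actual p-value is smaller than the p-value returned.

```
0.01
```

Since the p-value is below 0.05, the null of trend stationarity is rejected, which agrees with the ADF result.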
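Because the two tests have opposite null hypotheses, they are often run together. The helper below is a hypothetical sketch of one commonly used heuristic for combining them, assuming a 5% level. It is a rule of thumb, not a formal procedure. Applied to the log GDP p-values above (ADF 1.0, KPSS 0.01), both tests point to non-stationarity.

```python
def combine_adf_kpss(adf_p, kpss_p, alpha=0.05):
    """Hypothetical helper: combine ADF and KPSS p-values with a rule-of-thumb table."""
    adf_stationary = adf_p < alpha      # ADF: rejecting H0 (unit root) suggests stationarity
    kpss_stationary = kpss_p >= alpha   # KPSS: not rejecting H0 (stationarity)
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if kpss_stationary:
        return "trend stationary: remove the deterministic trend"
    return "difference stationary: difference the series"

# p-values reported for log GDP (the KPSS value 0.01 is an upper bound)
print(combine_adf_kpss(1.0, 0.01))
```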

### How many differences are needed to make the series stationary?
- the `ndiffs` function from `pmdarima` estimates the number of differences required

```python
from pmdarima.arima import ndiffs

# how many differencing steps are needed for stationarity (according to the ADF test)?
diffs_required = ndiffs(log_gdp, test='adf')
diffs_required
```

```
2
```

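```python
# difference log GDP twice and re-run the ADF test
log_gdp_diff2 = log_gdp.diff().diff().dropna()

p_adfuller_diff2 = adfuller(log_gdp_diff2, regression='ct')[1]
p_adfuller_diff2
```

```
2.060913767697398e-05
```

After differencing twice the p-value is far below 0.05, so the unit root is rejected, consistent with `ndiffs` returning 2.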
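The second difference taken with `diff().diff()` can be written out explicitly:

$$ \Delta^2 y_t = \Delta y_t - \Delta y_{t-1} = y_t - 2y_{t-1} + y_{t-2} $$

The sketch below checks, on a small made-up series standing in for `log_gdp`, that `diff().diff()`, this formula, and `np.diff(..., n=2)` agree.

```python
import numpy as np
import pandas as pd

# Hypothetical series standing in for log_gdp
y = pd.Series([1.0, 1.5, 2.3, 3.4, 4.9, 6.6])

# Second difference via pandas
d2_pandas = y.diff().diff().dropna()

# The explicit formula y_t - 2*y_{t-1} + y_{t-2}
d2_formula = (y - 2 * y.shift(1) + y.shift(2)).dropna()

# NumPy's n=2 option computes the same thing on a plain array
d2_numpy = np.diff(y.to_numpy(), n=2)
```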