### Forecasting, Time Series and Regression by Bowerman, O'Connell, Koehler

# Overview

Box-Jenkins methodology is a four-step iterative procedure:
1. **Tentative Identification**: Historical data are used to tentatively identify an appropriate Box-Jenkins model.
2. **Estimation**: Historical data are used to estimate the parameters of the tentatively identified model.
3. **Diagnostic checking**: Various diagnostics are used to check the accuracy of the tentatively identified model and, if needed, to suggest an improved model, which is then regarded as a new tentatively identified model.
4. **Forecasting**: Once a final model is obtained, it is used to forecast future time series values.

Classical B-J models describe stationary time series.

# 9.1 Stationary and Nonstationary Time Series

A time series is **stationary** if its statistical properties (e.g., mean and variance) are practically constant over time.

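If a series is nonstationary, taking first differences often produces a stationary series.

**First difference** $= z_t = y_t - y_{t-1}$

If first differences aren't good enough, take second differences (the first differences of the first differences).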

**Second difference** $= z_t = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$

So, since we are manipulating the data, we'll come up with a new name for the output data.

**Working Series** $= z_b, z_{b+1}, z_{b+2}, ..., z_n$

If the original data are nonstationary and nonseasonal, the first or second differences will usually be stationary. Data that are nonstationary and seasonal can be harder to handle and result in more complex equations.
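The differencing steps above can be sketched in a few lines of plain Python. The helper name `difference` and the sample data are made up for illustration; they are not from the book. Note that each round of differencing drops one observation, so the working series starts at $b = 2$ after first differences and at $b = 3$ after second differences.

```python
def difference(y, order=1):
    """Return the order-th difference of the sequence y as a list."""
    z = list(y)
    for _ in range(order):
        # Each pass replaces the series with its consecutive changes,
        # so the result is one observation shorter than the input.
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z


# Hypothetical data with a quadratic trend: y_t = t^2 for t = 1..6.
y = [1, 4, 9, 16, 25, 36]

first = difference(y, order=1)   # y_t - y_{t-1}
second = difference(y, order=2)  # y_t - 2*y_{t-1} + y_{t-2}

print(first)   # [3, 5, 7, 9, 11]
print(second)  # [2, 2, 2, 2]
```

Here the first differences still trend upward, while the second differences are constant, which is the kind of behavior that motivates taking a second difference.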

# 9.2 Sample Autocorrelation and Partial Autocorrelation (SAC and SPAC)

## SAC

This is a listing or graph of the sample autocorrelations at lags k = 1, 2, ...

**SAC at lag k** $$= r_k =  \displaystyle\frac{\sum_{t=b}^{n-k} (z_t - \bar{z})(z_{t+k} - \bar{z})}{\sum_{t=b}^n (z_t - \bar{z})^2}$$

$$\bar{z} =  \displaystyle\frac{\sum_{t=b}^n z_t}{(n-b+1)}$$

This measures the linear relationship between time series observations separated by a lag of k time units. $r_k$ will always be between -1 and 1. A value close to 1 means that observations separated by a lag of k have a strong tendency to move together in a linear fashion with a positive slope. A value close to -1 means the same, but with a negative slope.
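As a rough sketch, the formula for $r_k$ translates directly into Python. The function name `sample_autocorrelation` and the toy series are assumptions for illustration. The list `z` holds the working series $z_b, \dots, z_n$, so its length plays the role of $n - b + 1$.

```python
def sample_autocorrelation(z, k):
    """Sample autocorrelation r_k of the working series z at lag k."""
    m = len(z)            # number of working-series values, n - b + 1
    z_bar = sum(z) / m    # mean of the working series
    # Cross-products of deviations for pairs that are k steps apart.
    numerator = sum((z[t] - z_bar) * (z[t + k] - z_bar) for t in range(m - k))
    # Sum of squared deviations over the whole working series.
    denominator = sum((v - z_bar) ** 2 for v in z)
    return numerator / denominator


# Toy working series (hypothetical values).
z = [2.0, 4.0, 6.0, 8.0, 10.0]

r1 = sample_autocorrelation(z, 1)
r2 = sample_autocorrelation(z, 2)
print(r1, r2)
```

For this toy series the deviations from the mean are -4, -2, 0, 2, 4, so $r_1 = 16/40 = 0.4$ and $r_2 = -4/40 = -0.1$.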

$$t_{r_k} = \displaystyle\frac{r_k}{s_{r_k}}$$

Here $t_{r_k}$ is the t-statistic for $r_k$ and $s_{r_k}$ is the standard error of $r_k$.

**A spike at lag k** exists in the SAC if $r_k$ is statistically large, that is, if the absolute value of $t_{r_k}$ is greater than about two.
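The notes do not write out $s_{r_k}$. A minimal sketch, assuming the commonly used Bartlett approximation $s_{r_k} = \sqrt{\left(1 + 2\sum_{j=1}^{k-1} r_j^2\right) / (n-b+1)}$, shows how the t-statistics and the spike check could be computed. The function names and the threshold of 2 as a default are illustrative choices, not something fixed by the text.

```python
import math


def sac(z, max_lag):
    """Return [r_1, ..., r_max_lag] for the working series z."""
    m = len(z)
    z_bar = sum(z) / m
    denom = sum((v - z_bar) ** 2 for v in z)
    return [
        sum((z[t] - z_bar) * (z[t + k] - z_bar) for t in range(m - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def sac_t_statistics(z, max_lag):
    """Return (r, t) where t[k-1] = r_k / s_{r_k}.

    Assumption: s_{r_k} uses Bartlett's approximation, which depends
    on the autocorrelations at lags shorter than k.
    """
    m = len(z)
    r = sac(z, max_lag)
    t_stats = []
    for k in range(1, max_lag + 1):
        s_rk = math.sqrt((1 + 2 * sum(rj ** 2 for rj in r[:k - 1])) / m)
        t_stats.append(r[k - 1] / s_rk)
    return r, t_stats


def spike_lags(t_stats, threshold=2.0):
    """Lags whose |t| exceeds the rule-of-thumb threshold."""
    return [k for k, t in enumerate(t_stats, start=1) if abs(t) > threshold]


# Hypothetical working series that flips sign every period.
z = [1.0, -1.0] * 10
r, t_stats = sac_t_statistics(z, 3)
print(r[0], t_stats[0], spike_lags(t_stats))
```

For this alternating series $r_1 = -0.95$ and $s_{r_1} = \sqrt{1/20}$, so $t_{r_1}$ is about -4.25 and lag 1 is flagged as a spike.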

The SAC **cuts off after lag k** if there are no more spikes at lags greater than k.

The SAC **dies down** if it just decreases in a steady fashion instead of cutting off. This is kind of up to the viewer. It can die down in three different ways.
1. A damped exponential fashion (no oscillation or with oscillation)
2. A damped sine wave
3. A combination of the two above.

It can also die down *fairly quickly* or *extremely slowly*.

### Using SAC to find a Stationary Nonseasonal Time Series

In general, it can be shown that:
1. If the SAC of the working series (already differenced if desired) either cuts off fairly quickly or dies down fairly quickly, then the working series should be considered stationary. For nonseasonal data, a quick cutoff usually happens after a lag that is <= 2.
2. If the SAC of the working series dies down extremely slowly, the working series should be considered nonstationary.

If the SAC indicates that the working series is nonstationary, take another difference and check the SAC again.
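To see the two cases side by side, here is a small simulated illustration. The random walk is a made-up example (not data from the book) of a series whose levels are nonstationary but whose first differences are stationary. Whether a SAC "dies down extremely slowly" is still a judgment call, so the sketch only prints the values for inspection.

```python
import random


def sac(z, max_lag):
    """Return [r_1, ..., r_max_lag] for the working series z."""
    m = len(z)
    z_bar = sum(z) / m
    denom = sum((v - z_bar) ** 2 for v in z)
    return [
        sum((z[t] - z_bar) * (z[t + k] - z_bar) for t in range(m - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def first_difference(y):
    """z_t = y_t - y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


# Simulated random walk: each value is the previous value plus noise.
random.seed(0)
y = []
level = 0.0
for _ in range(300):
    level += random.gauss(0.0, 1.0)
    y.append(level)

sac_levels = sac(y, 8)
sac_diffs = sac(first_difference(y), 8)

print("levels:     ", [round(r, 2) for r in sac_levels])
print("differences:", [round(r, 2) for r in sac_diffs])
```

The SAC of the levels should start near 1 and shrink only gradually, while the SAC of the first differences should hover near zero from lag 1 onward, which suggests that one difference is enough here.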

## SPAC

The sample partial autocorrelation at lag k is denoted $r_{kk}$.

The formulas for it are not written out here. It can be thought of as the sample autocorrelation of observations separated by a lag of k with the effects of the intervening observations eliminated.
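For reference, one standard way to get $r_{kk}$ from the sample autocorrelations is the Durbin-Levinson recursion: $r_{11} = r_1$, and for $k \ge 2$, $r_{kk} = \frac{r_k - \sum_{j=1}^{k-1} r_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} r_{k-1,j}\, r_j}$ with $r_{kj} = r_{k-1,j} - r_{kk}\, r_{k-1,k-j}$. That this matches the formulas the notes skip is an assumption, and the function name below is illustrative.

```python
def partial_autocorrelations(r):
    """Given r = [r_1, ..., r_K], return [r_11, ..., r_KK].

    Sketch of the Durbin-Levinson recursion; prev[j-1] stores r_{k-1, j}.
    """
    pacf = []
    prev = []
    for k in range(1, len(r) + 1):
        if k == 1:
            r_kk = r[0]
            current = [r_kk]
        else:
            num = r[k - 1] - sum(prev[j - 1] * r[k - j - 1] for j in range(1, k))
            den = 1 - sum(prev[j - 1] * r[j - 1] for j in range(1, k))
            r_kk = num / den
            # Update the intermediate coefficients r_{k, j} for j < k.
            current = [prev[j - 1] - r_kk * prev[k - j - 1] for j in range(1, k)]
            current.append(r_kk)
        pacf.append(r_kk)
        prev = current
    return pacf


# Hypothetical autocorrelations that decay geometrically: r_k = 0.5^k.
pacf = partial_autocorrelations([0.5, 0.25, 0.125])
print(pacf)
```

With autocorrelations that decay like $0.5^k$, the recursion gives $r_{11} = 0.5$ and zero partial autocorrelations at lags 2 and 3: once the lag-1 effect is removed, nothing is left at longer lags.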

# 9.3 Intro to Nonseasonal Modeling and Forecasting

Once the time series has been transformed into stationary time series values, we can use SAC and SPAC to identify a B-J model. Two useful types are:
1. Autoregressive models
2. Moving average models

**Random Shock** $a_t$ is the term that describes the effect of all factors other than $z_{t-1}$ on $z_t$. It's a value that is assumed to have been randomly selected from a normal distribution with a mean of zero and a variance that is the same for every time period (so the shocks are themselves stationary). Also, the random shocks at different points in time are assumed to be statistically independent of each other.

Even though there are many B-J models, each is characterized by its **theoretical autocorrelation function (TAC)** and its **theoretical partial autocorrelation function (TPAC)**:
1. **TAC** of a model is a listing of the theoretical autocorrelations $\rho_1, \rho_2, ...$ of the time series observations described by the model.
2. **TPAC** is a listing of the theoretical partial autocorrelations $\rho_{11}, \rho_{22}, ...$.

**THE SAC AND SPAC ARE ESTIMATES OF THE TAC AND TPAC**
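A quick simulation can make this concrete. The sketch below assumes a first-order autoregressive model $z_t = \phi z_{t-1} + a_t$ with $\phi = 0.6$, which is an illustrative choice not taken from the notes. For that model the theoretical autocorrelations are $\phi^k$, and the SAC computed from simulated data should land close to them.

```python
import random


def sac(z, max_lag):
    """Return [r_1, ..., r_max_lag] for the working series z."""
    m = len(z)
    z_bar = sum(z) / m
    denom = sum((v - z_bar) ** 2 for v in z)
    return [
        sum((z[t] - z_bar) * (z[t + k] - z_bar) for t in range(m - k)) / denom
        for k in range(1, max_lag + 1)
    ]


phi = 0.6  # assumed autoregressive coefficient (illustrative)
random.seed(1)

z = []
prev = 0.0
for _ in range(5000):
    a_t = random.gauss(0.0, 1.0)  # random shock: normal, mean 0, constant variance
    prev = phi * prev + a_t       # z_t = phi * z_{t-1} + a_t
    z.append(prev)

sample = sac(z, 4)
theoretical = [phi ** k for k in range(1, 5)]

for k, (r_k, rho_k) in enumerate(zip(sample, theoretical), start=1):
    print(f"lag {k}: sample r_k = {r_k:.3f}, theoretical = {rho_k:.3f}")
```

The sample values will not match exactly, since they are estimates from one finite realization, but with a long series they should track the theoretical pattern closely.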