# Time Series Analysis with R - Chapter 2

### Cal Johnson
#### Analytics Engineer
#### AO AAE, USCC

## Chapter 2 - Correlation

*Covariance* is a measure of the linear association between two variables. Note that a linear association between two variables *does not imply that one causes the other*.

Exercising some correlation functions in R on a sample of carbon monoxide and benzoapyrene concentrations taken from Herald Square in Manhattan (Colucci and Begeman, 1971):

In [3]:
# Bring in the data and attach it
www <- "Herald.dat"
Herald.dat <- read.table(www, header = TRUE)
attach(Herald.dat)

### Calculating covariance in three ways:

In [4]:
# First way, with n - 1
x <- CO; y <- Benzoa; n <- length(x)
sum((x - mean(x)) * (y - mean(y))/(n - 1))

In [5]:
# Second way, with n
mean((x - mean(x)) * (y - mean(y)))

In [6]:
# Third way, with cov() function
cov(x,y)

Note that the second example mirrors the definition of covariance as an expectation, E[(x - mean of x)(y - mean of y)], with the sample mean standing in for the expectation.

Also, think about the difference between the above methods and their denominators (*n* versus *n-1*). The two results differ only by the factor *(n-1)/n*, so as *n* becomes larger they asymptotically approach the same value. Dividing by *n-1* gives the unbiased estimate, which is what cov() does.
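The effect of the denominator can be checked numerically. The following is a minimal sketch on simulated data rather than the Herald Square sample; the helper names `cov_n` and `cov_n1` and the variables `sim_x`, `sim_y` and `m` are hypothetical and chosen so they do not overwrite `x`, `y` and `n` above.

```r
# Simulated data stand in for a real sample (assumption)
set.seed(1)

# Hypothetical helpers: covariance with denominator n - 1 and with denominator n
cov_n1 <- function(a, b) sum((a - mean(a)) * (b - mean(b))) / (length(a) - 1)
cov_n  <- function(a, b) mean((a - mean(a)) * (b - mean(b)))

for (m in c(10, 100, 10000)) {
  sim_x <- rnorm(m)
  sim_y <- 0.5 * sim_x + rnorm(m)
  ratio <- cov_n(sim_x, sim_y) / cov_n1(sim_x, sim_y)
  # The ratio of the two estimates is (m - 1)/m, which tends to 1 as m grows
  cat("m =", m, " ratio =", ratio, " (m - 1)/m =", (m - 1) / m, "\n")
}
```

For small samples the two estimates differ noticeably; for large samples the choice of denominator barely matters.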

*Correlation* is a dimensionless measure of the linear association between two variables. It is obtained by standardizing the covariance, that is, dividing it by the product of the standard deviations of the two variables.

In [8]:
# The correlation between CO and benzoapyrene measurements at Herald Square
# from definition and from cor() function:
cov(x,y) / (sd(x)*sd(y))

In [11]:
cor(x,y)
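One consequence of the standardization is that correlation does not depend on the units of measurement, while covariance does. A small sketch on simulated data (the variables `u`, `v` and `u_scaled` are hypothetical, not part of the Herald Square sample):

```r
# Simulated data (assumption): v is roughly linear in u
set.seed(2)
u <- rnorm(50, mean = 5, sd = 2)
v <- 3 * u + rnorm(50)

# Rescale u, as a change of units would
u_scaled <- 1000 * u

# Covariance is multiplied by the scale factor
c(cov(u, v), cov(u_scaled, v))

# Correlation is unchanged
c(cor(u, v), cor(u_scaled, v))
```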

### Ensemble and Stationarity

Interestingly enough, the mean function of a time series model is a function of time (go figure), and is described as:

$$\mu(t) = E(x_t)$$

It turns out that the expectation here is the average, at time *t*, across the *ensemble* of all possible time series that could be produced by the model; the ensemble is the entire population. Approximating that average directly therefore means simulating more than one time series from a single time series model.
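To make the ensemble idea concrete, here is a minimal sketch that simulates many realisations from one model and averages across them at each time point. The model (a linear mean function plus white noise) and all names in the block are assumptions for illustration only.

```r
set.seed(3)
n_time   <- 50     # length of each simulated series
n_series <- 2000   # number of realisations in the simulated ensemble

# Hypothetical mean function: 0.1 * t
mu <- 0.1 * (1:n_time)

# Each column is one realisation: mean function plus white noise
ensemble <- replicate(n_series, mu + rnorm(n_time))

# Averaging across realisations (along rows) estimates the mean function at each t
mu_hat <- rowMeans(ensemble)

# Largest gap between the ensemble average and the true mean function
max(abs(mu_hat - mu))
```

With real data there is usually only one realisation, which is why assumptions such as stationarity are needed before averaging over time instead.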

The apparent trend and seasonal effects can be estimated and then removed, using the decompose() function, to leave the time series of the random component. Models with a constant mean can then be fitted to that random component.
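As a quick illustration of that step, the sketch below runs decompose() on `AirPassengers`, a monthly series that ships with base R. It is used here only as a stand-in (an assumption); it is not the data discussed in this chapter.

```r
# AirPassengers comes with the datasets package in base R
dec <- decompose(AirPassengers, type = "additive")

# The random component is what remains after removing trend and seasonal effects.
# It has NA values at both ends because the trend is a centred moving average.
random_part <- dec$random

head(random_part, 12)
mean(random_part, na.rm = TRUE)
```

Calling `plot(dec)` shows the observed series together with the estimated trend, seasonal and random components.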

If the mean function is constant, then we say the model is *stationary* in the mean (*duh!*).

If the model is stationary in the mean and the time average of a single series tends towards the ensemble mean as the series gets longer, then we say the model is *ergodic* in the mean; in other words, the time average is independent of the starting point.
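A rough way to see this behaviour is to simulate one long series from a stationary model and watch its running time average. The AR(1) model, its coefficient and the value of `true_mean` below are assumptions chosen purely for illustration.

```r
set.seed(4)
true_mean <- 10   # hypothetical ensemble mean

# One long realisation of a stationary AR(1) model, shifted to the chosen mean
series <- true_mean + as.numeric(arima.sim(model = list(ar = 0.5), n = 5000))

# Time average of the first t observations, for t = 1, ..., 5000
running_mean <- cumsum(series) / seq_along(series)

# The time average settles near the ensemble mean as the series gets longer
running_mean[c(10, 100, 1000, 5000)]
```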