# *Introduction to Time Series Analysis*

# 1. Definitions and Terms
## 1.1 Definitions

#### Time Series

[[WikipediaTS]](#references):
> A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.


#### Time Series Analysis

[[Nielsen2019]](#references):
> Time series analysis is the endeavor of extracting meaningful summary and statistical information from points arranged in chronological order. It is done to diagnose past behavior **as well as to predict future behavior**.

Some other authors draw a distinction between time series **analysis** and time series **forecasting**. In [[Brownlee2019]](#references):
> We have different goals depending on whether we are interested in understanding a dataset or making predictions. Understanding a dataset, called time series analysis, can help to make better predictions, but is not required and can result in a large technical investment in time and expertise not directly aligned with the desired outcome, which is forecasting the future.

## 1.2 Terms

#### univariate time series

A series in which only one variable is measured against time.

#### multivariate time series

Series with multiple variables measured at each timestamp. They are particularly rich for analysis because often the measured variables are interrelated and show temporal dependencies between one another.

#### temporal

Relating to time

#### lookahead

The term lookahead is used in time series analysis to denote any knowledge of the future. You shouldn’t have such knowledge when designing, training, or evaluating a model.



# 2. Finding and Preparing Time Series Data

## 2.1 Where to Find Sample Time Series Datasets

- [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php) has about 80 time series datasets.
- [UEA & UCR Time Series Classification Repository](http://www.timeseriesclassification.com/) contains hundreds of time series datasets.


## 2.2 Handling Missing Data

Common methods to handle missing data:

- ***forward fill***: carry forward the last known value prior to the missing one.
- ***backward fill***: propagate values backward
- ***imputation***: fill in missing data based on observations about the entire data set or with rolling mean/median.
- ***interpolation***: use neighboring data points to estimate the missing value. Interpolation can also be a form of imputation.
- ***removal***: do not use time periods that have missing data at all, if you can afford to lose some data points.

Be aware that methods that use values from after the gap, such as backward fill, interpolation, and imputation based on the entire data set, introduce [lookahead](#lookahead) bias.
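The methods above can be sketched with pandas as follows. This is a minimal illustration on a made-up six-day series; the variable names and values are assumptions, not data from any of the referenced sources. The last line shows one possible lookahead-free variant of imputation that only uses values seen so far.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with gaps (values are made up for illustration).
idx = pd.date_range("2020-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0], index=idx)

forward = s.ffill()                            # carry the last known value forward
backward = s.bfill()                           # pull the next known value backward (uses the future)
mean_imputed = s.fillna(s.mean())              # mean of the whole series (uses the future)
interpolated = s.interpolate(method="linear")  # straight line between neighbors (uses the future)
dropped = s.dropna()                           # remove the timestamps with missing data

# Lookahead-free imputation: fill each gap with the mean of the values
# observed strictly before it.
past_mean = s.fillna(s.expanding().mean().shift(1))

print(pd.DataFrame({
    "original": s,
    "ffill": forward,
    "bfill": backward,
    "interpolated": interpolated,
    "past_mean": past_mean,
}))
```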

## 2.3 Upsampling and Downsampling

Downsampling is subsetting data such that the timestamps occur at a lower frequency than in the original time series. Upsampling is representing data as if it were collected more frequently than was actually the case. 

Both are straightforward with pandas' [resample()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html).
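A short sketch of both directions with `resample()`, assuming a synthetic hourly series (the data and variable names are illustrative). Downsampling needs an aggregation rule, while upsampling creates timestamps that have no data, so a fill rule has to be chosen.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings over two days (synthetic values 0..47).
idx = pd.date_range("2020-01-01", periods=48, freq="h")
hourly = pd.Series(np.arange(48, dtype=float), index=idx)

# Downsampling: hourly -> daily, aggregating each day with its mean.
daily = hourly.resample("D").mean()

# Upsampling: daily -> every 12 hours. The new timestamps are empty,
# so they are forward filled here, which avoids lookahead.
half_daily = daily.resample("12h").ffill()

print(daily)
print(half_daily)
```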

## 2.4 Smoothing

Smoothing may be required, e.g. to eliminate spikes and/or errors in measurements. It is usually done with a moving average, often a weighted one.
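As an illustration, the sketch below smooths a made-up series containing one spike with three kinds of moving average. The window size, the weights, and the `alpha` value are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

# Made-up measurements with a single spike at position 3.
s = pd.Series([10.0, 10.0, 10.0, 40.0, 10.0, 10.0, 10.0])

# Simple moving average over the current and the two previous points.
simple = s.rolling(window=3).mean()

# Weighted moving average: the most recent point counts three times as
# much as the oldest point in the window.
weights = np.array([1.0, 2.0, 3.0])
weighted = s.rolling(window=3).apply(
    lambda w: np.average(w, weights=weights), raw=True
)

# Exponentially weighted moving average: weights decay geometrically
# into the past.
exponential = s.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({"raw": s, "simple": simple,
                    "weighted": weighted, "exponential": exponential}))
```

All three use only the current and past values, so they do not introduce lookahead.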

## 2.5 Seasonal Data

Plot the data with a line chart to visualize seasonal data.

Seasonality, along with level, trend, and residual, can be separated out using the [tsa.seasonal.seasonal_decompose()](http://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html#statsmodels.tsa.seasonal.seasonal_decompose) function from Python's statsmodels:

![seasonal decompose output](http://www.statsmodels.org/stable/_images/version0-6-1.png "Seasonal decomposition output")


Note on **seasonal vs cyclical**:
> **Seasonal time series** are time series in which behaviors recur over a fixed period. **Cyclical time series** also exhibit recurring behaviors, but they have a variable period. A common example is a business cycle, such as the stock market’s boom and bust cycles, which have an uncertain duration.

## 2.6 Preventing Lookahead

Unfortunately, there isn’t a definitive statistical diagnosis for lookahead. So the solution is to just be vigilant.

# 3. Time Series Exploratory Data Analysis

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

EDA methods used on general datasets also apply to time series datasets. Familiar techniques such as plotting, taking summary statistics, applying histograms, and using targeted scatter plots can be used to answer common questions about the data, such as:
- Which columns are available?
- What are their value ranges?
- Which logical units of measurement work best?
- Are any of the columns strongly correlated with one another?
- What is the overall mean of an interesting variable?
- What is its variance?

And also to answer questions specific to time series data, such as:
- What is the range of values you see, and do they vary by time period or some other logical unit of analysis?
- Does the data look consistent and uniformly measured, or does it suggest changes in either measurement or behavior over time?

This chapter deals with concepts and methods that are specific to analyzing time series data, such as:
- stationarity
- window functions
- self-correlation
- spurious correlations

## 3.1 Stationarity

Generally speaking, a stationary time series is one that has fairly stable statistical properties over time, particularly with respect to mean and variance. Stationarity is important because we need to know how much we should expect the system's long-term past behavior to reflect its long-term future behavior.

There are two types of stationarity:
- Weak stationarity requires only that the mean and variance of a process be time invariant.
- Strong stationarity requires that the distribution of the random variables output by a process remain the same over time.

When we discuss stationarity in this article, we are referring to weak stationarity.

Stationarity matters in practice because a large number of models assume a stationary process, including traditional statistical models with well-understood strengths. In addition, a model of a non-stationary time series will vary in its accuracy as the metrics of the time series vary. For example, if a model is estimating the mean of the time series, then the bias and error in the model will vary over time with a non-stationary time series, at which point the value of the model becomes questionable.

### 3.1.1 Stationarity tests

Statistical tests for stationarity often come down to the question of whether there is a unit root in the process. A linear time series is nonstationary if there is a unit root, although lack of a unit root does not prove stationarity.

The Augmented Dickey-Fuller (ADF) test is the most commonly used test to assess a time series for stationarity problems. Its null hypothesis is that a unit root is present in the time series. Another test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, posits the opposite null hypothesis: that the time series is stationary.

TODO ADF
TODO KPSS

### 3.1.2 Making time series stationary

Often time series can be made stationary enough with a few transformations such as:
- log and square root transformation to "fix" variance
- differencing to remove trend (however if you have to difference more than two or three times to make the time series stationary, probably differencing is not the solution)
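The two transformations can be combined, as in the following sketch. It assumes a synthetic series with exponential growth and multiplicative noise, so that both the level and the spread grow over time; the parameters are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic series: exponential growth with multiplicative noise.
rng = np.random.default_rng(5)
t = np.arange(1, 101)
series = pd.Series(np.exp(0.05 * t + rng.normal(0, 0.1, size=100)))

# The spread of the raw series grows with its level.
first_half_std = series.iloc[:50].std()
second_half_std = series.iloc[50:].std()

# The log transform turns multiplicative growth into a linear trend
# with roughly constant variance.
logged = np.log(series)

# First-order differencing removes the linear trend; what remains
# fluctuates around the growth rate (0.05 in this example).
differenced = logged.diff().dropna()

print(f"std of raw series: {first_half_std:.2f} -> {second_half_std:.2f}")
print(f"mean of differenced log series: {differenced.mean():.3f}")
```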



## 3.2 Making time series normally distributed

Another common assumption that forecasting models make is that the data is normally distributed. The **Box-Cox transformation** makes non-normally distributed (skewed) data more normal. However, just because you can transform your data doesn't mean you should (e.g. the transformation changes the scale of distances between data points).
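A small sketch of the transformation using SciPy's `scipy.stats.boxcox`, applied to synthetic right-skewed (log-normal) data. The data is an assumption for illustration; note that Box-Cox requires strictly positive values.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

# Synthetic, strictly positive, right-skewed data.
rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# With no lambda given, boxcox() estimates it by maximum likelihood
# and returns the transformed data together with the estimate.
transformed, lam = stats.boxcox(skewed)

skew_before = stats.skew(skewed)
skew_after = stats.skew(transformed)
print(f"lambda={lam:.3f}, skewness {skew_before:.2f} -> {skew_after:.2f}")

# Forecasts made on the transformed scale must be mapped back.
restored = inv_boxcox(transformed, lam)
```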



## 3.3 Window Functions

Window functions are a common family of functions that are distinctive to time series. The moving average is probably the most popular window function.
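In pandas, window functions are exposed through `rolling()` (a fixed-size window) and `expanding()` (a window that grows from the start of the series). The sketch below uses made-up values and an arbitrary window size of 3.

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])

# Rolling window: statistic over the current and the two previous points.
rolling_max = s.rolling(window=3).max()

# Custom rolling function: range (max - min) within each window.
rolling_range = s.rolling(window=3).apply(lambda w: w.max() - w.min(), raw=True)

# Expanding window: statistic over everything seen so far.
expanding_mean = s.expanding().mean()

print(pd.DataFrame({"value": s, "rolling_max": rolling_max,
                    "rolling_range": rolling_range,
                    "expanding_mean": expanding_mean}))
```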

## 3.4 Self-correlation and Autocorrelation

The idea of self-correlation is that a value in a time series at one given point in time may be correlated with the value at another point in time. Note that "self-correlation" is being used here informally to describe a general idea rather than a technical one.

Autocorrelation generalizes self-correlation by not anchoring to a specific point in time (?). From Wikipedia:

> Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of the delay. Informally, it is the similarity between observations as a function of the time lag between them.

In other words, autocorrelation is how data points at different points in time are linearly related to one another as a function of their time difference.
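As a concrete illustration, the autocorrelation at a single lag is the Pearson correlation between the series and a copy of itself shifted by that lag. The sketch below assumes a noise-free sine wave with a period of 20 samples, chosen so the result is easy to predict.

```python
import numpy as np
import pandas as pd

# Synthetic sine wave with a period of 20 samples.
t = np.arange(200)
s = pd.Series(np.sin(2 * np.pi * t / 20))

# pandas computes the correlation between s and s shifted by `lag`.
lag_10 = s.autocorr(lag=10)   # half a period apart: values have opposite signs
lag_20 = s.autocorr(lag=20)   # one full period apart: values coincide

# The same quantity computed by hand for lag 20.
manual_lag_20 = np.corrcoef(s.values[:-20], s.values[20:])[0, 1]

print(f"lag 10: {lag_10:.3f}, lag 20: {lag_20:.3f}, manual lag 20: {manual_lag_20:.3f}")
```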

### 3.4.1 The Autocorrelation Function (ACF)

The autocorrelation function (ACF) can be intuitively understood with plotting.

TODO ACF.

There are a few important facts about the ACF:
- The ACF of a periodic function has the same periodicity as the original process.
- The autocorrelation of the sum of periodic functions is the sum of the autocorrelations of each function separately.
- All time series have an autocorrelation of 1 at lag 0.
- The autocorrelation of a sample of white noise will have a value of approximately 0 at all lags other than 0.
- The ACF is symmetric with respect to negative and positive lags, so only positive lags need to be considered explicitly.
- A statistical rule for determining a significant nonzero ACF estimate is given by a "critical region" with bounds at +/- 1.96 / sqrt(n), where n is the number of observations. This rule relies on a sufficiently large sample size and a finite variance for the process.
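Some of these facts can be checked numerically. The sketch below implements a sample ACF directly in NumPy (the helper `sample_acf` is written for this illustration and is not a library function) and applies it to synthetic white noise, using the usual large-sample bounds of +/- 1.96 / sqrt(n).

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(nlags + 1)])


rng = np.random.default_rng(7)
n = 1000
white_noise = rng.normal(size=n)

acf_values = sample_acf(white_noise, nlags=20)

# Approximate 95% bounds for the ACF of white noise.
bound = 1.96 / np.sqrt(n)
significant_lags = [lag for lag in range(1, 21) if abs(acf_values[lag]) > bound]

print(f"ACF at lag 0: {acf_values[0]:.1f}")
print(f"bound: +/- {bound:.3f}")
print(f"lags outside the bounds: {significant_lags}")
```

For white noise, roughly 1 in 20 lags is expected to fall outside the bounds by chance alone, so an occasional "significant" lag is not evidence of structure.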

### 3.4.2 The Partial Autocorrelation Function (PACF)

Analyzing the same time series above with PACF:

TODO PACF.

As we can see the PACF removes the correlation with "harmonics" of significantly correlated time lag.

### 3.4.3 Sample ACF and PACF of Various Time Series

#### White Noise

TODO

#### Random Walk

TODO

#### Trend

TODO

## 3.5 Spurious correlations

A spurious correlation occurs when two time series show a significant but nonsensical correlation, for example the correlation between the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in.

There are a few factors that contribute to producing spurious correlations:
- Trend is a big factor.
- Seasonality -- for example, the spurious correlation between hot dog consumption and death by drowning (summer).
- Level or slope shifts in data from regime changes over time (producing a dumbbell-like distribution with meaninglessly high correlation).
- Cumulatively summed quantities.
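The effect of trend can be demonstrated with two synthetic series that are generated independently but both drift upward. In this sketch (slopes and noise levels are arbitrary assumptions), the levels are almost perfectly correlated, while the correlation largely vanishes once the trend is removed by differencing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
t = np.arange(n)

# Two unrelated series that share nothing except an upward trend.
x = pd.Series(0.5 * t + rng.normal(0, 5, size=n))
y = pd.Series(0.3 * t + rng.normal(0, 5, size=n))

# Correlation of the raw levels is dominated by the common trend.
level_corr = x.corr(y)

# Correlation of the period-to-period changes, with the trend removed.
diff_corr = x.diff().corr(y.diff())

print(f"correlation of levels:      {level_corr:.3f}")
print(f"correlation of differences: {diff_corr:.3f}")
```

Comparing the correlation of differenced series with that of the levels is a simple sanity check before reading meaning into a high correlation.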

### 3.5.1 Cointegration

A related concept is called *cointegration*, which refers to a real relationship between two time series. A commonly used example is a drunk pedestrian and their dog. Their individually measured walks might appear random taken alone, but they never stray too far from each other.

There will be high correlations in the case of cointegration. The difficulty will be in assessing whether the two processes are cointegrated or whether they have spurious correlations, because in both cases they have high correlations. The important difference is that there need not be any relationship in the case of a spurious correlation, whereas cointegrated time series are strongly related to one another.

## References

- [Brownlee2019]: *Introduction to Time Series Forecasting with Python*, Jason Brownlee, v1.7, 2019
- [Nielsen2019]: *Practical Time Series Analysis*, Aileen Nielsen, ISBN: 9781492041658, Oct 2019
- [WikipediaTS]: *Time Series*, https://en.wikipedia.org/wiki/Time_series
- [WikipediaVAR]: *Vector autoregression*, https://en.wikipedia.org/wiki/Vector_autoregression
