# Modeling Time Series
by: Esteban Martinez Roldan

A time series is a sequence of observed data, measured or recorded at regular intervals over time. These data are organized in a chronological sequence and are used to analyze patterns, trends, and variations that may occur as a function of time. Time series are applied in various fields such as economics, finance, meteorology, epidemiology, and engineering, among others, to study behavior and forecast future values based on historical data.

Time series are represented as follows:

$X_{t}, X_{t-1}, X_{t-2}, \ldots, X_{t-n}$: the current value (time $t$) and its lags (times $t-i$)

$X_{t+1}, X_{t+2}, \ldots, X_{t+n}$: the leads, or future values of the series (times $t+i$)
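As a small illustration of this notation, the sketch below builds lagged and lead versions of a series stored in a plain Python list. The helper name `shift` and the sample values are assumptions made for this example only.

```python
def shift(series, k):
    """Shift a list by k steps.

    k > 0 gives the lag  X_{t-k}; k < 0 gives the lead X_{t+|k|}.
    Positions with no available value are filled with None.
    """
    n = len(series)
    if k >= 0:
        return [None] * min(k, n) + series[: max(n - k, 0)]
    return series[-k:] + [None] * min(-k, n)


x = [10, 12, 15, 11, 14]   # hypothetical observations X_0 .. X_4
lag_1 = shift(x, 1)        # at position t this holds X_{t-1}
lead_2 = shift(x, -2)      # at position t this holds X_{t+2}
print(lag_1)               # [None, 10, 12, 15, 11]
print(lead_2)              # [15, 11, 14, None, None]
```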



Time series can be classified according to their characteristics into:

- **Deterministic**: These series follow a predictable pattern or can be modeled exactly using a mathematical function or a specific set of rules. In this context, "deterministic" implies that there is no randomness in the time series and that each future value can be precisely determined from previous values and the rules of the model.

- **Stochastic or Random**: These are series in which at least part of the behavior or future values cannot be predicted exactly, even if all past data and model rules are known. In other words, there is an element of randomness or uncertainty that influences the future values of the time series.
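To make the distinction concrete, here is a minimal sketch that generates one series of each kind. The sine-wave rule, the noise level, and the function names are illustrative assumptions, not part of any standard library for time series.

```python
import math
import random


def deterministic_series(n, period=12):
    """Sine wave: every value is fixed by the rule, so repeated runs are identical."""
    return [math.sin(2 * math.pi * t / period) for t in range(n)]


def stochastic_series(n, rng, noise_sd=0.5, period=12):
    """Same sine wave plus Gaussian noise: the rule no longer fixes each value exactly."""
    return [
        math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]


det_a = deterministic_series(24)
det_b = deterministic_series(24)
sto_a = stochastic_series(24, random.Random(1))   # two different random draws
sto_b = stochastic_series(24, random.Random(2))

print("deterministic runs identical:", det_a == det_b)
print("stochastic runs identical:   ", sto_a == sto_b)
```

The deterministic series can be reproduced exactly from its rule, while the two stochastic realizations share the same underlying pattern but differ point by point.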


Let's start with the basics of modeling time series: stationary series, random walks, and the Dickey-Fuller stationarity test. If these terms seem intimidating, don't worry; the goal is to build the interpretation and intuition behind them in a very simple way. Although this text does not cover them, a complete workflow also requires handling null values, imputing missing values while respecting the data distribution, and, finally, choosing the best model for making forecasts.


## Stationary Series
There are three basic criteria for classifying a series as stationary:

**1.** The mean of the series should not be a function of time, but rather a constant. In the image below, the graph on the left satisfies this condition, while the red graph has a time-dependent mean.

![Estacionario_media.jpg](Estacionario_media.jpg)


**2.** The variance of the series should not be a function of time. This property is known as homoscedasticity. The following graph shows a series that satisfies this condition and one that does not (note how the spread of the series changes over time in the graph on the right).


![Estacionario_varianza.jpg](Estacionario_varianza.jpg)

**3.** The covariance between the $i$-th term and the $(i + m)$-th term should depend only on the lag $m$, not on the time $i$. In the following graph, you will notice that the spread narrows as time increases. Therefore, the covariance is not constant over time for the "red series."

![Estacionario_covariana.jpg](Estacionario_covariana.jpg)
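One rough way to check the first two criteria numerically is to split a series into consecutive windows and compare the mean and variance of each window. The sketch below does this for three simulated series; the function name `window_stats`, the window count, and the simulated data are assumptions for illustration, and this informal check does not replace a formal test.

```python
import random
import statistics


def window_stats(series, n_windows=4):
    """Split the series into consecutive windows; return (mean, variance) per window."""
    size = len(series) // n_windows
    stats = []
    for w in range(n_windows):
        chunk = series[w * size:(w + 1) * size]
        stats.append((statistics.mean(chunk), statistics.pvariance(chunk)))
    return stats


rng = random.Random(0)
n = 2000
white_noise = [rng.gauss(0, 1) for _ in range(n)]            # constant mean and variance
trending = [0.01 * t + rng.gauss(0, 1) for t in range(n)]    # mean grows with time
widening = [rng.gauss(0, 1 + 0.005 * t) for t in range(n)]   # variance grows with time

for name, series in [("white noise", white_noise),
                     ("trending", trending),
                     ("widening", widening)]:
    summary = [(round(m, 2), round(v, 2)) for m, v in window_stats(series)]
    print(name, summary)
```

For the white-noise series the window means and variances stay roughly the same, while the other two series show a drifting mean and a growing variance, respectively.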

### Why Does the "Stationarity" of a Time Series Matter?
The stationarity of a time series is a fundamental characteristic that significantly affects how it is analyzed, modeled, and forecasted. The main reasons are:

- **Facilitates Modeling:** A stationary time series has statistical properties, such as mean and variance, that do not change over time. This simplifies the mathematical and statistical modeling process because methods and techniques developed for stationary series are more effective and reliable. For example, ARMA (Autoregressive Moving Average) models assume a stationary series, and ARIMA (Autoregressive Integrated Moving Average) models difference the series first so that the part being modeled is stationary.

- **More Accurate Predictions:** When a time series is stationary, the relationships between past and future data are more stable and predictable. This allows for more accurate predictions of the series' future values. Models fitted to stationary series tend to yield better prediction accuracy than the same models applied directly to non-stationary series.

- **Clearer Interpretation of Trends and Patterns:** Transforming a series to make it stationary removes the influence of trends, allowing for a clearer interpretation of the underlying patterns in the data. This is crucial for better understanding the dynamics of the phenomenon being studied and for making informed decisions based on evidence.

- **Validation of Assumptions and Models:** Many statistical and econometric models assume that the time series is stationary or can be transformed into a stationary series using appropriate techniques. Validating stationarity helps ensure that the underlying assumptions in the model are valid, which increases confidence in the conclusions derived from the analysis.

Stationarity is important because it directly affects the effectiveness of statistical models and the accuracy of predictions in time series analysis. By identifying whether a series is stationary, analysts can choose the most appropriate tools and techniques to perform robust analyses and obtain reliable results.

When the stationarity criteria are violated, the first step is to transform the time series to make it stationary and then fit stochastic models to predict it. There are multiple ways to achieve stationarity, including detrending, differencing, and decomposition, among others.
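Two of these transformations, differencing and linear detrending, can be sketched in a few lines. The helper names `difference` and `detrend_linear` and the simulated trending series are assumptions for this example; the detrending step fits a straight line by ordinary least squares, which is only one of several possible trend models.

```python
import random
import statistics


def difference(series, lag=1):
    """Return X_t - X_{t-lag}; the result is `lag` points shorter than the input."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def detrend_linear(series):
    """Subtract a least-squares straight line fitted against the time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = statistics.mean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
    slope = sxy / sxx
    intercept = x_mean - slope * t_mean
    return [x - (intercept + slope * t) for t, x in enumerate(series)]


rng = random.Random(42)
trending = [5.0 + 0.3 * t + rng.gauss(0, 1) for t in range(500)]  # linear trend plus noise

diffed = difference(trending)        # fluctuates around the slope (about 0.3)
detrended = detrend_linear(trending) # fluctuates around zero

print("mean of differenced series:", round(statistics.mean(diffed), 3))
print("mean of detrended series:  ", round(statistics.mean(detrended), 6))
```

Differencing is the usual choice when the trend is stochastic (as in a random walk), while subtracting a fitted line suits a deterministic trend.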


## Random Walk
The random walk is one of the most basic concepts in time series. You might have heard the term somewhere, as it is frequently mentioned. However, I have found that many people in the industry mistakenly interpret a random walk as a stationary process. In this section, with the help of some mathematics, we will explain why it is not.

Example: Imagine a girl moving randomly on a giant chessboard. In this case, the girl's next position depends only on her last position. 


![caminata_alearoria.jpg](caminata_alearoria.jpg)


Now imagine you are sitting in another room and cannot see the girl. You want to predict the girl's position over time. How accurate will you be? Of course, your prediction becomes increasingly inaccurate as time passes.

At $t = 0$ you know exactly where the girl is. At the next step she can move to any of 8 squares, so the probability of guessing her position correctly drops from $1$ to $1/8$, and it continues to decrease with every step. Now let's try to formulate this series:



$X_{t} = X_{t-1} + \epsilon_{t}$
                                                                                        
where $\epsilon_{t}$ is the error at time $t$; it represents the randomness that the girl introduces at each step.

Now, if we substitute recursively for each previous value of $X$, we eventually end up with the following equation:


$X_{t} = X_{0} + \sum_{i=1}^t\epsilon_{i}$
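The equivalence between the recursive form and the cumulative-sum form can be checked with a short simulation. This is a minimal sketch that assumes Gaussian errors and a starting point of $X_0 = 0$; the function name `random_walk` is hypothetical.

```python
import random
from itertools import accumulate


def random_walk(n_steps, rng, x0=0.0, sigma=1.0):
    """Simulate X_t = X_{t-1} + eps_t.

    Returns the path [X_0, ..., X_n] and the shocks [eps_1, ..., eps_n].
    """
    path, shocks = [x0], []
    for _ in range(n_steps):
        eps = rng.gauss(0.0, sigma)
        shocks.append(eps)
        path.append(path[-1] + eps)
    return path, shocks


path, shocks = random_walk(100, random.Random(7))

# Closed form: X_t = X_0 + sum of the first t shocks
closed_form = [path[0]] + [path[0] + s for s in accumulate(shocks)]
max_gap = max(abs(a - b) for a, b in zip(path, closed_form))
print("largest difference between the two forms:", max_gap)
```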
                                                       

Now, let's try to validate our assumptions of stationary series in this random walk formulation:

### Is the mean constant?

We assume that the errors are purely random with zero mean, so the expectation of each $\epsilon_{i}$ is zero. Therefore, we have:

$E[X_{t}] = E[X_{0}] + \sum_{i=1}^t E[\epsilon_{i}] = E[X_{0}] = \text{Constant}$.

### Is the variance constant?

Assuming the errors are independent with a common variance $\sigma_{\epsilon}^2$:

$Var[X_{t}] = Var[X_{0}] + \sum_{i=1}^t Var[\epsilon_i]$

If the starting point $X_{0}$ is known, then $Var[X_{0}] = 0$ and:

$Var[X_{t}] = t \cdot \sigma_{\epsilon}^2 = \text{Time dependent}$.

Therefore, we infer that the random walk is not a stationary process, as it exhibits time-varying variance. Additionally, if we check the covariance, we see that it also depends on time.
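The growth of the variance can also be seen empirically by simulating many random walks and measuring the spread of their positions at a few horizons. The sketch below assumes Gaussian errors with $\sigma_{\epsilon} = 1$ and a known start at $X_0 = 0$; the function name, number of paths, and horizons are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_endpoints(n_paths, horizons, rng, sigma=1.0):
    """Simulate n_paths random walks from X_0 = 0 and record X_t at each horizon t."""
    max_t = max(horizons)
    wanted = set(horizons)
    samples = {t: [] for t in horizons}
    for _ in range(n_paths):
        x = 0.0
        for t in range(1, max_t + 1):
            x += rng.gauss(0.0, sigma)
            if t in wanted:
                samples[t].append(x)
    return samples


rng = random.Random(123)
samples = simulate_endpoints(n_paths=5000, horizons=[10, 50, 100], rng=rng)

for t, xs in samples.items():
    # The sample mean should stay near 0 and the sample variance near t * sigma^2
    print(t, round(statistics.mean(xs), 2), round(statistics.pvariance(xs), 1))
```

Across paths, the mean stays close to zero at every horizon while the variance grows roughly in proportion to $t$.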


### What makes time series special?

As the name suggests, a time series (TS) is a collection of data points collected at constant time intervals. These are analyzed to determine long-term trends in order to forecast the future or perform some other form of analysis. But how does a TS differ from a regular regression problem? There are two things:

- **It is time-dependent**: The basic assumption of a linear regression model that observations are independent does not hold in this case.

- **It typically exhibits trends or seasonality**: Most TS exhibit some form of increasing or decreasing trend, or seasonality, meaning specific variations within a particular timeframe. For example, if we observe the sales of a wool jacket over time, we will invariably find higher sales during the winter seasons.
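The time dependence can be measured with the sample autocorrelation, that is, the correlation between the series and a lagged copy of itself. The sketch below compares independent noise with a simulated series in which each value depends on the previous one. The function name `autocorrelation` and the coefficient 0.8 are assumptions made for this example.

```python
import random
import statistics


def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag (the usual biased estimator)."""
    n = len(series)
    mean = statistics.mean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


rng = random.Random(3)

# Independent observations, as a linear regression would assume
independent = [rng.gauss(0, 1) for _ in range(1000)]

# Time-dependent observations: each value carries over 80% of the previous one
dependent = [0.0]
for _ in range(999):
    dependent.append(0.8 * dependent[-1] + rng.gauss(0, 1))

print("lag-1 autocorrelation, independent:", round(autocorrelation(independent), 3))
print("lag-1 autocorrelation, dependent:  ", round(autocorrelation(dependent), 3))
```

A value near zero is consistent with independent observations, while a value far from zero shows that past values carry information about the next one.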


## Dickey-Fuller Stationarity Test

The Dickey-Fuller test is a statistical test used to determine whether a time series is stationary or non-stationary. It is based on a regression model of the time series $X_{t}$ on its own past. The model is written in terms of the difference between consecutive values of the time series, $\Delta X_{t}=X_{t}-X_{t-1}$, so that non-stationarity can be tested through a single coefficient.

The hypothesis test is formulated as follows:


- $H_{0}:\ \delta = 0$. The time series has a unit root, indicating it is non-stationary (i.e., it has a non-stationary structure or trend).

- $H_{1}:\ \delta < 0$. The time series does not have a unit root and is stationary in the weak sense.


The Dickey-Fuller test assumes that the series behaves like an $AR(1)$ model (where the model uses only one past value to predict future values). This can be represented mathematically as follows:


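$X_{t} = u + \phi_{1} X_{t-1} + e_{t}$

Subtracting $X_{t-1}$ from both sides of the equation yields:

$\Delta X_{t} = u + (\phi_{1} - 1) X_{t-1} + e_{t} = u + \delta X_{t-1} + e_{t}$

where $\delta = \phi_{1} - 1$. A unit root ($\phi_{1} = 1$) therefore corresponds to $\delta = 0$, and a stationary series ($\phi_{1} < 1$) corresponds to $\delta < 0$.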


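Therefore, the test statistic can be formulated as follows:

$t_{\hat{\delta}} = \frac{ \hat{\delta}}{SE(\hat{\delta})}$

Under the null hypothesis this statistic does not follow the usual Student's t distribution, so it is compared against the critical values tabulated for the Dickey-Fuller distribution.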
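The statistic can be computed by hand with an ordinary least-squares fit of $\Delta X_{t}$ on $X_{t-1}$. The following is a minimal sketch of the standard (non-augmented) test with a constant, written with the standard library only. The function name `dickey_fuller_stat` and the simulated series are assumptions for this example, and the 5% critical value of about $-2.86$ is the large-sample value for the case with a constant and no trend; in practice, a tested library implementation is preferable.

```python
import math
import random
import statistics


def dickey_fuller_stat(series):
    """t-statistic of delta_hat in the OLS regression  dX_t = u + delta * X_{t-1} + e_t."""
    x = series[:-1]                                   # regressor: X_{t-1}
    y = [b - a for a, b in zip(series, series[1:])]   # response:  dX_t
    n = len(y)
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    delta_hat = sxy / sxx
    u_hat = y_mean - delta_hat * x_mean
    ssr = sum((yi - u_hat - delta_hat * xi) ** 2 for xi, yi in zip(x, y))
    se_delta = math.sqrt(ssr / (n - 2) / sxx)         # standard error of delta_hat
    return delta_hat / se_delta


CRITICAL_5PCT = -2.86  # assumed: asymptotic 5% value, constant and no trend

rng = random.Random(11)

walk = [0.0]                       # random walk: phi_1 = 1, so delta = 0
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0, 1))

ar1 = [0.0]                        # stationary AR(1): phi_1 = 0.5, so delta = -0.5
for _ in range(500):
    ar1.append(0.5 * ar1[-1] + rng.gauss(0, 1))

stat_walk = dickey_fuller_stat(walk)
stat_ar1 = dickey_fuller_stat(ar1)
for name, stat in [("random walk", stat_walk), ("AR(1), phi=0.5", stat_ar1)]:
    print(name, round(stat, 2), "reject unit root:", stat < CRITICAL_5PCT)
```

The null hypothesis of a unit root is rejected when the statistic falls below the critical value, that is, when it is sufficiently negative.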


The test was developed by David Dickey and Wayne Fuller in 1979. There are two versions of the test: a standard version and an augmented version (ADF), which is more commonly used. The main difference is that the augmented version adds lagged differences of the series to the regression, so it can handle autoregressive processes of order higher than one. The purpose of the test is to reject or fail to reject the presence of a unit root, which helps decide whether to start the modeling process directly or to first apply transformations to achieve stationarity in the series.

## References

* https://en.wikipedia.org/wiki/Dickey%E2%80%93Fuller_test 

* https://www.analyticsvidhya.com/blog/2021/06/statistical-tests-to-check-stationarity-in-time-series-part-1/

* https://medium.com/@ritusantra/tests-for-stationarity-in-time-series-dickey-fuller-test-augmented-dickey-fuller-adf-test-d2e92e214360