# Lesson 3: Stationarity and Differencing

In this lesson, we discuss the concept of stationarity in time series data, why it is important, and how differencing can be used to achieve stationarity. We will also see how to test for stationarity using the Augmented Dickey-Fuller (ADF) test.

## What is Stationarity?

A time series is **stationary** if its statistical properties, such as its mean, variance, and autocorrelation structure, do not change over time. Stationarity is a key assumption for many time series models: when these properties are stable, relationships estimated from past data can reasonably be expected to hold in the future, which simplifies both modeling and forecasting.
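To make the definition concrete, the sketch below simulates many independent realizations of two processes and compares the variance across realizations at an early and a late time point. The sizes (`n_paths`, `n_steps`), the helper `variance_at`, and the chosen time indices are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: many realizations, so the variance at a fixed
# time point can be estimated across realizations.
n_paths, n_steps = 2000, 200
noise = rng.normal(loc=0.0, scale=1.0, size=(n_paths, n_steps))

# White noise: every observation is an independent draw.
white_noise = noise

# Cumulative sum of the same draws: each shock is carried forward.
cumulative = np.cumsum(noise, axis=1)


def variance_at(paths, t):
    """Sample variance across realizations at time index t."""
    return paths[:, t].var(ddof=1)


early, late = 49, 199  # the 50th and 200th observations

wn_var_early = variance_at(white_noise, early)
wn_var_late = variance_at(white_noise, late)
cs_var_early = variance_at(cumulative, early)
cs_var_late = variance_at(cumulative, late)

print(f"White noise variance:    early={wn_var_early:.2f}, late={wn_var_late:.2f}")
print(f"Cumulative sum variance: early={cs_var_early:.2f}, late={cs_var_late:.2f}")
```

In this setup the white-noise variance should stay close to 1 at both time points, while the variance of the cumulative sum keeps growing, which is the kind of time-dependent behavior that stationarity rules out.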

## Testing for Stationarity

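One common method to test for stationarity is the **Augmented Dickey-Fuller (ADF) test**. The null hypothesis of the ADF test is that the series has a unit root (i.e., it is non-stationary). A p-value below the chosen significance level (commonly 0.05) lets us reject the null hypothesis, which is evidence that the series is stationary. A larger p-value means we fail to reject the unit root; it does not prove that the series is non-stationary.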

## Creating a Non-Stationary Series

For demonstration purposes, we will create a synthetic non-stationary time series using a random walk process.
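
To see why a random walk is non-stationary, it helps to write it out. The following is a sketch under the assumption that the steps $\varepsilon_t$ are independent with mean zero and variance $\sigma^2$, and that the walk starts at $y_0 = 0$; these symbols are notation introduced here for illustration.

$$
y_t = y_{t-1} + \varepsilon_t = \sum_{i=1}^{t} \varepsilon_i,
\qquad
\mathbb{E}[y_t] = 0,
\qquad
\operatorname{Var}(y_t) = t\,\sigma^2
$$

The mean is constant, but the variance grows linearly with $t$, so the statistical properties of the series depend on time. Under the same assumptions, the first difference $y_t - y_{t-1} = \varepsilon_t$ has constant mean and variance.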

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

%matplotlib inline

# Set seed for reproducibility
np.random.seed(42)

# Generate a random walk (non-stationary series)
n = 100
steps = np.random.normal(loc=0, scale=1, size=n)
random_walk = np.cumsum(steps)  # cumulative sum to simulate a random walk

# Create a date range
dates = pd.date_range(start='2020-01-01', periods=n, freq='D')

# Build the DataFrame
df = pd.DataFrame({'Date': dates, 'Value': random_walk}).set_index('Date')

# Plot the non-stationary series
plt.figure(figsize=(12,6))
plt.plot(df['Value'], label='Random Walk')
plt.title('Non-Stationary Time Series (Random Walk)')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

## ADF Test on the Original Series

Now, let's apply the Augmented Dickey-Fuller test to our random walk to check for stationarity.

In [None]:
# Perform the ADF test
result = adfuller(df['Value'])
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])

if result[1] < 0.05:
    print('The series is likely stationary.')
else:
    print('The series is likely non-stationary.')

## Differencing to Achieve Stationarity

Differencing is a common technique used to remove trends and stabilize the mean of a time series. The **first-order difference** is calculated as the difference between consecutive observations. Let's compute and plot the first-order differenced series.

In [None]:
# Compute the first-order difference
df['Value_diff'] = df['Value'].diff()

# Plot the differenced series
plt.figure(figsize=(12,6))
plt.plot(df['Value_diff'], label='First-Order Differenced Series', color='orange')
plt.title('Differenced Time Series')
plt.xlabel('Date')
plt.ylabel('Difference in Value')
plt.legend()
plt.show()
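
For a random walk, first-order differencing simply recovers the underlying steps. The minimal sketch below checks this on a short series and also shows that `diff()` matches subtracting a shifted copy of the series. The variable names and the series length are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A short random walk built from known steps (hypothetical length).
steps = rng.normal(size=10)
walk = pd.Series(np.cumsum(steps))

# Two equivalent ways to compute the first-order difference.
diff_builtin = walk.diff()
diff_manual = walk - walk.shift(1)

# The first entry is NaN because it has no previous observation.
print("First differenced value:", diff_builtin.iloc[0])

# After dropping that NaN, the differences match the original steps
# from the second step onward (up to floating-point error).
recovered = diff_builtin.dropna().to_numpy()
print("Recovers the steps:", np.allclose(recovered, steps[1:]))
```

This is why the missing value has to be dropped before the differenced series is passed to a test or a model.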

## ADF Test on the Differenced Series

After differencing, we reapply the ADF test to check if the series has become stationary.

In [None]:
# Remove missing values created by differencing
df_diff = df['Value_diff'].dropna()

# Perform the ADF test on the differenced series
result_diff = adfuller(df_diff)
print('ADF Statistic (Differenced): %f' % result_diff[0])
print('p-value (Differenced): %f' % result_diff[1])

if result_diff[1] < 0.05:
    print('The differenced series is likely stationary.')
else:
    print('The differenced series is likely non-stationary.')

## Conclusion

In this lesson, we:

- Discussed the concept of stationarity and its importance in time series analysis.
- Created a synthetic non-stationary series (a random walk).
- Applied the Augmented Dickey-Fuller test to assess stationarity.
- Used differencing to transform a non-stationary series into a stationary one.

For models that assume stationarity, checking for it, and differencing when needed, is an important step before fitting a model and forecasting.