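This script tests the stationarity of your time series data with the Augmented Dickey-Fuller (ADF) test. The code reads in one of the three CSV files in the DATA folder and runs the ADF test on the original series. It then applies first-order differencing and runs the test again on the differenced series. The null hypothesis of the ADF test is that the series has a unit root (is non-stationary), so a p-value below 0.05 rejects that hypothesis and the series is treated as stationary. To see whether each series is stationary, read the "Interpretation" lines printed by the third code cell.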
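For reference, the regression behind the ADF test can be written as follows. This is a sketch assuming the default settings of statsmodels' `adfuller` (`regression="c"`, i.e. a constant but no trend, and `autolag="AIC"`, which chooses the number of lagged differences $p$ by AIC):

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad \Delta y_t = y_t - y_{t-1}

H_0:\ \gamma = 0 \quad (\text{unit root, non-stationary})
\qquad
H_1:\ \gamma < 0 \quad (\text{stationary})

\text{ADF statistic} = \hat{\gamma} \,/\, \operatorname{SE}(\hat{\gamma})
```

The statistic does not follow an ordinary t-distribution under $H_0$, which is why statsmodels reports MacKinnon's approximate p-value; more negative statistics are stronger evidence against the unit root.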

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

In [2]:
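# read in the data
# (this absolute path is specific to the author's machine; point it at the DATA folder in your copy of the repository)
cpu_util_df = pd.read_csv("/Users/christinetsai/Desktop/ds4002/02-time-series-project/ds4002-project2/DATA/ec2_cpu_utilization_53ea38.csv")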
cpu_util_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4032 entries, 0 to 4031
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   timestamp  4032 non-null   object 
 1   value      4032 non-null   float64
dtypes: float64(1), object(1)
memory usage: 63.1+ KB


In [3]:
# Perform the Augmented Dickey-Fuller test on the original series
# H0: the series has a unit root (non-stationary); H1: the series is stationary
result_original = adfuller(cpu_util_df["value"])

# result[0] is the ADF test statistic, result[1] is its p-value
print(f"ADF Statistic (Original): {result_original[0]:.4f}")
print(f"p-value (Original): {result_original[1]:.4f}")

# Reject H0 at the 5% significance level when p < 0.05
if result_original[1] < 0.05:
    print("Interpretation: The original series is Stationary.\n")
else:
    print("Interpretation: The original series is Non-Stationary.\n")

# Apply first-order differencing: val_diff[t] = value[t] - value[t-1] (the first row becomes NaN)
cpu_util_df["val_diff"] = cpu_util_df["value"].diff()

# Perform the Augmented Dickey-Fuller test on the differenced series, dropping the leading NaN
result_diff = adfuller(cpu_util_df["val_diff"].dropna())

print(f"ADF Statistic (Differenced): {result_diff[0]:.4f}")
print(f"p-value (Differenced): {result_diff[1]:.4f}")

if result_diff[1] < 0.05:
    print("Interpretation: The differenced series is Stationary.")
else:
    print("Interpretation: The differenced series is Non-Stationary.")

ADF Statistic (Original): -9.8419
p-value (Original): 0.0000
Interpretation: The original series is Stationary.

ADF Statistic (Differenced): -17.2580
p-value (Differenced): 0.0000
Interpretation: The differenced series is Stationary.
