Time Series Analysis
A time series is a sequence of data points X = {x_1, x_2, ..., x_n} measured at fixed time intervals. From a machine learning point of view, forecasting a time series refers to the problem of estimating the unknown value x_{t+1} given the k most recent known values x_t, ..., x_{t-k+1}.
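To make the forecasting setup concrete, the following minimal sketch turns a series into a supervised learning problem: each row holds k consecutive values and the target is the value that follows them. The helper name `make_lagged` is hypothetical and is not part of the nescience library; it only illustrates the windowing.

```python
import numpy as np

def make_lagged(series, k):
    """Build (X, y) pairs for one-step-ahead forecasting.

    Hypothetical helper for illustration only (not part of nescience).
    Row j of X holds series[j], ..., series[j+k-1]; y[j] is series[j+k].
    """
    series = np.asarray(series)
    if not 0 < k < len(series):
        raise ValueError("k must satisfy 0 < k < len(series)")
    # Stack the k shifted copies of the series column by column
    X = np.column_stack([series[i:len(series) - k + i] for i in range(k)])
    y = series[k:]
    return X, y

X, y = make_lagged([1, 2, 3, 4, 5, 6], k=3)
print(X)  # rows: [1 2 3], [2 3 4], [3 4 5]
print(y)  # [4 5 6]
```

Any regression model can then be trained on X to predict y; choosing k is exactly the question of how many previous values are relevant.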
The nescience library provides the class TimeSeries to analyze time series and find optimal models for them. Please refer to the Reference API (TBD) for a list of the supported families of models.
Auto-miscoding (see Miscoding) allows us to estimate how relevant the previous values of a time series are for forecasting its future values. The auto-miscoding method compares the time series with a lagged version of itself, for multiple lag values. In this sense, auto-miscoding addresses the same problem as auto-correlation.
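The lag-by-lag comparison can be sketched with plain sample correlation as the measure of dependence. This is ordinary auto-correlation written with NumPy, not the miscoding computation used by nescience, and the function name `lagged_correlation` is an illustrative assumption.

```python
import numpy as np

def lagged_correlation(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps.

    Illustrative only: this is classical auto-correlation, not miscoding.
    """
    series = np.asarray(series, dtype=float)
    if not 0 < lag < len(series) - 1:
        raise ValueError("lag must satisfy 0 < lag < len(series) - 1")
    # Pair each value x[t] with the value x[t + lag]
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

# A pure sine wave with a period of 20 samples
t = np.arange(200)
wave = np.sin(2 * np.pi * t / 20)

# One value per lag, as in a correlogram
profile = [lagged_correlation(wave, lag) for lag in range(1, 26)]
print(np.round(profile[9], 3), np.round(profile[19], 3))  # lags 10 and 20
```

For this wave the correlation is close to -1 at lag 10 (half a period) and close to 1 at lag 20 (a full period), which is the kind of per-lag profile that both auto-correlation and auto-miscoding produce.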
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

Let's create a simple sinusoidal time series whose mean and standard deviation both increase steadily over time:
data = np.array([x + np.sin(x) * 0.1 * x + np.random.randn() * 0.1 for x in range(1, 200)])
plt.plot(data)
The following code computes a classical auto-correlogram for this time series:
plot_acf(data)
plt.title("Autocorrelation")
plt.xlabel("Lag")
plt.ylabel("Correlation")
plt.show()
As we can observe, the auto-correlation tells us that beyond lag 15 the time series has no predictive power, which is not true, as we know from the formula that generated the data. The problem is that auto-correlation is not well defined for non-stationary time series (it requires a constant mean and a constant standard deviation).
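A rough way to see the non-stationarity is to compare simple statistics of the first and second halves of the series. The sketch below regenerates a series with the same formula (the fixed seed is an assumption added here for reproducibility), removes a fitted straight line, and compares the spread of what remains. It is an informal check, not a formal stationarity test.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
x = np.arange(1, 200)
series = x + np.sin(x) * 0.1 * x + rng.normal(scale=0.1, size=x.size)

half = series.size // 2

# The mean drifts upward: compare the two halves of the raw series
mean_first, mean_second = series[:half].mean(), series[half:].mean()

# Remove a least-squares straight line, then compare the spread of the residuals
slope, intercept = np.polyfit(x, series, deg=1)
residuals = series - (slope * x + intercept)
std_first, std_second = residuals[:half].std(), residuals[half:].std()

print(f"mean: {mean_first:.1f} -> {mean_second:.1f}")
print(f"residual std: {std_first:.1f} -> {std_second:.1f}")
```

Both the mean and the residual spread are larger in the second half, so neither of the two quantities that auto-correlation assumes to be constant actually is.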
Let's see how auto-miscoding applies to the same time series:
from nescience.timeseries import TimeSeries
ts = TimeSeries(auto=False)
ts.fit(data)
mscd = ts.auto_miscoding(max_lag=25)
plt.bar(x=np.arange(len(mscd)), height=mscd)
plt.xlabel("Lag")
plt.ylabel("Miscoding")
plt.title("Auto-miscoding")
plt.show()
As we can see, auto-miscoding is able to recognize that a lagged version of the time series is relevant for forecasting future values, even for large values of the lag. Moreover, auto-miscoding emphasizes the sinusoidal component of the series.
A canonical example of a time series is the air passengers dataset, composed of the monthly totals of US airline passengers from 1949 to 1960.

This dataset has to be imported in the following way to be used with the fastautoml library:
>>> import pandas as pd
>>> air = pd.read_csv("AirPassengers.csv")
>>> ts = air["#Passengers"].values

In order to evaluate the quality of the predictions made with the fastautoml.AutoTimeSeries class, we will compare it against a dummy model that simply returns the value x_t as its prediction for x_{t+1}:
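def dummy_score(ts):
    # R^2-style score of the dummy model that predicts ts[i] with ts[i-1]
    mean = np.mean(ts)
    # Squared errors of the dummy predictions.
    # Note: for i = 0, ts[i-1] is ts[-1], i.e. the last element of the array.
    u = np.sum([(ts[i] - ts[i-1])**2 for i in range(0, len(ts)-1)])
    # Squared deviations from the mean of the series
    v = np.sum([(ts[i] - mean)**2 for i in range(0, len(ts)-1)])
    score = 1 - u/v
    return score

If we apply our dummy model to the air passengers dataset, we get the following score: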
>>> dummy_score(ts)
0.870694834815031

The same dataset modeled with the AutoTimeSeries class provides the following score:
>>> from fastautoml.fastautoml import AutoTimeSeries
>>> model = AutoTimeSeries()
>>> model.fit(ts)
AutoTimeSeries()
>>> model.score(ts)
0.9844207308319846

The following families of models are currently supported for the auto-time series part: