Time Series Analysis

A time series is a sequence of data points X = {x₁, x₂, ..., x_n} measured at fixed time intervals. From a machine learning point of view, forecasting a time series refers to the problem of estimating the unknown value x_{t+1} given the k previous known values x_t, x_{t-1}, ..., x_{t-k+1}.
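
The forecasting setup described above can be framed as an ordinary supervised learning problem. The following is a minimal sketch, not part of the nescience API: the helper name `make_supervised` is hypothetical, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_supervised(series, k):
    """Turn a 1-D series into (X, y): each row of X holds k consecutive
    values and y holds the value that immediately follows them."""
    series = np.asarray(series, dtype=float)
    if not 1 <= k < len(series):
        raise ValueError("k must satisfy 1 <= k < len(series)")
    # Each window holds k inputs followed by the target value.
    windows = sliding_window_view(series, k + 1)  # shape (n - k, k + 1)
    return windows[:, :k], windows[:, k]


X, y = make_supervised([1, 2, 3, 4, 5, 6], k=3)
# X rows: [1 2 3], [2 3 4], [3 4 5]
# y:      [4 5 6]
```

Any regression model can then be trained on `X` and `y`; the choice of k controls how much history each prediction sees.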

The nescience library provides the class TimeSeries to analyze time series and to find optimal models for them. Please refer to the Reference API (TBD) for a list of supported families of models.

Auto-miscoding

Auto-miscoding (see Miscoding) allows us to estimate how relevant the previous values of a time series are for forecasting its future values. The auto-miscoding method compares the time series with a lagged version of itself, for multiple lag values. In this sense, auto-miscoding addresses the same problem as auto-correlation.
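
To make the "lagged version of itself" idea concrete, here is a small sketch of the lagging step using plain Pearson correlation. It is only an illustration of how the series is paired with its shifted copy; it does not compute miscoding, and the function name `lagged_correlation` is hypothetical.

```python
import numpy as np


def lagged_correlation(series, max_lag):
    """Pearson correlation between the series and a copy of itself
    shifted by 1..max_lag steps. This is plain correlation, shown only
    to illustrate the lagging step; it is not the miscoding measure."""
    series = np.asarray(series, dtype=float)
    return np.array([
        np.corrcoef(series[:-lag], series[lag:])[0, 1]
        for lag in range(1, max_lag + 1)
    ])


t = np.arange(200)
wave = np.sin(2 * np.pi * t / 20)   # pure sine with a period of 20 samples
corr = lagged_correlation(wave, max_lag=20)
# corr[9]  (lag 10, half a period) is close to -1
# corr[19] (lag 20, a full period) is close to +1
```

For a stationary signal such as this sine wave the lagged correlation recovers the period cleanly.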

import numpy as np
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

Let's create a simple sinusoidal time series, constantly increasing in the mean and in the standard deviation:

data = np.array([x + np.sin(x) * 0.1 * x + np.random.randn() * 0.1 for x in range(1, 200)])
plt.plot(data)

Synthetic Time Series

The following code computes a classical auto-correlogram for this time series.

plot_acf(data)
plt.title("Autocorrelation")
plt.xlabel("Lag")
plt.ylabel("Correlation")
plt.show()

Auto-correlation

As we can observe, the auto-correlation tells us that beyond lag 15 the time series has no predictive power, which is not true, as we know from the original formula that generated the data. The problem is that auto-correlation is not well defined for non-stationary time series: computing it requires a constant mean and a constant standard deviation.
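
The non-stationarity of the synthetic series can be checked directly. The sketch below regenerates a series with the same formula (using a seeded generator, which is an assumption made here for repeatability) and compares the two halves; the straight-line detrending with `np.polyfit` is just one simple way to isolate the oscillation.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
x = np.arange(1, 200)
series = x + np.sin(x) * 0.1 * x + rng.normal(scale=0.1, size=x.size)

first, second = series[:100], series[100:]

# Remove a straight-line trend so the spread reflects the oscillation only.
slope, intercept = np.polyfit(x, series, deg=1)
residual = series - (slope * x + intercept)

print("mean of each half:", first.mean(), second.mean())
print("std of detrended halves:", residual[:100].std(), residual[100:].std())
```

Both the mean and the spread are larger in the second half, so neither is constant over time.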

Let's see how auto-miscoding applies to the same time series:

from nescience.timeseries import TimeSeries
ts = TimeSeries(auto=False)
ts.fit(data)
mscd = ts.auto_miscoding(max_lag=25)

plt.bar(x=np.arange(len(mscd)), height=mscd)
plt.xlabel("Lag")
plt.ylabel("Miscoding")
plt.title("Auto-miscoding")
plt.show()

Auto-miscoding

As we can see, auto-miscoding is able to recognize that a lagged version of the time series is relevant for forecasting future values, even for large values of the lag. Moreover, auto-miscoding emphasizes the sinusoidal component of the series.

Air Passengers dataset

A canonical example of a time series is the air passengers dataset, composed of monthly totals of US airline passengers from 1949 to 1960:

Air Passengers

This dataset has to be imported in the following way to be used with the fastautoml library:

>>> import pandas as pd
>>> air = pd.read_csv("AirPassengers.csv")
>>> ts = air["#Passengers"].values

In order to evaluate the quality of the predictions made with the fastautoml.AutoTimeSeries class, we will compare them against a dummy model that simply returns the value x_t as its prediction for x_{t+1}.

def dummy_score(ts):
    # R^2-style score of the dummy model: the prediction for ts[i] is ts[i-1]
    mean = np.mean(ts)
    # Start at i = 1: ts[0] has no previous value (i = 0 would wrap around to ts[-1])
    u = np.sum([(ts[i] - ts[i-1])**2 for i in range(1, len(ts))])
    v = np.sum([(ts[i] - mean)**2 for i in range(1, len(ts))])
    score = 1 - u/v
    return score

We apply our dummy model to the air passengers dataset to obtain the baseline score:

>>> baseline = dummy_score(ts)
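
The dummy (persistence) baseline described above can also be written in vectorized form. This is a minimal sketch under stated assumptions: the name `naive_r2` is hypothetical, and the total sum of squares is taken around the mean of the predicted targets, as in the usual R² definition, rather than around the mean of the whole series.

```python
import numpy as np


def naive_r2(series):
    """R^2 of the persistence forecast, which predicts x_{t+1} = x_t."""
    series = np.asarray(series, dtype=float)
    actual = series[1:]        # values to be predicted: x_2 .. x_n
    predicted = series[:-1]    # previous observation used as the forecast
    u = np.sum((actual - predicted) ** 2)       # residual sum of squares
    v = np.sum((actual - actual.mean()) ** 2)   # total sum of squares
    return 1.0 - u / v


# For [1, 2, 3, 4, 5]: u = 4 and v = 5, so the score is 1 - 4/5 (about 0.2).
score = naive_r2([1, 2, 3, 4, 5])
```

A strongly trending series makes this baseline hard to beat, because consecutive values are close to each other relative to the overall spread.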

The same dataset modeled with the AutoTimeSeries class provides the following score:

>>> from fastautoml.fastautoml import AutoTimeSeries
>>> model = AutoTimeSeries()
>>> model.fit(ts)
AutoTimeSeries()
>>> model.score(ts)
0.9844207308319846

Supported Models

The following families of models are currently supported for the auto-time series part:
