# Time series interpolation with sktime

Suppose we have a set of time series with different lengths (numbers of points), each fully describing the same event.
To use sktime methods, the time series need to be converted to the same length (number of points). In this tutorial we show how to handle this with sktime's internal utilities.

In [7]:
import random
import numpy as np
import pandas as pd

from sklearn.pipeline import Pipeline
from sktime.datasets import load_basic_motions
from sktime.transformers.series_as_features.compose import ColumnConcatenator
from sktime.classification.compose import TimeSeriesForestClassifier

# Ordinary situation

In the ordinary case all time series have the same length. We load an example dataset from the sktime API and train a classifier.

In [9]:
X_train, y_train = load_basic_motions(split='TRAIN', return_X_y=True)
X_test, y_test = load_basic_motions(split='TEST', return_X_y=True)


steps = [
    ('concatenate', ColumnConcatenator()),
    ('classify', TimeSeriesForestClassifier(n_estimators=100))]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

1.0

# If time series have unequal lengths, the algorithm raises an error

Now we spoil the dataset a little by randomly truncating the time series, which leaves them with different lengths.

As a result, we get an error when we attempt to train a classifier.

In [10]:
# Randomly truncate each series in place so that lengths differ across the dataset
def random_cut(df):
    for row_i in range(df.shape[0]):
        for dim_i in range(df.shape[1]):
            ts = df.at[row_i, f'dim_{dim_i}']
            # Drop the last 3 to 5 points; this is what makes the lengths unequal
            new_len = random.randint(len(ts) - 5, len(ts) - 3)
            df.at[row_i, f'dim_{dim_i}'] = pd.Series(ts.tolist()[:new_len])

In [16]:
X_train, y_train = load_basic_motions(split='TRAIN', return_X_y=True)
X_test, y_test = load_basic_motions(split='TEST', return_X_y=True)
            
for df in [X_train, X_test]:
    random_cut(df)
    
try:
    steps = [
        ('concatenate', ColumnConcatenator()),
        ('classify', TimeSeriesForestClassifier(n_estimators=100))]
    clf = Pipeline(steps)
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)
except ValueError as e:
    # The exception caught here is a ValueError, so label it as such
    print(f"ValueError: {e}")

ValueError: Tabularization failed, it's possible that not all series were of equal length


# Now the interpolator comes into play

Now we use the interpolator to resize time series of different lengths to a user-defined length. Internally, a linear interpolator from scipy is fitted to each series and then sampled at a user-defined number of equidistant points.
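
The resizing idea can be sketched independently of sktime. The snippet below is a minimal illustration under stated assumptions, not the actual `TSInterpolator` implementation: it uses `np.interp` rather than the scipy interpolator, and the helper name `resize_series` is hypothetical.

```python
import numpy as np


def resize_series(values, length):
    """Linearly resample a 1-D sequence to `length` equidistant points.

    Hypothetical helper for illustration only; not part of sktime.
    """
    values = np.asarray(values, dtype=float)
    # Positions of the original samples, normalised to the interval [0, 1]
    old_x = np.linspace(0.0, 1.0, num=len(values))
    # Target positions: `length` equidistant points over the same interval
    new_x = np.linspace(0.0, 1.0, num=length)
    # Piecewise-linear interpolation of the original values at the new positions
    return np.interp(new_x, old_x, values)


# Two series of different lengths are mapped to the same length
short = resize_series([0.0, 1.0, 2.0], 5)
long_ = resize_series([0.0, 2.0, 4.0, 6.0, 8.0, 10.0], 5)
```

Both `short` and `long_` end up with 5 points, so series that originally had 3 and 6 points can be placed side by side in a table.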

After that, the classifier trains successfully on the dataset.

In [19]:
from sktime.transformers.series_as_features.interpolate import TSInterpolator 

X_train, y_train = load_basic_motions(split='TRAIN', return_X_y=True)
X_test, y_test = load_basic_motions(split='TEST', return_X_y=True)
            
for df in [X_train, X_test]:
    random_cut(df)
    
steps = [
    ('transform', TSInterpolator(50)),
    ('concatenate', ColumnConcatenator()),
    ('classify', TimeSeriesForestClassifier(n_estimators=100))]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

1.0