# InceptionTimeClassifier
This is a basic example of working with the sktime library. \
A UCR dataset is loaded and used to train an InceptionTimeClassifier model. \
The model's performance is then checked on the test data, where it reaches ~96.5% accuracy. 

Written by Nils Odin 2023-09-23

**Good to know:**
- *Some UCR/UEA archive datasets are available through the sktime library but more can be added manually*
- *Transformations are available and can be applied during training (check sktime docs)*
- *sktime should be pretty much fully compatible with the sklearn library*


**Development:**
- *2023-09-23: Proof of concept, working example*
- *TODO: Add some basic data augmentation and check its effect on performance for smaller datasets* 


**Resources:**
- *Tutorial that this file is based on: https://www.sktime.net/en/stable/examples/02_classification.html*
- *Install instructions for sktime: https://www.sktime.net/en/stable/installation.html#development-versions*
- *InceptionTimeClassifier documentation: https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.classification.deep_learning.InceptionTimeClassifier.html*

In [10]:
# make some imports 
import warnings
warnings.filterwarnings("ignore")  # hide some annoying deprecation warnings

import numpy as np
import pandas as pd
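
Ignoring every warning can also hide real problems. A narrower filter, sketched here with the standard library only, silences just the deprecation-style categories. Which categories sktime and its dependencies actually emit is an assumption, and the warning messages below are made up for the demonstration.

```python
import warnings

# Ignore only deprecation-style warnings instead of every warning.
# Assumption: the noisy messages are DeprecationWarning or FutureWarning.
for category in (DeprecationWarning, FutureWarning):
    warnings.filterwarnings("ignore", category=category)

# Demonstration inside a context manager, so the filters set here are
# discarded again when the block ends.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                                  # record everything by default
    warnings.filterwarnings("ignore", category=DeprecationWarning)   # except deprecations
    warnings.warn("old API", DeprecationWarning)                     # suppressed
    warnings.warn("something looks off", UserWarning)                # still recorded

print([str(w.message) for w in caught])
```

Only the `UserWarning` ends up in `caught`, so messages that are not about deprecations stay visible.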

In [11]:
# select the Italian power demand dataset (ID 38 in the UCR/UEA archive)
from sktime.datasets import load_italy_power_demand

# get dataset in pandas multiindex mtype
X_train, y_train = load_italy_power_demand(split="train", return_type="pd-multiindex")
X_test, y_test = load_italy_power_demand(split="test", return_type="pd-multiindex")

# rename the column and the index levels for some more clarity
X_train.columns = ["total_power_demand"]
X_train.index.names = ["day_ID", "hour_of_day"]  # level names are set on the index, not on the DataFrame
print("Training data shape is", X_train.shape)
X_train.head()

Training data shape is (1608, 1)

| day_ID | hour_of_day | total_power_demand |
|---|---|---|
| 0 | 0 | -0.710518 |
| 0 | 1 | -1.18332 |
| 0 | 2 | -1.372442 |
| 0 | 3 | -1.593083 |
| 0 | 4 | -1.467002 |
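
The shape `(1608, 1)` counts rows of the long `pd-multiindex` table, not series: every (instance, timepoint) pair is one row. The sketch below builds a tiny table with the same layout and reshapes it so that each series becomes one row. The toy values and the names `n_series`, `n_timepoints`, `X_long` and `X_wide` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy long-format table: 3 series (instances) with 4 timepoints each.
n_series, n_timepoints = 3, 4
index = pd.MultiIndex.from_product(
    [range(n_series), range(n_timepoints)], names=["instance", "timepoint"]
)
X_long = pd.DataFrame(
    {"value": np.arange(n_series * n_timepoints, dtype=float)}, index=index
)
print(X_long.shape)  # (12, 1): one row per (instance, timepoint) pair

# Pivot the inner index level into columns: one row per series.
X_wide = X_long["value"].unstack(level="timepoint")
print(X_wide.shape)  # (3, 4)

# Number of series, counted directly from the outer index level.
print(X_long.index.get_level_values("instance").nunique())  # 3
```

Assuming 24 hourly readings per day, the same reasoning gives 1608 / 24 = 67 series for the training data above.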


In [12]:
# Import the InceptionTime classifier from sktime and check its default parameters
from sktime.classification.deep_learning import InceptionTimeClassifier
ITC = InceptionTimeClassifier()
ITC.get_params()    # Check default parameters

{'batch_size': 64,
 'bottleneck_size': 32,
 'callbacks': None,
 'depth': 6,
 'kernel_size': 40,
 'loss': 'categorical_crossentropy',
 'metrics': None,
 'n_epochs': 1500,
 'n_filters': 32,
 'random_state': None,
 'use_bottleneck': True,
 'use_residual': True,
 'verbose': False}

In [13]:
# Train the Inception Time classifier on the training data
ITC.fit(X_train, y_train)

In [14]:
# We now have a trained model, which we can use to make predictions on new data.
# For this, we use the testing data, which we have not used for training the model.
y_preditions = ITC.predict(X_test)



In [15]:
# We can now evaluate the performance of our model by comparing the predicted values with the true values.
# For this, we use the sklearn library since sklearn and sktime are compatible
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_preditions)

0.9650145772594753
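
For reference, `accuracy_score` with its default settings returns the fraction of predictions that equal the true label. A minimal NumPy sketch with made-up labels (the arrays are illustrative, not the model's output):

```python
import numpy as np

# Made-up string labels for illustration only.
y_true = np.array(["1", "2", "2", "1", "2"])
y_pred = np.array(["1", "2", "1", "1", "2"])

# Accuracy = number of matching labels / number of samples.
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 0.8

# Per-class recall shows whether the errors are concentrated in one class.
for label in np.unique(y_true):
    mask = y_true == label
    print(label, np.mean(y_pred[mask] == label))
```

Read the other way, the score above is consistent with 993 correct predictions out of 1029 test series, assuming the test split has the 1029 series listed for ItalyPowerDemand in the UCR archive.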

### Basic training completed!
The training table has 1608 rows, but in this long format that is only 67 daily series of 24 hourly readings each, so an accuracy of 96.5% on the test set is pretty solid. \
Next, I want to try some basic data augmentations, such as adding noise to simulate errors in data collection.

**Everything below this point is a work in progress!**

In [17]:
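# NOTE: This cell does not work yet. It raises the TypeError shown in its output because resample_series()
# calls Series.resample(), which needs a datetime-like index, while X_train has a MultiIndex.
# The other augmentations are also applied to the whole stacked column instead of to each series.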

# Function to apply time warping
def time_warp(series, alpha=0.05):
    warp_factor = np.random.uniform(1 - alpha, 1 + alpha)
    return np.interp(np.arange(len(series)) * warp_factor, np.arange(len(series)), series)

# Function to apply time shifting
def time_shift(series, max_shift=10):
    shift_amount = np.random.randint(-max_shift, max_shift)
    return np.roll(series, shift_amount)

# Function to add noise
def add_noise(series, noise_level=0.02):
    noise = np.random.normal(0, noise_level, len(series))
    return series + noise

# Function to resample the time series
def resample_series(series, factor=2):
    return series.resample(f'{factor}H').mean()

# Apply augmentation
#X_train.reset_index(drop=True, inplace=True)    # reset index to make it a simple RangeIndex instead of a MultiIndex as an attempt to fix the problem

augmented_data = {
    'time_warp': [time_warp(X_train['total_power_demand'].values) for _ in range(500)],  # 500 augmented samples
    'time_shift': [time_shift(X_train['total_power_demand'].values) for _ in range(500)],
    'noise': [add_noise(X_train['total_power_demand'].values) for _ in range(500)],
    'resample': [resample_series(X_train['total_power_demand']).values for _ in range(500)]
}

# Concatenate the augmented data with the original data
augmented_X_train = pd.DataFrame(augmented_data)
augmented_X_train = pd.concat([X_train, augmented_X_train], ignore_index=True, axis=1)
augmented_X_train.columns = ['total_power_demand', 'time_warp', 'time_shift', 'noise']

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
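
One way around this error is to augment each series separately on a plain NumPy array of shape `(n_series, n_timepoints)`, so that no pandas time index is needed. The sketch below is an illustration under assumptions, not a fix of the cell above: `X_wide` and `y` are random stand-ins for the real data, `augment_dataset` and `block_average` are hypothetical helpers, and block averaging stands in for the pandas `resample` call.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_warp(series, alpha=0.05):
    """Stretch or compress the time axis by a random factor in [1 - alpha, 1 + alpha]."""
    t = np.arange(len(series))
    # np.interp clamps positions past the last timepoint to the final value.
    return np.interp(t * rng.uniform(1 - alpha, 1 + alpha), t, series)

def time_shift(series, max_shift=3):
    """Circularly shift the series by up to max_shift steps in either direction."""
    return np.roll(series, rng.integers(-max_shift, max_shift + 1))

def add_noise(series, noise_level=0.02):
    """Add zero-mean Gaussian noise."""
    return series + rng.normal(0.0, noise_level, len(series))

def block_average(series, factor=2):
    """Average blocks of `factor` points and repeat them so the length is unchanged."""
    means = series.reshape(-1, factor).mean(axis=1)  # length must be divisible by factor
    return np.repeat(means, factor)

def augment_dataset(X, y, augmentations):
    """Return the original series plus one augmented copy per series and augmentation."""
    copies = [np.array([aug(s) for s in X]) for aug in augmentations]
    X_aug = np.concatenate([X, *copies])
    y_aug = np.tile(y, len(augmentations) + 1)  # labels repeat in the same order
    return X_aug, y_aug

# Stand-in data: 6 series of 24 points with made-up labels.
X_wide = rng.normal(size=(6, 24))
y = np.array(["1", "2", "1", "2", "1", "2"])

X_aug, y_aug = augment_dataset(X_wide, y, [time_warp, time_shift, add_noise, block_average])
print(X_aug.shape, y_aug.shape)  # (30, 24) (30,)
```

With the real data, `X_wide` could come from `X_train["total_power_demand"].unstack().to_numpy()`, and `X_aug[:, np.newaxis, :]` would give sktime's `numpy3D` layout `(n_instances, n_channels, n_timepoints)`. The labels have to be repeated along with the series, which is why the helper returns `y_aug` as well. This has not been run against the classifier here.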

In [None]:
# Retrain a new model on the augmented data
ITC_augmented = InceptionTimeClassifier()
ITC_augmented.fit(augmented_X_train, y_train)

In [None]:
# We now predict the labels for the test data using the augmented model and score the predictions
y_preditions_augmented = ITC_augmented.predict(X_test)
accuracy_score(y_test, y_preditions_augmented)