# sktime worksheet - time series classification

This is an exercise-style worksheet introducing the [sktime](https://github.com/alan-turing-institute/sktime) time series classification module, for self-study and user testing.

This worksheet contains exercises for you to attempt after:
* going through the basic `sktime` demo, live or [on YouTube](https://www.youtube.com/watch?v=wqQKFu41FIw)
* having had a look at the tutorial notebooks yourself.

For this sheet, you are actively encouraged to seek out the sktime documentation and tutorial notebooks to solve the exercises.

We would very much appreciate it if you could leave feedback, especially critical feedback, in the markdown fields meant for this. You can also raise issues for any bugs or improvement suggestions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

### Exercise 1: loading datasets

(a) use `sktime`'s built-in data handling functionality to load the inline skates data set from timeseriesclassification.com

(b) convert the feature data frame into 3d numpy format

(c) convert the feature data frame into long pandas data frame format

(d) convert the feature data frame into nested pandas data frame format
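
As a hint for (b) to (d), here is a minimal sketch of what the three formats look like on synthetic data. It deliberately avoids `sktime`'s own loaders and converters, which the exercise asks you to find. The column names `case_id`, `dim_id`, `reading_id`, `value` and `dim_0`, `dim_1` are illustrative assumptions, not necessarily what `sktime` produces.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 4 instances, 2 channels, 5 time points.
rng = np.random.default_rng(0)
n_instances, n_channels, n_timepoints = 4, 2, 5
X_3d = rng.normal(size=(n_instances, n_channels, n_timepoints))

# Long format: one row per (instance, channel, time point) observation.
# from_product varies the last level fastest, matching C-order reshape.
idx = pd.MultiIndex.from_product(
    [range(n_instances), range(n_channels), range(n_timepoints)],
    names=["case_id", "dim_id", "reading_id"],
)
X_long = pd.DataFrame({"value": X_3d.reshape(-1)}, index=idx).reset_index()

# Nested format: one row per instance, one column per channel,
# and each cell holds an entire series as a pd.Series.
X_nested = pd.DataFrame(
    {
        f"dim_{c}": [pd.Series(X_3d[i, c]) for i in range(n_instances)]
        for c in range(n_channels)
    }
)

print(X_long.head())
print(X_nested.shape)
```

The same numbers appear in all three containers; only the layout differs, which is what the conversion functions have to preserve.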

Questions:
* was it easy/difficult to find the function to load the dataset?
* was it easy/difficult to find the inline skates data set, and how to load it?
* were the data type conversions easy/difficult to find and carry out?

### Exercise 2: basic time series classification

(a) split the inline skates data set uniformly at random into an 80/20 train/test split

(b) train a HIVE-COTE classifier with default parameters on the training set, predict the classes on the test set

(c) evaluate the result using classification accuracy and balanced F1 scores
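
As a hint, the split-fit-predict-evaluate workflow can be sketched with `sklearn` alone. This is not a solution: the data are synthetic, a plain k-nearest neighbor classifier on flattened series stands in for HIVE-COTE, and reading the F1 requirement as macro-averaged F1 is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 100 univariate series of length 50, 3 classes,
# with a class-dependent offset so there is something to learn.
rng = np.random.default_rng(42)
y = rng.integers(0, 3, size=100)
X = rng.normal(size=(100, 50)) + y[:, None]

# Uniformly random 80/20 split (no stratification).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in classifier; swap in the time series classifier here.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# Macro averaging weights every class equally (an assumed reading of the task).
f1_macro = f1_score(y_test, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.3f}, macro F1={f1_macro:.3f}")
```

Since `sktime` classifiers follow the `fit`/`predict` interface shown in the tutorials, the evaluation part should carry over once the classifier and data container are replaced.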

Questions:
* was it easy/difficult to find the classifier?
* was it easy/difficult to apply the classifier?
* was it easy/difficult to understand how to evaluate the classifier?

### Exercise 3: composite time series classifiers

(a) build the following time series classifier: a k-nearest neighbor classifier, where both time series distance and the parameter k are tuned using cross-validation grid search
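
As a hint, tuning a distance and `k` jointly by cross-validated grid search can be sketched with `sklearn` on flattened synthetic series. The vector metrics below are stand-ins: the exercise asks for time series distances, which you will need to look up in `sktime`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 60 series of length 40, two classes with different offsets.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 40)) + y[:, None]

# Grid over both the number of neighbors and the distance.
param_grid = {
    "n_neighbors": [1, 3, 5],
    "metric": ["euclidean", "manhattan", "chebyshev"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The fitted `search` object is itself a classifier, so it can be passed through the same evaluation as any other.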

(b) use the classifier in (a) and run the evaluation from exercise 2

(c) build the following time series classifier: a pipeline consisting of a feature extractor that computes mean, variance, and five quantiles for each time series. A grid search tuned support vector classifier (for tabular data, from `sklearn`) is then used to predict the label.

(d) use the classifier in (c) to obtain probabilistic class predictions on a uniformly random 80/20 split, and evaluate using the logarithmic loss
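
As a hint for (c) and (d), the overall shape of such a pipeline can be sketched with `sklearn` on a 2D array holding one series per row. The helper `summary_features` is hypothetical, the choice of minimum, three quartiles, and maximum as the five summary points is an assumption, and the `StandardScaler` step is an addition not asked for in the exercise. The `sktime` version would use its own transformers on `sktime` data containers.

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC


def summary_features(X):
    """Map each row (one series) to mean, variance, and five quantiles."""
    X = np.asarray(X)
    quantiles = np.quantile(X, [0.0, 0.25, 0.5, 0.75, 1.0], axis=1).T
    return np.column_stack([X.mean(axis=1), X.var(axis=1), quantiles])


# Synthetic stand-in: 100 series of length 30, two classes with different offsets.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 30)) + y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Feature extraction, scaling, then a grid-search tuned SVC as the final step.
# probability=True is needed for SVC to expose predict_proba.
tuned_svc = GridSearchCV(
    SVC(probability=True, random_state=1), {"C": [0.1, 1.0, 10.0]}, cv=3
)
pipe = make_pipeline(FunctionTransformer(summary_features), StandardScaler(), tuned_svc)
pipe.fit(X_train, y_train)

# Probabilistic predictions: one row per test series, one column per class.
proba = pipe.predict_proba(X_test)
loss = log_loss(y_test, proba, labels=pipe.classes_)
print(f"log loss = {loss:.3f}")
```

Passing `labels` explicitly keeps the columns of `proba` aligned with the classes even if a class happens to be missing from the test split.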



Questions:
* was it easy/difficult to find out how to tune classifiers?
* was it easy/difficult to find out how to chain feature extraction and a sklearn classifier?
* was it easy/difficult to understand how to obtain and evaluate probabilistic predictions?

### Exercise 4: multivariate time series classification

(a) load any multivariate time series classification data set from timeseriesclassification.com which is not the basic motions data set.

(b) find, build, and evaluate a multivariate time series classifier on that data set

(c) use grid search to tune which column-wise classifiers are used in a column ensemble classifier

Questions:
* was it easy/difficult to find material on multivariate time series classification?
* was it easy/difficult to find out how to load multivariate time series classification data sets?
* was it easy/difficult to tune over classifier choice?

### General feedback:
* What was the most difficult part to understand?
* Did you get stuck anywhere, or have to skip anything?
* How can we improve the documentation?
* What additional features would you like to see?