# 1 Time series classification
<a href="https://colab.research.google.com/github/jarusgnuj/ioctm358/blob/master/notebooks/time_series_classification/1_TSC_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Time series classification (TSC) operates on time series data, a series of values that is ordered by time. Data samples are labelled as belonging to a particular class. The TSC system is trained using this data to classify unlabelled samples. There is a wide range of TSC applications. Smartwatch data is used to classify human activities (walking, running, ascending stairs, etc.). Animal behaviour (hunting, sleeping) is monitored using accelerometers on tagged, wild animals for environmental studies. Sensors on industrial machines are used to classify time series samples as either normal or preceding a failure, informing machine maintenance schedules.

## 1.1 Our dataset
This exercise uses the SonyAIBORobotSurface1 dataset from the [UEA & UCR Time Series Classification Repository](https://www.timeseriesclassification.com) (Dau et al., 2018). The dataset was collected by Vail and Veloso (2004) at Carnegie Mellon University, using an accelerometer on a Sony AIBO robot. Their aim was to detect the surface that the robot was walking on in order to optimise its gait for that surface. The robots competed in the RoboCup League, a football game played on a carpeted field.



![The Sony AIBO Robot is a robot dog. It is pictured with a ball.](https://i1.wp.com/www.techdigest.tv/wp-content/uploads/2015/06/aibo-560.jpg "Sony AIBO Robot")

## 1.2 References
Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A. and Keogh, E. (2018) ‘The UCR Time Series Archive’, [Online]. Available at http://arxiv.org/abs/1810.07758 (Accessed 4 May 2019).

Vail, D. and Veloso, M. (2004) ‘Learning from accelerometer data on a legged robot’, *IFAC Proceedings*, vol. 37, no. 8, pp. 822–827 [Online]. Available at https://www.cs.cmu.edu/~mmv/papers/04iav-doug.pdf (Accessed 4 May 2019).

## 1.3 Data capture and processing

The robot data provided is the x-axis accelerometer data, sampled at 125 Hz (125 times per second). A positive value indicates acceleration in the forward direction. Each data sample has 70 data points (0.56 s) and is labelled as either cement or carpet. The original data was in the range of approximately [0, 0.4] g (multiples of gravitational acceleration) and had a positive mean because the robot leans forwards slightly.
+ The samples are aligned - each starts at the same point in the robot's walk.
+ The dataset provided has been standardised.
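
As a rough illustration of the figures above, the sketch below computes the duration of one sample and standardises a synthetic signal. It assumes that "standardised" means each sample is shifted to zero mean and scaled to unit standard deviation; the source does not state the exact procedure, and the `raw` values are random stand-ins, not real robot data.

```python
import numpy as np

SAMPLE_RATE_HZ = 125     # Sampling rate stated in the text
POINTS_PER_SAMPLE = 70   # Data points per sample stated in the text

# Duration of one sample in seconds
duration_s = POINTS_PER_SAMPLE / SAMPLE_RATE_HZ
print('Duration of one sample (s):', duration_s)

# Synthetic stand-in for one raw sample in the range [0, 0.4] g (NOT real robot data)
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 0.4, size=POINTS_PER_SAMPLE)

# Assumed standardisation: subtract the sample mean, divide by the sample standard deviation
standardised = (raw - raw.mean()) / raw.std()

print('Mean before:', raw.mean(), 'Mean after:', standardised.mean())
print('Std before:', raw.std(), 'Std after:', standardised.std())
```

After this step the positive offset caused by the robot's forward lean is no longer visible in the values, which is consistent with the plots later in this notebook being centred around zero.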

## 1.4 Dataset format
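The entire dataset is provided in a single tab-separated text file. Each row is one data sample: the first column holds the class label and the remaining 70 columns hold the data points.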

![621x71 matrix of data](images/time_series_dataset.png "Dataset")
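
To make the layout concrete, here is a minimal sketch that parses a tiny, made-up file with the same structure: one sample per row, with the class label in the first column. The values in `toy_text` are hypothetical, and tabs are used as the separator because the loading cell later in this notebook reads the real file with `sep='\t'`.

```python
import io
import pandas as pd

# Hypothetical miniature file: 3 samples, each a class label followed by 4 data points
toy_text = (
    "1\t0.1\t0.2\t0.3\t0.4\n"
    "2\t-0.5\t0.0\t0.5\t1.0\n"
    "1\t0.9\t0.8\t0.7\t0.6\n"
)

# header=None because the file has no row of column names
toy_df = pd.read_csv(io.StringIO(toy_text), sep='\t', header=None)
toy = toy_df.values  # Convert to a numpy array

print('Shape (rows, columns):', toy.shape)
print('Label column:', toy[:, 0])
print('Data columns:\n', toy[:, 1:])
```

The real file follows the same pattern at a larger scale: 621 rows and 71 columns, that is, one label plus 70 data points per sample.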

# 2 Load Python code from other sources
Import the Python modules that we will need.

In [None]:
import numpy as np  # Arrays, matrices and functions on them. Required by Pandas, below
import pandas as pd # A data analysis library
from sklearn.model_selection import train_test_split # scikit-learn, machine learning tools
import matplotlib.pyplot as plt # A plotting library
import seaborn as sns # Built on matplotlib, facilitates aesthetically pleasing plots

# General settings
sns.set_style('whitegrid') # Plots will have a white grid

# 3 Load the data

In [None]:
url_root = 'https://raw.githubusercontent.com/jarusgnuj/ai-ml-wksh/master/data/UCR_TSC_archive/SonyAIBORobotSurface1_IoC'
url = url_root+'/SonyAIBORobotSurface1_IoC_ALL.txt'
robot_df = pd.read_csv(url, sep='\t', header=None) # Use Pandas to load the data into a Pandas DataFrame
print('Loaded from', url)
robot_data = robot_df.values # Convert from a Pandas DataFrame to a numpy array

# Print information about the data's shape and size
print('The shape of robot_data is', robot_data.shape, '\n')
print('The robot_data is a matrix. These are the first 7 rows and 5 columns of robot_data:\n', robot_data[:7, :5], '\n')
print('The first row of robot_data, in full:\n', robot_data[0], '\n')
print('The second row of robot_data, in full:\n', robot_data[1], '\n')
print('The first column of robot_data:\n', robot_data[:,0], '\n')

# 4 Process the data
Separate out the labels vector from the time series data samples. For convenience we will use class labels 0 and 1 instead of 1 and 2. 

class 0 : cement

class 1 : carpet

In [None]:
labels = robot_data[:,0]
data = robot_data[:,1:]
print('The shape of the data matrix is', data.shape)
print('The shape of the labels vector is', labels.shape)

# Change from classes 1 and 2 to classes 0 and 1, for convenience later
labels = labels - 1
labels = labels.astype(int)

In [None]:
# Variables that will help us work with the classes
class_names = ['cement', 'carpet']
class_colors = ['darkorange', 'steelblue']

## 4.1 Plot the data
Select the row number of the sample you wish to plot. Find out which class that sample belongs to, then plot the sample.


In Python, and many other programming languages, the first row in a matrix is row 0.
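
A short sketch of zero-based indexing on a small, made-up matrix (`toy` is purely illustrative):

```python
import numpy as np

# A hypothetical 3x3 matrix; each value encodes its (row, column) position
toy = np.array([[10, 11, 12],
                [20, 21, 22],
                [30, 31, 32]])

first_row = toy[0]        # Row 0 is the first row
last_row = toy[-1]        # Negative indices count back from the end
first_column = toy[:, 0]  # All rows, column 0
first_two_rows = toy[:2]  # Rows 0 and 1; the stop index is excluded

print('First row:', first_row)
print('Last row:', last_row)
print('First column:', first_column)
print('First two rows:\n', first_two_rows)
```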

In [None]:
sample_number = 0 ### CHANGE PARAMETER HERE ###
sample_label = labels[sample_number]
class_name = class_names[sample_label]
print('sample_number:', sample_number)
print('sample_label:', sample_label)
print('class_name:', class_name)

fig, ax = plt.subplots()
plt.plot(data[sample_number], label=class_name, color='darkred')
plt.legend(loc='upper right', frameon=False)
plt.suptitle('A single data sample')
ax.set_ylabel('Standardised x-axis accelerometer data')
ax.set_xlabel('Data point number')

## 4.2 Plot two samples to compare carpet to cement

In [None]:
sample_a = 0 ### CHANGE PARAMETER HERE ###
sample_b = 1 ### CHANGE PARAMETER HERE ###

fig, ax = plt.subplots()
plt.plot(data[sample_a], label=class_names[labels[sample_a]], color=class_colors[labels[sample_a]])
plt.plot(data[sample_b], label=class_names[labels[sample_b]], color=class_colors[labels[sample_b]])
plt.legend(loc='upper right', frameon=False)
plt.suptitle('Comparison of data samples')
ax.set_ylabel('Standardised x-axis accelerometer data')
ax.set_xlabel('Data point number')

# 5 Discussion 1a : Your own time series classification applications
+ Can you think of some useful applications within your organisation?
+ Does your organisation generate or depend upon time series data?

# 6 Exercise 1a : Explore the data
+ Look for the code comments "### CHANGE PARAMETER HERE ###"
+ Select 5 different class 0 samples and plot them together.
  + Do they look similar?
+ Continue to the next cell and select 5 different class 1 samples and plot them together.
+ The cells for this exercise end with the heading "End of exercise 1a"

In [None]:
print('Labels of some of the first few data samples:')
print(labels[:17])
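
Reading sample numbers off a printed list works for a handful of samples. As an alternative, the sketch below looks up the sample numbers for each class directly. It uses a small, made-up `toy_labels` vector; applying the same calls to `labels` should work the same way, though that is not shown here.

```python
import numpy as np

# Hypothetical label vector (0 = cement, 1 = carpet)
toy_labels = np.array([0, 1, 1, 0, 1, 0, 0, 1])

# Indices (sample numbers) at which each class occurs
class0_idx = np.flatnonzero(toy_labels == 0)
class1_idx = np.flatnonzero(toy_labels == 1)

print('Sample numbers of class 0:', class0_idx)
print('Sample numbers of class 1:', class1_idx)
print('First three class 1 samples:', class1_idx[:3])
```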

In [None]:
# In this cell - change parameter i where instructed in order to 
# select 5 different class 0 samples and plot them together.
fig, ax = plt.subplots()

i = 1 ### CHANGE PARAMETER HERE ###
plt.plot(data[i], color=class_colors[labels[i]])
print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

i = 1 ### CHANGE PARAMETER HERE ###
plt.plot(data[i], color=class_colors[labels[i]])
print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

i = 1 ### CHANGE PARAMETER HERE ###
plt.plot(data[i], color=class_colors[labels[i]])
print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

i = 1 ### CHANGE PARAMETER HERE ###
plt.plot(data[i], color=class_colors[labels[i]])
print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

i = 1 ### CHANGE PARAMETER HERE ###
plt.plot(data[i], color=class_colors[labels[i]])
print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

plt.ylim([-3.5, 3.5])
plt.title('Walking on cement')
ax.set_ylabel('Standardised x-axis accelerometer data')
ax.set_xlabel('Data point number')

+ Select 5 different class 1 samples and plot them together.
This time, we'll use a for loop to make the code more compact.

In [None]:
# In this cell - change the list of sample numbers where instructed in order to 
# select 5 different class 1 samples and plot them together.
fig, ax = plt.subplots()
samples = [0, 0, 0, 0, 0] ### CHANGE PARAMETER HERE ###

for i in samples:
    plt.plot(data[i], color=class_colors[labels[i]])
    print('sample', i, 'class', str(labels[i]), class_names[labels[i]])

plt.ylim([-3.5, 3.5])
plt.title('Walking on carpet')
ax.set_ylabel('Standardised x-axis accelerometer data')
ax.set_xlabel('Data point number')

## End of exercise 1a

## 6.1 Discussion 1b
Do the class 0 and class 1 samples look different?

In what way?

What class would you say the sample below belongs to?

In [None]:
sample_number = 500 
sample_label = labels[sample_number]
class_name = class_names[sample_label]
fig, ax = plt.subplots()
plt.plot(data[sample_number], color='darkred')
txt = 'Sample '+str(sample_number)+': Cement or carpet?\nDo you recognise the data\'s pattern?'
plt.suptitle(txt)
ax.set_ylabel('Standardised x-axis accelerometer data')
ax.set_xlabel('Data point number')

Were you right? Let's find out.

In [None]:
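sample_number = 500  ### CHANGE PARAMETER HERE ###
# Look up the label again so that it always matches sample_number
sample_label = labels[sample_number]
class_name = class_names[sample_label]
print('sample_number:', sample_number)
print('sample_label:', sample_label)
print('class_name:', class_name)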

# 7 Examine the balance of the dataset
How many class 0 samples are there? How many class 1 samples?

Here we use the Pandas library (pd) to create a bar chart. First we must create a pandas DataFrame, labels_df, from our vector of labels.

In [None]:
print('Number of samples of class 0 (cement)', (labels == 0).sum())
print('Number of samples of class 1 (carpet)', (labels == 1).sum())
print('Balance:', (labels == 0).sum() / ((labels == 0).sum()+(labels == 1).sum()), 'class 0')

fig, ax = plt.subplots()
labels_df = pd.DataFrame(labels) # Create a pandas DataFrame
labels_df[0].value_counts().reindex([0, 1]).plot(kind='bar', color=class_colors)
plt.title('Dataset balance')
ax.set_ylabel('Number of samples')
ax.set_xlabel('Class')
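
The class counts can also be computed in a single call. The sketch below uses `np.bincount` on a small, made-up label vector (`toy_labels` is hypothetical) and derives the proportion of each class from the counts.

```python
import numpy as np

# Hypothetical label vector: 3 samples of class 0 and 5 of class 1
toy_labels = np.array([0, 1, 1, 0, 1, 1, 1, 0])

# counts[k] is the number of samples with label k; minlength guards against a missing class
counts = np.bincount(toy_labels, minlength=2)
proportions = counts / counts.sum()

print('Counts per class:', counts)
print('Proportion per class:', proportions)
```

A dataset is balanced when these proportions are roughly equal; here class 1 dominates.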

# 8 Load a pre-prepared balanced dataset

In [None]:
url = url_root+'/SonyAIBORobotSurface1_IoC_BALANCED.txt'
robot_df = pd.read_csv(url, sep='\t', header=None)
print('Loaded from', url)
robot_data_bal = robot_df.values
print('The shape of the balanced dataset, robot_data_bal, is', robot_data_bal.shape)

In [None]:
labels_bal = robot_data_bal[:,0] # '_bal' for 'balanced'
data_bal = robot_data_bal[:,1:]

# Change from classes 1 and 2 to classes 0 and 1
labels_bal = labels_bal - 1
labels_bal = labels_bal.astype(int)

print('Number of samples of class 0 (cement)', (labels_bal == 0).sum())
print('Number of samples of class 1 (carpet)', (labels_bal == 1).sum())

fig, ax = plt.subplots()
labels_df = pd.DataFrame(labels_bal) # Create a pandas DataFrame
labels_df[0].value_counts().reindex([0, 1]).plot(kind='bar', color=class_colors)
plt.title('Dataset balance')
ax.set_ylabel('Number of samples')
ax.set_xlabel('Class')

# 9 Split the dataset into development and final test datasets

![The dataset is split into two, unequal sets](images/final_test_dataset.png "Split into development and final test datasets")

In [None]:
final_test_set_size = 100
# Rename as x and y, for convenience.
x = data_bal
y = labels_bal

x_dev, x_finaltest, y_dev, y_finaltest = train_test_split(
    x, y, test_size=final_test_set_size, random_state=21, stratify=y)
print('The shape of x_dev is', x_dev.shape)
print('The shape of x_finaltest is', x_finaltest.shape)
print('Development data:')
print('Number of samples of class 0', (y_dev == 0).sum())
print('Number of samples of class 1', (y_dev == 1).sum())
print('Final test data:')
print('Number of samples of class 0', (y_finaltest == 0).sum())
print('Number of samples of class 1', (y_finaltest == 1).sum())
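
To see what `stratify` does, the sketch below runs the same kind of split on a deliberately imbalanced, made-up dataset (`x_toy` and `y_toy` are hypothetical). With stratification, the 2:1 class ratio of the full set is preserved in both parts of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy set: 20 samples of class 0 and 10 of class 1
x_toy = np.arange(60, dtype=float).reshape(30, 2)
y_toy = np.array([0] * 20 + [1] * 10)

# test_size given as an integer is an absolute number of samples
x_train, x_test, y_train, y_test = train_test_split(
    x_toy, y_toy, test_size=6, random_state=21, stratify=y_toy)

print('Train shape:', x_train.shape, 'class counts:', np.bincount(y_train))
print('Test shape:', x_test.shape, 'class counts:', np.bincount(y_test))
```

Fixing `random_state` makes the split reproducible, so the same samples end up in the test set each time the cell is run.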

These datasets could now be saved to file and reloaded in the next notebook. Instead, we'll load a prepared dataset in the next notebook.
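
For reference, one possible way to save and reload such arrays is sketched below, using NumPy's `.npz` format and a temporary directory. The file name and the toy arrays are hypothetical; this is not how the prepared dataset for the next notebook was produced.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-ins for a development set
x_dev_toy = np.arange(12, dtype=float).reshape(4, 3)
y_dev_toy = np.array([0, 1, 0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_dev_set.npz')

    # Save both arrays into a single compressed archive, under chosen names
    np.savez_compressed(path, x_dev=x_dev_toy, y_dev=y_dev_toy)

    # Reload the arrays by name
    with np.load(path) as archive:
        x_loaded = archive['x_dev']
        y_loaded = archive['y_dev']

print('Data reloaded unchanged:', np.array_equal(x_loaded, x_dev_toy))
print('Labels reloaded unchanged:', np.array_equal(y_loaded, y_dev_toy))
```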