# "Random Convolutional Kernel Transform"
> "understand Rocket paper in simple words"

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [research paper, time series]
- image: images/deployment-journey-reinforcement-learning.png
- hide: false
- search_exclude: false




## Introduction

***ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels*** is a research paper published in October 2019 by Angus Dempster, François Petitjean, and Geoffrey I. Webb. The paper presents a method for transforming time series data with convolutional kernels in order to improve classification accuracy. It stands out by taking the recent success of convolutional neural networks and carrying it over to time series datasets.

The paper can be downloaded from arXiv: [Paper](https://arxiv.org/pdf/1910.13051)



## Time Series data

Time series data is a set of data points, each recorded at a particular point in time. The points are generally sampled or observed at equal intervals of time. Time series classification can be thought of as identifying patterns and signals in a time series that relate to its class.

The authors of this paper promise fast and accurate time series classification. They propose that features generated by applying randomly generated kernels to time series data lead to much better classification accuracy.

## Kernels

A kernel, in simple terms, is a small matrix used to modify images. Let's try to understand kernels using an example:

Here is a 3 x 3 kernel used to sharpen images:

$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0  \end{bmatrix}$

To sharpen an image with the above kernel, we slide the kernel over the image and, at each pixel, take the dot product between the kernel and the 3 x 3 patch of pixels around it (multiply element-wise, then sum). The resulting image is a sharpened version of the original image. Observe the gif below to see a live version of the kernel dot product in motion.

The following example from the setosa.io site demonstrates how kernels can change images.
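
The sliding dot product described above can be sketched in a few lines of plain Python. This is only an illustration: the 4 x 4 `image` is made-up data, `apply_kernel` is a hypothetical helper name, and border pixels are skipped rather than padded.

```python
# 3 x 3 sharpening kernel from the text.
SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_kernel(image, kernel):
    """Slide the kernel over the image (no padding, so borders are dropped).

    Each output pixel is the sum of the element-wise products between the
    kernel and the patch of the image underneath it.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[r + i][c + j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical grayscale image: a bright 2 x 2 square on a dark background.
image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]

sharpened = apply_kernel(image, SHARPEN)
print(sharpened)  # [[130, 130], [130, 130]]
```

The bright pixels jump from 50 to 130 because they differ from their neighbours, while a perfectly flat region would be left unchanged since the kernel weights sum to 1.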

### 5 parameters of kernels

A kernel has 5 parameters with which it can be configured.

| Parameter | Description | Value logic |
| ----------- | ----------- | --------- |
| Bias | Bias is added to the result of the convolution operation between the input time series and the weights of the given kernel | Bias is sampled from a uniform distribution, $b \sim \mathcal{U}(-1, 1)$ |
| Size (Length) | Size defines the number of rows and columns a kernel has. The image example above has 3 rows and 3 columns; a Rocket kernel is one-dimensional, so its size is simply its length | Length is selected randomly from {7, 9, 11} with equal probability, making kernels considerably shorter than the input time series in most cases |
| Weights | The values that make up the kernel matrix are the weights | The weights are sampled from a normal distribution, $\forall w \in W,\ w \sim \mathcal{N}(0, 1)$, and are mean centered after being set, $\omega = W - \overline{W}$. As such, most weights are relatively small, but can take on larger magnitudes |
| Dilation | Dilation spreads a kernel over the input such that, with a dilation of two, the weights in a kernel are convolved with every second element of the input time series | Dilation is sampled on an exponential scale, $d = \lfloor 2^x \rfloor$, $x \sim \mathcal{U}(0, A)$, where $A = \log_2 \frac{l_{input} - 1}{l_{kernel} - 1}$ |
| Padding | Padding involves appending values (typically zero) to the start and end of the input time series such that the middle weight of a kernel aligns with the first value of the input time series at the start of the convolution | When each kernel is generated, a decision is made (at random, with equal probability) whether or not padding will be used when applying the kernel |
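
As a rough sketch of the sampling rules in the table, the function below draws one kernel using only Python's standard library. The name `sample_kernel`, the dictionary layout, and the check on `input_length` are my own assumptions for illustration. The padding amount is derived from the table's description (enough zeros for the middle weight to sit on the first value), not copied from any official implementation.

```python
import math
import random


def sample_kernel(input_length, rng):
    """Draw one random kernel following the 'Value logic' column of the table."""
    # Assumption: the series is at least as long as the longest kernel.
    if input_length < 11:
        raise ValueError("input_length must be at least 11 for this sketch")

    # Length: one of {7, 9, 11} with equal probability.
    length = rng.choice([7, 9, 11])

    # Weights: w ~ N(0, 1), then mean centered.
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]

    # Bias: b ~ U(-1, 1).
    bias = rng.uniform(-1.0, 1.0)

    # Dilation: d = floor(2^x), x ~ U(0, A), A = log2((l_input - 1) / (l_kernel - 1)).
    upper = math.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, upper))

    # Padding: used or not with equal probability. When used, the number of
    # zeros on each side is the distance from the middle weight to the first weight.
    use_padding = rng.random() < 0.5
    padding = ((length - 1) * dilation) // 2 if use_padding else 0

    return {
        "length": length,
        "weights": weights,
        "bias": bias,
        "dilation": dilation,
        "padding": padding,
    }


rng = random.Random(0)  # seeded only so the sketch is repeatable
kernel = sample_kernel(input_length=100, rng=rng)
print(kernel["length"], kernel["dilation"], kernel["padding"])
```

With this choice of `A`, the span covered by a dilated kernel, `(length - 1) * dilation + 1`, never exceeds the length of the input series.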

## Features generated by Rocket kernel

Rocket computes two aggregate features from the convolution of each kernel with the input time series. The first comes from the well-known global max pooling, and the second from a new method, the proportion of positive values (ppv).

### Max pooling

Global max pooling picks the single maximum value from the result of the convolution, whereas ordinary max pooling picks the maximum value within each window of a given pool size.
Assuming that the output of the convolution is 0, 1, 2, 2, 5, 1, 2, global max pooling outputs 5, whereas ordinary max pooling with a pool size of 3 (and a stride of 1) outputs 2, 2, 5, 5, 5.
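
The numbers in this example can be reproduced with a short sketch. The helper names `global_max_pool` and `max_pool` are hypothetical, and a stride of 1 is assumed for the ordinary max pooling.

```python
def global_max_pool(values):
    """Return the single largest value of the whole convolution output."""
    return max(values)


def max_pool(values, pool_size, stride=1):
    """Return the largest value inside each sliding window of `pool_size`."""
    last_start = len(values) - pool_size
    return [
        max(values[start:start + pool_size])
        for start in range(0, last_start + 1, stride)
    ]


conv_output = [0, 1, 2, 2, 5, 1, 2]

print(global_max_pool(conv_output))        # 5
print(max_pool(conv_output, pool_size=3))  # [2, 2, 5, 5, 5]
```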

### Proportion of positive values

I am using the authors' own words to describe ppv.

> ppv directly captures the proportion of the input which matches a given pattern, i.e., for which the output of the convolution operation is positive. The ppv works in conjunction with the bias term. The bias term acts as a kind of ‘threshold’ for ppv. A positive bias value means that ppv captures the proportion of the input reflecting even ‘weak’ matches between the input and a given pattern, while a negative bias value means that ppv only captures the proportion of the input reflecting ‘strong’ matches between the input and the given pattern.
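
To make the threshold role of the bias concrete, here is a small sketch. The values in `raw` are made-up convolution outputs before the bias is added, and `ppv` is a hypothetical helper name.

```python
def ppv(conv_output):
    """Proportion of positive values in a convolution output."""
    positives = sum(1 for value in conv_output if value > 0)
    return positives / len(conv_output)


# Hypothetical convolution output before the bias term is added.
raw = [-0.8, -0.3, 0.2, 0.6, -0.1, 1.4]

results = {}
for bias in (-0.5, 0.0, 0.5):
    shifted = [value + bias for value in raw]
    results[bias] = ppv(shifted)
    print(f"bias={bias:+.1f} ppv={results[bias]:.2f}")

# bias=-0.5 ppv=0.33  (only the strong matches stay positive)
# bias=+0.0 ppv=0.50
# bias=+0.5 ppv=0.83  (weak matches are counted as well)
```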

## Rocket usage

Now that we understand what kernels are and how Rocket generates two features from the convolution of a kernel with the input time series, let's understand how to use it.

The time series data is provided as input to the Rocket transform method. The number of kernels (i.e. k) is set to 10,000 by default. Since each kernel produces two features, an input with a single time series feature results in 20,000 features after the Rocket transform.

The transformed feature table can now be used with any classification algorithm; the authors advise linear algorithms such as a ridge regression classifier or logistic regression.
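
The shape of the transformed table can be illustrated with a deliberately simplified toy transform. This is not the authors' implementation: `make_kernels` and `transform` are hypothetical names, dilation and padding are left out to keep the sketch short, the data is random, only 100 kernels are used instead of 10,000, and the order of the two features per kernel is my own choice.

```python
import random


def make_kernels(num_kernels, rng):
    """Simplified random kernels (no dilation, no padding in this sketch)."""
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([7, 9, 11])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        centered = [w - mean for w in weights]
        bias = rng.uniform(-1.0, 1.0)
        kernels.append((centered, bias))
    return kernels


def transform(dataset, kernels):
    """Return one row per series, with two features (ppv, max) per kernel."""
    features = []
    for series in dataset:
        row = []
        for weights, bias in kernels:
            n_out = len(series) - len(weights) + 1
            conv = [
                bias + sum(w * series[t + j] for j, w in enumerate(weights))
                for t in range(n_out)
            ]
            row.append(sum(1 for v in conv if v > 0) / n_out)  # ppv
            row.append(max(conv))                              # global max
        features.append(row)
    return features


rng = random.Random(42)
# Hypothetical data: 5 univariate series with 50 time steps each.
dataset = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(5)]

kernels = make_kernels(100, rng)
features = transform(dataset, kernels)
print(len(features), len(features[0]))  # 5 200
```

Each series becomes one row with `2 * num_kernels` columns, and that table is what would be handed to a linear classifier, for example scikit-learn's `RidgeClassifierCV`.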


## Rocket v/s others

Rocket's approach of creating a large number of random kernels and generating two features from each is unique. Rocket distinguishes itself on several points, which we discuss below.

### Rocket v/s neural nets

1. Rocket doesn’t use a hidden layer or any non-linearities
2. Features produced by Rocket are independent of each other
3. Rocket works with any kind of classifier

### Rocket v/s CNN

1. Rocket uses a very large number of kernels
2. In a CNN, a group of kernels tends to share the same size, dilation and padding. Rocket has all 5 parameters randomized.
3. In a CNN, dilation increases exponentially with depth; Rocket has random dilation values
4. CNNs typically use only average/max pooling. Rocket adds its own pooling, ppv, which the authors report gives much better classification accuracy on time series.


## Rocket performance
### Accuracy
### Time taken to train

## Example
