# Contents
* [Intro](#Intro)
* [Imports and config](#Imports-and-config)
* [Load data](#Load-data)
* [Preprocess](#Preprocess)
* [Train test split](#Train-test-split)
* [Minimally Random Convolutional Kernel Transform](#Minimally-Random-Convolutional-Kernel-Transform)
  * [Ternary results](#Ternary-results)
  * [Binary Results](#Binary-Results)
* [Discussion](#Discussion)

## Intro

This notebook explores the MINIROCKET time series classification algorithm on Mel Frequency Cepstral Coefficients (MFCCs) extracted from short-duration samples. Both the ternary valence case and the three one-vs-rest binary cases are considered. MINIROCKET outperformed the majority-class dummy classifiers in all cases, though only marginally in the positive/non-positive case.

## Imports and config

In [1]:
# Extensions
%load_ext lab_black
%load_ext nb_black
%load_ext autotime

In [2]:
# Core
import numpy as np
import pandas as pd

# metrics
from sklearn.metrics import classification_report, confusion_matrix

# util
import swifter

# display outputs w/o print calls
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

# suppress warnings
import warnings

warnings.filterwarnings("ignore")

time: 3.66 s


In [3]:
from tsai.all import *

computer_setup()

os             : Windows-10-10.0.22000-SP0
python         : 3.8.12
tsai           : 0.2.23
fastai         : 2.5.2
fastcore       : 1.3.26
torch          : 1.9.1+cpu
n_cpus         : 8
device         : cpu
time: 8.06 s


In [4]:
SEED = 2021

# Location of parquet
PARQUET_DF_FOLDER = "../5.0-mic-extract_spectrograms_and_MFCCs_short"

# Location where this notebook will output
DATA_OUT_FOLDER = "."

# The preprocessed data from the Unified Multilingual Dataset of Emotional Human utterances
WAV_DIRECTORY = (
    "../../unified_multilingual_dataset_of_emotional_human_utterances/data/preprocessed"
)

time: 8 ms


## Load data

In [5]:
short_df = pd.read_parquet(f"{PARQUET_DF_FOLDER}/short_plus.parquet")
short_df.head(1)

,file,duration,source,speaker_id,speaker_gender,emo,valence,lang1,lang2,neg,neu,pos,length,padded,mfcc,melspec_db
0,01788+BAUM1+BAUM1.s028+f+hap+1+tur+tr-tr.wav,0.387,BAUM1,BAUM1.s028,f,hap,1,tur,tr-tr,0,0,1,short,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...]","[[-680.11646, -680.11646, -673.7514, -377.4224, -281.58826, -261.989, -171.36475, -55.95906, 1.2606233, 15.852701, -9.603989, -57.960983, -107.54922, -140.82532, -152.95964, -169.95496], [0.0, 0.0, 8.79389, 66.162895, 79.53461, 100.93402, 75.350586, 13.998974, -14.617619, -17.756765, -5.6782565, 8.551853, 14.135569, 3.8511767, -6.7314606, -5.6710396], [0.0, 0.0, 8.264061, 9.75589, 13.253286, 15.912096, 18.082317, 2.4743164, -16.232258, -29.686052, -31.33509, -27.387304, -19.973206, -4.8711815, 1.358885, 10.830128], [0.0, 0.0, 7.477417, 24.733551, 16.511929, 10.745639, 15.796231, 35.82299, ...","[[-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -78.2808, -78.36134, -77.20024, -80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -80.0], [-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -76.948425, -75.62396, -73.44333, -62.47532, -59.695614, -63.3192, -68.97307, -69.830055, -71.20323, -74.88162], [-80.0, -80.0, -80.0, -80.0, -80.0, -80.0, -65.29121, -44.23695, -34.33359, -31.309872, -37.83037, -55.600906, -74.79756, -76.61576, -80.0, -80.0], [-80.0, -80.0, -80.0, -80.0, -80.0, -58.58845, -29.926083, -13.307529, -13.760824, -19.257774, -27.894817, -47.51501, -57.81294, -67.32547, -80.0, -61.717278], [-80.0..."


time: 289 ms


In [6]:
df = short_df[["speaker_id", "neg", "neu", "pos", "valence", "mfcc"]]

time: 10 ms


## Preprocess

MINIROCKET needs each of the 20 nested MFCC coefficient arrays (16 frames each) as its own column, one column per channel, rather than all of them packed into a single column of arrays.

In [7]:
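# Expand each sample's nested MFCC array into one column per coefficient (0-19),
# aligned on the original index, and drop the nested `mfcc` column
df = df.drop(columns="mfcc").merge(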
    pd.concat(df.mfcc.swifter.apply(lambda _: pd.DataFrame(_).T).tolist()).set_index(
        df.index
    ),
    left_index=True,
    right_index=True,
)

Pandas Apply: 100%|██████████| 480/480 [00:00<00:00, 793.39it/s]

time: 772 ms
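The reshaping above relies on `swifter` and a transpose trick. As a point of comparison, here is a minimal pandas-only sketch of the same idea on toy data: each row holds a `(n_coeffs, n_frames)` array, and the result has one column per coefficient whose cells are that coefficient's time series. The toy sizes, random values, and the names `toy`, `expanded`, and `nested` are hypothetical; this is not the notebook's code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2021)
n_samples, n_coeffs, n_frames = 2, 3, 4  # the notebook's data is 480 x 20 x 16

# Toy stand-in for the nested `mfcc` column: one (n_coeffs, n_frames) array per row
toy = pd.DataFrame(
    {
        "valence": ["-1", "1"],
        "mfcc": [rng.normal(size=(n_coeffs, n_frames)) for _ in range(n_samples)],
    }
)

# One column per coefficient; each cell holds that coefficient's time series
expanded = pd.DataFrame(
    {i: [arr[i] for arr in toy["mfcc"]] for i in range(n_coeffs)},
    index=toy.index,
)
nested = toy.drop(columns="mfcc").join(expanded)
print(nested.shape)  # (2, 4): `valence` plus 3 channel columns
```

Building the columns from a dict keeps the index aligned without a separate `merge`.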





In [8]:
len(df)
df.head(1)

480

,speaker_id,neg,neu,pos,valence,0,1,2,3,4,...,10,11,12,13,14,15,16,17,18,19
0,BAUM1.s028,0,0,1,1,"[-680.11646, -680.11646, -673.7514, -377.4224, -281.58826, -261.989, -171.36475, -55.95906, 1.2606233, 15.852701, -9.603989, -57.960983, -107.54922, -140.82532, -152.95964, -169.95496]","[0.0, 0.0, 8.79389, 66.162895, 79.53461, 100.93402, 75.350586, 13.998974, -14.617619, -17.756765, -5.6782565, 8.551853, 14.135569, 3.8511767, -6.7314606, -5.6710396]","[0.0, 0.0, 8.264061, 9.75589, 13.253286, 15.912096, 18.082317, 2.4743164, -16.232258, -29.686052, -31.33509, -27.387304, -19.973206, -4.8711815, 1.358885, 10.830128]","[0.0, 0.0, 7.477417, 24.733551, 16.511929, 10.745639, 15.796231, 35.82299, 40.227936, 36.642612, 23.608091, 11.745134, 4.5839405, 6.3710694, 8.985272, 17.496689]","[0.0, 0.0, 6.374815, -8.809654, -16.542591, -18.163967, -8.233423, -7.6783943, -6.66774, -1.172962, 0.50424504, -2.4193215, -2.3986745, -1.103419, -2.099951, 1.6683358]",...,"[0.0, 0.0, -2.0559676, -14.131269, -10.666561, -4.1570997, -0.5074165, -8.999041, -1.8765199, 5.3671103, 10.71143, 13.516993, 9.113131, 3.2246933, -3.8375764, -5.1680336]","[0.0, 0.0, -3.1609914, 10.403749, 14.534487, 13.064995, 8.332318, 5.8334017, 10.699654, 14.578323, 15.829313, 17.247204, 11.353828, 13.147543, 13.669, 14.035996]","[0.0, 0.0, -4.0377975, -29.72546, -22.83678, -16.445942, -4.9928675, 15.350084, 19.407963, 14.843939, 8.478773, -6.0084887, -10.568147, -4.7380877, -3.8080454, -3.8993516]","[0.0, 0.0, -4.599332, -11.415039, -8.413191, -10.51317, -10.130811, -0.12358958, -1.6337358, -8.780941, -11.972395, -12.327576, -9.829273, -11.770202, -9.532415, -6.396286]","[0.0, 0.0, -4.936611, -15.237455, -11.919718, -7.268784, 1.2249193, -12.30282, -14.568584, -16.464142, -16.866993, -10.407554, -4.7797623, -5.87093, -10.573454, -12.771958]","[0.0, 0.0, -5.071619, -10.831688, -9.4410095, -5.276347, -10.193133, -14.760956, -20.380651, -19.564754, -12.378426, -11.466377, -15.276484, -11.285213, -9.57357, -4.8591814]","[0.0, 0.0, -4.924422, 19.01656, 10.312002, 3.495909, 4.451464, 3.6062825, -5.5586996, -11.616903, -9.106398, -1.5299711, -2.1217766, -6.1071005, -8.450521, -8.974888]","[0.0, 0.0, -4.57422, 4.5666795, 1.1526635, -0.65908664, -11.719039, -12.666076, -12.984398, -12.09377, -9.214006, -6.897912, -9.373976, -12.786033, -14.691204, -17.516254]","[0.0, 0.0, -4.1403975, -18.84467, -15.528858, -11.019453, -6.235977, -13.344015, -12.349755, -7.268405, -4.918732, -1.0433177, 2.1604173, -0.69934475, -2.2053201, -1.2783842]","[0.0, 0.0, -3.5799732, -12.330613, -10.483339, -11.822466, -8.398724, -5.5190945, -1.1962025, 2.165871, -1.5818354, -5.6133785, -4.829237, 2.7716446, 4.678732, 4.0262184]"


time: 45 ms


## Train test split

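The split is made by speaker rather than by sample: 30% of the unique speakers are held out, and all of their utterances go to the test set. Because no speaker appears in both sets, the classifier cannot score well merely by recognizing speaker-specific characteristics.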

In [9]:
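# Hold out 30% of the unique speakers (not 30% of the samples) for testing
test_speakers = (
    pd.DataFrame(np.unique(df.speaker_id))
    .sample(frac=0.30, random_state=SEED)[0]
    .values
)

# True for every sample spoken by a held-out speaker
criterion = df.speaker_id.isin(test_speakers)

# Keep only the MFCC channel columns as features; `valence` is the ternary target
drop_columns = ["speaker_id", "neg", "neu", "pos", "valence"]
test_df, train_df = df.loc[criterion], df.loc[~criterion]
X_test, y_test = test_df.drop(columns=drop_columns), test_df.valence
X_train, y_train = train_df.drop(columns=drop_columns), train_df.valence

# Sanity check: every sample lands in exactly one set
len(df) == len(y_test) + len(y_train)
print(f"{len(y_test)} in test, {len(y_train)} in train")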

True

190 in test, 290 in train
time: 36 ms
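For comparison, scikit-learn ships a group-aware splitter that expresses the same "hold out whole speakers" idea. The sketch below swaps in `GroupShuffleSplit` on toy data with five hypothetical speakers; it is an alternative, not the notebook's method, and with the real data it would select a different set of test speakers than the cell above.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 10 utterances from 5 hypothetical speakers
speaker_ids = np.array(["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4", "s5", "s5"])
X = np.arange(len(speaker_ids)).reshape(-1, 1)

# For GroupShuffleSplit, a float test_size is a fraction of groups (speakers), not samples
splitter = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=2021)
train_idx, test_idx = next(splitter.split(X, groups=speaker_ids))

print("test speakers:", sorted(set(speaker_ids[test_idx])))
print("train speakers:", sorted(set(speaker_ids[train_idx])))
```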


## Minimally Random Convolutional Kernel Transform

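MiniRocket was [published in August 2021](https://doi.org/10.1145/3447548.3467231) at KDD '21. Its authors report accuracy comparable to the state of the art on the UCR time series classification benchmark while being significantly faster than any other method of comparable accuracy, including its predecessor, ROCKET.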
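To make the transform less of a black box, here is a heavily simplified, univariate NumPy sketch of its core idea as described in the paper: a fixed set of 84 length-9 kernels whose weights are -1 and 2, applied at several dilations, with each convolution output pooled into "proportion of positive values" (PPV) features relative to bias thresholds. The dilations `(1, 2)`, the quantile biases taken from the same series, and all function names are simplifying assumptions. This is not the `tsai` or `sktime` implementation, which also handles multiple channels (such as the 20 MFCC columns) and produces roughly 10,000 features by default before passing them to a linear classifier.

```python
import itertools

import numpy as np


def minirocket_kernels():
    """84 fixed kernels: length 9, three weights of 2 and six of -1."""
    kernels = []
    for positions in itertools.combinations(range(9), 3):
        w = -np.ones(9)
        w[list(positions)] = 2.0
        kernels.append(w)
    return np.array(kernels)  # shape (84, 9); every row sums to 0


def dilated_conv(x, w, dilation):
    """Zero-padded, same-length sliding dot product with taps spaced `dilation` apart."""
    pad = 4 * dilation
    xp = np.pad(x, pad)
    n = len(x)
    return sum(w[k] * xp[k * dilation : k * dilation + n] for k in range(9))


def ppv_features(x, dilations=(1, 2), quantiles=(0.25, 0.5, 0.75)):
    """One PPV feature per (kernel, dilation, bias) combination."""
    features = []
    for w in minirocket_kernels():
        for d in dilations:
            c = dilated_conv(x, w, d)
            # Simplification (assumption): biases are quantiles of this same output,
            # whereas MiniRocket draws them from a random training example.
            for b in np.quantile(c, quantiles):
                features.append(np.mean(c > b))  # proportion of positive values
    return np.array(features)


x = np.sin(np.linspace(0, 3 * np.pi, 16))  # toy series, 16 frames like one MFCC row
print(ppv_features(x).shape)  # (504,) = 84 kernels x 2 dilations x 3 biases
```

Because every kernel's weights sum to zero, a constant stretch of the input produces a zero response, so the features reflect the shape of the series rather than its level.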

In [10]:
model = MiniRocketClassifier(random_state=SEED)

time: 4 ms


### Ternary results

In [11]:
fitted_minirocket = model.fit(X_train, y_train)

time: 625 ms


How well would a dummy classifier that always predicts the majority class do?

In [12]:
counts = y_test.value_counts()
len_test = len(y_test)
for valence in ("-1", "0", "1"):
    count = counts[valence]
    print(f"{count} samples of valence {valence}: {100 * count / len_test:.2f}% of {len_test}")

66 samples of valence -1: 34.74% of 190
85 samples of valence 0: 44.74% of 190
39 samples of valence 1: 20.53% of 190
time: 16 ms
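The baseline above is computed by hand from the test-set label counts. scikit-learn's `DummyClassifier(strategy="most_frequent")` gives the same number when scored on a label vector with those counts, as in this sketch. It is fit on the test labels here only to reproduce the notebook's test-set majority proportion; a baseline fit on `y_train` would predict the training majority class, which may differ.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Rebuild a label vector with the test-set class counts reported above
y = np.array(["-1"] * 66 + ["0"] * 85 + ["1"] * 39)
X = np.zeros((len(y), 1))  # features are ignored by the "most_frequent" strategy

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)
print(f"{dummy.score(X, y):.4f}")  # 0.4474
```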


How well did MINIROCKET do?

In [13]:
# Predict once and reuse the predictions for both reports
y_pred = fitted_minirocket.predict(X_test)
print(confusion_matrix(y_test, y_pred, labels=["-1", "0", "1"]))
print(classification_report(y_test, y_pred))

[[32 27  7]
 [26 53  6]
 [13 15 11]]
              precision    recall  f1-score   support

          -1       0.45      0.48      0.47        66
           0       0.56      0.62      0.59        85
           1       0.46      0.28      0.35        39

    accuracy                           0.51       190
   macro avg       0.49      0.46      0.47       190
weighted avg       0.50      0.51      0.50       190

time: 286 ms


Test accuracy of 51% (96 of 190) exceeds the majority-class proportion (44.7%) by about 6 percentage points, but recall on the positive class is only 0.28.
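The accuracy, baseline, and recall figures can be recovered directly from the ternary confusion matrix printed above. A short worked check in NumPy:

```python
import numpy as np

# Ternary confusion matrix reported above (rows: true -1, 0, 1; columns: predicted)
cm = np.array([[32, 27, 7], [26, 53, 6], [13, 15, 11]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
majority = cm.sum(axis=1).max() / cm.sum()  # share of the largest true class
recall = np.diag(cm) / cm.sum(axis=1)       # per-class recall

print(f"accuracy {accuracy:.3f}, majority {majority:.3f}, margin {accuracy - majority:.3f}")
# accuracy 0.505, majority 0.447, margin 0.058
print(np.round(recall, 2), round(recall.mean(), 2))  # [0.48 0.62 0.28] 0.46
```

The mean of the per-class recalls matches the `macro avg` recall of 0.46 in the report, whereas an always-majority classifier would score a macro recall of 1/3.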

### Binary Results

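Next, we repeat the analysis for the three one-vs-rest binary cases (negative vs. non-negative, neutral vs. non-neutral, positive vs. non-positive), using the `neg`, `neu`, and `pos` indicator columns as targets and the same speaker-based split. First, we set up the labels.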

In [14]:
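from collections import namedtuple

# One (name, y_test, y_train) bundle per one-vs-rest target column
OvrSet = namedtuple("OvrSet", "name, y_test, y_train")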
binary_valence = [
    OvrSet(
        name=valence,
        y_test=df.loc[criterion][valence],
        y_train=df.loc[~criterion][valence],
    )
    for valence in ("neg", "neu", "pos")
]

time: 29 ms
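The `neg`, `neu`, and `pos` columns appear to be one-vs-rest indicators of the ternary `valence` label: the first row has valence 1 and `pos` 1, and the binary test-set class sizes (66, 85, 39) match the ternary counts. Assuming that relationship, the indicators could be derived from `valence` alone, as in this sketch with hypothetical labels.

```python
import pandas as pd

# Hypothetical ternary valence labels, stored as strings like the notebook's `valence` column
valence = pd.Series(["-1", "0", "1", "0", "-1", "0"])

# One 0/1 indicator per class: 1 for that class, 0 for "the rest"
codes = {"neg": "-1", "neu": "0", "pos": "1"}
indicators = pd.DataFrame({name: (valence == code).astype(int) for name, code in codes.items()})

# Counts per indicator: neg 2, neu 3, pos 1
print(indicators.sum())
```

Each row has exactly one indicator set, so the three binary targets partition the samples in the same way as the ternary label.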


How does MINIROCKET do in comparison to dummy classifiers in the binary cases?

In [15]:
for labels in binary_valence:
    # Use a separate name so the ternary `y_test` is not overwritten
    y_true = labels.y_test
    percent = (100 * y_true.sum()) / len(y_true)
    print(
        f"majority classification percentage for {labels.name} valence: {max(percent, 100 - percent):.3f}"
    )
    # Refit the same MiniRocket pipeline on this one-vs-rest target
    y_pred = model.fit(X_train, labels.y_train).predict(X_test)
    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred))

majority classification percentage for neg valence: 65.263
[[116   8]
 [ 50  16]]
              precision    recall  f1-score   support

           0       0.70      0.94      0.80       124
           1       0.67      0.24      0.36        66

    accuracy                           0.69       190
   macro avg       0.68      0.59      0.58       190
weighted avg       0.69      0.69      0.65       190

majority classification percentage for neu valence: 55.263
[[90 15]
 [54 31]]
              precision    recall  f1-score   support

           0       0.62      0.86      0.72       105
           1       0.67      0.36      0.47        85

    accuracy                           0.64       190
   macro avg       0.65      0.61      0.60       190
weighted avg       0.65      0.64      0.61       190

majority classification percentage for pos valence: 79.474
[[149   2]
 [ 36   3]]
              precision    recall  f1-score   support

           0       0.81      0.99      0.89      

In the negative/non-negative case, MINIROCKET's test accuracy of 69.5% (132 of 190) beat the 65.3% majority-class baseline by about 4.2 percentage points.

In the neutral/non-neutral case, MINIROCKET's test accuracy of 63.7% (121 of 190) beat the 55.3% baseline by about 8.4 percentage points.

In the positive/non-positive case, MINIROCKET's test accuracy of 80.0% (152 of 190) beat the 79.5% baseline by only about 0.5 percentage points, and it correctly identified just 3 of the 39 positive samples.
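The accuracies, baselines, and margins for the three binary cases can be recomputed from the binary confusion matrices printed above; recall on the target class shows how often each model actually finds it.

```python
import numpy as np

# Binary confusion matrices reported above (rows: true 0, 1; columns: predicted 0, 1)
matrices = {
    "neg": np.array([[116, 8], [50, 16]]),
    "neu": np.array([[90, 15], [54, 31]]),
    "pos": np.array([[149, 2], [36, 3]]),
}

for name, cm in matrices.items():
    accuracy = np.trace(cm) / cm.sum()
    majority = cm.sum(axis=1).max() / cm.sum()  # always-predict-majority baseline
    recall_1 = cm[1, 1] / cm[1].sum()           # share of the target class found
    print(
        f"{name}: accuracy {100 * accuracy:.1f}%, baseline {100 * majority:.1f}%, "
        f"margin {100 * (accuracy - majority):.1f} pts, recall(1) {recall_1:.2f}"
    )
# neg: accuracy 69.5%, baseline 65.3%, margin 4.2 pts, recall(1) 0.24
# neu: accuracy 63.7%, baseline 55.3%, margin 8.4 pts, recall(1) 0.36
# pos: accuracy 80.0%, baseline 79.5%, margin 0.5 pts, recall(1) 0.08
```

Computing the margin from unrounded accuracies avoids the error introduced by subtracting a baseline from an accuracy already rounded to two digits.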

## Discussion

In this notebook, we tested MINIROCKET on the MFCCs of the short set. Both the ternary case and the three one-vs-rest binary cases were considered. MINIROCKET outperformed the majority-class dummy classifier in all cases, by margins ranging from about half a percentage point to about eight percentage points.

Class imbalance was most drastic in the positive/non-positive case (79.5% non-positive), which is also where the margin between MINIROCKET and the dummy classifier was smallest; there MINIROCKET predicted the positive class for only 5 of the 190 test samples.

The MINIROCKET algorithm may have potential, especially if the binary models are ensembled for one-vs-rest classification. On the other hand, although the preprocessing for `tsai` only needs to be computed once, storing a second, column-expanded copy of the MFCC arrays may be cumbersome in comparison to other methods. Still, the MFCC arrays are more lightweight than the spectrogram arrays.
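One way to read "ensembled for one-vs-rest classification" is to train one binary model per valence class and predict the class whose model gives the highest decision score. The sketch below shows that mechanism with scikit-learn's `OneVsRestClassifier`, using a plain `RidgeClassifier` on synthetic features as a stand-in for the MiniRocket pipeline; the data, offsets, and the substitution of a bare ridge model are all assumptions for brevity.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(2021)

# Synthetic stand-in for MiniRocket features: 3 valence classes, 30 samples each,
# with class-dependent means so there is something to learn
labels = np.repeat(["-1", "0", "1"], 30)
offsets = {"-1": -1.0, "0": 0.0, "1": 1.0}
X = rng.normal(size=(90, 5)) + np.array([offsets[v] for v in labels])[:, None]

# One binary ridge model per class; prediction = class with the highest decision score
ovr = OneVsRestClassifier(RidgeClassifier()).fit(X, labels)
scores = ovr.decision_function(X)  # shape (90, 3), one column per class
pred = ovr.classes_[scores.argmax(axis=1)]

print(len(ovr.estimators_), (pred == ovr.predict(X)).all())
```

Taking the argmax over per-class scores lets the three binary models vote jointly instead of being evaluated one at a time.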

[^top](#Contents)