<br/><font size=10>Feature extraction</font><br/>

Feature extraction refers to the process of extracting discriminating features from the input signals using domain knowledge. Traditional features are extracted from the time domain (e.g., variance, mean value, kurtosis), the frequency domain (e.g., fast Fourier transform), and the time-frequency domain (e.g., discrete wavelet transform). These features enrich the distinguishable information regarding user intention.[<sup>1</sup>](#refer-anchor-1)

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Manual-feature-extraction" data-toc-modified-id="Manual-feature-extraction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Manual feature extraction</a></span></li><li><span><a href="#Automatical-feature-extraction" data-toc-modified-id="Automatical-feature-extraction-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Automatical feature extraction</a></span><ul class="toc-item"><li><span><a href="#Autoencoder" data-toc-modified-id="Autoencoder-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Autoencoder</a></span></li></ul></li><li><span><a href="#Reference" data-toc-modified-id="Reference-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Reference</a></span></li></ul></div>

# Manual feature extraction 
Manual feature extraction depends heavily on domain knowledge. For example, neuroscience knowledge is required to extract distinctive features from motor imagery EEG signals. Manual feature extraction is also time-consuming and difficult. When manually extracting features from brain signals, discriminating features such as time-frequency features, wavelet entropy, and band-specific power are common choices.
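To make the idea of hand-crafted features concrete, here is a minimal sketch that computes a few time-domain statistics and a band-specific power for a single channel using only NumPy. The function name `manual_features`, the 160 Hz sampling rate, the synthetic 10 Hz sine, and the 8–13 Hz band are all assumptions chosen for illustration and are not taken from the tutorial's dataset.

```python
import numpy as np

def manual_features(signal, fs, band=(8.0, 13.0)):
    """Return a few hand-crafted features for a 1-D signal sampled at fs Hz.

    Hypothetical helper for illustration only.
    """
    signal = np.asarray(signal, dtype=float)

    # time-domain features
    mean = signal.mean()
    variance = signal.var()
    # kurtosis as the fourth standardized moment (non-excess definition)
    kurtosis = np.mean((signal - mean) ** 4) / variance ** 2

    # frequency-domain features: power spectrum from the real FFT
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_power = spectrum[in_band].sum()

    return {
        "mean": mean,
        "variance": variance,
        "kurtosis": kurtosis,
        "band_power": band_power,
        "relative_band_power": band_power / spectrum.sum(),
    }

# synthetic example: one second of a 10 Hz sine sampled at an assumed 160 Hz
fs = 160
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 10 * t)
feats = manual_features(x, fs)
print(feats)
```

Each feature requires an explicit design decision (which statistic, which band), which is the dependence on domain knowledge described above.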

One of the advantages of deep learning is that it can automatically learn informative features and discover underlying patterns without domain knowledge. In this tutorial, we only introduce how to learn representative features from raw data and do not discuss traditional feature extraction.

# Automatical feature extraction

Representative deep learning models learn pure and representative features automatically from the input data.
These algorithms only perform feature extraction and cannot perform classification on their own.
 
> As shown in the figure[<sup>1</sup>](#refer-anchor-1) below:  
Representative models can be divided into Autoencoder (AE), Restricted Boltzmann Machine (RBM), and Deep Belief Network (DBN). D-AE denotes Deep Autoencoder, which refers to an autoencoder with multiple hidden layers. Likewise, D-RBM denotes Deep Restricted Boltzmann Machine, an RBM with multiple hidden layers. A Deep Belief Network can be composed of AEs or RBMs; therefore, we divide DBN into DBN-AE and DBN-RBM.

![avatar](https://raw.githubusercontent.com/xiangzhang1015/ML_BCI_tutorial/main/tutorial/dlm.PNG)

__Commonly used deep learning algorithms for representation are AE, RBM, DBN, along with their variations.__  
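
As a rough illustration of what "an autoencoder with multiple hidden layers" (D-AE) can look like in PyTorch, here is a minimal sketch. The class name `DeepAutoEncoder`, the layer sizes, and the choice of ReLU and Sigmoid activations are assumptions made for this example and are not prescribed by the survey.

```python
import torch
import torch.nn as nn

class DeepAutoEncoder(nn.Module):
    """Hypothetical D-AE: an autoencoder with several hidden layers."""

    def __init__(self, n_input=512, hidden_sizes=(256, 64)):
        super().__init__()
        sizes = [n_input, *hidden_sizes]

        # encoder: n_input -> 256 -> 64, with a ReLU after every linear layer
        enc_layers = []
        for d_in, d_out in zip(sizes[:-1], sizes[1:]):
            enc_layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.encoder = nn.Sequential(*enc_layers)

        # decoder mirrors the encoder: 64 -> 256 -> n_input
        rev = sizes[::-1]
        dec_layers = []
        for d_in, d_out in zip(rev[:-1], rev[1:]):
            dec_layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        # final activation is a Sigmoid so outputs lie in [0, 1],
        # matching inputs that were min-max normalized
        dec_layers[-1] = nn.Sigmoid()
        self.decoder = nn.Sequential(*dec_layers)

    def forward(self, x):
        encoded = self.encoder(x)      # learned representation
        decoded = self.decoder(encoded)  # reconstruction of the input
        return encoded, decoded

torch.manual_seed(0)
model = DeepAutoEncoder()
x = torch.rand(4, 512)  # 4 random samples standing in for normalized segments
encoded, decoded = model(x)
print(encoded.shape, decoded.shape)
```

Only the `encoded` output would be passed on to a downstream classifier; `decoded` is used solely for the reconstruction loss during training.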

## Autoencoder
In this part, as an example of unsupervised feature extraction via deep learning, we present a simple framework that uses an AE as a feature extractor and feeds the learned features to a standard k-nearest neighbors (KNN) classifier.
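
The code in this section segments the recording with a 50% overlapping window through the `extract` helper, which lives in BCI_functions.ipynb and is not shown here. The sketch below illustrates what such a sliding-window segmentation might look like. It is not the actual `extract` implementation: the function name `sliding_segments`, the toy data, and the decision to drop windows that mix two labels are assumptions made for illustration.

```python
import numpy as np

def sliding_segments(data, n_fea, time_window, moving):
    """Cut `data` ([n_samples, n_fea + 1], last column = label) into windows.

    Returns an array of shape [n_segments, n_fea * time_window + 1]:
    the flattened window followed by its label. Hypothetical helper.
    """
    features, labels = data[:, :n_fea], data[:, n_fea]
    segments = []
    for start in range(0, data.shape[0] - time_window + 1, moving):
        window_labels = labels[start:start + time_window]
        # assumption: keep only windows whose samples all share one label
        if np.all(window_labels == window_labels[0]):
            flat = features[start:start + time_window].reshape(-1)
            segments.append(np.append(flat, window_labels[0]))
    return np.array(segments)

# toy recording: 20 time points, 2 features, labels 0 (first half) and 1 (second half)
toy = np.column_stack([np.arange(40).reshape(20, 2), np.repeat([0, 1], 10)])

# window of 4 samples moved by 2 samples, i.e. 50% overlap
segs = sliding_segments(toy, n_fea=2, time_window=4, moving=2)
print(segs.shape)
```

With a step of half the window length, consecutive segments share half of their samples, which increases the number of training instances obtained from a recording.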

In [1]:
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.utils.data as Data
import myimporter
from BCI_functions import *  # BCI_functions.ipynb contains some functions we might use multiple times in this tutorial

dataset_1 = np.load('1.npy')
print('dataset_1 shape:', dataset_1.shape)

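# keep only labels 0 and 1: remove every instance whose label is in removed_label (10 is 'rest')
removed_label = [2,3,4,5,6,7,8,9,10]
for ll in removed_label:
    keep = dataset_1[:, -1]!=ll  # boolean mask; avoids shadowing the built-in `id`
    dataset_1 = dataset_1[keep]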

# data segmentation
n_class = int(11-len(removed_label))  # number of remaining classes; the dataset has 11 labels (0~10, where 10 is 'rest')
no_feature = 64  # the number of the features
segment_length = 8  # selected time window in samples (a value of 16 would correspond to 160*0.1)
LR = 0.005  # learning rate
EPOCH = 401

data_seg = extract(dataset_1, n_classes=n_class, n_fea=no_feature, time_window=segment_length, moving=(segment_length/2))  # 50% overlapping
print('After segmentation, the shape of the data:', data_seg.shape)

# split training and test data
no_longfeature = no_feature*segment_length
data_seg_feature = data_seg[:, :no_longfeature]
data_seg_label = data_seg[:, no_longfeature:no_longfeature+1]
train_feature, test_feature, train_label, test_label = train_test_split(data_seg_feature, data_seg_label, shuffle=True)

# normalization: the scaler is fitted on the training set only and then applied to both sets

# min-max normalization
from sklearn.preprocessing import MinMaxScaler
scaler3 = MinMaxScaler().fit(train_feature)
train_fea_norm1 = scaler3.transform(train_feature)
test_fea_norm1 = scaler3.transform(test_feature)
print('After normalization, the shape of training feature:', train_fea_norm1.shape,
      '\nAfter normalization, the shape of test feature:', test_fea_norm1.shape)

# after normalization, reshape data to 3d [segments, segment_length, no_feature]; it is flattened again before entering the autoencoder
train_fea_norm1 = train_fea_norm1.reshape([-1, segment_length, no_feature])
test_fea_norm1 = test_fea_norm1.reshape([-1, segment_length, no_feature])
print('After reshape, the shape of training feature:', train_fea_norm1.shape,
      '\nAfter reshape, the shape of test feature:', test_fea_norm1.shape)

BATCH_size = train_fea_norm1.shape[0]  # full-batch training: the batch size equals the number of training segments

# feed data into dataloader
train_fea_norm1 = torch.tensor(train_fea_norm1).type('torch.FloatTensor')
train_label = torch.tensor(train_label.flatten())
train_data = Data.TensorDataset(train_fea_norm1, train_label)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_size, shuffle=False)

test_fea_norm1 = torch.tensor(test_fea_norm1).type('torch.FloatTensor')
test_label = torch.tensor(test_label.flatten())

class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        self.encoder = nn.Sequential(
            nn.Linear(no_feature*segment_length, 64*4),
        )
        self.decoder = nn.Sequential(
            nn.Linear(64*4, no_feature*segment_length),
            nn.Sigmoid(),
        )
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

autoencoder = AutoEncoder()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=LR)
loss_func = nn.MSELoss()

best_acc = []

# classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)

# training and testing
for epoch in range(EPOCH):
    for step, (train_x, train_y) in enumerate(train_loader):

        train_x = train_x.view(-1, no_feature*segment_length)
        train_encoded, train_decoded = autoencoder(train_x)

        loss = loss_func(train_decoded, train_x)  # mean square error
        optimizer.zero_grad()  # clear gradients for this training step
        loss.backward()  # backpropagation, compute gradients
        optimizer.step()  # apply gradients

        if epoch % 50 == 0:
            # fit KNN on the encoded training batch (the batch is the whole training set)
            knn.fit(train_encoded.detach().numpy(), train_y.numpy())
            test_fea_norm1 = test_fea_norm1.view(-1, no_feature*segment_length)
            with torch.no_grad():  # no gradients are needed for evaluation
                test_encoded, test_decoded = autoencoder(test_fea_norm1)
            knn_acc = knn.score(test_encoded.numpy(), test_label.numpy())

            print('Epoch: ', epoch, '| STEP: ', step, '|Autoencoder train loss: %.4f' % loss.item(),'|KNN accuracy: %.4f' % knn_acc)
            best_acc.append(knn_acc)

print('BEST TEST ACC: {}'.format(max(best_acc)))

importing Jupyter notebook from BCI_functions.ipynb
dataset_1 shape: (512, 15)


  avg = a.mean(axis)
  ret = ret.dtype.type(ret / rcount)


ValueError: cannot reshape array of size 15240 into shape (512)

_As we can see from above, the experimental performance of this combination is not good. This may be due to the characteristics of the dataset. For other datasets, such as those with fMRI data, an autoencoder may achieve better performance._

# Reference

<div id="refer-anchor-1"></div>

- [1]  [Zhang, X., Yao, L., Wang, X., Monaghan, J.J., McAlpine, D. and Zhang, Y., 2020. A survey on deep learning-based non-invasive brain signals: recent advances and new frontiers. Journal of Neural Engineering.](https://iopscience.iop.org/article/10.1088/1741-2552/abc902/meta)