# Deep Learning for EEG - Data

This tutorial is two-fold. On the one hand, we will go through a simple example of how to train a model with braindecode. This is particularly useful for you, as it gives you a benchmark to compare against. On the other hand, we will set up our own dataset with torch and create a small model. The latter part will not be new to everyone, especially if you worked through the tutorial we provided. However, it should help you get a feeling for how to work with EEG data.

For the braindecode walkthrough, we simply follow the steps shown [here](https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_trial.html). As you will see, braindecode makes use of skorch to avoid all the "boilerplate" code that is usually necessary for training. In my opinion, writing that code is not as tedious and stressful as some people claim, so we will write it ourselves for our own pipeline later on.

Libraries we are using (directly or indirectly) in this tutorial:
- [Braindecode](https://braindecode.org/)
- [PyTorch](https://pytorch.org/tutorials/)
- [Pandas](https://pandas.pydata.org/docs/)
- [Skorch](https://skorch.readthedocs.io/en/stable/)

You do not have to know all of these libraries in detail. If you run into trouble during the projects and need to work with them, you can still look them up. However, I assume that you can stick to a small set of libraries.


## Braindecode Training

In [None]:
# Install MNE, MOABB and Braindecode (from PyPI)
!pip --quiet install mne 
!pip --quiet install braindecode 
!pip --quiet install moabb

In [None]:
# Torch
import torch

# Braindecode
import braindecode
from braindecode.datasets import MOABBDataset
from braindecode.preprocessing import (
    exponential_moving_standardize, preprocess, Preprocessor, scale)
from braindecode.preprocessing import create_windows_from_events
from braindecode import EEGClassifier
from braindecode.util import set_random_seeds
from braindecode.models import ShallowFBCSPNet

# Skorch - fits well with scikit-learn and the TensorFlow-like style of wrapping things together
from skorch.callbacks import LRScheduler
from skorch.helper import predefined_split

# Evaluation
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D 
import pandas as pd  # Creating a data frame from a history array
from sklearn.metrics import confusion_matrix
from braindecode.visualization import plot_confusion_matrix

### Data handling

First, we load the data via MOABB, as we did in one of the previous tutorials.
For computational reasons, we will stick to two subjects (check the description
of the dataset if you want to know how many subjects took part in the experiments).

Following that, we apply some preprocessing to the data. First, we are only interested in the EEG data, so we **keep only the EEG channels** and drop the stimulus channel and all other (e.g. MEG or EOG) channels. Furthermore, we **scale the data**. MNE stores raw data in volts, and since the electric fields measured on the scalp are weak, the values are very small. This is inconvenient for deep learning (especially for regression tasks), so we scale the data to microvolts. Additionally, **we apply a band-pass filter (4-38 Hz)** that removes very low and very high frequencies, as both are unlikely to contain information relevant for the network. The last thing we apply is **exponential moving standardization**: each channel is standardized with an exponentially weighted running mean and variance, which keeps the signal at a comparable scale throughout the recording and compensates for slow drifts.
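
The standardization step can be sketched in a few lines of NumPy. This is a simplified recursive variant written for illustration, not braindecode's exact implementation; the function name `ema_standardize` is ours, while the parameter names mirror those of `exponential_moving_standardize`. A running mean and variance are updated with weight `factor_new`, each sample is standardized with the current estimates, and the first `init_block_size` samples are standardized with the plain mean and standard deviation of that block.

```python
import numpy as np


def ema_standardize(data, factor_new=1e-3, init_block_size=None, eps=1e-4):
    """Illustrative exponential moving standardization of (n_channels, n_times) data."""
    data = np.asarray(data, dtype=float)
    n_chans, n_times = data.shape
    out = np.empty_like(data)
    mean = data[:, 0].copy()     # running mean, initialized with the first sample
    var = np.zeros(n_chans)      # running variance
    for t in range(n_times):
        x = data[:, t]
        # Exponentially weighted updates: the new sample gets weight factor_new
        mean = factor_new * x + (1 - factor_new) * mean
        var = factor_new * (x - mean) ** 2 + (1 - factor_new) * var
        out[:, t] = (x - mean) / np.maximum(np.sqrt(var), eps)
    if init_block_size is not None:
        # Standardize the first block with its own mean and standard deviation
        block = data[:, :init_block_size]
        block_std = np.maximum(block.std(axis=1, keepdims=True), eps)
        out[:, :init_block_size] = (block - block.mean(axis=1, keepdims=True)) / block_std
    return out


# Toy "recording": 2 channels with a large DC offset (in microvolts)
rng = np.random.default_rng(0)
signal = 1000.0 + rng.normal(size=(2, 5000))
standardized = ema_standardize(signal, factor_new=1e-2, init_block_size=500)
print(standardized[:, 1000:].mean(), standardized[:, 1000:].std())
```

On this toy signal, the offset of 1000 disappears and the output has roughly zero mean and unit standard deviation.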

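Two things remain to be done with the data. The first is to **cut it into windows** around the events. Here, each window covers one trial, starting 0.5 s before the event onset, so every trial yields one fixed-size training example (shorter, overlapping windows would further increase the number of samples). Working with windows instead of the continuous recording also keeps the computational cost in terms of time and memory manageable. Secondly, we **split our data** into a training and a validation set, using the two recording sessions of the dataset.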
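
The windowing step can be illustrated with plain NumPy. The sketch below uses a hypothetical helper `cut_windows` (not part of braindecode) that cuts one window per event from a continuous `(n_channels, n_times)` array; `start_offset` plays the role of `trial_start_offset_samples` in the code below. Assuming a sampling rate of 250 Hz, 22 EEG channels and 4 s trials (as for BNCI2014001), an offset of -0.5 s yields windows of 1125 samples.

```python
import numpy as np


def cut_windows(continuous, onsets, trial_len, start_offset=0, stop_offset=0):
    """Hypothetical helper: cut (n_channels, n_times) data into one window per event.

    Each window spans [onset + start_offset, onset + trial_len + stop_offset)
    in samples; the result has shape (n_events, n_channels, n_window_samples).
    """
    windows = [continuous[:, onset + start_offset:onset + trial_len + stop_offset]
               for onset in onsets]
    return np.stack(windows)


sfreq = 250.0                          # sampling frequency in Hz
start_offset = int(-0.5 * sfreq)       # -0.5 s -> -125 samples
trial_len = int(4.0 * sfreq)           # 4 s trials -> 1000 samples

rng = np.random.default_rng(0)
continuous = rng.normal(size=(22, 20_000))   # 22 channels, 80 s of toy data
onsets = [1000, 3000, 5000]                  # event onsets in samples
windows = cut_windows(continuous, onsets, trial_len, start_offset=start_offset)
print(windows.shape)   # (3, 22, 1125)
```

Since the windows here do not overlap, each trial yields exactly one sample.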

In [None]:
subject_ids = [1, 2]
dataset = MOABBDataset(dataset_name="BNCI2014001", subject_ids=subject_ids)

# We preprocess the data

# Cut-off frequencies of the band-pass filter (Hz)
low_cut_hz = 4.
high_cut_hz = 38.

# Parameters for exponential moving standardization
factor_new = 1e-3
init_block_size = 1000

preprocessors = [
    Preprocessor('pick_types', eeg=True, meg=False, stim=False),  # keep only EEG channels
    Preprocessor(scale, factor=1e6, apply_on_array=True),  # convert from V to uV
    Preprocessor('filter', l_freq=low_cut_hz, h_freq=high_cut_hz),  # band-pass filter
    Preprocessor(exponential_moving_standardize,  # exponential moving standardization
                 factor_new=factor_new, init_block_size=init_block_size)
]

preprocess(dataset, preprocessors)

# Usually, you should check that the sampling frequency is the same for all datasets;
# see the tutorial on the braindecode page for more information
trial_start_offset_seconds = -0.5
sfreq = dataset.datasets[0].raw.info['sfreq']
trial_start_offset_samples = int(trial_start_offset_seconds * sfreq) # convert from seconds to samples

windows_dataset = create_windows_from_events(
    dataset,
    trial_start_offset_samples=trial_start_offset_samples,
    trial_stop_offset_samples=0,
    preload=True,
)


# The split operation from braindecode splits the datasets according to the
# information in their description (here: the recording session).
# Note: the session names depend on the MOABB version; print
# windows_dataset.description to check them.
splitted = windows_dataset.split('session')
train_set = splitted['session_T']
valid_set = splitted['session_E']

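In [None]:
"""
You probably do not want to load and preprocess the data every time.
Therefore, you can store the preprocessed datasets on your Google Drive
(if you have enough storage space there). In Colab, mount the drive first:
    from google.colab import drive
    drive.mount('/content/drive')
"""
import os
from braindecode.datautil import load_concat_dataset

train_dir = "/content/drive/MyDrive/AINS/data/training"
valid_dir = "/content/drive/MyDrive/AINS/data/validation"
os.makedirs(train_dir, exist_ok=True)
os.makedirs(valid_dir, exist_ok=True)
train_set.save(train_dir, overwrite=True)
valid_set.save(valid_dir, overwrite=True)

# Load the stored training set again and compare the descriptions
train_set_loaded = load_concat_dataset(train_dir, preload=True)
print(train_set.description)
print(train_set_loaded.description)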

### Model and Hyperparameters
Now that we have prepared our dataset, we are ready to turn to the fun part, namely the deep learning setup. For our toy example, this is easy. First, we check whether a GPU is available. Afterwards, we set a **seed for reproducible results**. To set up the model, we need some information about the dataset:

1. Output dimension: since we have a classification problem, we need the number of classes, which is 4 (left hand, right hand, feet, tongue) in our case.
2. Number of input channels: equals the number of EEG channels in our data
3. Number of time samples per input window

The model we are using is described in [1] (and is important for project 1 of the seminar). It is a **shallow convolutional network**, so nothing really deep here ;). Besides setting up the model, we will choose some **hyperparameters**, i.e. the settings one can adjust in order to improve the results. Examples are:
- Network parameters: e.g. for convolutional networks the number of kernels/ filters, the kernel size, the stride ...
- Optimization parameters: e.g. the concrete choice of the optimizer, the learning rate η, learning rate schedulers ...
- Number of epochs (training time), batch size ...


[1] Schirrmeister, R. T., Springenberg, J. T., Fiederer, L. D. J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F. & Ball, T. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11), 5391-5420.
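
To get an intuition for what a shallow convolutional network does, here is a simplified PyTorch sketch loosely following the structure described in [1]: a temporal convolution, a spatial convolution across all electrodes, batch normalization, a squaring nonlinearity, average pooling, a logarithm and a linear classifier. The class name `ShallowConvSketch` and the default sizes are assumptions chosen for illustration; braindecode's `ShallowFBCSPNet` differs in details (e.g. dropout and a convolutional classifier layer), so use the library model for actual experiments.

```python
import torch
from torch import nn


class ShallowConvSketch(nn.Module):
    """Simplified shallow ConvNet; input shape (batch, n_chans, n_times)."""

    def __init__(self, n_chans, n_classes, n_times, n_filters=40,
                 filter_time_length=25, pool_length=75, pool_stride=15):
        super().__init__()
        # Temporal convolution, applied to every electrode separately
        self.conv_time = nn.Conv2d(1, n_filters, (filter_time_length, 1))
        # Spatial convolution, combining all electrodes
        self.conv_spat = nn.Conv2d(n_filters, n_filters, (1, n_chans), bias=False)
        self.bnorm = nn.BatchNorm2d(n_filters)
        self.pool = nn.AvgPool2d((pool_length, 1), stride=(pool_stride, 1))
        # Number of time steps left after the temporal convolution and pooling
        n_out_times = (n_times - filter_time_length + 1 - pool_length) // pool_stride + 1
        self.classifier = nn.Linear(n_filters * n_out_times, n_classes)

    def forward(self, x):
        x = x.permute(0, 2, 1).unsqueeze(1)            # (batch, 1, n_times, n_chans)
        x = self.bnorm(self.conv_spat(self.conv_time(x)))
        x = x * x                                      # squaring nonlinearity
        x = self.pool(x)                               # average over time -> "power"
        x = torch.log(torch.clamp(x, min=1e-6))        # log of the pooled power
        x = x.flatten(start_dim=1)
        return torch.log_softmax(self.classifier(x), dim=-1)  # log-probabilities


# 22 channels, 4 classes and 1125 time samples per window
model_sketch = ShallowConvSketch(n_chans=22, n_classes=4, n_times=1125)
log_probs = model_sketch(torch.randn(2, 22, 1125))
print(log_probs.shape)   # torch.Size([2, 4])
```

Squaring, averaging and taking the logarithm turns the filtered signals into log band-power features, which is what makes this design resemble classical filter-bank approaches.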

In [None]:
cuda = torch.cuda.is_available()  # check if GPU is available, if True chooses to use it
device = 'cuda' if cuda else 'cpu'
if cuda:
    torch.backends.cudnn.benchmark = True

# Seed
seed = 20200220
set_random_seeds(seed=seed, cuda=cuda)

# Network params
n_classes = 4
n_chans = train_set[0][0].shape[0]
input_window_samples = train_set[0][0].shape[1]

model = ShallowFBCSPNet(
    n_chans,
    n_classes,
    input_window_samples=input_window_samples,
    final_conv_length='auto',
)

# Send model to GPU
if cuda:
    model.to(device)

# Hyperparameters
lr = 0.000625 
weight_decay = 0
batch_size = 64
n_epochs = 5

### Training

We are nearly done. The only thing left is to wrap everything into one of the braindecode "parent classes", **EEGClassifier** or **EEGRegressor**. Both provide methods to train and predict in what I would call the **TensorFlow way** (essentially the scikit-learn interface): you have methods like `fit()` and `predict()`. In combination with skorch, which enables e.g. the use of **callbacks**, the library becomes user-friendly and one can test ideas very quickly. Overall, you only need to implement your own model. We will do this later as well.
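
The `lr_scheduler` callback in the code below wraps PyTorch's `CosineAnnealingLR`. As a minimal sketch (a dummy parameter stands in for the model, and the `demo_*` names are ours), this is how the learning rate evolves over 5 epochs with `T_max = n_epochs - 1` when the scheduler is stepped once per epoch: it follows half a cosine from the initial value down to `eta_min` (0 by default).

```python
import math

import torch

base_lr, demo_epochs = 0.000625, 5                 # same values as lr and n_epochs above
dummy_param = torch.nn.Parameter(torch.zeros(1))   # stands in for the model parameters
demo_optimizer = torch.optim.Adam([dummy_param], lr=base_lr)
demo_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(demo_optimizer, T_max=demo_epochs - 1)

lrs = []
for epoch in range(demo_epochs):
    lrs.append(demo_optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    demo_optimizer.step()                              # training would happen here
    demo_scheduler.step()                              # update once per epoch

# Closed form with eta_min = 0: lr_t = base_lr * (1 + cos(pi * t / T_max)) / 2
expected = [base_lr * (1 + math.cos(math.pi * t / (demo_epochs - 1))) / 2
            for t in range(demo_epochs)]
print(lrs)
```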

In [None]:
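# skorch takes the criterion and optimizer as classes here and instantiates them
# itself; arguments prefixed with `optimizer__` are passed on to the optimizer.
# The module may be passed as a class or, as here, as an instance.
# Optional: continue from previously saved weights (the path is a placeholder):
# state_dict = torch.load("path/to/weights.pth", map_location=device)
# model.load_state_dict(state_dict)  # works in place, do not reassign `model`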

clf = EEGClassifier(
    model,   # our model
    criterion=torch.nn.NLLLoss,  # negative log-likelihood loss (expects log-probabilities)
    optimizer=torch.optim.Adam, # Adam 
    train_split=predefined_split(valid_set),  # using valid_set for validation
    optimizer__lr=lr,
    optimizer__weight_decay=weight_decay,
    batch_size=batch_size,
    callbacks=[
        "accuracy", ("lr_scheduler", LRScheduler('CosineAnnealingLR', T_max=n_epochs - 1)),
    ],
    device=device,
)
# Model training for a specified number of epochs. `y` is None as it is already supplied
# in the dataset.
clf.fit(train_set, y=None, epochs=n_epochs)

### Evaluation
Now that we have trained our model, we want to plot the results of the training process. The skorch framework provides a history object in which the training progress is stored. As I simply adopted the plotting code from the braindecode tutorial, feel free to plot in a different style, with a different library, or without pandas! Working with torch, MNE and braindecode is exhausting enough, so use whatever suits you best and makes your life easier.

In [None]:
# The wrappers (EEGClassifier/EEGRegressor) have a history attribute
# that contains the results. We can extract data from it
results_columns = ['train_loss', 'valid_loss', 'train_accuracy', 'valid_accuracy']
print(type(clf.history))
df = pd.DataFrame(clf.history[:, results_columns], columns=results_columns,
                  index=clf.history[:, 'epoch'])

# We can add new columns to the data frame: the misclassification is a good
# quantity to countercheck the loss
df = df.assign(train_misclass=100 - 100 * df.train_accuracy,
               valid_misclass=100 - 100 * df.valid_accuracy)

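# Styling is set as in the original Schirrmeister paper
# ('seaborn' was renamed to 'seaborn-v0_8' in newer Matplotlib versions)
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')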

# Creating one single subplot
fig, ax1 = plt.subplots(figsize=(8, 3))

# We can access multiple columns in a data frame object and plot directly from 
# the data frame functionality .plot() into the created axes object ax1
df.loc[:, ['train_loss', 'valid_loss']].plot(
    ax=ax1, style=['-', ':'], marker='o', color='tab:blue', legend=False, fontsize=14)

ax1.tick_params(axis='y', labelcolor='tab:blue', labelsize=14)
ax1.set_ylabel("Loss", color='tab:blue', fontsize=14)

# We want to plot the misclassification values for comparison - therefore
# we create a second axes object which shares the x-axis with ax1 (.twinx())
ax2 = ax1.twinx()  

# Afterwards, we use the same functionality for the plot of the misclassification
# as for the loss values
df.loc[:, ['train_misclass', 'valid_misclass']].plot(
    ax=ax2, style=['-', ':'], marker='o', color='tab:red', legend=False)
ax2.tick_params(axis='y', labelcolor='tab:red', labelsize=14)
ax2.set_ylabel("Misclassification Rate [%]", color='tab:red', fontsize=14)
ax2.set_ylim(ax2.get_ylim()[0], 85)  # make some room for legend
ax1.set_xlabel("Epoch", fontsize=14)

# We can modify the styling of the data plots (colors, linestyling etc)
handles = []
handles.append(Line2D([0], [0], color='black', linewidth=1, linestyle='-', label='Train'))
handles.append(Line2D([0], [0], color='black', linewidth=1, linestyle=':', label='Valid'))
plt.legend(handles, [h.get_label() for h in handles], fontsize=14)
plt.tight_layout()

#### Confusion Matrix
Confusion matrices are frequently used in classification tasks. Beyond the plain accuracy and the loss value, they show which classes the model confuses with each other. Wikipedia has a sufficient description in [this article](https://en.wikipedia.org/wiki/Confusion_matrix) if you need a recap or want to get familiar with the topic.

We compute the confusion matrix with scikit-learn and plot it with the built-in visualization tool of braindecode. Here, we predict with our EEGClassifier wrapper; however, you could just as well do this with your own routine and your own model. Its predictions can be passed to scikit-learn's `confusion_matrix` function in the same way, so these methods are useful even if you do not use the wrapper facilities of skorch.
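
If you do not use the wrapper, a confusion matrix is also easy to compute yourself. The following NumPy sketch (the function name `confusion_matrix_np` is ours) uses the same layout as `sklearn.metrics.confusion_matrix`: entry `[i, j]` counts the samples with true class `i` that were predicted as class `j`. The predictions could come from any model, e.g. `torch.argmax(output, dim=-1)` of your own network.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes):
    """Count matrix: rows = true classes, columns = predicted classes."""
    mat = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Add 1 at (true, predicted) for every sample; np.add.at handles repeated indices
    np.add.at(mat, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return mat


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(confusion_matrix_np(y_true, y_pred, n_classes=3))
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```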

Within the confusion matrix (see the copy below), you can read off different metrics and quantities. First, we assume that the left side (rows) represents the actual classes, while the bottom side (columns) represents the predicted classes; check the axis labels of your plot, as conventions differ between libraries. Within the squares we see two numbers:
- The integer, which tells us how often samples of the actual class were predicted as the respective (same or other) class
- The percentage, which relates this count to the total number of samples in the set

Furthermore, we see the recall (or true-positive rate)

\begin{equation} TPR = \frac{TP}{TP + FN} \end{equation} 

and the precision (or positive predictive value)

\begin{equation} PPV = \frac{TP}{TP + FP} \quad .\end{equation} 

In the equations, the abbreviations stand for (with more than two classes, they are counted per class, treating that class as "present" and all other classes as "not present"):
- $TP$: True positives - the number of samples that are classified as present and are indeed present
- $FN$: False negatives - the number of samples that are classified as not present but actually are present
- $FP$: False positives - the number of samples that are classified as present but are not
- $TN$: True negatives (not used above) - the number of samples that are correctly classified as not present
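
Given a confusion matrix in the scikit-learn layout (rows: true classes, columns: predicted classes), the per-class counts and the two metrics above can be computed in a few lines of NumPy. The matrix below is a made-up example for illustration.

```python
import numpy as np

# Made-up 3-class confusion matrix: rows = true classes, columns = predicted classes
conf = np.array([[8, 1, 1],
                 [2, 6, 2],
                 [1, 3, 6]])

tp = np.diag(conf)              # correctly classified samples of each class
fn = conf.sum(axis=1) - tp      # samples of the class predicted as another class
fp = conf.sum(axis=0) - tp      # samples of other classes predicted as this class
tn = conf.sum() - tp - fn - fp  # all remaining samples

recall = tp / (tp + fn)         # TPR per class
precision = tp / (tp + fp)      # PPV per class
print(recall, precision)
```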

<div>
<img src="https://drive.google.com/uc?export=view&id=1Mk0RTIYVMf5iKIm5dfNaEqOolz25oQ1o" width="400"/>

<div text-align=center class="caption">Confusion matrix</div>
</div>

There are several other metrics that can be calculated from the above values. However, the ones given here are sufficient for our purposes; refer to the Wikipedia article for more information.

In [None]:
# generate confusion matrices
# get the targets
y_true = valid_set.get_metadata().target
y_pred = clf.predict(valid_set)

# generating confusion matrix
confusion_mat = confusion_matrix(y_true, y_pred)

# add class labels
# label_dict is class_name : str -> i_class : int
label_dict = valid_set.datasets[0].windows.event_id.items()
# sort the labels by values (values are integer class labels)
labels = list(dict(sorted(list(label_dict), key=lambda kv: kv[1])).keys())

# plot the basic conf. matrix
plot_confusion_matrix(confusion_mat, class_names=labels)

## Our own Deep Learning Pipeline

As we are interested in applying deep learning techniques to EEG data, the approach presented above is nice and allows for quick experiments. However, braindecode has its limits, and you are likely interested in developing your own approaches to problems in this area. Therefore, we have to design our own experiments, models and training procedures. This raises the question of how to create a pipeline that suits our needs best.

To address this, we will develop a small pipeline in PyTorch, using only torch facilities. We will create our own model, design a small wrapper class around the training procedure and train the model. This will not be new to everyone, but as we come from different backgrounds, it serves as a smooth introduction. I assume that everyone has looked into the Python and torch tutorials, so I omit detailed explanations of the basics. If you have trouble understanding something, just ask (or write a message on Mattermost later :) ).



### A Deep Neural Network

<div>
<img src="https://drive.google.com/uc?export=view&id=1hwoIpCdR8Xv9C4CAYOhzNPpwr7rdhbao" height="650"/>

<div text-align=center class="caption">Model Structure </div>
</div>

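In [None]:
"""
As a first step, we build a model. The layers below are one possible (arbitrary)
choice for illustration, nothing that was particularly successful; the layer names
match the ones used for inspecting intermediate results later in this notebook.
"""
class ResidualBlock(torch.nn.Module):
  """Two convolutions whose output is added to the block input (skip connection)."""
  def __init__(self, channels, kernel_size=5):
    super().__init__()
    self.body = torch.nn.Sequential(
        torch.nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
        torch.nn.BatchNorm1d(channels),
        torch.nn.ReLU(),
        torch.nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2),
        torch.nn.BatchNorm1d(channels),
    )
    self.activation = torch.nn.ReLU()

  def forward(self, x):
    return self.activation(x + self.body(x))


class MyNetwork(torch.nn.Module):
  def __init__(self, embedding_dim, steps, n_classes=4, name="MyModule", n_filters=16):
    super().__init__()

    # The input is (batch_size, 1, steps): one channel with `steps` time samples.

    # Projection to another space (a linear layer acting on the time axis)
    self.embedding_layer = torch.nn.Linear(steps, embedding_dim)

    # Convolution -> Batch Normalization -> nonlinearity (ReLU)
    self.convolution_0 = torch.nn.Sequential(
        torch.nn.Conv1d(1, n_filters, kernel_size=5, padding=2),
        torch.nn.BatchNorm1d(n_filters),
        torch.nn.ReLU(),
    )
    # We use a residual block
    self.convolution_1 = ResidualBlock(n_filters)
    self.convolution_2 = torch.nn.Sequential(
        torch.nn.Conv1d(n_filters, n_filters, kernel_size=5, padding=2),
        torch.nn.BatchNorm1d(n_filters),
        torch.nn.ReLU(),
    )
    # Back to a single channel: output shape (batch_size, 1, embedding_dim)
    self.convolution_3 = torch.nn.Conv1d(n_filters, 1, kernel_size=5, padding=2)

    # Flatten the convolutional output and apply a linear prediction layer
    self.flatten = torch.nn.Flatten()
    self.classifier = torch.nn.Linear(embedding_dim, n_classes)

    # Output log-probabilities. Since log_softmax is idempotent, this also works
    # with torch.nn.CrossEntropyLoss, which applies log_softmax internally.
    self.log_softmax = torch.nn.LogSoftmax(dim=-1)

    print(f"Model {name} successfully initialized")

    # We could also collect layers in a module list: torch.nn.ModuleList()

  def forward(self, x):
    """
    In the forward method, the calculations of the model are executed
    """
    x = self.embedding_layer(x)
    x = self.convolution_0(x)
    x = self.convolution_1(x)
    x = self.convolution_2(x)
    x = self.convolution_3(x)
    x = self.flatten(x)
    y = self.log_softmax(self.classifier(x))
    return y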

### Torch Datasets

Data handling in PyTorch is built around datasets. This way, the user can design datasets for very specific purposes, which is very flexible. At the same time, the user is responsible for using the available computing power efficiently. Still, the flexibility allows for a lot of experiments and is therefore well suited for our task. A (map-style) dataset implements (overrides) two methods:
- `__getitem__()`: returns the sample for a given index, i.e. enables indexing like `dataset[i]`
- `__len__()`: returns the total number of available (input, output) tuples

It is often convenient to wrap the dataset into a dataloader, which takes care of the iteration process and eases data handling. 

More information can be found [here](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) 
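
A quick way to see what a data loader does: the sketch below wraps random tensors into PyTorch's built-in `TensorDataset` (used here instead of a custom dataset class for brevity; the `toy_*` names are ours) and iterates over them in batches. With 10 samples and `batch_size=4`, the loader yields batches of 4, 4 and 2 samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_inputs = torch.randn(10, 1, 200)      # 10 samples, 1 channel, 200 time steps
toy_labels = torch.arange(10) % 3         # 3 classes: 0, 1, 2, 0, ...
toy_dataset = TensorDataset(toy_inputs, toy_labels)
print(len(toy_dataset), toy_dataset[0][0].shape)   # uses __len__ and __getitem__

toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)
batch_sizes = []
for x, y in toy_loader:                   # each iteration yields one (inputs, labels) batch
    batch_sizes.append(x.shape[0])
    print(x.shape, y.shape)
```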

In [None]:
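# Create dataset
class SampleDataset(torch.utils.data.Dataset):
  def __init__(self, data, transform=None, target_transform=None):
    # For simplicity, we just pass a tuple, that is data=(input_data, labels)
    self.input_data, self.labels = data
    # Optional transformations of inputs/labels (not used in this example)
    self.transform = transform
    self.target_transform = target_transform

  def __len__(self):
    return self.input_data.shape[0]  # return number of samples in the dataset

  def __getitem__(self, idx):
    x = self.input_data[idx, :].unsqueeze(0)  # add a channel dimension: (1, steps)
    y = self.labels[idx]
    if self.transform is not None:
      x = self.transform(x)
    if self.target_transform is not None:
      y = self.target_transform(y)
    return x, y   # return data + label for the given idx


def get_data(n_samples, n_steps, t_max, low, high, type="sine"):
  """
  Toy data with three classes of sine (or cosine) waves; for each sample,
  delta is drawn uniformly from [low, high):
    0 ("phase"): fun(x + delta)  -> phase shift
    1 ("shift"): fun(x) + delta  -> vertical offset
    2 ("freq"):  fun(delta * x)  -> frequency scaling
  Returns input_data of shape (3 * n_samples, n_steps) and labels of shape (3 * n_samples,).
  """
  fun = torch.sin if type == "sine" else torch.cos
  delta = low + (high - low) * torch.rand(size=(n_samples,), dtype=torch.float32)[..., None]
  x = torch.linspace(0, t_max, n_steps, dtype=torch.float32)[None, ...]
  x = x.repeat(n_samples, 1)
  data_phase, labels_phase = fun(x + delta), torch.zeros(size=(n_samples,), dtype=torch.long)
  data_shift, labels_shift = fun(x) + delta, torch.ones(size=(n_samples,), dtype=torch.long)
  data_freq, labels_freq = fun(delta * x), 2 * torch.ones(size=(n_samples,), dtype=torch.long)

  input_data = torch.cat([data_phase, data_shift, data_freq], dim=0)
  labels = torch.cat([labels_phase, labels_shift, labels_freq], dim=0)
  return input_data, labels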

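# Create the toy data
n_samples = 10000
n_steps = 200
t_max = 10
low = -5.   # lower bound for delta (named low/high to avoid shadowing Python's min/max)
high = 5.   # upper bound for delta

data, labels = get_data(n_samples, n_steps, t_max, low, high)


# Inspect data: the classes are stored one after another, i.e. phase samples at
# indices [0, n_samples), shift at [n_samples, 2*n_samples), freq at [2*n_samples, 3*n_samples)
plt.figure()
plt.plot(data[10, :], label="phase")
plt.plot(data[n_samples + 10, :], label="shift")
plt.plot(data[2 * n_samples + 10, :], label="freq")
plt.legend()
plt.show()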

dataset = SampleDataset((data, labels))
print(dataset[0][0].shape)


# Check if data suits network
model = MyNetwork(embedding_dim=100, steps=n_steps, n_classes=3, name="SineClassifier")
x, y = data[0:4, :], labels[0:4]
x = x.unsqueeze(dim=1)
print(x.shape)
out = model(x)
print(out)

### Wrapping everything together

We now do what braindecode did for us before: we **implement a small wrapper class** to enable fast training within scripts. This way, we get a class that provides all the methods we need for our task and that can easily be reused in several scripts or for hyperparameter tuning. This is not necessary; you can also write plain functions and call them. Personally, I found it very useful to have a class that enables this TensorFlow-like logic. Sure, we could also stick to braindecode, but with our own wrapper we can easily modify every step.
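
One detail of the wrapper below is worth isolating: it splits the data with `torch.utils.data.random_split` and a seeded `torch.Generator`, so the train/validation split is identical in every run. A minimal check on a toy dataset (the names are ours, for illustration):

```python
import torch
from torch.utils.data import TensorDataset, random_split

toy_dataset = TensorDataset(torch.arange(20, dtype=torch.float32))
n_train = int(0.85 * len(toy_dataset))
lengths = [n_train, len(toy_dataset) - n_train]

# Same seed -> same permutation -> identical split
train_a, val_a = random_split(toy_dataset, lengths, generator=torch.Generator().manual_seed(725))
train_b, val_b = random_split(toy_dataset, lengths, generator=torch.Generator().manual_seed(725))

print(len(train_a), len(val_a))          # sizes of the two subsets
print(val_a.indices == val_b.indices)    # identical validation indices
```

Note that a `DataLoader` with `shuffle=True` still reshuffles the training batches in every epoch; only the split itself is fixed.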

In [None]:
class TrainWrapper:
  seed = 725
  def __init__(self, dataset, model, optimizer, loss, batch_size, scheduler=None):
    # Split data into train, val set (reproducibly, using a seeded generator)
    train_length = int(0.85 * len(dataset))
    val_length = len(dataset) - train_length
    train, val = torch.utils.data.random_split(dataset=dataset, lengths=[train_length, val_length], generator=torch.Generator().manual_seed(self.seed))

    # To iterate over datasets in batches, torch provides data loaders. With them, you can tune the GPU usage
    # by increasing the batch size and the number of workers. A data loader behaves much like a Python iterator
    self.train_loader = torch.utils.data.DataLoader(dataset=train, batch_size=batch_size, num_workers=2, shuffle=True)
    self.val_loader = torch.utils.data.DataLoader(dataset=val, batch_size=batch_size, num_workers=2)

    # Save optimizer, loss function and scheduler (and a metric if you want)
    self.optimizer = optimizer
    self.loss_fn = loss
    self.scheduler = scheduler

    # If a GPU is available, we move the model to it (Module.to() moves the
    # parameters in place, so an optimizer created beforehand stays valid)
    self.device = "cuda" if torch.cuda.is_available() else "cpu"
    self.model = model.to(self.device)
    self.print_model()

    # Lists where the loss values are saved
    self.train_loss = []
    self.val_loss = []

  def val_set(self):
    """Return the validation inputs and labels (assumes a SampleDataset was passed)."""
    subset = self.val_loader.dataset  # torch.utils.data.Subset created by random_split
    idx = subset.indices
    return subset.dataset.input_data[idx], subset.dataset.labels[idx]

  def count_parameters(self):
    return sum(p.numel() for p in self.model.parameters() if p.requires_grad)

  def print_model(self):
    print(f"Number of model parameters: {self.count_parameters()}")
    print(self.model)

  def train_step(self):
    """
    One epoch of training: forward pass, loss, backpropagation and parameter update for every batch
    """
    loss_val = 0
    self.model.train(True)
    for i, (x, y) in enumerate(self.train_loader):
      x, y = x.to(self.device), y.to(self.device)
      y_pred = self.model(x)   # use the wrapped model, not a global variable
      batch_loss = self.loss_fn(y_pred, y)

      loss_val += batch_loss.item()

      self.optimizer.zero_grad()
      batch_loss.backward()
      self.optimizer.step()

    self.train_loss.append(loss_val / (i+1))

  def val_step(self):
    loss_val = 0
    self.model.train(False)  # evaluation mode (relevant e.g. for batch normalization, dropout)
    with torch.no_grad():    # no gradients needed for validation
      for i, (x, y) in enumerate(self.val_loader):
        x, y = x.to(self.device), y.to(self.device)
        y_pred = self.model(x)
        batch_loss = self.loss_fn(y_pred, y)

        loss_val += batch_loss.item()

    self.val_loss.append(loss_val / (i+1))

  def fit(self, n_epochs=10):
    print("Epoch\tTrain Loss\tVal. Loss\tLearning Rate\n"
          "-----\t----------\t---------\t-------------")
    for epoch in range(n_epochs):
      lr = self.optimizer.param_groups[0]["lr"]  # learning rate used in this epoch
      self.train_step()
      self.val_step()
      if self.scheduler is not None:
        self.scheduler.step()

      print(f"{epoch:>5d}\t{self.train_loss[-1]:>2.7f}\t{self.val_loss[-1]:>.7f}\t{lr}")


In [None]:
loss_fn = torch.nn.CrossEntropyLoss()
optim = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=4, gamma=0.25)  # lr *= 0.25 every 4 epochs
wrapper = TrainWrapper(dataset, model, optimizer=optim, loss=loss_fn, batch_size=64, scheduler=scheduler)
wrapper.fit(n_epochs=20)

model = wrapper.model

plt.figure()
plt.plot(wrapper.train_loss, label="Training Loss")
plt.plot(wrapper.val_loss, label="Validation Loss")
plt.xlabel("Epoch")
plt.grid(True)
plt.legend()
plt.title("Loss Plot")
plt.show()

### Evaluate the Network

In [None]:
# We can save our model to the drive (you should do this for evaluation reasons).
# Below we save the state dict, i.e. the layer names plus their parameters (weights, biases, ...)
import os
model_dir = "/content/drive/MyDrive/AINS/models/MyModel"
os.makedirs(model_dir, exist_ok=True)
torch.save(model.state_dict(), f=os.path.join(model_dir, "my_model.pth"))

# Now we can load the state dict (map_location="cpu" also works for models trained on a GPU)
state_dict = torch.load(os.path.join(model_dir, "my_model.pth"), map_location="cpu")

# The state dict contains only the parameters - we need to recreate the model
# (with random weights) and then load the parameters into it
loaded_model = MyNetwork(embedding_dim=100, steps=n_steps, n_classes=3, name="SineClassifier")
loaded_model.load_state_dict(state_dict)
loaded_model.eval()   # Always switch to evaluation mode if you use e.g. dropout or batch normalization, otherwise you will get "wrong" results

# Create test data (cosine waves and a wider range of delta than during training)
n_samples = 50
n_steps = 200
t_max = 10
low = -8.
high = 8.

test_input, test_labels = get_data(n_samples, n_steps, t_max, low, high, type="cos")

# We pass the whole test set in one run (no gradients needed)
with torch.no_grad():
  pred = loaded_model(test_input.unsqueeze(1))

# Generating the confusion matrix again
confusion_mat = confusion_matrix(test_labels, torch.argmax(pred.detach(), dim=-1))

# add class labels
label_dict = ["Phase", "Shift", "Scale"]

# plot the basic conf. matrix
print("Confusion Matrix\n"
      "----------------")
plot_confusion_matrix(confusion_mat, class_names=label_dict)


# Inspect intermediate results
embedding = loaded_model.embedding_layer(test_input[0:126:25].unsqueeze(1))
convolved = loaded_model.convolution_0(embedding)
convolved = loaded_model.convolution_1(convolved)
convolved = loaded_model.convolution_2(convolved)
convolved = loaded_model.convolution_3(convolved)
print(convolved.shape)

convolved = convolved.detach()
"""
# Alternative: plot all channels of a layer output with 32 channels (8 x 4 grid)
fig, axs = plt.subplots(ncols=4, nrows=8, figsize=(15, 30))
for i in range(8):
  for j in range(4):
    axs[i, j].plot(convolved[0, i*4 + j, :], label="Channel "+ str(i*4 + j))
    axs[i, j].grid()
plt.grid()
plt.show()
"""
print("\nLayer Outputs\n"
      "----------------")
fig, axs = plt.subplots(ncols=3, nrows=1, figsize=(15, 3))
axs[0].plot(convolved[0, 0], 'b-.', label="Phase")
axs[0].plot(convolved[1, 0], 'b--', label="Phase")
axs[0].grid()
axs[0].legend()

axs[1].plot(convolved[2, 0], 'k-.', label="Shift")
axs[1].plot(convolved[3, 0], 'k--', label="Shift")
axs[1].grid()
axs[1].legend()

axs[2].plot(convolved[4, 0], 'r-.', label="Scale")
axs[2].plot(convolved[5, 0], 'r--', label="Scale")
axs[2].grid()
axs[2].legend()

plt.show()

## Bonus - Latent Spaces and UMAP

We only take a very brief look at UMAP here. I just want to show that it works, as this is important if you are working on the SSL project, where you want to create meaningful latent spaces. The visualizations in the Banville paper from project 2 are very interesting and were also created with UMAP.

UMAP is a dimension reduction algorithm that embeds high-dimensional data into a low-dimensional space (two dimensions by default), based on ideas from Riemannian geometry and algebraic topology (fuzzy simplicial sets). If you are interested, you can start with the description on the [UMAP page](https://umap-learn.readthedocs.io/en/latest/how_umap_works.html). We will not deal with that here (I am still trying to figure out how this works).


In [None]:
!pip install git+https://github.com/lmcinnes/umap.git

In [None]:
import umap
import numpy as np

In [None]:
reducer = umap.UMAP(random_state=42)
# fit_transform fits UMAP and directly returns the embedding of the fitted data
embedding = reducer.fit_transform(test_input.numpy())



In [None]:
plt.scatter(embedding[:, 0], embedding[:, 1], c=test_labels, cmap='viridis', s=5)
plt.gca().set_aspect('equal', 'datalim')
# One colour bin per class (labels 0, 1, 2), ticks at the bin centres
plt.colorbar(boundaries=np.arange(4) - 0.5).set_ticks(np.arange(3))
plt.title('UMAP projection of the raw test signals', fontsize=16);

In [None]:
with torch.no_grad():
  nn_embedding = loaded_model.embedding_layer(test_input)   # shape (n_test, embedding_dim)

reducer = umap.UMAP(random_state=42)
umap_embedding = reducer.fit_transform(nn_embedding.numpy())

plt.scatter(umap_embedding[:, 0], umap_embedding[:, 1], c=test_labels, cmap='viridis', s=5)
plt.gca().set_aspect('equal', 'datalim')
plt.colorbar(boundaries=np.arange(4) - 0.5).set_ticks(np.arange(3))
plt.title('UMAP projection of the embedding layer output', fontsize=16);

In [None]:
with torch.no_grad():
  embedding = loaded_model.embedding_layer(test_input.unsqueeze(1))
  convolved = loaded_model.convolution_0(embedding)
  convolved = loaded_model.convolution_1(convolved)
  convolved = loaded_model.convolution_2(convolved)
  convolved = loaded_model.convolution_3(convolved)
convolved = convolved.squeeze(1)   # (n_test, 1, L) -> (n_test, L)

reducer = umap.UMAP(random_state=42)
umap_embedding = reducer.fit_transform(convolved.numpy())

plt.scatter(umap_embedding[:, 0], umap_embedding[:, 1], c=test_labels, cmap='viridis', s=5)
plt.gca().set_aspect('equal', 'datalim')
plt.colorbar(boundaries=np.arange(4) - 0.5).set_ticks(np.arange(3))
plt.title('UMAP projection of the convolutional features', fontsize=16);

In [None]:
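# Embedding onto a sphere: UMAP with the haversine output metric; the resulting
# spherical coordinates are converted to Cartesian coordinates for a 3D plot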
sphere_mapper = umap.UMAP(output_metric='haversine', random_state=42).fit(convolved)
x = np.sin(sphere_mapper.embedding_[:, 0]) * np.cos(sphere_mapper.embedding_[:, 1])
y = np.sin(sphere_mapper.embedding_[:, 0]) * np.sin(sphere_mapper.embedding_[:, 1])
z = np.cos(sphere_mapper.embedding_[:, 0])

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c=test_labels, cmap='Spectral')
