(understanding-pathology)=
# Understanding Pathology Decoding With Invertible Networks

We chose pathology decoding because (i) the question of which learned features the networks use to diagnose pathology seemed especially fascinating to us and (ii) the TUH dataset is especially large.

We then present the results of invertible networks trained to decode pathology from EEG, showing competitive results with regular deep convolutional networks. Our visualizations show that the network uses slowing, i.e., increased amplitudes in the lower frequencies, as well as spikes/bursts as markers of pathology, and a strong, stable alpha rhythm, especially at the posterior electrodes, as a marker of normal EEG.

After our initial work on pathology decoding, we wanted to gain a deeper understanding of the features deep networks learn to distinguish healthy from pathological recordings. For that, we used invertible networks as generative classifiers, since they offer more ways to visualize their learned prediction function in input space. We visualize prototypes of the two classes as well as individual electrode signals that are predictive of a certain class independent of the signals at other electrodes. These visualizations revealed both well-known features and surprising patterns in the very low frequencies. To gain an even better understanding, we distilled the invertible network's knowledge into a very small network that is interpretable by design. The visualizations of this small network showed regular patterns in the alpha and beta range associated with healthy recordings and a diverse set of more irregular waveforms associated with pathology. All work presented here is novel unpublished work performed in the context of this thesis.

## Dataset, Training Details and Performance of Invertible Network

```{table} Accuracy (%) of the invertible network in comparison with regular ConvNets.
:name: table-tuh-invertible-accuracy

|Deep|Shallow|TCN|EEGNet|EEG-InvNet|
|-|-|-|-|-|
|84.6|84.1|86.2|83.4|85.5|

```

We apply our EEG-InvNet to pathology decoding on the same TUH dataset as in {ref}`pathology`. We use only 2 minutes of each recording at 64 Hz, and input 2 seconds as one example to the invertible network. This reduced dataset allows fast experimentation while still yielding good decoding performance. We used AdamW as our optimizer and cosine annealing with restarts every 25 epochs as our learning rate schedule. We emphasize that these details were not heavily optimized for maximum decoding performance, but rather chosen to obtain a robustly performing model worth investigating more deeply. Results in {numref}`table-tuh-invertible-accuracy` show that our EEG-InvNet performs similarly to regular ConvNets, and even better than some of them.
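
The data reduction and the learning rate schedule described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the training code used for the thesis: the channel count, the recording length, the base learning rate and the function names `cut_windows` and `cosine_restart_lr` are hypothetical, and the AdamW optimizer itself is omitted.

```python
import math
import numpy as np

SFREQ = 64          # Hz, as stated in the text
CROP_SECONDS = 120  # 2 minutes per recording
WINDOW_SECONDS = 2  # one network input
RESTART_EVERY = 25  # epochs between warm restarts


def cut_windows(recording):
    """Split the first 2 minutes of a (channels, time) array into 2-second windows."""
    n_crop = SFREQ * CROP_SECONDS
    n_win = SFREQ * WINDOW_SECONDS
    cropped = recording[:, :n_crop]
    n_windows = cropped.shape[1] // n_win
    # Drop any incomplete trailing window, then reshape to (windows, channels, time).
    cropped = cropped[:, : n_windows * n_win]
    return cropped.reshape(recording.shape[0], n_windows, n_win).transpose(1, 0, 2)


def cosine_restart_lr(epoch, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate that restarts every RESTART_EVERY epochs.

    base_lr and min_lr are assumed values, not taken from the text.
    """
    t = epoch % RESTART_EVERY
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / RESTART_EVERY))


rng = np.random.default_rng(0)
# Hypothetical 21-channel, 5-minute recording filled with noise.
recording = rng.standard_normal((21, SFREQ * 300))
windows = cut_windows(recording)  # 60 windows of 128 samples each
```

With these settings, each recording contributes 60 examples of 128 samples per channel, and the learning rate returns to its base value at epochs 0, 25, 50, and so on.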

## Class Prototypes


```{figure} images/net-disc-prototypes.png
---
name: disc-invnet-prototypes
---
Learned class prototypes from the invertible network, obtained by inverting the learned means of the class-conditional Gaussian distributions from latent space to input space through the invertible network trained for pathology decoding.

```

Class prototypes reveal known oscillatory features and, surprisingly, hint at the use of very-low-frequency information by the invertible network. We inverted the learned latent means of the healthy and the pathological class distributions back to the input space to visualize the most likely healthy and most likely pathological examples under the learned distribution. We see differences in the alpha rhythm, such as a stronger alpha rhythm at O1 in the healthy example. We also see further differences, with a variety of oscillatory patterns present for both classes. Surprisingly, there are also differences in the very low frequencies, such as substantially different mean values at FP1 and FP2 for the two class prototypes, which we investigate further later. One challenge of this visualization is that one has to look at each prototype as one complete example and cannot interpret signals at individual electrodes independently. This is what we tackle in our next visualization.
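
To make the inversion step concrete, the following toy sketch inverts latent class means through a single affine coupling layer. It is not the EEG-InvNet: the layer, its random weights `W_s` and `W_t`, the dimensionality and the latent means are all hypothetical stand-ins, chosen only to show that a prototype is the input-space preimage of a class mean.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8            # toy input dimensionality (hypothetical)
HALF = DIM // 2
W_s = 0.1 * rng.standard_normal((HALF, HALF))  # stand-ins for trained weights
W_t = 0.1 * rng.standard_normal((HALF, HALF))


def forward(x):
    """Affine coupling layer: input space -> latent space."""
    x1, x2 = x[:HALF], x[HALF:]
    log_scale, shift = np.tanh(W_s @ x1), W_t @ x1
    return np.concatenate([x1, x2 * np.exp(log_scale) + shift])


def inverse(z):
    """Exact inverse of `forward`: latent space -> input space."""
    z1, z2 = z[:HALF], z[HALF:]
    log_scale, shift = np.tanh(W_s @ z1), W_t @ z1
    return np.concatenate([z1, (z2 - shift) * np.exp(-log_scale)])


# Hypothetical learned means of the class-conditional Gaussians in latent space.
latent_means = {
    "healthy": rng.standard_normal(DIM),
    "pathological": rng.standard_normal(DIM),
}

# A class prototype is the input-space preimage of the class mean.
prototypes = {name: inverse(mean) for name, mean in latent_means.items()}
```

Passing a prototype back through `forward` recovers the corresponding latent mean, which is the property the visualization relies on.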

## Per-Channel Prototypes


```{figure} images/marginal-chan-6.png
---
name: marginal-chan
---
Learned per-channel prototypes from the invertible network. Each channel's input is optimized independently to increase the invertible network's prediction for the respective class. During that optimization, signals for the other, non-optimized channels are sampled from the training data. Color indicates the average softmax prediction over 10000 samples for the other channels. Very prominent slowing patterns appear for the pathological class at multiple electrodes.

```

The per-channel prototypes reveal interesting learned features for the two classes. The pathological prototypes show strong low-frequency activity, for example at T3 and T4, consistent with slowing as a biomarker for pathology. The healthy prototypes show alpha activity, for example at C4 and T6. Besides these, many other patterns may merit further investigation. One of them, the differences in the very low frequencies, is explored further below. Note that it was not possible for every electrode to synthesize a signal that is clearly indicative of one class independent of the other electrodes. This is to be expected if the network uses, for example, the degree of synchrony between signals at different electrodes as a feature.
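
The optimization behind the per-channel prototypes can be illustrated with a toy example. In this sketch a logistic regression with random weights `W` stands in for the trained network, random noise stands in for the training data, and the step size, step count and batch size are arbitrary; all names are hypothetical. Only the structure follows the description: one channel is optimized while the other channels are sampled from the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CHANS, N_TIMES = 4, 16                                # toy sizes (hypothetical)
W = rng.standard_normal((N_CHANS, N_TIMES))             # stand-in for a trained classifier
train_x = rng.standard_normal((500, N_CHANS, N_TIMES))  # stand-in for training windows


def prob_pathological(x):
    """Toy classifier: logistic regression on the window(s)."""
    logits = np.sum(x * W, axis=(-2, -1))
    return 1.0 / (1.0 + np.exp(-logits))


def optimize_channel(chan, n_steps=200, lr=0.05, batch_size=64):
    """Gradient ascent on one channel's signal; other channels are resampled each step."""
    signal = np.zeros(N_TIMES)
    for _ in range(n_steps):
        batch = train_x[rng.integers(len(train_x), size=batch_size)]
        batch[:, chan] = signal  # overwrite only the optimized channel
        p = prob_pathological(batch)
        # For this toy model, d/d(signal) of mean log p is mean(1 - p) * W[chan].
        signal += lr * np.mean(1.0 - p) * W[chan]
    return signal


def mean_prediction(chan, signal, n_samples=10000):
    """Average prediction with the prototype inserted and the other channels sampled."""
    batch = train_x[rng.integers(len(train_x), size=n_samples)]
    batch[:, chan] = signal
    return prob_pathological(batch).mean()


prototype = optimize_channel(chan=0)
baseline = mean_prediction(0, np.zeros(N_TIMES))
score = mean_prediction(0, prototype)
```

Here `score` plays the role of the color in {numref}`marginal-chan`: it measures how strongly the optimized channel alone shifts the prediction when averaged over sampled signals at the other channels.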

## EEG-CosNet Visualizations

```{table} Accuracy (%) of the small interpretable network on invertible network predictions and original labels.
:name: table-tuh-cos-net-accuracy

||EEG-InvNet Predictions|Original Labels|
|-|-|-|
|Train|92.5|89.1|
|Test|88.8|82.6|

```


```{figure} images/cos-sim-net-pattern-with-hspace.png
---
name: cos-sim-net-pattern-fig
---
Visualization of the small interpretable EEG-CosNet trained to mimic the EEG-InvNet. Scalp plots are spatial filter weights transformed to patterns; the signal below each scalp plot shows the corresponding convolutional filter. Signal colors represent the weights of the linear classification layer, transformed to patterns (see TODO for explanation). Plots are sorted by these colors. Note that the polarities of the scalp plots and temporal waveforms are arbitrary, as absolute cosine similarities are computed on the spatially filtered and temporally convolved signals.

```

Results for the EEG-CosNet show that a large fraction of the predictions of the invertible network can be predicted from a relatively small number of mostly neurophysiologically plausible spatio-temporal patterns. On the test set, the EEG-CosNet makes the same prediction as the EEG-InvNet for 88.8% of the recordings and retains a label accuracy of 82.6% (see {numref}`table-tuh-cos-net-accuracy`). This shows that from just 64 spatiotemporal features, the EEG-CosNet is able to predict the vast majority of the EEG-InvNet predictions. Still, the remaining gap indicates that the EEG-InvNet has learned some features that the EEG-CosNet cannot represent.

Visualizations in {numref}`cos-sim-net-pattern-fig` show that more regular waveforms in the alpha and beta frequency ranges are more strongly associated with the healthy class, whereas waveforms in other frequency ranges and less regular waveforms are more strongly associated with the pathological class. As examples for the healthy class, plots 1 and 3 show oscillations with a strong alpha component and plots 15-17 show oscillations with strong beta components. For the pathological class, we see slower oscillations, e.g., in plots 53 and 60, and also more irregular waveforms, e.g., in plots 49 and 52.
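
The polarity invariance noted in the caption of {numref}`cos-sim-net-pattern-fig` follows directly from the absolute cosine similarity. The sketch below computes one such feature under assumptions that this section does not specify: the averaging over time positions, the filter length and the function name `abs_cosine_feature` are hypothetical and may differ from the actual EEG-CosNet.

```python
import numpy as np


def abs_cosine_feature(window, spatial_filter, temporal_filter, eps=1e-8):
    """Mean absolute cosine similarity between a temporal filter and all
    same-length segments of the spatially filtered signal."""
    virtual_channel = spatial_filter @ window               # (time,)
    segments = np.lib.stride_tricks.sliding_window_view(
        virtual_channel, len(temporal_filter))              # (positions, filter_len)
    dots = segments @ temporal_filter
    norms = np.linalg.norm(segments, axis=1) * np.linalg.norm(temporal_filter)
    return np.mean(np.abs(dots / (norms + eps)))


sfreq = 64
t = np.arange(2 * sfreq) / sfreq
alpha_filter = np.sin(2 * np.pi * 10 * t[:32])   # hypothetical 0.5-second, 10-Hz pattern
spatial_filter = np.array([0.0, 1.0, 0.0])       # picks the second of three toy channels

rng = np.random.default_rng(0)
alpha_window = 0.1 * rng.standard_normal((3, len(t)))
alpha_window[1] += np.sin(2 * np.pi * 10 * t + 0.3)  # 10-Hz oscillation on channel 2
noise_window = rng.standard_normal((3, len(t)))

alpha_score = abs_cosine_feature(alpha_window, spatial_filter, alpha_filter)
noise_score = abs_cosine_feature(noise_window, spatial_filter, alpha_filter)
flipped_score = abs_cosine_feature(-alpha_window, spatial_filter, alpha_filter)
```

Because of the absolute value, `flipped_score` equals `alpha_score`, so flipping the sign of a spatial filter or a temporal waveform leaves the feature unchanged. The cosine normalization also makes the feature insensitive to the overall amplitude of the signal.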

## Investigation of Very Low Frequencies

One surprising observation from the visualizations is the difference in the very low frequencies (<0.5 Hz) between the two class prototypes. For example, the very different mean values in the class prototypes for FP1 and FP2 suggest that very low frequency information differs between the two classes at those electrodes. These differences motivated us to investigate more deeply to what extent very low frequency information is predictive of pathology.

```{table} Accuracy (%) on data lowpassed below 0.5 Hz.
:name: table-tuh-low-freq-accuracy

|EEG-InvNet|EEG-CosNet|Fourier-GMM|
|-|-|-|
|75.4|75.0|75.4|

```

For this, we first trained an invertible network on data lowpassed below 0.5 Hz by removing all Fourier components above 0.5 Hz, both for each recording and for each 2-second input window of the network. This network retains 75.4% accuracy, indicating that even these very low frequencies remain fairly informative about whether a recording is pathological. We additionally trained the EEG-CosNet with a temporal filter spanning the entire input window length of 2 seconds and found it to retain 75.0% test accuracy. Finally, we also directly trained an 8-component Gaussian mixture model, the Fourier-GMM, in Fourier space, which is feasible as only 3 dimensions per electrode remain (the summed value of the input window as well as the real and imaginary values of the 0.5-Hz Fourier component). Each of the 8 mixture components also had learnable class weights. The Fourier-GMM also retains 75.4% test accuracy.
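
The reason only 3 dimensions per electrode remain is that a 2-second window has a frequency resolution of 0.5 Hz, so only the 0-Hz and 0.5-Hz bins survive the lowpass. The following sketch shows the lowpass, the three features and their mapping back to input space. The function names and the toy 21-channel window are hypothetical; the sketch is an illustration of the arithmetic, not the preprocessing code used for the experiments.

```python
import numpy as np

SFREQ = 64
N_TIMES = 2 * SFREQ  # 2-second window -> 0.5 Hz frequency resolution


def lowpass_fft(x, cutoff_hz=0.5, sfreq=SFREQ):
    """Zero all Fourier components above `cutoff_hz` along the last axis."""
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / sfreq)
    coeffs = np.fft.rfft(x, axis=-1)
    coeffs[..., freqs > cutoff_hz] = 0
    return np.fft.irfft(coeffs, n=x.shape[-1], axis=-1)


def low_freq_features(window):
    """Per-electrode features: window sum, real and imaginary part of the 0.5-Hz bin."""
    coeffs = np.fft.rfft(window, axis=-1)
    return np.stack(
        [coeffs[..., 0].real, coeffs[..., 1].real, coeffs[..., 1].imag], axis=-1)


def features_to_window(features, n_times=N_TIMES):
    """Map the three features per electrode back to a time-domain window."""
    coeffs = np.zeros(features.shape[:-1] + (n_times // 2 + 1,), dtype=complex)
    coeffs[..., 0] = features[..., 0]
    coeffs[..., 1] = features[..., 1] + 1j * features[..., 2]
    return np.fft.irfft(coeffs, n=n_times, axis=-1)


rng = np.random.default_rng(0)
window = rng.standard_normal((21, N_TIMES))  # hypothetical 21-channel window
low = lowpass_fft(window)
feats = low_freq_features(low)               # shape (21, 3)
reconstructed = features_to_window(feats)
```

The 0-Hz bin of the discrete Fourier transform equals the sum of the window, which is why the first feature is the summed value. Since `features_to_window` exactly reconstructs the lowpassed window, a model trained on the three features per electrode can be visualized in input space.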


```{figure} images/net-lowfreq-prototypes.png
---
name: net-low-freq-prototypes-fig
---
Class prototypes for the EEG-InvNet trained on data lowpassed to be below 0.5 Hz.

```


```{figure} images/marginal-chan-low-freq.png
---
name: marginal-chan-low-freq-fig
---
Per-electrode prototypes for the EEG-InvNet trained on data lowpassed below 0.5 Hz. Note the strongly predictive signals at T3, T4 and T6.

```


```{figure} images/cos-sim-net-low-freq-pattern-with-hspace.png
---
name: cos-sim-net-low-freq-pattern-fig
---
Spatiotemporal patterns for the EEG-CosNet trained on data lowpassed below 0.5 Hz.

```

::::{subfigure} ABCD|EFGH
:gap: 0px
:name: low-freq-input-space-prototypes-fig
:class-grid: outline
:subcaptions: below

:::{image} images/low-freq-prototypes-0.png
:::

:::{image} images/low-freq-prototypes-1.png
:::

:::{image} images/low-freq-prototypes-2.png
:::

:::{image} images/low-freq-prototypes-3.png
:::

:::{image} images/low-freq-prototypes-4.png
:::

:::{image} images/low-freq-prototypes-5.png
:::

:::{image} images/low-freq-prototypes-6.png
:::

:::{image} images/low-freq-prototypes-7.png
:::


Means of the Fourier-GMM mixture components shown after inversion into input space.
::::



```{figure} images/low-freq-gmm-prototypes-scaled-per-freq-with-class-color-and-bar.png
---
name: fourier-gmm-low-freq-fig
---
Means of the Fourier-GMM mixture components in Fourier space. Scalp plots show the 0-Hz bin and the real and imaginary values of the 0.5-Hz bin. Components are sorted by pathological class weight, which is also shown as colored text in the top right of each component. Colormaps are scaled per plot. Note the strong frontal components for mixture components associated with the healthy class.

```
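
One plausible way to turn mixture components with class weights into a prediction is to weight each component's class weight by its responsibility for the example. The sketch below implements this reading. The diagonal covariances, the sigmoid parameterization of the class weights, the 21 electrodes and all names are assumptions, not details confirmed by the text.

```python
import numpy as np


def log_gaussian_diag(x, means, log_stds):
    """Log density of x under diagonal Gaussians; returns (n_examples, n_components)."""
    z = (x[:, None, :] - means[None]) / np.exp(log_stds)[None]
    return -0.5 * np.sum(z ** 2 + 2 * log_stds[None] + np.log(2 * np.pi), axis=-1)


def predict_pathological(x, means, log_stds, mix_logits, class_logits):
    """p(pathological | x) = sum_k responsibility_k(x) * class_weight_k."""
    log_joint = log_gaussian_diag(x, means, log_stds) + mix_logits[None]
    log_joint -= log_joint.max(axis=1, keepdims=True)     # stabilize the softmax
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)
    class_weights = 1.0 / (1.0 + np.exp(-class_logits))   # per-component p(pathological)
    return resp @ class_weights


rng = np.random.default_rng(0)
n_components, n_dims = 8, 3 * 21   # 3 Fourier features per electrode; 21 electrodes assumed
means = 5.0 * rng.standard_normal((n_components, n_dims))
log_stds = np.zeros((n_components, n_dims))
mix_logits = np.zeros(n_components)
class_logits = rng.standard_normal(n_components)

# Evaluating at the component means: each example is owned by one component.
probs = predict_pathological(means, means, log_stds, mix_logits, class_logits)
```

In this toy setup, an example located at a component mean receives essentially that component's class weight as its prediction, which matches the way the components in {numref}`fourier-gmm-low-freq-fig` are labeled by their pathological class weight.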

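Overall, the class prototypes suggest that A1 and A2 play a role, the per-electrode prototypes show that temporal electrodes are relevant, and the EEG-CosNet and the Fourier-GMM show the importance of frontal electrodes. Taken together, these visualizations hopefully give a more complete picture of how the networks use very low frequency information.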