# Pipline for waveform analysis A-Z.
### Clustering neural signals associated to cytokine activity.
#### Gabriel Andersson 
---

### Main imports:

In [14]:
import sys
import warnings
sys.path.insert(1,'../')

import numpy as np
import matplotlib.pyplot as plt

First pre-process steps of raw plx-file avalible at < qqq. > is done in MATLAB. The resulting files **waveforms** and **timestamps** are saved in the corresponing paths below.

---
## Loading **waveforms** and **timestamps** : 

In [2]:
# HERE THE FILES IS TO BE LOADED
from scipy.io import loadmat
path_to_wf = '../../matlab_files/gg_waveforms-R10_IL1B_TNF_03.mat'
path_to_ts = '../../matlab_files/gg_timestamps.mat'


waveforms = loadmat(path_to_wf)
#print(f' keys of matlab file: {waveforms.keys()}')
waveforms = waveforms['waveforms']
timestamps = loadmat(path_to_ts)['gg_timestamps']
print('MATLAB files loaded succesfully...')
print()
print(f'Shape of waveforms: {waveforms.shape}.')
print()
print(f'Shape of timestamps: {timestamps.shape}.')
print()
assert waveforms.shape[0] == timestamps.shape[0], 'Missmatch of waveforms and timestamps shape.'

MATLAB files loaded succesfully...

Shape of waveforms: (136259, 141).

Shape of timestamps: (136259, 1).



---
## Preprocess of the loaded files. 
The preprocess funtions are avalible in **preprocess_wf.py**

In [19]:
# Preprocess waveforms
#---------------------
import preprocess_wf as preprocess 
# ****STEPS IN PROCESS****
# - Standardise
wf_std = preprocess.standardise_wf(waveforms)
# - Some translation invariant representation? 

---
## Label the waveforms with their respective change in event rate $\Delta EV$ at time of injections
### Alternativly use probability given considered waveform as mean. However similar results..
The process goes like: 
- Use tandardise waveforms
- Calculate correlation $\rho_{ij}$ between each waveform
- Consider waveforms as "similar" if $\rho_{ij} > THRES$, for some threshold ($THRES=0.6$). Label these into a cluster with label 1, otherwise 0.
- Calculate event rate for the 1-cluster.
- Caclate the change in event rate after injections $\{ \Delta EV_i\}_{i=1}^2$, and consider it sufficient if $\Delta EV_i > \sigma_{ev_i}$, where $\sigma_{ev_i}$ is the standard deviation during the "baseline" period. That is, the time before each injection. 

The nessasary funtions are avalible in **event_rates.py** and **wf_similarity_measure.py**

Experementing with the hyperparameter "THRES" in the notebook "tune_ev_label_hyp_params" lead me choose $THRES=0.6$. This as a result of considering both size of similarity-cluster together with how similar the waveforms actually are within cluster.

- qqq: still no translation invariant measure.

In [24]:
import time
from event_rate_first import *
from wf_similarity_measures import *
# OBS assumes existance of the standardised waveforms : wf_std

# Calculate event_rates 
threshold=0.6
sub_steps = 1000

ev_labels = np.empty((3,waveforms.shape[0]))

ii = 0

start_t = time.time()
prev_substep = 0
for sub_step in np.arange(0,waveforms.shape[0],sub_steps):
    i_range = np.arange(prev_substep,sub_step)
    correlations = wf_correlation(i_range,wf_std)
    for corr_vec in correlations.T:
        bool_labels = label_from_corr(corr_vec,threshold=threshold,return_boolean=True)
        event_rates, real_clusters = get_event_rates(timestamps[:,0],bool_labels,bin_width=1,consider_only=1)
        delta_ev, ev_stats = delta_ev_measure(event_rates)
        #ev_labels = ev_label(delta_ev,ev_stats,n_std=1)
        ev_labels[:,ii] = ev_label(delta_ev,ev_stats,n_std=1)[:,0]
        ii +=1
    prev_substep = sub_step
    if ii%2000==0:
        print(f'Time for {ii} labels : {time.time()-start_t}')

Time for 0 labels : 0.5025441646575928
Time for 2000 labels : 8.814529180526733
Time for 4000 labels : 16.14594793319702
Time for 6000 labels : 23.602784156799316


KeyboardInterrupt: 

---
## Build and train conditional VAE using labeled data $\mathcal{T} = \{ (x_i, \Delta EV_i) \}_{i=1}^N$

In [None]:
# Build and train CVAE -- Probably best to train using terminal and just load saved weights...
# yet to be done...

---
## Perform gradiant decent on $I(x|\Delta EV > 0) = - \log p(x|\Delta EV>0)$

The result will be the most probable waveforms given that the event rate increases after injection, or high probability data-points (hpdp). Denote : $\{ \hat{x}_j \}_{j=1}^m$


In [None]:
# Perform GD on conditional pdf..

---
## Label the hpdp in some way...

In [None]:
# Final labeling

---
# Calculate the event rate of the final result and see if we can infer something or not.