In [None]:
%matplotlib widget
import numpy as np
import matplotlib.pyplot as plt
import flammkuchen as fl
from pathlib import Path

In [None]:
fps = 1.5
dt_imaging = 1/fps
data_root = Path(r"C:\Users\vilim\analysis\lsmlsda_data\whole_brain")
traces = fl.load(str(data_root / "traces_better_deconvolved.h5"))

In [None]:
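# Time of each imaging frame, in seconds (used below as the interpolation target)
t_imaging = np.arange(traces.shape[1]) * dt_imaging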

## Plotting the traces

Normalize the data (so that each trace has a mean 0 and variance 1) and plot all traces together as a heatmap.
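
A minimal sketch of the normalization step, assuming `traces` is laid out as (n_cells, n_timepoints), which is what `traces.shape[1]` being the number of frames suggests. The helper name `zscore_traces` and the synthetic `demo_traces` array are illustrative and not part of the original notebook.

```python
import numpy as np

def zscore_traces(traces):
    """Z-score each row (one trace per row) to mean 0 and variance 1."""
    traces = np.asarray(traces, dtype=float)
    mean = traces.mean(axis=1, keepdims=True)
    std = traces.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant traces become all zeros instead of NaN
    return (traces - mean) / std

# Synthetic stand-in for the real data: 50 cells x 300 frames
rng = np.random.default_rng(0)
demo_traces = rng.normal(loc=5.0, scale=2.0, size=(50, 300))
demo_norm = zscore_traces(demo_traces)
```

The normalized array can then be shown as a heatmap with something like `plt.imshow(demo_norm, aspect="auto")`, with one row per cell and one column per frame.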

# Regression

In [None]:
from scipy import signal
from scipy.interpolate import interp1d

In [None]:
# In this part we will correlate the individual traces (the original traces, not the ones averaged over trials) with sensory and motor regressors.
# To do so, first load the behavioural log and the stimulus log
stimulus_log = fl.load(data_root / "stimulus_log.h5")
behavior_log = fl.load(data_root / "behavior_log.h5")

## Creating the regressors
### Motor regressor
The motor regressor we would like to have is a general measure of the fish's swimming power. Such a regressor can be based on the standard deviation (SD) of the tail angle during the experiment.
The behaviour of the fish was recorded and saved in the file "behavior_log". In this DataFrame you will see the different angles of the segments of the fish tail, as well as the variable "tail_sum". The motor regressor should be a moving SD of tail_sum.

In [None]:
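# Creating the motor regressor

tail_sum = behavior_log['tail_sum'].values

# Sampling interval of the behavior log, estimated from a stretch of timestamps
dt_beh = np.mean(np.diff(behavior_log.t[100:200]))
vig_win = 2/1.5  # window of the moving SD, in seconds
n_vig = int(vig_win/dt_beh)  # the same window, in behavior samples
# Moving SD of tail_sum, interpolated at t_imaging (zero outside the behavior log)
vigor = interp1d(behavior_log.t, behavior_log.tail_sum.rolling(n_vig,  min_periods=2).std(),
                 fill_value=0.0, bounds_error=False)(t_imaging)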

### Sensory regressors
Create two regressors for the stimulus (stimulus speed).
From `stimulus_log`, get the variable "gain_lag_cl1D_vel". This is the velocity of the moving gratings. We will use this trace to create two regressors: one for positive velocity and one for negative velocity. Use the interpolation me

First, we resample the stimulation data so that it is equally spaced in time, at 200 times the imaging frame rate (another method is the one demonstrated above for vigor).

In [None]:
int_fact = 200
n_t_imaging = traces.shape[1]
t_imaging_int = np.arange(n_t_imaging*int_fact)*dt_imaging/int_fact

vel_int = interp1d(stimulus_log.t, stimulus_log["gain_lag_cl1D_vel"], bounds_error=False, fill_value=0)(t_imaging_int)

velocity = signal.decimate(vel_int, int_fact, ftype="fir")

In [None]:
## Create the regressors
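
One possible way to build the two sensory regressors is to rectify the velocity trace, keeping only its positive part for one regressor and only its negative part (with the sign flipped) for the other. This is a sketch under that assumption; the helper name `split_velocity` and the toy `demo_velocity` array are hypothetical.

```python
import numpy as np

def split_velocity(velocity):
    """Return (positive, negative) velocity regressors, both non-negative."""
    velocity = np.asarray(velocity, dtype=float)
    vel_pos = np.clip(velocity, 0, None)    # keeps forward motion, zero elsewhere
    vel_neg = np.clip(-velocity, 0, None)   # keeps backward motion, sign flipped
    return vel_pos, vel_neg

# Toy velocity trace standing in for the decimated `velocity` array
demo_velocity = np.array([0.0, 2.0, -3.0, 1.5, -0.5])
demo_pos, demo_neg = split_velocity(demo_velocity)
```

With the real data, the same function would be called on `velocity`, and the two outputs stacked with `vigor` to form the three regressors.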

### Correlating the traces with the regressors
At this point you will correlate each calcium trace with the three regressors.

In [None]:
# Downsample the regressors

# Correlate traces with the regressors
# Create a scatter plot of the correlation values
# put the regression results in a dataframe for 
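
The correlation step can be sketched as a single matrix product of z-scored arrays, which gives the Pearson correlation of every trace with every regressor at once. The sketch assumes that the traces and the regressors are already sampled on the same time base and contain no NaN values; `correlate_with_regressors` and the synthetic arrays are illustrative names.

```python
import numpy as np

def correlate_with_regressors(traces, regressors):
    """Pearson correlation of each trace (rows) with each regressor (rows).

    traces: (n_cells, n_t), regressors: (n_regressors, n_t).
    Returns an array of shape (n_cells, n_regressors).
    """
    def zscore(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)
        sd = x.std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0  # avoid division by zero for constant rows
        return x / sd

    z_traces = zscore(traces)
    z_regressors = zscore(regressors)
    return z_traces @ z_regressors.T / z_traces.shape[1]

# Synthetic example: cell 0 is a scaled and shifted copy of regressor 0
rng = np.random.default_rng(1)
demo_regressors = rng.normal(size=(3, 200))
demo_traces = rng.normal(size=(10, 200))
demo_traces[0] = 2.0 * demo_regressors[0] + 1.0
demo_corr = correlate_with_regressors(demo_traces, demo_regressors)
best_cells = demo_corr.argmax(axis=0)  # index of the best cell for each regressor
```

The resulting array can be wrapped in a `pandas.DataFrame` with one column per regressor, and `best_cells` gives the indices of the traces to plot for each regressor.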

### Plot the best-fitting neuron for each of the regressors

In [None]:
# 

## Average trials

Create trial-averaged traces. Each trial lasts 180 seconds. Averaging over trials will show a cleaner stimulus-related response.

In [None]:
n_trials = 9
trial_duration = 180.0
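
A sketch of the trial averaging, assuming that the trials run back to back from the first imaging frame and that each one spans `trial_duration * fps` frames (270 at 1.5 Hz). Whether the real recording starts exactly at trial onset is not stated here, so that is an assumption; `average_trials` and the synthetic data are hypothetical.

```python
import numpy as np

def average_trials(traces, n_trials, frames_per_trial):
    """Average (n_cells, n_t) traces over consecutive trials of equal length."""
    traces = np.asarray(traces, dtype=float)
    n_cells = traces.shape[0]
    n_used = n_trials * frames_per_trial
    if traces.shape[1] < n_used:
        raise ValueError("not enough frames for the requested number of trials")
    # Drop any trailing frames, then fold the time axis into (trial, frame)
    trials = traces[:, :n_used].reshape(n_cells, n_trials, frames_per_trial)
    return trials.mean(axis=1)

fps = 1.5
trial_duration = 180.0
n_trials = 9
frames_per_trial = int(round(trial_duration * fps))  # 270 frames

# Synthetic data: the same response repeated on every trial, plus noise
rng = np.random.default_rng(2)
template = np.sin(np.linspace(0, 2 * np.pi, frames_per_trial))
noise = rng.normal(scale=0.5, size=(4, n_trials * frames_per_trial))
demo_traces = np.tile(template, (4, n_trials)) + noise
demo_avg = average_trials(demo_traces, n_trials, frames_per_trial)
```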

# Dimensionality reduction and clustering

Extract the principal components of the average response.

In [None]:
from sklearn.decomposition import PCA
import pandas as pd

In [None]:
# Run the PCA

In [None]:
# Plot the first 3 PCs

Plot the variance explained by each component and try to establish how many components you need to explain everything that is not noise. Extra credit: do cross-validated PCA (fit the PCs on the average traces of some trials, and check how many components you need to explain the other trials).
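
A sketch of how the explained variance could be inspected with scikit-learn. It assumes the PCA is fitted with time points as samples and cells as features, so that `components_` holds one loading per cell and the transformed data are the PC time courses. The 95% threshold and the synthetic low-rank data are arbitrary illustrations, not a recommendation from the original notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic trial-averaged data: 3 hidden time courses mixed into 200 cells
rng = np.random.default_rng(3)
n_cells, n_t, n_latent = 200, 270, 3
latent = rng.normal(size=(n_latent, n_t))
weights = rng.normal(size=(n_cells, n_latent))
demo_avg = weights @ latent + 0.1 * rng.normal(size=(n_cells, n_t))

pca = PCA(n_components=10)
pc_timecourses = pca.fit_transform(demo_avg.T)  # (n_t, n_components)
loadings = pca.components_.T                    # (n_cells, n_components)

# Cumulative fraction of variance explained by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components reaching the (arbitrary) 95% threshold
n_needed = int(np.searchsorted(cum_var, 0.95) + 1)
```

Plotting `pca.explained_variance_ratio_` against the component index shows where the curve flattens, which is usually easier to judge by eye than a fixed threshold.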

Can you interpret the principal components in terms of stimulus?

## PC trajectories

Plot the neural activity of the whole brain as a phase-space plot (extra credit: encode time or stimulus value in the color)


## Clustering

Use K-means clustering to classify neurons by principal component loading (using all components that are not noise).

In [None]:
from sklearn.cluster import KMeans
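
A minimal sketch of the clustering call, run here on synthetic two-dimensional "loadings" drawn around three well-separated centers. The number of clusters and the `demo_loadings` array are assumptions made for illustration; with the real data, the input would be the per-cell loadings on the retained components.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic loadings: 100 cells around each of three centers
rng = np.random.default_rng(4)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
demo_loadings = np.vstack(
    [c + 0.3 * rng.normal(size=(100, 2)) for c in centers]
)

# n_init restarts guard against a poor random initialization
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(demo_loadings)  # one cluster index per cell
```

The labels can be passed directly as colors, for example `plt.scatter(demo_loadings[:, 0], demo_loadings[:, 1], c=labels, s=2)`.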

Plot the neurons in the space of principal component loading coefficients (for PC1 and PC2) and color them by cluster.

Do the clusters show discrete response classes? What are the assumptions of K-means, and does this dataset satisfy them?

## Clusters in anatomical space

(The readme now includes a link to the coords file.)

In [None]:
coords = fl.load(str(data_root / "coords.h5"))

In [None]:
dx = 0.6
dy = 0.6
dz = 7.0

In [None]:
fig, ax = plt.subplots(2, 2)
ax[0,0].scatter(coords[:,1]*dx, coords[:,2]*dy, s=0.1)
ax[0,0].set_aspect(1)
ax[0,0].set_xticklabels('')

ax[1,0].scatter(coords[:,1]*dx, coords[:,0]*dz, s=0.1)
ax[1,0].set_aspect(1)

ax[0,1].scatter(coords[:,0]*dz, coords[:,2]*dy, s=0.1)
ax[0,1].set_aspect(1)
ax[0,1].set_yticklabels('')
ax[1,1].axis("off")

Now, color the cells according to principal component loading or cluster assignment.

# Decode the velocity from the traces

Split the velocity and the traces into a training and a test set. Choose carefully so that most conditions are well represented in both.

In [None]:
traces_test = 
traces_train = 

vel_test = 
vel_train = 

Use methods from scikit-learn, starting with `sklearn.linear_model.LinearRegression` (or write your own linear regression!). Use the `fit` and `predict` methods to decode the velocity.

E.g. for a linear model:

$$v(t) = \sum_{i=1}^{n_{\mathrm{neurons}}} w_i a_i(t)$$
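
A sketch of linear decoding with scikit-learn on synthetic data. scikit-learn expects samples in rows, so the (n_cells, n_t) traces are transposed before fitting. Alternating blocks of 100 frames between training and test is only one hypothetical way to make both sets cover the whole session. Note that `LinearRegression` also fits an intercept by default, which the formula above omits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: velocity is a noisy weighted sum of 30 cell activities
rng = np.random.default_rng(5)
n_cells, n_t = 30, 600
demo_traces = rng.normal(size=(n_cells, n_t))
true_w = rng.normal(size=n_cells)
demo_vel = true_w @ demo_traces + 0.1 * rng.normal(size=n_t)

# Hypothetical split: alternate blocks of 100 frames between train and test
block = (np.arange(n_t) // 100) % 2
train, test = block == 0, block == 1

model = LinearRegression()
model.fit(demo_traces[:, train].T, demo_vel[train])   # rows are time points
vel_pred = model.predict(demo_traces[:, test].T)
r2 = model.score(demo_traces[:, test].T, demo_vel[test])  # R^2 on held-out frames
```

Plotting `vel_pred` against `demo_vel[test]`, both over time and as a scatter plot, shows which velocities are decoded well. The fitted weights are available in `model.coef_`.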

Plot the decoded velocity vs the real velocity, in time and as a scatter plot. Which regions of the stimulus space are decoded best?

## Extra credit 
* try to determine how many cells you need to decode the velocity. Which cells are the most important ones, if there are such?
* do nonlinear decoding methods (e.g. neural networks, also available with the same interface in scikit-learn) improve the decoding?
* try to decode behavior