# Background subspace inhibition with PCA or ICA neurons
Comparing different projection learning algorithms for olfactory habituation. In separate notebooks, IBCM is studied. Here, online versions of PCA and ICA are used at the inhibitory layer to suppress the activation of projection neurons (second layer) in response to fluctuating background odor mixtures. 

Synaptic weights for inhibition, from the inhibitory neurons to the projection layer, are learnt to minimize the squared norm of the projection neuron (PN) layer activity. In this way, the network of inhibitory neurons acts like an autoencoder applying feedforward inhibition. 

![test](figures/feedforward_inhibitory_network.png)

## Biologically plausible online PCA (biopca)
Model proposed in Minden, Pehlevan, and Chklovskii, 2018. 
I implemented their model and inserted it into my inhibitory network simulations in the module ``modelfcts.biopca``. More precisely, I will be using Algorithm 1, inverse-free Principal Subspace Projection (ifPSP). 

## Lateral inhibition with the PCA neurons
Use the PCA-learning neurons as lateral inhibitory neurons just like in the IBCM case. See notes in the relevant Jupyter notebook (copied below). 

There are weights $\vec{w}_i$ going from inhibitory neuron $i$ to projection neurons. We can store them in the matrix $W$, where each column is a vector $\vec{w}_i$. The total inhibition received by projection neurons is therefore $\sum_{j} \bar{c}_j \vec{w}_j = W \vec{\bar{c}}$; if these neurons have an element-wise activation function $R$, typically a ReLU function, they take the value

$$ \vec{s} = R\left(\vec{x}_{in}(t) - W \vec{\bar{c}}  \right) $$

By calculating the gradient of the cost function

$$ C(\vec{w}_j) = \frac12 \mathbb{E}\left[\vec{s}^T \vec{s} \right] + \frac{\beta}{2\alpha} \mathbb{E}\left[ \vec{w}_j^T \vec{w}_j \right] $$

we find that the inhibition weights leaving neuron $j$ should evolve according to

$$ \frac{d \vec{w}_j}{dt} = -\alpha \nabla_{\vec{w}_j} C(\vec{w}_j) = \alpha \bar{c}_j \vec{s} R'\left(\vec{s}\right) -  \beta \vec{w}_j $$

assuming that $\alpha$ is small enough that instantaneous values can be used in place of the expectation, while $\vec{w}$ still converges on average to the optimum of the cost function. $R'\left(\vec{s}\right)$ is the element-wise derivative of the activation function; for instance, it is $1$ if $R$ is the identity function, or a Heaviside function if $R$ is a ReLU. In those two cases, it can simply be omitted -- in the latter case, the ReLU applied on $\vec{s}$ itself ensures the term is zero if the difference $\vec{x}_{in} - W\vec{\bar{c}}$ is negative. 
Here, I will use a ReLU, so

$$ \frac{d\vec{w}_j}{dt} = \alpha \bar{c}_j \vec{s} -  \beta \vec{w}_j$$
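The update rule above can be sketched as a single forward-Euler step. This is only an illustration: the function name `inhibition_euler_step` and the example values are hypothetical, and the integration scheme actually used in `modelfcts.biopca` may differ.

```python
import numpy as np

def inhibition_euler_step(w_mat, x_in, cbar, alpha, beta, dt):
    """One forward-Euler step of dw_j/dt = alpha * cbar_j * s - beta * w_j.

    w_mat has shape (n_dimensions, n_neurons); column j is the vector w_j.
    Hypothetical helper, not part of modelfcts.
    """
    # Projection neuron activity after inhibition, with ReLU activation
    s_vec = np.maximum(x_in - w_mat.dot(cbar), 0.0)
    # Column j of the outer product is cbar_j * s, as in the update rule
    dw = alpha * np.outer(s_vec, cbar) - beta * w_mat
    return w_mat + dt * dw, s_vec

# Example: zero initial weights, arbitrary input and inhibitory activities
w0 = np.zeros((4, 3))
x_in = np.array([1.0, 0.5, 0.0, 0.2])
cbar = np.array([1.0, -0.5, 0.0])
w1, s_vec = inhibition_euler_step(w0, x_in, cbar, alpha=25e-5, beta=5e-5, dt=1.0)
```

With zero initial weights there is no inhibition yet, so $\vec{s} = \vec{x}_{in}$ and each column $\vec{w}_j$ moves by $\alpha \bar{c}_j \vec{x}_{in} \Delta t$.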

In [None]:
import numpy as np
import scipy as sp
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from time import perf_counter

from modelfcts.biopca import integrate_inhib_ifpsp_network_skip, relu_inplace

# Offline PCA for comparison
from utils.statistics import principal_component_analysis, seed_from_gen

## Input process
To have multiple input components fluctuating realistically, we write the input as
$$ \vec{x}(t) = \sum_{\alpha=1}^K \nu_\alpha \vec{x}_\alpha $$
where the $\nu_\alpha$ are random variables with some correlation, to mimic the turbulent flow that carries all odorants together. Ideally, they would follow the distributions derived in (Celani, Villermaux and Vergassola 2014), but those are a bit tricky to simulate.

### General case of the Ornstein-Uhlenbeck process
The multivariate Langevin equation for the Ornstein-Uhlenbeck process is:

$$ d\vec{x} = -A \vec{x}(t) dt + B dW(t) $$

where $\frac{dW}{dt} = \vec{\eta}(t)$, a vector of gaussian white noise (independent components), $A$ and $B$ are matrices. Assume the matrix $A$ is normal and can be diagonalized as $A = U D U^\dagger$, where $D = \mathrm{diag}(\lambda^1, ..., \lambda^n)$. For a deterministic initial condition $\vec{x}(t_0) = \vec{x}_0$, the general solution is that $\vec{x}(t)$ follows a multivariate normal distribution, with mean and variance given by

$$ \langle \vec{x}(t) \rangle = U\mathrm{e}^{-D(t-t_0)}U^{\dagger} \vec{x}_0 $$
$$ \langle \vec{x}(t) \vec{x}(t)^T \rangle = U J(t, t_0) U^\dagger $$

where the components of $J$ are 

$$ J^{ij}(t, t_0) = \left(\frac{U^\dagger B B^T U}{\lambda^i + \lambda^j} \right)^{ij} \left(1 - e^{-(\lambda^i + \lambda^j)(t - t_0)}  \right) $$


The stationary distribution of $\vec{x}$ is thus

$$ \vec{x}^* \sim \mathcal{N} \left(\vec{0}, \, U J^* U^\dagger \right), \quad (J^*)^{kl} = \frac{\left(U^\dagger B B^T U\right)^{kl}}{\lambda^k + \lambda^l} \,\, .$$

For a non-zero mean, it is simpler to simulate the zero-mean process and add the average afterwards. 

For details, see Gardiner, chapter 4.2.6. 

### Exact numerical simulation, general case
To simulate a realization of this process exactly, we use a trick suggested by Gillespie in the univariate case (which only works for the Ornstein-Uhlenbeck process because it's linear and gaussian). We iteratively take $\vec{x}(t)$ as the initial condition of the evolution up to $\vec{x}(t + \Delta t)$, the distribution of which is

$$ \vec{x}(t + \Delta t) \sim \mathcal{N}\left( U e^{-D \Delta t}U^\dagger \vec{x}(t) , U J(t + \Delta t, t) U^\dagger \right) $$

which can be rewritten using the following property of multivariate normal distributions: if $\vec{n} \sim \mathcal{N}(\vec{0}, \mathbb{1})$, then $\vec{x} = \vec{\mu} + \Psi \vec{n} \sim \mathcal{N}(\vec{\mu}, \Psi \Psi^T)$ ($\Psi$ is the Cholesky decomposition of the desired covariance matrix). This property is easily demonstrated by computing $\langle \vec{x} \rangle$ and $\langle \vec{x} \vec{x}^T \rangle$ and using the linearity of multivariate normal distributions. For our update rule, this gives

$$ \vec{x}(t + \Delta t) = U e^{-D \Delta t}U^\dagger \vec{x}(t) + \mathrm{Chol}\left[U J(t + \Delta t, t) U^\dagger \right] \cdot \vec{n} $$

where $\vec{n}$ is a vector of standard normal(0, 1) samples. The matrices $U e^{-D \Delta t}U^\dagger$ and $\mathrm{Chol}\left[U J(t + \Delta t, t) U^\dagger \right]$ can be computed only once and applied repeatedly to the $\vec{x}(t)$ obtained in sequence and the $\vec{n}$ drawn at each iteration. The Cholesky decomposition of $UJU^\dagger$ is not obviously expressed in terms of $B$, because the possibly different $\lambda^i$ values mix up components. 
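As a minimal sketch of this precomputation, the function below builds the two update matrices for the special case of a real, symmetric, positive definite $A$ (so that $U$ is real orthogonal and `np.linalg.eigh` applies) and a $B$ such that the covariance is positive definite. The name `ou_exact_update_matrices` and the example matrices are hypothetical; this is not the code in `modelfcts.backgrounds`.

```python
import numpy as np

def ou_exact_update_matrices(a_mat, b_mat, dt):
    """Matrices of the exact update x(t+dt) = mean_mat.x(t) + noise_mat.n.

    Assumes a_mat is real symmetric positive definite (hypothetical helper).
    """
    lam, u_mat = np.linalg.eigh(a_mat)  # A = U diag(lam) U^T
    # U exp(-D dt) U^T: scale the columns of U, then multiply by U^T
    mean_mat = (u_mat * np.exp(-lam * dt)) @ u_mat.T
    # J^{ij} = (U^T B B^T U)^{ij} / (lam_i + lam_j) * (1 - exp(-(lam_i + lam_j) dt))
    g_mat = u_mat.T @ b_mat @ b_mat.T @ u_mat
    lam_sum = lam[:, None] + lam[None, :]
    j_mat = g_mat / lam_sum * (1.0 - np.exp(-lam_sum * dt))
    # Cholesky factor of the covariance U J U^T
    noise_mat = np.linalg.cholesky(u_mat @ j_mat @ u_mat.T)
    return mean_mat, noise_mat

a_example = np.array([[1.0, 0.2], [0.2, 0.5]])
b_example = np.array([[1.0, 0.0], [0.5, 1.0]])
# For a very long time step, the covariance reaches its stationary value
_, noise_long = ou_exact_update_matrices(a_example, b_example, dt=50.0)
cov_long = noise_long @ noise_long.T
```

One way to check the result: the stationary covariance $\Sigma$ should satisfy $A \Sigma + \Sigma A^T = B B^T$, which follows from the expression of $J$ above.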


### Simple case and exact simulation of it
If $A$ is diagonal, the $U$ matrices are just identity matrices and disappear, but the Cholesky decomposition of $J(t + \Delta t, t)$ is still not obvious. More explicit expressions can be obtained in the simplifying case where $A$ is proportional to the identity matrix, i.e., all components of $\vec{x}$ have the same fluctuation time scale. 

Let's say that $A =  \frac{1}{\tau} \mathbb{1}$, where $\tau$ is the fluctuation time scale ($\lambda^i = 1/\tau \,\, \forall i$). Then, the matrix $J$ simplifies to 

$$J(t, t_0) = \frac{\tau}{2}\left(1 - e^{-2(t - t_0)/\tau} \right)  BB^T  $$

and its Cholesky decomposition is simply $\sqrt{\frac{\tau}{2}\left(1 - e^{-2(t - t_0)/\tau} \right) } B$. Hence, the distribution of $\vec{x}(t)$ at any time since $t_0$ (deterministic initial condition $\vec{x}_0$) is

$$ \vec{x}(t) \sim \mathcal{N} \left(e^{-(t-t_0)/\tau} \vec{x}_0, \frac{\tau}{2}\left(1 - e^{-2(t - t_0)/\tau} \right)  BB^T  \right) $$

The stationary distribution is simply the above with the exponential factors set to 0. The update rule from $\vec{x}(t)$ to $\vec{x}(t + \Delta t)$ to simulate a realization of the process is nicer as well:

$$ \vec{x}(t + \Delta t) = e^{-\Delta t / \tau} \vec{x}(t) + \sqrt{\frac{\tau}{2} \left(1 - e^{-2\Delta t/\tau}  \right)} B \cdot \vec{n} $$

where $\vec{n} \sim \mathcal{N}(\vec{0}, \mathbb{1})$ is a vector of independent standard normal samples.

As before, we can compute once the (scalar) factor $e^{-\Delta t / \tau}$ and the matrix $\sqrt{\frac{\tau}{2} \left(1 - e^{-2\Delta t/\tau} \right)} B$ multiplying $\vec{n}$. This update is exact for any $\Delta t$: decreasing $\Delta t$ does not increase accuracy. The choice of $\Delta t$ only sets the resolution at which the realization of the process is sampled. 
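The simplified update rule can be sketched as follows. The function `simulate_ou_isotropic`, the matrix `b_example`, and the seed are hypothetical choices for illustration; the simulations in this notebook rely on `update_ou_kinputs` instead.

```python
import numpy as np

def simulate_ou_isotropic(b_mat, tau, dt, n_steps, rng):
    """Sample dx = -(x / tau) dt + B dW exactly at intervals dt, from x = 0.

    Hypothetical helper illustrating the update rule; not part of modelfcts.
    """
    decay = np.exp(-dt / tau)
    # decay**2 = exp(-2 dt / tau), so this is sqrt(tau/2 (1 - e^{-2 dt/tau})) B
    noise_mat = np.sqrt(0.5 * tau * (1.0 - decay**2)) * b_mat
    x_series = np.zeros((n_steps, b_mat.shape[0]))
    normals = rng.standard_normal(size=(n_steps, b_mat.shape[1]))
    for t in range(1, n_steps):
        x_series[t] = decay * x_series[t - 1] + noise_mat.dot(normals[t])
    return x_series

rng_example = np.random.default_rng(1234)
b_example = np.array([[1.0, 0.0], [0.5, 1.0]])
tau_example = 2.0
x_series = simulate_ou_isotropic(b_example, tau_example, 1.0, 50000, rng_example)
# Empirical covariance after a short transient, versus tau/2 B B^T
x_steady = x_series[100:]
empirical_cov = x_steady.T @ x_steady / x_steady.shape[0]
expected_cov = 0.5 * tau_example * b_example @ b_example.T
```

Up to sampling error, `empirical_cov` should be close to `expected_cov`, whatever the value of `dt`.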

### Symmetric choices for correlations
We want all pairs of $\nu_\gamma$ to have the same correlation. More specifically, we want to force a Pearson correlation coefficient of $0 < \rho < 1$ between any pair of $\nu$s. We suppose all background components have the same individual variance $\sigma^2$. The corresponding covariance matrix we want for the steady-state distribution is

$$ \Sigma = \sigma^2 \begin{pmatrix}
    1 & \rho & \ldots & \rho \\
    \rho & 1 & \ldots & \rho \\
    \ldots & \ldots & \ldots & \ldots \\
    \rho & \rho & \ldots & 1
\end{pmatrix} $$

If we apply Cholesky decomposition to get $\Sigma = \Psi \Psi^T$, then $\sqrt{\tau/2} B = \Psi$, since the steady-state covariance of the Ornstein-Uhlenbeck process is, in this simplified case, $\frac{\tau}{2} BB^T$. The $M_B$ coefficient in the update rule is then

$$ M_B = \sqrt{\frac{\tau}{2}\left(1 - e^{-2 \Delta t/\tau}\right)} B = \sqrt{1 - e^{-2 \Delta t/\tau}} \, \Psi $$

The other coefficient is just

$$ M_A = e^{-\Delta t / \tau} \mathbb{1} $$
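A quick consistency check of these two coefficients: if $\vec{\nu}(t)$ has covariance $\Sigma$, then $\vec{\nu}(t + \Delta t) = M_A \vec{\nu}(t) + M_B \vec{n}$ should have covariance $M_A \Sigma M_A^T + M_B M_B^T = \Sigma$ again. The parameter values below are arbitrary examples, not necessarily those used in the simulations.

```python
import numpy as np

# Example values (arbitrary): 3 components, variance 0.09, correlation 0.3
n_comp, sigma2, rho = 3, 0.09, 0.3
tau, dt = 2.0, 1.0

# Target steady-state covariance: sigma2 on the diagonal, rho*sigma2 elsewhere
sigma_mat = sigma2 * (rho * np.ones((n_comp, n_comp)) + (1.0 - rho) * np.eye(n_comp))
psi = np.linalg.cholesky(sigma_mat)  # Sigma = Psi Psi^T

# Update coefficients
m_a = np.exp(-dt / tau) * np.eye(n_comp)
m_b = np.sqrt(1.0 - np.exp(-2.0 * dt / tau)) * psi

# Covariance after one update step, starting from the steady state
next_cov = m_a @ sigma_mat @ m_a.T + m_b @ m_b.T
stationary_ok = np.allclose(next_cov, sigma_mat)
```

The check works because $e^{-2\Delta t/\tau} \Sigma + (1 - e^{-2\Delta t/\tau}) \Psi \Psi^T = \Sigma$ for any $\Delta t$.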

In [None]:
# Functions to update the fluctuating background variable
from modelfcts.backgrounds import update_ou_kinputs, update_ou_2inputs, decompose_nonorthogonal_basis

# Run a simulation with gaussian background
Select parameters, background components below to integrate the PCA and inhibitory neuron equations while the background fluctuates. One simulation runs in ~10 s on my laptop (not very efficient Python code). 

In [None]:
### General simulation parameters
n_dimensions = 4
n_components = 3
n_neurons = 3

# Simulation time scales
duration = 50000.0
deltat = 1.0
tau_nu = 2.0  # Correlation time scale of the background nu_gammas (same for all)
learnrate = 0.001  # Learning rate of M
rel_lrate = 2.0  # Learning rate of L, relative to learnrate
lambda_range = 0.5
# Choose Lambda diagonal matrix as advised in Minden et al., 2018
lambda_mat_diag = np.asarray([1.0 - lambda_range*k / (n_neurons - 1) for k in range(n_neurons)])
biopca_rates = [learnrate, rel_lrate, lambda_range]

inhib_rates = [25e-5, 5e-5]  # alpha, beta

# Initial synaptic weights: as advised in Minden et al., 2018 
rgen_meta = np.random.default_rng(seed=0x8496f883e85163519eb26fb84733ebad)
init_mmat = rgen_meta.standard_normal(size=[n_neurons, n_dimensions]) / np.sqrt(n_dimensions)
init_lmat = np.eye(n_neurons, n_neurons)  # Supposed to be near-identity, start as identity
ml_inits = [init_mmat, init_lmat]

# Choose linearly independent vectors in the positive orthant: 0.8 on one
# coordinate and 0.2 on the others, e.g. [0.8, 0.2, 0.2, 0.2], then normalize.
back_components = 0.2*np.ones([n_components, n_dimensions])
# Symmetric background components are an issue for PCA: they cause degeneracy and fluctuations
for i in range(n_components):
    if i < n_dimensions:
        back_components[i, i] = 0.8 - 0.0*i
    else:  # If there are more components than there are dimensions (ORNs)
        back_components[i, i % n_dimensions] = 0.8 - i
    # Normalize
    back_components[i] = back_components[i] / np.sqrt(np.sum(back_components[i]**2))

# Initial background vector and initial nu values
averages_nu = np.ones(n_components) / np.sqrt(n_components)
init_nu = np.zeros(n_components)
init_bkvec = averages_nu.dot(back_components)
# Initial background params, ordered with nu first for the update_ou_kinputs function
init_back_list = [init_nu, init_bkvec]

## Compute the matrices in the Ornstein-Uhlenbeck update equation
# Update matrix for the mean term: 
# Exponential decay with time scale tau_nu over time deltat
update_mat_A = np.identity(n_components)*np.exp(-deltat/tau_nu)

# Steady-state covariance matrix
sigma2 = 0.09
correl_rho = 0.0  # Set to zero for comparison with analytical prediction
steady_covmat = correl_rho * sigma2 * np.ones([n_components, n_components])  # Off-diagonals: rho*sigma2
steady_covmat[np.eye(n_components, dtype=bool)] = sigma2  # diagonal: sigma2

# Cholesky decomposition of steady_covmat gives sqrt(tau/2) B
# Update matrix for the noise term: \sqrt(tau/2(1 - exp(-2*deltat/tau))) B
psi_mat = np.linalg.cholesky(steady_covmat)
update_mat_B = np.sqrt(1.0 - np.exp(-2.0*deltat/tau_nu)) * psi_mat

back_params = [update_mat_A, update_mat_B, back_components, averages_nu]

In [None]:
# Arguments, in order: ml_inits, update_bk, bk_init, biopca_rates, inhib_rates, bk_params, tmax, dt
sim_results = integrate_inhib_ifpsp_network_skip(ml_inits, update_ou_kinputs, 
                        init_back_list, biopca_rates, inhib_rates, back_params, duration, 
                        deltat, seed=seed_from_gen(rgen_meta), noisetype="normal")
# Results: time, nu, background vector, M, L, (unused), cbar, w, and s series
tser, nuser, bkvecser, mser, lser, _, cbarser, wser, sser = sim_results

## Plots of the solution

### TODO: upgrade plots
We should look at $\vec{m}$ (the PCA neurons' input weights, to be compared to the analytical prediction), $\vec{w}$ (the inhibitory neurons), and $\vec{s}$, the olfactory background after inhibition, which we expect to have less variance than the original background $\vec{x}$.

Also look at $\vec{c}$, which should look like PCA projections, and at the combined matrix $L^{-1}M$, which is supposed to be like a projection matrix on principal components. 


In [None]:
from simulfcts.plotting import (plot_3d_series, plot_w_matrix, plot_m_matrix, plot_pca_results,
                            plot_background_norm_inhibition, plot_background_neurons_inhibition)

### Check PCA learning
Let $X$ be a matrix with each input sample in a column. According to Lemma 3 from Minden et al., 2018:
 - $L$ is diagonal with the $K$ first principal values, that is, the $K$ first eigenvalues of $XX^T$, on its diagonal
 - $\hat{U}_K = \Lambda^{-1} L^{-1} M$ is the learnt projector on the $K$ subspace: rows of $\hat{U}_K$ are the first $K$ principal components. 
 - Note that in the ifPSP algorithm, the first-order Taylor series approximation $L^{-1} \approx L_d^{-1} - L_d^{-1} L_o L_d^{-1}$ is used, where $L_d$ and $L_o$ are the diagonal and off-diagonal parts of $L$. 
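To get a feeling for the accuracy of this Taylor approximation, here is a minimal sketch comparing it to the exact inverse for a nearly diagonal matrix. The function name `approx_inverse_near_diagonal` and the matrix `l_example` are hypothetical, chosen only for illustration.

```python
import numpy as np

def approx_inverse_near_diagonal(l_mat):
    """First-order approximation L^{-1} ~ L_d^{-1} - L_d^{-1} L_o L_d^{-1}.

    L_d is the diagonal part of l_mat, L_o its off-diagonal part.
    Hypothetical helper, not part of modelfcts.
    """
    ld_inv = 1.0 / np.diag(l_mat)
    l_off = l_mat - np.diag(np.diag(l_mat))
    # Multiplying by a diagonal matrix on the left scales rows, on the right columns
    return np.diag(ld_inv) - ld_inv[:, None] * l_off * ld_inv[None, :]

# Example: off-diagonal entries are small compared to the diagonal
l_example = np.array([[2.0, 0.02, -0.01],
                      [0.02, 1.5, 0.03],
                      [-0.01, 0.03, 1.0]])
exact_inv = np.linalg.inv(l_example)
first_order_err = np.max(np.abs(approx_inverse_near_diagonal(l_example) - exact_inv))
zeroth_order_err = np.max(np.abs(np.diag(1.0 / np.diag(l_example)) - exact_inv))
```

The remaining error is of second order in the off-diagonal entries, so the approximation is only reliable when $L$ stays close to diagonal.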

In [None]:
def compute_pca_meankept(samp, do_proj=False, vari_thresh=1.0, force_svd=False):
    """ Given an array of samples, compute the empirical second-moment matrix
    (the sample mean is kept, not subtracted) and diagonalize it to obtain
    the principal components and principal values, which are what is returned.

    If there are at most 10*p samples, take the SVD of the sample matrix
    divided by sqrt(N-1), because this amounts to eigendecomposition of
    the second-moment matrix, but with better numerical stability and accuracy
    (but it's a lot slower).

    Args:
        samp (np.array): nxp matrix for n samples of p dimensions each.
            Pass the values of a dataframe for proper slicing.
        do_proj (bool): if True, also project the sample points
        vari_thresh (float in [0., 1.]): include principal components until
            a fraction vari_thresh of the total variance is explained.
        force_svd (bool): if True, use SVD of the data matrix directly.
    Returns:
        p_values (np.ndarray): 1d array of principal values, descending order.
        p_components (np.ndarray): 2d array of principal components.
            p_components[:, i] is the vector for p_values[i]
        samp_proj (np.ndarray): of shape (samp.shape[0], n_comp) where n_comp
            is the number of principal components needed to explain
            vari_thresh of the total variance.
    """
    # Few samples: use SVD on the data directly (the mean is not removed).
    if force_svd or samp.shape[0] <= 10*samp.shape[1]:
        svd_res = np.linalg.svd(samp.T / np.sqrt(samp.shape[0] - 1))
        # U, Sigma, V. Better use transpose so small first dimension,
        # because higher accuracy in eigenvectors in U
        # Each column of U is an eigenvector of samp^T*samp/(N-1)
        p_components = svd_res[0]
        p_values = svd_res[1]**2  # Singular values are sqrt of eigenvalues

    # Many samples are available; use covariance then eigen decomposition
    else:
        covmat = np.dot(samp.T, samp) / (samp.shape[0] - 1)
        p_values, p_components = np.linalg.eigh(covmat)
        # Sort in decreasing order; eigh returns increasing order
        p_components = p_components[:, ::-1]
        p_values = p_values[::-1]

    if do_proj:
        vari_explained = 0.
        total_variance = np.sum(p_values)
        n_comp = 0
        while vari_explained < total_variance*vari_thresh:
            vari_explained += p_values[n_comp]
            n_comp += 1
            # Stop once all components are included, to avoid indexing past the end
            if n_comp >= p_values.shape[0]: break
        samp_proj = samp.dot(p_components[:, :n_comp])

    else:
        samp_proj = None

    return p_values, p_components, samp_proj

In [None]:
def frobnorm(mat):
    """ Compute the squared Frobenius norm of matrix A, 
    ||A||^2 = Tr(A^T A). """
    return np.trace(mat.T.dot(mat))

def subspace_align_error(mat, target):
    """ Compute min_Q ||Q.dot(mat) - target||^2 / ||target||^2. 
    The solution to that orthogonal Procrustes problem is 
        Q = U V^T, where USV^T is the SVD of target.dot(mat.T)
    according to Wikipedia, citing the solution of Schönemann, 1966. 
    (https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem)
    """
    # Solve Procrustes problem
    u, s, vh = np.linalg.svd(target.dot(mat.T))
    q = u.dot(vh)
    # Compute alignment error
    return frobnorm(q.dot(mat) - target) / frobnorm(target)

In [None]:
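def compute_projector_series(mser, lser):
    """ Compute F = L^{-1} M at each time step, using the approximation
    L^{-1} ~ L_d^{-1} - L_d^{-1} L_o L_d^{-1} of the ifPSP algorithm. """
    nt = mser.shape[0]
    nk = lser.shape[1]
    # Inverse of the diagonal part of L, shape (nt, nk)
    linvdiag = 1.0 / np.diagonal(lser, axis1=1, axis2=2)
    # Off-diagonal part of L
    loffd = lser.copy()
    loffd[:, np.arange(nk), np.arange(nk)] = 0.0
    
    linvser = linvdiag[:, :, None] * (np.tile(np.eye(nk), (nt, 1, 1)) - loffd * linvdiag[:, None, :])
    fser = np.einsum("...ij,...jk", linvser, mser)  # ... allows broadcasting
    return fser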
    
def analyze_pca_learning(xser, mser, lser, lambda_diag):
    # Exact PCA: eigenvalue decomposition of xx^T / (n_samples-1)
    nk = lser.shape[1]
    nt = xser.shape[0]
    eigvals, eigvecs, _ = compute_pca_meankept(xser, do_proj=False)
    
    # Determine basis learnt by algorithm and return
    fser = compute_projector_series(mser, lser)
    learntvecs = ((1.0/lambda_diag[None, :, None]) * fser)
    # Each row of learntvecs[t] is an eigenvector learnt at time t
    
    # Values on the diagonal of L are supposed to be eigenvalues
    learntvals = np.diagonal(lser, axis1=1, axis2=2)
    # Sort them in decreasing order
    sort_arg = np.argsort(np.mean(learntvals[nt//2:], axis=0))[::-1]
    learntvals = learntvals[:, sort_arg]
    
    # Off-diagonal values are supposed to tend to zero
    loffd = lser.copy()
    loffd[:, np.arange(nk), np.arange(nk)] = 0.0
    offd_avg_abs = np.mean(np.abs(loffd), axis=(1, 2))
    
    # Subspace alignment
    learnt_pca_ser = [learntvecs[i].T for i in range(nt)]
    error_series = np.asarray([subspace_align_error(eigvecs[:, :nk], v) for v in learnt_pca_ser])
    
    return [eigvals, eigvecs], [learntvals, learntvecs], fser, offd_avg_abs, error_series

In [None]:
res = analyze_pca_learning(bkvecser, mser, lser, lambda_mat_diag)
true_pca, learnt_pca, fser, off_diag_l_avg_abs, align_error_ser = res

In [None]:
# Plot fluctuations of the projector
fig, axes = plt.subplots(n_neurons)
axes = axes.flatten()
for i in range(n_neurons):
    for j in range(n_dimensions):
        #axes[i].plot(tser, learnt_pca[1][:, i, j])
        axes[i].plot(tser, fser[:, i, j])
plt.show()
plt.close()

In [None]:
fig, axes = plot_pca_results(tser, true_pca, learnt_pca, align_error_ser, off_diag_l_avg_abs)
plt.show()
plt.close()

### PCA neurons
Dynamics, comparison to analytical fixed points, and plot of the time course of some selected components of $\vec{m}$ of a few vectors. 

In [None]:
fig, ax = plot_3d_series(mser, dim_idx=[0, 1, 2], transient=1000, skp=100)

# Annotate with vectors representing the odor components
orig = np.zeros([3, n_components])
xlim, ylim, zlim = ax.get_xlim(), ax.get_ylim(), ax.get_zlim()
scale = 0.3
vecs = back_components.copy()
for i in range(n_components):
    vecs[i] = back_components[i] / np.sqrt(np.sum(back_components[i]**2)) * scale
ax.quiver(*orig, *(vecs[:, :3].T), color="k", lw=2.0)
ax.view_init(azim=120, elev=30)
ax.set(xlabel=r"$\overline{m}_1$", ylabel=r"$\overline{m}_2$", zlabel=r"$\overline{m}_3$")
plt.show()
plt.close()

In [None]:
fig, axes_mat = plot_m_matrix(tser, mser, skp=100)

In [None]:
# The L matrix should be nearly diagonal; that does not seem to be the case here. 
fig, axes_mat = plot_m_matrix(tser, lser, skp=100)
for i in range(axes_mat.shape[0]):
    axes_mat[i, 0].set_ylabel(r"$L^{ij}$")
plt.show()
plt.close()

In [None]:
# 3D plot of the original and inhibited odors, sampled sparsely in time
# NB: due to our choice of components, there is symmetry and degeneracy 
# in the two lower eigenvalues, so vectors may not be the same, up to
# a rotation around the largest, unique PC. 
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Raw background
transient = 30000
skp = 100
tslice = slice(transient, None, skp)
dimslice = [0, 1, 2]
ax.scatter(bkvecser[tslice, 0], bkvecser[tslice, 1], bkvecser[tslice, 2], color="r", label="Background")
ax.scatter(sser[tslice, 0], sser[tslice, 1], sser[tslice, 2], 
           color="b", label="PCA inhibition")
# Compare to inhibition of the mean
mean_inhibition = bkvecser - np.mean(bkvecser[transient:], axis=0)*inhib_rates[0]/sum(inhib_rates)
ax.scatter(mean_inhibition[tslice, 0], mean_inhibition[tslice, 1], mean_inhibition[tslice, 2], 
           color="xkcd:light blue", label="Average subtraction")
ax.scatter(0, 0, 0, color="k", s=200, alpha=1)

# Each column of orig is the mean bkvec, restricted to dimslice
orig = np.zeros([3, n_components]) + np.mean(bkvecser[transient:, dimslice, None], axis=0) 
ax.scatter(*orig, c="k", s=9, zorder=1)
# Need to unravel dimensions, which is the last axis of stacks of vectors
# However in true_pca[1], each column is an eigenvector: p_components[:, i] is the vector for p_values[i]
# so there we don't need to transpose. 
ax.quiver(*orig, *(true_pca[1][dimslice, :n_components]), 
          color="k", lw=2.0, zorder=100, label="True PCs")
print(fser.shape)
# In fser, each row is an eigenvector, each column a dimension
vecs = np.mean(fser[transient:, :, dimslice], axis=0)  # Each row is a vector
#vecs = vecs / np.sqrt(np.sum(vecs**2, axis=1, keepdims=True))
# Remove scale; rows of F have norm F.F^T = Lambda^2
vecs = 1.0 / lambda_mat_diag[:, None] * vecs
ax.quiver(*orig, *(vecs[:n_components].T), color="b", lw=2.0, zorder=101, label="Learnt PCs")
ax.view_init(azim=140, elev=30)
ax.legend(loc="upper left", bbox_to_anchor=(1, 0.85))
plt.show()
plt.close()

In [None]:
# Dot products of learnt vectors with themselves. 
print(np.dot(vecs, vecs.T))  # Should be orthogonal or nearly. 
print(vecs)

## Evolution of the inhibitory neurons' weights $\vec{w}_i$
Analytically, I find that, on average, $\vec{w}_i$ converges to $\vec{x}(\pm \sigma)$, i.e. to either input vector one standard deviation away from the mean input. So, here, I compare the numerical results for $\vec{w}$ to the possible fixed points. 

In [None]:
fig, axes_mat = plot_w_matrix(tser, wser, skp=100)

## Background before and after inhibition

### Analytical prediction of $\vec{w}$ for gaussian $\vec{x}$

TODO

### Analytical prediction of $\vec{s}$ for gaussian $\vec{x}$ and many neurons
TODO

In [None]:
fig, ax, bknorm_ser, snorm_ser = plot_background_norm_inhibition(tser, bkvecser, sser, skp=10)

# Compute noise reduction factor, annotate
transient = 30000
avg_bknorm = np.mean(bknorm_ser[transient:])
avg_snorm = np.mean(snorm_ser[transient:])
avg_reduction_factor = avg_snorm / avg_bknorm
std_bknorm = np.std(bknorm_ser[transient:])
std_snorm = np.std(snorm_ser[transient:])
std_reduction_factor = std_snorm / std_bknorm

print("Mean activity norm reduced to "
      + "{:.1f} % of input".format(avg_reduction_factor * 100))
print("Standard deviation of activity norm reduced to "
      + "{:.1f} % of input".format(std_reduction_factor * 100))
ax.annotate("St. dev. reduced to {:.1f} %".format(std_reduction_factor * 100), 
           xy=(0.98, 0.98), xycoords="axes fraction", ha="right", va="top")

ax.legend(loc="center right", bbox_to_anchor=(1.0, 0.8))
fig.tight_layout()
plt.show()
plt.close()

In [None]:
fig, axes_mat, axes = plot_background_neurons_inhibition(tser, bkvecser, sser, skp=10)
axes[-1].legend(loc="center right", bbox_to_anchor=(1.0, 0.6), fontsize=8, handlelength=1.5)
fig.tight_layout()

# Compute noise reduction factor, annotate
transient = 30000
avg_bknorm = np.mean(bkvecser[transient:])
avg_snorm = np.mean(sser[transient:])
avg_reduction_factor = avg_snorm / avg_bknorm
std_bknorm = np.std(bkvecser[transient:])
std_snorm = np.std(sser[transient:])
std_reduction_factor = std_snorm / std_bknorm

print("Mean activity of a projection neuron reduced to "
      + "{:.1f} % of input".format(avg_reduction_factor * 100))
print("Standard deviation of a projection neuron's activity reduced to "
      + "{:.1f} % of input".format(std_reduction_factor * 100))

plt.show()
plt.close()

In [None]:
# Standard deviations
# TODO: determine analytical expression
stdev_inhib_comps = np.std(sser[transient:], axis=0)
stdev_bk_comps = np.std(bkvecser[transient:], axis=0)
print("Standard deviation of components, no inhibition:", stdev_bk_comps)
print("Standard deviation of components, after inhibition:", stdev_inhib_comps)
print("The noise is reduced on average to", (stdev_inhib_comps[:3]/stdev_bk_comps[:3]).mean()*100, "% of original")
#print("Theoretical prediction: not determined yet, intuitively 1-a/(a+b) =", 100*(1 - inhib_rates[0]/sum(inhib_rates)), "%")
print()

# Averages
avg_inhib_comps = np.mean(sser[transient:], axis=0)
avg_bk_comps = averages_nu.dot(back_components)
print("Average background after inhibition:", avg_inhib_comps)

## Response to a new odor
This part of the code only runs if the simulation above had ``n_dimensions > n_components``. 

The goal is to see whether a new odor, linearly independent of the ones in the background, also gets repressed close to zero, or instead produces an inhibited output noticeably different from the inhibited background and still similar to the new odor vector, or at least to its component perpendicular to the background subspace.

I should test this property more carefully, over a statistical ensemble of new odors, once I have a more realistic model of odors themselves. 

If the background, at the moment where the new odor is presented, is significantly different from the average background, the simple inhibition by average background subtraction will not work: the subtracted background (the average) will be very different from the current background, and the new odor will not be isolated. The IBCM model should work better for this by being able to subtract components of the background separately. 
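For reference, the component of a new odor perpendicular to the background subspace can be computed directly by least squares. The sketch below uses background vectors built like those of the simulation above (0.8 on one coordinate, 0.2 elsewhere, normalized); the function name `split_parallel_perpendicular` is hypothetical, and this ideal projection is only a point of comparison, not what the network computes.

```python
import numpy as np

def split_parallel_perpendicular(odor, background_vectors):
    """Split odor into parts inside and orthogonal to the background subspace.

    background_vectors: 2d array, each row is one background odor vector.
    Hypothetical helper, not part of modelfcts.
    """
    # Least-squares coefficients of the odor on the background vectors
    coefs, *_ = np.linalg.lstsq(background_vectors.T, odor, rcond=None)
    parallel = background_vectors.T.dot(coefs)
    return parallel, odor - parallel

# Three background vectors in four dimensions
example_back = 0.2 * np.ones((3, 4))
example_back[np.arange(3), np.arange(3)] = 0.8
example_back /= np.sqrt(np.sum(example_back**2, axis=1, keepdims=True))
# New odor: large component on the dimension unused by the background
example_new = np.roll(example_back[0], shift=-1)
par_part, perp_part = split_parallel_perpendicular(example_new, example_back)
```

If inhibition removed exactly the background subspace, the output for the new odor alone would be `perp_part` (before the ReLU).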

In [None]:
def respond_new_odor_ifpsp(odor, typical_f, typical_w):
    # Compute activation of neurons with this new odor (new+background)
    # Given the ifPSP projector's current state 
    # (either latest or some average state of the neurons)
    cbar = typical_f.dot(odor)

    # New odor after inhibition by the network, ReLU activation on s
    # Inhibit with the mean cbar*wser, to see how on average the new odor will show
    new_output = relu_inplace(odor - typical_w.dot(cbar))  # s = x - Wc
    return new_output

In [None]:
new_odor = np.roll(back_components[0], shift=-1)  # Should be a new vector
full_basis = np.hstack([back_components.T, new_odor.reshape(-1, 1)])

# Mix new odor with background
# new_odor = 0.5*new_odor + 0.5*np.sum(back_components, axis=0) / n_components  # Combine with mean background
new_odor = 0.5*new_odor + 0.5*back_components[1]  # Combine with one component
# Inhibit with the time-averaged projector F = L^{-1} M and weights w
new_odor_after_inhibition_average = respond_new_odor_ifpsp(new_odor, np.mean(fser[transient:], axis=0), 
                                                     np.mean(wser[transient:], axis=0))
# Inhibit with the latest projector and w
new_odor_after_inhibition_latest = respond_new_odor_ifpsp(new_odor, fser[-1], wser[-1])

# Response to new odor after inhibition by removal of average background
# s = x - alpha/(alpha+beta)*mean_background
new_odor_after_average_subtract = new_odor - (inhib_rates[0]/(sum(inhib_rates))
                                              * np.mean(bkvecser[transient:], axis=0))

# Show components along the three background vectors plus the new odor vector (full_basis)
print("Decomposition on the basis of x_gamma")
print("Uninhibited new odor:", decompose_nonorthogonal_basis(new_odor, full_basis))
print("Average after ifPSP inhibition:", decompose_nonorthogonal_basis(
                        new_odor_after_inhibition_average, full_basis))
print("Inhibition by average subtraction:", decompose_nonorthogonal_basis(
                        new_odor_after_average_subtract, full_basis))

# We indeed detect the new odor in the plane perpendicular to the inhibition. 
# Hence the more components of the new odor are not spanned by the old odors, the more
# we can detect it. 

# Inhibition of an alternating background

This works very well with the IBCM model. What about PCA?

In [None]:
def update_alternating_inputs(idx_bk, params_bk, noises, dt):
    """ Select randomly the next background input. 
    Args:
        idx_bk (np.ndarray): array of length 1, index of the current input vector (unused)
        params_bk (list):  Contains the following parameters       
            cumul_probs (np.ndarray): cumulative probabilities up to the kth input vector. 
            vecs (np.ndarray): 2d array where each row is one of the possible input vectors
        noises (np.1darray): pre-generated uniform(0, 1) samples, in an array of length 1, 
            to choose next input vector. 
        dt (float): time step (unused)
    Returns:
        vec (np.ndarray): the selected input vector
        idx (np.ndarray): array of length 1, index of the selected input vector
        """
    # Index of the next input
    cumul_probs, vecs = params_bk
    idx = np.argmax(cumul_probs > noises[0])
    return vecs[idx], np.asarray([idx])

In [None]:
init_back_altern = [np.zeros(1), back_components[0]]  # Start with component 0
# Cumulative probabilities [1/k, 2/k, ..., 1]: each component is equally likely
back_params_altern = [np.arange(1, n_components + 1)/n_components, back_components]

sim_res = integrate_inhib_ifpsp_network_skip(ml_inits, update_alternating_inputs, 
                        init_back_altern, biopca_rates, inhib_rates, back_params_altern, duration, 
                        deltat, seed=seed_from_gen(rgen_meta), noisetype="uniform")
# Results: time, nu, background vector, M, L, (unused), cbar, w, and s series
tser_alt, nuser_alt, bkvecser_alt, mser_alt, lser_alt, _, cbarser_alt, wser_alt, sser_alt = sim_res

### Time evolution of the learnt PCA

In [None]:
res = analyze_pca_learning(bkvecser_alt, mser_alt, lser_alt, lambda_mat_diag)
true_pca_alt, learnt_pca_alt, fser_alt, off_diag_l_avg_abs_alt, align_error_ser_alt = res

In [None]:
fig, axes = plot_pca_results(tser_alt, true_pca_alt, learnt_pca_alt, align_error_ser_alt, off_diag_l_avg_abs_alt)
plt.show()
plt.close()

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot a sample of points for each neuron
transient = 30000
skp = 500
tslice = slice(transient, None, skp)
ax.plot(0, 0, 0, color="k", marker="o", ls="none", ms=12)
colors = sns.color_palette("magma", n_colors=n_neurons)
for i in range(n_neurons):
    ax.scatter(mser_alt[tslice, i, 0], mser_alt[tslice, i, 1], mser_alt[tslice, i, 2], 
               alpha=0.5, color=colors[i], label="Neuron {}".format(i))

# Annotate with vectors representing the odor components
orig = np.zeros([n_components, n_components])
xlim, ylim, zlim = ax.get_xlim(), ax.get_ylim(), ax.get_zlim()
scale = 0.3
vecs = back_components.copy()
for i in range(n_components):
    vecs[i] = back_components[i] / np.sqrt(np.sum(back_components[i]**2)) * scale
ax.quiver(*orig, *(vecs[:, :3].T), color="k", lw=2.0)
# ax.view_init(azim=0, elev=30)
ax.view_init(azim=30, elev=30)
# ax.legend()
plt.show()
plt.close()

### Background components after inhibition
And their standard deviation: it is reduced. 

In [None]:
fig, axes_mat, axes = plot_background_neurons_inhibition(tser_alt, bkvecser_alt, sser_alt, skp=10)
axes[-1].legend(loc="center right", bbox_to_anchor=(1.0, 0.6), fontsize=8, handlelength=1.5)
fig.tight_layout()

# Compute noise reduction factors (statistics pooled over time and all dimensions)
transient = 20000
avg_bknorm = np.mean(bkvecser_alt[transient:])
avg_snorm = np.mean(sser_alt[transient:])
avg_reduction_factor = avg_snorm / avg_bknorm
std_bknorm = np.std(bkvecser_alt[transient:])
std_snorm = np.std(sser_alt[transient:])
std_reduction_factor = std_snorm / std_bknorm

print("Mean activity of a projection neuron reduced to "
      + "{:.1f} % of input".format(avg_reduction_factor * 100))
print("Standard deviation of a projection neuron's activity reduced to "
      + "{:.1f} % of input".format(std_reduction_factor * 100))

plt.show()
plt.close()

# Non-Gaussian unimodal distribution
If there is a discrete number of fixed points when the $\nu_{\alpha}$ have a distribution with non-zero third moment, there should be a transition from a continuum of fixed points to this discrete case as we increase above zero a parameter $\epsilon$ that controls the third central moment, $\langle (\nu - \langle\nu\rangle)^3 \rangle \propto \epsilon$ to leading order. 

To interpolate with a parameter $\epsilon$ from a Gaussian distribution to a distribution with non-zero central third moment, one trick is to simulate $x$ as an Ornstein-Uhlenbeck process with zero mean, then take
$$ \nu = s + x + \epsilon x^2 $$
or, in the multivariate case, 
$$ \vec{\nu} = \vec{s} + \vec{x} + \epsilon \mathrm{diag}(\vec{x}) \vec{x} $$

If there are no correlations, we can treat each component $\nu_{\alpha}$ as a univariate case. Writing $\sigma^2$ for the stationary variance of $x$, we then have a third moment of order $\epsilon$, with only order $\epsilon^2$ corrections to the second moment and an order $\epsilon$ correction to the desired mean $s$:

$$ \langle \nu \rangle = s + \epsilon \sigma^2 $$
$$ \langle (\nu - \langle \nu \rangle)^2 \rangle = \sigma^2 + 2 \epsilon^2 \sigma^4 $$
$$ \langle (\nu - \langle \nu \rangle)^3 \rangle = 6 \epsilon \sigma^4 + 8 \epsilon^3 \sigma^6 $$
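
As a quick numerical sanity check of these three expressions, the sketch below samples $x$ directly from a Gaussian with variance $\sigma^2$ (the stationary distribution of the Ornstein-Uhlenbeck process) instead of integrating the process in time, then compares empirical and analytical moments. The helper name `transformed_moments` and the values of $s$, $\sigma^2$ and $\epsilon$ are illustrative assumptions; nothing here comes from `modelfcts`.

```python
import numpy as np

def transformed_moments(s, sigma2, eps):
    """Analytical mean, variance and third central moment of
    nu = s + x + eps * x**2, for x ~ N(0, sigma2)."""
    mean = s + eps * sigma2
    var = sigma2 + 2.0 * eps**2 * sigma2**2
    third = 6.0 * eps * sigma2**2 + 8.0 * eps**3 * sigma2**3
    return mean, var, third

# Illustrative values (assumptions, not the notebook's variables)
s_mean, sigma2_demo, eps_demo = 0.5, 0.09, 0.2

# Sample x from its stationary Gaussian distribution, then transform
rng = np.random.default_rng(1234)
x = np.sqrt(sigma2_demo) * rng.standard_normal(2_000_000)
nu = s_mean + x + eps_demo * x**2

# Empirical moments of the transformed variable
emp_mean = nu.mean()
emp_var = nu.var()
emp_third = np.mean((nu - emp_mean)**3)

ana_mean, ana_var, ana_third = transformed_moments(s_mean, sigma2_demo, eps_demo)
print("mean:  empirical {:.5f}, analytical {:.5f}".format(emp_mean, ana_mean))
print("var:   empirical {:.5f}, analytical {:.5f}".format(emp_var, ana_var))
print("third: empirical {:.5f}, analytical {:.5f}".format(emp_third, ana_third))
```

In the uncorrelated multivariate case, the same transformation applies element-wise, since $\mathrm{diag}(\vec{x}) \vec{x}$ is just the vector of squares `x**2`.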

In [None]:
from modelfcts.backgrounds import update_thirdmoment_kinputs

In [None]:
# Reset some simulation parameters, others stay as before
# Initial synaptic weights: small Gaussian noise (both signs) near origin
rgen_meta = np.random.default_rng(seed=0xeefeb6e5f101c07cf9e80e95d5e8ecfd)
init_mmat = rgen_meta.standard_normal(size=[n_neurons, n_dimensions]) / np.sqrt(n_dimensions)
init_lmat = np.eye(n_neurons, n_neurons)  # Supposed to be near-identity, start as identity
ml_inits3 = [init_mmat, init_lmat]

# biopca parameters
learnrate3 = 0.003  # Learning rate of M
rel_lrate3 = 2.0  # Learning rate of L, relative to learnrate
# Choose Lambda diagonal matrix as advised in Minden et al., 2018
lambda_range3 = 0.5
lambda_mat_diag3 = np.asarray([1.0 - lambda_range3*k / (n_neurons - 1) for k in range(n_neurons)])
biopca_rates3 = [learnrate3, rel_lrate3, lambda_range3]

# Initial background vector and initial nu values
averages_nu = np.ones(n_components) / np.sqrt(n_components)
init_nu = np.zeros(n_components)
init_bkvec = averages_nu.dot(back_components)
# nus are first in the list of initial background params
init_back_list = [init_nu, init_bkvec]

## Compute the matrices in the Ornstein-Uhlenbeck update equation
# Update matrix for the mean term: 
# Exponential decay with time scale tau_nu over time deltat
tau_nu = 2  # Fluctuation time scale of the background nu_gammas (same for all)
update_mat_A = np.identity(n_components)*np.exp(-deltat/tau_nu)

# Steady-state covariance matrix
sigma2 = 0.09
correl_rho = 0.0
epsilon_nu = 0.2
steady_covmat = correl_rho * sigma2 * np.ones([n_components, n_components])  # Off-diagonals: rho * sigma2
steady_covmat[np.eye(n_components, dtype=bool)] = sigma2  # Diagonal: sigma2

# Cholesky decomposition of steady_covmat gives sqrt(tau/2) B
# Update matrix for the noise term: \sqrt(tau/2(1 - exp(-2*deltat/tau))) B
psi_mat = np.linalg.cholesky(steady_covmat)
update_mat_B = np.sqrt(1.0 - np.exp(-2.0*deltat/tau_nu)) * psi_mat

back_params_3 = [update_mat_A, update_mat_B, back_components, averages_nu, epsilon_nu]
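
The two update matrices above correspond to the exact discretization of an Ornstein-Uhlenbeck process, $\vec{x}(t + \Delta t) = A \vec{x}(t) + B \vec{\xi}$ with $\vec{\xi}$ standard normal. The sketch below is a minimal, self-contained check that this update leaves the target covariance stationary, i.e. $A C A^T + B B^T = C$. It assumes that `update_thirdmoment_kinputs` applies an update of this form before the quadratic transformation, which is not verified here. All names and values (`n_comp`, `dt`, `tau`, `sig2`, `rho`) are hypothetical demo choices, not the notebook's.

```python
import numpy as np

# Hypothetical demo values (assumptions, not the notebook's parameters)
n_comp, dt, tau, sig2, rho = 3, 1.0, 2.0, 0.09, 0.3

# Target steady-state covariance: sig2 on the diagonal, rho*sig2 elsewhere
cov = rho * sig2 * np.ones([n_comp, n_comp])
cov[np.eye(n_comp, dtype=bool)] = sig2

# Exact one-step update matrices for the mean and noise terms
mat_A = np.identity(n_comp) * np.exp(-dt / tau)
mat_B = np.sqrt(1.0 - np.exp(-2.0 * dt / tau)) * np.linalg.cholesky(cov)

# Algebraic stationarity check: the covariance is unchanged by one step
cov_next = mat_A @ cov @ mat_A.T + mat_B @ mat_B.T
print("Covariance stationary:", np.allclose(cov_next, cov))

# Short simulation: the empirical covariance should approach cov
rng = np.random.default_rng(0)
x = np.zeros(n_comp)
samples = np.empty((20000, n_comp))
for k in range(samples.shape[0]):
    x = mat_A @ x + mat_B @ rng.standard_normal(n_comp)
    samples[k] = x
emp_cov = np.cov(samples[1000:].T)  # Discard an initial transient
print("Max abs. deviation from target:", np.max(np.abs(emp_cov - cov)))
```

Because the update is exact rather than a first-order Euler step, the stationary covariance does not depend on the ratio of the time step to $\tau_\nu$.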

In [None]:
# Arguments: initial [M, L], background update function, initial background, biopca rates,
# inhibition rates, background parameters, duration, time step, then seed and noisetype
sim_results = integrate_inhib_ifpsp_network_skip(ml_inits3, update_thirdmoment_kinputs, 
                        init_back_list, biopca_rates3, inhib_rates, back_params_3, duration, 
                        deltat, seed=seed_from_gen(rgen_meta), noisetype="normal")
# Returns: times, nu series, background vectors, M series, L series, cbar series, W series, s series
tser3, nuser3, bkvecser3, mser3, lser3, cbarser3, wser3, sser3 = sim_results

### Check PCA learning

In [None]:
res = analyze_pca_learning(bkvecser3, mser3, lser3, lambda_mat_diag3)
true_pca3, learnt_pca3, fser3, off_diag_l_avg_abs3, align_error_ser3 = res

In [None]:
fig, axes = plot_pca_results(tser3, true_pca3, learnt_pca3, align_error_ser3, off_diag_l_avg_abs3)
plt.show()
plt.close()

## Time evolution of PCA neurons

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Plot a sample of points for each neuron
transient = 20000
skp = 100
tslice = slice(transient, None, skp)
ax.plot(0, 0, 0, color="k", marker="o", ls="none", ms=12)
colors = sns.color_palette("magma", n_colors=n_neurons)
for i in range(n_neurons):
    ax.scatter(mser3[tslice, i, 0], mser3[tslice, i, 1], mser3[tslice, i, 2], 
               alpha=0.5, color=colors[i], label="Neuron {}".format(i))

# Annotate with vectors representing the odor components
orig = np.zeros([3, n_components])  # Rows are x, y, z; one column per arrow
ax.plot(orig[0, 0], orig[0, 1], orig[0, 2], mfc="k", marker="o", mec="k", ls="none")
xlim, ylim, zlim = ax.get_xlim(), ax.get_ylim(), ax.get_zlim()
scale = 0.3
vecs = back_components.copy()
for i in range(n_components):
    vecs[i] = back_components[i] / np.sqrt(np.sum(back_components[i]**2)) * scale
ax.quiver(*orig, *(vecs[:, :3].T), color="k", lw=2.0)
#ax.view_init(azim=45, elev=30)
# ax.view_init(azim=45, elev=140)
# ax.legend()
ax.set(xlabel=r"$\overline{m}_1$", ylabel=r"$\overline{m}_2$", zlabel=r"$\overline{m}_3$")
#fig.savefig("figures/three_odors/neurones_ifpsp_3e_moment_epsilon_2e-1.pdf", transparent=True)
plt.show()
plt.close()

## Background components after inhibition

In [None]:
fig, ax, bknorm_ser3, snorm_ser3 = plot_background_norm_inhibition(tser3, bkvecser3, sser3, skp=10)

# Compute noise reduction factor, annotate
transient = 30000
avg_bknorm3 = np.mean(bknorm_ser3[transient:])
avg_snorm3 = np.mean(snorm_ser3[transient:])
avg_reduction_factor3 = avg_snorm3 / avg_bknorm3
std_bknorm3 = np.std(bknorm_ser3[transient:])
std_snorm3 = np.std(snorm_ser3[transient:])
std_reduction_factor3 = std_snorm3 / std_bknorm3

print("Mean activity norm reduced to "
      + "{:.1f} % of input".format(avg_reduction_factor3 * 100))
print("Standard deviation of activity norm reduced to "
      + "{:.1f} % of input".format(std_reduction_factor3 * 100))
ax.annotate("St. dev. reduced to {:.1f} %".format(std_reduction_factor3 * 100), 
           xy=(0.98, 0.98), xycoords="axes fraction", ha="right", va="top")

ax.legend(loc="center right", bbox_to_anchor=(1.0, 0.8))
fig.tight_layout()
plt.show()
plt.close()

In [None]:
# 3D plot of the original and inhibited odors, sampled sparsely in time
# NB: due to our choice of components, the two lower eigenvalues are 
# degenerate by symmetry, so the learnt and true eigenvectors may differ
# by a rotation around the largest, unique PC. 
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Raw background
transient = 30000
skp = 100
tslice = slice(transient, None, skp)
dimslice = [0, 1, 2]
ax.scatter(bkvecser3[tslice, 0], bkvecser3[tslice, 1], bkvecser3[tslice, 2], color="r", 
           label="Background")
ax.scatter(sser3[tslice, 0], sser3[tslice, 1], sser3[tslice, 2], 
           color="b", label="PCA inhibition")
# Compare to inhibition of the mean
mean_inhibition = bkvecser3 - np.mean(bkvecser3[transient:], axis=0)*inhib_rates[0]/sum(inhib_rates)
ax.scatter(mean_inhibition[tslice, 0], mean_inhibition[tslice, 1], mean_inhibition[tslice, 2], 
           color="xkcd:light blue", label="Average subtraction")
ax.scatter(0, 0, 0, color="k", s=200, alpha=1)

# Arrow origins: each column is the mean bkvec (restricted to dimslice), one column per PC
orig = np.zeros([3, n_components]) + np.mean(bkvecser3[transient:, dimslice, None], axis=0) 
ax.scatter(*orig, c="k", s=9)
# Need to unravel dimensions, which is the last axis of stacks of vectors
# However in true_pca[1], each column is an eigenvector: p_components[:, i] is the vector for p_values[i]
# so there we don't need to transpose. 
ax.quiver(*orig, *(true_pca3[1][dimslice, :n_components]), 
          color="k", lw=2.0, label="True PCs")
# In fser, each row is an eigenvector, each column a dimension
vecs = np.mean(fser3[transient:, :, dimslice], axis=0)  # Each row is a vector
#vecs = vecs / np.sqrt(np.sum(vecs**2, axis=1, keepdims=True))
# Remove scale; rows of F have norm F.F^T = Lambda^2
vecs = 1.0 / lambda_mat_diag3[:, None] * vecs
ax.quiver(*orig, *(vecs[:n_components].T), color="b", lw=2.0, label="Learnt PCs")
ax.view_init(azim=330, elev=10)
ax.legend(loc="upper left", bbox_to_anchor=(1, 0.85))
plt.show()
plt.close()