## Sparse Dictionary Learning for Data Driven Analysis of Neuroimaging Signals
Presenter: Navid Shokouhi

*The University of Melbourne*

work in collaboration with Dr. Abd-Krim Seghouane


# Outline
- Presenter Background
    - Previous Work
- Data-driven functional analysis for resting-state fMRI
    - ICA
- Sparse Dictionary Learning
- Numerical Results
- Directions of Future Work 
- Summary

## Navid Shokouhi
PhD in *Statistical Signal Processing* and *Machine Learning*
- *The University of Texas at Dallas, USA*
- Worked on machine learning for speech applications: spoken dialogue systems


Postdoc in *Statistical Signal Processing* and *Machine Learning*
- *The University of Melbourne*
- functional data analysis for Neuroimaging
    

- Research Interests:
    - Statistical interpretation of Machine Learning Algorithms
    - Multivariate statistical analysis: PCA, ICA, CCA, LDA
    - Sparse Statistical Learning 
    
    
- Research Experience:
    - Machine learning for data driven Neuroimaging analysis: 
        - fMRI, fNIRS
    - Speech Recognition and Voice biometrics
        - PhD dissertation


- Professional (non-academic) Experience:
    - *Pull String Inc., San Francisco, CA, USA*
        - Software developer in applied Computational Linguistics
        - Python/C++
    - Work in collaboration with *Samsung, USA*
        - Speech Recognition for Smart TVs 

## Previous Work in Neuroimaging:
- Model order selection for ICA (resting-state fMRI analysis)
    <img src="figures/2018_TMI_RSNplotOrderselection_SeghouaneShokouhi.png" alt="RSN_orderselection" height="100"/>

**Seghouane, Shokouhi, "Consistent Estimation of Dimensionality for Data-Driven Methods in fMRI Analysis", Transactions on Medical Imaging, 2018.**

- Model order selection for CCA (multi-modal data analysis)
<img src="figures/2018_TSP_CCAorderselection_SeghouaneShokouhi.png" alt="RSN_orderselection" height="75"/>

**Seghouane, Shokouhi, "Estimating the number of Significant Canonical Coordinates", Transactions on Signal Processing, (submitted) 2018.**

- Robust Estimation of the Haemodynamic Response Function
    <img src="figures/2018_unpublished_HRFestimation_SeghouaneShokouhi.png" alt="HRF_alpharobust" width="200"/>
    <img src="figures/2018_unpublished_HRFestimationAlphaRobustPairs_SeghouaneShokouhi.png" alt="HRF_alpharobust" width="400"/>

**Seghouane, Shokouhi, "Robust Estimation of Output Layer Parameters for RBFNs", Transactions on Cybernetics, (submitted) 2018.**

- Sparse PCA as a preprocessing step for ICA
    <img src="figures/2018_TIP_SparsePCArealfmri_SeghouaneShokouhiKoch.png" alt="sparsePCA" width="200"/>

**Seghouane, Shokouhi, Koch, "Sparse Principal Component Analysis with Preserved Sparsity Pattern", Transactions on Image Processing, (submitted) 2018.**

- Simultaneous spatio-temporal data normalization (2D whitening)
<img src="figures/2018_SPL_twodimensional_SeghouaneShokouhi.png" alt="twoD" width="400"/>

**Seghouane, Shokouhi, "Two-Dimensional Whitening of Face Images for Improved PCA Performance", Signal Processing Letters, 2018.**

## Data-driven functional analysis for resting-state fMRI
Popular fMRI data analysis tools:
- General Linear Model
    - a linear regression fitted by least squares at each voxel (as implemented in SPM, statistical parametric mapping).
    - useful for studying brain response to task stimuli
- Data-driven methods
    - principal/independent component analysis
    - can be used for both task-dependent and resting-state data

### Task-dependent
<img src="figures/taskdependent_fMRI.png" alt="sparsePCA" width="500"/>

### Resting-state 
Without Paradigm

Our data is the collection of voxel time-series. 
    <img src="figures/boldsignal_fmri.png" alt="bold" width="400"/>

**Resting-state networks**: spontaneous (i.e., non task-dependent) BOLD signals that are known to be functionally and/or structurally related. 

An example is the **default mode network (DMN)**, whose activity decreases during cognitive tasks and is inversely related to the activity of task-activated regions. 

### Formulation used in Data-Driven Methods
Let ${\bf Y} = [{\bf y}_1,\dots,{\bf y}_N]$ denote the data matrix whose $i$-th column ${\bf y}_i$ holds the samples of the BOLD signal at voxel $i$,

where ${\bf y}_i\in \mathbb{R}^m$ for $i=1,\dots,N$.

For each voxel $i$ assume the linear mixture model: 
$${\bf y}_i = {\bf D}{\bf x}_i + \boldsymbol{\epsilon}_i$$
where $\boldsymbol{\epsilon}_i$ corresponds to noise. 

In matrix format (concatenating all ${\bf y}_i$, ${\bf x}_i$, and $\boldsymbol{\epsilon}_i$):
$${\bf Y} = {\bf D}{\bf X} + {\bf E}$$


> **The main objective of any data-driven method is to perform the matrix decomposition** ${\bf Y} \approx {\bf D}{\bf X}$, **which is done using certain assumptions on** ${\bf D}$ **and/or** ${\bf X}$.
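
To make the notation concrete, here is a minimal NumPy sketch that simulates data from the mixture model ${\bf Y} = {\bf D}{\bf X} + {\bf E}$. The function name, the dimensions, the sparse Laplacian coefficients, and the Gaussian noise are all assumptions chosen for illustration; they are not taken from the experiments in this talk.

```python
import numpy as np

def simulate_mixture(m=50, k=5, n_voxels=200, sparsity=0.2, noise_std=0.05, seed=0):
    """Draw Y = D X + E with m time points, k components and n_voxels voxels.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Columns of D play the role of component time courses (unit norm).
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    # Each column x_i holds the coefficients of voxel i; most are set to zero.
    mask = rng.random((k, n_voxels)) < sparsity
    X = mask * rng.laplace(size=(k, n_voxels))
    # Additive noise term.
    E = noise_std * rng.standard_normal((m, n_voxels))
    Y = D @ X + E
    return Y, D, X, E

Y, D, X, E = simulate_mixture()
print(Y.shape)  # (50, 200): one column per voxel, one row per time point
```

A data-driven method only observes `Y` and has to recover `D` and `X` from it.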

### Independent Component Analysis (ICA)
ICA has become the most popular data-driven method for fMRI analysis. 

It finds an unmixing matrix ${\bf W}$, such that:
$$\hat{\bf x}_i = {\bf W}{\bf y}_i$$ 
or similarly
$$\hat{\bf X} = {\bf W}{\bf Y}$$ 

ICA enforces an independence assumption on the vectors constructing ${\bf X}$ to perform decomposition. 
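
As an illustration of how an unmixing matrix can be estimated, the sketch below implements a basic symmetric FastICA iteration (tanh nonlinearity) in NumPy. It assumes a square, noise-free mixture, which is a simplification of the model above, and it is a generic textbook variant rather than the specific ICA implementation used in the work presented here. All function names are illustrative.

```python
import numpy as np

def whiten(Y):
    """Centre the rows of Y and return whitened data Z and whitening matrix K."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    cov = Yc @ Yc.T / Yc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    K = (eigvec / np.sqrt(eigval)).T  # K @ cov @ K.T = identity
    return K @ Yc, K

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fast_ica(Y, n_iter=200, tol=1e-8, seed=0):
    """Estimate an unmixing matrix W so that W @ (centred Y) has independent rows."""
    rng = np.random.default_rng(seed)
    Z, K = whiten(Y)
    k, n = Z.shape
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for every row w of W.
        W_new = G @ Z.T / n - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ K  # unmixing matrix acting on the centred data

# Toy example: three Laplacian sources mixed by a random square matrix.
rng = np.random.default_rng(1)
X_true = rng.laplace(size=(3, 5000))
D_true = rng.standard_normal((3, 3))
Y = D_true @ X_true
W = fast_ica(Y)
X_hat = W @ (Y - Y.mean(axis=1, keepdims=True))
# Absolute correlation between each estimate and each true source.
corr = np.abs(np.corrcoef(X_hat, X_true)[:3, 3:])
print(np.round(corr.max(axis=1), 2))
```

The estimated components are only defined up to permutation, sign and scale, which is why the comparison uses absolute correlations.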

### Spatial and Temporal ICA
There are two ways to apply ICA to fMRI data:

1) spatial ICA (sICA): extracts spatially independent brain maps
<img src="figures/spatial_ICA.png" alt="sICA" width="600"/>

2) temporal ICA (tICA): extracts temporally independent time series
<img src="figures/temporal_ICA.png" alt="sICA" width="600"/>

### Shortcomings of ICA
- Some studies have questioned the independence assumption in brain activity analysis for fMRI data.
    - interconnections of neural networks.
    - preprocessing steps such as smoothing and normalization affect independence. 
    
> **We would like to use a more effective data-driven method that overcomes the aforementioned shortcomings of ICA in fMRI data analysis.**

## Sparsity vs. Independence
ICA algorithms are known to work well if the components (i.e., elements of ${\bf X}$) have *generalized Gaussian densities*. 
$$p(x) = C e^{-\alpha |x|^\gamma}$$

For example, a special case is when $\gamma=1$, forming a Laplacian prior on ${\bf X}$. 

This prior is commonly interpreted as a sparsity assumption on ${\bf X}$, meaning that most of its elements are $0$ or close to $0$. 
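
The density above can be evaluated numerically. The sketch below assumes $p(x)$ is normalized to integrate to one, which gives $C = \gamma\,\alpha^{1/\gamma} / (2\,\Gamma(1/\gamma))$; the function name and default parameters are illustrative.

```python
import math
import numpy as np

def generalized_gaussian_pdf(x, alpha=1.0, gamma=1.0):
    """Evaluate p(x) = C * exp(-alpha * |x|**gamma), normalized to integrate to one."""
    C = gamma * alpha ** (1.0 / gamma) / (2.0 * math.gamma(1.0 / gamma))
    return C * np.exp(-alpha * np.abs(x) ** gamma)

x = np.linspace(-60.0, 60.0, 240001)
dx = x[1] - x[0]
for g in (1.0, 1.5, 2.0):
    area = np.sum(generalized_gaussian_pdf(x, alpha=1.0, gamma=g)) * dx
    print(f"gamma={g}: numerical integral = {area:.4f}")
```

With $\gamma=1$ the constant reduces to $C=\alpha/2$ (the Laplace density), and with $\gamma=2$, $\alpha=1/2$ it gives the standard Gaussian density.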

## Sparse Dictionary Learning
Using the same model as before:
$${\bf y}_i = {\bf D}{\bf x}_i + \boldsymbol{\epsilon}_i$$

- ${\bf D}$ is called a dictionary of size $m\times k$. 
- $k \gg m$, meaning that ${\bf D}$ contains redundancy. 
- ${\bf x}_i$ is a sparse set of coefficients for the columns of ${\bf D}$. 

In matrix format, sparse dictionary learning solves:
$$ \min_{{\bf D},{\bf X}} \parallel {\bf Y} - {\bf DX}\parallel_{F}^2 $$
$$\text{s.t.}\quad \parallel {\bf x}_i\parallel_{0} < \alpha, \quad i=1,\dots,N$$

- $\parallel \cdot \parallel_0$ is the number of non-zero elements in a vector. 
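
For a fixed dictionary, a common greedy heuristic for the $\ell_0$-constrained coding step is orthogonal matching pursuit (OMP). The sketch below is a minimal NumPy version; the function name and the `n_nonzero` parameter (which plays the role of the sparsity bound) are illustrative, and OMP is shown as one possible approach rather than as the method used for the results in this talk.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse coding: approximate y using at most n_nonzero columns of D."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never select an atom twice
        support.append(int(np.argmax(correlations)))
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        x[support] = coef
    return x

# Toy example with an overcomplete random dictionary (k > m).
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0, keepdims=True)
x_true = np.zeros(60)
x_true[[4, 17, 42]] = [2.0, -1.5, 1.0]
y = D @ x_true
x_hat = omp(D, y, n_nonzero=3)
print(np.count_nonzero(x_hat), np.linalg.norm(y - D @ x_hat))
```

OMP is not guaranteed to find the true support for every dictionary, which is one motivation for the $\ell_1$ relaxation.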

In sparse dictionary learning for fMRI:
> **Each BOLD signal**
${\bf y}_i$ 
**is represented as a sparse linear combination of the dictionary elements.**

<img src="figures/dictionary_learning_decomposition.png" alt="dict_learning" width="600"/>

Typically, a relaxed version of this problem is solved: 
$$ \min_{{\bf D},{\bf X}} \parallel {\bf Y} - {\bf DX}\parallel_{F}^2 $$
$$\text{s.t.}\quad \parallel {\bf x}_i\parallel_{1} < \beta, \quad i=1,\dots,N$$

Using $\parallel \cdot \parallel_1$ instead of $\parallel \cdot \parallel_0$: 
- still induces sparsity
- gives a tractable problem (convex in ${\bf X}$ when ${\bf D}$ is fixed)
- is related to Bayesian (MAP) estimation through the Laplace prior $p(x) = C e^{-\alpha |x|}$. 
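
One generic way to attack the $\ell_1$ problem is alternating minimization: sparse coding with ${\bf D}$ fixed, then a dictionary update with ${\bf X}$ fixed. The sketch below makes several assumptions that are not from the slides: it uses the penalized (Lagrangian) form $\frac{1}{2}\parallel {\bf Y} - {\bf DX}\parallel_F^2 + \lambda \parallel {\bf X}\parallel_1$ instead of the constrained form, ISTA for the coding step, and a least-squares dictionary update with column normalization. Function names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage towards zero)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_code_ista(Y, D, lam, n_iter=50):
    """Minimise 0.5 * ||Y - D X||_F^2 + lam * ||X||_1 over X with D fixed (ISTA)."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = soft_threshold(X - grad / L, lam / L)
    return X

def dictionary_learning(Y, k, lam=0.1, n_outer=10, ridge=1e-6, seed=0):
    """Alternate between l1 sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    X = sparse_code_ista(Y, D, lam)
    for _ in range(n_outer):
        # Dictionary update: D = Y X^T (X X^T + ridge * I)^(-1); the small ridge
        # term is only there for numerical stability.
        D = np.linalg.solve(X @ X.T + ridge * np.eye(k), X @ Y.T).T
        # Rescale atoms to unit norm (atoms that are never used stay at zero).
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
        X = sparse_code_ista(Y, D, lam)
    return D, X

# Toy example: data generated from a sparse model with an overcomplete dictionary.
rng = np.random.default_rng(0)
D_true = rng.standard_normal((20, 30))
D_true /= np.linalg.norm(D_true, axis=0, keepdims=True)
X_true = (rng.random((30, 200)) < 0.1) * rng.laplace(size=(30, 200))
Y = D_true @ X_true + 0.01 * rng.standard_normal((20, 200))
D_hat, X_hat = dictionary_learning(Y, k=30)
print("relative error:", np.linalg.norm(Y - D_hat @ X_hat) / np.linalg.norm(Y))
print("fraction of zero coefficients:", np.mean(X_hat == 0))
```

The joint problem is not convex in $({\bf D}, {\bf X})$, so the result depends on the initialization; only each sub-step is well behaved.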

### Introducing prior information
One of the benefits of sparse dictionary learning is that it allows introducing prior information on the **spatial and/or temporal structure** of the data. 

For example:
$$ \min_{{\bf D},{\bf X}} \parallel {\bf Y} - {\bf DX}\parallel_{F}^2 + \lambda\parallel {\bf D}\parallel_{F}^2$$
$$\text{s.t.}\quad \parallel {\bf x}_i\parallel_{1} < \beta, \quad i=1,\dots,N$$

gives smooth dictionary components [1]. 

**[1] M. Yaghoobi, T. Blumensath and M. Davies, “Regularized dictionary learning for sparse approximation”, EUSIPCO 2008.**
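
For a fixed ${\bf X}$, the penalized objective above is quadratic in ${\bf D}$, so the dictionary update has a closed form. The short derivation below is a sketch that ignores the $\ell_1$ constraint (which only involves ${\bf X}$); it is not claimed to be the exact update used in [1].

$$\frac{\partial}{\partial {\bf D}}\left(\parallel {\bf Y} - {\bf DX}\parallel_{F}^2 + \lambda\parallel {\bf D}\parallel_{F}^2\right) = -2({\bf Y} - {\bf DX}){\bf X}^\top + 2\lambda{\bf D} = 0$$
$$\Rightarrow\quad \hat{\bf D} = {\bf Y}{\bf X}^\top\left({\bf X}{\bf X}^\top + \lambda{\bf I}\right)^{-1}$$

This is the ridge-regression form of the least-squares update; for $\lambda > 0$ the matrix ${\bf X}{\bf X}^\top + \lambda{\bf I}$ is always invertible, even when some rows of ${\bf X}$ are entirely zero.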

## Numerical Results
<img src="figures/simulation_1.png" alt="sICA" width="700"/>

<img src="figures/simulation_2.png" alt="sICA" width="700"/>

## Summary of advantages of sparse dictionary learning
- Sparse dictionary learning (DL) can be connected to ICA. 
- Sparse DL imposes less restrictive assumptions on the data.
- Sparse DL is easier to extend to introduce spatio-temporal structure. 
- The optimization solved in sparse DL is scalable (allows both batch and online processing).
- Sparse DL results are reproducible. 

### Directions for future work
- Extensive multi-subject studies for resting-state fMRI analysis using sparse DL. 
    - Work on sparse dictionary learning in fMRI is scarce and mostly performed on limited data. 
- Improving the robustness of sparse dictionary learning. 