This repository has been archived by the owner on Mar 7, 2022. It is now read-only.

Test inclusion of image in README
lukassnoek committed Aug 3, 2016
1 parent 615ea8c commit da30c7c
Showing 2 changed files with 63 additions and 134 deletions.
README.rst: 197 changes (63 additions, 134 deletions)
@@ -1,5 +1,5 @@
skbold - utilities for machine learning analyses on BOLD-fMRI data
==================================================================
skbold - utilities and tools for machine learning on BOLD-fMRI data
===================================================================

.. image:: https://travis-ci.org/lukassnoek/skbold.svg?branch=develop
    :target: https://travis-ci.org/lukassnoek/skbold
@@ -11,29 +11,58 @@ skbold - utilities for machine learning analyses on BOLD-fMRI data
.. image:: https://coveralls.io/repos/github/lukassnoek/skbold/badge.svg
    :target: https://coveralls.io/github/lukassnoek/skbold

Functional MRI (fMRI) data has traditionally been analyzed by calculating average
signal differences between conditions. In the past decade, however,
pattern-based analyses have become increasingly popular. Machine-learning-based
analyses, especially, have surged in popularity among (cognitive) neuroscientists.

While many great resources for domain-general machine learning exist
(e.g. `scikit-learn <http://scikit-learn.org>`__,
`caret <http://topepo.github.io/caret/index.html>`_, and
`libsvm <https://www.csie.ntu.edu.tw/~cjlin/libsvm>`_), few resources are
available specifically for machine learning analyses of neuroimaging data
(but see `nilearn <https://nilearn.github.io/>`_).

As my PhD mainly involved machine learning analyses of fMRI data, I decided
to bundle my (relevant) code into this package, which also gives me a nice
opportunity to develop my programming skills by forcing me to write
concise, readable, and efficient code.

The skbold package mostly contains extensions and utilities for machine learning
analyses of fMRI data. Its structure draws heavily upon the *scikit-learn*
(sklearn, hence the name) machine learning library in Python. Credit should also
be given to the `mlxtend <http://rasbt.github.io/mlxtend/>`__ repository, which has
a similar purpose and served as an example for much of my code.
The Python package ``skbold`` offers a set of tools and utilities for
machine learning analyses and representational similarity analyses (RSA) of
functional MRI (BOLD-fMRI) data. Instead of (largely) reinventing the wheel, this
package builds upon an existing machine learning framework in Python:
`scikit-learn <http://scikit-learn.org>`__. Specifically, it offers a module with
scikit-learn-style 'transformers' (implementing the scikit-learn API) and some
(experimental) scikit-learn-style estimators.
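
As an illustration of what a scikit-learn-style estimator looks like, the following is a minimal sketch of a correlation-based nearest-centroid classifier, a classic approach in multivoxel pattern analysis. It is not one of skbold's estimators: the class name ``CorrelationClassifier`` is hypothetical, and only ``BaseEstimator`` and ``ClassifierMixin`` come from scikit-learn. Because it implements ``fit`` and ``predict``, it inherits ``score`` and works with scikit-learn's cross-validation tools.

.. code:: python

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin


    class CorrelationClassifier(ClassifierMixin, BaseEstimator):
        """Hypothetical example estimator: assigns each pattern to the class
        whose mean pattern (centroid) it correlates with most strongly."""

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            self.classes_ = np.unique(y)
            # One centroid per class: shape [n_classes, n_features]
            self.centroids_ = np.vstack(
                [X[y == c].mean(axis=0) for c in self.classes_])
            return self

        def predict(self, X):
            X = np.asarray(X, dtype=float)
            # Pearson r between every pattern and every centroid
            r = self._zscore_rows(X) @ self._zscore_rows(self.centroids_).T
            r /= X.shape[1]
            return self.classes_[np.argmax(r, axis=1)]

        @staticmethod
        def _zscore_rows(A):
            # z-score each row across features (population std, ddof=0)
            A = A - A.mean(axis=1, keepdims=True)
            return A / A.std(axis=1, keepdims=True)


    # Usage on simulated data: 40 trials x 100 voxels, two conditions
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 100))
    y = np.repeat([0, 1], 20)
    X[y == 1, :10] += 3.0  # make the first 10 "voxels" informative
    clf = CorrelationClassifier().fit(X, y)
    print(clf.score(X, y))  # training accuracy (from ClassifierMixin.score)

Rows with zero variance (constant patterns) would cause a division by zero in this sketch; a real implementation would need to guard against that.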

Next to these transformer and estimator functionalities, ``skbold`` offers
a new data structure, the ``Mvp`` (multivoxel pattern), which efficiently stores
the data and metadata necessary for multivoxel analyses of fMRI data and gives
easy access to them. A novel feature of this data structure is that it can
easily load data from `FSL <http://www.fmrib.ox.ac.uk/fsl>`_ FEAT output
directories. The ``Mvp`` object comes in two variants (``MvpWithin`` and
``MvpBetween``), which are explained in more detail below.
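
To make the idea of bundling patterns with their metadata concrete, here is a minimal sketch of such a container. ``SimpleMvp`` and its ``select`` method are hypothetical names for illustration only, not skbold's ``Mvp`` API; the point is that selecting a subset of samples updates patterns and labels together, so they cannot drift out of alignment.

.. code:: python

    from dataclasses import dataclass

    import numpy as np


    @dataclass
    class SimpleMvp:
        """Hypothetical minimal container (not skbold's Mvp class)."""
        X: np.ndarray       # patterns, shape [n_samples, n_features]
        y: np.ndarray       # one label per sample, shape [n_samples]
        mask_shape: tuple   # shape of the original 3D volume

        def __post_init__(self):
            self.X, self.y = np.asarray(self.X), np.asarray(self.y)
            if self.X.ndim != 2 or self.X.shape[0] != self.y.shape[0]:
                raise ValueError("X must be 2D with one row per label in y")

        def select(self, keep):
            """Return a new SimpleMvp with only the samples whose label is in
            `keep`, so that patterns and labels stay aligned."""
            rows = np.isin(self.y, keep)
            return SimpleMvp(self.X[rows], self.y[rows], self.mask_shape)


    mvp = SimpleMvp(X=np.zeros((6, 4)), y=np.array([0, 1, 2, 0, 1, 2]),
                    mask_shape=(2, 2, 1))
    sub = mvp.select([0, 2])
    print(sub.X.shape, sub.y)  # (4, 4) [0 2 0 2]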

MvpWithin vs. MvpBetween
------------------------
At its core, an ``Mvp`` object is simply a collection of data (a 2D array
of samples by features) and the fMRI-specific metadata necessary to perform
customized preprocessing and feature engineering. However, machine learning
analyses, or more generally any type of multivoxel pattern analysis (MVPA),
can be done in two basic ways.

One way is to perform analyses *within subjects*. This means that a model is
fit on each subject's data separately. Data, in this context, often refers to
single-trial data, in which each trial constitutes a sample in our data matrix and
the values per voxel constitute our features. This type of analysis is
alternatively called *single-trial decoding*, and is often performed as an
alternative to a massively univariate (whole-brain) analysis. Ultimately, this
type of analysis aims to predict some kind of attribute of the trials (for
example condition/class membership in classification analyses or some
continuous feature in regression analyses). Finally, group analyses may
be done on subject-specific analysis metrics (such as classification accuracy
or R² score), and group-level feature-importance maps may be calculated to
draw conclusions about the model's predictive power and the spatial
distribution of informative features, respectively.

.. image:: img/MvpWithin.png
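
A within-subject analysis can be sketched with plain scikit-learn on simulated data (no skbold involved; the data, number of subjects, and linear SVM are illustrative assumptions): each subject's trials are decoded with cross-validation, and the subject-level accuracies are then summarized at the group level.

.. code:: python

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(42)
    n_subjects, n_trials, n_voxels = 5, 40, 200

    subject_scores = []
    for _ in range(n_subjects):
        # Simulated single-trial data for one subject: trials x voxels
        X = rng.standard_normal((n_trials, n_voxels))
        y = np.repeat([0, 1], n_trials // 2)  # condition of each trial
        X[y == 1, :20] += 1.0                 # a few informative voxels
        # Model is fit and evaluated on this subject's data only (5-fold CV)
        scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
        subject_scores.append(scores.mean())

    # Group level: summarize the subject-specific accuracies
    print(f"mean accuracy across subjects: {np.mean(subject_scores):.2f}")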

With the increasing availability of large-sample neuroimaging datasets, another
type of analysis is becoming feasible, which we'll call *between-subject*
analysis. In this type of analysis, individual subjects constitute the data's
samples, and each subject's single multivoxel pattern constitutes the data's
features.
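
A between-subject analysis can be sketched with scikit-learn alone on simulated data (an assumption for illustration; it does not use skbold): each row of ``X`` is one subject's pattern, and the target is a subject-level attribute, here predicted with ridge regression under 5-fold cross-validation.

.. code:: python

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_subjects, n_voxels = 60, 500

    # One simulated pattern per subject (e.g. a contrast map within a mask)
    X = rng.standard_normal((n_subjects, n_voxels))
    # A simulated subject-level attribute (e.g. a behavioural score)
    weights = np.zeros(n_voxels)
    weights[:25] = 0.5
    y = X @ weights + rng.standard_normal(n_subjects)

    # Each fold holds out whole subjects, because subjects *are* the samples
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print(r2.round(2))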

Below, a typical analysis workflow using ``skbold`` is described
to give a better idea of the package's functionality.

An example workflow
-------------------
Blabla

Installing skbold
-----------------
@@ -51,31 +80,6 @@ Or, alternatively, download the package as a zip-file from Github, unzip, and ru

$ python setup.py install

Functionality
-------------

Currently, the package contains the following features.

- A class that transforms FSL first-level directories into observation X feature arrays;
- Classes that provide scikit-learn style *transformers*;
- Some custom classifiers (voting- and stacked generalization classifiers).
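
The voting idea in the list above can be illustrated with scikit-learn's own ``VotingClassifier``. This is *not* skbold's implementation, just a sketch of the same technique on simulated data (scikit-learn 0.22 and later also ship a ``StackingClassifier`` for stacked generalization).

.. code:: python

    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    # Simulated data: 60 trials x 50 voxels, two conditions
    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 50))
    y = np.repeat([0, 1], 30)
    X[y == 1, :5] += 1.5

    # Hard voting: each base classifier predicts a label, the majority wins
    voter = VotingClassifier(
        estimators=[("svc", SVC(kernel="linear")),
                    ("lr", LogisticRegression(max_iter=1000)),
                    ("nb", GaussianNB())],
        voting="hard",
    )
    acc = cross_val_score(voter, X, y, cv=5).mean()
    print(f"cross-validated accuracy: {acc:.2f}")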

Below, some basic examples are given.

Generating observation X feature arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The class ``Mvp`` (from the ``core.py`` module) functions as a container for
the objects used in the skbold package. This object is subclassed to provide
methods to generate and load data. Currently, the only
implementation is ``Fsl2mvp`` (see the ``data2mvp`` module), which converts first-level
GLM estimates from an FSL FEAT directory to an array of observations X features.
Importantly, this class assumes that a single-trial design is used (i.e. each
trial is modelled as a separate regressor) and that the analysis is done within
subjects (as opposed to between subjects, with each subject as an 'observation').
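
The core of such a conversion, turning a set of single-trial beta maps into a trials X voxels array, can be sketched with plain NumPy. The function ``betas_to_array`` is hypothetical and not part of skbold; in practice the maps would be read from the FEAT directory's images (for instance with nibabel), whereas here they are simulated.

.. code:: python

    import numpy as np


    def betas_to_array(beta_maps, mask):
        """Hypothetical helper: stack per-trial 3D beta maps into a
        trials x voxels array.

        beta_maps : list of 3D arrays, one per trial (single-trial estimates)
        mask      : 3D boolean array selecting the voxels to keep
        """
        # Boolean indexing flattens each map's in-mask voxels into a 1D row
        return np.vstack([beta[mask] for beta in beta_maps])


    rng = np.random.default_rng(0)
    shape = (4, 5, 3)
    mask = np.zeros(shape, dtype=bool)
    mask[1:3, 1:4, :] = True                                  # 2*3*3 = 18 voxels
    betas = [rng.standard_normal(shape) for _ in range(10)]   # 10 simulated trials
    X = betas_to_array(betas, mask)
    print(X.shape)  # (10, 18)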

For example, given a FEAT directory, the following code would create a
trial X voxel array (stored as an Mvp-object; more on this later).

.. code:: python
@@ -96,90 +100,15 @@ trial X voxel array (stored as an Mvp-object; more on this later).
    # Transform directory
    fsl2mvp.glm2mvp()

Calling the method ``glm2mvp()`` creates a directory *mvp_data* with a data file
(hdf5) and a corresponding header file (cPickle).

Alternatively, there is a command-line tool ``glm2mvp`` that has the same
functionality as outlined in the example above::

    $ glm2mvp -h
    usage: glm2mvp [-h] [-d DIRECTORY] [-m MASK] [-t THRESHOLD] [-b BETA2T]
                   [-s SPACE] [-r [REMOVE [REMOVE ...]]]

    $ cd /home/users/data/sub002
    $ glm2mvp -d pwd -b True -s epi

Loading Mvp objects
~~~~~~~~~~~~~~~~~~~

To load Mvp objects, the class ``DataHandler`` from the ``utils`` module can be used.
An example is given below:

.. code:: python

    from utils import DataHandler

    # Path to a subject directory (assumes it contains an mvp_data directory!)
    directory = '/home/user/data/subject_001'

    # Initialize object
    loader = DataHandler()

    # Load data!
    mvp = loader.load_separate_sub(sub_dir=directory)

The loaded Mvp object contains all the data and metadata needed
for a proper machine learning analysis using scikit-learn.

Structure of Mvp-objects
~~~~~~~~~~~~~~~~~~~~~~~~

The Mvp class contains the following main attributes:

- ``X`` : numpy ndarray of shape [n_samples, n_features]. This contains the actual patterns!
- ``y`` : list containing the target class of each sample as numeric labels.

Other useful metadata is stored in the following attributes:

- ``mask_index`` : index applied to the original whole-brain data
- ``mask_shape`` : shape of the original mask, most likely the MNI152 (2mm) shape (91, 109, 91)
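
How ``mask_index`` and ``mask_shape`` might be used to turn a row of ``X`` back into a brain volume (e.g. to save a feature-importance map) is sketched below. This assumes, hypothetically, that ``mask_index`` is a boolean vector over the flattened (C-order) volume; skbold's actual representation may differ, and ``to_volume`` is a made-up helper.

.. code:: python

    import numpy as np


    def to_volume(pattern, mask_index, mask_shape):
        """Hypothetical helper: map a 1D pattern (one row of X) back into a
        3D volume, assuming mask_index is a boolean vector over the
        flattened volume (True for voxels that are columns of X)."""
        volume = np.zeros(int(np.prod(mask_shape)), dtype=float)
        volume[mask_index] = pattern  # fill in-mask voxels, the rest stays 0
        return volume.reshape(mask_shape)  # C order, matching the flattening


    mask_shape = (91, 109, 91)  # MNI152 2mm grid
    mask_index = np.zeros(np.prod(mask_shape), dtype=bool)
    mask_index[:1000] = True    # toy mask with 1000 voxels
    X = np.random.default_rng(0).standard_normal((5, mask_index.sum()))
    vol = to_volume(X[0], mask_index, mask_shape)
    print(vol.shape)  # (91, 109, 91)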

Transforming data using transformer-classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A major part of the skbold package is the ``transformers`` module, which contains
scikit-learn-style ``transformer`` objects that adhere to the consistent
scikit-learn API, using the same ``.fit()`` and ``.transform()`` methods. The major
advantage of directly inheriting from scikit-learn's transformer classes is
that they can be seamlessly integrated in `Pipelines <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
and `grid-search <http://scikit-learn.org/stable/modules/grid_search.html>`_ procedures.
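
Because such transformers follow the ``fit``/``transform`` API, the parameters of every pipeline step can be tuned with ``GridSearchCV``, addressing them as ``<step name>__<parameter>``. The sketch below uses scikit-learn's ``SelectKBest`` and a linear ``SVC`` on simulated data as stand-ins; a skbold transformer could in principle take ``SelectKBest``'s place, but that is assumed here rather than shown.

.. code:: python

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # Simulated data: 60 trials x 300 voxels, two conditions
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 300))
    y = np.repeat([0, 1], 30)
    X[y == 1, :15] += 1.0

    pipe = Pipeline([("select", SelectKBest(f_classif)),
                     ("clf", SVC(kernel="linear"))])

    # Step parameters are addressed as <step name>__<parameter>
    param_grid = {"select__k": [10, 50, 100], "clf__C": [0.1, 1.0]}
    search = GridSearchCV(pipe, param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)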

In the following example, we'll create a scikit-learn pipeline to extract
the patterns of a single brain region from the whole-brain data
contained in ``mvp.X`` (using the ``RoiIndexer`` transformer) and perform a type of
univariate feature selection based on the average Euclidean distance between
classes (using the ``MeanEuclidean`` transformer).
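
The selection criterion itself can be illustrated in isolation with a simplified, hypothetical transformer, ``ClassMeanDistanceSelector`` (a made-up name). It is a sketch under assumptions and not necessarily how skbold's ``MeanEuclidean`` computes its scores: for each voxel it averages the absolute (one-dimensional Euclidean) distances between class means, z-scores these scores across voxels, and keeps the voxels whose score exceeds ``cutoff``.

.. code:: python

    from itertools import combinations

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin


    class ClassMeanDistanceSelector(TransformerMixin, BaseEstimator):
        """Hypothetical simplification of a class-distance feature selector."""

        def __init__(self, cutoff=2.0):
            self.cutoff = cutoff

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            means = [X[y == c].mean(axis=0) for c in np.unique(y)]
            # Per voxel: average distance between all pairs of class means
            dists = np.mean([np.abs(a - b) for a, b in combinations(means, 2)],
                            axis=0)
            # Standardize the scores across voxels; keep those above cutoff
            z = (dists - dists.mean()) / dists.std()
            self.support_ = z > self.cutoff
            return self

        def transform(self, X):
            return np.asarray(X)[:, self.support_]


    # Usage on simulated data: 40 trials x 200 voxels, two conditions
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 200))
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 3.0  # five strongly informative voxels
    selector = ClassMeanDistanceSelector(cutoff=2.0)
    X_sel = selector.fit_transform(X, y)
    print(X_sel.shape)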

.. code:: python

    from sklearn.pipeline import Pipeline

    from utils import DataHandler
    from transformers import RoiIndexer, MeanEuclidean

    loader = DataHandler()
    mvp = loader.load_separate_sub('/home/user/data/subject_001')

    mask = 'Frontal_pole.nii.gz'  # masks are included in skbold!
    roiindexer = RoiIndexer(mvp=mvp, mask=mask, mask_threshold=0)
    mean_euclidean = MeanEuclidean(cutoff=2)

    # You could sequentially transform the data, as such:
    X_tmp = roiindexer.fit(mvp.X).transform(mvp.X)
    X_final = mean_euclidean.fit(X_tmp, mvp.y).transform(X_tmp)

    # Or you could use a pipeline!
    pipeline = Pipeline([('roiindex', roiindexer), ('meaneuc', mean_euclidean)])
    X_final = pipeline.fit_transform(mvp.X, mvp.y)

Credits
~~~~~~~
When I started this package, I knew next to nothing about Python programming
in general and packaging in particular. The `mlxtend
<https://github.com/rasbt/mlxtend>`_ package has been a great 'template' and
helped a great deal in structuring the current package. Also, `Steven
<https://github.com/StevenM1>`_ has contributed some very nice features as
part of his internship. Lastly, `Joost <https://github.com/y0ast>`_ has been
a major help in virtually every single phase of this package!

License and contact
~~~~~~~~~~~~~~~~~~~
Binary file added img/MvpWithin.png