Test inclusion of image in README

lukassnoek · Aug 3, 2016 · da30c7c
1 parent 615ea8c
commit da30c7c

Showing 2 changed files with 63 additions and 134 deletions.
diff --git a/README.rst b/README.rst
@@ -1,5 +1,5 @@
-skbold - utilities for machine learning analyses on BOLD-fMRI data
-==================================================================
+skbold - utilities and tools for machine learning on BOLD-fMRI data
+===================================================================
 
 .. image:: https://travis-ci.org/lukassnoek/skbold.svg?branch=develop
     :target: https://travis-ci.org/lukassnoek/skbold
@@ -11,29 +11,58 @@ skbold - utilities for machine learning analyses on BOLD-fMRI data
 .. image:: https://coveralls.io/repos/github/lukassnoek/skbold/badge.svg
     :target: https://coveralls.io/github/lukassnoek/skbold
 
-Functional MRI (fMRI) data has traditionally been analyzed by calculating average
-signal differences between conditions. In the past decade, however,
-pattern-based type of analyses have become increasingly popular. Especially
-machine-learning based analyses experience a surge in popularity among
-(cognitive) neuroscientists.
-
-While many great resources for domain-general machine learning exists
-(e.g. `scikit-learn <www.scikit-learn.org>`_,
-`caret <http://topepo.github.io/caret/index.html>`_, and
-`libsvm <https://www.csie.ntu.edu.tw/~cjlin/libsvm>`_), few resources are
-available specifically for machine learning analyses of neuroimaging data
-(but see `nilearn <https://nilearn.github.io/>`_).
-
-As my PhD involved mainly machine learning analyses of fMRI data, I decided
-to bundle my (relevant) code into this package, which provides a nice
-opportunity for me to develop my programming skills by forcing me to write
-concise, readable, and efficient code.
-
-The skbold-package contains mostly extensions and utilities for machine learning
-analyses of fMRI data. Its structure/setup draws heavily upon the *scikit-learn*
-(sklearn, hence the name) machine learning library in Python. Also, credit should
-be given to `this <http://rasbt.github.io/mlxtend/>`_ repository, as it has
-a similar purpose and served as an example for much of my code.
+The Python package ``skbold`` offers a set of tools and utilities for
+machine learning and RSA-type analyses of functional MRI (BOLD-fMRI) data.
+Instead of (largely) reinventing the wheel, this package builds upon an existing
+machine learning framework in Python: `scikit-learn <http://www.scikit-learn.org>`_.
+Specifically, it offers a module with scikit-learn-style 'transformers' (with
+the corresponding scikit-learn API) and some (experimental) scikit-learn
+type estimators.
+
+In addition to these transformers and estimators, ``skbold`` offers
+a new data structure, the ``Mvp`` (multivoxel pattern), that provides an
+efficient way to store and access the data and metadata needed for multivoxel
+analyses of fMRI data. A notable feature of this data structure is that it
+can load data directly from `FSL <http://www.fmrib.ox.ac.uk/fsl>`_-FEAT output
+directories. The ``Mvp`` object comes in two variants, which are
+explained in more detail below.
+
+MvpWithin vs. MvpBetween
+------------------------
+At the core, an ``Mvp``-object is simply a collection of data - a 2D array
+of samples by features - and fMRI-specific metadata necessary to perform
+customized preprocessing and feature engineering. However, machine learning
+analyses, or more generally any type of multivoxel-type analysis (i.e. MVPA),
+can be done in two basic ways.
+
+One way is to perform analyses *within subjects*. This means that a model is
+fit on each subject's data separately. Data, in this context, often refers to
+single-trial data, in which each trial comprises a sample in our data-matrix and
+the values per voxel constitute our features. This type of analysis is
+alternatively called *single-trial decoding*, and is often performed as an
+alternative to massively (whole-brain) univariate analysis. Ultimately, this
+type of analysis aims to predict some kind of attribute of the trials (for
+example condition/class membership in classification analyses or some
+continuous feature in regression analyses). Subsequently, group-analyses may
+be done on subject-specific analysis metrics (such as classification accuracy
+or R2-score) and group-level feature-importance maps may be calculated to
+draw conclusions about the model's predictive power and the spatial
+distribution of informative features, respectively.
+
+.. image:: img/MvpWithin.png
+
+With the apparent increase in large-sample neuroimaging datasets, another
+type of analysis starts to become feasible, which we'll call *between-subject*
+analysis. In this type of analysis, single subjects constitute the data's
+samples and a corresponding single multivoxel pattern constitutes the data's
+features.
+
+Below, a typical analysis workflow using ``skbold`` is described
+to get a better idea of the package's functionality.
+
+An example workflow
+-------------------
+Blabla
 
 Installing skbold
 -----------------
@@ -51,31 +80,6 @@ Or, alternatively, download the package as a zip-file from Github, unzip, and ru
 
 	$ python setup.py install
 
-Functionality
--------------
-
-Currently, the package contains the following features.
-
-- A class that transforms FSL first-level directories into observation X feature arrays;
-- Classes that provide scikit-learn style *transformers*;
-- Some custom classifiers (voting- and stacked generalization classifiers).
-
-Below, some basic examples are given.
-
-Generating observation X feature arrays
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The class ``Mvp`` (from the ``core.py`` module) functions as a container for
-the objects used in the skbold-package. This object is subclassed to provide
-methods that allow to generate and load in data. Currently, the only
-implementation is ``Fsl2mvp`` (see ``data2mvp`` module), which converts first-level
-GLM estimates from an FSL FEAT directory to an array of observations X features.
-Importantly, this function assumes that a single-trial design is used (i.e. each
-trial is modelled as a separate regressor) and that the analysis is done within
-subjects (as opposed to between subjects, with each subject as an 'observation').
-
-For example, given a FEAT directory, the following code would create a
-trial X voxel array (stored as an Mvp-object; more on this later).
 
 .. code:: python
 
@@ -96,90 +100,15 @@ trial X voxel array (stored as an Mvp-object; more on this later).
     # Transform directory
     fsl2mvp.glm2mvp()
 
-Calling the method ``glm2mvp()`` creates a directory *mvp_data* with a data-file
-(hdf5) and a corresponding header-file (cPickle).
-
-Alteratively, there is a command line function ``glm2mvp`` that has the same
-functionality as outlined in the example above::
-
-    $ glm2mvp -h
-      usage: glm2mvp [-h] [-d DIRECTORY] [-m MASK] [-t THRESHOLD] [-b BETA2T]
-             [-s SPACE] [-r [REMOVE [REMOVE ...]]]
-
-    $ cd /home/users/data/sub002
-    $ glm2mvp -d pwd -b True -s epi
-
-Loading Mvp objects
-~~~~~~~~~~~~~~~~~~~
-
-To load Mvp-objects, the class DataHandler from the ``utils`` module can be used.
-An example is given below:
-
-.. code:: python
-
-    from utils import DataHandler
-
-    # Arguments
-    directory = '/home/user/data/subject_001 # assumes the existence of mvp_data dir!
-
-    # Initialize object
-    loader = DataHandler()
-
-    # Load data!
-    mvp = loader.load_separate_sub(sub_dir=directory)
-
-The loaded Mvp-object contains all the necessary data and meta-data necessary
-for a proper machine learning analysis using scikit-learn.
-
-Structure of Mvp-objects
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-The Mvp class contains the following main attributes:
-
-- ``X`` : numpy-ndarray of length = [n_samples, n_features]. This contains the actual patterns!
-- ``y`` : list, containing the target class as numeric labels.
-
-Other useful metadata is stored in the following attributes:
-
-- ``mask_index`` : index applied to the original whole-brain data
-- ``mask_shape`` : shape of original mask, most likely MNI152 (2mm) shape (91 * 109 * 91)
-
-Transforming data using transformer-classes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-A major part of the skbold-package is the ``transformers`` module, which contains
-scikit-learn style ``transformer``-objects that adhere to the consistent
-scikit-learn API, using the same ``.fit()`` and ``.transform()`` methods. The major
-advantage of directly inheriting from scikit-learn's Transformer objects is
-that they can be seamlessly integrated in `Pipelines <http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html>`_
-and `gridsearch <http://scikit-learn.org/stable/modules/grid_search.html>`_ procedures.
-
-In the following example, we'll create a scikit-learn pipeline to extract
-the patterns from only a single brain region from the whole-brain data
-contained in mvp.X (using the ``RoiIndexer`` transformer) and perform a type of
-univariate feature selection based on the average euclidean distance between
-classes (using the ``MeanEuclidean`` transformer).
-
-.. code:: python
-
-    from utils import DataHandler
-    from transformers import RoiIndexer, MeanEuclidean
-    from sklearn.pipeline import Pipeline
-
-    loader = DataHandler()
-    mvp = loader.load_separate_sub('/home/user/data/subject_001')
-
-    mask = 'Frontal_pole.nii.gz' # masks are included in skbold!
-    roiindexer = RoiIndexer(mvp=mvp, mask=mask, mask_threshold=0)
-    mean_euclidean = MeanEuclidean(cutoff=2)
-
-    # You could sequentially transform the data, as such:
-    X_tmp = roiindexer.fit(mvp.X).transform(mvp.X)
-    X_final = mean_euclidean.fit(X_tmp, mvp.y).transform(X_tmp)
-
-    # Or you could use a pipeline!
-    pipeline = Pipeline([('roiindex', roiindexer), ('meaneuc', mean_euclidean)])
-    X_tmp = pipeline.fit_transform(mvp.X, mvp.y)
+Credits
+~~~~~~~
+At the advent of this package, I knew next to nothing about Python programming
+in general and packaging in particular. The `mlxtend
+<https://github.com/rasbt/mlxtend>`_ package has been a great 'template' and
+helped a great deal in structuring the current package. Also, `Steven
+<https://github.com/StevenM1>`_ has contributed some very nice features as
+part of his internship. Lastly, `Joost <https://github.com/y0ast>`_ has been
+a major help in virtually every single phase of this package!
 
 License and contact
 ~~~~~~~~~~~~~~~~~~~

diff --git a/img/MvpWithin.png b/img/MvpWithin.png