Merge pull request #112 from neuroscout/fetch_utils

ENH: Add high-level predictor fetching utilities
neuroscout · Dec 20, 2022 · a84996f
2 parents 56bd1a5 + 2d08071

Showing 13 changed files with 426 additions and 9 deletions.
diff --git a/docs/source/_autosummary/pyns.fetch_utils.fetch_images.rst b/docs/source/_autosummary/pyns.fetch_utils.fetch_images.rst
@@ -0,0 +1,6 @@
+pyns.fetch\_utils.fetch\_images
+===============================
+
+.. currentmodule:: pyns.fetch_utils
+
+.. autofunction:: fetch_images
diff --git a/docs/source/_autosummary/pyns.fetch_utils.fetch_predictors.rst b/docs/source/_autosummary/pyns.fetch_utils.fetch_predictors.rst
@@ -0,0 +1,6 @@
+pyns.fetch\_utils.fetch\_predictors
+===================================
+
+.. currentmodule:: pyns.fetch_utils
+
+.. autofunction:: fetch_predictors
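The fetching guide added in this commit describes `fetch_predictors` as resampling predictor events to the TR of the images, with optional rescaling (`rescale=True`) applied before the variables are densified and resampled. Below is a simplified, stdlib-only illustration of that order of operations. It is not the pybids or pyNS implementation. It assumes events are `(onset, duration, amplitude)` tuples, densifies them onto a fine grid of `sr` samples per second, and averages each TR-length bin. `zscore`, `resample_to_tr` and `sr` are hypothetical names.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale to mean 0 and standard deviation 1 (cf. rescale=True)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def resample_to_tr(events, n_vols, tr, rescale=False, sr=10):
    """Densify (onset, duration, amplitude) events, then average per TR bin.

    Simplified illustration only, not the pybids/pyNS algorithm.
    Assumes tr * sr is a whole number of samples.
    """
    amps = [amp for _, _, amp in events]
    if rescale:
        # Rescaling is applied to the sparse events, before densification.
        amps = zscore(amps)

    # Densify: fill a grid of `sr` samples per second; gaps stay at 0.
    n_samples = int(round(n_vols * tr * sr))
    dense = [0.0] * n_samples
    for (onset, duration, _), amp in zip(events, amps):
        start = int(round(onset * sr))
        stop = min(int(round((onset + duration) * sr)), n_samples)
        for i in range(start, stop):
            dense[i] = amp

    # Resample: average the dense samples that fall within each volume.
    per_vol = int(round(tr * sr))
    return [mean(dense[v * per_vol:(v + 1) * per_vol]) for v in range(n_vols)]
```

In this sketch, because rescaling runs on the sparse event amplitudes, the gaps between events stay at 0 in the dense series; rescaling after densification would shift those gaps as well.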
diff --git a/docs/source/_autosummary/pyns.fetch_utils.get_paths.rst b/docs/source/_autosummary/pyns.fetch_utils.get_paths.rst
@@ -0,0 +1,6 @@
+pyns.fetch\_utils.get\_paths
+============================
+
+.. currentmodule:: pyns.fetch_utils
+
+.. autofunction:: get_paths
diff --git a/docs/source/_autosummary/pyns.fetch_utils.install_dataset.rst b/docs/source/_autosummary/pyns.fetch_utils.install_dataset.rst
@@ -0,0 +1,6 @@
+pyns.fetch\_utils.install\_dataset
+==================================
+
+.. currentmodule:: pyns.fetch_utils
+
+.. autofunction:: install_dataset
diff --git a/docs/source/_autosummary/pyns.fetch_utils.rst b/docs/source/_autosummary/pyns.fetch_utils.rst
@@ -0,0 +1,33 @@
+pyns.fetch\_utils
+=================
+
+.. automodule:: pyns.fetch_utils
+
+
+
+
+
+
+
+   .. rubric:: Functions
+
+   .. autosummary::
+      :toctree:                                          
+
+      fetch_images
+      fetch_predictors
+      get_paths
+      install_dataset
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -7,4 +7,5 @@ API
    :recursive:
 
    pyns.api
-   pyns.endpoints
+   pyns.endpoints
+   pyns.fetch_utils
diff --git a/docs/source/fetching.rst b/docs/source/fetching.rst
@@ -0,0 +1,115 @@
+Fetching predictors & images
+=============================
+
+To facilitate creating custom analysis workflows, `pyNS` provides a number of high-level utilities for fetching 
+predictors from the Neuroscout API, and the corresponding images from the preprocessed BIDS dataset.
+
+.. note::
+
+    Analysis pipelines created using these utilities will not be centrally registered on Neuroscout, and
+    will not be available to other users through the Neuroscout API or web interface.
+
+    If your analysis type is supported by `Neuroscout-CLI <https://neuroscout-cli.readthedocs.io/en/latest/>`_
+    (e.g. summary statistics GLM), we recommend creating your analysis with the
+    web interface, or following the :doc:`analyses` guide using pyNS.
+
+    If you use these data in a publication, please cite the following paper:
+
+    Alejandro de la Vega, Roberta Rocca, Ross W Blair, Christopher J Markiewicz, Jeff Mentch, James D Kent, Peer Herholz, Satrajit S Ghosh, Russell A Poldrack, Tal Yarkoni (2022). *Neuroscout, a unified platform for generalizable and reproducible fMRI research*. eLife 11:e79277
+    https://doi.org/10.7554/eLife.79277
+
+    In addition, please cite the original dataset(s), and the predictor extractors you use.
+
+
+--------------------------------------
+Fetching & re-sampling predictor data
+--------------------------------------
+
+The function :func:`pyns.fetch_utils.fetch_predictors` fetches predictor data,
+resamples it to the TR of the images, and returns it as a pandas DataFrame.
+
+You only need two things: a list of predictor names and the name of the BIDS dataset.
+Optionally, you can restrict the data to a subset of subjects, runs or tasks (recommended for testing).
+
+.. code-block:: python
+
+    from pyns.fetch_utils import fetch_predictors
+
+    fetch_predictors(predictor_names=['speech', 'rms'], dataset_name='Budapest',
+                     subject='sid000005', run=[1, 2], resample=True, rescale=False)
+
+
+
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+|    |   onset |   duration |       speech |          rms |   run | subject   |   run_id |
++====+=========+============+==============+==============+=======+===========+==========+
+|  0 |       0 |          1 |  9.5801e-06  |  6.18876e-07 |     1 | sid000005 |     1433 |
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+|  2 |       1 |          1 | -2.57011e-05 | -1.49298e-06 |     1 | sid000005 |     1433 |
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+|  4 |       2 |          1 |  6.755e-05   |  3.50004e-06 |     1 | sid000005 |     1433 |
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+|  6 |       3 |          1 | -0.000173993 | -7.91888e-06 |     1 | sid000005 |     1433 |
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+|  8 |       4 |          1 |  0.000439006 |  1.70871e-05 |     1 | sid000005 |     1433 |
++----+---------+------------+--------------+--------------+-------+-----------+----------+
+
+
+This returns a pandas DataFrame with the predictors resampled to the TR of the images (in this example 1 s,
+as the `onset` and `duration` columns show), together with columns identifying the entities of each row
+(e.g. `subject`, `run`).
+
+You can also rescale the predictors to have a mean of 0 and a standard deviation of 1 by setting
+`rescale=True`. Rescaling is applied before the variables are densified and resampled.
+
+You can also retrieve a `BIDSRunVariableCollection` (with `return_type='collection'`), which can be used to
+apply further transformations to the data.
+
+.. note::
+
+    To learn about low-level utilities for fetching predictors, see the :doc:`querying` documentation.
+
+-----------------------------
+Fetching preprocessed images
+-----------------------------
+
+.. note::
+
+    DataLad is required to download images. See the `DataLad documentation <https://handbook.datalad.org>`_
+    for installation instructions.
+
+The function :func:`pyns.fetch_utils.fetch_images` downloads preprocessed images from the
+Neuroscout datasets, either for a single subject or for all subjects in a dataset.
+
+Simply provide the dataset name and a directory where Neuroscout datasets should be installed.
+Optionally, you can restrict the data to a subset of subjects, runs or tasks (recommended for testing).
+
+.. code-block:: python
+
+    from pyns.fetch_utils import fetch_images
+
+    preproc_dir, img_paths = fetch_images('Budapest', '/tmp/', subject='sid000005')
+    img_paths[0]
+
+    # <BIDSImageFile filename='/tmp/Budapest/fmriprep/sub-sid000005/func/sub-sid000005_task-movie_run-1_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz'>
+
+:func:`pyns.fetch_utils.fetch_images` installs the dataset using DataLad and returns the path to the
+preprocessed dataset, along with a list of `BIDSImageFile` objects, one for each image.
+
+Each `BIDSImageFile` can load its image into memory using `nibabel <https://nipy.org/nibabel/>`_,
+and exposes metadata about the image, such as its associated entities:
+
+.. code-block:: python
+
+    target = img_paths[0]
+    img = target.get_image()  # nibabel image object
+    target.get_entities()
+
+    # {'datatype': 'func',
+    #  'desc': 'preproc',
+    #  'extension': '.nii.gz',
+    #  'run': 1,
+    #  'space': 'MNI152NLin2009cAsym',
+    #  'subject': 'sid000005',
+    #  'suffix': 'bold',
+    #  'task': 'movie'}
+
+
+Together, these functions provide the building blocks for custom analysis workflows.
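The `get_entities()` output in the fetching guide reports entities that are encoded in the BIDS file name and its directory. As a rough, stdlib-only illustration of where those values come from (not how pybids actually parses files), the hypothetical `parse_bids_entities` helper below splits the file name into `key-value` pairs, a suffix and an extension, and takes `datatype` from the parent directory. The `ENTITY_NAMES` mapping covers only the keys needed here and is an assumption.

```python
from pathlib import PurePosixPath

# Short BIDS keys mapped to the entity names reported by get_entities().
ENTITY_NAMES = {"sub": "subject", "ses": "session", "task": "task",
                "run": "run", "space": "space", "desc": "desc"}


def parse_bids_entities(path):
    """Hypothetical helper: derive an entities dict from a BIDS file path.

    Assumes the datatype is the parent directory name and that the
    extension starts at the first "." in the file name.
    """
    p = PurePosixPath(path)
    stem, dot, ext = p.name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {"datatype": p.parent.name, "suffix": suffix,
                "extension": dot + ext}
    for pair in pairs:
        key, _, value = pair.partition("-")
        # Run labels are numeric; get_entities() above reports run as an int.
        entities[ENTITY_NAMES.get(key, key)] = int(value) if key == "run" else value
    return entities


example = ("/tmp/Budapest/fmriprep/sub-sid000005/func/sub-sid000005_task-movie_"
           "run-1_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz")
print(parse_bids_entities(example))
```

For the example path from the guide, this yields the same key/value pairs as the `get_entities()` output shown there; real pybids parsing handles many more entities and edge cases.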
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,15 +1,20 @@
-Welcome to pyNS's documentation!
-===================================
+pyNS: Neuroscout API client documentation
+=========================================
 
 .. image:: neuroscout-logo.svg
   :width: 400
   :alt: Neuroscout Logo
 
 
-**pyNS** is the Python client library for accessing the `Neuroscout API <https://neuroscout.org/api>`_
-
-**pyNS** enables advanced used cases not supported by the `neuroscout.org <https://neuroscout.org>`_
-web-based analysis builder, such as batch-creation of analyses, or meta-analytic applications.
+**pyNS** is the Python client for the `Neuroscout API <https://neuroscout.org/api>`_, allowing users
+to programmatically query and interact with the Neuroscout database: for example, to create analyses,
+query for analyses, and download analysis results.
+
+**pyNS** provides a number of high-level functions for common tasks (that would typically require 
+multiple API calls), such as creating and registering analyses, and fetching predictor and imaging data directly.
+
+Advanced use cases include batch creation of analyses (e.g. for meta-analysis) and the
+creation of custom analysis pipelines.
 
 **pyNS** mirrors the official Neuroscout API with a Pythonic interface.
 Note that the best reference for the API is the official `API docs <https://neuroscout.org/api>`_
@@ -30,4 +35,5 @@ Contents
    quickstart
    querying
    analyses
+   fetching
    api
diff --git a/docs/source/querying.rst b/docs/source/querying.rst
@@ -84,6 +84,10 @@ Under the hood, `pyNS` looks up the ``dataset_id`` and ``task_id`` for the given
 Getting the predictor data
 ----------------------------------
 
+.. note::
+
+    High-level utilities are available to facilitate this process. See the :doc:`fetching` documentation.
+
 An important aspect of `pyNS` is the ability to retrieve moment-by-moment events for specific predictors.
 
 The simplest way is to use ``predictor_id`` to query for a specific Predictor, for a specific ``run_id``:

diff --git a/optional_requirements.txt b/optional_requirements.txt
@@ -0,0 +1,2 @@
+pybids
+pandas
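This commit lists `pybids` and `pandas` in `optional_requirements.txt`, so they are not guaranteed to be installed alongside pyNS, although the new fetching utilities work with pandas DataFrames and pybids objects. A minimal sketch for checking them up front with `importlib.util.find_spec`. The `missing_optional_deps` helper is hypothetical and not part of pyNS. Note that the pybids distribution is imported as `bids`.

```python
import importlib.util

# Distribution name -> import name for the optional dependencies.
OPTIONAL_MODULES = {"pybids": "bids", "pandas": "pandas"}


def missing_optional_deps(modules=None):
    """Return the distribution names whose module cannot be found."""
    modules = OPTIONAL_MODULES if modules is None else modules
    return [dist for dist, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


missing = missing_optional_deps()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.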
diff --git a/pyns/__init__.py b/pyns/__init__.py
@@ -8,7 +8,7 @@
 from .api import Neuroscout
 from . import endpoints
 
-__all__ = ['Neuroscout', 'endpoints']
+__all__ = ['Neuroscout', 'endpoints', 'fetch_utils']
 
 __author__ = ['Alejandro de la Vega']
 __license__ = 'MIT'
diff --git a/pyns/endpoints/base.py b/pyns/endpoints/base.py
@@ -152,7 +152,7 @@ def _id_to_entities(df):
             else:
                 names = {
                     r: endpoint.get(r)['name'] 
-                    for r in df[col].unique()
+                    for r in df[col].dropna().unique()
                     }
                 df[col.replace('_id', '_name')] = df[col].map(names)
     return df
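In `_id_to_entities`, changing `df[col].unique()` to `df[col].dropna().unique()` means missing IDs are no longer passed to `endpoint.get(r)`, presumably so that no API lookup is attempted for a NaN ID. A stdlib-only sketch of the same pattern. `ids_to_names` and the `lookup` callable (standing in for `endpoint.get(r)['name']`) are hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN (what pandas' dropna() would drop)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def ids_to_names(ids, lookup):
    """Map IDs to names, looking up each distinct non-missing ID once."""
    # Equivalent of df[col].dropna().unique(): skip missing IDs entirely.
    unique_ids = {i for i in ids if not is_missing(i)}
    names = {i: lookup(i) for i in unique_ids}
    # Unmapped (missing) IDs become None here; pandas' Series.map gives NaN.
    return [names.get(i) for i in ids]
```

With a plain dict lookup such as `catalog.__getitem__`, omitting the missing-value filter would raise a `KeyError` on the NaN entry, which mirrors the failure the `dropna()` call avoids.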