Data and doc string Examples #7

Closed

binarybottle opened this issue May 12, 2013 · 8 comments

@binarybottle
Member

Throughout the Mindboggle code base, I've included Examples in the documentation with lines like the following:

>>> import os
>>> path = os.environ['MINDBOGGLE_DATA']
>>> sulci_file = os.path.join(path, 'arno', 'features', 'sulci.vtk')

where MINDBOGGLE_DATA is an environment variable set according to the instructions at http://mindboggle.info/users/installation.html.

Is this reasonable, or is there a better way for users to try out functions?

I also created these examples with the goal of testing the code within docstrings with Sphinx, and of carefully unit testing everything, but I haven't had time to do this.

@satra
Member

satra commented May 12, 2013

You could consider two options:

  1. A mindboggle data package: pip install mindboggle-data would install the package together with the necessary data elements
  2. A mindboggle function: from mindboggle import get_test_data

Whether I would stay away from using the data for doctests depends on how large the datasets are. For doctests you can craft a dataset that is fairly light and included with mindboggle, but leave regression and other tests to unit tests. You can decorate those to skip the longer tests if the larger test data are not available, as in the sketch below.
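
A minimal sketch of that skip pattern, assuming unittest-style tests and the existing MINDBOGGLE_DATA environment variable (the class, test names, and file paths are placeholders, not actual Mindboggle tests):

    import os
    import unittest

    # The large test data are considered available only if MINDBOGGLE_DATA
    # points to a real directory.
    _data_dir = os.environ.get('MINDBOGGLE_DATA', '')
    needs_big_data = unittest.skipUnless(
        os.path.isdir(_data_dir),
        'large Mindboggle test data not available')

    class TestSulciRegression(unittest.TestCase):

        @needs_big_data
        def test_sulci_file_present(self):
            # Placeholder for a long-running regression test against the
            # full dataset; only the skip mechanics are illustrated here.
            sulci_file = os.path.join(_data_dir, 'arno', 'features', 'sulci.vtk')
            self.assertTrue(os.path.exists(sulci_file))

    if __name__ == '__main__':
        unittest.main()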

@ohinds
Contributor

ohinds commented May 13, 2013

+1 for (1) and satra's answer.

@forrestbao
Collaborator

Actually, I prefer Arno's old way. I think it's easy for users to point to where they store their data. If we use the other two options, would it be difficult for users to run the Mindboggle pipeline on their own data? For example, do they need to define paths in mindboggle.get_test_data?

@satra
Member

satra commented May 15, 2013

@forrestbao: Users should be able to run mindboggle independently of whether the test data or the environment variable exists. If mindboggle's core code depends on the environment variable, then that data is integral to mindboggle and should be distributed as a dependency (via a mindboggle-data package).

Doctests should not depend on large data: you want them to run very quickly, to demonstrate the point, and potentially to provide code coverage, rather than to perform regression tests. In fact, I think the data used for doctests should be part of the mindboggle package itself.
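
A minimal sketch of the get_test_data idea under that constraint, assuming a small dataset shipped inside the package itself (the 'data' subdirectory name is an assumption, not an actual layout):

    import os

    def get_test_data():
        """Return the path to the small test dataset bundled with mindboggle.

        Examples
        --------
        >>> data_path = get_test_data()
        >>> os.path.isdir(data_path)
        True
        """
        # Resolve the data directory relative to the installed package,
        # so no environment variable is needed for doctests to run.
        pkg_dir = os.path.dirname(os.path.abspath(__file__))
        return os.path.join(pkg_dir, 'data')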

@forrestbao
Collaborator

Thank you for your explanation, @satra. I see: this is for test data. Then I'm +1 for your option (1) too.

@binarybottle
Member Author

In addition to the test data for running the examples, the data that users might need to run mindboggle include:

  1. the DKT40 or DKT100 atlas
  2. the FreeSurfer templates made from Mindboggle-101 data
  3. a pickle file containing fundus likelihood depth/curvature distribution training data

#1 might not be necessary if we mandate that users run the newest FreeSurfer with DKT40 labeling.
#2 might not be necessary if we disable multi-atlas, multi-registration-based labeling as an alternative to #1.
#3 is currently necessary to run the compute_likelihood() function.
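
For illustration only, loading the training data in #3 might look like the following; the file name is a placeholder, and the pickle's actual contents are not specified here:

    import pickle

    # Load the fundus likelihood depth/curvature distribution training
    # data (placeholder file name, not the actual distributed file).
    with open('fundus_likelihood_training.pkl', 'rb') as f:
        training_data = pickle.load(f)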

@binarybottle
Member Author

  1. So the idea is that in addition to the mindboggle software being available as a GitHub repository, it would also be available as a distributed package that includes the data in my previous comment as well as test data? And to make it available via pip install, what more would I need to do to the present code base?
  2. When you say "doctest", this does not necessarily mean a test within a docstring that is run when executing all tests, does it? Is it a good idea to have externally executable tests in the docstrings, to ensure that the documentation and examples stay current? If so, how do you set up and run such a test?
  3. I work best from examples. Could someone please write a unit test that I can model all other unit tests after?

@satra
Member

satra commented May 16, 2013

  1. A single source of mindboggle:
    a. mindboggle as is with any necessary data
    b. a separate mindboggle-data package containing data only

    For an example, see http://nipy.sourceforge.net/nibabel/devel/data_pkg_design.html#data-package-design

  2. Doctests are both examples and tests.

    When you say "doctest", this does not necessarily mean a test within a docstring that is run when executing all tests, does it?

    Yes, it does.

    Is it a good idea to have externally executable tests in the docstrings, to ensure that the documentation and examples stay current? If so, how do you set up and run such a test?

    I don't know what this means, but you should always run the tests before building the docs, to ensure that the docs are built on tests that pass.

  3. Regarding unit tests, there are plenty around: again, see nipy. A minimal model is sketched below.
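
To make items 2 and 3 concrete, here is a self-contained sketch: a function whose docstring Examples section runs as a doctest, plus a unit test to model others after. The function and its behavior are hypothetical stand-ins, not Mindboggle API:

    import unittest

    def count_vertices(faces):
        """Count the unique vertex indices in a list of triangle faces.

        Examples
        --------
        >>> count_vertices([(0, 1, 2), (1, 2, 3)])
        4
        """
        return len(set(index for face in faces for index in face))

    class TestCountVertices(unittest.TestCase):
        """A minimal unit test to model other unit tests after."""

        def test_shared_vertices_counted_once(self):
            self.assertEqual(count_vertices([(0, 1, 2), (1, 2, 3)]), 4)

        def test_empty_mesh(self):
            self.assertEqual(count_vertices([]), 0)

    if __name__ == '__main__':
        # Run the docstring examples as tests, then the unit tests.
        import doctest
        doctest.testmod()
        unittest.main()

With nose, common at the time, both kinds of tests can be collected in one run via nosetests --with-doctest.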

Finally:

  • Once you open up the repo, you should set up continuous integration testing with Travis, so that every pull request is checked to see whether it breaks any tests (a minimal configuration is sketched after this list).
  • You should also set up regression testing with larger workflows that you don't want to run every day, but perhaps on a weekly basis.
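
For reference, a minimal .travis.yml along those lines might look like the following; the Python version, dependencies, and test command are assumptions, not a prescribed configuration:

    language: python
    python:
      - "2.7"
    install:
      - pip install numpy nibabel
      - pip install -e .
    script:
      - nosetests mindboggle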
