Introducing workflows #695

Merged
merged 38 commits into nipy:master from matthieudumont:introducing_workflows on Jan 13, 2016

@matthieudumont
Contributor

matthieudumont commented Aug 4, 2015

Introducing the workflows structure with the median_otsu demonstration workflow

def median_otsu_bet(input_file, out_dir, save_masked=False, median_radius=4,
                    numpass=4, autocrop=False, vol_idx=None, dilate=None):


@arokem

arokem Aug 4, 2015

Member

docstring?

from dipy.segment.mask import median_otsu
def median_otsu_bet(input_file, out_dir, save_masked=False, median_radius=4,


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

I am suggesting that all the functions in workflows should end with the word flow.
def median_otsu_flow
and that all parameters should be strings by default. That way there will be no need to parse the arguments outside of the flow, which would make life easier when integrating with nipype and client/server web interfaces.
Look for inspiration here:
https://github.com/Garyfallidis/dipy/blob/slr_tutorials/dipy/workflows/segment.py#L54
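To make this concrete, here is a minimal sketch of what such a flow could look like (the signature and casts are illustrative, not the final code):

def median_otsu_flow(input_files, out_dir='', save_masked='False',
                     median_radius='4', numpass='4'):
    # Every parameter arrives as a string, so any front end (argparse,
    # nipype, a web client) can call the flow without pre-parsing.
    save_masked = save_masked.lower() == 'true'
    median_radius = int(median_radius)
    numpass = int(numpass)
    return save_masked, median_radius, numpass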


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

Of course you will need to write a docstring with an explanation of the parameters and perhaps an example of usage. The good news is that most of this information can then be copied into the dipy/bin file.

mask_fname = fname + '_mask' + ext
if out_dir == '':


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

Okay, so here you say that if out_dir is an empty string then use the directory of fpath. Nice.

if out_dir == '':
    out_dir_path = dirname(fpath)
elif not isabs(out_dir):


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

Here you say that if out_dir is not an absolute path you can still append it to the directory of fpath. Good idea! I will change my workflows to do the same.
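A minimal sketch of the full rule being described (the helper name is hypothetical):

from os.path import dirname, isabs, join

def resolve_out_dir(out_dir, fpath):
    # '' means next to the input file; a relative path is appended to
    # the input file's directory; an absolute path is used as-is.
    if out_dir == '':
        return dirname(fpath)
    if not isabs(out_dir):
        return join(dirname(fpath), out_dir)
    return out_dir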

def median_otsu_bet(input_file, out_dir, save_masked=False, median_radius=4,
                    numpass=4, autocrop=False, vol_idx=None, dilate=None):
    for fpath in glob(input_file):


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

Because this allows it to run with multiple files, it would be nice to have some printout too, showing what is currently being processed.
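Something along these lines, for example (a sketch, with a made-up message):

from glob import glob

def median_otsu_flow(input_files):
    # Report each matched file as it is picked up for processing.
    for fpath in glob(input_files):
        print('Applying median_otsu to {0}'.format(fpath))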

parser.add_argument('--out_dir', type=str, default='')
parser.add_argument('--save_masked', action='store_true')
parser.set_defaults(save_masked=False)
parser.add_argument('--radius', type=int, default=4)


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

This is nice, apart from the missing documentation/help/example.
But here I am suggesting a radical change which I believe will save us from concentrating on the cmds and ArgumentParsers and let us work more on the actual workflow. So, my suggestion is that we should assume that all inputs of workflows are always strings and therefore cast the needed parameters inside the workflow. I am still playing in my head with this idea but I think it will simplify many things in the near to far future.

So, no casting will happen in the ArgumentParser either. All inputs are strings and are given to the relevant workflow.
@arokem, @matthieudumont can you think a bit about this suggestion? Are the advantages clear or do I need to elaborate more?
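Under this suggestion the parser would reduce to something like the following sketch (the defaults are illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('input_files')
# No type= casting and no set_defaults: every argument stays a string
# and is handed straight to the flow, which casts what it needs.
parser.add_argument('--out_dir', default='')
parser.add_argument('--save_masked', default='False')
parser.add_argument('--radius', default='4')
args = parser.parse_args()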


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

Hmm... think of this for a bit as a client/server problem. Let's say we had a client that would get some inputs and send them to the server to run the workflow. The client is only able to communicate with the server by sending the inputs as strings. So, I think that the most generic design we can have for the workflows would be that all parameters are strings. But hey, I may be wrong. Happy to hear other opinions about this.

@@ -0,0 +1,18 @@
#! /usr/bin/env python


@Garyfallidis

Garyfallidis Aug 4, 2015

Member

The name is dipy_median_otsu.
Also, it is not executable. You need to change the permissions of the file, or we have to make it executable when we install it. Not sure.

@Garyfallidis


Member

Garyfallidis commented Aug 5, 2015

@chrisfilo, @oesteban and @arokem: with this workflow design I think it would be trivial to use nipype to call these workflows from dipy. Can you think about this and let me know your thoughts? @matthieudumont and I, and hopefully others too, will be working on making many of these available very soon in dipy. We want workflows that can be called very easily from other pipelines but which can also process multiple files and, hopefully, multiple parameters. The trick here is to have all workflows in dipy.workflows take string params. In this PR we give an example using median_otsu, which is an algorithm for scalp removal from EPI images. Let me know what you think. Thanks in advance.

@chrisfilo


Member

chrisfilo commented Aug 5, 2015

Yes, it would be fairly easy to wrap such functions. I'm not sure if "workflow" is the best name (since it suggests many processing steps and a long running time), but that's just a minor point.

I know we talked about this numerous times, but I would like to once again encourage you to write a Nipype interface instead of using argparse. This way, by writing the code once, you get:

  • command line functionality
  • integration with Nipype
  • integration with CBrain (through boutiques)
@arokem


Member

arokem commented Aug 5, 2015

I'm with @chrisfilo on this one. It seems like a duplication of effort to do this here, and if we start down this path, we will eventually have to reinvent all the wheels that nipype has already invented.

For example, it's clear that there are already emerging patterns here that appear as boilerplate in the code in this PR: things such as checking whether output locations exist, whether the required files exist (and if not, what needs to be done), and so on.

It seems like a better use of time to write simplified APIs in dipy for processes that require many steps (e.g. registration with the SyN algorithm, see my attempts over on #673), and to write the workflows in nipype.

@Garyfallidis


Member

Garyfallidis commented Aug 5, 2015

Hi @chrisfilo. Thank you for your quick feedback. "Workflow" is used in different projects with a different meaning. Nonetheless, I think it is a good name, exactly because in dipy.workflows many functions/methods are called from the different individual workflows. The workflow introduced here is one of the simplest, but there will be other more complicated ones. Also, there will be some generic ones where you will be able to do a complete dwi analysis by running a single command.

About the nipype interfaces: as I said before, I am very happy to write nipype interfaces. But I also want to be as general as possible. So I would like to support both argparse and Nipype, as well as calling this from a web client, from another script, etc.

So what I want you to tell me is: what would be the best way to write an interface for dipy.workflows.segment.median_otsu_flow?

Is this a good example to start from?
https://github.com/nipy/nipype/blob/master/nipype/interfaces/dipy/tracks.py

If I remember correctly, I need to write an InputSpec, an OutputSpec and a BaseInterface. Is this correct?

So, assuming that all inputs and outputs are given as strings, can you give me a recommendation of how these classes would look?

@Garyfallidis


Member

Garyfallidis commented Aug 5, 2015

Hi @arokem. I disagree that moving all the workflows into nipype would be a complete solution for dipy. I think we really need functions that handle io without extra dependencies. Also, it will be much easier to have our quality assurance workflows tested first by the dipy developers and then submit the best ones to nipype. And these functions can also be called from our visualization module right now, without waiting for a nipype release. I would like to play with a new technology before making radical decisions. So please do not rush to conclusions.

@Garyfallidis


Member

Garyfallidis commented Aug 5, 2015

As for checking whether the files exist: I think these checks are trivial to write. But our scripts also allow, in a beautiful way, running multiple subjects sitting in different directories (using * and other wildcards), and possibly even multiple parameters and multiple outputs, from only one command. I love this option. For example, you can write

dipy_median_otsu "Data/subj_1*0/t1.nii.gz" ...

and this would process all subjects starting with 1 and ending with 0.

If I remember correctly from what @chrisfilo said when we met at OHBM, this is still experimental or under development in nipype.

@chrisfilo


Member

chrisfilo commented Aug 5, 2015

This is a good starting point:
http://nipy.org/nipype/devel/interface_specs.html

Python example:
http://nipy.org/nipype/devel/python_interface_devel.html

Real world example from nipy:
https://github.com/nipy/nipype/blob/master/nipype/interfaces/nipy/utils.py#L28

On the subject of command line inputs: your current syntax is equivalent to how bash deals with wildcards, but it will only work if users specify parameters in parentheses. I would recommend leaving wildcard expansion to bash (it's more intuitive for any command line user) and writing your function in such a way that it accepts lists of files instead of strings with wildcards.

In a nipype input specification, a list of files is described this way:

input_files = InputMultiPath(File(exists=True), desc='list of files')

This will accept "/path/file" and ["/path/file"] as well as ["/path/file1", "/path/file2"].

Let me know if you have any specific questions!
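For reference, a minimal sketch of what such specs might look like for the flow discussed here (the class names and traits are hypothetical):

from nipype.interfaces.base import (BaseInterfaceInputSpec, TraitedSpec,
                                    File, InputMultiPath, OutputMultiPath,
                                    traits)

class MedianOtsuInputSpec(BaseInterfaceInputSpec):
    input_files = InputMultiPath(File(exists=True), mandatory=True,
                                 desc='input volumes to mask')
    median_radius = traits.Int(4, usedefault=True,
                               desc='radius (in voxels) of the median filter')

class MedianOtsuOutputSpec(TraitedSpec):
    mask_files = OutputMultiPath(File(exists=True),
                                 desc='computed binary brain masks')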

@Garyfallidis


Member

Garyfallidis commented Aug 6, 2015

Hi @chrisfilo, thanks for the links! Cool, I will have something ready for you to see asap. Will do my best.

Aha, here you say:
"Your current syntax is equivalent to how bash deals with wildcards, but it will only work if users specify parameters in parentheses."

I suppose you mean quotes, not parentheses. In my investigation I realized that if I leave the terminal language to parse for multiple files, the commands will not be multiplatform. Yes, the user will have to write those quotes, but if they do, the dipy/bin commands run exactly the same across many platforms, with no assumptions about bash or the operating system. That means they can be run from Windows too, which is an advantage for a library. I think it will be easy for users to learn to use quotes. And if they don't, they can always use the nipype version or just run the commands with single files.
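The portability argument boils down to a few lines: the quoted pattern reaches Python unexpanded, and glob then expands it identically on every platform (a sketch):

import sys
from glob import glob

# On Windows the shell does not expand wildcards, so expanding the
# quoted pattern in Python gives the same behaviour everywhere.
for pattern in sys.argv[1:]:
    for fpath in glob(pattern):
        print(fpath)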

@chrisfilo


Member

chrisfilo commented Aug 6, 2015

A modified version of this solution: http://stackoverflow.com/a/12501893 would be a more robust way of dealing with the Windows problem. This way you can avoid quotes altogether and use nargs in argparse. We can include this in nipype_cmd if it will make your life easier.


@Garyfallidis


Member

Garyfallidis commented Aug 6, 2015

Cool! I need to get hands on with your suggestions. Will get back to you asap! 👍

@Garyfallidis


Member

Garyfallidis commented Sep 12, 2015

Hi, just wanted to give some quick feedback here so that the others know how this is going. A few days ago I proposed a new design to @matthieudumont which will allow automatic generation of cmd interfaces and nipype interfaces from the workflow functions using introspection. Mr. Dumont is currently working on extending this idea, first for the cmd interfaces, and after we have a first version of this I will also work on the objects that will generate the nipype interfaces automatically.
If you want a sneak peek right here and right now, here is an example of this idea implemented on the median_otsu interface:
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/bin/dipy_median_otsu
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/dipy/workflows/base.py
https://github.com/Garyfallidis/dipy/blob/introducing_workflows/dipy/workflows/segment.py
Or you can wait until this PR is updated.

@Garyfallidis


Member

Garyfallidis commented Sep 14, 2015

@matthieudumont there is some weird failure with Python 3. Can you rebase and see if you keep getting the same error?

import inspect
try:


@arokem

arokem Sep 18, 2015

Member

You might want to use the optional_package machinery here. For an example, see: https://github.com/nipy/dipy/blob/master/dipy/viz/fvtk.py#L32
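A sketch of the optional_package pattern being suggested (the package name here is only an example):

from dipy.utils.optpkg import optional_package

# Returns the module (or a stub that raises a helpful error on first
# use), a boolean availability flag, and a setup_module hook for nose.
nipype, have_nipype, setup_module = optional_package('nipype')

if have_nipype:
    print('nipype is available')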

def _select_dtype(self, text):
    text = text.lower()
    if text.find('str') != -1:


@arokem

arokem Sep 18, 2015

Member

Optionally:

if 'str' in text:
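Applying the suggestion, the method might read like this (a sketch; the non-str branches and the fallback are assumptions about the rest of the method):

def _select_dtype(self, text):
    # Map a docstring type description to a Python type.
    text = text.lower()
    if 'str' in text:
        return str
    if 'int' in text:
        return int
    if 'float' in text:
        return float
    if 'bool' in text:
        return bool
    return None
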
Parameters
----------
input_files : string
Path to the input volumes. This path may contain wildcards to process


@arokem

arokem Sep 18, 2015

Member

Only nifti files, right?


@matthieudumont

matthieudumont Sep 21, 2015

Contributor

I guess any format supported by Nibabel would do.

@arokem


Member

arokem commented Sep 18, 2015

Sorry - just a few things I noticed along the way. Looks like it's going in a good direction, with much less boilerplate code than before - that's good. Could you say a little bit more about how this also generates nipype workflows?

@Garyfallidis


Member

Garyfallidis commented Sep 18, 2015

Hi @arokem. The nipype generator is not in yet. But the idea is that, now that we can get the exact input and output parameters from the introspective parser, we can generate python files with the InputSpecs and OutputSpecs automatically written, without you writing a single piece of code. Clear? Please let me know if it is not.
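A minimal sketch of the kind of introspection involved (the helper name is hypothetical; inspect.getargspec was the current API at the time, inspect.signature is the modern equivalent):

import inspect

def flow_signature(flow):
    # Recover parameter names and defaults from a flow; this is the
    # information the parser and the generators are built from.
    args, _, _, defaults = inspect.getargspec(flow)
    defaults = defaults or ()
    n_required = len(args) - len(defaults)
    return args[:n_required], dict(zip(args[n_required:], defaults))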

@arokem


Member

arokem commented Sep 19, 2015

Yeah. More or less. So the plan is to have some script that goes through the workflows tree and generates the nipype code?


@Garyfallidis


Member

Garyfallidis commented Sep 19, 2015

Yes @arokem. And then we can do the same with GUIs, web interfaces, etc. Let the flows generate the interfaces ;) It seems this will work pretty well. But let's not be overenthusiastic and see if it works in practice. Still, things look very positive right now. It seems we can greatly speed up our development.

Hi @matthieudumont, this looks great. I would suggest adding a test function that exercises the IntrospectiveArgParser with a dummy workflow covering a large range of different parameters, where different action strategies would be required.

Also, it would be nice to update the other bin/ commands to the workflow design, so some refactoring will be needed. The tensor_fit one should probably change too, because right now it is calling FSL's bet etc., which is not relevant. I know you have a better version.

Finally, we need to investigate the use of wildcards across different operating systems and how we will deal with that. There were some suggestions from Chris on this. GGT!

@arokem


Member

arokem commented Sep 20, 2015

OK - that's cool. It seems to me that you should be able to generate the command line interfaces as well, just from introspection of the dipy.workflows module. That is, bin/dipy_median_otsu should also possibly be auto-generated. Presumably they're all going to look the same?

@@ -8,7 +8,7 @@
 from scipy.ndimage.filters import median_filter
 try:
-    from skimage.filter import threshold_otsu as otsu
+    from skimage.filters import threshold_otsu as otsu


@arokem

arokem Sep 20, 2015

Member

Is this a change in the skimage API, or a BF in dipy?


@matthieudumont

matthieudumont Sep 21, 2015

Contributor

No, idk why I modified and committed that, sorry!


@Garyfallidis

Garyfallidis Sep 21, 2015

Member

I did this change. There was a warning that skimage has deprecated skimage.filter and that now we should use skimage.filters.


@arokem

arokem Sep 21, 2015

Member

Cool. That answers my question.


@arokem


Member

arokem commented Sep 20, 2015

And yes: +1 on thorough testing of the IAP!

@matthieudumont matthieudumont force-pushed the matthieudumont:introducing_workflows branch from 1deb76f to 0be28a9 Jan 11, 2016

@matthieudumont


Contributor

matthieudumont commented Jan 11, 2016

(huge changelog because I rebased)

matthieudumont added some commits Jan 11, 2016

@Garyfallidis


Member

Garyfallidis commented Jan 12, 2016

@arokem if you agree I think I can merge this.

The next PR should be a cleanup of what is in dipy/bin to actually use the new design. And after that, or in parallel, we can start putting in the workflows that @matthieudumont and I have already written.

@arokem


Member

arokem commented Jan 12, 2016

It seems that utils and segment are mostly untested. I think we should have tests in place for these before we merge.


@Garyfallidis


Member

Garyfallidis commented Jan 13, 2016

Okay, @matthieudumont added tests as requested, so I will go ahead and merge this. But something very important to realize about the workflows is that we should have some buildbots which will run the tests of the workflows' command lines with large and realistic datasets and create some reports. This is going towards a proper quality assurance direction where we have Travis (and nipy.builders) checking that the internal functions (API) work correctly, and then buildbots checking that a specific command line (workflow) or a series of command lines (workflows) generate the required results. This will be quite a task because we will need to generate some kind of QA reports or automated end-to-end tests. I think with the new dipy.viz module we will be able to have some of these things nicely done.

But we do need to meet and discuss where we will set up the buildbots. I think that @matthieudumont, @jchoude, @matthew-brett, @arokem and I should be in that meeting and share ideas on how to get this done. Can we have, for a start, a linux machine somewhere which will be able to run the commands once a day and generate some reports? Would the guys in Berkeley or Sherbrooke be willing to set up a machine only for that? Or should we look into a cloud service?

Garyfallidis added a commit that referenced this pull request Jan 13, 2016

@Garyfallidis Garyfallidis merged commit a5dc8d3 into nipy:master Jan 13, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@arokem


Member

arokem commented Jan 13, 2016

Nice! Thanks for adding the tests -- they will help to maintain this code in the long run.

Re: end-to-end testing, I think that a potential solution would be cloud machines running a docker container that we would maintain. For an example of what that might look like, see the docker that I have for building the docs:

https://github.com/arokem/dipy-docbuild

This gets automatically built into this dockerhub image:

https://hub.docker.com/r/arokem/dipy-docbuild/

which can then be pulled down onto a cloud machine (e.g. on AWS) and run. That's what I did for the 0.10 release. I am not sure how to set up an automated build for this (though I am confident that it could be done), but we probably don't want to run this on every PR, do we? We could also store some data on S3 (or equivalent) and get the data from there, instead of fetching from an http URL. Might help reduce costs.

At least initially, I can put some funding that I have towards running this on AWS. In the short/medium time-scale, we might be able to get some more support for this from UW eScience, but we should also think up a way to maintain this in the longer run. A tip jar, or something :-)


@matthieudumont matthieudumont deleted the matthieudumont:introducing_workflows branch Jun 22, 2016
