Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF] Split main workflow from CLI for Python access #442

Merged
merged 25 commits into from
Jun 16, 2020

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented Apr 18, 2020

Closes #440.

This refactor might be more aggressive than the devs might like, so I'm happy to scale it back if necessary.

Changes proposed:

  • Make a new workflows module.
  • Merge parts of heudiconv.cli.run.main() into heudiconv.cli.run.process_args() and move to heudiconv.workflows.run.heudiconv_workflow().
    • heudiconv.cli.run.main() now does as little as possible (i.e., parses the arguments and does a single check that requires the parser's help message.
  • Move the following functions from heudiconv.cli.run to heudiconv.workflows.run:
    • process_extra_commands
    • setup_exceptionhook
    • is_interactive
    • ensure_heuristic_arg
  • Split args structure into individual arguments when passed to heudiconv.workflows.run.heudiconv_workflow()
    • This makes it easier for users to call the workflow directly in Python, as they won't need to build args.
    • It should also make it easier to track updates to the parameters in the parser vs. those used in the workflow.
    • The drawback is that it means updating both the CLI and the workflow function in tandem when changing parameters.
  • Reformat the parser a little bit. I can change this back, of course, but I think it's more readable this way.

@codecov
Copy link

codecov bot commented Apr 18, 2020

Codecov Report

Merging #442 into master will increase coverage by 1.13%.
The diff coverage is 80.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #442      +/-   ##
==========================================
+ Coverage   74.99%   76.12%   +1.13%     
==========================================
  Files          35       37       +2     
  Lines        2863     2961      +98     
==========================================
+ Hits         2147     2254     +107     
+ Misses        716      707       -9     
Impacted Files Coverage Δ
heudiconv/main.py 76.19% <76.19%> (ø)
heudiconv/cli/run.py 91.48% <100.00%> (+11.84%) ⬆️
heudiconv/tests/test_main.py 100.00% <100.00%> (ø)
heudiconv/dicoms.py 81.14% <0.00%> (-0.41%) ⬇️
heudiconv/bids.py 82.97% <0.00%> (ø)
heudiconv/info.py 100.00% <0.00%> (ø)
heudiconv/tests/test_utils.py 100.00% <0.00%> (ø)
heudiconv/tests/test_convert.py 100.00% <0.00%> (ø)
heudiconv/utils.py 90.45% <0.00%> (+0.37%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1a7cb5...c49134e. Read the comment docs.

@yarikoptic
Copy link
Member

Thank you @tsalo! It is a worthwhile refactoring. I will review within next days. Cheers

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a "heavy" PR, hard grasp what was moved vs modified even with git diff -w --color-moved=zebra -W master... . But tests passing is a good sign! ;)

My other concern (besides other comments in a few spots) is: I do not like heudiconv.workflows and then even more heudiconv_workflow names. My thinking:

  • With "workflow" I got to immediate get a picture of something like nipype or other workflow engine. I do understand that it is my limited view on the world, and here it could indeed be considered a "workflow" but then every function calling other functions in a sequence could.
  • I do not see other "workflows" to be born under workflows/ so why bother with it.

So, why not to just have heudiconv/main.py with workflow (or run to parallel heudiconv.cli.run; or even heudiconv to parallel all those odd from glob import glob ;) )... ?

@@ -0,0 +1,343 @@
#!/usr/bin/env python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is not intended to be executed directly, so better not bother with shebang. That might interfere with some "smart" setups of IDEs/editors making those files executable for no good reason etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

from ..utils import load_heuristic, anonymize_sid, treat_infofile, SeqInfo
from ..convert import prep_conversion
from ..bids import populate_bids_templates, tuneup_bids_json_files
from ..queue import queue_conversion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while doing such large RFings anyways -- unless order is required due to some import order effects, best to sort those lines (within a group, e.g. here within from .. and separately above among system libraries) -- that would simplify later finding the correct imports etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good now.

queue_args : str or None, optional
Additional queue arguments passed as single string of space-separated
Argument=Value pairs. Default is None.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know an "easy" way around it ATM, may be you would find some, but such duplication of docstrings in here and in cmdline argparse setup eventually would diverge. I don't know an "easy" way since most of the time then people indeed either duplicate like you have done, or come up with some evolved way to generate both cmdline and python interfaces from the same specification (like we did in datalad, and I saw similar ones elsewhere).

In principle it should be possible for many (if not all -- didn't do matching which one is pertinent only here and not cmdline) to take argparse specification and use it to populate a docstring here, and while at it may be even verify consistency of the function. It feels like a common problem, so may be someone already came up with a similar solution? I was looking for the one for click based interfaces but found none... could you check @tsalo around to see if smth like that was already done, or may be just identify a showstopper which would forbid it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried looking around for tools to synchronize argument parsers with associated functions' docstrings, but couldn't find anything. If it's any consolation, we have the same separation in tedana and, while there's no automated control, keeping the function docstring up-to-date with the CLI is something we've been able to do. I could also add a line in the function description telling users to refer to the command-line help for the definitive argument descriptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't the default click behavior work. see the top example here: https://click.palletsprojects.com/en/7.x/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for any developer it should be clear from looking at the decorators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that that would take whatever's in the function docstring and make it available in the command line help, but when you try help(function) in Python, you get a very long and unrelated string for click.Command instead of the function. I don't think click would help with within-Python documentation.

@tsalo
Copy link
Member Author

tsalo commented Apr 24, 2020

I've moved heudiconv.workflows.heudiconv_workflow() to heudiconv.main.workflow() and eliminated the workflows module. The reason I originally had it was that I assumed that heudiconv.cli.monitor.monitor() would also be converted to a workflow at some point, but I hadn't actually looked at it at the time.

@mgxd
Copy link
Member

mgxd commented May 11, 2020

this looks good to me, thanks!

i'll wait for @yarikoptic's blessing before merging

% ', '.join(get_known_heuristic_names()))


def workflow(dicom_dir_template=None, files=None, subjs=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no sane person should invoke workflow providing any of those as positional args, I will add a leading *, ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen that once, but never used it. It's awesome! I also added a numpydoc Notes section to the docstring to clarify usage for those who, like me, might be confused by the way it's defined. I can remove it if you want though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes are good, thank you!

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the use of former args. and their possible overlap with locally defined variables. (and add that leading *, to the signature of the workflow so nobody burns themselves ;))

if session is not None:
session = session
if locator is not None:
locator = locator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart pycharm greyed out locator in the signature of this function which raised my concern. Here code like

        if args.session is not None:
            session = args.session
        if args.locator is not None:
            locator = args.locator

was, so args. was providing defaults, whenever now loop variables mask away function arguments. This needs to be fixed up. It is pity that apparently we have no tests which would have triggered either of those cases!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a head-scratcher. I combed through the workflow with an eye out for this kind of pattern, but didn't find any other cases.

anon_outdir = args.conv_outdir or outdir
anon_study_outdir = op.join(anon_outdir, locator or '')

# TODO: --datalad cmdline option, which would take care about initiating
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a datalad cmdline option now. Can this be removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess -- good find

tsalo added 3 commits May 12, 2020 22:42
Also kind of hoping that the test failures were random and that they’ll
pass now.
@yarikoptic
Copy link
Member

4 failures in travis now, e.g.

___________________________ test_notop[bidsoptions1] ___________________________
heudiconv/tests/test_heuristics.py:167: in test_notop
    assert op.exists(pjoin(tmppath, 'Halchenko/Yarik/950_bids_test4'))
E   AssertionError: assert False
E    +  where False = <function exists at 0x7fc5bc715bf8>('/tmp/pytest-of-travis/pytest-0/test_notop_bidsoptions1_0/Halchenko/Yarik/950_bids_test4')
E    +    where <function exists at 0x7fc5bc715bf8> = op.exists
E    +    and   '/tmp/pytest-of-travis/pytest-0/test_notop_bidsoptions1_0/Halchenko/Yarik/950_bids_test4' = pjoin('/tmp/pytest-of-travis/pytest-0/test_notop_bidsoptions1_0', 'Halchenko/Yarik/950_bids_test4')

otherwise looks good. After failures addressed, I will try to comb through it locally a bit more and then merge

tsalo added 2 commits May 15, 2020 13:58
My local testing in Docker isn’t working for some reason.
@tsalo
Copy link
Member Author

tsalo commented May 15, 2020

I finally fixed the testing, so I think it's ready for your review.

@tsalo
Copy link
Member Author

tsalo commented Jun 16, 2020

I think that this PR could be helpful for the BIDS Curation GUI BrainHack project (ohbm/hackathon2020#188), so I thought I'd ping it. Is there anything I can do to move this forward or will it have to hold off until after the hackathon?

@yarikoptic
Copy link
Member

FWIW: reviewing it now... gimme some minutes

I guess it was a mental side effect from using lgr.error initially or smth like that
from .utils import get_heuristic_description
print(get_heuristic_description(heuristic, full=True))
else:
raise ValueError("Unknown command %s", command)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix this (present in original version too)

lgr.warning(
"We cannot setup exception hook since not in interactive mode")

sys.excepthook = _pdb_excepthook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: we have this function not covered by tests -- tested manually seems to still work )

iterarg, iterables = ("files", len(files)) if files else \
("subjects", len(subjs))
queue_conversion(queue, iterarg, iterables, queue_args)
sys.exit(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must not sys.exit(0) here now -- I will RF so we exit in the CLI invocation

@yarikoptic
Copy link
Member

ok, pushed those few tune ups. as far as I see it -- good to go. Let's wait for green and merge!

@tsalo
Copy link
Member Author

tsalo commented Jun 16, 2020

Thanks @yarikoptic!

@yarikoptic yarikoptic merged commit f4c2dd2 into nipy:master Jun 16, 2020
@yarikoptic yarikoptic added this to the 0.9.0 milestone Dec 23, 2020
yarikoptic added a commit that referenced this pull request Dec 23, 2020
Various improvements and compatibility/support (dcm2niix, datalad,
duecredit) changes.  Major change is placement of output files to the
target output directory during conversion.

- #454 zenodo referencing in README.rst and support for ducredit for
  heudiconv and reproin heuristic
- #445 more tutorial references in README.md

- [#485][] placed files during conversion right away into the target
  directory (with a `_heudiconv???` suffix, renamed into ultimate target
  name later on), which avoids hitting file size limits of /tmp ([#481][]) and
  helped to avoid a regression in dcm2nixx 1.0.20201102
- #477 replaced `rec-<magnitude|phase>` with `part-<mag|phase>` now
  that BIDS supports the part entity
- #473 made default for CogAtlasID to be a TODO URL
- #459 made AcquisitionTime used for acq_time scans file field
- #451 retained sub-second resolution in scans files
- #442 refactored code so there is now heudiconv.main.workflow for
  more convenient use as a Python module

- minimal version of nipype set to 1.2.3 to guarantee correct handling
  of DWI files ([#480][])
- `heudiconvDCM*` temporary directories are removed now ([#462][])
- compatibility with DataLad 0.13 ([#464][])

- #443 pathlib as a dependency (we are Python3 only now)

* tag 'v0.9.0':
  Add a helper rule to upload to pypi
  update changelog reference as part of prep release
  [DATALAD RUNCMD] prepare the release
  CHANGELOG entry for 0.9.0
yarikoptic added a commit that referenced this pull request Dec 23, 2020
Various improvements and compatibility/support (dcm2niix, datalad,
duecredit) changes.  Major change is placement of output files to the
target output directory during conversion.

- #454 zenodo referencing in README.rst and support for ducredit for
  heudiconv and reproin heuristic
- #445 more tutorial references in README.md

- [#485][] placed files during conversion right away into the target
  directory (with a `_heudiconv???` suffix, renamed into ultimate target
  name later on), which avoids hitting file size limits of /tmp ([#481][]) and
  helped to avoid a regression in dcm2nixx 1.0.20201102
- #477 replaced `rec-<magnitude|phase>` with `part-<mag|phase>` now
  that BIDS supports the part entity
- #473 made default for CogAtlasID to be a TODO URL
- #459 made AcquisitionTime used for acq_time scans file field
- #451 retained sub-second resolution in scans files
- #442 refactored code so there is now heudiconv.main.workflow for
  more convenient use as a Python module

- minimal version of nipype set to 1.2.3 to guarantee correct handling
  of DWI files ([#480][])
- `heudiconvDCM*` temporary directories are removed now ([#462][])
- compatibility with DataLad 0.13 ([#464][])

- #443 pathlib as a dependency (we are Python3 only now)

* tag 'v0.9.0':
  Do no bother ensuring that version changed - should be no changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make main workflow callable from Python
4 participants