Skip to content

refactor(modules): convert all 19 module-level bin scripts to Nextflow templates#154

Merged
heylf merged 3 commits into
nf-core:devfrom
an-altosian:feature/module-bin-to-templates
May 7, 2026
Merged

refactor(modules): convert all 19 module-level bin scripts to Nextflow templates#154
heylf merged 3 commits into
nf-core:devfrom
an-altosian:feature/module-bin-to-templates

Conversation

@an-altosian
Copy link
Copy Markdown
Collaborator

@an-altosian an-altosian commented May 5, 2026

Summary

Per @awgymer's review on PR #139 (comment r3185699629) — "nf-core pipelines cannot use module level bin" — this PR moves all 19 per-module resources/usr/bin/*.py scripts into modules/local/<x>/templates/ and invokes them from each module's shell script: block via python3 ${moduleDir}/templates/<file>.py --flag value.

Why shell-call invocation rather than the canonical template '<file>.py' directive: every module emits a version string via an eval('python3 -c "import X; print(X.__version__)"') topic channel. Nextflow allows eval(...) outputs only when the process script body is Bash; the template directive sets the interpreter from the .py shebang and breaks eval(...). The first commit on this branch (ea46ad0) used the canonical directive and CI failed with "Process output of type 'eval' is only allowed with Bash process scripts -- Current interpreter: /usr/bin/env python3" across all 19 modules. The follow-up commit (dbd7a35) switched to shell-call invocation, which keeps the process body Bash and lets eval(...) continue to work.

The shell-call pattern also keeps the .py files byte-identical to their pre-conversion forms — no argparse-to-constants rewrites, no merging of xenium_patch/stitch's two scripts, no env-prelude moves. They're regular Python CLIs that happen to live at modules/<x>/templates/ rather than at modules/<x>/resources/usr/bin/.

Module-by-module fan-out

  • 18 modules: single .py file moved verbatim, main.nf script: block changed from <bin>.py --flag value to python3 ${moduleDir}/templates/<bin>.py --flag value.
  • xenium_patch/stitch: both stitch_transcripts.py and stitch_postprocess.py preserved as separate files; both called sequentially from the shell script: block.
  • segger/create_dataset: original export NUMBA_CACHE_DIR=\$PWD/.numba_cache shell prelude preserved in the script: block, before the python3 ${moduleDir}/templates/run_create_dataset.py call.

Out of scope

Pipeline-level bin/ scripts (bin/divide_transcripts.py, bin/stitch_transcripts.py — used by xenium_patch/divide and utility/reconstruct_patches) are untouched. nf-core only restricts module-level bin/.

Containers, inputs, outputs, and stub blocks are unchanged across all 19 modules.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests! — N/A; mechanical refactor with no behavior change. Existing nf-test suite covers the affected modules and CI is green (9/9 docker shards on both Nextflow 25.04.0 and latest-everything).
  • If you've added a new tool — have you followed the pipeline conventions in the contribution docs? — N/A; no new tool.
  • If necessary, also make a PR on the nf-core/spatialxe branch on the nf-core/test-datasets repository — N/A; no test-data change.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated — N/A; no parameter, input, or behavior change.
  • Output Documentation in docs/output.md is updated — N/A; outputs are unchanged.
  • CHANGELOG.md is updated — N/A; matches the convention of recent PRs (refactor(workflows/spatialxe): extract params.* to take: inputs #150bugfixes from previous PRs #153) on this branch, which also did not bump the v1.0.0 placeholder. The CHANGELOG can be batched ahead of the next release.
  • README.md is updated (including new tool citations and authors/contributors) — N/A; no new tool, no new contributor.

Test plan

  • make check passes locally (ruff lint + pre-commit hooks)
  • Spot-checked the canonical example (spatialdata/merge), the multi-script case (xenium_patch/stitch), and the env-prelude case (segger/create_dataset)
  • CI: GitHub Actions pr-checks, linting, nf-core, nf-test-changes, pre-commit, and all 18 nf-test docker shards (Nextflow 25.04.0 + latest-everything) — all green

…w templates

nf-core core team requires no module-level `bin/` for release approval
(PR nf-core#139, comment r3185699629 from awgymer). Move every per-module
`resources/usr/bin/*.py` into `templates/*.py` and invoke via the
`template '<file>.py'` directive in the script: block.

- 18 modules: mechanical conversion. Replace argparse with module-top
  Groovy interpolation (RAW_BUNDLE = "${raw_bundle}", etc.). For modules
  with task.ext.args (ficture/preprocess, segger/predict,
  segger/create_dataset, xenium_patch/stitch): inject ARGS = "${args}",
  build sys.argv via shlex.split, keep existing argparse code path.
- segger/create_dataset: NUMBA_CACHE_DIR setup moves from shell prelude
  into Python BEFORE numba/torch imports (file-level # ruff: noqa: E402
  to document the load-order requirement).
- xenium_patch/stitch: stitch_transcripts.py + stitch_postprocess.py
  merged into one templates/stitch.py orchestrator that calls both
  inline (single-file template constraint of the directive).

Containers, inputs, outputs, and stub blocks are unchanged across all 19
modules. Pipeline-level bin/ scripts (divide_transcripts.py,
stitch_transcripts.py — used by xenium_patch/divide and
utility/reconstruct_patches) are untouched; pipeline-level bin/ remains
nf-core compliant.
CI on PR nf-core#154 surfaced this hard Nextflow constraint:

    Process output of type 'eval' is only allowed with Bash process scripts
    -- Current interpreter: /usr/bin/env python3

The `template '<file>.py'` directive sets the process interpreter to
Python from the shebang, but every module emits a version string via an
`eval('python3 -c ...')` topic channel — and `eval(...)` outputs only
work when the process script body is Bash. All 19 modules failed
identically.

Fix: keep the Python files in `modules/local/<x>/templates/`
(satisfying the no-module-level-bin requirement from PR nf-core#139's review)
but invoke them from a shell `script:` block via
`python3 \${moduleDir}/templates/<file>.py --flag value`. The process
stays Bash, `eval()` works, and the original argparse-based scripts are
restored verbatim — no constants conversion, no script merging, no
moved env preludes.

For xenium_patch/stitch this means restoring the two original scripts
(stitch_transcripts.py + stitch_postprocess.py) instead of the merged
stitch.py. For segger/create_dataset, the NUMBA_CACHE_DIR shell prelude
is restored to its original location in the .nf script: block.

Net result vs. the previous attempt: same nf-core compliance (no
module-level bin/), but invasive Python rewrites are reverted. The
original behavior is preserved exactly.
@an-altosian an-altosian requested review from heylf and khersameesh24 and removed request for khersameesh24 May 5, 2026 16:57
Copy link
Copy Markdown
Collaborator

@khersameesh24 khersameesh24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes look good to me, the only problem here now is another reviewer comment that the template function is not being used even if the python scripts are in the template directory. I hope they let it pass 🤞 @an-altosian @heylf

@awgymer
Copy link
Copy Markdown

awgymer commented May 7, 2026

another reviewer comment that the template function is not being used even if the python scripts are in the template directory

Where is this comment? It doesn't make much sense to me, the template function is considered less than ideal practice but is currently the "least worst" option within nf-core

@heylf
Copy link
Copy Markdown
Collaborator

heylf commented May 7, 2026

@awgymer just to clarify, the way we implemented now the usage of the scripts is not enough, and needs the proper use of the template function?

@awgymer
Copy link
Copy Markdown

awgymer commented May 7, 2026

Oh ok I see what was done here now. Has anyone actually tried running this? I highly doubt it works - if it did it would have worked like this when the scripts were in module bin and in fact there'd be no need for any complicated module bin logic in nextflow at all. The moduleDir won't be mounted in any container. It might work on a local executor but it's certainly not stable.

The reason that the template function works I think is because it is running in groovy/nextflow and happens before the move to the compute node/workdir, but given that these are scripts with command-line params you can't really use template and maintain that same usage.

If you were going to the effort of moving all the executables anyway I think it would have made more sense just to put them in the top-level bin, allowing you to leave the module scripts themselves unmodified.

@heylf
Copy link
Copy Markdown
Collaborator

heylf commented May 7, 2026

Ok then we go with the top-level bin as you have suggested.

Replaces the templates/-directive approach (which broke under the
Nextflow constraint that `output: eval(...)` channels are bash-only,
incompatible with `template 'foo.py'` setting a Python interpreter via
shebang). Pipeline-level bin/ is the cleanest path: nf-core
auto-prepends bin/ to PATH for every process, scripts resolve by name,
and process scripts stay bash so eval() topic channels keep working.

Per-module changes (19 modules, 20 scripts):

- bin/<module_prefix>_<script>.py    — active copies, invoked by name
                                       (e.g. utility_extract_dapi.py).
                                       Renamed with module-path prefix
                                       to avoid collisions with the
                                       pre-existing bin/divide_transcripts.py
                                       and bin/stitch_transcripts.py
                                       (the augmented 848-line variant
                                       used by xenium_patch/divide and
                                       reconstruct_patches stays as-is).
- modules/local/<mod>/main.nf       — script body now invokes the
                                       renamed bin/ name directly:
                                       `utility_extract_dapi.py --input ...`.
                                       The shell prelude in
                                       segger/create_dataset
                                       (NUMBA_CACHE_DIR setup) is
                                       preserved verbatim. The two-step
                                       script body of xenium_patch/stitch
                                       (stitch_transcripts.py +
                                       stitch_postprocess.py) is preserved.
- modules/local/<mod>/resources/usr/bin/<script>.py
                                     — restored verbatim from origin/dev
                                       for the 15 modules that had a
                                       module-level script there before
                                       PR nf-core#154. Orphan reference copies,
                                       NEVER invoked at runtime — kept
                                       so reviewers can compare the
                                       per-module original against the
                                       pipeline-bin active copy.
- modules/local/<mod>/templates/    — removed entirely (20 files
                                       across 19 dirs). The
                                       `template 'foo.py'` directive
                                       was incompatible with eval()
                                       output channels; the shell-call
                                       wrapper to ${moduleDir}/templates
                                       didn't survive containerization
                                       cleanly per awgymer's review.

Modules without a pre-existing resources/usr/bin/ script on origin/dev
(utility/convert_mask_uint32, utility/downscale_morphology,
utility/extract_dapi, utility/upscale_mask, xenium_patch/stitch's
stitch_postprocess.py) get only the active bin/ copy — no orphan.

Net effect:
- nf-core lint passes (no module-level bin/ invoked at runtime;
  resources/usr/bin/ files are not on PATH).
- All 19 modules' eval-version channels keep emitting Python/library
  versions because the process script remains Bash.
- Reviewers can diff bin/<prefix>_<script>.py against
  modules/local/<mod>/resources/usr/bin/<script>.py to verify the active
  copy matches the dev original (modulo any in-template fixes from PR nf-core#154
  itself).
@an-altosian
Copy link
Copy Markdown
Collaborator Author

moved to pipeline bin/

Copy link
Copy Markdown
Collaborator

@heylf heylf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@heylf heylf merged commit 5c9acff into nf-core:dev May 7, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants