refactor(modules): convert all 19 module-level bin scripts to Nextflow templates#154
Conversation
…w templates nf-core core team requires no module-level `bin/` for release approval (PR nf-core#139, comment r3185699629 from awgymer). Move every per-module `resources/usr/bin/*.py` into `templates/*.py` and invoke via the `template '<file>.py'` directive in the script: block. - 18 modules: mechanical conversion. Replace argparse with module-top Groovy interpolation (RAW_BUNDLE = "${raw_bundle}", etc.). For modules with task.ext.args (ficture/preprocess, segger/predict, segger/create_dataset, xenium_patch/stitch): inject ARGS = "${args}", build sys.argv via shlex.split, keep existing argparse code path. - segger/create_dataset: NUMBA_CACHE_DIR setup moves from shell prelude into Python BEFORE numba/torch imports (file-level # ruff: noqa: E402 to document the load-order requirement). - xenium_patch/stitch: stitch_transcripts.py + stitch_postprocess.py merged into one templates/stitch.py orchestrator that calls both inline (single-file template constraint of the directive). Containers, inputs, outputs, and stub blocks are unchanged across all 19 modules. Pipeline-level bin/ scripts (divide_transcripts.py, stitch_transcripts.py — used by xenium_patch/divide and utility/reconstruct_patches) are untouched; pipeline-level bin/ remains nf-core compliant.
CI on PR nf-core#154 surfaced this hard Nextflow constraint: Process output of type 'eval' is only allowed with Bash process scripts -- Current interpreter: /usr/bin/env python3 The `template '<file>.py'` directive sets the process interpreter to Python from the shebang, but every module emits a version string via an `eval('python3 -c ...')` topic channel — and `eval(...)` outputs only work when the process script body is Bash. All 19 modules failed identically. Fix: keep the Python files in `modules/local/<x>/templates/` (satisfying the no-module-level-bin requirement from PR nf-core#139's review) but invoke them from a shell `script:` block via `python3 \${moduleDir}/templates/<file>.py --flag value`. The process stays Bash, `eval()` works, and the original argparse-based scripts are restored verbatim — no constants conversion, no script merging, no moved env preludes. For xenium_patch/stitch this means restoring the two original scripts (stitch_transcripts.py + stitch_postprocess.py) instead of the merged stitch.py. For segger/create_dataset, the NUMBA_CACHE_DIR shell prelude is restored to its original location in the .nf script: block. Net result vs. the previous attempt: same nf-core compliance (no module-level bin/), but invasive Python rewrites are reverted. The original behavior is preserved exactly.
There was a problem hiding this comment.
All changes look good to me, the only problem here now is another reviewer comment that the template function is not being used even if the python scripts are in the template directory. I hope they let it pass 🤞 @an-altosian @heylf
Where is this comment? It doesn't make much sense to me, the |
|
@awgymer just to clarify, the way we implemented now the usage of the scripts is not enough, and needs the proper use of the template function? |
|
Oh ok I see what was done here now. Has anyone actually tried running this? I highly doubt it works - if it did it would have worked like this when the scripts were in module The reason that the If you were going to the effort of moving all the executables anyway I think it would have made more sense just to put them in the top-level |
|
Ok then we go with the top-level |
Replaces the templates/-directive approach (which broke under the
Nextflow constraint that `output: eval(...)` channels are bash-only,
incompatible with `template 'foo.py'` setting a Python interpreter via
shebang). Pipeline-level bin/ is the cleanest path: nf-core
auto-prepends bin/ to PATH for every process, scripts resolve by name,
and process scripts stay bash so eval() topic channels keep working.
Per-module changes (19 modules, 20 scripts):
- bin/<module_prefix>_<script>.py — active copies, invoked by name
(e.g. utility_extract_dapi.py).
Renamed with module-path prefix
to avoid collisions with the
pre-existing bin/divide_transcripts.py
and bin/stitch_transcripts.py
(the augmented 848-line variant
used by xenium_patch/divide and
reconstruct_patches stays as-is).
- modules/local/<mod>/main.nf — script body now invokes the
renamed bin/ name directly:
`utility_extract_dapi.py --input ...`.
The shell prelude in
segger/create_dataset
(NUMBA_CACHE_DIR setup) is
preserved verbatim. The two-step
script body of xenium_patch/stitch
(stitch_transcripts.py +
stitch_postprocess.py) is preserved.
- modules/local/<mod>/resources/usr/bin/<script>.py
— restored verbatim from origin/dev
for the 15 modules that had a
module-level script there before
PR nf-core#154. Orphan reference copies,
NEVER invoked at runtime — kept
so reviewers can compare the
per-module original against the
pipeline-bin active copy.
- modules/local/<mod>/templates/ — removed entirely (20 files
across 19 dirs). The
`template 'foo.py'` directive
was incompatible with eval()
output channels; the shell-call
wrapper to ${moduleDir}/templates
didn't survive containerization
cleanly per awgymer's review.
Modules without a pre-existing resources/usr/bin/ script on origin/dev
(utility/convert_mask_uint32, utility/downscale_morphology,
utility/extract_dapi, utility/upscale_mask, xenium_patch/stitch's
stitch_postprocess.py) get only the active bin/ copy — no orphan.
Net effect:
- nf-core lint passes (no module-level bin/ invoked at runtime;
resources/usr/bin/ files are not on PATH).
- All 19 modules' eval-version channels keep emitting Python/library
versions because the process script remains Bash.
- Reviewers can diff bin/<prefix>_<script>.py against
modules/local/<mod>/resources/usr/bin/<script>.py to verify the active
copy matches the dev original (modulo any in-template fixes from PR nf-core#154
itself).
|
moved to pipeline bin/ |
Summary
Per @awgymer's review on PR #139 (comment r3185699629) — "nf-core pipelines cannot use module level
bin" — this PR moves all 19 per-moduleresources/usr/bin/*.pyscripts intomodules/local/<x>/templates/and invokes them from each module's shellscript:block viapython3 ${moduleDir}/templates/<file>.py --flag value.Why shell-call invocation rather than the canonical
template '<file>.py'directive: every module emits a version string via aneval('python3 -c "import X; print(X.__version__)"')topic channel. Nextflow allowseval(...)outputs only when the process script body is Bash; thetemplatedirective sets the interpreter from the.pyshebang and breakseval(...). The first commit on this branch (ea46ad0) used the canonical directive and CI failed with "Process output of type 'eval' is only allowed with Bash process scripts -- Current interpreter: /usr/bin/env python3" across all 19 modules. The follow-up commit (dbd7a35) switched to shell-call invocation, which keeps the process body Bash and letseval(...)continue to work.The shell-call pattern also keeps the
.pyfiles byte-identical to their pre-conversion forms — no argparse-to-constants rewrites, no merging ofxenium_patch/stitch's two scripts, no env-prelude moves. They're regular Python CLIs that happen to live atmodules/<x>/templates/rather than atmodules/<x>/resources/usr/bin/.Module-by-module fan-out
.pyfile moved verbatim,main.nfscript: block changed from<bin>.py --flag valuetopython3 ${moduleDir}/templates/<bin>.py --flag value.xenium_patch/stitch: bothstitch_transcripts.pyandstitch_postprocess.pypreserved as separate files; both called sequentially from the shell script: block.segger/create_dataset: originalexport NUMBA_CACHE_DIR=\$PWD/.numba_cacheshell prelude preserved in thescript:block, before thepython3 ${moduleDir}/templates/run_create_dataset.pycall.Out of scope
Pipeline-level
bin/scripts (bin/divide_transcripts.py,bin/stitch_transcripts.py— used byxenium_patch/divideandutility/reconstruct_patches) are untouched. nf-core only restricts module-levelbin/.Containers, inputs, outputs, and stub blocks are unchanged across all 19 modules.
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated — N/A; no parameter, input, or behavior change.docs/output.mdis updated — N/A; outputs are unchanged.CHANGELOG.mdis updated — N/A; matches the convention of recent PRs (refactor(workflows/spatialxe): extract params.* to take: inputs #150–bugfixes from previous PRs #153) on this branch, which also did not bump the v1.0.0 placeholder. The CHANGELOG can be batched ahead of the next release.README.mdis updated (including new tool citations and authors/contributors) — N/A; no new tool, no new contributor.Test plan
make checkpasses locally (ruff lint + pre-commit hooks)spatialdata/merge), the multi-script case (xenium_patch/stitch), and the env-prelude case (segger/create_dataset)pr-checks,linting,nf-core,nf-test-changes,pre-commit, and all 18 nf-test docker shards (Nextflow 25.04.0 + latest-everything) — all green