refactor: extract inline Python from 5 modules to module binaries (P1/P2 sweep)#151
Open
an-altosian wants to merge 14 commits intonf-core:devfrom
Open
refactor: extract inline Python from 5 modules to module binaries (P1/P2 sweep)#151an-altosian wants to merge 14 commits intonf-core:devfrom
an-altosian wants to merge 14 commits intonf-core:devfrom
Conversation
Add inline comment documenting the semantics of each exit-code range in the errorStrategy retry list. Notably, 2147483647 is Integer.MAX_VALUE — Nextflow's sentinel for tasks that died before writing .exitcode (e.g. AWS Batch spot reclamation, kubernetes preemption, grid-scheduler cancellations) — not an AWS-specific exit code. Cite Nextflow docs/aws.md for the AWS case. Addresses PR nf-core#139 review comment r3165322845.
…m() for STARDIST ext.args
Replaces the Groovy-fallthrough `findAll()` (no closure) with the documented
stdlib `join` + `trim` chain. Internal double-spaces from empty-string entries
get collapsed by bash word-splitting at `${args}` interpolation; `.trim()`
removes leading/trailing whitespace.
Matches the existing project idiom used in
subworkflows/local/utils_nfcore_spatialxe_pipeline/main.nf:341,352
(toolCitationText / toolBibliographyText).
Addresses PR nf-core#139 review comment r3165345526.
This Dockerfile was unreferenced by any module: a repo-wide grep for "modules/local/utility/Dockerfile" returns zero hits. Each utility submodule (convert_mask_uint32, extract_dapi, resize_tif, segger2xr, etc.) declares its own container directly via `container "..."`. The associated `conda.yml` it expected (per the COPY instruction) does not exist next to it. Leftover from an earlier development phase; no longer needed. Addresses PR nf-core#139 review comment r3165438419.
…hema_input.json Add nf-schema `format` keys so the validator confirms paths exist: - bundle: format: directory-path - image: format: file-path These validate path existence at runtime, not name conventions — so the team's concern about varying 10x bundle/image naming doesn't conflict. The format key is documented at https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/ Addresses PR nf-core#139 review comments r3165291219 (C3) and r3165292878 (C4).
…ll via ext.args
Removes the inline `params.baysor_tiling_min_transcripts_per_cell` reference
from the stitch process script. Instead:
- The XENIUM_PATCH_STITCH withName: block in conf/modules.config sets the
flag via ext.args
- The .nf script declares `def args = task.ext.args ?: ''` and interpolates
`${args}` into the command
Per Florian's narrowing directive, only the cited param at line 40 is fixed.
Other params references in the file (if any) remain as-is for follow-up work.
Addresses PR nf-core#139 review comment r3165442902.
Replaces the three inline `python3 - <<'PYEOF'` heredoc blocks (parquet
column statistics, tile-split workaround, NaN bd.x fix) plus the bundle
symlink shell loop with a single module binary at
resources/usr/bin/run_create_dataset.py.
The .nf script block shrinks from ~190 lines to ~15. The container
(quay.io/dongzehe/segger:1.0.14) stays general — pipeline-specific
workarounds for upstream segger bugs now live in the module binary
where they can be reviewed, tested, and removed when fixed upstream.
Each WORKAROUND function in the binary has a section comment naming
the upstream bug and removal condition.
Pattern matches existing baysor reference modules
(modules/local/baysor/{create_dataset,preprocess}/) which use
resources/usr/bin/ scripts called without ${moduleDir}/ prefix.
Addresses PR nf-core#139 review comment r3165415041 (segger create_dataset only;
predict refactor in next commit; train kept as-is per single-line debug
P2 carve-out).
Replaces inline GPU enumeration (`python3 -c` for torch.cuda.device_count), predict_parquet.py path resolver, and two `sed -i` runtime patches against installed segger source with a single module binary at resources/usr/bin/run_predict.py. The .nf script block shrinks from ~50 lines to ~15. The two sed patches (torch.no_grad() for VRAM savings, deterministic GPU seed) move into a named `patch_predict_parquet()` function with a clear "remove once upstreamed to segger" comment. Container (quay.io/dongzehe/segger:1.0.14) stays general — patches live in the pipeline. Addresses PR nf-core#139 review comment r3165415041 (segger predict).
…dule binary Move the inline python heredoc that casts a segmentation mask to uint32 into resources/usr/bin/convert_mask_uint32.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…e binary Move the inline python heredoc that slices a single channel from a multi-channel OME-TIFF into resources/usr/bin/extract_dapi.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…module binary Move the multi-line python -c block that downscales a morphology image and writes scale_info.json into resources/usr/bin/downscale_morphology.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
Move the multi-line python -c block that upscales a Cellpose mask back to original resolution into resources/usr/bin/upscale_mask.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…binary Move the inline python -c that forces every stitched GeoJSON feature to a single Polygon and reconciles dropped cells in the transcript CSV into resources/usr/bin/stitch_postprocess.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
… with tar.gz test data The format: directory-path / file-path validators added in d59fac3 break CI because: - bundle field accepts EITHER a directory path (real Xenium bundle) OR a .tar.gz archive URL (test data, extracted at runtime by UNTAR). Neither 'directory-path' nor 'file-path' fits both, and both fail against remote URLs. - image field is similar — can be a local OME-TIFF path, a URL, or empty. The reviewer's suggestion in r3165291219 / r3165292878 was reasonable in isolation but conflicts with this pipeline's dual-purpose input semantics where the same field can be a path or a URL or an archive. Reverts the format keys; keeps the type / pattern / errorMessage entries.
5 tasks
khersameesh24
approved these changes
Apr 30, 2026
…dict The heredocs were inadvertently introduced when extracting the inline Python to module binaries. Project convention (CLAUDE.md "Version Reporting") is topic-channel only — no `versions.yml` files. Topic channels (versions_segger, versions_python, etc.) emit the actual versions; the hardcoded `segger: 0.1.0` heredoc was unused and risked going stale. No other modules/local/* writes a versions.yml file. Aligns the segger modules with the rest of the pipeline.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Applies the project's principle P1/P2 (containers stay general; pipeline-specific code lives in module binaries) to 5 modules that still had inline Python heredocs or multi-line
python3 -cblocks. Mirrors the reference pattern inmodules/local/baysor/preprocess/and the segger refactors in PR #148.5 commits, one per module:
utility/convert_mask_uint32resources/usr/bin/convert_mask_uint32.pyutility/extract_dapiresources/usr/bin/extract_dapi.pyutility/downscale_morphology-c)resources/usr/bin/downscale_morphology.pyutility/upscale_mask-c)resources/usr/bin/upscale_mask.pyxenium_patch/stitch-cfor GeoJSON cleanup)resources/usr/bin/stitch_postprocess.pyThe
parquet_to_csvmodule was originally in scope but already converted as part of PR #148's C13 (path flattening) — no commit needed here.Each
.nffile's script block now invokes the binary by basename (no${moduleDir}/prefix — Wave auto-mountsresources/usr/bin/onto PATH), matching the baysor reference style. Stub blocks unchanged.Notes for reviewers
devafter PR Address PR #139 review: segger refactor + 5 targeted fixes #148's commits landed there. The diff above includes PR Address PR #139 review: segger refactor + 5 targeted fixes #148's 7 commits + this PR's 5 + the cherry-picked schema revert.def args = task.ext.args ?: ''threading inxenium_patch/stitch/main.nfis preserved.ast.parsesyntax check passed on all 5 new.pyfiles.Test plan
.pyfiles — passes.nfscript blocks call binary by basename (matches baysor pattern)