Skip to content

refactor: extract inline Python from 5 modules to module binaries (P1/P2 sweep)#151

Open
an-altosian wants to merge 14 commits intonf-core:devfrom
an-altosian:feature/inline-python-sweep
Open

refactor: extract inline Python from 5 modules to module binaries (P1/P2 sweep)#151
an-altosian wants to merge 14 commits intonf-core:devfrom
an-altosian:feature/inline-python-sweep

Conversation

@an-altosian
Copy link
Copy Markdown

Summary

Applies the project's principle P1/P2 (containers stay general; pipeline-specific code lives in module binaries) to 5 modules that still had inline Python heredocs or multi-line python3 -c blocks. Mirrors the reference pattern in modules/local/baysor/preprocess/ and the segger refactors in PR #148.

5 commits, one per module:

Module Lines extracted New binary
utility/convert_mask_uint32 ~7 (heredoc) resources/usr/bin/convert_mask_uint32.py
utility/extract_dapi ~10 (heredoc) resources/usr/bin/extract_dapi.py
utility/downscale_morphology ~45 (multi-line -c) resources/usr/bin/downscale_morphology.py
utility/upscale_mask ~20 (multi-line -c) resources/usr/bin/upscale_mask.py
xenium_patch/stitch ~50 (multi-line -c for GeoJSON cleanup) resources/usr/bin/stitch_postprocess.py

The parquet_to_csv module was originally in scope but already converted as part of PR #148's C13 (path flattening) — no commit needed here.

Each .nf file's script block now invokes the binary by basename (no ${moduleDir}/ prefix — Wave auto-mounts resources/usr/bin/ onto PATH), matching the baysor reference style. Stub blocks unchanged.

Notes for reviewers

Test plan

  • AST syntax check on all new .py files — passes
  • All .nf script blocks call binary by basename (matches baysor pattern)
  • Stub blocks unchanged
  • CI nf-test stub run (will run on PR open)

Add inline comment documenting the semantics of each exit-code range in the
errorStrategy retry list. Notably, 2147483647 is Integer.MAX_VALUE — Nextflow's
sentinel for tasks that died before writing .exitcode (e.g. AWS Batch spot
reclamation, kubernetes preemption, grid-scheduler cancellations) — not an
AWS-specific exit code. Cite Nextflow docs/aws.md for the AWS case.

Addresses PR nf-core#139 review comment r3165322845.
…m() for STARDIST ext.args

Replaces the Groovy-fallthrough `findAll()` (no closure) with the documented
stdlib `join` + `trim` chain. Internal double-spaces from empty-string entries
get collapsed by bash word-splitting at `${args}` interpolation; `.trim()`
removes leading/trailing whitespace.

Matches the existing project idiom used in
subworkflows/local/utils_nfcore_spatialxe_pipeline/main.nf:341,352
(toolCitationText / toolBibliographyText).

Addresses PR nf-core#139 review comment r3165345526.
This Dockerfile was unreferenced by any module: a repo-wide grep for
"modules/local/utility/Dockerfile" returns zero hits. Each utility submodule
(convert_mask_uint32, extract_dapi, resize_tif, segger2xr, etc.) declares its
own container directly via `container "..."`. The associated `conda.yml` it
expected (per the COPY instruction) does not exist next to it.

Leftover from an earlier development phase; no longer needed.

Addresses PR nf-core#139 review comment r3165438419.
…hema_input.json

Add nf-schema `format` keys so the validator confirms paths exist:
  - bundle: format: directory-path
  - image:  format: file-path

These validate path existence at runtime, not name conventions — so the
team's concern about varying 10x bundle/image naming doesn't conflict.
The format key is documented at
https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/

Addresses PR nf-core#139 review comments r3165291219 (C3) and r3165292878 (C4).
…ll via ext.args

Removes the inline `params.baysor_tiling_min_transcripts_per_cell` reference
from the stitch process script. Instead:
  - The XENIUM_PATCH_STITCH withName: block in conf/modules.config sets the
    flag via ext.args
  - The .nf script declares `def args = task.ext.args ?: ''` and interpolates
    `${args}` into the command

Per Florian's narrowing directive, only the cited param at line 40 is fixed.
Other params references in the file (if any) remain as-is for follow-up work.

Addresses PR nf-core#139 review comment r3165442902.
Replaces the three inline `python3 - <<'PYEOF'` heredoc blocks (parquet
column statistics, tile-split workaround, NaN bd.x fix) plus the bundle
symlink shell loop with a single module binary at
resources/usr/bin/run_create_dataset.py.

The .nf script block shrinks from ~190 lines to ~15. The container
(quay.io/dongzehe/segger:1.0.14) stays general — pipeline-specific
workarounds for upstream segger bugs now live in the module binary
where they can be reviewed, tested, and removed when fixed upstream.
Each WORKAROUND function in the binary has a section comment naming
the upstream bug and removal condition.

Pattern matches existing baysor reference modules
(modules/local/baysor/{create_dataset,preprocess}/) which use
resources/usr/bin/ scripts called without ${moduleDir}/ prefix.

Addresses PR nf-core#139 review comment r3165415041 (segger create_dataset only;
predict refactor in next commit; train kept as-is per single-line debug
P2 carve-out).
Replaces inline GPU enumeration (`python3 -c` for torch.cuda.device_count),
predict_parquet.py path resolver, and two `sed -i` runtime patches against
installed segger source with a single module binary at
resources/usr/bin/run_predict.py.

The .nf script block shrinks from ~50 lines to ~15. The two sed patches
(torch.no_grad() for VRAM savings, deterministic GPU seed) move into a
named `patch_predict_parquet()` function with a clear "remove once
upstreamed to segger" comment.

Container (quay.io/dongzehe/segger:1.0.14) stays general — patches live
in the pipeline.

Addresses PR nf-core#139 review comment r3165415041 (segger predict).
…dule binary

Move the inline python heredoc that casts a segmentation mask to uint32 into resources/usr/bin/convert_mask_uint32.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…e binary

Move the inline python heredoc that slices a single channel from a multi-channel OME-TIFF into resources/usr/bin/extract_dapi.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…module binary

Move the multi-line python -c block that downscales a morphology image and writes scale_info.json into resources/usr/bin/downscale_morphology.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
Move the multi-line python -c block that upscales a Cellpose mask back to original resolution into resources/usr/bin/upscale_mask.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
…binary

Move the inline python -c that forces every stitched GeoJSON feature to a single Polygon and reconciles dropped cells in the transcript CSV into resources/usr/bin/stitch_postprocess.py. Containers stay general; pipeline-specific logic lives in module binaries on PATH via the resources/usr/bin/ mount.
… with tar.gz test data

The format: directory-path / file-path validators added in d59fac3 break CI
because:
  - bundle field accepts EITHER a directory path (real Xenium bundle) OR a
    .tar.gz archive URL (test data, extracted at runtime by UNTAR). Neither
    'directory-path' nor 'file-path' fits both, and both fail against remote URLs.
  - image field is similar — can be a local OME-TIFF path, a URL, or empty.

The reviewer's suggestion in r3165291219 / r3165292878 was reasonable in
isolation but conflicts with this pipeline's dual-purpose input semantics
where the same field can be a path or a URL or an archive.

Reverts the format keys; keeps the type / pattern / errorMessage entries.
@an-altosian an-altosian requested review from heylf and khersameesh24 and removed request for heylf April 30, 2026 20:56
…dict

The heredocs were inadvertently introduced when extracting the inline
Python to module binaries. Project convention (CLAUDE.md "Version
Reporting") is topic-channel only — no `versions.yml` files. Topic
channels (versions_segger, versions_python, etc.) emit the actual
versions; the hardcoded `segger: 0.1.0` heredoc was unused and risked
going stale.

No other modules/local/* writes a versions.yml file. Aligns the segger
modules with the rest of the pipeline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants