Skip to content

Address PR #139 review: segger refactor + 5 targeted fixes#148

Open
an-altosian wants to merge 9 commits intonf-core:devfrom
an-altosian:dev
Open

Address PR #139 review: segger refactor + 5 targeted fixes#148
an-altosian wants to merge 9 commits intonf-core:devfrom
an-altosian:dev

Conversation

@an-altosian
Copy link
Copy Markdown

@an-altosian an-altosian commented Apr 30, 2026

Summary

Addresses the remaining PR #139 review comments after Florian's PR #147 already landed ~15 of the items. This PR contains 7 focused commits, each mapped 1:1 to a specific review thread for clean SHA-to-thread traceability:

  • 7eedabd docs(base.config): explain errorStrategy exit codes — addresses r3165322845. Clarifies that 2147483647 is Integer.MAX_VALUE, the Nextflow sentinel for tasks killed before writing .exitcode (citing nextflow/processor/TaskRun.groovy and the AWS Batch troubleshooting docs).
  • 1585a9d style(modules.config): replace findAll().join(' ') with join(' ').trim() for STARDIST ext.args — addresses r3165345526. Matches existing project idiom in subworkflows/local/utils_nfcore_spatialxe_pipeline/main.nf:341,352.
  • 74aa917 chore: remove orphan modules/local/utility/Dockerfile — addresses r3165438419. Repo-wide grep confirmed zero references.
  • d59fac3 feat(schema): add format validators (directory-path, file-path) to schema_input.json — addresses r3165291219 (C3) and r3165292878 (C4). Validates path existence at runtime, not name conventions.
  • cf0d42d fix(xenium_patch/stitch): thread baysor_tiling_min_transcripts_per_cell via ext.args — addresses r3165442902. Per Florian's narrowing directive, only the cited param is fixed.
  • 1db7443 refactor(segger): extract create_dataset orchestration to module binary — addresses r3165415041 (segger create_dataset). 3 inline python3 - <<PYEOF heredocs extracted to a single resources/usr/bin/run_create_dataset.py (253 lines) following the baysor/preprocess/ reference pattern. The .nf script block shrinks from ~190 lines to ~15. Each WORKAROUND for upstream segger bugs is named and commented for easy removal when fixed upstream.
  • 381c9e3 refactor(segger): extract predict orchestration to module binary — addresses r3165415041 (segger predict). GPU enumeration + two sed -i runtime patches against predict_parquet.py extracted to resources/usr/bin/run_predict.py (137 lines). The .nf script block shrinks from ~50 lines to ~15.

Out of scope (separate follow-up PRs)

  • PR #151 (open): extract inline Python from 6 other modules (utility/{convert_mask_uint32,extract_dapi,downscale_morphology,upscale_mask}, xenium_patch/stitch postprocess, parquet_to_csv). Same pattern as the segger refactors here.
  • PR #150 (open): extract all 24 unique params.* references in workflows/spatialxe.nf to formal take: workflow inputs (addresses r3165543921 more thoroughly than the cited line alone).

Test plan

  • pre-commit on changed files — passes (prettier, trim trailing whitespace, fix end of files all green)
  • AST parse on both new Python module binaries — passes
  • nextflow config parses without errors after all 7 commits
  • nf-test stub run for test_image_mode (skipped locally due to image-pull timeout; CI will exercise this)
  • Real-data segger run (out of scope — requires GPU; will be exercised post-merge by the next pipeline run)

Notes for reviewers

  • Each commit is small and maps to exactly one PR Release version 1.0.0 #139 review thread. Mapping is in the commit messages and this PR description.
  • The segger refactor preserves stub blocks unchanged, so existing nf-test -stub snapshots should not regenerate.
  • A3.3 (segger train) was not extracted to a binary — its only inline Python is a single-line python3 -c "import torch; print(...)" debug print, which falls under principle P2's carve-out for single-line debug prints. Train has no heredocs, no sed patches, and no orchestration logic warranting a binary.

Add inline comment documenting the semantics of each exit-code range in the
errorStrategy retry list. Notably, 2147483647 is Integer.MAX_VALUE — Nextflow's
sentinel for tasks that died before writing .exitcode (e.g. AWS Batch spot
reclamation, kubernetes preemption, grid-scheduler cancellations) — not an
AWS-specific exit code. Cite Nextflow docs/aws.md for the AWS case.

Addresses PR nf-core#139 review comment r3165322845.
…m() for STARDIST ext.args

Replaces the Groovy-fallthrough `findAll()` (no closure) with the documented
stdlib `join` + `trim` chain. Internal double-spaces from empty-string entries
get collapsed by bash word-splitting at `${args}` interpolation; `.trim()`
removes leading/trailing whitespace.

Matches the existing project idiom used in
subworkflows/local/utils_nfcore_spatialxe_pipeline/main.nf:341,352
(toolCitationText / toolBibliographyText).

Addresses PR nf-core#139 review comment r3165345526.
This Dockerfile was unreferenced by any module: a repo-wide grep for
"modules/local/utility/Dockerfile" returns zero hits. Each utility submodule
(convert_mask_uint32, extract_dapi, resize_tif, segger2xr, etc.) declares its
own container directly via `container "..."`. The associated `conda.yml` it
expected (per the COPY instruction) does not exist next to it.

Leftover from an earlier development phase; no longer needed.

Addresses PR nf-core#139 review comment r3165438419.
…hema_input.json

Add nf-schema `format` keys so the validator confirms paths exist:
  - bundle: format: directory-path
  - image:  format: file-path

These validate path existence at runtime, not name conventions — so the
team's concern about varying 10x bundle/image naming doesn't conflict.
The format key is documented at
https://nextflow-io.github.io/nf-schema/latest/nextflow_schema/nextflow_schema_specification/

Addresses PR nf-core#139 review comments r3165291219 (C3) and r3165292878 (C4).
…ll via ext.args

Removes the inline `params.baysor_tiling_min_transcripts_per_cell` reference
from the stitch process script. Instead:
  - The XENIUM_PATCH_STITCH withName: block in conf/modules.config sets the
    flag via ext.args
  - The .nf script declares `def args = task.ext.args ?: ''` and interpolates
    `${args}` into the command

Per Florian's narrowing directive, only the cited param at line 40 is fixed.
Other params references in the file (if any) remain as-is for follow-up work.

Addresses PR nf-core#139 review comment r3165442902.
Replaces the three inline `python3 - <<'PYEOF'` heredoc blocks (parquet
column statistics, tile-split workaround, NaN bd.x fix) plus the bundle
symlink shell loop with a single module binary at
resources/usr/bin/run_create_dataset.py.

The .nf script block shrinks from ~190 lines to ~15. The container
(quay.io/dongzehe/segger:1.0.14) stays general — pipeline-specific
workarounds for upstream segger bugs now live in the module binary
where they can be reviewed, tested, and removed when fixed upstream.
Each WORKAROUND function in the binary has a section comment naming
the upstream bug and removal condition.

Pattern matches existing baysor reference modules
(modules/local/baysor/{create_dataset,preprocess}/) which use
resources/usr/bin/ scripts called without ${moduleDir}/ prefix.

Addresses PR nf-core#139 review comment r3165415041 (segger create_dataset only;
predict refactor in next commit; train kept as-is per single-line debug
P2 carve-out).
Replaces inline GPU enumeration (`python3 -c` for torch.cuda.device_count),
predict_parquet.py path resolver, and two `sed -i` runtime patches against
installed segger source with a single module binary at
resources/usr/bin/run_predict.py.

The .nf script block shrinks from ~50 lines to ~15. The two sed patches
(torch.no_grad() for VRAM savings, deterministic GPU seed) move into a
named `patch_predict_parquet()` function with a clear "remove once
upstreamed to segger" comment.

Container (quay.io/dongzehe/segger:1.0.14) stays general — patches live
in the pipeline.

Addresses PR nf-core#139 review comment r3165415041 (segger predict).
… with tar.gz test data

The format: directory-path / file-path validators added in d59fac3 break CI
because:
  - bundle field accepts EITHER a directory path (real Xenium bundle) OR a
    .tar.gz archive URL (test data, extracted at runtime by UNTAR). Neither
    'directory-path' nor 'file-path' fits both, and both fail against remote URLs.
  - image field is similar — can be a local OME-TIFF path, a URL, or empty.

The reviewer's suggestion in r3165291219 / r3165292878 was reasonable in
isolation but conflicts with this pipeline's dual-purpose input semantics
where the same field can be a path or a URL or an archive.

Reverts the format keys; keeps the type / pattern / errorMessage entries.
Copy link
Copy Markdown
Collaborator

@khersameesh24 khersameesh24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…dict

The heredocs were inadvertently introduced when extracting the inline
Python to module binaries. Project convention (CLAUDE.md "Version
Reporting") is topic-channel only — no `versions.yml` files. Topic
channels (versions_segger, versions_python, etc.) emit the actual
versions; the hardcoded `segger: 0.1.0` heredoc was unused and risked
going stale.

No other modules/local/* writes a versions.yml file. Aligns the segger
modules with the rest of the pipeline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants