Fix NETMHCSTABANDPAN to support samples without SV data #246
Merged
The createNETMHCInput helper used an inner .join() on the SV fasta channel. When the upstream NEOSV process does not run (no structural variants provided in the samplesheet, a common case), the SV channel emits zero items and the inner join silently produces an empty channel, causing NETMHCSTABPAN, NETMHCPAN4, and all downstream subworkflow steps to be skipped. Switch to a left join with remainder: true and default to empty file lists when there is no SV match, so MUT/WT tuples are still emitted per sample whether or not SV data is present. Adds a regression test that drives input[1] = channel.empty() to lock in the SV-less code path.
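
The difference between the two join behaviours can be modelled outside Nextflow. The following is a minimal Python sketch, not the subworkflow's code: channels are stood in for by plain lists of tuples keyed on their first element, and the function names, sample IDs, and file names are all hypothetical. It only illustrates why an inner join against a side with zero items drops every sample, while a left join with a default keeps them.

```python
def inner_join(left, right):
    """Keep only items whose key appears on both sides (models a plain inner join)."""
    right_by_key = {key: rest for key, *rest in right}
    return [
        (key, *rest, *right_by_key[key])
        for key, *rest in left
        if key in right_by_key
    ]


def left_join_with_default(left, right, default):
    """Keep every left item; use `default` for the right side when there is no match."""
    right_by_key = {key: rest for key, *rest in right}
    return [
        (key, *rest, *right_by_key.get(key, default))
        for key, *rest in left
    ]


# Hypothetical data: (meta, mut_fasta, wt_fasta, hla)
fasta_and_hla = [("sample1", "sample1.MUT.fa", "sample1.WT.fa", "HLA-A*02:01")]

# Case covered by the existing tests: the SV side emits a meta with empty file lists.
sv_fasta_empty_lists = [("sample1", [], [])]
# Case fixed by this PR: the SV side emits nothing at all.
sv_fasta_absent = []

# The inner join works when the SV side emits empty lists ...
with_empty_lists = inner_join(fasta_and_hla, sv_fasta_empty_lists)
# ... but yields nothing when the SV side has no items, so downstream steps never run.
with_absent_inner = inner_join(fasta_and_hla, sv_fasta_absent)
# The left join with a default of two empty file lists still yields one tuple per sample.
with_absent_left = left_join_with_default(fasta_and_hla, sv_fasta_absent, default=([], []))

print(with_empty_lists)
print(with_absent_inner)
print(with_absent_left)
```

In this model the two SV cases give the same result under the left join, which is the property the regression test is meant to lock in. The real channel operator has additional behaviour (for example, how unmatched items from the other side are handled) that this sketch does not attempt to reproduce.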
pintoa1-mskcc (Contributor) approved these changes on May 14, 2026 and left a comment:
Good for our purposes. If we ever plan on using this in clinical production, we would need to output empty files in cases like this rather than skipping the step.
Description
Fixes a silent failure in the NETMHCSTABANDPAN subworkflow when samples are run without structural variants, a common case in MSK pipelines where the samplesheet only carries MAF/HLA/CNV inputs.
Root cause
createNETMHCInput joins ch_fasta_and_hla against ch_sv_fasta with an inner .join(...). When no SV files are provided in the samplesheet, the upstream NEOSV process never runs, so the SV fasta channel emits zero items. The inner join therefore produces an empty channel, and NETMHCSTABPAN, NETMHCPAN4, and NEOANTIGENUTILS_FORMATNETMHCPAN (and everything downstream of them in any pipeline using this subworkflow) are silently skipped.
Existing tests cover the case where the SV channel emits [meta, [], []] (empty file lists with a meta), but not the "no items at all" case, which is what real pipelines hit when SVs aren't part of the sample.
Change
Switch the join to a left join with remainder: true and treat a missing SV side as [null, [], []], so a MUT and WT tuple is still emitted per sample regardless of SV presence.
Tests
Adds the test "netmhcstabandpan - empty SV channel - fa,hla_str - tsv - stub", which feeds input[1] = channel.empty() to lock in the SV-less code path.
All 7 tests pass locally (Nextflow 24.10.6, nf-test 0.9.5).
The fix was also verified end-to-end by running the mskcc/neoantigen pipeline (-profile test,docker): before the fix, the pipeline silently completed with 6 of 13 stages skipped after GENERATEMUTFASTA; after the fix, all 31 processes run to completion (0 failures), producing the final TSV and annotated JSON outputs.
PR Checklist
feature/netmhcstabandpan
versions.yml emission untouched