
feat: support multiple BAM files with parallel processing #5

Merged: ewels merged 2 commits into main from multiple-inputs on Feb 13, 2026

ewels commented on Feb 13, 2026

Summary

  • Accept one or more BAM files as positional arguments (usage: rustqc rna <BAM>... --gtf <GTF>)
  • When multiple BAMs are provided, they are processed in parallel using rayon, with threads distributed evenly across concurrent jobs
  • The GTF annotation is parsed once and shared (read-only) across all parallel BAM processing jobs
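The even thread split and the read-only shared annotation can be sketched as follows. This is a minimal, dependency-free illustration, not RustQC's implementation: the PR uses rayon's par_iter, while this sketch swaps in std::thread::scope so it runs with the standard library alone. The Annotation type, the threads_per_job helper, and the process_single_bam signature are hypothetical, and the rule that integer division leaves the remainder unused (with at least one thread per job) is an assumption.

```rust
use std::thread;

/// Hypothetical stand-in for the parsed GTF annotation (not RustQC's real type).
struct Annotation {
    gene_count: usize,
}

/// Split `total_threads` evenly across `n_jobs`.
/// Assumption: integer division, remainder unused, at least one thread per job.
fn threads_per_job(total_threads: usize, n_jobs: usize) -> usize {
    if n_jobs == 0 {
        return total_threads.max(1);
    }
    (total_threads / n_jobs).max(1)
}

/// Hypothetical per-BAM worker: it only borrows the shared annotation read-only.
fn process_single_bam(bam: &str, gtf: &Annotation, threads: usize) -> String {
    format!("{}: {} genes, {} threads", bam, gtf.gene_count, threads)
}

/// Run one job per BAM concurrently, all sharing the same `&Annotation`.
fn run_all(bams: &[&str], gtf: &Annotation, total_threads: usize) -> Vec<String> {
    let per_job = threads_per_job(total_threads, bams.len());
    // Scoped threads can borrow `gtf` without an Arc, because every thread
    // is joined before `thread::scope` returns.
    thread::scope(|s| {
        let handles: Vec<_> = bams
            .iter()
            .map(|bam| s.spawn(move || process_single_bam(bam, gtf, per_job)))
            .collect();
        // Joining in spawn order keeps results aligned with the input order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // The annotation is built once and shared by every job.
    let gtf = Annotation { gene_count: 3 };
    for line in run_all(&["sample1.bam", "sample2.bam", "sample3.bam"], &gtf, 12) {
        println!("{}", line);
    }
}
```

With --threads 12 and three BAMs, each job in this sketch receives four threads. A single BAM receives all of them.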

Breaking Change

The GTF path moved from a positional argument to a named --gtf / -g flag, so that the BAM argument can accept multiple values:

# Before
rustqc rna sample.bam annotation.gtf --outdir results

# After
rustqc rna sample.bam --gtf annotation.gtf --outdir results

# Multiple BAMs
rustqc rna sample1.bam sample2.bam sample3.bam --gtf annotation.gtf --outdir results --threads 12

Details

  • Parallel processing: Multiple BAMs run concurrently via rayon's par_iter. Thread allocation divides --threads evenly across parallel jobs (a single BAM gets all threads).
  • Duplicate stem detection: Rejects inputs where two BAM files share the same filename stem (e.g. dir1/sample.bam and dir2/sample.bam) since outputs would collide.
  • Log clarity: All log messages are prefixed with [bam_stem] when processing multiple BAMs.
  • Error handling: If one BAM fails, remaining BAMs still complete. A summary of failures is reported at the end.
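The duplicate-stem check and the "keep going, then summarize failures" behaviour can be sketched roughly as below. Several details are assumptions: "stem" is taken to mean std::path::Path::file_stem (which strips only the last extension), the exact error wording is invented, and process_single_bam here is a hypothetical stand-in that fails for any path containing "bad". For brevity the jobs run sequentially, whereas the PR runs them in parallel with rayon.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Filename stem, used for output naming and the `[bam_stem]` log prefix.
fn stem_of(bam: &str) -> Option<String> {
    Path::new(bam)
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
}

/// Reject inputs whose stems collide (e.g. dir1/sample.bam and dir2/sample.bam),
/// because their outputs would be written to the same file names.
fn check_duplicate_stems(bams: &[&str]) -> Result<(), String> {
    let mut seen: HashMap<String, &str> = HashMap::new();
    for &bam in bams {
        let stem = stem_of(bam).ok_or_else(|| format!("no file stem for {}", bam))?;
        // `insert` returns the previous path if this stem was already seen.
        if let Some(first) = seen.insert(stem.clone(), bam) {
            return Err(format!("duplicate BAM stem '{}': {} and {}", stem, first, bam));
        }
    }
    Ok(())
}

/// Hypothetical worker: fails for paths containing "bad", to exercise the summary.
fn process_single_bam(bam: &str) -> Result<(), String> {
    if bam.contains("bad") {
        Err("could not read BAM".to_string())
    } else {
        Ok(())
    }
}

/// Process every BAM and collect failures instead of stopping at the first one.
fn run_all(bams: &[&str]) -> Vec<String> {
    bams.iter()
        .filter_map(|&bam| {
            let prefix = stem_of(bam).unwrap_or_default();
            process_single_bam(bam)
                .err()
                .map(|e| format!("[{}] {}", prefix, e))
        })
        .collect()
}

fn main() {
    let bams = ["dir1/sample.bam", "dir2/other.bam", "dir3/bad.bam"];
    // Validate up front, before any work starts.
    if let Err(e) = check_duplicate_stems(&bams) {
        eprintln!("error: {}", e);
        std::process::exit(1);
    }
    let failures = run_all(&bams);
    if !failures.is_empty() {
        eprintln!("{} of {} BAM files failed:", failures.len(), bams.len());
        for f in &failures {
            eprintln!("  {}", f);
        }
    }
}
```

Running the stem check before dispatching any jobs means a collision is reported immediately, rather than after some samples have already written output.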

Files Changed

| File | Change |
|---|---|
| src/cli.rs | bam → Vec<String>, gtf → --gtf flag, new unit test |
| src/main.rs | Parallel BAM dispatch, process_single_bam() extraction, duplicate detection |
| tests/integration_test.rs | Updated to --gtf flag, added multi-BAM and duplicate-stem tests |
| README.md | Updated CLI usage, examples, Docker, PGO, and performance docs |
| benchmark/README.md | Updated replication commands |
| AGENTS.md | Updated CLI syntax description |
| CHANGELOG.md | Added multi-BAM entry with breaking change note |

Testing

  • 35/35 tests pass (25 unit + 10 integration)
  • cargo fmt --check clean
  • cargo clippy -- -D warnings clean
  • Manually verified with small benchmark (single + multiple BAMs)
  • Manually verified with large 10GB benchmark (single + mixed parallel)

Accept one or more BAM files as positional arguments to 'rustqc rna'.
When multiple BAMs are provided, they are processed in parallel using
rayon, with threads distributed evenly across concurrent jobs. The GTF
is parsed once and shared across all BAM files.

Breaking change: the GTF moved from a positional argument to the --gtf/-g
flag to support variadic BAM inputs.

- Detect and reject duplicate BAM file stems to prevent output collisions
- Add [bam_stem] prefix to all log messages for multi-BAM clarity
- Add integration tests for multi-BAM processing and duplicate detection
- Update all documentation (README, AGENTS, benchmark, CHANGELOG)
ewels merged commit 4af33ac into main on Feb 13, 2026
4 checks passed
ewels deleted the multiple-inputs branch on Feb 13, 2026 at 02:43