snpArcher v2.1

@tsackton tsackton released this 04 Jun 12:35
aaf7b2e

snpArcher v2.1 is the first public v2-series release. This release includes both the major v1-to-v2 workflow update and the fixes made while stabilizing v2.

For users upgrading from v1, start with the v1 to v2 migration guide. The new v2 input contracts are also documented in the sample-sheet reference and config field reference.

Highlights

  • Reworked the v2 sample sheet around a minimal manifest: sample_id, input_type, and input, with optional library_id and mark_duplicates.
  • Moved reference and workflow settings into a nested v2 config structure, with schema validation and clearer errors for legacy v1 configs.
  • Added or expanded support for GATK, Sentieon, bcftools, and DeepVariant. Parabricks support is also included, but it is experimental and untested.
  • Added support for BAM and gVCF inputs. gVCF inputs are supported for the GATK and Sentieon joint-calling workflows.
  • Reworked interval and joint-calling logic, including staged concat/gather behavior for large VCF/gVCF merges.
  • Added callable-sites BED generation from coverage and mappability sources.
  • Updated QC and postprocess modules for the v2 workflow contract and retired unsupported legacy modules.
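
As a rough illustration of the minimal manifest described above, the following sketch checks that a sample sheet has the three required columns and no empty required cells. It is not snpArcher's own validation code. The column names come from these notes; the CSV layout, the helper name `check_manifest`, and the example row values (including the `fastq` input type string and the file path) are assumptions made for illustration only.

```python
import csv
import io

# Column names taken from the release notes.
REQUIRED = ("sample_id", "input_type", "input")
OPTIONAL = ("library_id", "mark_duplicates")  # listed for reference, not enforced


def check_manifest(text):
    """Return a list of problems found in a CSV manifest (empty if none).

    Hypothetical helper: assumes a comma-separated sheet with a header row.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing required column: {c}" for c in REQUIRED if c not in header]
    if problems:
        return problems
    for n, row in enumerate(reader, start=2):  # the header is line 1
        for c in REQUIRED:
            if not (row[c] or "").strip():
                problems.append(f"line {n}: empty {c}")
    return problems


# Example values are placeholders, not documented snpArcher inputs.
example = "sample_id,input_type,input\nS1,fastq,reads/S1_R1.fq.gz\n"
print(check_manifest(example))
```

For the authoritative list of accepted values and column semantics, the sample-sheet reference remains the source of truth.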

v2.1 fixes and stabilization

  • Improved v2 config compatibility, warnings, and legacy-config detection.
  • Added mixed SRA + FASTQ row support for the same sample.
  • Fixed SRA download fallback behavior.
  • Preserved numeric-like sample IDs in QC outputs.
  • Made QC more robust to annotation differences across variant callers.
  • Filtered sparse samples before PLINK GRM generation.
  • Fixed default target selection when the QC module is enabled.
  • Improved callable-sites behavior, including coverage-BED memory handling and safer behavior for mixed BAM/gVCF cohorts.
  • Added complexity-aware GenomicsDB interval splitting for large or fragmented references.
  • Auto-tuned GenMap indexing for large genomes.
  • Moved GATK memory resources into the workflow profile.
  • Fixed DeepVariant workflow behavior.
  • Added long-contig support using CSI indexes where TBI indexes are not sufficient.
  • Fixed long-contig GATK interval gather behavior and sorted staged interval VCFs.
  • Added standalone QC and postprocess runner scripts.
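
Background on the long-contig item above: TBI indexes cannot address positions beyond 2^29 bp (536,870,912), so contigs longer than that need a CSI index. The sketch below shows one way such a choice could be made from contig lengths. How snpArcher actually makes this decision is not stated in these notes, and the function name `index_format` and the example contig names are hypothetical.

```python
# Coordinate limit of the TBI (and BAI) binning scheme: 2**29 bp.
TBI_MAX_POSITION = 2**29


def index_format(contig_lengths):
    """Return 'csi' if any contig is too long for a TBI index, else 'tbi'.

    contig_lengths maps contig name to length in bp (for example, parsed
    from a .fai file). Illustrative helper, not part of snpArcher.
    """
    too_long = any(length > TBI_MAX_POSITION for length in contig_lengths.values())
    return "csi" if too_long else "tbi"


# Hypothetical contigs: one ordinary chromosome and one very long one.
print(index_format({"chr1": 248_956_422}))
print(index_format({"chr1": 248_956_422, "chrLong": 700_000_000}))
```

On the command line, both `tabix` and `bcftools index` can write CSI indexes via their `--csi` options.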

Upgrade notes

This is a breaking upgrade for v1 users. Existing v1 sample sheets and configs are not drop-in compatible with v2. In particular, reference fields now belong in the config, sample metadata is separated from the core sample manifest, and multi-reference sample sheets are no longer part of the v2 contract.

snpArcher v2 is tested with Snakemake 9-series releases. If you see unexpected resource parsing or checkpoint behavior, check the current README recommendation before running large jobs.
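
A quick pre-flight check before a large run might look like the sketch below, which tests whether a version string (as printed by `snakemake --version`) falls in the 9-series. The helper name `is_tested_series` is hypothetical, and the check only looks at the major version; it does not replace the README recommendation.

```python
def is_tested_series(version, tested_major=9):
    """Return True if the version string's major component equals tested_major.

    Illustrative helper: expects strings such as '9.1.0'.
    """
    major = version.strip().split(".")[0]
    return major.isdigit() and int(major) == tested_major


print(is_tested_series("9.1.0"))
print(is_tested_series("8.20.3"))
```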