call -m (without -v): warn when emitting invariant sites on whole-genome input #2558

@carstenerickson

Summary

bcftools call -m without -v emits a record at every position covered by reads. On a 33× WGS BAM that is ~1 billion records (~23 GB of output), versus ~10M records (~1 GB) with -v. The flag is documented, but the behavior when -v is omitted reliably surprises users following the canonical WGS pipeline, and the roughly 100× increase in record count then breaks downstream consumers: plink2 --vcf runs out of memory building metadata, and bcftools view -R becomes intractable (see #2557).

A one-time stderr warning when -v is omitted on whole-genome input would catch the trap with zero behavior change.

Reproducer

bcftools mpileup -f hg38.fa wgs.bam | bcftools call -m  -Oz -o out.m.vcf.gz   # ~1B / ~23 GB
bcftools mpileup -f hg38.fa wgs.bam | bcftools call -mv -Oz -o out.mv.vcf.gz  # ~10M / ~1 GB

Both invocations are accepted without complaint, yet the outputs differ by about 100× in record count (about 23× in file size). The variant-calling howto uses -mv, but nothing flags the absence of -v as unusual.

Suggested fix

In vcfcall.c after flag parsing:

if (call_mode == MULTIALLELIC && !variants_only && !gvcf_mode) {
    fprintf(stderr,
        "WARNING: `bcftools call -m` invoked without -v; will emit a record at "
        "every callable position (often 100× larger than expected for WGS). "
        "Add -v for variant sites only.\n");
}

Skipping the warning under -g covers the intentional-invariant-output case. An optional second gate ("only warn if input regions cover > 1 contig and > 1 Mbp") would further suppress noise for capture-panel runs.
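
The two gates described above could be combined along the following lines. This is only a sketch under stated assumptions: `call_opts_t`, `should_warn_invariant`, `warn_invariant_once`, and `WARN_MIN_BP` are hypothetical names introduced for illustration, not identifiers from vcfcall.c, and it assumes the contig count and total span of the input regions are already known by the time flags have been parsed (for example from the header or the region options).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical summary of the parsed options; NOT bcftools' real structs. */
typedef struct {
    bool multiallelic;   /* -m was given */
    bool variants_only;  /* -v was given */
    bool gvcf_mode;      /* -g was given */
    int n_contigs;       /* contigs covered by the input regions (assumed known) */
    uint64_t total_bp;   /* total bases covered by the input regions (assumed known) */
} call_opts_t;

/* Second gate from the proposal: more than 1 Mbp of input. */
#define WARN_MIN_BP 1000000ULL

/* True only for -m without -v and without -g, on input that spans
 * more than one contig and more than WARN_MIN_BP bases. */
bool should_warn_invariant(const call_opts_t *o)
{
    if (!o->multiallelic || o->variants_only || o->gvcf_mode)
        return false;
    return o->n_contigs > 1 && o->total_bp > WARN_MIN_BP;
}

/* Intended to be called once, right after flag parsing.
 * Returns true if the warning was written to `out`. */
bool warn_invariant_once(const call_opts_t *o, FILE *out)
{
    if (!should_warn_invariant(o))
        return false;
    fprintf(out,
        "WARNING: `bcftools call -m` invoked without -v; will emit a record at "
        "every callable position (often 100x larger than expected for WGS). "
        "Add -v for variant sites only.\n");
    return true;
}
```

Keeping the decision in a pure predicate, separate from the fprintf, would make the gate easy to unit-test and to tune (for instance, changing the 1 Mbp threshold) without touching the message text.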

A warning preserves backward compatibility: existing scripts unchanged, new users get one chance to spot the trap.

Versions: bcftools 1.16 and 1.23.1; the help text in 1.23.1 confirms the current behavior.
