Releases: samtools/bcftools
bcftools release 1.17:
Download the source code here: bcftools-1.17.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
-
The
-i
/-e
filtering expressions-
Error checks were added to prevent incorrect use of vector arithmetics. For example, when evaluating the sum of two vectors A and B, the resulting vector could contain nonsense values when the input vectors were not of the same length. The fix introduces the following logic:
- evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
- evaluate to C_i = A_i + B_0 when length(B)=1 and set length(C)=length(A)
- evaluate to C_i = A_0 + B_i when length(A)=1 and set length(C)=length(B)
- throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
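The length rules listed above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the bcftools implementation; the function name `add_vectors` is hypothetical.

```python
from typing import List


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum following the four length rules described above."""
    if len(a) == len(b):
        # C_i = A_i + B_i, length(C) = length(A)
        return [x + y for x, y in zip(a, b)]
    if len(b) == 1:
        # C_i = A_i + B_0, length(C) = length(A)
        return [x + b[0] for x in a]
    if len(a) == 1:
        # C_i = A_0 + B_i, length(C) = length(B)
        return [a[0] + y for y in b]
    # Lengths differ and neither vector has a single element
    raise ValueError(f"incompatible vector lengths: {len(a)} and {len(b)}")
```

The same pattern applies to any binary operator; only the addition would change.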
-
Arrays in Number=R tags can now be subscripted by alleles found in FORMAT/GT. For example,
FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20
-
-
The commands consensus -H and +split-vep -H
- Drop the unnecessary leading space in the first header column and newly print #[1]columnName instead of the previous # [1]columnName (#1856)
Changes affecting specific commands:
-
bcftools +allele-length
- Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than that in the same bin (#1837)
-
bcftools annotate
-
bcftools call
-
bcftools consensus
- BREAKING CHANGE: the option -I, --iupac-codes newly outputs IUPAC codes based on FORMAT/GT of all samples. The -s, --samples and -S, --samples-file options can be used to subset samples. In order to ignore samples and consider only the REF and ALT columns (the original behavior prior to 1.17), run with -s - (#1828)
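As an illustration of what genotype-based IUPAC coding means, the following Python sketch derives a code from the alleles actually carried by the samples at a SNV site. It is not the bcftools implementation: the function name is hypothetical, indels are not handled, and falling back to REF when all genotypes are missing is an assumption made only for this example.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_from_genotypes(ref, alts, genotypes):
    """Return the IUPAC code for the bases present in the given GT strings (SNVs only)."""
    alleles = [ref] + list(alts)
    seen = set()
    for gt in genotypes:
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":  # skip missing alleles
                seen.add(alleles[int(idx)].upper())
    if not seen:
        return ref  # assumption for this sketch: no called alleles, keep REF
    return IUPAC[frozenset(seen)]
```

For example, a single heterozygous A/G sample gives R, while a site where every sample is homozygous for the alternate allele gives the plain ALT base.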
-
bcftools convert
- Make variantkey conversion work for sites without an
ALT
allele (#1806)
-
bcftools csq
-
Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained) would report only the less severe one (#1810)
-
GFF file parsing was made slightly more flexible, newly ids can be just
XXX
rather than, for example,gene:XXX
-
New
gff2gff
perl script to fix GFF formatting differences
-
-
bcftools +fill-tags
- More of the available annotations are now added by the
-t all
option
-
bcftools +fixref
-
New
INFO/FIXREF
annotation -
New
-m
swap mode
-
-
bcftools +mendelian
- The +mendelian plugin has been deprecated and replaced with +mendelian2. The function of the plugin is the same, but the command line options and the output format have changed, which is why it was introduced as a new plugin.
-
bcftools mpileup
-
Most of the annotations generated by mpileup are now optional via the -a, --annotate option, and several new (mostly experimental) annotations have been added.
-
New option --indels-2.0 for an EXPERIMENTAL indel calling model. This model aims to address some known deficiencies of the current indel calling algorithm; specifically, it uses a diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity but at the cost of decreased specificity.
-
Make the FS annotation (Fisher exact test strand bias) functional and remove it from the default annotations
-
-
bcftools norm
-
New --multi-overlaps option allows overlapping alleles to be set either to the ref allele (the current default) or to a missing allele (#1764 and #1802)
-
Fixed a bug in -m - which did not split missing FORMAT values correctly and could lead to empty FORMAT fields such as :: instead of the correct :.: (#1818)
-
The --atomize option previously would not split complex indels such as C>GGG. Newly these will be split into two records, C>G and C>CGG (#1832)
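The C>GGG example can be reproduced with a small Python sketch. It covers only the simple case of a single-base REF and a longer ALT whose first base differs; the real atomization code handles many more cases, and the function name here is hypothetical.

```python
def split_complex_insertion(ref: str, alt: str):
    """Split REF>ALT into an SNV and an insertion when REF is one base and ALT[0] != REF."""
    if len(ref) != 1 or len(alt) < 2 or alt[0] == ref:
        # Not the case discussed above; return the record unchanged
        return [(ref, alt)]
    snv = (ref, alt[0])              # e.g. C>G
    insertion = (ref, ref + alt[1:])  # e.g. C>CGG
    return [snv, insertion]
```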
-
-
bcftools query
- Fix a rare bug where the printing of the SAMPLE field with query was incorrectly suppressed when the -e option contained a sample expression while the formatting query did not. See #1783 for details.
-
bcftools +setGT
-
bcftools +split-vep
-
New options -g, --gene-list and --gene-list-fields which allow prioritizing consequences from a list of genes, or restricting output to the listed genes
-
New -H, --print-header option to print the header with -f
-
Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. There the LoF_info subfield contains commas which, in general, makes it impossible to parse the VEP subfields. The +split-vep plugin can now work with such files, replacing the offending commas with slash (/) characters. See also Ensembl/ensembl-vep#1351
-
Newly the -c, --columns option can be omitted when a subfield is used in an -i/-e filtering expression. Note that -c may still have to be given when it is not possible to infer the type of the subfield. Note that this is an experimental feature.
-
-
bcftools stats
- The per-sample stats (PSC) would not be computed when -i/-e filtering options and the -s - option were given but the expression did not include sample columns (#1835)
-
bcftools +tag2tag
- Revamp of the plugin to allow wider range of tag conversions, specifically all combinations from
FORMAT/GL,PL,GP
toFORMAT/GL,PL,GP,GT
-
bcftools +trio-dnm2
-
New -n, --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel
-
Allow the --pn and --pns options to be set separately for SNVs and indels, and make the indel settings more strict by default
-
Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values
-
-
bcftools +variant-distance
- New option -d, --direction to choose the directionality: forward, reverse, nearest (the default) or both (#1829)
1.16
Download the source code here: bcftools-1.16.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
- New plugin
bcftools +variant-distance
to annotate records with distance to the nearest variant (#1690)
Changes affecting the whole of bcftools, or multiple commands:
-
The
-i
/-e
filtering expressions-
Added support for querying of multiple filters, for example
-i 'FILTER="A;B"'
can be used to select sites with two filters "A" and "B" set. See the documentation for more examples. -
Added modulo arithmetic operator
-
Changes affecting specific commands:
-
bcftools annotate
- A bug introduced in 1.14 caused records with the INFO/END annotation to incorrectly trigger the -c ~INFO/END mode of comparison even when not explicitly requested, which resulted in the annotation not being transferred from a tab-delimited file (#1733)
-
bcftools merge
- New
-m snp-ins-del
switch to merge SNVs, insertions and deletions separately (#1704)
-
bcftools mpileup
-
New
NMBZ
annotation for Mann-Whitney U-z test on number of mismatches within supporting reads -
Suppress the output of
MQSBZ
andFS
annotations in absence of alternate allele
-
-
bcftools +scatter
- Fix erroneous addition of duplicate
PG
lines
-
bcftools +setGT
- Custom genotypes (e.g.
-n c:1/1
) now correctly override ploidy
1.15.1
Download the source code here: bcftools-1.15.1.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
-
bcftools annotate
- New -H, --header-line convenience option to pass a header line on the command line; this complements the existing -h, --header-lines option, which requires a file with header lines
-
bcftools csq
- A list of consequence types supported by
bcftools csq
has been added to the manual page. (#1671)
-
bcftools +fill-tags
-
Extend generalized functions so that FORMAT tags can be filled as well, for example:
bcftools +fill-tags in.bcf -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
-
Allow multiple custom functions in a single run. Previously the program would silently go with the last one, assigning the same values to all (#1684)
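To make the difference between a per-sample and an across-sample sum concrete, here is a short Python sketch. It assumes that smpl_sum adds up the values within each sample while a plain sum adds up everything at the site; the function names are hypothetical and missing values are not handled.

```python
def sum_across_samples(format_ad):
    """One value per site: total over every sample and every allele."""
    return int(sum(sum(ad) for ad in format_ad))


def sum_within_samples(format_ad):
    """One value per sample: total over that sample's alleles only."""
    return [int(sum(ad)) for ad in format_ad]
```

With FORMAT/AD values of 10,2 and 7,0 and 3,5 in three samples, the first function gives one site-level depth and the second gives one depth per sample.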
-
-
bcftools norm
-
Fix an assertion failure triggered when a faulty VCF file with a '-' character in the REF allele was used with
bcftools norm --atomize
. This option now checks that the REF allele only includes the allowed characters A, C, G, T and N. (#1668) -
Fix the loss of phasing in half-missing genotypes in variant atomization (#1689)
-
-
bcftools roh
- Fix a bug that could result in an endless loop or incorrect AF estimate when missing genotypes are present and the --estimate-AF - option was used (#1687)
-
bcftools +split-vep
- VEP fields with characters disallowed in VCF tag names by the specification (such as - in M-CAP) couldn't be queried. This has been fixed; the program now sanitizes the field names, replacing invalid characters with underscore (#1686)
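The sanitization described above can be sketched in Python as follows. The set of allowed characters (letters, digits, underscore and dot) is an assumption based on the VCF specification's pattern for INFO keys; the exact rule applied by +split-vep may differ, and the function name is hypothetical.

```python
import re


def sanitize_tag_name(name: str) -> str:
    """Replace every character outside [0-9A-Za-z_.] with an underscore."""
    return re.sub(r"[^0-9A-Za-z_.]", "_", name)
```

Under this assumption the VEP field M-CAP becomes M_CAP, while names that are already valid are returned unchanged.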
1.15
Download the source code here: bcftools-1.15.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
-
New
bcftools head
subcommand for conveniently displaying the headers of a VCF or BCF file. Without any options, this is equivalent tobcftools view --header-only --no-version
but more succinct and memorable. -
The -T, --targets-file option had the following bug originating in HTSlib code: when an uncompressed file with multiple columns CHR, POS, REF was provided, the REF would be interpreted as 0 gigabases (#1598)
Changes affecting specific commands:
-
bcftools annotate
-
In addition to
--rename-annots
, which requires a file with name mappings, it is now possible to do the same on the command line-c NEW_TAG:=OLD_TAG
-
Add new option
--min-overlap
which allows to specify the minimum required overlap of intersecting regions -
Allow to transfer
ALT
from VCF with or without replacement using:
bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
-
-
bcftools convert
-
Revamp of
--gensample
,--hapsample
and--haplegendsample
family of options which includes the following changes: -
New
--3N6
option to output/input the new version of the.gen
file format, see https://www.cog-genomics.org/plink/2.0/formats#gen -
Deprecate the
--chrom
option in favor of--3N6
. A simplecut
command can be used to convert from the new3*M+6
column format to the format printed with--chrom
(cut -d' ' -f1,3-
). -
The
CHROM:POS_REF_ALT
IDs which are used to detect strand swaps are required and must appear either in the "SNP ID" column or the "rsID" column. The column is autodetected for--gensample2vcf
, can be the first or the second for--hapsample2vcf
(depending on whether the--vcf-ids
option is given), must be the first for--haplegendsample2vcf
.
-
-
bcftools csq
- Allow GFF files with phase column unset
-
bcftools filter
- New
--mask
,--mask-file
and--mask-overlap
options to soft filter variants in regions (#1635)
-
bcftools +fixref
-
The
-m id
option now works also for non-dbSNP ids, i.e. not justrsINT
-
New
-m flip-all
mode for flipping all sites, including ambiguous A/T and C/G sites
-
-
bcftools isec
- Prevent segfault on sites filtered with
-i
/-e
in all files (#1632)
-
bcftools mpileup
-
More flexible read filtering using the options:
--ls, --skip-all-set .. skip reads with all of the FLAG bits set
--ns, --skip-any-set .. skip reads with any of the FLAG bits set
--lu, --skip-all-unset .. skip reads with all of the FLAG bits unset
--nu, --skip-any-unset .. skip reads with any of the FLAG bits unset
The existing synonymous options will continue to function but their use is discouraged:
--rf, --incl-flags STR|INT Required flags: skip reads with mask bits unset
--ff, --excl-flags STR|INT Filter flags: skip reads with mask bits set
-
-
bcftools query
- Make the
--samples
and--samples-file
options work also in the--list-samples
mode. Add a new--force-samples
option which allows to proceed even when some of the requested samples are not present in the VCF (#1631)
-
bcftools +setGT
- Fix a bug in
-t q -e EXPR
logic applied onFORMAT
fields, sites with all samples failing the expressionEXPR
were incorrectly skipped. This problem affected only the use of-e
logic, not the-i
expressions (#1607)
-
bcftools sort
- make use of the
TMPDIR
environment variable when defined
-
bcftools +trio-dnm2
- The
--use-NAIVE
mode now also adds the de novo allele inFORMAT/VA
1.14
Download the source code here: bcftools-1.14.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands
-
New --regions-overlap and --targets-overlap options which address a long-standing design problem with subsetting VCF files by region. BCFtools recognizes two sets of options, one for streaming (-t/-T) and one for index-jumping (-r/-R). They behave differently: the first includes only records with POS coordinate within the regions, the other includes records overlapping the regions. The two new options allow modifying the default behaviour, see the man page for more details.
-
The
--output-type
option can be used to override the default compression level
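The difference between the two region-matching behaviours described for --regions-overlap and --targets-overlap can be illustrated with a short Python sketch. It assumes 1-based inclusive coordinates and takes the record end from the REF length only (INFO/END and symbolic alleles are ignored); the function names are hypothetical.

```python
def pos_within_region(pos: int, ref: str, start: int, end: int) -> bool:
    """Streaming-style match: only the POS coordinate has to fall in the region."""
    return start <= pos <= end


def record_overlaps_region(pos: int, ref: str, start: int, end: int) -> bool:
    """Index-style match: any base covered by REF may fall in the region."""
    rec_end = pos + len(ref) - 1
    return pos <= end and rec_end >= start
```

A deletion starting at position 98 with a five-base REF covers 98-102, so for the region 100-200 it fails the first test but passes the second.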
Changes affecting specific commands
-
bcftools annotate
-
when --set-id and --remove are combined, --set-id cannot use tags deleted by --remove. This is now detected and the program exits with an informative error message instead of segfaulting (#1540)
-
while non-symbolic variants are uniquely identified by POS, REF, ALT, symbolic alleles starting at the same position were indistinguishable. This prevented correct matching of records with the same positions and variant type but different length given by INFO/END (samtools/htslib@60977f2). When annotating from a VCF/BCF, the matching is done automatically. When annotating from a tab-delimited text file, this feature can be invoked by using -c INFO/END.
-
add a new . modifier to control whether missing values should be carried over from a tab-delimited file or not. For example:
-c TAG .. adds TAG if the source value is not missing. If TAG exists in the target file, it will be overwritten.
-c .TAG .. adds TAG even if the source value is missing. This can overwrite non-missing values with a missing value and can create empty VCF fields (TAG=.)
-
-
bcftools +check-ploidy
- by default missing genotypes are not used when determining ploidy. With the new option
-m, --use-missing
it is possible to use the information carried in the missing and half-missing genotypes (e.g..
,./.
or./1
)
-
bcftools concat
:- new
--ligate-force
and--ligate-warn
options for finer control of-l, --ligate
behavior in imperfect overlaps. The new default is to throw an error when sites present in one chunk but absent in the other are encountered. To drop such sites and proceed, use the new--ligate-warn
option (previously this was the default). To keep such sites, use the new--ligate-force
option (#1567).
-
bcftools consensus
:- Apply mask even when the VCF has no notion about the chromosome. It was possible to encounter this problem when
contig
lines were not present in the VCF header and no variants were called on that chromosome (#1592)
-
bcftools +contrast
:- support for chunking within map/reduce framework allowing to collect
NASSOC
counts even for empty case/control sample sets (#1566)
-
bcftools csq
:-
bug fix, compound indels were not recognised in some cases (#1536)
-
compound variants were incorrectly marked as 'inframe' even when stop codon would occur before the frame was restored (#1551)
-
bug fix,
FORMAT/BCSQ
bitmasks could have been assigned incorrectly to some samples at multiallelic sites, a superset of the correct consequences would have been set (#1539) -
bug fix, the upstream stop could be falsely assigned to all samples in a multi-sample VCF even if the stop was relevant for a single sample only (#1578)
-
further improve the detection of mismatching chromosome naming (e.g. "chrX" vs "X") in the GFF, VCF and fasta files
-
-
bcftools merge
:- keep (sum)
INFO/AN,AC
values when merging VCFs with no samples (#1394)
-
bcftools mpileup:
- new --indel-size option which allows increasing the maximum considered indel size; large deletions in long read data are otherwise lost.
-
bcftools norm
:-
atomization now supports
Number=A,R
string annotations (#1503) -
assign as many alternate alleles to genotypes at multiallelic sites in the
-m +
mode, disregarding the phase. Previously the program assumed to be executed as an inverse operation of-m -
, but when that was not the case, reference alleles would have been filled instead of multiple alternate alleles (#1542)
-
-
bcftools sort
:- increase accuracy of the
--max-mem
option limit, previously the limit could be exceeded by more than 20% (#1576)
-
bcftools +trio-dnm
:- new
--with-pAD
option to allow processing of VCFs without FORMAT/QS. The existing--ppl
option was changed to the analogous--with-pPL
-
bcftools view
:- the functionality of the option
--compression-level
lost in 1.12 has been restored
1.13
Download the source code here: bcftools-1.13.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
This release brings new options and significant changes in BAQ parametrization in bcftools mpileup. The previous behaviour can be triggered by providing the --config 1.12 option. Please see #1474 for details.
Changes affecting the whole of bcftools, or multiple commands:
- Improved build system
Changes affecting specific commands:
-
bcftools annotate:
-
Fix a rare bug: when INFO/END is present, all INFO fields are removed with bcftools annotate -x INFO and BCF output is produced, the removed INFO/END continues to inform the end coordinate and causes incorrect retrieval of records with the -r option (#1483)
-
Support for matching annotation line by ID, in addition to CHROM,POS,REF, and ALT (#1461)
bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf
-
-
bcftools csq:
-
When GFF and VCF/fasta use a different chromosome naming convention (e.g. chrX vs X), no consequences would be added. Newly the program attempts to detect these differences and remove/add the "chr" prefix to chromosome name to match the GFF and VCF/fasta (#1507)
-
Parametrize brief-predictions parameter to allow explicit number of amino acids to be printed. Note that the
-b, --brief-predictions
option is being replaced with-B, --trim-protein-seq INT
-
-
bcftools +fill-tags:
-
Generalization and better support for custom functions that allow adding new INFO tags based on arbitrary -i, --include type of expressions. For example, to calculate a missing INFO/DP annotation from FORMAT/AD, it is possible to use:
-t 'DP:1=int(sum(FORMAT/AD))'
Here the optional ":1" part specifies that a single value will be added (by default Number=. is used) and the optional int(...) adds an integer value (by default Type=Float is used).
-
When FORMAT/GT is not present, the INFO/AF tag will be newly calculated from INFO/AC and INFO/AN.
-
-
bcftools gtcheck:
-
Switch between FORMAT/GT or FORMAT/PL when one is (implicitly) requested but only the other is available
-
Improve diagnostics, printing warnings when a line cannot be matched and the number of lines skipped for various reasons (#1444)
-
Minor bug fix, with PLs being the default, the
--distinctive-sites
option started to require explicit--error-probability 0
-
-
bcftools index:
- The program now accepts both data file name and the index file name. This adds to user convenience when running index statistics (-n, -s)
-
bcftools isec:
- Always generate sites.txt with isec -p (#1462)
-
bcftools +mendelian:
- Consider only complete trios, do not crash on sample name typos (#1520)
-
bcftools mpileup:
-
New
--seed
option for reproducibility of subsampling code in HTSlib -
The SCR annotation which shows the number of soft-clipped reads now correctly pools reads together regardless of the variant type. Previously only reads with indels were included at indel sites.
-
Major revamp of BAQ. Please see #1474 for details. The previous behaviour can be triggered by providing the
--config 1.12
option. -
Thanks to improvements in HTSlib, the removal of overlapping reads (which can be disabled with the
-x, --ignore-overlaps
options) is not systematically biased any more (samtools/htslib#1273) -
Modified scale of Mann-Whitney U tests. Newly INFO/*Z annotations will be printed, for example MQBZ replaces MQB.
-
-
bcftools norm:
-
Fix Type=Flag output in
norm --atomize
(#1472) -
Atomization must not discard ALT=. records
-
Atomization of AD and QS tags now correctly updates occurrences of duplicate alleles within different haplotypes
-
Fix a bug in atomization of Number=A,R tags
-
-
bcftools reheader:
- Add
-T, --temp-prefix
option
-
bcftools +setGT:
- A wider range of genotypes can be set by the plugin by allowing specifying custom genotypes. For example, to force a heterozygous genotype it is now possible to use expressions like: c:'m|M' c:0/1 c:0
-
bcftools +split-vep:
-
New
-u, --allow-undef-tags
option -
Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The
-p, --annot-prefix
option is now applied before doing anything else which allows its use with-f, --format
and-c, --columns
options. -
Some consequence field names may not constitute a valid tag name, such as "pos(1-based)". Newly field names are trimmed to exclude brackets.
-
-
bcftools +tag2tag:
- New --QR-QA-to-QS option to convert annotations generated by Freebayes to QS used by BCFtools
-
bcftools +trio-dnm:
-
Add support for sites with more than four alleles. Note that only the four most frequent alleles are considered, the model remains unchanged. Previously such sites were skipped.
-
New --use-NAIVE option for a naive DNM calling based solely on FORMAT/GT and expected Mendelian inheritance. This option is suitable for pre-filtering.
-
Fix behaviour to match the documentation, the
--dnm-tag DNG
option now correctly outputs log scaled values by default, not phred scaled. -
Fix bug in VAF calculation, homozygous de novo variants were incorrectly reported as having VAF=50%
-
Fix arithmetic underflow which could lead to imprecise scores and improve sensitivity in high coverage regions
-
Allow combining --pn and --pns to set the noise thresholds independently
-
1.12
Download the source code here: bcftools-1.12.tar.bz2.
(The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
-
The output file type is determined from the output file name suffix, where available, so the -O/--output-type option is often no longer necessary.
-
Make F_MISSING in filtering expressions work for sites with multiple ALT alleles (#1343)
-
Fix N_PASS and F_PASS to behave according to expectation when reverse logic is used (#1397). This fix has the side effect of query (or programs like +trio-stats) behaving differently with these expressions, operating now in site-oriented rather than sample-oriented mode. For example, the new behavior could be:
bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
11 A 0/0
11 B 0/0
11 C 1/1
while previously the same expression would return:
11 C 1/1
The original mode can be mimicked by splitting the filtering into two steps:
bcftools view -i'N_PASS(GT="alt")==1' | bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
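The site-oriented versus sample-oriented distinction in the N_PASS example can be mimicked with a small Python sketch. It reuses the three samples from the example output and assumes that GT="alt" means the genotype carries at least one alternate allele; the function names are hypothetical and this is not how bcftools evaluates expressions internally.

```python
# (sample name, genotype) pairs for the site at POS 11 from the example
site = [("A", "0/0"), ("B", "0/0"), ("C", "1/1")]


def has_alt(gt: str) -> bool:
    """Assumed meaning of GT="alt": at least one non-reference, non-missing allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def site_oriented(samples):
    """New behaviour: the site passes as a whole, so every sample is printed."""
    n_pass = sum(has_alt(gt) for _, gt in samples)
    return list(samples) if n_pass == 1 else []


def sample_oriented(samples):
    """Previous behaviour: only the samples matching the inner expression are printed."""
    n_pass = sum(has_alt(gt) for _, gt in samples)
    if n_pass != 1:
        return []
    return [(name, gt) for name, gt in samples if has_alt(gt)]
```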
Changes affecting specific commands:
-
bcftools annotate
:-
New
--rename-annots
option to help fix broken VCFs (#1335) -
New
-C
option allows to read a long list of options from a file to prevent very long command lines. -
New
append-missing
logic allows annotations to be added for eachALT
allele in the same order as they appear in the VCF. Note that this is not bullet proof. In order for this to work:-
the annotation file must have one line per
ALT
allele -
fields must contain a single value as multiple values are appended as they are and would break the correspondence between the alleles and values
-
-
-
bcftools concat
:- Do not phase genotypes by mistake if they are not already phased with
-l
(#1346)
-
bcftools consensus
:-
New
--mask-with
,--mark-del
,--mark-ins
,--mark-snv
options (#1382, #1381, #1170) -
Symbolic
<DEL>
should have only oneREF
base. If there are multiple, takePOS+1
as the first deleted base. -
Make consensus work when the first base of the reference genome is deleted. In this situation the VCF record has
POS=1
and the firstREF
base cannot precede the event. (#1330)
-
-
bcftools +contrast
:- The
NOVELGT
annotation was previously not added when requested.
-
bcftools convert
:- Make the
--hapsample
and--hapsample2vcf
options consistent with each other and with the documentation.
-
bcftools call
:-
Revamp of
call -G
, previously sample grouping by population was not truly independent and could still be influenced by the presence of other sample groups. -
Optional addition of
INFO/PV4
annotation withcall -a INFO/PV4
-
Remove generation of useless
HOB
andICB
annotation; use+fill-tags -- -t HWE,ExcHet
instead -
The
call -f
option was renamed to-a
to (1) make it consistent withmpileup
and (2) to indicate that it includes bothINFO
andFORMAT
annotations, not justFORMAT
as previously -
Any sensible
Number=R,Type=Integer
annotation can be used with-G
, such asAD
orQS
-
Don't trim
QUAL
; although usefulness of this change is questionable for true probabilistic interpretation (such high precision is unrealistic), usingQUAL
as a score rather than probability is helpful and permits more fine-grained filtering -
Fix a suspected bug in call -F; in the worst case, this change at least improves readability
-
call -C trio
is temporarily disabled
-
-
bcftools csq
:-
Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too many per-sample consequences
-
Fix a bug which incorrectly handled the --ncsq parameter and could clash with reserved BCF values, consequently producing truncated or even incorrect output of the %TBCSQ formatting expression in bcftools query. To account for the reserved values, the new default value is --ncsq 15 (#1428)
-
-
bcftools +fill-tags
:-
MAF definition revised for multiallelic sites, the second most common allele is considered to be the minor allele (#1313)
-
New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads provided FORMAT/AD is present
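The revised MAF definition for multiallelic sites can be written down as a short Python sketch: the frequency of the second most common allele is reported. The function name and the choice to return None for sites without data are assumptions of this example only.

```python
def minor_allele_frequency(allele_counts):
    """Frequency of the second most common allele, given counts for REF and each ALT."""
    total = sum(allele_counts)
    if total == 0 or len(allele_counts) < 2:
        return None  # assumption: undefined without data or with a single allele
    freqs = sorted((count / total for count in allele_counts), reverse=True)
    return freqs[1]
```

For allele counts of 70, 20 and 10 this gives 0.2, that is, the frequency of the second most common allele rather than that of the rarest one.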
-
-
bcftools gtcheck
:- support matching of a single sample against all other samples in the file with
-s qry:sample -s gt:-
. This was previously not possible, either full cross-check mode had to be run or a list of pairs/samples had to be created explicitly
-
bcftools merge
: -
bcftools mpileup
:- Add new optional tag
mpileup -a FORMAT/QS
-
bcftools norm
:-
New
-a, --atomize
functionality to decompose complex variants, for example MNVs into consecutive SNVs -
New option
--old-rec-tag
to indicate the original variant
-
-
bcftools query
:- Incorrect fields were printed in the per-sample output when subset of samples was requested via
-s
/-S
and the order of samples in the header was different from the requested-s
/-S
order (#1435)
-
bcftools +prune
:- New options
--random-seed
and--nsites-per-win-mode
(#1050)
-
bcftools +split-vep
:-
Transcript selection now works also on the raw
CSQ
/BCSQ
annotation. -
Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349)
-
-
bcftools stats
:-
Changes to
QUAL
and ts/tv plotting stats: avoid cappingQUAL
to predefined bins, use an open-range logarithmic binning instead -
plot dual ts/tv stats: per quality bin and cumulative as if threshold applied on the whole dataset
-
-
bcftools +trio-dnm2:
- Major revamp of the +trio-dnm plugin, which is now deprecated and replaced by +trio-dnm2.
The original trio-dnm calling model used genotype likelihoods (PLs) as the input for calling. However, that is flawed because PLs make assumptions which are unsuitable for de novo calling: PL(RR) can become bigger than PL(RA) even when the ALT allele is present in the parents. Note that this is true also for other programs such as DeNovoGear which rely on the same samtools calculation.
The new recommended workflow is:
bcftools mpileup -a AD,QS -f ref.fa -Ou proband.bam father.bam mother.bam | \
bcftools call -mv -Ou | \
bcftools +trio-dnm -p proband,father,mother -Oz -o output.vcf.gz
This new version also implements the DeNovoGear model. The original behavior of trio-dnm is no longer supported.
For more details see http://samtools.github.io/bcftools/trio-dnm.pdf
1.11
Download the source code here: bcftools-1.11.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
-
Filtering
-i
/-e
expressions-
Breaking change in -i/-e expressions on the FILTER column. Originally it was possible to query only a subset of filters, but not an exact match. The new behaviour is:
| Expression | Result |
|---|---|
| FILTER="A" | Exact match, for example "A;B" does not pass |
| FILTER!="A" | Exact match, for example "A;B" does pass |
| FILTER~"A" | Both "A" and "A;B" pass |
| FILTER!~"A" | Neither "A" nor "A;B" pass |
-
Fix in commutative comparison operators, in some cases reversing sides would produce incorrect results (#1224; #1266)
-
Better support for filtering on sample subsets
-
Add
SMPL_*/S*
family of functions that evaluate within rather than across all samples. (#1180)
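The FILTER matching rules introduced in this release (exact match for = and !=, subset match for ~ and !~) can be sketched in Python. This is an illustration only: treating the FILTER value as an unordered set of names and treating a multi-name query as a subset test are assumptions, and the function names are hypothetical.

```python
def parse_filters(value: str) -> frozenset:
    """Split a FILTER string such as "A;B" into a set of filter names."""
    return frozenset(value.split(";"))


def filter_matches(record_filter: str, op: str, query: str) -> bool:
    """Evaluate FILTER <op> "query" for a record's FILTER column."""
    rec, qry = parse_filters(record_filter), parse_filters(query)
    if op == "=":
        return rec == qry          # exact match
    if op == "!=":
        return rec != qry          # anything but an exact match
    if op == "~":
        return qry <= rec          # all queried filters are set
    if op == "!~":
        return not (qry <= rec)    # at least one queried filter is not set
    raise ValueError(f"unsupported operator: {op}")
```

Running it on the FILTER values "A" and "A;B" with the query "A" reproduces the four documented outcomes.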
-
-
Improvements in the build system
Changes affecting specific commands:
-
bcftools annotate
:-
Previously it was not possible to use
--columns =TAG
withINFO
tags and the--merge-logic
feature was restricted to tab files withBEG
,END
columns, now extended to work also withREF
,ALT
. -
Make
annotate -TAG/+TAG
work also withFORMAT
fields. (#1259) -
ID
andFILTER
can be transferred toINFO
andID
can be populated fromINFO
. However, theFILTER
column still cannot be populated from anINFO
tag because all possibleFILTER
values must be known at the time of writing the header (#947; #1187)
-
-
bcftools consensus
:-
Fix in handling symbolic deletions and overlapping variants. (#1149; #1155; #1295)
-
Fix
--iupac-codes
crash on REF-only positions withALT="."
. (#1273) -
Fix
--chain
crash. (#1245) -
Preserve the case of the genome reference. (#1150)
-
Add new
-a, --absent
option which allows to set positions with no supporting evidence to "N" (or any other character). (#848; #940)
-
-
bcftools convert
:-
The option --vcf-ids now works also with --haplegendsample2vcf. (#1217)
-
New option
--keep-duplicates
-
-
bcftools csq
:-
Add
misc/gff2gff.py
script for conversion between various flavors of GFF files. The initial commit supports only one type and was contributed by @flashton2003. (#530) -
Allow overlapping CDS to support ribosomal slippage. (#1208)
-
-
bcftools +fill-tags
:- Added new annotations:
INFO/END
,TYPE
,F_MISSING
.
-
bcftools filter
:- Make
--SnpGap
optionally filter also SNPs close to other variant types. (#1126)
-
bcftools gtcheck
:- Complete revamp of the command. The new version is faster and allows N:M sample comparisons, not just 1:N or NxN comparisons. Some functionality was lost (plotting and clustering) but may be added back on popular demand.
-
bcftools +mendelian
:- Revamp of user options, output VCFs with mendelian errors annotation, read PED files (thanks to Giulio Genovese).
-
bcftools merge
:-
Update headers when appropriate with the '--info-rules *:join' INFO rule. (#1282)
-
Local alleles merging that produce
LAA
andLPL
when requested, a draft implementation of samtools/hts-specs#434 (#1138) -
New --no-index option which allows merging unindexed files. Requires the input files to have chromosomes in the same order and consistent with the order of sequences in the header. (PR #1253; samtools/htslib#1089)
-
-
bcftools norm
: -
bcftools +prune
:- Extend to allow annotating with various LD metrics: r^2, Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386).
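For orientation, the two classical LD metrics mentioned for +prune, r^2 and Lewontin's D', can be computed from phased haplotypes with the textbook formulas as sketched below. This is not the plugin's code: the input format (pairs of 0/1 alleles at two sites) and the function name are assumptions, and Ragsdale's D is not covered.

```python
def ld_metrics(haplotypes):
    """Return (r2, D') for two biallelic sites given (allele_at_site1, allele_at_site2) pairs."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # ALT frequency at site 1
    p_b = sum(b for _, b in haplotypes) / n   # ALT frequency at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None                           # at least one site is monomorphic
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, d / d_max
```

Two perfectly correlated sites give r^2 = 1 and D' = 1, while sites whose alleles combine independently give 0 for both.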
-
bcftools query
:- New
%N_PASS()
formatting expression to output the number of samples that pass the filtering expression.
-
bcftools reheader
:- Improved error reporting to prevent user mistakes. (#1288)
-
bcftools roh
: -
bcftools +scatter:
- New plugin intended as a convenient inverse to concat (thanks to Giulio Genovese, PR #1249)
-
bcftools +split
:-
New
--groups-file
option for more flexibility of defining desired output. (#1240) -
New
--hts-opts
option to reduce required memory by reusing one output header and allow overriding the default hFile's block size with--hts-opts block_size=XXX
. On some file systems (lustre) the default size can be 4M which becomes a problem when splitting files with 10+ samples. -
Add support for multisample output and sample renaming
-
-
bcftools +split-vep
:- Add default types (Integer, Float, String) for VEP subfields and make
--columns -
extract all subfields into INFO tags in one go.
1.10.2
Download the source code here: bcftools-1.10.2.tar.bz2.
(The “Source code” downloads links below are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
This release fixes crashes reported on files including integer INFO tags with values outside the range officially supported by VCF. It also fixes a bug where invalid BCF files would be created if such values were present.