Releases: kevlar-dev/kevlar
Kevlar version 0.7
Added
- A new Snakemake workflow for preprocessing BAM inputs for analysis with kevlar (see #305, #355).
- A new Snakemake workflow for kevlar's standard processing procedure (see #306, #355).
- New `unband` module to merge augmented Fastq files produced with a k-mer banding strategy (see #316).
- New `varfilter` module to filter out preliminary variant calls overlapping with problematic/unwanted loci or features (see #318, #342, #354).
- New dependency: `intervaltree` package (see #318).
- A new `sandbox` directory with convenience scripts for development and analysis (see #335).
- A new `--min-like-score` filter for the `simlike` module (see #343).
- A new `--drop-outliers` filter for the `simlike` module (see #350).
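The overlap-based filtering described for `varfilter` above can be illustrated with a minimal sketch. This is not kevlar's implementation: the function names, the tuple layout of a call, and the half-open interval convention are all assumptions made for illustration, and a plain linear scan stands in for the `intervaltree` lookup.

```python
def overlaps(start, end, intervals):
    """Return True if the half-open span [start, end) overlaps any interval."""
    return any(start < iv_end and iv_start < end for iv_start, iv_end in intervals)


def filter_calls(calls, excluded):
    """Split calls into (kept, dropped) by overlap with excluded loci.

    Hypothetical layout: `calls` holds (seqid, position, ref_allele) tuples
    with 0-based positions; `excluded` maps each seqid to a list of
    half-open (start, end) intervals.
    """
    kept, dropped = [], []
    for seqid, pos, ref in calls:
        span_end = pos + len(ref)  # the call covers the reference allele
        if overlaps(pos, span_end, excluded.get(seqid, [])):
            dropped.append((seqid, pos, ref))
        else:
            kept.append((seqid, pos, ref))
    return kept, dropped


excluded = {"chr1": [(1000, 2000)]}
calls = [("chr1", 999, "A"), ("chr1", 999, "AT"), ("chr2", 1500, "G")]
kept, dropped = filter_calls(calls, excluded)
print(kept)     # calls that do not touch an excluded locus
print(dropped)  # calls overlapping an excluded locus
```

A linear scan is adequate for a handful of loci. An interval tree becomes worthwhile when many calls are checked against many features.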
Changed
- Added a new flag to print to the terminal (stderr) and a logfile simultaneously (see #308).
- The functionality of the previous `filter` module is now split between the new `unband` module and a reimplementation of the `filter` module (see #316).
- Added a "fast mode" to the `simlike` module, prematurely halting computations for calls already marked for filtering (see #328).
- Added a filter for problematic short indels adjacent to homopolymers (see #336, #338, #339).
- Implemented new filters in the `simlike` module based on thresholds and k-mer abundances: the `ControlAbundance` filter for predictions with too many high-abundance parent/control k-mers spanning the variant, and the `CaseAbundance` filter for predictions with too many consecutive proband/child k-mers spanning the variant (see #327, #339).
Fixed
- Corrected a bug that reported the reference target sequence instead of the assembled contig sequence in the `CONTIG` attribute of indel calls in the VCF (see #304).
- Corrected a bug that called adjacent substitutions as independent SNVs rather than an aggregate MNV (see #332).
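To make the SNV/MNV distinction in the fix above concrete, here is a minimal sketch of collapsing substitutions at consecutive positions into one aggregate call. The function name and the `(position, ref, alt)` tuple layout are hypothetical, and kevlar's actual calling code may work quite differently.

```python
def merge_adjacent_snvs(snvs):
    """Collapse runs of SNVs at consecutive positions into single calls.

    `snvs` is an iterable of (position, ref_base, alt_base) tuples on one
    sequence (hypothetical layout). Substitutions at consecutive positions
    are joined into one multi-nucleotide variant (MNV).
    """
    merged = []
    for pos, ref, alt in sorted(snvs):
        if merged:
            last_pos, last_ref, last_alt = merged[-1]
            # Adjacent if this SNV starts right where the previous call ends.
            if pos == last_pos + len(last_ref):
                merged[-1] = (last_pos, last_ref + ref, last_alt + alt)
                continue
        merged.append((pos, ref, alt))
    return merged


calls = [(100, "A", "G"), (101, "C", "T"), (250, "G", "C")]
print(merge_adjacent_snvs(calls))
# [(100, 'AC', 'GT'), (250, 'G', 'C')]
```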
Removed
Kevlar version 0.6.1
Fixed
- Updated `setup.py` so that the README markdown is included in the long description attribute for rendering on PyPI (see commit 9f51024).
- Removed direct calls to fixtures that are no longer supported by pytest (see commit dab6418).
- Updated the Makefile so that `kevlar/tests/__init__.py` is not included when running the test suite. Now compatible with pytest>=4.0.0 (see commit 965bd0d).
Kevlar version 0.6
Added
- The `kevlar count` operation now supports masks and 8-, 4-, or 1-bit counters (see #277 and #291).
- A Jupyter notebook and supporting code and data for evaluating kevlar's performance on a simulated data set (see #271).
- New flags for filtering gDNA cutouts or calls from specified sequences (see #285).
- New filter that discards any contig/gDNA alignment with more than 4 mismatches (see #288).
- A new feature that generates a Nodetable containing only variant-spanning k-mers to support re-counting k-mers and computing likelihood scores in low memory (see #289, #292, #302).
- A new `ProgressIndicator` class that provides gradually less frequent updates over time (see #299).
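The idea of progress updates that become less frequent over time can be sketched as follows. This is only an illustration of the general technique: the class name `BackoffProgress`, its parameters, and the tenfold back-off schedule are assumptions, not the interface of kevlar's `ProgressIndicator`.

```python
import sys


class BackoffProgress:
    """Report progress at intervals that grow as the count increases.

    Hypothetical sketch: updates are printed every `start` items, and the
    interval is multiplied by `factor` whenever the counter reaches
    `factor` times the current interval.
    """

    def __init__(self, start=10, factor=10, stream=sys.stderr):
        self.interval = start
        self.factor = factor
        self.stream = stream
        self.counter = 0
        self.reported = []  # counts at which an update was emitted

    def update(self):
        self.counter += 1
        if self.counter % self.interval == 0:
            self.reported.append(self.counter)
            print(f"processed {self.counter} items", file=self.stream)
            if self.counter >= self.interval * self.factor:
                self.interval *= self.factor


progress = BackoffProgress()
for _ in range(1000):
    progress.update()
# Updates arrive every 10 items up to 100, then every 100 items up to 1000.
```

With these defaults, a run over millions of reads produces a few dozen log lines rather than one per fixed batch.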
Changed
- Ported augfastx handling from `kevlar.seqio` module to a new Cython module (see #279).
- Dynamic error model for likelihood calculations is now a configurable option (see #286).
- Cleaned up overlap-related code with a new `ReadPair` class (see #283).
- Updated `kevlar assemble`, `kevlar localize`, and `kevlar call` to accept streams of partitioned reads; previously, only reads for a single partition were permitted (see #294).
- Overhauled the `kevlar localize` command to compute seed locations for all seeds in all partitions with a single BWA call, massively improving efficiency (see #294 and #301).
- Updated the variant calling procedure to discard alignment blocks less than `ksize` in length (see #303).
Fixed
- Minor bug with .gml output due to a change in the networkx package (see #278).
Removed
Kevlar version 0.5
Fixed
Added
- Multithreading is now supported natively in `kevlar alac` (see #249 and unmerged `feed-thread` branch).
- A limited-scope VCF reader (see #256).
- Script for computing likelihood scores is now a first-class kevlar citizen as `kevlar simlike` (see #259).
- New `kevlar dist` subcommand for computing average and standard deviation of k-mer abundances for likelihood calculations (see #264).
- Paired-end awareness for `kevlar dump` (see #265).
- New `LikelihoodFail` filter for variant calls with a negative likelihood score (see #266).
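The summary statistics that `kevlar dist` is described as computing above can be illustrated with the standard library. This sketch assumes a plain list of abundances and a population standard deviation. The function name is hypothetical, and whether kevlar uses the population or sample estimator is not stated here.

```python
from statistics import mean, pstdev


def abundance_summary(abundances):
    """Return (mean, standard deviation) of a collection of k-mer abundances.

    Assumption: population standard deviation (pstdev); swap in
    statistics.stdev for the sample estimator.
    """
    values = list(abundances)
    return mean(values), pstdev(values)


mu, sigma = abundance_summary([28, 30, 32])
print(f"mean={mu:.2f} stdev={sigma:.2f}")
```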
Kevlar version 0.4.2
Kevlar version 0.4.1
Kevlar version 0.4
Added
- New `kevlar gentrio` command for a more realistic simulation of trios for testing and evaluation (#171).
- New filter for `kevlar alac` for discarding partitions with a small number of interesting k-mers (#189).
- New `kevlar split` subcommand for splitting a partitioned augfastq file into N chunks (see #206).
- New `-p/--part-id` flag in `kevlar alac` for processing a single partition in a partitioned augfastq file (see #206).
- New reader/parser for partitioned augfastx files (see #206).
- New strategy for discriminating between variants and off-target calls using pairing information (see #210).
- New optional "fallback" assembly strategy: if fermi-lite fails, try our homegrown greedy assembly algorithm (see #214 and #219).
- New parameter for excluding SNV calls too near to the end of a contig (see #222).
Changed
- Replaced `pep8` with `pycodestyle` for enforcing code style in development (see #167).
- The `--refr` argument of the `kevlar dump` command is now optional, and when no reference is explicitly specified `kevlar dump` acts primarily as a BAM to Fastq converter (see #170).
- Split the functionality of the `count` subcommand: simple single-sample k-mer counting was kept in `count` with a much simplified interface, while the memory efficient multi-sample "masked counting" strategy was split out to a new subcommand `effcount` (see #185).
- Replaced `kevlar reaugment` with a more generalizable `kevlar augment` subcommand (see #188).
- Replaced `--ksize` with `--seed-size` in `kevlar localize` so that `kevlar alac` can now support different values for k-mers and localizing seeds/anchors (see #198).
- Improved variant sorting, scoring, and reporting strategy (see #199).
- The augmented Fastx format now permits annotation of 1 or more mate sequences (see #210).
- Split `vcf.py` and `varmap.py` modules off from the `call.py` module (see #229).
Fixed
- Incorrect file names in the quick start documentation page (see 9f6bec0).
- The `kevlar alac` procedure now accepts a stream of read partitions (instead of a stream of reads) at the Python API level, and correctly handles a single partition-labeled sequence file at the CLI level (see #165).
- CIGARs that begin with I blocks (alternate allele contig is longer than reference locus) are now handled properly (see #191).
- Bug with how `kevlar alac` handles "no reference match" scenarios resolved (see #192).
- Bug with `kevlar count` when reading from multiple input files (see #202).
- Can now call SNVs near INDELs (see #229).
Removed
- The JCA assembly mode is no longer supported (see #231).
Kevlar version 0.3.0
This release includes many new features, some refactoring of the core codebase, and the first end-to-end analysis workflow implemented in a single command.
Details are included below.
Fixed
- Abundances reported by `kevlar filter` now correctly show re-computed proband k-mer abundances, not pre-filtering abundances (see #111).
- The `kevlar localize` and `kevlar call` procedures now handle multiple assembled contigs, calling variants from the best reference match for each contig (see #124, #126, and #147).
Added
- New abundance screen now a part of `kevlar novel`. If any k-mer in a read is below some abundance threshold, the entire read is discarded (see #106).
- Better error reporting and handling of various issues with assembly, localization, and alignment (see #113, #114).
- Support for VCF output (see #130 and #144), including "windows" with all k-mers containing the reference allele (RW) and alternate allele (VW) to facilitate distinguishing inherited mutations from novel mutations (see #144 and #152).
- New subcommands
  - `alac`: assembles, localizes, aligns, and calls variants on a single partition basis
  - `simplex`: invokes the entire simplex analysis workflow
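The abundance screen rule listed above (discard a read if any of its k-mers falls below a threshold) can be sketched in a few lines. This is an illustration only, assuming a plain dictionary of k-mer counts. The function names are hypothetical, and kevlar itself relies on dedicated k-mer counting structures rather than a dict.

```python
def kmers(sequence, ksize):
    """Yield every k-mer of length `ksize` in `sequence`, left to right."""
    for i in range(len(sequence) - ksize + 1):
        yield sequence[i:i + ksize]


def passes_abundance_screen(sequence, counts, ksize, threshold):
    """Return True only if every k-mer in the read meets the threshold.

    `counts` maps k-mer strings to abundances (hypothetical stand-in for a
    real k-mer counting structure); missing k-mers count as abundance 0.
    Reads shorter than `ksize` contain no k-mers and pass trivially.
    """
    return all(counts.get(kmer, 0) >= threshold for kmer in kmers(sequence, ksize))


counts = {"ACG": 5, "CGT": 5, "GTA": 1}
print(passes_abundance_screen("ACGT", counts, ksize=3, threshold=2))   # True
print(passes_abundance_screen("ACGTA", counts, ksize=3, threshold=2))  # False
```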
Changed
- The `kevlar filter` procedure now handles both contamination and reference matches under a single "mask" interface (see #103).
- Explicitly dropped support for Python 2.7. Now supports only Python >=3.5 (see #125).
- Main methods for each core subcommand are now implemented as minimal wrappers around generator functions, to facilitate composing different steps of the workflow or invoking them from third-party Python code (see #95, #126, #133, #148, #149, #150, #159, #161).
- The home-grown greedy assembly implementation has been replaced by calls to the `fermi-lite` library, which is now bundled with kevlar (see #156).
- The default behavior of `kevlar partition` is now to output a single stream of reads. Writing each partition to a distinct file is still supported with the `--split` option.
Removed
- The `kevlar collect` command and associated tests. Its functionality has now been fully distributed to other subcommands.
  - Read filtering to `kevlar filter`
  - Junction count contig assembly to `kevlar filter` as an optional mode
Kevlar version 0.2.0
Kevlar release v0.2 adds new subcommands for read partitioning and variant calling, fixes a major bug with contig assembly, and introduces many minor fixes, improvements, and code refactoring.
Added
- New subcommands
  - `partition`: group reads by shared interesting k-mers
  - `localize`: determine an assembled contig's location in the reference genome
  - `call`: align assembled contigs to reference and call variants
- Documentation suite in `docs/`, hosted at https://kevlar.readthedocs.io
- New third-party dependency `ksw2` for computing alignments. Wrapped with Cython, which is a new development-time dependency (but not install or run time).
- The `pandas` package is now a dependency, and `pysam` and `networkx` are now hard dependencies (rather than conditional).
Fixed
- Bug with assembly when the order of a read pair was swapped and they had the opposite orientation (see #85).
Kevlar version 0.2.0 (release candidate 2)
Fixing some issues with packaging, and updating the installation docs.