@nh13 nh13 released this Nov 6, 2018 · 12 commits to master since this release

Release 0.7.0 introduces the following changes to existing tools:

  • GroupReadsByUmi
    • check that the raw UMI tag is found for each read (#406)
    • Fix log message in GroupReadsByUmi to be more accurate / less misleading (#436)
  • DemuxFastqs: enable --quality-encoding to be used on the command line (#417)
  • HapCutToVcf
    • fix ambiguous (IUPAC) reference bases on the fly (#418)
    • add an option to skip indexing the output file (e.g. when the input does not have CONTIG lines) (#418)

In addition, the following new tools were added:

  • FindSwitchbackReads: Tool to detect templates with strand-switch events in them (#438)

The following API changes were also introduced:

  • FastqSource can handle read numbers > 2 (#408)
  • Fixed writing and parsing of Double.NaN, Double.PositiveInfinity and Double.NegativeInfinity in Metric classes (#411)
  • SamBuilder should accept missing bases and quals with a cigar (#424)
  • Add message to require() call in Sample (#425)
  • ReadStructure to allow and strip out whitespace within the read structure during parsing (#425)
  • ProgressLogger.record should return whether logging was triggered; also adds a method to log the last record (#421)
  • Bug fix: Metric.write was not closing its writer (#421)
  • Adding a few useful methods to Sequences (#421)
  • Metric now extends Commons Writer so we can use AsyncWriter on it (#437)
  • Improve the error message when validating a sample sheet (#412)
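
The Double.NaN and infinity fix listed above is about values surviving a write and a later read. The following is a minimal sketch of that round trip, not fgbio's Metric code: the object name `SpecialDoubles` and the three token strings are assumptions chosen for illustration, and the strings fgbio actually writes may differ.

```scala
object SpecialDoubles {
  // Hypothetical tokens for the three special values (assumed, not taken from fgbio).
  val NanToken    = "NaN"
  val PosInfToken = "Inf"
  val NegInfToken = "-Inf"

  /** Formats a Double, spelling out the three special values explicitly. */
  def format(d: Double): String =
    if (java.lang.Double.isNaN(d)) NanToken
    else if (d == Double.PositiveInfinity) PosInfToken
    else if (d == Double.NegativeInfinity) NegInfToken
    else d.toString

  /** Parses a string produced by `format` back into a Double. */
  def parse(s: String): Double = s match {
    case NanToken    => Double.NaN
    case PosInfToken => Double.PositiveInfinity
    case NegInfToken => Double.NegativeInfinity
    case other       => other.toDouble
  }
}
```

The point of the sketch is that the writer and the parser agree on one spelling for each special value, so `parse(format(x))` returns a value equivalent to `x`.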

@tfenne tfenne released this May 18, 2018

Bug fix release which resolves a problem introduced in a dependency that caused fgbio to be unable to read BAM files from stdin or named pipes. All users of 0.6.0 should upgrade to 0.6.1.

@tfenne tfenne released this Apr 5, 2018

Release 0.6.0 introduces the following changes to existing tools:

  • ReviewConsensusVariants: output PASS when there are no filters on the variant; fix format of bases output
  • MaskPrimers: improved usage documentation to make primer file format clearer

The following API changes were also introduced:

  • Added constants to SamRecord for SAM/BAM related constant values
  • NeedlemanWunchAligner renamed to Aligner (old name deprecated but still works)
    • Implemented Glocal (or semi-global) alignment mode
    • Implemented Local alignment mode
    • Fixed affine gap implementation
    • Fixed Alignment.subByQuery/subByTarget to correctly handle adjacent deletions
  • In metrics files, ensure 0.0 always formats as 0 and not 0E0
  • Updated how Rscript finds resources in the classpath to support local paths and absolute paths with and without leading slashes
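
The zero-formatting change above can be pictured as a special case placed in front of a scientific-notation formatter. This is a sketch under stated assumptions: `MetricNumbers` and the pattern `0.###E0` are hypothetical and are not taken from fgbio's Metric implementation.

```scala
import java.text.{DecimalFormat, DecimalFormatSymbols}
import java.util.Locale

object MetricNumbers {
  // Locale.ROOT keeps the decimal separator stable regardless of the machine's locale.
  private val symbols = DecimalFormatSymbols.getInstance(Locale.ROOT)

  // DecimalFormat is not thread-safe, so a fresh instance is created per call.
  // The pattern is an illustrative assumption.
  private def scientific = new DecimalFormat("0.###E0", symbols)

  /** Formats zero (including negative zero) as "0"; other values use scientific notation. */
  def format(d: Double): String =
    if (d == 0.0) "0" else scientific.format(d)
}
```

Handling zero before delegating avoids depending on how the underlying formatter chooses to render a zero mantissa and exponent.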

@tfenne tfenne released this Feb 27, 2018

Release 0.5.1 is a minor bug-fix release and introduces the following changes:

  • ExtractUmisFromBam
    • Improved error messaging
    • Fixed bug that prevented it from working when only one read per pair contained a UMI
  • GroupReadsByUmi now adds the sub-sort SS tag to the header of BAMs produced
  • CallMolecularConsensusReads and CallDuplexConsensusReads attempt to detect the sort order of input data and will fail if the sort order is incompatible
  • DemuxFastqs changed some output metrics from 32-bit Int to 64-bit Long to avoid overflows on NovaSeq data
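
The Int to Long change above guards against counter overflow. The sketch below shows the failure mode with made-up read counts; `ReadCounters` and its methods are hypothetical names and are not part of DemuxFastqs.

```scala
object ReadCounters {
  /** Sums counts into a 32-bit Int; the result wraps around past Int.MaxValue. */
  def totalAsInt(counts: Seq[Int]): Int = counts.sum

  /** Sums the same counts into a 64-bit Long, which has ample headroom here. */
  def totalAsLong(counts: Seq[Int]): Long = counts.foldLeft(0L)(_ + _)
}
```

For example, two counts of 2,000,000,000 each sum to 4,000,000,000, which exceeds `Int.MaxValue` (2,147,483,647): `totalAsInt` returns a negative number while `totalAsLong` returns the true total.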

@nh13 nh13 released this Feb 11, 2018

Release 0.5.0 introduces the following changes to existing tools:

  • CallDuplexConsensusReads: Fixed a rare bug where the consensus base quality could be zero or one if the two strands' base qualities differ by two or less.
  • FilterConsensusReads: Fix for bug where duplex reads formed from raw reads from a single strand only could be incorrectly filtered.
  • CorrectUmis: Now stores the original UMI sequences in the OX tag upon correction.
  • DemuxFastqs: Bug fix to correct quality scores in output BAM files
  • ClipOverlappingReads: Removed previously deprecated tool. Use ClipBam instead.
  • ClipBam:
    • Now optionally outputs metrics about clipping present in reads before and after execution.
    • New option to "upgrade" clipping, e.g. replace existing soft-clipping with hard-clipping

Changes to APIs were as follows:

  • Various deprecated methods were removed this release.
  • Metric formatting now prints smaller Doubles in scientific notation, and the formatting is generally more efficient.
  • NeedlemanWunchAligner gained a Glocal alignment mode for aligning all of a query sequence to a sub-region of a target sequence
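
To make the Glocal mode described above concrete, here is a small dynamic-programming sketch that scores all of a query against the best-matching sub-region of a target. It is not the fgbio aligner: `GlocalSketch`, the default scores, and the use of a linear (rather than affine) gap penalty are simplifying assumptions, and it returns only the score, not an alignment.

```scala
object GlocalSketch {
  /** Best score for aligning the whole query to any sub-region of the target. */
  def score(query: String, target: String,
            matchScore: Int = 1, mismatch: Int = -1, gap: Int = -2): Int = {
    val m = query.length
    val n = target.length
    // Row 0 is all zeros: skipping leading target bases is free.
    var prev = Array.fill(n + 1)(0)
    for (i <- 1 to m) {
      val curr = new Array[Int](n + 1)
      // Column 0: query bases aligned against nothing must pay the gap penalty.
      curr(0) = i * gap
      for (j <- 1 to n) {
        val sub = if (query(i - 1) == target(j - 1)) matchScore else mismatch
        curr(j) = math.max(prev(j - 1) + sub, math.max(prev(j) + gap, curr(j - 1) + gap))
      }
      prev = curr
    }
    // Taking the maximum over the last row makes trailing target bases free as well.
    prev.max
  }
}
```

With the default scores, `GlocalSketch.score("ACGT", "TTTACGTTTT")` is 4, the same as aligning `ACGT` to itself, because the flanking target bases are not penalised. A global alignment would instead charge for every unaligned target base.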

@nh13 nh13 released this Nov 15, 2017 · 100 commits to master since this release

Release 0.4.0 introduces the following changes to existing tools:

  • CallDuplexConsensusReads
    • The single strand consensus bases and quals for each duplex consensus read are output into tags on the duplex consensus read
    • Added option to output consensus reads that are formed from only a single strand
  • FilterConsensusReads
    • New option to filter out reads with low mean base quality
    • New option to filter out reads whose minimum depth is too low
    • New option to filter duplex consensus reads where the single strand consensuses disagree
    • New optional tags will store the single-strand consensus bases and qualities for duplex consensus reads.
  • DemuxFastqs
    • will no longer output /1 and /2 on read names when running in Illumina standards mode
    • fixed a bug causing an exception when the sample barcode is found in multiple reads (ex. i5 and i7)
  • ErrorRateByReadPosition - fixed bug that resulted in C>G errors being counted as A>G errors
  • GroupReadsByUmi
    • Reads with UMIs with Ns in them are now rejected
    • Log messages added with counts of reads filtered out by reason
    • Memory usage improvements when grouping reads at very, very high depth.
    • Supports enforcing a minimum UMI length and partial UMIs except for the paired strategy (duplex sequencing).

Finally, changes to various APIs were as follows:

  • Method in Bams to sort records by tag, or by a function applied to a tag
  • Improve speed of Metric.read for loading large numbers of rows from metrics files
  • Changed SamSource to extend IterableView instead of Iterable so that map(), filter(), etc. return lazy views
  • Fixed a bug where the specified temporary directory was not being used for sorting.
  • Added a BinomialDistribution class implemented using unlimited precision decimal math which is slower, but allows computation of cumulative probabilities where other implementations overflow or underflow
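
The trade-off noted for BinomialDistribution above (slower, but no overflow or underflow) can be illustrated with exact arithmetic from the JVM standard library. This is a sketch, not the fgbio class: `ExactBinomial` and its method names are hypothetical, and the probability is passed as a `java.math.BigDecimal` so that no rounding occurs.

```scala
import java.math.{BigDecimal => JBigDecimal}

object ExactBinomial {
  /** n choose k, computed exactly with BigInt. */
  def choose(n: Int, k: Int): BigInt =
    (1 to k).foldLeft(BigInt(1)) { (acc, i) => acc * (n - k + i) / i }

  /** P(X = k) for X ~ Binomial(n, p), with no rounding. */
  def pmf(k: Int, n: Int, p: JBigDecimal): JBigDecimal = {
    val q = JBigDecimal.ONE.subtract(p)
    new JBigDecimal(choose(n, k).bigInteger).multiply(p.pow(k)).multiply(q.pow(n - k))
  }

  /** P(X <= k), obtained by summing exact point probabilities. */
  def cumulative(k: Int, n: Int, p: JBigDecimal): JBigDecimal =
    (0 to k).foldLeft(JBigDecimal.ZERO) { (sum, i) => sum.add(pmf(i, n, p)) }
}
```

As a usage example, `ExactBinomial.pmf(2000, 2000, new java.math.BigDecimal("0.001"))` is a tiny but non-zero number, whereas `math.pow(0.001, 2000)` underflows to `0.0` in double precision.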

@nh13 nh13 released this Oct 5, 2017 · 122 commits to master since this release

Release 0.3.0 introduces the following changes to existing tools:

  • ClipBam - The --overlapping-reads option was not being used internally and is deprecated in favor of --clip-overlapping-reads. This caused overlapping reads to always be clipped.
  • CollectDuplexSeqMetrics - Added the optional output of duplex-umi frequencies with DuplexUmiMetrics.
  • DemuxFastqs - The default output sort order is changed from Unsorted to Queryname. Add an option --illumina-standards to output file names using Illumina naming conventions. Tuned the amount of memory used, especially for a large number of samples (>96).
  • CallDuplexConsensusReads - Do not throw an exception when we find potential collisions in duplex molecules; instead, do not generate a consensus read.
  • FilterBam - adding a few more filters.
  • Added a global parameter for log-level.

In addition, the following new tools were added:

  • CollectErccMetrics - This will collect metrics for analyzing ERCC spike-ins in RNA-Seq experiments for dose response but not fold-change

Finally, changes to various APIs were as follows:

  • ReferenceSetBuilder - Moved to the testing packages for use in projects that extend fgbio.
  • Alignment - Added subByQuery() and subByTarget() methods to Alignment.

@nh13 nh13 released this Jun 22, 2017 · 146 commits to master since this release

Release 0.2.0 introduces the following changes to existing tools:

  • added global arguments accessible to all tools, which are given as arguments prior to the tool name:
    • --tmp-dir: directory to use for temporary files.
    • --compression: default GZIP compression level, BAM compression level.
    • --async-io: use asynchronous I/O where possible, e.g. for SAM and BAM files.
  • numerous changes to the tool documentation to support output in MarkDown format.
  • DuplexConsensusCaller:
    • adding logging statistics for DuplexConsensusCaller.
    • adding quality trimming.
    • improved method to find the set of "compatible" cigars used to select the reads from which to build a consensus
  • DemuxFastqs:
    • the output directory should be created if it does not exist
    • change to the new quality format detector caused the detected encoding
      not to be printed
  • ClipOverlappingReads is deprecated in favor of ClipBam.
  • SampleSheet and ExtractBasecallingParamsForPicard
    • if the library identifier (Library_Id column) does not exist, it will default to the sample identifier (Sample_Id column); previously it defaulted to the sample name (Sample_Name column).
  • HapCutToVcf: updated to support updated HapCut2 outputs.
    • the full FORMAT field in the VCF is printed, including trailing missing values.

In addition, the following new tools were added:

  • FastqToBam: generates an unmapped BAM (or SAM or CRAM) file from fastq files.
  • BuildToolDocs: generates the suite of per-tool MarkDown documents.
  • SplitBam: splits a BAM into multiple BAMs, one per-read group (or library).
  • ClipBam: clips reads from the same template; replaces ClipOverlappingReads.
  • CollectDuplexSeqMetrics: generates metrics for duplex sequencing quality control.

Next, a new API for reading and writing SAM/BAM files built for scala idioms:

  • SamRecord: a replacement for htsjdk's SAMRecord with more scala-esque fields and methods.
  • SamSource: a class for reading SAM/BAM/CRAM files and for querying them.
  • SamWriter: a class for writing SAM/BAM/CRAM files and sorting them.
  • SamOrder: a trait for specifying SAM/BAM orderings; in addition to coordinate and queryname sort orders, includes useful and novel sorts such as:
    • random: generates a random order over all the reads.
    • randomquery: generates a random order with queryname grouping.
    • templatecoordinate: the sort order used by GroupReadsByUmi; sorts reads by the earlier unclipped 5' coordinate of the read pair, followed by the higher unclipped 5' coordinate of the read pair.
    • unsorted: the official "unsorted" ordering.
    • unknown: the official "unknown" ordering.
  • Bams: methods for manipulating sequences of SamRecords and other useful utility methods.
    • contains sorting methods that have better disk-backed sorting than htsjdk's for faster sorting of SAM/BAM files.
  • SamBuilder: a class for building SAM/BAM files and records; useful for generating test-cases for unit tests.
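
One way to obtain a random order that still keeps reads with the same query name together, as described for randomquery above, is to sort by a hash of the name. The sketch below assumes that approach and is not the fgbio SamOrder implementation: `RandomQuerySketch`, the simplified `Rec` type, and the seed are all hypothetical.

```scala
import scala.util.hashing.MurmurHash3

object RandomQuerySketch {
  /** Simplified stand-in for a SAM record: only the fields the sketch needs. */
  final case class Rec(name: String, isRead1: Boolean)

  /** Orders records by a hash of the query name; ties on the hash fall back to the name. */
  def sortRandomQuery(recs: Seq[Rec], seed: Int = 42): Seq[Rec] =
    recs.sortBy(r => (MurmurHash3.stringHash(r.name, seed), r.name))

  /** True when every query name occupies one contiguous run of records. */
  def isGrouped(recs: Seq[Rec]): Boolean = {
    val names = recs.map(_.name)
    val runs =
      if (names.isEmpty) 0
      else 1 + names.zip(names.tail).count { case (a, b) => a != b }
    runs == names.distinct.size
  }
}
```

Because the hash depends only on the name, both reads of a pair receive the same sort key and end up adjacent, while the order of the names themselves is effectively shuffled.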

Finally the following other changes were made:

  • support for scala 2.12.2; we use this version by default.
  • support for cross-building and publishing of scala 2.11.8 and 2.12.2
  • uses 0.2.0 release of sopt and commons.

@tfenne tfenne released this May 8, 2017 · 180 commits to master since this release

Release 0.1.4 introduces the following changes to existing tools:

  • CallMolecularConsensusReads
    • Added the ability to filter the maximum number of reads going into a consensus read
  • CallMolecularConsensusReads and FilterConsensusReads
    • No longer have default values for their --min-reads and --min-consensus-base-quality/--min-base-quality parameters. The correct values for these parameters are highly library/coverage dependent and are best set by the user.
  • CallMolecularConsensusReads and CallDuplexConsensusReads
    • Raw reads are end-trimmed for Ns after low-quality masking, prior to consensus calling
    • Raw reads that are FR pairs with read length > insert size are trimmed to the insert size prior to consensus calling
  • ErrorRateByReadPosition
    • Fixed a bug whereby the cumulative error plot produced in the PDF incorrectly started the R2 error count at the cumulative sum of the R1 error count.
    • Added the count of errors (in addition to error rate) to the output file
  • FilterSomaticVcf
    • Now gracefully handles reads whose insert size and mapping information disagree. Warnings will be logged for all such reads, but the tool will not stop/exit upon finding such reads. Should reduce the frequency of "genomicPosition is outside of template" error messages
    • Works with VCFs that do not contain #contig lines in the header
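
The consensus-calling change above, where raw reads are end-trimmed for Ns after low-quality masking, can be sketched in two steps. This is an illustration only: `EndTrimSketch` is a hypothetical name, the quality threshold is a caller-supplied parameter, and the sketch assumes that "end" means the trailing end of the read as stored.

```scala
object EndTrimSketch {
  /** Replaces every base whose quality is below minQual with 'N'. */
  def mask(bases: String, quals: Seq[Int], minQual: Int): String =
    bases.zip(quals).map { case (b, q) => if (q < minQual) 'N' else b }.mkString

  /** Length of the read once trailing Ns are removed (0 if the read is all Ns). */
  def trimmedLength(bases: String): Int = bases.lastIndexWhere(_ != 'N') + 1

  /** Masks low-quality bases, then trims trailing Ns from bases and qualities together. */
  def maskAndTrim(bases: String, quals: Seq[Int], minQual: Int): (String, Seq[Int]) = {
    val masked = mask(bases, quals, minQual)
    val len    = trimmedLength(masked)
    (masked.take(len), quals.take(len))
  }
}
```

Masking first matters: low-quality bases at the end of a read become Ns and are then removed by the trim, while a masked base in the interior of the read is kept as an N.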

In addition the following new tools were added:

  • DemuxFastqs: Performs sample demultiplexing on FASTQs
  • CorrectUmis: Corrects UMI sequences in BAM files when a set of fixed UMIs (not randomers) is used

Finally the following other changes were made:

  • Added support for cross-building scala 2.11 and 2.12
  • Tools that invoke R scripts will now produce less noisy output

@tfenne tfenne released this Feb 22, 2017 · 207 commits to master since this release

Release 0.1.3 introduces the following changes to existing tools:

  • CallMolecularConsensusReads now produces detailed information about consensus reads in new optional tags
  • MakeTwoSampleMixtureVcf now propagates the ID field from the source VCF into the mixture VCF
  • ErrorRateByReadPosition now masks out known variants, provides per-substitution type error rates and produces summary plots
  • ReviewConsensusVariants now generates a detailed output file with a row per variant-supporting-read

In addition the following new tools were added:

  • ClipOverlappingReads: clips alignments from read pairs whose alignments overlap
  • FilterConsensusReads: filters consensus reads generated by CallMolecularConsensusReads
  • EstimatePoolingFractions: estimates the fractional contribution of individual samples with known genotypes to a pooled sample
  • EstimateRnaSeqInsertSize: estimates insert size distributions of RNA sequencing experiments in the presence of splicing
  • CallDuplexConsensusReads: generates consensus reads from duplex-sequencing protocols that embed a UMI at the start of each read in a pair
  • MakeMixtureVcf: generates a VCF for a mixture sample created from many individual samples
  • FilterSomaticVcf: applies filters to VCFs of somatic variants
  • RemoveSamTags: strips out optional tags/attributes from a SAM/BAM file to reduce size
  • ExtractBasecallingParamsForPicard: parses an Illumina Experiment Manager sample sheet and generates the files needed to run Picard's basecalling tools
  • ExtractIlluminaRunInfo: extracts information from Illumina's RunInfo.xml file into a simple tab-delimited table