The Herbarium Update

@mossmatters mossmatters released this 30 Jan 17:15

1.3 The Herbarium Update January, 2018

Bug fixes and features related to the use of targeted sequencing from herbarium material. These samples tend to have short contigs that can cause issues when trying to assemble full-length genes.

Features

  • Added the --exclude flag as the inverse of --target: sequences containing the specified string are not used as targets for exon extraction, although they are still used for read mapping. This is useful if you want to add supercontig sequences to the target file without using them for exon extraction.

  • Added --addN to intronerate.py. This option inserts 10 N characters between joined contigs when recovering the supercontig. This is useful for identifying where intron recovery fails, and for annotation processing (e.g. for GenBank).

  • Added a new version of the heatmap script, gene_recovery_heatmap_ggplot.R. This script is much simpler and produces nice color PNG images, but struggles a bit with PDF output. The original heatmap script is still included. Thanks to Paul Wolf for the ggplot code!
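
The --addN and --exclude behaviors described above can be illustrated with a minimal Python sketch. This is not HybPiper's actual code: the function names join_contigs and exon_extraction_targets are hypothetical, and the sketch assumes the --exclude string is matched as a substring of the sequence name.

```python
# Illustrative sketch only; not HybPiper's actual implementation.
# Function names and the name-matching rule are assumptions.

N_SPACER = "N" * 10  # the release notes state 10 N characters per join


def join_contigs(contigs, add_n=False):
    """Concatenate contig sequences in order.

    With add_n=True, a run of 10 N characters is placed between each
    pair of adjacent contigs, marking where the contigs were joined.
    """
    spacer = N_SPACER if add_n else ""
    return spacer.join(contigs)


def exon_extraction_targets(names, exclude=None):
    """Return the target names that would be kept for exon extraction.

    Assumes the exclude string is matched as a substring of the name.
    Excluded names are only dropped from this list; the full list would
    still be used for read mapping.
    """
    if exclude is None:
        return list(names)
    return [name for name in names if exclude not in name]


if __name__ == "__main__":
    # Two contigs joined with a visible N spacer between them.
    print(join_contigs(["ATGGC", "TTAGC"], add_n=True))

    # Hypothetical target names; the supercontig entry is excluded.
    targets = ["Arabidopsis-gene001", "supercontig-gene001"]
    print(exon_extraction_targets(targets, exclude="supercontig"))
```

Joining with a fixed-length run of N characters keeps the join points easy to find with a simple string search, which is what makes failed intron recovery visible in the output.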

Bug Fixes

  • Fixed misassembly of supercontigs when there are multiple alignments to different parts of the same exon.
  • Fixed poor filtering of GFF results to produce intron/exon annotation.
  • Fixed non-propagation of exonerate parameters.