v0.5.0
-
Added an option to use ProSplign instead of miniprot to align proteins as annotation evidence
- ProSplign is most likely to help in organisms with less RNA-seq data and/or those that are more distant from other genomes with protein sets
- ProSplign is more computationally expensive than miniprot
-
Added new protein sets with many more species and protein selection logic
- Users can specify the number of taxa to use from a protein set; the taxonomically closest taxa are selected
- Can exclude protein sets by taxid
- Can supplement default sets with user-provided proteins
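
The selection logic in the bullets above can be pictured with a short sketch. This is illustrative only and is not EGAPx code: the function names, the lineage representation (a list of taxids from root to the taxon itself), the toy taxids, and the rule that "closest" means "shares the longest lineage prefix with the target" are all assumptions made for the example. It also assumes that excluding a taxid drops every candidate whose lineage contains it.

```python
from typing import Dict, FrozenSet, List


def shared_depth(a: List[int], b: List[int]) -> int:
    """Count the leading lineage entries that two taxa have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def select_closest_taxa(
    target_lineage: List[int],
    candidates: Dict[int, List[int]],
    n_taxa: int,
    excluded: FrozenSet[int] = frozenset(),
) -> List[int]:
    """Return up to n_taxa candidate taxids, closest to the target first.

    Hypothetical helper: candidates maps a taxid to its root-to-taxon lineage.
    A candidate is dropped if any taxid in its lineage is excluded.
    Ties in closeness are broken by taxid so the result is deterministic.
    """
    eligible = {
        taxid: lineage
        for taxid, lineage in candidates.items()
        if not excluded.intersection(lineage)
    }
    ranked = sorted(
        eligible,
        key=lambda t: (-shared_depth(target_lineage, eligible[t]), t),
    )
    return ranked[:n_taxa]


# Toy lineages with made-up taxids (not real NCBI Taxonomy identifiers).
target = [1, 10, 100, 1000]
protein_set = {
    1001: [1, 10, 100, 1001],  # shares 3 levels with the target
    1002: [1, 10, 100, 1002],  # shares 3 levels with the target
    101: [1, 10, 101],         # shares 2 levels
    2000: [1, 20, 200, 2000],  # shares only the root
}

closest_two = select_closest_taxa(target, protein_set, n_taxa=2)
without_1001 = select_closest_taxa(
    target, protein_set, n_taxa=2, excluded=frozenset({1001})
)
without_clade_100 = select_closest_taxa(
    target, protein_set, n_taxa=2, excluded=frozenset({100})
)
```

In this toy setup, raising `n_taxa` widens the evidence pool to more distant taxa, while `excluded` removes a single taxon or a whole clade before ranking. How EGAPx actually measures taxonomic distance and exposes these settings is described in its documentation, not here.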
-
Added tRNA feature annotation using tRNAscan-SE
-
Added rRNA feature annotation using Rfam
-
Added other short ncRNA feature annotation (with the exception of miRNAs) using cmsearch
-
Revisions to ortholog reference names to better meet GenBank standards
-
Improvements for plant gene naming
-
Adjusted CPU settings for some tasks (e.g., STAR) to improve efficiency
-
Reduced the footprint of EGAPx working directories
-
Fixed a minimap2 failure by renaming long reads #166
-
Fixed an issue where BUSCO wasn't producing output in some situations
-
Fixed an issue where EGAPx failed when sequence titles were absent in genome FASTA files #173
-
Fixed an issue where EGAPx failed when run with no RNA-seq
-
Fixed an issue where EGAPx was downloading extra SRA runs
-
Added asn_adjust to the image; it can be used to edit ASN in some situations
-
Improvements to prepare_submission for GenBank submission:
- Annotation provider can be specified (GP-40924)
- Handling for multispecies BioProjects
- Outputs a final GFF3 file that can incorporate adjustments to structural/functional annotation
-
Documentation updates
- Added documentation for new features
- Added documentation for recommended genome FASTA formatting
- Updated documentation for recommended RNA-seq formatting
- Clarified that EGAPx does not annotate organelles
- Added language about prepare_submission errors that can be ignored
- Added EGAPx diagram