Releases: ncbi/pgap

2024-07-18.build7555

24 Jul 15:55

This release, as well as the previous one, is based on PGAP-6.7.

  • Incorporation of GeneOntology 2024-06-10
  • BUG FIX: restore genus-only functionality: #311, #305, #304, #303
  • Structural annotation algorithm improvements and bug fixes
  • TaxCheck fixes:
    • Synchronize code in TaxCheck buildruns with code used in internal processes
    • Introduce handling of plasmids in contamination (i.e., prevent promiscuous plasmids from counting as contaminants)
    • Introduce lineage match options for MAGs
  • REQUEST: #273. PGAP now writes only to the output directory, including transient files.
  • DATA FIX: #308. Added HMM NF046015 to correct the name of the protein reported in the issue

Third Party software versions

  • tRNAscan-SE 2.0.12
  • hmmer v.3.4
  • infernal 1.1.5
  • CRISPR v.1.03
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25

2024-04-27.build7426

02 May 17:08

This release is based on PGAP-6.7.

  • Updated software:
    • hmmer v.3.4
    • infernal 1.1.5
  • Protein Family Models now include Pfam 36
  • Incorporation of GeneOntology 2024-01-17
  • Support for Apple Silicon using Docker Desktop / Rosetta
  • Revised report generation for ANI Taxonomy checks, including significant overhaul of how ANI contamination is computed and reported

Third Party software versions

  • tRNAscan-SE 2.0.12
  • hmmer v.3.4
  • infernal 1.1.5
  • CRISPR v.1.03
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25

2023-10-03.build7061

05 Oct 14:58

This release is based on PGAP-6.6. It includes the following features and bug fixes:

  • Lowered pseudogene false positive rate by improving protein alignment handling during structural annotation
  • Designed new hidden Markov models (HMMs) for validated small proteins, for improving structural annotation
  • Adopted PFAM release 35 for use in structural and functional annotation
  • Bug fixes:
    • Podman container name is now respected, #270
    • Empty, undocumented files are now deleted from the output
    • Removed an extra # from annot_with_genomic_fasta.gff
  • The pgap.py -g option now supports absolute paths
  • If the output directory already exists, pgap.py now exits rather than adding a suffix to the directory name
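The two pgap.py behaviors above (absolute -g paths, and refusing to reuse an existing output directory) can be illustrated with a small pre-flight check. This is a hypothetical sketch, not pgap.py's actual code; the function name preflight is an assumption made for illustration.

```python
import os


def preflight(genome_path: str, output_dir: str) -> str:
    """Hypothetical pre-flight check mirroring the behavior described above.

    Refuses to run if the output directory already exists (instead of
    writing to a suffixed name), and returns the genome FASTA path in
    absolute form so relative and absolute -g values behave the same.
    """
    if os.path.exists(output_dir):
        # Exit rather than silently creating e.g. "<output_dir>.1"
        raise SystemExit(f"Output directory {output_dir!r} already exists")
    # os.path.abspath leaves absolute paths unchanged and anchors relative
    # paths at the current working directory
    return os.path.abspath(genome_path)
```

Raising SystemExit here matches the "exit rather than add a suffix" behavior; a real tool would likely also print a hint to choose a new -o value.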

Third Party Software and Data versions used:

  • GeneOntology 2023-07-27
  • tRNAscan-SE 2.0.12
  • hmmer v.3.1b2
  • CRISPR v.1.02
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25
  • infernal v.1.1.4

2023-05-17.build6771

23 May 20:27

This release is based on PGAP-6.5. It includes the following features and bug fixes:

  • Addition of attributes (Gene Ontology terms, EC numbers, gene symbols) to more protein-coding features, by propagation from curated conserved domain architectures (CDD architectures)
  • Incremental improvements in the structural annotation algorithm
  • Addition of a simple option for passing the FASTA file and organism to pgap.py as command-line parameters instead of a YAML file. This option is not sufficient for, and should NOT be used with, assemblies intended for submission to GenBank.
  • Improved help text for pgap.py
  • Bug fixes:
    • Eliminated crashes on small circular plasmids when an alignment covers almost the entire plasmid sequence
    • Partiality and pseudo status are now consistent between cdregion and gene features
    • Fixed CPU handling in SLURM environments
    • ani-tax-report files now use the FASTA file name instead of 'gc_assm_name' as the assembly name
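The simple (no-YAML) input mode above can be sketched as a command builder. This assumes, as in recent `pgap.py --help` output, that -g takes the genome FASTA, -s the organism name and -o the output directory; check the help text of your installed version. The function name simple_pgap_command is hypothetical.

```python
import shlex


def simple_pgap_command(fasta: str, organism: str, outdir: str) -> list:
    """Build a pgap.py command line for the simple (no-YAML) input mode.

    Assumed options: -o (output directory), -g (genome FASTA),
    -s (organism name). Not suitable for assemblies intended for
    GenBank submission, per the release notes.
    """
    return ["./pgap.py", "-o", outdir, "-g", fasta, "-s", organism]


cmd = simple_pgap_command("genome.fasta", "Escherichia coli", "ecoli_results")
print(shlex.join(cmd))
# ./pgap.py -o ecoli_results -g genome.fasta -s 'Escherichia coli'
```

Passing the list to `subprocess.run(cmd, check=True)` would run it without shell quoting issues; shlex.join is only used to display a copy-pasteable form.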

Third Party Software and Data versions used (no changes since last release):

  • GeneOntology 2023-01-01
  • tRNAscan-SE 2.0.12
  • hmmer v.3.1b2
  • CRISPR v.1.02
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25
  • infernal v.1.1.1

2022-12-13.build6494

20 Dec 10:34

This release is based on PGAP-6.4. It includes the following features and bug fixes:

New features:

  • More stringent filtering of alignments of trusted proteins, resulting in improvements in the structural annotation of long proteins
  • New outputs: nucleotide and protein sequences of CDS features and enhanced Roary-ready GFF output
  • Upgrade to tRNAscan-SE 2.0.12
  • Changes in the reference data:
    • Incorporation of GeneOntology 2022-11-03 changes
    • Switch to CDD 3.20 architectures

Bug fixes

  • Fixed a serious publication-retrieval bug caused by changes to a third-party service during the lifetime of the previous build

Third Party Software Versions Used

  • tRNAscan-SE 2.0.12
  • hmmer v.3.1b2
  • CRISPR v.1.02
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25

2022-10-03.build6384

05 Oct 20:41

This release is based on PGAP-6.3. It includes the following features and bug fixes:

  • Added more stringent filtering of low coverage and complexity protein alignments, resulting in better annotation of long protein models
  • Incorporated CheckM (Parks DH et al., Genome Research 25(7):1043-1055, 2015) to calculate the completeness and contamination of the assembly, based on the presence or absence of lineage-specific markers in the set of PGAP-predicted models
  • Added an import traceback statement (#224)
  • Bug fix: switched to the newer Docker Hub repository version (v2)
  • Bug fix: better handling of the "isolate" FASTA modifier.
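To make the CheckM estimates more concrete, here is a simplified sketch of the collocated marker-set calculation described by Parks et al. (2015): completeness averages, over marker sets, the fraction of each set's markers found; contamination averages the extra copies (count minus one) per set. This is not PGAP's or CheckM's code, and the marker IDs in the example are hypothetical.

```python
from collections import Counter


def checkm_estimates(marker_sets, hits):
    """Return (completeness %, contamination %) from collocated marker sets.

    marker_sets: list of sets of marker IDs (hypothetical IDs in the example)
    hits: marker IDs found among the predicted proteins, one entry per copy
    """
    counts = Counter(hits)
    completeness = contamination = 0.0
    for marker_set in marker_sets:
        # Fraction of this set's markers present at least once
        present = sum(1 for m in marker_set if counts[m] > 0)
        # Extra copies beyond the first count toward contamination
        extra = sum(max(counts[m] - 1, 0) for m in marker_set)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)
    n_sets = len(marker_sets)
    return 100 * completeness / n_sets, 100 * contamination / n_sets


print(checkm_estimates([{"A", "B"}, {"C", "D"}], ["A", "B", "B", "C"]))
# (75.0, 25.0)
```

Weighting by marker set rather than by individual marker keeps co-located (often co-inherited) genes from dominating the estimate, which is the motivation given for collocated sets in the CheckM paper.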

Third Party Software Versions Used

No version changes to previously used software since the previous release; CheckM is newly added.

  • tRNAscan-SE v.2.0.9
  • hmmer v.3.1b2
  • CRISPR v.1.02
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25
  • CheckM v.1.2.1

2022-08-11.build6275

15 Aug 14:02

This release is based on PGAP-6.2. It includes the following features and bug fixes:

  • Update to the structural annotation algorithm: increased trust in HMM alignments resulting in better choice of start sites
  • Lowered the minimum length for accepting ab initio hypothetical ORFs from 45 aa to 40 aa
  • Updated tRNAscan-SE from v.2.0.7 to v.2.0.9
  • Fixed handling of some FASTA modifiers (GitHub issue #210)
  • Added support for Apptainer (the new name of Singularity)
  • Fixed support for home-directory installation for Windows users
  • Updated align_filter usage in CWL
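The lowered length cutoff for ab initio hypothetical ORFs can be sketched as a simple filter. Two details are assumptions, not stated in the release notes: the cutoff is treated as inclusive (40 aa passes), and the ORF's nucleotide span is assumed to include the stop codon. The function names are hypothetical.

```python
MIN_AB_INITIO_AA = 40  # was 45 before this release


def protein_length_aa(orf_nt_length: int) -> int:
    """Amino-acid length of an ORF whose nucleotide span includes the stop codon."""
    if orf_nt_length % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt_length // 3 - 1  # drop the stop codon


def accept_ab_initio_orf(orf_nt_length: int, min_aa: int = MIN_AB_INITIO_AA) -> bool:
    """Hypothetical acceptance test: keep ORFs at or above the minimum length."""
    return protein_length_aa(orf_nt_length) >= min_aa
```

Under these assumptions, a 123-nt ORF (40 aa plus stop) is now accepted, whereas it would have been rejected with the previous 45-aa cutoff.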

Third Party Software Versions Used

  • tRNAscan-SE v.2.0.9
  • hmmer v.3.1b2
  • CRISPR v.1.02
  • AntiFam v.3.0
  • Rfam v.14.4
  • GeneMarkS-2 v.1.14_1.25

2022-04-14.build6021

22 Apr 11:53

This release is based on PGAP-6.1. It includes the following improvements and bug fixes:

  • Improvement: faster installation achieved with parallel download and decompression of PGAP and taxcheck data packages
  • Improvement: PGAP can now be installed at a configurable location, different from the home directory. By default it installs in $HOME/.pgap, but this location can be changed by setting the environment variable PGAP_INPUT_DIR.
  • Bug fix: assemblies for organisms without a genus in their lineage can now be annotated.
  • Bug fix: fixed incomplete installation caused by a race condition in directory creation
  • Bug fix: mapping of gene symbols by orthology to genes in reference genomes is now correct. Fixes the assignments of gene symbols (e.g. recA) to features in the annotations of: Acinetobacter pittii, Bacillus subtilis, Campylobacter jejuni, Escherichia coli and Mycobacterium tuberculosis genomes. Annotation of other species is unaffected.
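The install-location rule above ($PGAP_INPUT_DIR if set, otherwise $HOME/.pgap) can be expressed as a small resolver. This is an illustrative sketch, not pgap.py's implementation; the function name pgap_install_dir is hypothetical, and treating an empty PGAP_INPUT_DIR as unset is an assumption.

```python
import os
from pathlib import Path


def pgap_install_dir(environ=None) -> Path:
    """Resolve the PGAP data directory from environment variables.

    Uses PGAP_INPUT_DIR when it is set and non-empty (assumption: empty
    means unset); otherwise falls back to $HOME/.pgap.
    """
    env = os.environ if environ is None else environ
    custom = env.get("PGAP_INPUT_DIR")
    if custom:
        return Path(custom).expanduser()
    return Path(env.get("HOME", str(Path.home()))) / ".pgap"
```

In a shell, the equivalent is to export PGAP_INPUT_DIR (for example to a larger scratch volume) before installing and running pgap.py, so both steps see the same location.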

2022-02-10.build5872

15 Feb 22:21

This release is based on PGAP-6.0. It includes the following features and bug fixes:

  • Gene Ontology terms are now added to CDSs and proteins, when known. Like EC numbers, these are propagated from HMMs and BlastRules used to name the proteins.
  • Incorporated 17 Rfam models for the annotation of more riboswitches
  • Introduced the --auto-correct-tax flag in pgap.py, which overrides the organism provided in the input YAML file if taxcheck predicts a different organism with high confidence. Use it in combination with the --taxcheck flag.
  • Introduced a minimum coverage threshold of 20% for taxcheck: if the query assembly does not match any type assembly over more than 20% coverage, taxcheck returns an inconclusive result (no organism is predicted)
  • Added support for Debian 10
  • Bug fix: assemblies for organisms without a genus in their lineage can now be annotated.
  • Bug fix: running PGAP with Singularity without internet access (--no-internet) is now possible. Users need to point pgap.py to a local SIF image (converted from Docker) using the --container-path argument.
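The taxcheck coverage gate and the --auto-correct-tax override described above can be sketched as two small decision functions. This is a simplified illustration, not PGAP's implementation: the input tuples, the choice of the highest-ANI eligible match, and the boolean high_confidence flag are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

MIN_COVERAGE_PCT = 20.0  # minimum coverage threshold introduced in this release


def taxcheck_prediction(matches: List[Tuple[str, float, float]]) -> Optional[str]:
    """matches: (type_organism, ani_pct, query_coverage_pct), one per type assembly.

    Returns None (inconclusive) when no type assembly covers more than 20%.
    Simplification: among eligible matches, pick the one with the highest ANI.
    """
    eligible = [m for m in matches if m[2] > MIN_COVERAGE_PCT]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m[1])[0]


def organism_to_annotate(yaml_organism: str, predicted: Optional[str],
                         high_confidence: bool, auto_correct_tax: bool) -> str:
    """Hypothetical --auto-correct-tax logic: replace the YAML organism only
    when taxcheck predicts a different organism with high confidence."""
    if auto_correct_tax and predicted and high_confidence and predicted != yaml_organism:
        return predicted
    return yaml_organism
```

Without --taxcheck there is no prediction to act on, which is why the two flags are meant to be used together; in this sketch that corresponds to predicted being None.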

2021-11-29.build5742

02 Dec 15:32

This release is based on PGAP-5.3. It includes the following features and bug fixes:

  • Updated the structural annotation algorithm to facilitate future extensibility. This change improves structural annotation by giving higher weight to GeneMarkS2+ ab initio models at loci where only weak evidence is found (such as low-identity, low-coverage protein alignments or partial HMM hits).
  • Switched to a GeneMarkS2+ build compiled for Linux kernel 3
  • Upgraded PFAM models to PFAM 34
  • Adjusted the minimum percent identity thresholds used by the Average Nucleotide Identity tool for several species, including Listeria monocytogenes, Campylobacter lari, and Vibrio vulnificus.
  • Improved reporting of errors in input YAML files