[BUG] dnaA truncated by PGAP gene calling #252

Closed
tuspjo opened this issue Apr 4, 2023 · 20 comments

@tuspjo

tuspjo commented Apr 4, 2023

Describe the bug
While depositing some bacterial genomes, I noticed that 9 of them came back with a /pseudo tag on the dnaA gene after PGAP annotation. They all have this field:
/note="incomplete; partial on complete genome; missing
N-terminus; Derived by automated computational analysis
using gene prediction method: Protein Homology."
However, these genes were not truncated in a Prodigal annotation of the same genomes, and the closest database reference determined by autoMLST also has the full dnaA sequence (see AA alignment below).
[image: dnaA amino acid alignment]

For most of the strains with the /pseudo tag, Streptomyces niveus strains have the highest %ANI from autoMLST (Streptomyces_niveus_GCF_002009175, Streptomyces_niveus_NCIMB_11891_GCF_000497425).

The submitted genomes are not yet publicly available but I can supply you with the genbank files if necessary.

Since this is annotation performed at NCBI, I don't have the log files and software versions requested in the bug report form.
The issue is not consistent: it affects 9 genomes, while many of the other Streptomyces genomes I deposited do not have the /pseudo tag.
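
For anyone who wants to check their own annotated records for the same symptom, here is a minimal sketch. It assumes the annotation is available as GenBank flat-file text and only reads the feature table; the names `parse_features` and `pseudo_features` are illustrative and not part of PGAP or any library.

```python
import re

# A feature key line has exactly 5 leading spaces, e.g. "     CDS             1..1500".
# Qualifier lines are indented further, so they never match this pattern.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+(\S.*)$")


def parse_features(genbank_text):
    """Collect type, location, /gene and /pseudo for each feature in the text."""
    features = []
    in_table = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        # Any unindented line (ORIGIN, CONTIG, //) ends the feature table.
        if in_table and line and not line.startswith(" "):
            in_table = False
        if not in_table:
            continue
        match = FEATURE_KEY.match(line)
        if match:
            features.append({
                "type": match.group(1),
                "location": match.group(2).strip(),
                "gene": None,
                "pseudo": False,
            })
        elif features:
            qualifier = line.strip()
            if qualifier == "/pseudo":
                features[-1]["pseudo"] = True
            elif qualifier.startswith('/gene="'):
                features[-1]["gene"] = qualifier.split('"')[1]
    return features


def pseudo_features(genbank_text, gene="dnaA", feature_type="CDS"):
    """Return locations of `feature_type` features for `gene` that carry /pseudo."""
    return [
        f["location"]
        for f in parse_features(genbank_text)
        if f["type"] == feature_type and f["gene"] == gene and f["pseudo"]
    ]
```

Calling `pseudo_features(open(path).read())` on each record returns an empty list when dnaA carries no /pseudo tag. This is a plain-text scan, so it only recognises qualifiers written exactly as `/pseudo` and `/gene="..."`.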

Best

Tue Sparholt Jørgensen

@azat-badretdin
Contributor

Thank you, Tue, for your report. We will investigate this issue in an internal ticket.

@tuspjo
Author

tuspjo commented Apr 4, 2023

Great, please let me know if I can be of any help.
I think this could be relevant information: several more genomes have similarly suspicious gene calling, without the /pseudo tag:
image

@azat-badretdin
Contributor

I'm thinking this could be relevant information

Agreed. Thanks!

@azat-badretdin
Contributor

Could you please post some of the input genomes?

@tuspjo
Author

tuspjo commented Apr 4, 2023

Unfortunately I can't post them here, as they are "embargoed", but I can send a safe download link to your email address. Do you want only the ones with /pseudo, or also some of the ones with the same gene calling but without /pseudo on dnaA?

@azat-badretdin
Contributor

as they are "embargoed" but I can send a safe download link to your email address?

Sure.

Do you want only the ones with /pseudo or also some of the ones with the same gene calling but not /pseudo on dnaA?

The more examples the better.

Thanks!

@tuspjo
Author

tuspjo commented Apr 4, 2023 via email

@azat-badretdin
Contributor

Looking forward to it, Tue!

@azat-badretdin
Contributor

I got the genomes, Tue, thanks!

@azat-badretdin
Contributor

Unfortunately, these are the output data, not the input data. We need the input FASTA files.

@tuspjo
Author

tuspjo commented Apr 13, 2023

Dear Azat,
I've sent the input genomes in FASTA format. Did you receive them?
Best,
Tue

@azat-badretdin
Contributor

Thanks, Tue! Not yet. So far I have only received the original package from 4/5. The data goes through a different group; they will notify us when it arrives.

@tuspjo
Author

tuspjo commented Apr 13, 2023

OK. I sent a link to the FASTA files on Tuesday, so hopefully they will make their way to you soon.

@azat-badretdin
Contributor

Tue, judging by the output you sent us in the first tarball, it looks like you did not use standalone PGAP for these annotations, but the GenBank submission service. Could you please confirm?

@tuspjo
Author

tuspjo commented Apr 13, 2023

Yes, that is correct: the annotation was performed at NCBI, not by the standalone PGAP CLI.

@azat-badretdin
Contributor

Thank you for confirming, Tue. That explains the confusion.

@tuspjo
Author

tuspjo commented Apr 19, 2023

Hi again,

The genomes I sent in PGAP input format (FASTA with info in the header) weren't received. Do you want me to re-upload them, and how do I get the download link to you if genomes@ncbi.nlm.nih.gov is not a good channel?
Best,

Tue

@azat-badretdin
Contributor

Since they were submitted via GenBank, we have the input data already. Thanks!

@tuspjo
Author

tuspjo commented Nov 24, 2023

A quick follow-up on this, in case anyone stumbles on this bug report: the dnaA gene calling was modified/improved, which resolved all the observed issues by identifying the complete dnaA genes rather than partial genes.

@azat-badretdin
Contributor

Thank you, Tue!
