extra-long incorrect prophage calls #6

Open
k6logc opened this issue Apr 2, 2024 · 6 comments


k6logc commented Apr 2, 2024

Hi Mike,

Thank you for Cenote-Taker3! We liked Cenote-Taker2 and are excited to explore your update. In our early runs we've noticed some examples of extra-long regions (>2,000,000 bases) being incorrectly called as prophages. In general our runs look reasonable and we haven't yet sorted out why this is happening in some cases. Do you have any insight into what might be going on?

We have v3.3.0 and the updated databases, and are running as:
cenotetaker3 -c $inDIR/$1 -r $modified_base -p T --lin_minimum_hallmark_genes 2 --cpu 6 --cenote-dbs $ct3DBpath

Examples of genomes that yield wonky hits:
https://www.homd.org/ftp/genomes/PROKKA/V10.1/fna/SEQF1065.1.fna
https://www.homd.org/ftp/genomes/PROKKA/V10.1/fna/SEQF9972.1.fna

Any feedback or guidance much appreciated, thank you!

-Kathryn (& looping in @AmrutaIdagunji who is working on this)

Owner

mtisza1 commented Apr 3, 2024

Hi Kathryn,

Thanks for opening this issue. I just tested the SEQF1065.1.fna file, and I found the problem. CT3 is behaving as expected. In this case, the sequence "SEQF1065.1|CP003503.1 HMT-643 Prevotella intermedia 17" has DTRs (Direct Terminal Repeats), so CT3 treats it as circular and does not prune it. See the log message "assess_virus_genes1.py: No non-DTR virus contigs >= 10,000 nt. So pruning will not happen" when running this file.
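For readers unfamiliar with the check: a direct terminal repeat means the same sequence appears at both the start and the end of a contig, which is what a circular genome looks like when it is assembled as a linear sequence with overlapping ends. The following is a minimal Python sketch of that idea, not CT3's actual implementation. The function name `find_dtr` and the `seed_len` and `max_dtr` parameters are hypothetical and chosen only for illustration.

```python
def find_dtr(seq: str, seed_len: int = 20, max_dtr: int = 1000) -> int:
    """Return the length of a direct terminal repeat, or 0 if none is found.

    Illustrative only: seed_len and max_dtr are assumed values,
    not settings taken from Cenote-Taker 3.
    """
    seq = seq.upper()
    if len(seq) < 2 * seed_len:
        return 0

    # The first seed_len bases must reappear near the end of the contig.
    seed = seq[:seed_len]

    # Search only the second half (so the two copies cannot overlap)
    # and only within the last max_dtr bases.
    start = max((len(seq) + 1) // 2, len(seq) - max_dtr)

    pos = seq.find(seed, start)
    while pos != -1:
        repeat_len = len(seq) - pos
        # A DTR requires the whole tail to match the head exactly.
        if seq[:repeat_len] == seq[pos:]:
            return repeat_len
        pos = seq.find(seed, pos + 1)
    return 0


if __name__ == "__main__":
    repeat = "ACGTTGCAAGCTTAGGCTAACGGATCC"
    body = "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTTTGGGGCCCC"
    print(find_dtr(repeat + body + repeat))  # length of the shared end sequence
```

A complete circular bacterial chromosome assembled with overlapping ends would pass a check like this just as a circular phage genome would, which is consistent with the behavior described above.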

Would it be useful to you if I added a flag to force pruning on DTR sequences? I'm planning some small updates and I believe I can implement this.

Let me know,

Mike

Author

k6logc commented Apr 3, 2024

Hi Mike,

Thank you very much for checking this out and for your quick diagnosis. Yes, it would be fantastic if you could add the option to force pruning on DTR sequences. In the case of SEQF1065.1 we expect two 30-40 kb prophages to be predicted on that contig, and it will be great to see whether forced pruning recovers them. Do you have a ballpark sense of the timeline for your next planned update?

Many thanks!

Best,
Kathryn

Owner

mtisza1 commented Apr 3, 2024

Kathryn,

I'm happy to help. I would anticipate the update to be live in about 1 week.

I know that this doesn't help from a pipeline perspective, but if you are just curious about the few cases you mentioned in your earlier post, you can remove the DTR sequence from the beginning or end of each contig before feeding it to CT3. You can find these sequences in CT3 output files, e.g. "final_genes_to_contigs_annotation_summary.tsv".
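The manual workaround could be sketched roughly as follows in Python. This is an assumption-laden illustration: the function name `trim_terminal_repeat` is hypothetical, and the DTR sequence is assumed to have been copied by hand from the CT3 output, since the exact column layout of that file is not described in this thread.

```python
def trim_terminal_repeat(seq: str, dtr: str) -> str:
    """Remove the trailing copy of a direct terminal repeat from a contig.

    The contig is returned unchanged unless the given DTR is present at
    both ends, so that a wrong or mistyped DTR cannot silently cut sequence.
    """
    seq_upper, dtr_upper = seq.upper(), dtr.upper()
    has_both_copies = (
        len(dtr_upper) > 0
        and len(seq_upper) >= 2 * len(dtr_upper)
        and seq_upper.startswith(dtr_upper)
        and seq_upper.endswith(dtr_upper)
    )
    if has_both_copies:
        # Drop the copy at the end; the copy at the start stays in place.
        return seq[: len(seq) - len(dtr)]
    return seq


if __name__ == "__main__":
    contig = "ACGTAC" + "GGGTTT" + "ACGTAC"
    print(trim_terminal_repeat(contig, "ACGTAC"))
```

Trimming one copy leaves a linear contig without matching ends, so a DTR check would no longer flag it as circular.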

Mike

Author

k6logc commented Apr 4, 2024

Hi Mike,

Great news about the timing of the update! We are looking forward to reflecting those changes in our pipeline as well. And thank you for suggesting trimming the DTRs to see what's going on in those cases; that makes sense.

Thank you!

Best,
Kathryn

Owner

mtisza1 commented Apr 12, 2024

Hi Kathryn,

I've put a new version, v3.3.1, on Bioconda. See the release.

I've added the flag --max_dtr_assess (which defaults to 1,000,000). This allows you to set the length at which CT3 stops looking for DTRs. This way, assemblies of complete circular bacterial chromosomes with prophage will get pruned instead of treated like circular/DTR viruses.
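The effect of the length cutoff can be pictured with a small Python sketch. This is only an illustration of the decision described above, not CT3's code: the function name `handling_for_contig` and the returned labels are hypothetical, and whether the real cutoff is inclusive or exclusive at exactly `max_dtr_assess` is an assumption here.

```python
def handling_for_contig(
    length: int, has_dtr: bool, max_dtr_assess: int = 1_000_000
) -> str:
    """Decide how a contig would be handled under a DTR length cutoff.

    Contigs longer than max_dtr_assess are not assessed for DTRs at all,
    so they fall through to pruning even if their ends match.
    The inclusive comparison is an assumption made for this sketch.
    """
    if has_dtr and length <= max_dtr_assess:
        return "circular_virus"  # treated as circular/DTR, not pruned
    return "prune_candidate"     # eligible for prophage pruning


if __name__ == "__main__":
    # Illustrative lengths only: a phage-sized contig and a
    # chromosome-sized contig, both with matching ends.
    print(handling_for_contig(45_000, has_dtr=True))
    print(handling_for_contig(2_100_000, has_dtr=True))
```

Under the default of 1,000,000, a chromosome-sized contig with matching ends falls into the pruning branch, while a phage-sized contig with DTRs is still treated as circular.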

Please update to v3.3.1. This should do it:

conda activate ct3_env

mamba install bioconda::cenote-taker3=3.3.1

Let me know if this fixes your issue.

Mike

Author

k6logc commented Apr 14, 2024

Hi Mike,
Fantastic, thank you!! Will report back.
Best,
Kathryn
