VirSorter predicts prophage that is longer than the actual contig size #68

Closed
hoelzer opened this issue Apr 1, 2020 · 3 comments

hoelzer commented Apr 1, 2020

Hi! I just observed a possible issue where a predicted prophage sequence's stop is larger than the actual contig size:

Test assembly:

kleiner_2015.fasta.gz

Command used:

wrapper_phage_contigs_sorter_iPlant.pl -f ${fasta} --db 2 --wdir virsorter --ncpu ${task.cpus} --data-dir ${database} --virome

Observation

It seems that VirSorter predicts a prophage whose coordinates extend past the end of the contig. Example:

>NODE_51_length_63443_cov_50.479870

So contig NODE_51 has a length of 63443 nt.

Now VirSorter predicts a prophage for this contig from position 19922-63493:

(base) [mhoelzer@hh-yoda-11-01 ~]$ grep NODE_51 virsorter/Predicted_viral_sequences/VIRSorter_prophages_cat-4.fasta 
>VIRSorter_NODE_51_gene_20_gene_72-19922-63493-cat_4

So the predicted prophage's stop position is larger than the actual contig size, if I understand the output correctly?
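
A minimal Python sketch for checking this on other contigs. It assumes the header layouts match the two examples above (a SPAdes-style `_length_<N>_` field in the contig name, and `-<start>-<stop>-cat_<n>` at the end of the VirSorter prophage name). The function names are hypothetical and not part of VirSorter:

```python
import re


def contig_length_from_header(contig_header: str) -> int:
    """Read the length from a SPAdes-style header (assumed layout: ..._length_<N>_...)."""
    match = re.search(r"_length_(\d+)_", contig_header)
    if match is None:
        raise ValueError(f"no _length_<N>_ field in: {contig_header!r}")
    return int(match.group(1))


def prophage_coords_from_header(prophage_header: str) -> tuple[int, int]:
    """Read (start, stop) from a prophage header (assumed layout: ...-<start>-<stop>-cat_<n>)."""
    match = re.search(r"-(\d+)-(\d+)-cat_\d+$", prophage_header.strip())
    if match is None:
        raise ValueError(f"no -<start>-<stop>-cat_<n> suffix in: {prophage_header!r}")
    return int(match.group(1)), int(match.group(2))


def stop_overshoot(contig_header: str, prophage_header: str) -> int:
    """Number of nucleotides by which the prophage stop exceeds the contig length (0 if none)."""
    length = contig_length_from_header(contig_header)
    _, stop = prophage_coords_from_header(prophage_header)
    return max(0, stop - length)


contig = ">NODE_51_length_63443_cov_50.479870"
prophage = ">VIRSorter_NODE_51_gene_20_gene_72-19922-63493-cat_4"
print(stop_overshoot(contig, prophage))
```

For the NODE_51 example this reports an overshoot of 50 nt (63493 minus 63443).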

simroux (Owner) commented Apr 1, 2020

Hi! You're right, something's wrong here. :-)
It is, however, relatively innocuous and an easy fix: by default, VirSorter extends the prophage sequence by 50 nucleotides beyond the first and last gene, on both the 5' and 3' sides (to include potential att sites and to avoid ending the sequence right on a start / stop codon). I just forgot to include a check to make sure we don't extend past the contig, i.e. if the last gene of the prophage is at the end of the contig, the coordinate from VirSorter will be 50 nucleotides beyond the contig end (hence 63493 vs 63443).
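
To illustrate the kind of bounds check described above, here is a small Python sketch. It is not VirSorter's actual code (VirSorter is written in Perl). The function name, the 1-based inclusive coordinates, and the gene positions in the example are assumptions made for illustration:

```python
FLANK = 50  # nucleotides added on each side, as described above


def extend_and_clamp(first_gene_start: int, last_gene_stop: int,
                     contig_length: int, flank: int = FLANK) -> tuple[int, int]:
    """Extend a prophage region by `flank` nt on both sides without leaving the contig.

    Coordinates are assumed to be 1-based and inclusive.
    """
    start = max(1, first_gene_start - flank)            # never before the first base
    stop = min(contig_length, last_gene_stop + flank)   # never past the last base
    return start, stop


# Assumed gene positions: first gene starts at 19972, last gene ends at the
# contig end (63443). Without the clamp the stop would be 63443 + 50 = 63493.
print(extend_and_clamp(19972, 63443, 63443))
```

With the clamp in place, the stop coordinate in this example stays at 63443 instead of 63493.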

I'll fix it asap, but in the meantime you can safely use these results anyway, as the prediction is overall correct (also, the GenBank file you get for this prophage is accurate too, since it automatically adjusts to the actual sequence length).

Best,
Simon

hoelzer (Author) commented Apr 1, 2020

I see! Thanks for the explanation.

simroux (Owner) commented Apr 1, 2020

Should be fixed now, thanks again for reporting the bug!
