Roary including non-protein coding features? #398

Closed
tseemann opened this issue May 4, 2018 · 2 comments

@tseemann
Contributor

tseemann commented May 4, 2018

As I understand it, Roary uses cd-hit and blastp and is designed for proteins only.

However, it seems it allows other non-protein-coding features into the mix:

default => '(CDS|ncRNA|tRNA|tmRNA|rRNA)'

The default should be CDS only.

It seems that there is a size-filtering step which tends to remove "most" of these features, but some can still leak through.
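The sketch below makes the effect of the default concrete. It is written in Python, although Roary itself is written in Perl. It applies the old and proposed type patterns to a few made-up GFF3 records. Several details are assumptions for illustration only and do not come from Roary's actual code: the helper name `select_features`, the full-string match on the feature-type column, and the 120 nt length cutoff.

```python
import re

# Made-up GFF3 records: seqid, source, type, start, end, score, strand, phase, attributes.
GFF_LINES = [
    "contig1\tProkka\tCDS\t1\t900\t.\t+\t0\tID=gene_0001",
    "contig1\tProkka\ttRNA\t1000\t1075\t.\t+\t.\tID=gene_0002",
    "contig1\tProkka\ttmRNA\t2000\t2363\t.\t-\t.\tID=gene_0003",
    "contig1\tProkka\trRNA\t3000\t4540\t.\t+\t.\tID=gene_0004",
]

OLD_DEFAULT = r"(CDS|ncRNA|tRNA|tmRNA|rRNA)"  # default quoted in this issue
CDS_ONLY = r"CDS"                              # proposed default

MIN_LEN = 120  # hypothetical size cutoff in nucleotides, not Roary's real value


def select_features(lines, type_regex, min_len):
    """Hypothetical helper: return IDs of features whose type column fully
    matches type_regex and whose length (end - start + 1) is >= min_len."""
    pattern = re.compile(type_regex)
    kept = []
    for line in lines:
        cols = line.split("\t")
        ftype, start, end, attrs = cols[2], int(cols[3]), int(cols[4]), cols[8]
        if pattern.fullmatch(ftype) and end - start + 1 >= min_len:
            kept.append(attrs.split("=", 1)[1])  # value of the ID= attribute
    return kept


print(select_features(GFF_LINES, OLD_DEFAULT, MIN_LEN))
print(select_features(GFF_LINES, CDS_ONLY, MIN_LEN))
```

Under the old default, the 76 nt tRNA is dropped only because of the length cutoff. The tmRNA and rRNA pass both filters and would then be translated and clustered as if they were proteins. With CDS alone, only the protein-coding record survives.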

@embatty
Contributor

embatty commented May 7, 2018

I changed this default to just CDS throughout, and it removes the tRNA groups I was seeing in the output. The other issue is that the RNA sequences were being translated even though they aren't really open reading frames. You could add -complete => 1 to the translate function to perform these checks, and it would give a warning for anything that doesn't look like a valid protein. That might not be necessary, as I assume most people are using sequences annotated by Prokka, so they will have sensible ORFs in the input files.
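As a rough illustration of what "doesn't look like a valid protein" can mean, here is a simplified Python check for a complete ORF. It tests four things: the length is a multiple of 3, the first codon is a start codon, the last codon is a stop, and there are no in-frame stops in between. This is an assumed approximation of the kind of checks a complete-CDS translation mode performs. It is not BioPerl's implementation. The start-codon set is deliberately reduced, and the function name `orf_problems` and the test sequences are made up.

```python
# Simplified subset of bacterial start codons; real translation tables allow more.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(seq):
    """Hypothetical helper: return reasons why seq does not look like a
    complete ORF. An empty list means it passed these simplified checks."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into whole codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] not in START_CODONS:
        problems.append("no valid start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


# Made-up examples: a tiny valid ORF, one with an early stop, and an RNA-like fragment.
examples = {
    "orf_ok": "ATGAAACCCTAA",
    "early_stop": "ATGTAAGGGTAA",
    "rna_like": "GCGGATTTAGCTCAGTTGGGAGAGC",
}
for name, seq in examples.items():
    problems = orf_problems(seq)
    if problems:
        print(f"WARNING: {name}: {', '.join(problems)}")
```

A check like this only warns; it does not stop the sequence from being translated. Restricting the feature type to CDS is what actually keeps RNA genes out of the protein clustering.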

@andrewjpage
Member

Thank you @embatty for the pull request, which fixes this. I've merged it just now.

3 participants