Is these results fine to use? #282

Closed
maesaar opened this issue Oct 2, 2016 · 5 comments

@maesaar

maesaar commented Oct 2, 2016

I have a few problems with the output:

  1. I have 189 cases where the same gene has ended up in different groups;
  2. I have 15 cases where the paralogs couldn’t be split;
  3. I have 36 cases where QC gives:
    "Hypothetical protein with no hits to refseq/uniprot/clusters/cdd/tigrfams/pfam overlapping another protein with hits"
  4. I have 27 cases where QC gives:
    "Investigate"

Is it okay to use the Roary output with results like these, or could you give me some pointers on what to do in this situation?

Thanks

maesaar changed the title from "Is results ok to use?" to "Is these results fine to use?" on Oct 2, 2016
@tseemann
Contributor

tseemann commented Oct 2, 2016

For 1), how do you know it was the "same gene"?

Roary was designed for within-species studies; some of its speed-ups and heuristics do not work with divergent data.

@maesaar
Author

maesaar commented Oct 2, 2016

The GitHub page says:

"A non unique gene name, where sequences with the same gene name have ended up in different groups."

I have 189 non-unique gene names, so I thought that was what it meant.

Is that not correct?

@maesaar
Author

maesaar commented Oct 2, 2016

I used the Roary core alignment for BNG analysis, and my main concern is whether it is OK to use when three of the genes have this QC value:

Hypothetical protein with no hits to refseq/uniprot/clusters/cdd/tigrfams/pfam overlapping another protein with hits

Should I exclude the genes whose QC column contains "Hypothetical..." or "Investigate"?

@andrewjpage
Member

Hi,
Roary does a bit of QC and flags things that may be errors. It's up to you to look at your data and decide whether there's something wrong (such as erroneous annotation or host contamination). We try to split paralogs, but sometimes there's not enough information available to split them. When paralogs are split, identical genes can end up in different clusters (based on the synteny). You can turn this off by using the '-s' parameter.
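
As a starting point for that manual review, here is a minimal Python sketch that tallies the QC flags and lists the flagged genes. It is not part of Roary: `summarize_qc` is a hypothetical helper, and the column names `Gene` and `QC` are assumptions about the header of the `gene_presence_absence.csv` spreadsheet, so adjust them if your file differs.

```python
import csv
from collections import Counter


def summarize_qc(csv_path, qc_column="QC", gene_column="Gene"):
    """Tally non-empty QC flags and collect (gene, flag) pairs for review.

    The column names are assumptions about the spreadsheet header;
    pass different names if your file uses other labels.
    """
    counts = Counter()
    flagged = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            flag = (row.get(qc_column) or "").strip()
            if flag:  # an empty QC cell means nothing was flagged
                counts[flag] += 1
                flagged.append((row[gene_column], flag))
    return counts, flagged


if __name__ == "__main__":
    # Assumed file name; point this at your own Roary output.
    counts, flagged = summarize_qc("gene_presence_absence.csv")
    for flag, n in counts.most_common():
        print(f"{n}\t{flag}")
```

The sketch only reports what was flagged. Whether a flagged gene should be dropped from a downstream analysis still depends on inspecting the annotation and the sequences themselves.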

@maesaar
Author

maesaar commented Oct 3, 2016

Thank you for the really quick feedback.
