How ontology issues and over-inflation of annotation affect GO analyses #1869
@ValWood
I was thinking that, @vanaukenk. I was going to tag you because I remembered you asking about evidence for ontology changes affecting analysis. It's very easy to do with GO Term Mapper: you can import legacy GAFs to look at historical changes. It's also really nice to have a slim constantly available on our website.
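The slim totals discussed in this thread come down to counting distinct gene products per slim term. A minimal sketch of that counting step, assuming a GAF 2.x file (tab-separated, `!` header lines, DB object ID in column 2, qualifier in column 4, GO ID in column 5) and a precomputed map from each GO term to the slim terms it rolls up to. That map, the function name `slim_totals`, and the example slim IDs are hypothetical illustrations; this is not GO Term Mapper's implementation, and a real analysis would derive the map from the ontology's is_a/part_of closure for the release matching each GAF.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def slim_totals(gaf_lines: Iterable[str],
                slim_ancestors: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count distinct gene products annotated to each slim term.

    gaf_lines: lines of a GAF 2.x file.
    slim_ancestors: hypothetical precomputed map, GO ID -> slim term IDs.
    """
    genes_per_slim: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row
        db_object_id, qualifier, go_id = cols[1], cols[3], cols[4]
        # Negated annotations assert the gene is NOT involved; ignore them.
        if "NOT" in qualifier.split("|"):
            continue
        # A gene counts once per slim term, however many annotations map there.
        for slim_id in slim_ancestors.get(go_id, ()):
            genes_per_slim[slim_id].add(db_object_id)
    return {slim_id: len(genes) for slim_id, genes in genes_per_slim.items()}
```

Running this over legacy and current GAFs (each with its matching ontology release) gives the per-term totals whose drift over time is what this issue is about.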
I can add other species if I get the protein-coding gene lists. That is the trickiest part!
ACTION ITEM NYC 2018 GOC: Create a report based on species slims.
Should this be closed? The tool accompanying Marc's paper will help a lot to interrogate slim set differences.
Closing...
This table (for a paper on unknowns) shows changes in the pombe slim over time:
The orange fluctuations are due to stricter annotation criteria, the peach ones to ontology changes. Many ontology fixes are not picked up by these snapshots, because the errors were introduced and fixed between snapshots. Slim totals have been quite stable for the past 6-12 months, apart from an occasional ontology glitch.
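Spotting fluctuations like those in the table amounts to diffing per-term totals between two snapshots. A minimal sketch, assuming each snapshot has already been reduced to a dict of slim term ID to gene count; the function name `flag_changes` and the 10% default threshold are arbitrary, hypothetical choices for illustration. Attributing a flagged change to annotation criteria versus ontology changes still requires curator inspection.

```python
from typing import Dict, List, Tuple


def flag_changes(old: Dict[str, int], new: Dict[str, int],
                 rel_threshold: float = 0.10) -> List[Tuple[str, int, int]]:
    """Return (term, old_total, new_total) for terms whose total changed
    by more than rel_threshold (relative to the old total)."""
    flagged = []
    for term in sorted(set(old) | set(new)):
        before, after = old.get(term, 0), new.get(term, 0)
        # Terms that appear or disappear entirely are always flagged.
        if before == 0 or after == 0:
            if before != after:
                flagged.append((term, before, after))
            continue
        if abs(after - before) / before > rel_threshold:
            flagged.append((term, before, after))
    return flagged
```

Because snapshots are taken at intervals, this only catches changes still present at the second snapshot; errors introduced and fixed in between stay invisible, as noted above.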
This figure shows the "cellular process slim" totals for pombe, cerevisiae and human. This isn't part of the unknowns paper; I did it out of interest. Green shading marks cellular processes where the gene products are (largely) conserved 1:1:1. Blue marks processes where I expected the human numbers to be higher (lots of one-to-many orthologs). Some term totals are much higher for human than expected. I have glanced through some of the gene lists and identified some obvious errors; there are many more.
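The reasoning behind the green shading can be made mechanical: for processes whose gene products are largely conserved 1:1:1, the human total should be close to the pombe total, so a large human/pombe ratio hints at over-annotation. A minimal sketch, assuming per-term totals for each species and a curator-supplied set of conserved terms; the function name `flag_inflated` and the 1.5 ratio cut-off are hypothetical and would need tuning.

```python
from typing import Dict, List, Set, Tuple


def flag_inflated(pombe: Dict[str, int], human: Dict[str, int],
                  conserved_1to1: Set[str],
                  max_ratio: float = 1.5) -> List[Tuple[str, float]]:
    """Flag conserved (1:1:1) slim terms whose human total exceeds the
    pombe total by more than max_ratio."""
    flagged = []
    for term in sorted(conserved_1to1):
        p, h = pombe.get(term, 0), human.get(term, 0)
        if p == 0:
            continue  # no pombe baseline to compare against
        ratio = h / p
        if ratio > max_ratio:
            flagged.append((term, round(ratio, 2)))
    return flagged
```

A flagged term is only a candidate for review; the gene lists behind it still have to be checked by hand for the error types described below.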
Personally, I think it is critical that these annotation errors are addressed, because their presence will obscure true enrichments and make GO unusable for human analysis.
Overinflated numbers appear to be largely due to two recurring annotation error types:
i) annotations of target genes to a process, and
ii) experimental readouts.