
how ontology issues over-inflation of annotation affects GO analyses #1869

Closed
ValWood opened this issue Mar 12, 2018 · 6 comments

Comments

@ValWood
Contributor

ValWood commented Mar 12, 2018

This table (for a paper on unknowns) shows changes in the pombe GO slim totals over time:

The orange fluctuations are due to stricter annotation criteria, the peach ones to ontology changes. Many ontology errors are not captured by these snapshots, because they were introduced and fixed between snapshots. Slim totals have been quite stable for the past 6-12 months, apart from an occasional ontology glitch.

[Image: pombe sheet1]

This figure shows the "cellular process slim" totals for pombe, cerevisiae and human. This isn't part of the unknowns paper; I did it out of interest. Green shading marks cellular processes where the gene products are (largely) conserved 1:1:1. Blue marks processes where I expected the human numbers to be higher (lots of one-to-many orthologs). Some term totals are a lot higher for human than expected. I have glanced through some of the lists and identified some obvious errors. There are many more.

Personally I think it is critical that the annotation errors are addressed because their presence will obscure true enrichments and make GO unusable for human analysis.

[Image: comparison sheet1]

Overinflated numbers appear to be largely due to two recurring annotation error types:
i) annotations of target genes to a process, and
ii) annotations of experimental readouts.

@vanaukenk
Contributor

@ValWood
This is really interesting. In terms of actionable items, I see there are already tickets to review some of the human annotations, but it would be nice to generate the second table for more species and have it available as an ongoing report that curators could systematically check.

@ValWood
Contributor Author

ValWood commented Mar 12, 2018

I was thinking the same, @vanaukenk. I was going to tag you because I remembered you asking for evidence of ontology changes affecting analyses.

It's very easy to do: with GO Term Mapper you can import legacy GAFs to look at historical changes.
I waited until now to share because the pombe slim should not have so many large fluctuations from now on (there will be increases over time, but the ontology is much more stable for slim totals, at least for cellular processes).
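A minimal sketch of the same kind of snapshot comparison, done outside GO Term Mapper. It assumes the GAF lines are already loaded and that a mapping from each annotated GO ID to its slim term(s) has been precomputed elsewhere (the ancestor traversal is not shown). `slim_totals`, `diff_totals` and `term_to_slims` are hypothetical names, and skipping NOT-qualified annotations is an assumption about how totals should be counted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def slim_totals(gaf_lines: Iterable[str],
                term_to_slims: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count distinct gene products per slim term from GAF 2.x lines.

    term_to_slims (hypothetical, precomputed) maps an annotated GO ID to the
    slim term(s) it rolls up to.
    """
    genes_per_slim: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        # GAF columns 2, 4 and 5: DB Object ID, Qualifier, GO ID
        gene_id, qualifier, go_id = cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # assumption: negated annotations do not count
        for slim in term_to_slims.get(go_id, ()):
            genes_per_slim[slim].add(gene_id)  # a set avoids double counting
    return {slim: len(genes) for slim, genes in genes_per_slim.items()}


def diff_totals(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, int]:
    """Return the change in gene count for each slim term whose total moved."""
    terms = sorted(set(old) | set(new))
    return {t: new.get(t, 0) - old.get(t, 0)
            for t in terms if new.get(t, 0) != old.get(t, 0)}
```

Running `slim_totals` on an old and a new GAF with the same mapping and passing both results to `diff_totals` gives the per-term fluctuations. Keeping the mapping fixed isolates annotation changes; remapping each snapshot with its own ontology version would instead capture ontology-driven changes as well.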

It's also really nice to have a slim constantly available on our website:
https://www.pombase.org/browse-curation/fission-yeast-go-slim-terms
It's one of our most highly accessed pages.
It makes sanity-checking specific process lists very easy.

@ValWood
Contributor Author

ValWood commented Mar 12, 2018

I can add other species if I get the protein-coding gene lists. That is the trickiest part!

@pgaudet
Contributor

pgaudet commented Jun 6, 2018

ACTION ITEM NYC 2018 GOC: Create report based on species slims

@ValWood
Contributor Author

ValWood commented Aug 29, 2023

Should this be closed? The tool accompanying Marc's paper will help a lot to interrogate slim set differences.

@ValWood
Contributor Author

ValWood commented Aug 29, 2023

Closing...

@ValWood ValWood closed this as completed Aug 29, 2023

4 participants