"full text search" retrieving spurious GPs #387

Closed
ValWood opened this issue Sep 19, 2016 · 6 comments

ValWood commented Sep 19, 2016

There are 648 pombe genes annotated to "cell cycle".
When I do an ontology search on "cell cycle", I retrieve these 648 genes.

However, when I do a full text search, I retrieve 688 genes.
All of the real cell cycle genes are included, so the other 40 hits are spurious.

[image: overlap]

ValWood commented Sep 19, 2016

These are some of the GPs which are not "cell cycle" associated:
SPAC3A12.09c
SPBC8E4.03
SPAC1952.11c
SPAC17A2.05
SPAC24C9.06c
SPCC18.18c
SPAC29A4.13

My best guess is that these have some annotations which contain the word "cell" and some which contain the word "cycle"
(all of the ones I looked at were nitrogen cycle or TCA cycle).
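
To illustrate that guess, here is a minimal sketch of how word-by-word matching over all of a gene product's annotation labels could differ from matching the exact phrase. The gene names, the label assignments and both helper functions are hypothetical, and this is not a description of how the search is actually implemented.

```python
# Hypothetical data: each gene product maps to the labels of terms it is annotated to.
annotations = {
    "gene_A": ["mitotic cell cycle", "nucleus"],
    "gene_B": ["tricarboxylic acid cycle", "cell wall"],
    "gene_C": ["nitrogen cycle metabolic process", "cell division site"],
    "gene_D": ["cytosol"],
}


def tokens(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def match_pooled_words(query, labels):
    """True if every query word occurs somewhere in the gene's pooled labels."""
    pooled = {tok for label in labels for tok in tokens(label)}
    return all(tok in pooled for tok in tokens(query))


def match_phrase(query, labels):
    """True if the query words occur contiguously, in order, within one label."""
    q = tokens(query)
    for label in labels:
        t = tokens(label)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            return True
    return False


loose = sorted(g for g, labels in annotations.items()
               if match_pooled_words("cell cycle", labels))
strict = sorted(g for g, labels in annotations.items()
                if match_phrase("cell cycle", labels))

print("word-by-word:", loose)
print("exact phrase:", strict)
```

In this toy data, `gene_B` and `gene_C` match word by word only because "cell" and "cycle" appear in different labels, which is the pattern suspected for the genes listed above.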

ValWood commented Sep 19, 2016

Similarly, if I search on "mitotic cell cycle",
the free text search gives me 40 spurious hits.

[image: mitotic_cell_cycle]

ValWood commented Sep 19, 2016

Hmm, for the mitotic one, most are probably caused by this missing parent:
http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0071851#term=ancchart

Now reported:
geneontology/go-ontology#12666

ValWood commented Sep 19, 2016

Most of the rest are due to annotations to "mitotic spindle pole body"
and "meiotic cell cycle"

Some of these we know are meiosis specific. The "mitotic spindle pole body" annotations are there because the ORFeome data came from expressing these genes during vegetative growth.

I am working on a cleanup of these:
pombase/curation#791

More generally, though, I wonder whether it makes sense to take the search string "mitotic cell cycle" and
look for separate occurrences of its component strings "mitotic" and "cell cycle" on the same gene product?

kltm commented Sep 19, 2016

@ValWood If you are searching on text, you are using the free text filter, and it may pick up things that are not specific to a term. You can use quotation marks to lock in that specific phrase, but you should really be using the "inferred annotation" filter for the term "cell cycle", e.g.:
http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=taxon_subset_closure_label:%22Schizosaccharomyces%20pombe%22&fq=regulates_closure_label:%22cell%20cycle%22&sfq=document_category:%22bioentity%22
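
The same link can be assembled programmatically. This is a minimal sketch, assuming the parameters visible in the URL above (`q`, two `fq` filters and `sfq`) are all that is needed; the helper name `closure_search_url` is made up for illustration and is not part of AmiGO.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the URL above.
BASE = "http://amigo.geneontology.org/amigo/search/bioentity"


def closure_search_url(taxon_label, term_label):
    """Build a bioentity search URL filtered by taxon and by term closure label."""
    params = [
        ("q", "*:*"),  # match everything; the filters below do the narrowing
        ("fq", f'taxon_subset_closure_label:"{taxon_label}"'),
        ("fq", f'regulates_closure_label:"{term_label}"'),
        ("sfq", 'document_category:"bioentity"'),
    ]
    # quote_via=quote encodes spaces as %20; '*' and ':' are left unescaped
    # so the result looks like the hand-written URL.
    return BASE + "?" + urlencode(params, quote_via=quote, safe="*:")


print(closure_search_url("Schizosaccharomyces pombe", "cell cycle"))
```

Swapping the second argument for another term label, for example "mitotic cell cycle", gives the equivalent filtered search for that term.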

cmungall commented

The concern is valid, and is addressed in #158,

in particular in @dosumis' comments: #158 (comment)
