Allow search for internal identifiers (was: "Cannot search for dictyBase gene IDs") #120

Open
pfey03 opened this issue Jun 3, 2014 · 7 comments
Milestone

Comments

@pfey03

pfey03 commented Jun 3, 2014

In the Genes and Gene products search I searched for a dictyBase gene ID, such as DDB_G0291352, but got no results. However, the link to the gene is http://amigo2-test.stanford.edu/amigo/gene_product/dictyBase:DDB_G0291352

Can you add DDB_G IDs to be searchable?

@kltm kltm added this to the 2.1 milestone Jun 3, 2014
@kltm kltm added the upstream label Jun 3, 2014
@kltm kltm changed the title cannot search for dictyBase gene IDs Cannot search for dictyBase gene IDs Jun 3, 2014
@kltm kltm modified the milestones: 2.3, 2.1 Jun 3, 2014
@kltm
Member

kltm commented Jun 3, 2014

It is searchable on the landing page and quick search boxes, but the id-part-only fields are not normally loaded into the other search indexes, probably since they're not guaranteed to be unique. This can be revisited when we go into 2.3 (the next time we're planning on playing with the indexing), but for now you can just use the full ID in non-quicksearch contexts.
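To illustrate the uniqueness concern, here is a minimal sketch (not AmiGO's actual indexing code). It assumes IDs have the form `prefix:local`, and the `OtherDB` record is invented purely to show how two databases could share a local part. The function names `split_id` and `build_local_index` are hypothetical.

```python
from collections import defaultdict


def split_id(full_id):
    """Split 'dictyBase:DDB_G0291352' into ('dictyBase', 'DDB_G0291352')."""
    prefix, sep, local = full_id.partition(":")
    if not sep:
        # No colon found: treat the whole string as a local part.
        return None, full_id
    return prefix, local


def build_local_index(full_ids):
    """Map each local part to every full ID that carries it."""
    index = defaultdict(list)
    for full_id in full_ids:
        _, local = split_id(full_id)
        index[local].append(full_id)
    return index


# Illustrative records only; "OtherDB:DDB_G0291352" is a made-up collision.
ids = [
    "dictyBase:DDB_G0291352",
    "OtherDB:DDB_G0291352",
    "PomBase:SPAC23A1.08c",
]
index = build_local_index(ids)
print(index["DDB_G0291352"])   # two full IDs share this local part
print(index["SPAC23A1.08c"])   # one full ID
```

A lookup by local part therefore has to return a list rather than a single record, which may be why such a field was treated differently from the full ID.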

@kltm kltm removed the upstream label Jun 3, 2014
@kltm kltm modified the milestones: 2.3, 2.4 Aug 26, 2015
@rachhuntley

Similar example moved from GO help:

When I search in AmiGO (Advanced search: genes and gene products in the free text filtering box) for a UniProt accession,
e.g. Q4VCS5, there are no results returned. I have to add the prefix
"UniProtKB:" for it to find anything. However, when I search for an
RNAcentral identifier the opposite is true, e.g. URS000039ED8D does get a
result, but RNAcentral:URS000039ED8D does not.

I don't think we can assume that people will know what the prefixes for all
databases are, or whether or not they should add one to get a result, so can
this search be made to work with all options?
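One way to make the search tolerant of both forms could be to expand the user's input into every plausible variant before querying. The sketch below is only an assumption about how that might look. `KNOWN_PREFIXES` is a small illustrative subset, and `candidate_queries` is a hypothetical helper, not part of AmiGO.

```python
# Illustrative subset of database prefixes; a real list would be much longer.
KNOWN_PREFIXES = ["UniProtKB", "RNAcentral", "dictyBase", "PomBase"]


def candidate_queries(user_input, prefixes=KNOWN_PREFIXES):
    """Return the input plus its prefixed or unprefixed variants."""
    text = user_input.strip()
    candidates = [text]
    prefix, sep, local = text.partition(":")
    if sep and prefix in prefixes:
        # The user typed a known prefix: also try the bare local part.
        candidates.append(local)
    elif not sep:
        # The user typed a bare ID: also try it under each known prefix.
        candidates.extend(f"{p}:{text}" for p in prefixes)
    return candidates


print(candidate_queries("Q4VCS5"))
print(candidate_queries("RNAcentral:URS000039ED8D"))
```

With this approach the user would not need to know whether a given database's IDs are stored with or without a prefix; the search backend would try each candidate.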

@kltm
Member

kltm commented Nov 1, 2015

Similar, from @ValWood on #269:

When I search on a gene product ID (e.g. SPAC23A1.08c)
from here
http://amigo.geneontology.org/amigo

http://amigo2.berkeleybop.org/amigo/medial_search?q=SPAC23A1.08c

the result is under "general" rather than
Genes and gene products
or
Associations

@cmungall
Member

cmungall commented Nov 1, 2015

@kltm, I don't understand this part:

> It is searchable on the landing page and quick search boxes, but the id-part-only fields are not normally loaded into the other search indexes, probably since they're not guaranteed to be unique

Why is there any uniqueness assumption here? Why isn't the ID just another piece of text Solr can search on?

What are "id-part-only" fields?

The phenotype seems to be that medial_searches for what I call the "global ID" work fine, but those for the "local ID" do not. See the Identifiers wiki page for an explanation of my terminology.

So a medial_search for http://amigo2.berkeleybop.org/amigo/search/bioentity?q=PomBase:SPAC23A1.08c will yield the desired result, but http://amigo2.berkeleybop.org/amigo/search/bioentity?q=SPAC23A1.08c will not.

This is in contrast to autocomplete, in which either local ID or global ID can be used, with expected results.

So the solution would be to have the medial_search use the same index or indexing strategy as the autocomplete, so behavior is consistent (here by consistent I mean that any result obtainable in autocomplete should probably be a subset of the result obtained by hitting return and getting medial search).
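The consistency property described here can be written down as a subset check. This is a toy sketch over an in-memory list, not the real Solr-backed search. All function names are hypothetical, and `medial_search_global_only` merely imitates the behavior reported in this thread.

```python
# Toy records standing in for indexed bioentities.
RECORDS = [
    {"id": "PomBase:SPAC23A1.08c"},
    {"id": "dictyBase:DDB_G0291352"},
]


def local_part(full_id):
    """Return the part after the first colon (or the whole ID if none)."""
    return full_id.split(":", 1)[-1]


def autocomplete(q):
    """Matches on either the global ID or the local ID."""
    return {r["id"] for r in RECORDS if q in (r["id"], local_part(r["id"]))}


def medial_search_global_only(q):
    """Imitates the reported behavior: only the global ID matches."""
    return {r["id"] for r in RECORDS if q == r["id"]}


def medial_search_both(q):
    """Proposed behavior: same matching strategy as autocomplete."""
    return {r["id"] for r in RECORDS if q in (r["id"], local_part(r["id"]))}


def is_consistent(search, q):
    """True if every autocomplete hit also appears in the search results."""
    return autocomplete(q) <= search(q)


print(is_consistent(medial_search_global_only, "SPAC23A1.08c"))  # False
print(is_consistent(medial_search_both, "SPAC23A1.08c"))         # True
```

A check like `is_consistent` could serve as a regression test once both search paths share an indexing strategy.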

> but for now you can just use the full ID in non-quicksearch contexts

By quicksearch you mean autocomplete?

This is OK for GOC members as we should all have all gene database prefixes memorized. But we can't expect the average user to do this of course.

The simpler solution is to use autocomplete if you know where it is you want to end up, and to use search for searching (and expect a set of results). I had never encountered this quirk in search behavior before, because if I want to end up on a gene page, and I know some ID, ID portion or symbol, I use autocomplete. I just type "SPAC23A1.08c" in the box marked "quick search", and a few milliseconds later the gene shows up in autocomplete and I just navigate there directly, no medial search. I'm wondering if a transatlantic lag means that UK users are more likely to hit return before autocomplete has a chance to work?

@RLovering

Hi
I was using AmiGO 2 today and I wanted to look up the annotations associated
with human PARK2. I pasted O60260 into the search field and got no genes or
gene products, and no annotations, but I did get 6 GO terms (due to comments
like "An example of this is PARK2 / parkin in human (O60260) in PMID:22314364
(inferred from mutant phenotype)").

So then I tried pasting PARK2 in the search field; this time I got 33
gene/gene products, 1400 annotations and 7 GO terms. Looking at the 33 genes,
none looked obviously like O60260. So I filtered on human and, by opening each
individual record, I finally retrieved the record for O60260 with 297
annotations.

Why isn't it possible to search AmiGO using the UniProtKB ID? And why can't
you include a gold star system to indicate which gene ID is the manually
curated record, as UniProtKB does with its gold stars and as QuickGO does
using tabs for TrEMBL vs. UniProtKB records? There is a massive difference between the annotations associated with UniProtKB and TrEMBL entries. Users looking at TrEMBL IDs will not keep looking for the UniProtKB entry; they will think this is all the data available, might be confused about why the protein was present in their enrichment analysis results, and may come to question the accuracy or value of GO.

I would suggest that this issue needs to be addressed sooner rather than later

Thanks

Ruth

@cmungall
Member

Searching by ID local parts will be fixed with #157.

Multiple IDs for the same gene: we don't intend to use the star system; we'll only load the GCRPs. See my announcement to go-discuss about UniProtKB switching to using the GCRP, and geneontology/go-site#185. This will be fixed June 6.

@kltm
Member

kltm commented Jul 6, 2018

Note medial search issues as well from #514

@kltm kltm changed the title Cannot search for dictyBase gene IDs Allow search for internal identifiers (was "cannot search for dictyBase gene IDs") Feb 22, 2023
@kltm kltm changed the title Allow search for internal identifiers (was "cannot search for dictyBase gene IDs") Allow search for internal identifiers (was: "Cannot search for dictyBase gene IDs") Feb 22, 2023
5 participants