Order gene list on similarity to entered term (not alphabetically) #1074

Open
AlistairNWard opened this issue Jun 3, 2024 · 2 comments

@AlistairNWard
Member

Search for the gene F5 and so many genes containing "F5" are returned that the dropdown is cut off well before genes beginning with "F" are displayed. This makes it impossible to search for F5 and select it.

Instead of ordering genes in the search dropdown alphabetically, they should be ordered by their similarity to the entered term. As an exact match, F5 (or any gene with a synonym of exactly F5) would then be first in the list when F5 is entered.
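
To make the request concrete, here is a minimal sketch of similarity-based ordering. It is not gene.iobio code: the `Gene` class, the tier numbers, and the small `GENES` list are assumptions chosen for illustration. Exact symbol matches rank first, then exact alias matches, then prefix and substring matches, with ties broken alphabetically.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical record: a gene symbol plus its known aliases."""
    symbol: str
    aliases: Tuple[str, ...] = ()


def match_tier(term: str, gene: Gene) -> Optional[int]:
    """Return a rank for the match (lower is closer), or None for no match."""
    t = term.upper()
    symbol = gene.symbol.upper()
    aliases = [a.upper() for a in gene.aliases]
    if symbol == t:
        return 0  # exact symbol match
    if t in aliases:
        return 1  # exact alias (synonym) match
    if symbol.startswith(t):
        return 2  # symbol begins with the term
    if t in symbol:
        return 3  # term appears somewhere in the symbol
    if any(t in a for a in aliases):
        return 4  # term appears somewhere in an alias
    return None


def rank_by_similarity(term: str, genes: List[Gene]) -> List[str]:
    """Order matching gene symbols by tier, then alphabetically within a tier."""
    hits = []
    for gene in genes:
        tier = match_tier(term, gene)
        if tier is not None:
            hits.append((tier, gene.symbol))
    return [symbol for _, symbol in sorted(hits)]


# Toy list for illustration only; a real gene index is far larger.
GENES = [Gene("ATF5"), Gene("BRCA1"), Gene("EIF5"), Gene("F5"),
         Gene("KIF5A"), Gene("TAF5")]

print(rank_by_similarity("F5", GENES))
```

With this ordering, F5 is listed first even though ATF5 and EIF5 precede it alphabetically, so it can never be pushed past the dropdown cutoff.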

@tonydisera
Collaborator

tonydisera commented Jun 17, 2024

Agreed. I have reworked the gene search typeahead functionality in release 4.11. This new behavior will try to match on the gene symbol first. If there are no matches, it will search the gene aliases. In your example, the current behavior results in this long list of 'hits':
Screenshot 2024-06-17 at 5 19 00 PM

In gene.iobio 4.11, the search on F5 will return only that gene:
Screenshot 2024-06-17 at 5 18 54 PM

Here is a more complicated example. If we type in 'MA', we get a long list of genes starting with 'MA'. Type in 'MAY' and we get only one hit, and it is for the gene alias 'MAYA'. In other words, no gene names starting with 'MAY' were found in the database, based on the names populated from RefSeq and Gencode, but there is an alias starting with 'MAY' that points to the Gencode gene MNX1-AS1.

Screenshot 2024-06-17 at 5 24 06 PM
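
The symbol-first, alias-fallback behavior described above can be sketched roughly as follows. This is an illustration, not the 4.11 implementation: the function name `search_genes`, the dictionary layout, and the toy index are assumptions, and the sketch assumes matching is done on the beginning of the name. Only the MNX1-AS1/MAYA pairing comes from this thread; the alias lists for the other genes are left empty for brevity.

```python
from typing import Dict, List


def search_genes(term: str, index: Dict[str, List[str]]) -> List[str]:
    """Return gene symbols for a typeahead term.

    Symbols are tried first; aliases are consulted only when no symbol
    starts with the term. `index` maps gene symbol -> list of aliases.
    """
    t = term.upper()
    if not t:
        return []
    # Pass 1: gene symbols that begin with the term.
    symbol_hits = sorted(s for s in index if s.upper().startswith(t))
    if symbol_hits:
        return symbol_hits
    # Pass 2: fall back to genes that have an alias beginning with the term.
    return sorted(
        symbol
        for symbol, aliases in index.items()
        if any(alias.upper().startswith(t) for alias in aliases)
    )


# Toy index for illustration only (alias lists are incomplete).
TOY_INDEX = {
    "ATF5": [],
    "F5": [],
    "MAPK1": [],
    "MAX": [],
    "MNX1-AS1": ["MAYA"],
}

print(search_genes("F5", TOY_INDEX))   # symbol match; ATF5 is not a prefix match
print(search_genes("MA", TOY_INDEX))   # symbol matches; the MAYA alias is not consulted
print(search_genes("MAY", TOY_INDEX))  # no symbol matches, so the alias pass runs
```

Note the consequence of the fallback design: while any symbol matches, alias hits are hidden, which is why typing 'MA' does not list MNX1-AS1 but typing 'MAY' does.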

@tonydisera
Collaborator

tonydisera commented Jun 17, 2024

@AlistairNWard, you bring up an interesting point about the order of the gene names returned. In the new release, the search looks for a match based on the beginning of the gene name. Hopefully, this isn't too restrictive. If we returned all genes that match the term anywhere in the gene name, then the order of the genes returned would be more relevant. I agree with you that the user would want to see the 'closest' matches first, and the new behavior does satisfy this. For example, if the user searches on gene TAT, that exact match appears first in the list:
Screenshot 2024-06-17 at 5 39 44 PM

And on a related note, gene list order is dictated by the gene name, not the gene alias. So, for example, if the user enters MGC445, there are not any genes with this name, but there are gene aliases that start with this term. Notice that the genes are ordered alphabetically by the gene name (the name designated by RefSeq or Gencode).
Screenshot 2024-06-17 at 5 43 45 PM
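
As a rough sketch of that ordering rule, the snippet below collects alias hits and sorts them by gene symbol rather than by the alias that matched. The function name and the index are assumptions, and the gene names and aliases in `TOY_ALIAS_INDEX` are hypothetical placeholders, not real database entries.

```python
from typing import Dict, List, Tuple


def alias_hits_by_symbol(term: str, index: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (symbol, matched alias) pairs, ordered by gene symbol."""
    t = term.upper()
    hits = []
    for symbol, aliases in index.items():
        matched = sorted(a for a in aliases if a.upper().startswith(t))
        if matched:
            # Keep the first matching alias so the hit can be explained to the user.
            hits.append((symbol, matched[0]))
    # The sort key is the gene symbol, never the alias.
    return sorted(hits, key=lambda hit: hit[0])


# Hypothetical placeholder names and aliases, for illustration only.
TOY_ALIAS_INDEX = {
    "GENE_B": ["MGC4450"],
    "GENE_A": ["MGC4459"],
    "GENE_C": ["XYZ1"],
}

print(alias_hits_by_symbol("MGC445", TOY_ALIAS_INDEX))
```

Here GENE_A is listed before GENE_B even though its alias sorts later. The same plain alphabetical sort also accounts for the TAT case: a string sorts before any longer string that it prefixes, so among prefix matches an exact match comes first without special handling.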

There are many nuances to the gene search, so please feel free @AlistairNWard to play around with the new functionality on https://stage.gene.iobio.io. Overall, I'm happy with the new behavior, but my guess is that it may still need some refinement. Hopefully, this gets us closer to a solid gene search.
