galaxyproject / galaxy Public
improve tool panel search #2272
Comments
|
@erasche Galaxy recently started searching tool IDs as well. I think this could be improved by treating ' ', '_', and '-' as interchangeable. |
|
@martenson that's great! +1 for allowing users to substitute ' ' for '_'. I know I sometimes look for my tools by ID and fail to find them. |
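One way to read the suggestion above, as a minimal sketch: normalize both the query and the tool ID so that spaces, underscores, and hyphens compare equal. The function names `normalize` and `id_matches` are hypothetical, and this is not Galaxy's actual implementation.

```python
import re

# Treat ' ', '_' and '-' as one interchangeable separator class.
_SEPARATORS = re.compile(r"[ _\-]+")


def normalize(text: str) -> str:
    """Lowercase and collapse any run of ' ', '_' or '-' into a single space."""
    return _SEPARATORS.sub(" ", text.strip().lower()).strip()


def id_matches(query: str, tool_id: str) -> bool:
    """Return True if the normalized query occurs in the normalized tool ID."""
    normalized_query = normalize(query)
    if not normalized_query:
        # An empty query should not match every tool.
        return False
    return normalized_query in normalize(tool_id)
```

With this, `id_matches("ucsc table", "ucsc_table_direct1")` and `id_matches("ucsc-table", "ucsc_table_direct1")` would both succeed, so users would not need to remember which separator an ID uses.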
|
+1 to that, requiring users to know |
|
@martenson RFC: Things I would like to see indexed and available to search, along with my feelings on their boosts:
I just find myself frustrated when I cannot find the tool I want or the results are very limited because of what is searched upon. Of course, I do not know what the state of 16.07/dev is, have not gotten there yet. |
|
@erasche we have these boosts on Main and these are the defaults |
|
@martenson hey that's most of the things I need. In that case, then it would be nice to have more space to display this and where the search actually "hit". Apologies, have not been following along with this stuff closely enough to make informed comments. |
|
we have the 'hit' information but I did not figure out a good place to display it - related to the limited canvas |
|
I would add that it would be nice for the underlying tool (binary) name to be part of the search when it is wrapped under a slightly different tool name or short label of some type. Related or dependent binaries would be included as well (probably with a lower "boost"). |
|
xref #1084 |
|
from @erasche https://usegalaxy.eu/api/tools?q=compute+an+expression - 0 results |
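For anyone reproducing queries like the one above from a script, a minimal sketch of building the same search URL. Only the `/api/tools?q=` endpoint shape is taken from the link in this thread; the helper name `tool_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def tool_search_url(base_url: str, query: str) -> str:
    """Build the /api/tools search URL for a free-text query."""
    # urlencode uses quote_plus, so spaces in the query become '+'.
    return f"{base_url.rstrip('/')}/api/tools?{urlencode({'q': query})}"


url = tool_search_url("https://usegalaxy.eu", "compute an expression")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) and decoding the JSON body would then show what the server-side search returns, independent of the client.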
|
Another concrete issue: https://usegalaxy.eu/api/tools?q=peakachu returns 2 results, neither of which is displayed on the frontend. Client issue. |
|
@erasche I cannot reproduce |
|
Firefox on linux, cannot repro in chrome. |
|
two subsequent searches for
["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0"]
["toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0", "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2"] |
|
@erasche I can also reproduce this ~50% of the time in the UI, on both Firefox and Chrome on Linux. One of the web handlers probably hasn't reloaded the toolbox. |
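The two differing responses quoted earlier can be compared mechanically. A small sketch, assuming each response is a plain list of tool IDs as shown in this thread; `diff_results` is a hypothetical helper, not part of Galaxy.

```python
# The two result lists quoted earlier in this thread.
first = [
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0",
]
second = [
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.1",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.0",
    "toolshed.g2.bx.psu.edu/repos/rnateam/peakachu/peakachu/0.1.0.2",
]


def diff_results(a, b):
    """Return (only_in_a, only_in_b), preserving each list's order."""
    set_a, set_b = set(a), set(b)
    return [x for x in a if x not in set_b], [x for x in b if x not in set_a]


only_first, only_second = diff_results(first, second)
print(only_second)
```

Here the second response additionally contains the `0.1.0.2` version, which would be consistent with one handler serving an older toolbox than another.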
|
xref new issue for the display bug: #7238 |
|
another search term returning unexpected results: browser might not matter, same results using chrome or safari under mac osx (but didn't test firefox)
|
|
another search term:
It works. I tried with Firefox. I just discovered boosts! What can I set in order to get the same results as usegalaxy.eu? |
|
@FredericBGA .eu's boosts are here https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/group_vars/gxconfig.yml#L1076 but they're pretty aggressive / strange compared to other sites' |
@erasche thank you! The link in Martin's post above is broken. I will try with something between the defaults and .eu's settings. |
|
@FredericBGA we have this on Main atm We should probably experiment with I created a PR to mimic EU and enable ngram too: galaxyproject/usegalaxy-playbook#228 |
|
We use |
|
thank you all for sharing your config with me! |
Wonder if Main would benefit from that much higher boost, specifically. Searches are still a bit unpredictable and results too limited imho. Martin is probably on that already... is not new and we've tried a few variations already but still could use some tuning. It has to be frustrating to search for a tool and not find it -- as the stats he posted above back up. |
|
a question: is there really no way to express an AND between search terms? |
|
@wm75 not at the moment, can you please provide examples of searches that don't behave as you'd expect? |
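To make the question concrete, a sketch of what AND semantics could look like as a post-filter over candidate tools. This is purely illustrative: the function names and the sample tool records are invented, and nothing here reflects how Galaxy's search is implemented.

```python
def matches_all_terms(query: str, text: str) -> bool:
    """True only if every whitespace-separated query term is a word in text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return bool(terms) and all(term in words for term in terms)


def and_filter(query: str, tools: dict) -> list:
    """Keep the IDs of tools whose searchable text contains all query terms."""
    return [tool_id for tool_id, text in tools.items() if matches_all_terms(query, text)]


# Made-up sample data: tool ID -> searchable text.
sample_tools = {
    "a": "UCSC Main table browser",
    "b": "UCSC Archaea table browser",
    "c": "maintain main list",
}
print(and_filter("ucsc main", sample_tools))
```

With OR semantics all three sample tools would match "ucsc main"; with the AND filter only the first does.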
|
xref: #10030 |
|
"UCSC main" is unfindable on EU: https://usegalaxy.eu/api/tools?q=ucsc+main doesn't include ucsc_table_direct1, but it does include 150 other things. @bgruening It does on .org, but not nearly the top hit for a search on the exact tool title |
|
on EU searching |
|
right? saw that one too, and |
|
@hexylena maybe https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/webapps/galaxy/config_schema.yml#L1955 could improve things? |
|
Possibly! I just expected the tool_name boost to have the biggest effect. I would love to debug the internals sometime, and see what scores x boost are being returned for each of these results that are doing 'better' than the direct text match. Like, if those are returning first, clearly they say "ucsc main" dozens of times in their descriptions or something? |
|
@mvdbeek neat! How did you obtain that? |
|
Ahh ok, wondered if it was a secret api I was missing. |
|
so I booted up a copy of the app against EU because I always feel worried about reproducing locally with the v. different toolboxes. This looks odd to me: why does only or without trying out the individual fields of a search, seems like description is a negative in this case:
feels very odd that vcf scores higher. |
|
Some more debugging. So that's matching maintaing (hmm, I get why, but surely that should score lower than an exact word boundary match?) and doc 2546, which hits both main + ucsc, is indeed our tool: aha (ish). Changing orgroup from 0.1 to 0.9 doesn't produce a big difference. Oddly I've specified but old_id isn't anywhere there? It's when so they're all matching on the term So constructing my own weightings vs so name boost of 2 is worse than a name boost of 1? ucsc_table_direct1 goes from 8 to 5? Swapping the weights for name=1, help=2: like, are boosts inverse? Fixing description to 40, name=1 returns ucsc_table_direct1 with the same score but vcf_to_maf_customtrack1 is finally gone? Got ucsc main above for the first time: With... both terms boosted to 0.1. This seems like black magic? |
|
Boosts shouldn't be inverse: https://whoosh.readthedocs.io/en/latest/schema.html?highlight=boost#field-boosts (I am sorry I do not have time atm to dive into this) |
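As a reference point for the expectation stated here, a toy sketch of how a field boost behaves in a simple linear model: raising a field's boost can only raise the score of a document that hits that field. This is deliberately not Whoosh's scoring code; the `score` function and the hit counts are invented for illustration.

```python
def score(term_hits: dict, boosts: dict) -> float:
    """Sum per-field hit counts, each weighted by that field's boost (default 1.0)."""
    return sum(boosts.get(field, 1.0) * hits for field, hits in term_hits.items())


# Hypothetical hit counts for one document.
hits = {"name": 2, "description": 0, "help": 1}

low = score(hits, {"name": 1.0})   # name boost of 1
high = score(hits, {"name": 2.0})  # name boost of 2

print(low, high)
```

In a model like this a name boost of 2 cannot score worse than a name boost of 1 for a document matching on name, which is why the behavior reported in this thread looks like a bug rather than intended boost semantics.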
|
My thought too after reading the doc!! But it definitely seems to be behaving like it is? The only time I can get ucsc_table_direct1 to have a high score (10+) is when I do name=0.1, desc=0.1, rest=1 |
|
I am circling around a bug in whoosh's MultiWeighting class, which alters the scores in a nonsensical way. Haven't finished this though. |
|
Compare the results for 'snpeff eff': 0.1 name/desc → 25 vs 10.0 name/desc → 19 (edit: sorry, had an old help boost). Or the query "select lines that match an expression": 0.1/0.1 → Grep1 = 40.0, 1st place; 40/40 → Grep1 = 16, 2nd place. |
|
@mvdbeek did you have any more information about what that issue was with whoosh? |
|
So we deployed the new boosts on eu, to see how those work. I.... think they're a huge improvement? I was discussing with @shiltemann and her test query was 'group', expecting the full match of Grouping1 to be found. We need some way to rank by "this term or terms constitutes the entire name field", but I'm not sure how we'd accomplish that given that we currently break into individual words :/ |
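One possible way to approach "this term constitutes the entire name field" without changing the index is a post-ranking step that moves exact whole-name matches to the front. A minimal sketch; `rank_exact_name_first` is hypothetical and the tool records, including the names paired with each ID, are made up for the example.

```python
def rank_exact_name_first(query: str, results: list) -> list:
    """Move results whose full name equals the query to the front.

    results is a list of (tool_id, name) tuples in their original relevance
    order. sorted() is stable, so that order is kept within each group.
    """
    wanted = " ".join(query.lower().split())
    return sorted(results, key=lambda r: " ".join(r[1].lower().split()) != wanted)


# Made-up results in the order a search might return them.
results = [
    ("tool_a", "Group by taxonomy"),
    ("Grouping1", "Group"),
    ("tool_b", "Ungroup"),
]
ranked = rank_exact_name_first("group", results)
print([tool_id for tool_id, _ in ranked])
```

The sort key is `False` for exact matches and `True` otherwise, so exact matches sort first while partial matches keep their relative order.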
|
@bgruening provides 'tail-to-head' which doesn't return useful things (but don't know about before.) and same for @wm75 provides
|
|
Not sure if this is the right place to report issues with the search, but trying to find |
|
Putting this here after some interactions at CoFest. Search functions have definitely improved with updates, but there are still cases where the search results could be better. I think it is intentional for the results to include potentially less relevant hits, to assist with tool discovery and to help with spelling errors/choices. But if the weighting more obviously favored the tool name over the description, this would help a lot with generic search terms. Our users at the CPT Galaxy would prefer a stricter (smaller) search result, and we don't even have as many tools as the larger public Galaxies! Perhaps this is already implemented, but it isn't terribly transparent how the tool search works, and it is hard to pick out logical patterns in the order of the returned list by eye (tool name relevance, alphabetical, popularity/use?).
For example, in our CPT Galaxy, where we're running 20.05 (I realize that this does not have all the latest fixes discussed in this issue; @hexylena, ngram searching is enabled), when I searched fasta looking for a tool called Remove FASTA Sequences from .gff3 File, it was 19th in the list, with various tools that do NOT have that string in the name ahead of it. At usegalaxy.org when I search align, I get this, At usegalaxy.eu, when I search for genome, the list includes assembly tools like Bowtie2 and Spades pretty far down in the results. That is still the case when I search for genome assembly.
Maybe all these issues can be ameliorated with better tool metadata. New users, and users doing new analyses, will search for tools whose names they don't necessarily know. Tool organization is pretty good, and while it is great to have many tool options, having very many tools also makes it hard to discover new ones without consistent help from the search function. Perhaps a 'close match' and 'related match' scenario would vastly improve the overall user experience and make it easier to discover just the right tools? |
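The 'close match' / 'related match' idea could be sketched as a two-tier split of results: tools whose name contains every query term, and tools that only match elsewhere. Everything below is an assumption for illustration, including the function name, the tool IDs, and the descriptions; only the tool name "Remove FASTA Sequences from .gff3 File" comes from the comment above.

```python
def split_matches(query: str, tools: list) -> tuple:
    """Split tools into (close, related) lists of tool IDs.

    tools is a list of (tool_id, name, description) tuples.
    close:   every query term appears in the tool name.
    related: at least one term appears in the name or the description.
    """
    terms = query.lower().split()
    close, related = [], []
    if not terms:
        return close, related
    for tool_id, name, description in tools:
        name_l, desc_l = name.lower(), description.lower()
        if all(term in name_l for term in terms):
            close.append(tool_id)
        elif any(term in name_l or term in desc_l for term in terms):
            related.append(tool_id)
    return close, related


# Hypothetical records; IDs and descriptions are invented.
sample = [
    ("aligner", "Align reads", "Align reads given in FASTA format"),
    ("remove_fasta", "Remove FASTA Sequences from .gff3 File", "Strip embedded sequences"),
    ("sorter", "Sort", "Sort a tabular dataset"),
]
close, related = split_matches("fasta", sample)
print(close, related)
```

Displaying the close tier first and the related tier under a separate heading would keep discovery-oriented hits available without burying tools whose name matches the query.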












reported by @jennaj
given the number of tools on Main, the search results need to be better, mainly:
I am trying to address the first two (for Main) with: galaxyproject/usegalaxy-playbook#19