Bulk access to documents using bulk input #69

kltm · 2014-02-26T18:37:50Z

The bulk download of annotations with gene product input is an occasionally used feature of AmiGO 1.x that I suspect might be missed.

The easiest implementation would be a series of queries to the GOlr server, with the results hashed on the perl side as we go to get rid of dupes.
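
For illustration only, a minimal sketch of that batch-and-hash idea. It is written in Python rather than perl for brevity, and `fetch_batch`, the `"id"` key, and the batch size are all assumptions standing in for whatever a real GOlr request and document would look like:

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, str]


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_unique(
    gene_products: List[str],
    fetch_batch: Callable[[List[str]], List[Doc]],
    batch_size: int = 50,
) -> List[Doc]:
    """Run one query per batch and keep the first copy of each document.

    `fetch_batch` is a hypothetical stand-in for a single GOlr request.
    Documents are hashed on their "id" field (an assumption) to drop dupes.
    """
    seen: Dict[str, Doc] = {}
    for batch in chunked(gene_products, batch_size):
        for doc in fetch_batch(batch):
            seen.setdefault(doc["id"], doc)
    # dicts preserve insertion order, so results come back first-seen first.
    return list(seen.values())
```

Passing the fetch function in keeps the de-duplication logic testable without a live server.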

kltm · 2014-03-18T15:52:47Z

Similarly, a user has expressed interest in being able to create term ID lists:

http://jira.geneontology.org/browse/GO-360
http://amigo1.geneontology.org/cgi-bin/amigo/search.cgi?search_constraint=term&termfields=acc&ont=all&gptype=all&speciesdb=all&taxid=all&evcode=all&action=new-search&search_query=GO:0000166%0A%0DGO:0001893%0A%0DGO:0005080%0A%0DGO:0005625%0A%0DGO:0005819%0A%0DGO:0005975%0A%0DGO:0006417

kltm · 2014-03-28T01:13:25Z

Also:
http://jira.geneontology.org/browse/GO-359

kltm · 2014-04-14T23:10:33Z

I think this would be better done with the batching on the server side--more robust and we wouldn't have to deal with sync, as well as giving easier access in a more traditional way (bulk remote clients and bots). I'll have to check if the perl API is up for it. I would also have to consider if the linking library was sufficient in the perl API.

kltm · 2014-04-18T17:56:22Z

http://jira.geneontology.org/browse/GO-389


kltm · 2014-05-03T00:29:57Z

Looks like at least part of the functionality is going to be frustrated by berkeleybop/bbop-js#17.
A separate accordion widget is a bit of a thing on its own, but I'm not sure how to proceed with complete functionality without it...

kltm · 2014-05-05T03:48:37Z

Creating something that makes sense in the context of AmiGO 2 is taking rather more work than expected, mostly due to a digression with berkeleybop/bbop-js#17, which itself is more work than expected. Thinking about bumping this to its own milestone and getting what's in there out as fast as possible.


kltm · 2014-05-05T23:37:33Z

[Will edit this list in place.]
I would guess another very full day or two days for the JS frontend; still need:

- ~~live_filters spinner~~
- new bs3 filter_shield (can use bs3 progress as spinner here)
- search field selection for IDs
- results selection
  - in-browser results in light form (a la the primitive search's handler)
  - TSV download (proxied through the perl)

After all of that, the backend work should be easier:

- clumping (or not) of the queries
- results production

I want to say three days total, but am a little leery after how much time had to be sunk into the accordion.


kltm · 2014-05-22T20:55:14Z

http://jira.geneontology.org/browse/GO-429

kltm · 2014-08-29T00:25:28Z

After some discussion with @cmungall and @hdietze, will explore (again?) if batching is really necessary. And the perl client?

To get this out the door faster, it may be that we should just switch to POST and set the bar fairly low, putting all of the responsibility on the JS client to compose a large query and then setting the limits to something reasonable through experimentation.
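
A rough sketch of what composing one large query for a POST body could look like, shown in Python rather than the JS client for brevity. The `MAX_IDS` cap, the function names, and the parameter choices are assumptions for illustration; the real limit would come from the experimentation mentioned above:

```python
from typing import Dict, List
from urllib.parse import urlencode

MAX_IDS = 1000  # hypothetical cap; the real value would be found by experiment


def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def compose_bulk_params(field: str, ids: List[str], rows: int = 100) -> Dict[str, str]:
    """Build Solr parameters that match any of `ids` in `field`."""
    if not ids:
        raise ValueError("no identifiers supplied")
    if len(ids) > MAX_IDS:
        raise ValueError(f"too many identifiers: {len(ids)} > {MAX_IDS}")
    clause = " OR ".join(quote_term(i) for i in ids)
    return {"q": "*:*", "fq": f"{field}:({clause})", "rows": str(rows), "wt": "json"}


def encode_post_body(params: Dict[str, str]) -> bytes:
    """Form-encode the parameters so they travel in the POST body, not the URL."""
    return urlencode(params).encode("utf-8")
```

Sending the parameters in the body sidesteps URL length limits, which is the main reason a long ID list is awkward over GET.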

kltm · 2014-08-29T06:37:02Z

Well, looking at the issues and requests above again, I think I can explain the rationale better this time around.

One set of issues that users are having seems to be along the lines of: I want to put in a bunch of term ids and get a listing of those terms I can work with (link, download, etc.). This is essentially the way the current bulk interface prototype is heading, and would probably be workable for non-huge numbers by composing large queries (will require some new stuff in the manager, but probably not too bad).

That said, this is not the actual issue open here. The other set of issues (this issue) is along the lines of: I want to put in random gps and get annotation data; I want to put in terms and get gp (annotation) data. This would require the Solr equivalent of an RDB join, and I think can only practically be done with an initial query to get the "key ids" from one doctype and then using another query to get the wanted data from the target doctype.
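
The two-query pattern described here could be sketched as follows. The `search(doctype, field, values)` callable is a hypothetical stand-in for a Solr request, and the doctype and field names are taken from this thread on the assumption that they apply; none of this is AmiGO's actual code:

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
# Hypothetical signature: search(doctype, field, values) -> matching documents.
Search = Callable[[str, str, List[str]], List[Doc]]


def two_step_lookup(symbols: List[str], search: Search) -> List[Doc]:
    """Emulate a join: gene product symbols -> key ids -> annotation documents."""
    # Step 1: resolve the input symbols to key ids in the bioentity doctype.
    gp_docs = search("bioentity", "bioentity_label", symbols)
    key_ids = sorted({doc["id"] for doc in gp_docs})
    if not key_ids:
        return []
    # Step 2: use those key ids to pull the wanted data from the target doctype.
    return search("annotation", "bioentity", key_ids)
```

The cost is two round trips, and the second query grows with the number of keys found in the first.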

In large or complicated cases, this is maybe not something that one would want to handle in a single pass from the client (time), and breaking it up (a la the matrix tool) is rather unwieldy in practice. Reading this thread though, it seems my final deciding factor for wanting to do it on the server was to make it easier to directly create links and kick-ins for the service, so that these bulk pages could be treated in much the same way that term and gp pages are currently treated.

Shelving those reasons for the moment though, if one was willing to sacrifice easy kick-ins and the single-step satisfaction of going straight from gp symbols to terms, a possible workable interface would be:

- nice bulk search: live filters, search fields, and bulk input (essentially the current bulk mock-up)
  - maybe needs a download field selector too?
- clicking search will produce the results (maybe just a preview, 100?)
- these results will include buttons or links (a la current live search pages) that have options like: download, get direct annotations for these IDs, get all annotations for these IDs

One thing that would be lost in an interface like this, at least in the beginning, would be filtering on the second step (e.g. with these term ids, give me all annotations with this evidence); but you can imagine either extending this or feeding it into itself (we'd need kick-ins to get things like TE results links to work) so that somebody could take multiple steps through the different document types, filtering and joining with the next one. Sort of a shopping cart of the moment.

If this makes sense, I think this might be a way to go for now: we'd get some parts running immediately, and could grow it out into other needed functionality (at the cost of single steps). It would also mean that the perl bits could be ignored for a while longer (possibly allowing us to stall long enough to get rid of them completely). If it sounds right, I'll add another issue for basic bulk search, and let this be the second step.

cmungall · 2014-08-29T19:15:24Z

Re: joins - the golr documents are denormalized and should not require joins for non-boutique queries. E.g. fetching annotations or entities by term ids would be achieved via the closure field. It may be the case that performance would be poor, but no join required.
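
As a sketch of the closure approach: a single filter on the denormalized closure field selects annotations for a set of term ids, with no second query. The field names `regulates_closure` and `annotation_class` and the `document_category` filter appear elsewhere in this thread; the function itself and its `use_closure` flag are illustrative assumptions:

```python
from typing import List


def annotation_filters(term_ids: List[str], use_closure: bool = True) -> List[str]:
    """Return Solr filter queries selecting annotations for the given terms.

    With `use_closure`, match on the denormalized closure field, so an
    annotation to a more specific term is returned as well. Otherwise match
    only annotations made directly to the listed terms.
    """
    if not term_ids:
        raise ValueError("no term ids supplied")
    field = "regulates_closure" if use_closure else "annotation_class"
    quoted = " OR ".join(f'"{term_id}"' for term_id in term_ids)
    return ['document_category:"annotation"', f"{field}:({quoted})"]
```

For example, `annotation_filters(["GO:0043405"])` gives a category filter plus `regulates_closure:("GO:0043405")`.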

kltm · 2014-08-29T20:53:00Z

Talked again; the current plan is to 1) get the bulk download working and 2) get more gp information into the association doctype (which should meet most of the needs for most of the users without needing complicated "join" code). Will change title to reflect this.

Addresses some parts of #69 Removed dead link to wiki (in any case all help should be inline). Changed text to remove reference to personalities. Still need to better explain how this works

cmungall · 2016-04-28T00:45:41Z

Some comments on current status of bulk search (not sure if this is redundant with some of the extensive info above).

GO ID use case

This is very useful when you have a set of GO IDs
It's not completely clear what box to click on the right. "GO class (direct) (annotation_class)" is pretty opaque. And it doesn't do what is expected. It appears to use the closure (which is actually what most users want, even if they don't have the language to express it)
Arriving at a list of GO IDs is obviously a challenge for many users, be awesome when this is hooked up, e.g. shopping carts
Entering in GO term labels is fun, but again most people don't know the strings by memory, and getting the required list may be a challenge for some. Hooking up to the SG annotator would be awesome here.

Gene use case

there are checkboxes for bioentity label and bioentity name, but none for the ID. Most will have IDs
Guaranteed that all kinds of mad IDs will be entered. If we are serious about this functionality we should be sure we have similar behavior for term enrichment (the gene use case can be seen as a degenerate case of TE)

Genericity

It's great that this is driven by the schema metadata... from a CS perspective. But I think the experience will be alienating for a user here. Perhaps a more fruitful long term approach would be grebe with the ability to enter lists?

Breaking resolution into a separate step

There is no way for the user to see which subset of entered IDs resolve. Really resolution and bulk query are separate concerns. There are many cases where we might want to plug in the resolution part (e.g. TE) and many places where we want to include the bulk query part.
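
A minimal sketch of resolution as its own step, reporting which inputs resolve before any bulk query is run. The `known` lookup table is a hypothetical stand-in for whatever service maps labels or IDs to canonical IDs, and the sketch assumes whitespace-separated single-token inputs:

```python
from typing import Dict, List, NamedTuple


class Resolution(NamedTuple):
    resolved: Dict[str, str]  # input string -> canonical ID
    unresolved: List[str]     # inputs with no match, in input order


def resolve_inputs(raw: str, known: Dict[str, str]) -> Resolution:
    """Split pasted text into tokens and report which ones resolve.

    `known` is a hypothetical lookup table (label or ID -> canonical ID).
    """
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    # dict.fromkeys de-duplicates the tokens while keeping input order.
    for token in dict.fromkeys(raw.split()):
        if token in known:
            resolved[token] = known[token]
        else:
            unresolved.append(token)
    return Resolution(resolved, unresolved)
```

The resolved IDs can then feed either a bulk query or term enrichment, while the unresolved list is what would be shown back to the user.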

Text annotation can be seen as a special case of resolution. For example, try the default query here:
https://monarchinitiative.org/annotate/text

You should end up with a box that says "35 terms found" and a button to search with these 35.

Another scenario is where the user enters IDs one at a time, e.g.
https://monarchinitiative.org/analyze/phenotypes/

Both text annotation (for terms or genes) and autocomplete based list building are equally useful for GO

kltm · 2016-04-28T00:58:12Z

(The plain ID is just "bioentity"--it's being driven off of the metadata for other displays.)

There is a bit of a question of scope here. There is unarguably a power-tool aspect to this (it obviously needs inside knowledge), but it does possibly fill a niche as is. It was relatively easy to create given that it is bootstrapped from the metadata, but it is obviously not particularly useful to most users (hence it never graduating from labs).

I think it would be good to determine if the current tool, or something similar, has any prospects (it was made for an initial narrow use case) and then try and work out from there. If a grebe-ified version would be more useful to users, we should probably just branch it off and try again now that the troubling parts work better.

Once label and ID resolution is brought in, I think we should scrap this and try for a new tool--there is just so much packed into that that there is unlikely to be much that could be salvaged from here.

The cart chugging along again would be so nice...

kltm · 2016-05-23T22:08:19Z

Still interest here: http://jira.geneontology.org/browse/GO-1224
We really should just go ahead and add this at some point, or something; @mcourtot does the tool as it stands look usable to you for some use cases?
http://tomodachi.berkeleybop.org/amigo/bulk_search/annotation
http://tomodachi.berkeleybop.org/amigo/bulk_search/ontology
http://tomodachi.berkeleybop.org/amigo/bulk_search/bioentity

mcourtot · 2016-05-24T10:45:03Z

Hi Seth,

I had a quick look and it is quite unclear what the categories are (in the attached screenshot); what is the difference between the first and fourth checkboxes for example?

Also re download, it would be good to have 'download as GAF, GPAD, txt' or similar instead of the column choice, maybe? This is a bit overwhelming.

Finally, when searching for GO IDs, QuickGO allows searching for the exact terms, or for the terms and their descendants (the latter being the default, typically desired, behaviour), while AmiGO seems to search only the exact ID.

In short, the functionality seems to be pretty much there, but the UI is a bit too complex IMHO.

Cheers,
Melanie

kltm · 2016-05-24T18:11:41Z

There could be some simplification added (e.g. GAF download on the annotation bulk download), and certainly better explanatory text for the different fields, but fundamentally the reason we essentially get these bulk widgets and pages for "free" is that they run right along the software patterns that are used underneath.

vanaukenk · 2016-07-08T15:13:29Z

There is still interest in this feature:
http://jira.geneontology.org/browse/GO-1277

kltm · 2018-05-14T14:56:10Z

Talking to @hattrill: for the annotation search we should default to checked boxes for going from gene names/ids to terms.

kltm · 2018-07-12T17:58:37Z

Note that these exist in a hidden "live" form:
Gene/product/bioentity: http://amigo.geneontology.org/amigo/bulk_search/bioentity
Annotation: http://amigo.geneontology.org/amigo/bulk_search/annotation
Ontology term/class: http://amigo.geneontology.org/amigo/bulk_search/ontology

pgaudet · 2018-07-12T20:16:54Z

Nice ! Can we add links from the 'Search' menu?

pgaudet · 2018-07-12T20:18:19Z

Although you might want to remove 'This should not be displayed (bioentity_internal_id)' ...
(although it's also in the AmiGO download)

kltm · 2018-07-12T20:39:26Z

@pgaudet This feature is still in development--this is still the same as shown at the meeting.

ValWood · 2018-07-13T08:57:21Z

did not work with input
SPAC23D3.14c
SPCC63.02c
SPAC4F10.02
SPAC3H5.04
SPBC359.03c
SPBC2D10.18
SPAC3F10.11c
SPBC359.05

kltm · 2018-07-13T17:13:37Z

Hm. Looking at just "SPBC359.05", I couldn't find it in the "normal" AmiGO either. Is it possible we currently don't have those identifiers? Could you give me an ID for "SPBC359.05"?

ValWood · 2018-07-13T21:44:57Z

yes, thinking about it I didn't see yeast either when I was looking for C. elegans, so it was probably the same issue...

ukemi · 2018-09-28T15:17:23Z

[edited by @kltm ]

Testing: Gene/product/bioentity: http://amigo.geneontology.org/amigo/bulk_search/bioentity

The search should be made case insensitive. Using the Gene/product (bioentity_label) and searching on Trib3 should return the human TRIB3 gene as well as mouse and rat.
To make things clearer, I would relabel the choices:
Gene/product (bioentity)----Gene Identifier using a prefix:identifier format (eg MGI:MGI:1345675 or PomBase:SPCC63.02c)
Gene/product (bioentity_label)---- Gene Symbol (eg Trib3)
Gene/product name (bioentity_name)---- Gene or protein name
This should not be displayed (bioentity_internal_id)------- get rid of this choice
Synonyms (synonym)----- Synonyms
Involved in (isa_partof_closure_label)----- 'GO term' without genes that are annotated to 'regulation of GO term' I tried 'heart development' and 'limb development' no results. Also tried GO identifiers with no results returned.
Inferred annotation (regulates_closure)---- 'GO term' including genes annotated to 'regulation of GO term'. I tried GO:0043405 and got back a list of genes that don't make sense.
Inferred annotation (regulates_closure_label)---- What is the difference between this and the choice above this?
PANTHER family (panther_family)---- This is useful, but doesn't seem to be working
PANTHER family (panther_family_label)-----This is useful, but doesn't seem to be working
Organism (taxon_label)----- How useful is this? If it works, won't it return all the genes for a given organism?
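
The case-insensitivity request in the first item can be illustrated with a small in-memory sketch. In Solr itself this would normally be handled by how the field is analyzed rather than in client code, so the functions and the `(identifier, symbol)` input shape below are assumptions meant only to show the intended behaviour:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_symbol_index(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each case-folded symbol to every identifier that carries it.

    `entries` is a hypothetical iterable of (identifier, symbol) pairs.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for identifier, symbol in entries:
        index[symbol.casefold()].append(identifier)
    return dict(index)


def lookup_symbol(index: Dict[str, List[str]], symbol: str) -> List[str]:
    """Return all identifiers whose symbol matches, ignoring case."""
    return index.get(symbol.casefold(), [])
```

With an index like this, a search on "Trib3" and a search on "TRIB3" return the same set of identifiers across species.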

kltm · 2018-09-28T20:25:42Z

@ukemi to make it easier, I've numbered your list for reference.

Hm, that's a bit nasty--thank you for catching it. I've spun it out into its own ticket: Bulk search's search narrowing fails due to missing "*_searchable" #540 . There is a possibility this will not be ready before the meeting; I'll dig in later today and see what the options are.
n/a
Changes to these descriptions will change them uniformly in all AmiGO displays and hovers (e.g. hovering over a facet/filter in the main annotation search); would you be okay with that?
As 3 + include the "Trib3" example?
As 3
note: strike
As 3
Hm...would you find it acceptable to not have free-form entry of GO term labels, but just IDs? The mechanism I used here is not great, but I don't see much way around it for the moment (suggest: remove this not necessary for functionality); this is tangentially related to the issues in 1.
Well, fundamentally, the idea is that we have different closures available and this allows choice in search; in practice, maybe it would make more sense to only offer the one that is currently default in AmiGO (regulates)? Either way, looking at "GO:0043405", I get the same results as I do with http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=regulates_closure_label:%22regulation%20of%20MAP%20kinase%20activity%22&sfq=document_category:%22bioentity%22 , and not a million miles off of the annotation data it is derived from http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=regulates_closure:%22GO:0043405%22&sfq=document_category:%22annotation%22
This works with "full" identifiers, e.g. "PANTHER:PTHR24249". As 3 and maybe 9.
This is the same issue as 8, etc. and I would suggest removal (see below)
The idea behind 12 is to give people who are using a "bag of symbols" a way of filtering them down to the species that they want if there are symbol collisions

Okay, you've found a lot of interesting quirks that may or may not be showstoppers depending on the use case we're attacking.
In my original vision, this is a tool for people who already know what they want: e.g. all annotations to species X to term set [A, B, C], all terms to symbol set [D, E, F]. Looking at this again, the more flexible this system is, the more quirky/weird/unusable it gets--it gets very hard to differentiate free-form strings in this context and have the end user predict what they may get back in certain cases.
Would this tool still be useful if it restricted itself to dealing with just symbols and identifiers (entities that are single token strings)?
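
A sketch of what that restriction might look like at the input stage: each line is classified, and anything that is not a single token is set aside. The regular expression for "prefix:identifier" strings and the category names are assumptions for illustration, not an existing AmiGO rule:

```python
import re
from typing import List, Tuple

# Assumed shape of an identifier: "prefix:local" with no whitespace,
# e.g. "PomBase:SPCC63.02c" or "MGI:MGI:1345675".
CURIE = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def classify_line(line: str) -> str:
    """Label one input line as 'identifier', 'symbol', 'free-form', or 'empty'."""
    text = line.strip()
    if not text:
        return "empty"
    if len(text.split()) > 1:
        return "free-form"
    return "identifier" if CURIE.match(text) else "symbol"


def split_input(raw: str) -> Tuple[List[str], List[str]]:
    """Return (accepted single-token entries, rejected free-form lines)."""
    accepted: List[str] = []
    rejected: List[str] = []
    for line in raw.splitlines():
        kind = classify_line(line)
        if kind in ("identifier", "symbol"):
            accepted.append(line.strip())
        elif kind == "free-form":
            rejected.append(line.strip())
    return accepted, rejected
```

Under this rule "Trib3" and "GO:0043405" would be accepted, while a label such as "heart development" would be rejected up front instead of producing hard-to-predict results.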

kltm added enhancement and removed bug (C: surface issue) labels Feb 26, 2014

kltm added this to the 2.1 milestone Apr 24, 2014

kltm added a commit that referenced this issue May 2, 2014

changed internal rep of search personalities to be a little more flexible; wired in bulk search stubs modelled on live search (menus, etc.); #69

47e7cfe

kltm added a commit that referenced this issue May 2, 2014

changed internal rep of search personalities to be a little more flexible; wired in bulk search stubs modelled on live search (menus, etc.); #69

dd5430f

kltm mentioned this issue May 5, 2014

make the filter accordion an independent widget berkeleybop/bbop-js#17

Closed

kltm referenced this issue in berkeleybop/bbop-js May 5, 2014

technically a fully working independent accordion filter widget, although a lot of niceties in the ui and bs3 conversion still needs to be done; #17; kltm/amigo#69 #26

97a73e6

kltm modified the milestones: 2.2, 2.1 May 5, 2014

kltm referenced this issue in berkeleybop/bbop-js May 6, 2014

optional in-widget wait spinner; working towards completing kltm/amigo#69

1e52775

kltm added a commit that referenced this issue May 6, 2014

optional in-widget wait spinner; working towards completing #69

809aaf2

This was referenced Aug 13, 2014

Add links to gene product list for sample frequency count in RTE (term enrichment) #140

Open

GO Slim mapper error (slimmer) #143

Closed

Reimplement slimmer functionality #144

Open

kltm changed the title ~~Bulk download of annotations with gene product input~~ Bulk access to documents using bulk input Aug 29, 2014

kltm mentioned this issue Sep 16, 2014

list upload feature in AmiGO #148

Closed

kltm added a commit that referenced this issue Sep 16, 2014

frontend input probably done for #69

7858432

kltm removed this from the 2.3 milestone Aug 14, 2015

kltm modified the milestones: 2.4, 2.5 Mar 2, 2016

cmungall mentioned this issue Apr 28, 2016

Updating inline docs for bulk search #351

Merged

kltm mentioned this issue Jun 6, 2018

Performing multi-gene searches in AmiGO #508

Closed

kltm mentioned this issue Jul 12, 2018

Get list of genes associated with at least one of the input GO ID list? geneontology/helpdesk#142

Closed

kltm mentioned this issue Jul 13, 2018

C. elegans data missing from AmiGO #516

Closed

kltm removed this from the 2.5 milestone Sep 13, 2018

kltm mentioned this issue Sep 28, 2018

Bulk search's search narrowing fails due to missing "*_searchable" #540

Open

kltm added this to TODO in AmiGO 2.7 via automation Oct 23, 2018

kltm moved this from TODO to In progress in AmiGO 2.7 Oct 23, 2018
