Bulk access to documents using bulk input #69

Open
kltm opened this issue Feb 26, 2014 · 36 comments

@kltm
Member

kltm commented Feb 26, 2014

The bulk download of annotations with gene product input is an occasionally used feature of AmiGO 1.x that I suspect might be missed.

The easiest implementation would be a series of queries to the GOlr server, with the results hashed on the perl side as we go to get rid of dupes.
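
A minimal sketch of that "series of queries, hashed to drop dupes" idea, written in Python rather than perl purely for illustration. `fetch_batch` is a hypothetical stand-in for whatever actually queries GOlr, and the assumption that each returned document carries a unique `id` key is mine, not something confirmed above.

```python
from typing import Callable, Dict, Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of `ids`, each with at most `size` entries."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def bulk_fetch(
    ids: List[str],
    fetch_batch: Callable[[List[str]], List[dict]],  # hypothetical query function
    batch_size: int = 50,
) -> List[dict]:
    """Run one query per batch and keep the first copy of each document."""
    seen: Dict[str, dict] = {}
    for batch in chunked(ids, batch_size):
        for doc in fetch_batch(batch):
            # Hash on the document id so repeats across batches are dropped.
            seen.setdefault(doc["id"], doc)
    return list(seen.values())
```

Results come back in first-seen order, since Python dicts preserve insertion order.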

@kltm
Member Author

kltm commented Mar 28, 2014

@kltm
Member Author

kltm commented Apr 14, 2014

I think this would be better done with the batching on the server side--more robust and we wouldn't have to deal with sync, as well as giving easier access in a more traditional way (bulk remote clients and bots). I'll have to check if the perl API is up for it. I would also have to consider if the linking library was sufficient in the perl API.

@kltm
Member Author

kltm commented Apr 18, 2014

@kltm kltm added this to the 2.1 milestone Apr 24, 2014
kltm added a commit that referenced this issue May 2, 2014
…ible; wired in bulk search stubs modelled on live search (menus, etc.); #69
@kltm
Member Author

kltm commented May 3, 2014

Looks like at least part of the functionality is going to be frustrated by berkeleybop/bbop-js#17.
A separate accordion widget is a bit of a thing on its own, but I'm not sure how to proceed with complete functionality without it...

@kltm
Member Author

kltm commented May 5, 2014

Creating something that makes sense in the context of AmiGO 2 is taking rather more work than expected, mostly due to a digression with berkeleybop/bbop-js#17, which itself is more work than expected. Thinking about bumping this to its own milestone and getting what's in there out as fast as possible.

kltm referenced this issue in berkeleybop/bbop-js May 5, 2014
…ough a lot of niceties in the ui and bs3 conversion still needs to be done; #17; kltm/amigo#69 #26
@kltm kltm modified the milestones: 2.2, 2.1 May 5, 2014
@kltm
Member Author

kltm commented May 5, 2014

[Will edit this list in place.]
I would guess another very full day or two days for the JS frontend; still need:

  • live_filters spinner
  • new bs3 filter_shield (can use bs3 progress as spinner here)
  • search field selection for IDs
  • results selection
    • in-browser results in light form (a la the primitive search's handler)
    • TSV download (proxied through the perl)

After all of that, the backend work should be easier:

  • clumping (or not) of the queries
  • results production

I want to say three days total, but am a little leery after how much time had to be sunk into the accordion.

@kltm
Member Author

kltm commented May 22, 2014

@kltm
Member Author

kltm commented Aug 29, 2014

After some discussion with @cmungall and @hdietze, will explore (again?) if batching is really necessary. And the perl client?

To get this out the door faster, it may be that we should just switch to POST and set the bar fairly low, putting all of the responsibility on the JS client to compose a large query and then setting the limits to something reasonable through experimentation.
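
As a rough sketch of what "compose a large query and POST it" could look like (shown in Python for brevity, although the real client would be JS): the `q`, `fq`, `rows`, and `wt` parameters are standard Solr request parameters, while `MAX_IDS`, the function names, and the limit value are hypothetical placeholders that would need to be settled by the experimentation mentioned above.

```python
from typing import List, Tuple
from urllib.parse import urlencode

MAX_IDS = 1000  # hypothetical cap; the real limit would come from experimentation


def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def bulk_query_params(
    field: str, ids: List[str], doc_category: str, rows: int = 100
) -> List[Tuple[str, str]]:
    """Build Solr parameters that OR together all of the entered ids."""
    # Trim whitespace, drop blanks, and de-duplicate while keeping order.
    unique = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    if not unique:
        raise ValueError("no identifiers given")
    if len(unique) > MAX_IDS:
        raise ValueError(f"too many identifiers (limit {MAX_IDS})")
    clause = " OR ".join(quote_term(i) for i in unique)
    return [
        ("q", "*:*"),
        ("fq", f'document_category:"{doc_category}"'),
        ("fq", f"{field}:({clause})"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]


def encode_post_body(params: List[Tuple[str, str]]) -> str:
    """Form-encode the parameters for use as a POST body."""
    return urlencode(params)
```

Sending the encoded string as an `application/x-www-form-urlencoded` POST body sidesteps URL length limits that a GET with hundreds of ids would likely hit.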

@kltm
Member Author

kltm commented Aug 29, 2014

Well, looking at the issues and requests above again, I think I can explain the rationale better this time around.

One set of issues that users are having seems to be along the lines of: I want to put in a bunch of term ids and get a listing of those terms I can work with (link, download, etc.). This is essentially the way the current bulk interface prototype is heading, and would probably be workable for non-huge numbers by composing large queries (will require some new stuff in the manager, but probably not too bad).

That said, this is not the actual issue open here. The other set of issues (this issue) is along the lines of: I want to put in random gps and get annotation data; I want to put in terms and get gp (annotation) data. This would require the Solr equivalent of an RDB join, and I think can only practically be done with an initial query to get the "key ids" from one doctype and then using another query to get the wanted data from the target doctype.
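
To make the two-query pattern concrete, here is a minimal Python sketch. `search` is a hypothetical callable taking a doctype, a field, and a list of values; the assumption that annotation documents carry a `bioentity` field holding the gene product id is mine and should be checked against the actual schema.

```python
from typing import Callable, List

# Hypothetical signature: search(doctype, field, values) -> list of documents.
SearchFn = Callable[[str, str, List[str]], List[dict]]


def two_step_lookup(search: SearchFn, gp_symbols: List[str]) -> List[dict]:
    """Emulate a join: gene product symbols -> key ids -> annotation documents."""
    # Step 1: collect the "key ids" from the bioentity doctype.
    key_docs = search("bioentity", "bioentity_label", gp_symbols)
    key_ids = sorted({doc["id"] for doc in key_docs})
    if not key_ids:
        return []
    # Step 2: use those ids to pull the wanted data from the target doctype.
    return search("annotation", "bioentity", key_ids)
```

The cost is visible in the shape of the code: two round trips, with the second query growing with the number of keys found in the first.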

In large or complicated cases, maybe not something that one would want to handle in a single pass from the client (time), and breaking it up (a la the matrix tool) is rather unwieldy in practice. Reading this thread though, it seems my final deciding factor for wanting to do it on the server was to make it easier to directly create links and kick-ins for the service, so that these bulk pages could be treated in much the same way that term and gp pages are currently treated.

Shelving those reasons for the moment though, if one was willing to sacrifice easy kick-ins and the single-step satisfaction of going straight from gp symbols to terms, a possible workable interface would be:

  • nice bulk search: live filters, search fields, and bulk input (essentially the current bulk mock-up)
    • maybe needs a download field selector too?
  • clicking search will produce the results (maybe just a preview, 100?)
  • these results will include buttons or links (a la current live search pages) that have options like: download, get direct annotations for these IDs, get all annotations for these IDs

One thing that would be lost in an interface like this, at least in the beginning, would be filtering on the second step (e.g. with these term ids, give me all annotations with this evidence); but you can imagine either extending this or feeding it into itself (we'd need kick-ins to get things like TE results links to work) so that somebody could take multiple steps through the different document types, filtering and joining with the next one. Sort of a shopping cart of the moment.

If this makes sense, I think this might be a way to go for now: we'd get some parts running immediately, and could grow it out into other needed functionality (at the cost of single steps). It would also mean that the perl bits could be ignored for a while longer (possibly allowing us to stall long enough to get rid of them completely). If it sounds right, I'll add another issue for basic bulk search, and let this be the second step.

@cmungall
Member

Re: joins - the golr documents are denormalized and should not require joins for non-boutique queries. E.g. fetching annotations or entities by term ids would be achieved via the closure field. It may be the case that performance would be poor, but no join required.
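
A small sketch of the closure approach, in Python for illustration. The field names `annotation_class` and `regulates_closure` appear elsewhere in this thread; `isa_partof_closure` is assumed by analogy with the `isa_partof_closure_label` field mentioned there. The mode names and the function itself are hypothetical.

```python
from typing import List

# Which field to filter on for each kind of match (mode names are hypothetical).
CLOSURE_FIELDS = {
    "direct": "annotation_class",        # exact term only
    "isa_partof": "isa_partof_closure",  # term plus is_a/part_of descendants (assumed name)
    "regulates": "regulates_closure",    # also follows regulates relations
}


def annotation_filters(term_ids: List[str], mode: str = "regulates") -> List[str]:
    """Return Solr fq strings selecting annotations for the given terms."""
    field = CLOSURE_FIELDS[mode]
    clause = " OR ".join(f'"{term_id}"' for term_id in term_ids)
    return ['document_category:"annotation"', f"{field}:({clause})"]
```

Because the closure is stored on each annotation document, a single filter query covers the term and everything beneath it; no second query is involved.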

@kltm
Member Author

kltm commented Aug 29, 2014

Talked again; the current plan is to 1) get the bulk download working and 2) get more gp information into the association doctype (which should meet most of the needs for most of the users without needing complicated "join" code). Will change title to reflect this.

@kltm kltm changed the title from "Bulk download of annotations with gene product input" to "Bulk access to documents using bulk input" Aug 29, 2014
kltm added a commit that referenced this issue Sep 16, 2014
@kltm kltm removed this from the 2.3 milestone Aug 14, 2015
@kltm kltm modified the milestones: 2.4, 2.5 Mar 2, 2016
cmungall added a commit that referenced this issue Apr 28, 2016
Addresses some parts of #69

Removed dead link to wiki (in any case all help should be inline).

Changed text to remove reference to personalities.

Still need to better explain how this works
@cmungall
Member

Some comments on current status of bulk search (not sure if this is redundant with some of the extensive info above).

GO ID use case

  • This is v useful when you have a set of GO IDs
  • It's not completely clear what box to click on the right. "GO class (direct) (annotation_class)" is pretty opaque. And it doesn't do what is expected. It appears to use the closure (which is actually what most users want, even if they don't have the language to express it)
  • Arriving at a list of GO IDs is obviously a challenge for many users, be awesome when this is hooked up, e.g. shopping carts
  • Entering in GO term labels is fun, but again most people don't know the strings by memory, and getting the required list may be a challenge for some. Hooking up to the SG annotator would be awesome here.

Gene use case

  • there are checkboxes for bioentity label and bioentity name, but none for the ID. Most will have IDs
  • Guaranteed that all kinds of mad IDs will be entered. If we are serious about this functionality we should be sure we have similar behavior for term enrichment (the gene use case can be seen as a degenerate case of TE)

Genericity

It's great that this is driven by the schema metadata... from a CS perspective. But I think the experience will be alienating for a user here. Perhaps a more fruitful long term approach would be grebe with the ability to enter lists?

Breaking resolution into a separate step

There is no way for the user to see which subset of entered IDs resolve. Really resolution and bulk query are separate concerns. There are many cases where we might want to plug in the resolution part (e.g. TE) and many places where we want to include the bulk query part.
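
A minimal sketch of resolution as its own step, in Python. The `known_ids` set is a hypothetical stand-in for a real lookup against the index; the point is only that the user gets back both lists before any bulk query runs.

```python
from typing import Iterable, List, Set, Tuple


def resolve_ids(entered: Iterable[str], known_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the entered ids into those that resolve and those that do not."""
    resolved: List[str] = []
    unresolved: List[str] = []
    for raw in entered:
        token = raw.strip()
        if not token:
            continue  # ignore blank lines in the pasted input
        if token in known_ids:
            resolved.append(token)
        else:
            unresolved.append(token)
    return resolved, unresolved


def summarize(resolved: List[str], unresolved: List[str]) -> str:
    """Produce a short message in the style of an 'N terms found' box."""
    total = len(resolved) + len(unresolved)
    return f"{len(resolved)} of {total} entries resolved"
```

The resolved list can then be handed to either the bulk query or TE, which is the separation of concerns described above.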

Text annotation can be seen as a special case of resolution. For example, try the default query here:
https://monarchinitiative.org/annotate/text

You should end up with a box that says "35 terms found" and a button to search with these 35.

Another scenario is where the user enters IDs one at a time, e.g.
https://monarchinitiative.org/analyze/phenotypes/

Both text annotation (for terms or genes) and autocomplete based list building are equally useful for GO

@kltm
Member Author

kltm commented Apr 28, 2016

(The plain ID is just "bioentity"--it's being driven off of the metadata for other displays.)

There is a bit of a question of scope here. There is unarguably a power-tool aspect here--this obviously needs inside knowledge--but it does possibly fill a niche as is. It was relatively easy to create given that it is bootstrapped from the metadata, but it is obviously not particularly useful to most users (hence it never graduating from labs).

I think it would be good to determine if the current tool, or something similar, has any prospects (it was made for an initial narrow use case) and then try and work out from there. If a grebe-ified version would be more useful to users, we should probably just branch it off and try again now that the troubling parts work better.

Once label and ID resolution is brought in, I think we should scrap this and try for a new tool--there is just so much packed into it that there is unlikely to be much that could be salvaged from here.

The cart chugging along again would be so nice...

@kltm
Member Author

kltm commented May 23, 2016

Still interest here: http://jira.geneontology.org/browse/GO-1224
We really should just go ahead and add this at some point, or something; @mcourtot does the tool as it stands look usable to you for some use cases?
http://tomodachi.berkeleybop.org/amigo/bulk_search/annotation
http://tomodachi.berkeleybop.org/amigo/bulk_search/ontology
http://tomodachi.berkeleybop.org/amigo/bulk_search/bioentity

@mcourtot

mcourtot commented May 24, 2016

Hi Seth,

I had a quick look and it is quite unclear what the categories are (in the attached screenshot); what is the difference between the first and fourth checkboxes for example?
[Screenshot: screen shot 2016-05-24 at 11 38 34]

Also re download, it would be good to have 'download as GAF, GPAD, txt or other' instead of the column choice, maybe? This is a bit overwhelming.

Finally, when searching for GO IDs, QuickGO allows searching for the exact terms, or for the terms and their descendants (the latter being the default, typically desired, behaviour), while AmiGO seems to search only the exact ID.

In short, the functionality seems to be pretty much there, but the UI is a bit too complex IMHO.

Cheers,
Melanie

@kltm
Member Author

kltm commented May 24, 2016

There could be some simplification added (e.g. GAF download on the annotation bulk download), and certainly better explanatory text for the different fields, but fundamentally the reason we essentially get these bulk widgets and pages for "free" is that they run right along the software patterns that are used underneath.

@vanaukenk

There is still interest in this feature:
http://jira.geneontology.org/browse/GO-1277

@kltm
Member Author

kltm commented May 14, 2018

Talking to @hattrill: for the annotation search we should default to checked boxes for going from gene names/ids to terms.

@kltm
Member Author

kltm commented Jul 12, 2018

Note that these exist in a hidden "live" form:
Gene/product/bioentity: http://amigo.geneontology.org/amigo/bulk_search/bioentity
Annotation: http://amigo.geneontology.org/amigo/bulk_search/annotation
Ontology term/class: http://amigo.geneontology.org/amigo/bulk_search/ontology

@pgaudet

pgaudet commented Jul 12, 2018

Nice ! Can we add links from the 'Search' menu?

@pgaudet

pgaudet commented Jul 12, 2018

Although you might want to remove 'This should not be displayed (bioentity_internal_id)' ...
(although it's also in the AmiGO download)

@kltm
Member Author

kltm commented Jul 12, 2018

@pgaudet This feature is still in development--this is still the same as shown at the meeting.

@ValWood

ValWood commented Jul 13, 2018

did not work with input
SPAC23D3.14c
SPCC63.02c
SPAC4F10.02
SPAC3H5.04
SPBC359.03c
SPBC2D10.18
SPAC3F10.11c
SPBC359.05

@kltm
Member Author

kltm commented Jul 13, 2018

Hm. Looking at just "SPBC359.05", I couldn't find it in the "normal" AmiGO either. Is it possible we currently don't have those identifiers? Could you give me an ID for "SPBC359.05"?

@ValWood

ValWood commented Jul 13, 2018

yes, thinking about it I didn't see yeast either when I was looking for C. elegans, so it was probably the same issue...

@kltm kltm removed this from the 2.5 milestone Sep 13, 2018
@ukemi

ukemi commented Sep 28, 2018

[edited by @kltm ]

Testing: Gene/product/bioentity: http://amigo.geneontology.org/amigo/bulk_search/bioentity

  1. The search should be made case insensitive. Using the Gene/product (bioentity_label) and searching on Trib3 should return the human TRIB3 gene as well as mouse and rat.
  2. To make things clearer, I would relabel the choices:
  3. Gene/product (bioentity)----Gene Identifier using a prefix:identifier format (eg MGI:MGI:1345675 or PomBase:SPCC63.02c)
  4. Gene/product (bioentity_label)---- Gene Symbol (eg Trib3)
  5. Gene/product name (bioentity_name)---- Gene or protein name
  6. This should not be displayed (bioentity_internal_id)------- get rid of this choice
  7. Synonyms (synonym)----- Synonyms
  8. Involved in (isa_partof_closure_label)----- 'GO term' without genes that are annotated to 'regulation of GO term' I tried 'heart development' and 'limb development' no results. Also tried GO identifiers with no results returned.
  9. Inferred annotation (regulates_closure)---- 'GO term' including genes annotated to 'regulation of GO term'. I tried GO:0043405 and got back a list of genes that don't make sense.
    Inferred annotation (regulates_closure_label)---- What is the difference between this and the choice above this?
  10. PANTHER family (panther_family)---- This is useful, but doesn't seem to be working
  11. PANTHER family (panther_family_label)-----This is useful, but doesn't seem to be working
  12. Organism (taxon_label)----- How useful is this? If it works, won't it return all the genes for a given organism?

@kltm
Member Author

kltm commented Sep 28, 2018

@ukemi to make it easier, I've numbered your list for reference.

  1. Hm, that's a bit nasty--thank you for catching it. I've spun it out into its own ticket: Bulk search's search narrowing fails due to missing "*_searchable" #540 . There is a possibility this will not be ready before the meeting; I'll dig in later today and see what the options are.
  2. n/a
  3. Changes to these descriptions will change them uniformly in all AmiGO displays and hovers (e.g. hovering over a facet/filter in the main annotation search); would you be okay with that?
  4. As 3 + include the "Trib3" example?
  5. As 3
  6. note: strike
  7. As 3
  8. Hm...would you find it acceptable to not have free-form entry of GO term labels, but just IDs? The mechanism I used here is not great, but I don't see much way around it for the moment (suggest: remove; this is not necessary for functionality); this is tangentially related to the issues in 1.
  9. Well, fundamentally, the idea is that we have different closures available and this allows choice in search; in practice, maybe it would make more sense to only offer the one that is currently default in AmiGO (regulates)? Either way, looking at "GO:0043405", I get the same results as I do with http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=regulates_closure_label:%22regulation%20of%20MAP%20kinase%20activity%22&sfq=document_category:%22bioentity%22 , and not a million miles off of the annotation data it is derived from http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=regulates_closure:%22GO:0043405%22&sfq=document_category:%22annotation%22
  10. This works with "full" identifiers, e.g. "PANTHER:PTHR24249". As 3 and maybe 9.
  11. This is the same issue as 8, etc. and I would suggest removal (see below)
  12. The idea behind 12 is to give people who are using a "bag of symbols" a way of filtering them down to the species that they want if there are symbol collisions
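
For points 1 and 12 together, a client-side illustration in Python of case-insensitive symbol matching with an optional species filter. This is not how GOlr does it (the real fix presumably belongs in the index, per #540); the document shape with `bioentity_label` and `taxon_label` keys follows the field names in the list above, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional


def match_symbols(
    docs: Iterable[dict],
    symbols: Iterable[str],
    taxon_label: Optional[str] = None,
) -> List[dict]:
    """Match a "bag of symbols" ignoring case, optionally narrowed to one species."""
    # casefold() gives a case-insensitive comparison key.
    wanted = {symbol.casefold() for symbol in symbols}
    hits: List[dict] = []
    for doc in docs:
        if doc["bioentity_label"].casefold() not in wanted:
            continue
        if taxon_label is not None and doc["taxon_label"] != taxon_label:
            continue  # drop symbol collisions from other species
        hits.append(doc)
    return hits
```

With this, searching on Trib3 would pick up TRIB3 as well, and passing a species narrows the collisions back down.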

Okay, you've found a lot of interesting quirks that may or may not be showstoppers depending on the use case we're attacking.
In my original vision, this is a tool for people who already know what they want: e.g. all annotations to species X to term set [A, B, C], all terms to symbol set [D, E, F]. Looking at this again, the more flexible this system is, the more quirky/weird/unusable it gets--it gets very hard to differentiate free-form strings in this context and have the end user predict what they may get back in certain cases.
Would this tool still be useful if it restricted itself to dealing with just symbols and identifiers (entities that are single token strings)?
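
To illustrate what that restriction might look like, a small Python sketch that sorts pasted input into identifiers, symbols, and rejected free-form strings. The `prefix:identifier` shape follows the examples given earlier in this thread (e.g. MGI:MGI:1345675); the regular expression and the category names are hypothetical choices.

```python
import re
from typing import Dict, List

# Assumed identifier shape: a prefix, a colon, then a non-space local part.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\S+$")


def classify_bulk_input(text: str) -> Dict[str, List[str]]:
    """Sort each non-blank input line into identifier, symbol, or rejected."""
    result: Dict[str, List[str]] = {"identifier": [], "symbol": [], "rejected": []}
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue
        if len(entry.split()) > 1:
            # Multi-token strings (e.g. term labels) are out of scope here.
            result["rejected"].append(entry)
        elif CURIE_RE.match(entry):
            result["identifier"].append(entry)
        else:
            result["symbol"].append(entry)
    return result
```

Rejected entries could be reported back to the user rather than silently dropped, so that what comes back stays predictable.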

@kltm kltm added this to TODO in AmiGO 2.7 via automation Oct 23, 2018
@kltm kltm moved this from TODO to In progress in AmiGO 2.7 Oct 23, 2018