Rule Builder / Uploader - Future Steps #5381

Closed
6 of 9 tasks
jmchilton opened this issue Jan 24, 2018 · 5 comments

jmchilton (Member) commented Jan 24, 2018

Smaller tweaks and issues are covered by #3916. This outlines some bigger directions I'd like to take the uploader / collection builder.

  • View / Edit Source
    • There is a JSON description of the rules so that the same operations can be applied on the backend. The backend doesn't implement this yet, but it would still be nice to expose the JSON in the GUI: let users view the rules as an advanced option and paste sets of rules in (a rough sketch of such a rule description follows this list). This would be helpful for users who repeatedly upload data in the same format, or for sharing rules along with sample sheets / URL lists in training or production settings.
    • Save the set of rules used for an upload and expose to the user in some fashion.
  • Apply Collection Builder to a Whole History
    • It should be possible to dump a whole history into the rule builder - all non-deleted, non-hidden datasets. Since we don't need all the metadata the history panel view does, and the table seems to scale better, we could recommend this as a best practice to get around pagination issues. It would provide a pretty stellar workaround for #4350 (History - "For all selected" doesn't work across pages). This would also be a better user experience for selecting datasets and seeing the collection being built, IMO.
    • It should then be possible to add columns of interest from the collected metadata as well (file type, info, dbkey, etc.) to help with filtering and the like.
  • Apply Collection Builder to Collections.
    • This request is probably going to come very quickly as people need to deal with nested collections. We need to be able to reshape them during an analysis - pool things in one part and unpool them in other parts, filter on identifiers in different parts of the collection, split the nested collections into sub-collections in various ways. I had originally planned to do this as a generalization of the expression-based grouping tool rejected during the original collection operation PR (953cdf2), but I think this rule-based language and GUI could do all of those things very quickly based on what is already there.

      We could load all the datasets of the collection into the builder, have columns for every layer of list identifier and order index, and grab extra HDA metadata that is already available (plus collection-level metadata once we collect it). Researchers could then interactively re-shape the collection, merge and split identifiers, filter, break it up into separate collections, etc.

    • To use this collection re-organization in a workflow, we could allow dumping out the JSON describing the rules and re-using it from a stand-alone collection operation tool.

    • We could track these operations in the database and capture the seemingly interactive re-organizations as steps using this new tool when histories are extracted into workflows. What will feel like interactive re-organization of the collection will actually be the user writing a batch program executable on the backend.

  • Data Provider Access to Datasets
    • Currently the front-end attempts to parse tabular data in an ad-hoc manner. I looked into this and it seems all the backend data providers only provide numeric columns based on the table metadata (presumably developed for things like charts). It would be great if a new data provider were added that provided the full table structure as JSON - numeric or not (a sketch of such a provider follows this list). We could then offload a lot of the complicated CSV parsing logic to the backend in many cases.
  • Implement this as inputs for records (#3834 - Implement Record Types, a generalization of tuple types) and allow creating collections of things other than files - such as parameters that match files (e.g. common-workflow-lab@a44455a).
  • Allow re-use of previously used rules - suggested by @mblue9 on #5365 (Rule-based Uploader / Builder).
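
As referenced from the "View / Edit Source" and "Data Provider" items above, here are two rough sketches. First, a minimal idea of what a pasteable rule description might look like, written as a Python dict; the key names ("rules", "mapping") and rule types are illustrative assumptions, not necessarily what the builder actually exports:

```python
import json

# Illustrative sketch only - the real exported rule JSON may use different
# keys and rule types than the ones shown here.
example_rules = {
    "rules": [
        # derive a column from each element's list identifier
        {"type": "add_column_metadata", "value": "identifier0"},
        # keep only .fastq.gz entries (regex filter on column 0)
        {"type": "add_filter_regex", "target_column": 0,
         "expression": r".*\.fastq\.gz$", "invert": False},
    ],
    "mapping": [
        # column 0 becomes the list identifier of the built collection
        {"type": "list_identifiers", "columns": [0]},
    ],
}

# Serialized form a user could copy out of (or paste into) the GUI, save for
# re-use, or hand to a stand-alone collection operation tool in a workflow.
print(json.dumps(example_rules, indent=2))
```

Second, for the data provider item, a sketch of a hypothetical backend helper (not an existing Galaxy provider) that returns the full table structure, numeric columns or not, as JSON:

```python
import csv
import json

def table_as_json(path, delimiter="\t", max_rows=1000):
    """Hypothetical helper: return every column of a tabular dataset as
    strings, so the client no longer needs its own ad-hoc CSV parsing."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        rows = [row for row, _ in zip(reader, range(max_rows))]
    return json.dumps({"rows": rows})
```
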
mblue9 (Contributor) commented Apr 1, 2018

I see you've got a point here called "Apply Collection Builder to a Whole History" which sounds very exciting. Exciting, because the vast majority of our users' data is in Shared Data, as that's where the data from our sequencers is e.g.: Shared Data > fastq > user.name > analysis.type > run.id > samples

So currently our users typically need to send the sample fastqs from the run.id folder to a history and then build them into collections. Building each collection can be a bit tedious when there are many groups, so I'm wondering if that's something this rule builder could soon be used for instead?

Or will there be another way to create collections from Shared Data, e.g. the "Integrate with library import folder" item I see mentioned here: #5822?

jmchilton (Member, Author) commented

@mblue9 So the data is loaded into data libraries already right? You should be able to export collections right from library folders to histories in 18.01 (#5080) - have you tried that yet? Seems that would be even easier in simple cases - or am I not understanding what you are hoping for?

What I mentioned here is loading folders not yet in Galaxy but on disk (e.g. user_library_import_dir) right into history collections, like one can do with FTP directories in the initial PR that was merged. In your case though it sounds like the data is already in Galaxy so we should work on getting the library to collection connection working well. For simple collections 18.01 has a new export option but I should try to make sure it works with the rule builder for 18.05.

mblue9 (Contributor) commented Apr 3, 2018

So the data is loaded into data libraries already right?

Yes

You should be able to export collections right from library folders to histories in 18.01 (#5080) - have you tried that yet?

No I hadn't (I had missed that option). Our production instance with the Shared Data is still on 17.09; we're working on upgrading to 18.01. But I did just try it out in https://test.galaxyproject.org

Seems that would be even easier in simple cases

I don't think so - or am I missing the 'quick' way here?

Even with the ability to send to Collections from the Shared Data, it looks like it's still going to require a lot of clicks, many more than the Rule Builder.

For example, if I have 6 groups with 3 reps each (not an uncommon scenario, and some people have many more, e.g. multiple treatments and timepoints), then in the Shared Data I have to tick to select the 3 reps (3 clicks), then click "To History", click "as a Collection", and click "Continue" - and then repeat that for all the groups. So it looks like that would require ~36 clicks in total, just to create the collections (and that's not including adding the hashtags).

Whereas with the Rule Builder, I can paste in a samplesheet, select "Upload data as Collections", click "Build", add the few column definitions, then click "Upload" and..... voila.... all collections are automatically created (and it looks as simple to create collections for 50 groups as it is for 6). That's a much easier and nicer way imho. And after trying the amazing new Rule Builder, how can I go back to all that clicking 😞
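
To make that concrete, here's a rough sketch of the kind of grouping the Rule Builder does from a pasted samplesheet - one collection per group falls out automatically, whether there are 6 groups or 50. The samplesheet, its column names, and the URLs below are all made up for illustration:

```python
import csv
from collections import defaultdict
from io import StringIO

# Made-up samplesheet - the columns and URLs are purely illustrative.
SAMPLE_SHEET = """\
sample\tgroup\turl
WT_rep1\tWT\thttps://example.org/WT_rep1.fastq.gz
WT_rep2\tWT\thttps://example.org/WT_rep2.fastq.gz
WT_rep3\tWT\thttps://example.org/WT_rep3.fastq.gz
KO_rep1\tKO\thttps://example.org/KO_rep1.fastq.gz
KO_rep2\tKO\thttps://example.org/KO_rep2.fastq.gz
KO_rep3\tKO\thttps://example.org/KO_rep3.fastq.gz
"""

# One list collection per group; element identifiers come from the
# "sample" column, element sources from the "url" column.
collections = defaultdict(list)
for row in csv.DictReader(StringIO(SAMPLE_SHEET), delimiter="\t"):
    collections[row["group"]].append((row["sample"], row["url"]))

for name, elements in collections.items():
    print(f"collection {name!r}: {[identifier for identifier, _ in elements]}")
```

The output would be two collections (WT and KO) of three elements each, and adding more groups to the sheet adds more collections without any extra clicks.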

jmchilton (Member, Author) commented

I see, I had not considered creating many collections at once. Thanks for the clarifications. I'll try to get the rule builder hooked into the library stuff - it shouldn't be too bad, I just need to find some time.

jmchilton (Member, Author) commented

Mostly done - I'll create a new issue for "Apply Rules" to a history.
