
Eject (or split) dataset collection operation #3870

Closed
nekrut opened this issue Apr 4, 2017 · 21 comments
@nekrut
Contributor

nekrut commented Apr 4, 2017

In some cases it is necessary to gain access to collection elements individually. For example, in my ChIP-seq analysis I initially bundle all data (signal and control) together into a single collection to pre-process, map, and post-process. However, when I run MACS it requires me to load signal and control separately. To enable this it would be necessary to have one of these:

  • a collection operation that allows splitting a list into multiple lists (just like copying datasets: select the datasets you need and copy them into a new collection)
  • an "eject" button on individual datasets when you look inside a collection in the history (you open a collection, press "eject", and that dataset becomes a new dataset in the history)
@bgruening
Member

While I agree that such functionality is needed as a last resort, I think the more correct approach in this case would be to create two collections, one for your signal and one for your control. Imho we should only aggregate files that belong together functionally.

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Then we really need to allow multiple select for collections in tools, so you can run a tool on multiple collections.

@jxtx
Contributor

jxtx commented Apr 4, 2017 via email

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Aha, yes, but in the short term splitting a collection would be nice.

@jxtx
Contributor

jxtx commented Apr 4, 2017

But only as a last resort ;)

@jxtx
Contributor

jxtx commented Apr 4, 2017

(Because it is hard/impossible to do reusably or reproducibly. You don't know which elements are treatment and which are control when you explode a collection interactively, so you can't make a workflow; record types address this.)

@bgruening
Member

No clue about the client side, but I think selecting multiple collections at once would help you here more than this last-resort tool.

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Yes indeed, so multiple select then.

@nekrut nekrut closed this as completed Apr 4, 2017
@jxtx
Contributor

jxtx commented Apr 4, 2017 via email

@hexylena
Member

hexylena commented Apr 6, 2017

Similar to #740, any solution for this would solve my issue as well :D

@Takadonet
Contributor

We find that having the ability to eject or split would greatly improve our biologists' ability to do their work. A specific example would be the SNVPhyl workflow (https://snvphyl.readthedocs.io/en/latest/), where all samples are used in a single analysis. Sometimes the only way to know whether one or more samples have to be removed from the collection is at the end of the workflow.

The end user then has to re-make the collection without those samples and re-run. The issue is that sometimes they have to remove dozens or hundreds of samples by hand. Since paging was added (so happy for that!), it is almost impossible to select all the files again if there are more than 500 files in total.

I don't think the 'eject' tool should be usable during workflow execution, but it should be available interactively.

@jmchilton
Member

@Takadonet Can you use the "filter failed" tool to automate this? Or, put another way, how are users selecting these datasets?

@Takadonet
Contributor

Takadonet commented Apr 12, 2017

@jmchilton Based on the output results from either a phylogenomic tree or values in a secondary dataset. An example would be: all samples that have less than 60% identity to the reference should be removed.

@jmchilton
Member

@Takadonet Can you implement a tool that will just fail outputs that don't meet these criteria and then use the "filter failed" tool?

If it makes sense for your workflow to have a human involved - that is totally fine - but I'm always looking for guinea pigs to try out new workflow functionality.
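The fail-then-filter approach could be sketched roughly as follows; this is a minimal illustration only, where the function names and the 60% cutoff (taken from the example in this thread) are assumptions, not an existing Galaxy API:

```python
import sys

# Hypothetical cutoff: minimum % identity to the reference (taken from the
# 60% example in this thread, not a Galaxy default).
THRESHOLD = 60.0

def passes_threshold(identity_percent, threshold=THRESHOLD):
    """Return True if the sample meets the cutoff and should be kept."""
    return identity_percent >= threshold

def fail_if_below(identity_percent):
    """Exit non-zero when the sample is below the cutoff, so the output
    dataset is marked failed and a later "filter failed" collection
    operation drops it from the list."""
    if not passes_threshold(identity_percent):
        sys.stderr.write("identity %.1f%% is below %.1f%%; failing dataset\n"
                         % (identity_percent, THRESHOLD))
        sys.exit(1)
```

Run per element of a collection, this would leave the low-identity elements in a failed state, which the "filter failed" step can then remove.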

@Takadonet
Contributor

@jmchilton Seems to me that both cases would be needed. One case is where a human is involved; to me that should use the same interface as creating a new collection, so it stays consistent.

The other case should be a tool similar to the ones already in the base Galaxy codebase, i.e. merge collections, unzip, zip, etc. There is no point in making it a normal Tool Shed tool because of the duplication of datasets. Having the new tool available during a workflow execution would be awesome, but difficult to implement for sure.

We are always up for being guinea pigs!

@jmchilton
Member

@Takadonet Good points - I have a PR here, #3940, that adds a filtering option which works without dataset duplication. Hopefully it will be in 17.05 - then all you would need to do is write a tool that looks at whatever metadata is interesting and builds a list of identifiers of only those elements you wish to keep.
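A sketch of the identifier-list idea, assuming a made-up tab-separated stats format of `sample_id<TAB>percent_identity` (the format and function names are illustrative, not the actual interface of #3940):

```python
import csv
import io

def identifiers_to_keep(rows, threshold=60.0):
    """Yield the identifiers of samples whose % identity meets the cutoff."""
    for sample_id, identity in rows:
        if float(identity) >= threshold:
            yield sample_id

def build_keep_list(stats_tsv, threshold=60.0):
    """Parse 'sample_id<TAB>identity' lines and return the identifiers to
    keep; the filtering step would then consume one identifier per line."""
    rows = csv.reader(io.StringIO(stats_tsv), delimiter="\t")
    return list(identifiers_to_keep(rows, threshold))
```

For example, `build_keep_list("sampleA\t75\nsampleB\t40\n")` keeps only `sampleA`.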

@Takadonet
Contributor

Takadonet commented Apr 19, 2017

@jmchilton We'll probably cherry-pick it into our current Galaxy instances ASAP. We've got lots of users that would be interested for sure.

@alexlenail

Sorry, I'm a little lost among the multiple issues related to this one: what is the current status of being able to run tools on subsets of collections? If that isn't possible, is there a way to "eject" a collection into a bunch of individual history items?

@eschen42

eschen42 commented Oct 29, 2018

I would like to be able to copy a few datasets from a list of datasets. Specifically, I have a list of mzML datasets, and I want to extract the dozen that represent the pooled samples. In the History UI, I can choose "Copy Datasets" and choose from the datasets in the history. However, when I click on my list dataset so that its contents are revealed and the rest of the history is hidden (i.e., the history pane says "back to (my history)" and "a list with (count) items") and then choose "Copy Datasets", it still shows only the datasets in the enclosing history.

Having "eject" would at least give me a workaround. Alternatively, if "Copy Datasets" worked for choosing members from list contents, then copying the members to the enclosing history would have the same effect as eject. Right now my only option is to download the files (or find my original files) and upload them again.

@dannon I thought that it made better sense to comment here than to open a new issue since this seems so closely related.

@hexylena
Member

@nekrut the phrasing in your original post was very interactive, so is this case now resolved by #7553?

@hexylena
Member

I think this is mostly solved by the ability to filter by element identifier and to interactively select elements in the tool form. I'm going to close this, but please let me know if it's still not resolved and we should re-open.
