
Eject (or split) dataset collection operation #3870

Closed
nekrut opened this issue Apr 4, 2017 · 21 comments
@nekrut
Contributor

nekrut commented Apr 4, 2017

In some cases it is necessary to gain access to collection elements individually. For example, in my ChIP-seq analysis I initially bundle all data (signal and control) together into a single collection to pre-process, map, and post-process. However, when I run MACS it requires me to load signal and control separately. To enable this it would be necessary to have one of these:

  • a collection operation that allows splitting a list into multiple lists (just like copying datasets: select the datasets you need and copy them into a new collection)
  • an "eject" button on individual datasets when you look inside a collection in the history (you open a collection, press "eject", and that dataset becomes a new dataset in the history)
@bgruening
Member

While I agree that such functionality is needed as a last resort, I think the more correct approach in this case would be to create two collections, one for your signal and one for your control. Imho we should only aggregate files that belong together functionally.

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Then we really need to allow multiple select for collections in tools, so you can run a tool on multiple collections.

@jxtx
Contributor

jxtx commented Apr 4, 2017 via email

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Aha, yes, but in the short term splitting a collection would be nice.

@jxtx
Contributor

jxtx commented Apr 4, 2017

But only as a last resort ;)

@jxtx
Contributor

jxtx commented Apr 4, 2017

(Because it is hard/impossible to do reusably or reproducibly. You don't know which elements are treatment and which are control when you explode a collection interactively, so you can't make a workflow; record types address this.)

@bgruening
Member

No clue about the client side, but I think selecting multiple collections at once would help you here more than this last-resort tool.

@nekrut
Contributor Author

nekrut commented Apr 4, 2017

Yes indeed, so multiple select then.

@nekrut nekrut closed this as completed Apr 4, 2017
@jxtx
Contributor

jxtx commented Apr 4, 2017 via email

@hexylena
Member

hexylena commented Apr 6, 2017

Similar to #740, any solution for this would solve my issue as well :D

@Takadonet
Contributor

We find that having the ability to eject or split would greatly improve our biologists' ability to do their work. A specific example would be the SNVPhyl workflow (https://snvphyl.readthedocs.io/en/latest/), where all samples are used in a single analysis. Sometimes the only way to know whether one or more samples have to be removed from the collection is at the end of the workflow.

The end user then has to re-make the collection without those samples and re-run. The issue is that sometimes they have to remove dozens or hundreds of samples by hand. Since paging was added (so happy for that!), it is almost impossible to select all the files again if there are more than 500 files in total.

I don't think the 'eject' tool should be usable during workflow execution, but it should be available interactively.

@jmchilton
Member

@Takadonet Can you use the "filter failed" tool to automate this? Or, put another way, how are users selecting these datasets?

@Takadonet
Contributor

Takadonet commented Apr 12, 2017

@jmchilton Based on the output results from either a phylogenomic tree or values in a secondary dataset. An example would be: all samples that have less than 60% identity to the reference should be removed.

@jmchilton
Member

@Takadonet Can you implement a tool that will just fail outputs that don't meet these criteria and then use the "filter failed" tool?

If it makes sense for your workflow to have a human involved - that is totally fine - but I'm always looking for guinea pigs to try out new workflow functionality.
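The fail-then-filter approach could be sketched roughly as follows; this is a minimal illustration only, where the function names and the 60% cutoff (taken from the example in this thread) are assumptions, not an existing Galaxy API:

```python
import sys

# Hypothetical cutoff: minimum % identity to the reference (taken from the
# 60% example in this thread, not a Galaxy default).
THRESHOLD = 60.0

def passes_threshold(identity_percent, threshold=THRESHOLD):
    """Return True if the sample meets the cutoff and should be kept."""
    return identity_percent >= threshold

def fail_if_below(identity_percent):
    """Exit non-zero when the sample is below the cutoff, so the output
    dataset is marked failed and a later "filter failed" collection
    operation drops it from the list."""
    if not passes_threshold(identity_percent):
        sys.stderr.write("identity %.1f%% is below %.1f%%; failing dataset\n"
                         % (identity_percent, THRESHOLD))
        sys.exit(1)
```

Run per element of a collection, this would leave the low-identity elements in a failed state, which the "filter failed" step can then remove.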

@Takadonet
Contributor

@jmchilton Seems to me that both cases would be needed. One case is where a human is involved; to me that should use the same interface as creating a new collection, so it stays consistent.

The other case should be a tool similar to the ones already in the base Galaxy codebase, i.e. merge collections, unzip, zip, etc. There is no point in making it a normal Tool Shed tool because of the duplication of datasets. Having the new tool available during a workflow execution would be awesome, but difficult to implement for sure.

We are always up for being guinea pigs!

@jmchilton
Member

@Takadonet Good points - I have a PR here, #3940, that adds a filtering option which works without dataset duplication. Hopefully it will be in 17.05 - then all you would need to do is write a tool that looks at whatever metadata is interesting and builds a list of identifiers of only those elements you wish to keep.
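A sketch of the identifier-list idea, assuming a made-up tab-separated stats format of `sample_id<TAB>percent_identity` (the format and function names are illustrative, not the actual interface of #3940):

```python
import csv
import io

def identifiers_to_keep(rows, threshold=60.0):
    """Yield the identifiers of samples whose % identity meets the cutoff."""
    for sample_id, identity in rows:
        if float(identity) >= threshold:
            yield sample_id

def build_keep_list(stats_tsv, threshold=60.0):
    """Parse 'sample_id<TAB>identity' lines and return the identifiers to
    keep; the filtering step would then consume one identifier per line."""
    rows = csv.reader(io.StringIO(stats_tsv), delimiter="\t")
    return list(identifiers_to_keep(rows, threshold))
```

For example, `build_keep_list("sampleA\t75\nsampleB\t40\n")` keeps only `sampleA`.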

@Takadonet
Contributor

Takadonet commented Apr 19, 2017

@jmchilton We'll probably cherry-pick it into our current Galaxy instances ASAP. We've got lots of users that would be interested for sure.

@alexlenail

Sorry, I'm a little lost among the multiple issues related to this one: what is the current status of being able to run tools on subsets of collections? If that isn't possible, is there a way to "eject" a collection into a bunch of individual history items?

@eschen42

eschen42 commented Oct 29, 2018

I would like to be able to copy a few datasets from a list of datasets. Specifically, I have a list of mzML datasets, and I want to extract the dozen that represent the pooled samples. In the History UI, I can choose "Copy Datasets" and choose from the datasets in the history. However, when I click on my list dataset so that its contents are revealed and the rest of the history is hidden (i.e., the history pane says "back to (my history)" and "a list with (count) items") and then choose "Copy Datasets", it still shows only the datasets in the enclosing history.

Having "eject" would at least give me a workaround. Alternatively, if "Copy Datasets" worked for choosing members from list contents, then copying the members to the enclosing history would have the same effect as eject. Right now my only option is to download the files (or find my original files) and upload them again.

@dannon I thought that it made better sense to comment here than to open a new issue since this seems so closely related.

@hexylena
Member

@nekrut the phrasing in your original post was very interactive, so is this case now resolved by #7553?

@hexylena
Member

I think this is mostly solved by the ability to filter by element identifier and to interactively select elements in the tool form. I'm going to close this, but please let me know if it's still not resolved and we should re-open.
