
Enhancement: Add more download options for datasets and histories #2968

Open

jennaj opened this issue Sep 23, 2016 · 11 comments

jennaj commented Sep 23, 2016

  1. Compress datasets upon download
    User request, see this biostars post for usage example: https://biostar.usegalaxy.org/p/19684/

  2. Enable download of history archives by FTP (ideally with Filezilla, though curl or wget would also be nice and would parallel dataset download by URL). Why? Archives >= 1k fail to download from http://usegalaxy.org and possibly other Galaxy flavors.
    Related to several user posts, including https://biostar.usegalaxy.org/p/22538/
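For context, the command-line pattern users are after looks roughly like the sketch below. The URL shape in the comment is an assumption for illustration (the real link comes from the dataset's copy-link in the Galaxy UI), and the local gzip step stands in for what item 1 would do server-side:

```shell
# Assumed dataset-URL shape, for illustration only:
#   wget -c 'https://usegalaxy.org/datasets/<encoded_id>/display?to_ext=fastqsanger' -O dataset.fastq
# (-c resumes a partial transfer, which is exactly what fails today for large archives)

# Stand-in for a downloaded dataset, so the compression step is runnable locally:
printf 'ACGTACGTACGT%.0s' $(seq 1 5000) > dataset.fastq

# What "compress upon download" would save -- gzip before transfer:
gzip -k dataset.fastq
wc -c dataset.fastq dataset.fastq.gz
```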

bgruening commented Sep 23, 2016

Two things we might want to consider. First, the compression should not happen on the web server but as part of job submission. Second, if such a download fails for some reason, we need to clean up properly: people could easily fill up disk space by downloading multiple files that are cached, and as far as I know that space does not count toward the quota.

jennaj commented Sep 23, 2016

@bgruening What do you think about an FTP download holding area that is an analog of the current FTP upload holding area? Per-account, outside of quota, with datasets expiring after N days or after they are downloaded successfully (FTP upload data currently expires after 3 days, or once moved into a history)?

Then some way for users to browse that area (ideally with the same FTP clients used to upload, like Filezilla, but also on the command line; both using the Galaxy server URL plus account credentials) to drag files back to their computer in a client, or to find the URL for a dataset and fetch it with a command (which could be FTP, not just wget or curl).

An idea: basically the reverse of FTP upload, populated by a compress-to-download job, or possibly straight-up batch download staging (originals, compressed or not).
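A minimal sketch of the expiry side of such a holding area. The function name, directory layout, and N-day policy are assumptions for illustration, not an existing Galaxy API:

```python
import time
from pathlib import Path

EXPIRY_DAYS = 3  # mirrors the current 3-day FTP-upload expiry


def purge_expired(staging_dir, expiry_days=EXPIRY_DAYS, now=None):
    """Remove staged download files older than expiry_days.

    Hypothetical helper: a real implementation would also drop files
    that were already downloaded successfully, per the proposal above.
    """
    now = now if now is not None else time.time()
    cutoff = now - expiry_days * 86400
    removed = []
    for path in Path(staging_dir).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

A periodic task (cron or a Galaxy maintenance job) would call this per account, keeping the staging area outside of quota without letting it grow unbounded.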

bgruening commented Sep 23, 2016

Uh, I like this idea!!! And it should not be so hard to tie this into the existing FTP infrastructure, I suppose. We could use the same FTP directory, or do you think that would be too confusing? We could have a special folder called download that is hidden in the upload dialog.

jennaj commented Apr 11, 2017

I'd like to bring this idea up again to be prioritized. We have recently been getting more requests about how to effectively download histories.

Users are asking for known methods, like an FTP client, even if only for convenience (the same simple method for getting data in AND out), or want to download on the command line because they want to capture status/tracking of the transfer through tools they prefer to work with (including scripts).

What are the current thoughts?

jennaj changed the title from "Enhancement: Add option to compress datasets before/during download" to "Enhancement: Add more download options for datasets and histories" on Apr 24, 2017

GuyReeves commented May 9, 2017

I wonder if this tool might cover quite a bit of this, if a tool were an acceptable solution (or an intermediate step):

https://toolshed.g2.bx.psu.edu/repository?repository_id=d077871367f67b47

A simple tool which handles the compression of multiple datasets through a data collection (it handles BAMs and the BAI index correctly).
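The core of such a compression tool is just bundling a collection's files into one archive while keeping paired index files together. A hedged sketch (the function name and flat archive layout are assumptions, not how the tool above is actually written):

```python
import tarfile
from pathlib import Path


def bundle_collection(paths, archive_path):
    """Bundle a list of dataset files into one gzip'd tar archive.

    Illustrative sketch only: for a BAM dataset the caller includes its
    .bai index alongside it, so the pair stays together in the archive.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            p = Path(p)
            tar.add(p, arcname=p.name)  # flat layout inside the archive
    return archive_path
```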

This idea has come up elsewhere:

https://biostar.usegalaxy.org/p/22760/

also

galaxyproject/tools-iuc#1295

GuyReeves commented May 18, 2017

jennaj commented Mar 13, 2019

Update on this idea? Could it be put in 19.05? Was asked about again today at Galaxy Help: https://help.galaxyproject.org/t/downloading-histories-by-curl-or-wget-and-link-only-downloads-an-html-file/795

jennaj commented Mar 25, 2019

Came up again at GHelp today. There is a tool in the Main Tool Shed that exports datasets to the FTP area -- maybe it could be a starting place? It reportedly still works in local installs as of 19.01. https://toolshed.g2.bx.psu.edu/view/geert-vandeweyer/files_to_ftp/fe42761670f1

jennaj commented Jun 21, 2019

Downloading compressed data in batch came up again at GHelp today: https://help.galaxyproject.org/t/export-data-as-compressed-file/1593

hexylena commented Aug 1, 2019

Came up on our side today too, especially since we had previously implemented download throttling in nginx.
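For reference, nginx-side download throttling looks roughly like the fragment below. The location prefix is an assumption for illustration (match it to the route Galaxy actually serves datasets from); `limit_rate` and `limit_rate_after` are the relevant directives:

```nginx
# Throttle per-connection bandwidth on dataset downloads.
location /datasets/ {
    limit_rate_after 10m;   # first 10 MB at full speed
    limit_rate       1m;    # then cap each connection at 1 MB/s
}
```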

jennaj commented Sep 11, 2019

This would be such a big win if we could get it implemented. Data just keeps getting larger and larger.

Plus, being able to use an FTP client like Filezilla vastly improves accessibility for people who are not comfortable on the command line. Getting a collection is particularly tedious/technical the way it is now.

Update: And getting an entire history is ... nearly impossible unless really small.

Ideally, FTP could grab anything in your account, however bundled (datasets/collections, histories, workflows, data-library content, e.g. training data already labeled and organized correctly -- whatever).
