galaxyproject / galaxy Public
Enhancement: Add more download options for datasets and histories #2968
Comments
Two things we might want to consider: the compression should not happen on the web server but as part of job submission, and if such a download fails for some reason, we need to clean up properly. People can easily fill up disk space by downloading multiple files that are cached, and as far as I know this does not count toward the quota.
@bgruening What do you think about an FTP download holding area that is an analog of the current FTP upload holding area? It would be per-account and outside of quota, with datasets expiring after N days or after a successful download (FTP upload data currently expires after 3 days or when moved into a history). Users could then browse that area (ideally with the same FTP clients used for upload, like FileZilla, but also on the command line, both using the Galaxy server URL plus account credentials) to drag files back to their computer, or look up a dataset's URL to run a fetching command (which could be FTP, not just wget or curl). An idea: basically the reverse of FTP upload, populated by a compress-to-download job, or possibly straight-up batch download staging (originals, compressed or not).
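As an interim illustration of the "look up the URL and run a fetching command" half of this idea, a dataset can already be pulled on the command line once its URL is known. A minimal sketch, assuming a Galaxy instance URL, an encoded dataset id, and an API key (all placeholders here, not real values):

```shell
# Sketch only: GALAXY_URL, DATASET_ID, and API_KEY are hypothetical placeholders.
GALAXY_URL="https://usegalaxy.org"
DATASET_ID="1cd8e2f6b131e891"   # encoded dataset id taken from the dataset's link
API_KEY="YOUR_API_KEY"          # from User > Preferences > Manage API Key

# Galaxy serves raw dataset content from the dataset's display endpoint;
# compose the URL first so it can be reused with curl, wget, or a script.
DOWNLOAD_URL="${GALAXY_URL}/api/datasets/${DATASET_ID}/display"
echo "${DOWNLOAD_URL}"

# Fetch with curl (the key authorizes access to non-public datasets):
# curl -o dataset.dat "${DOWNLOAD_URL}?key=${API_KEY}"
```

`wget -O dataset.dat "${DOWNLOAD_URL}?key=${API_KEY}"` works the same way; an FTP holding area as proposed above would add browsing and client support on top of this kind of per-dataset URL.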
Oh, I like this idea! It should not be so hard to tie this into the existing FTP infrastructure, I suppose. We could use the same FTP directory, or do you think this is too confusing? We could have a special folder called
I'd like to bring this idea up again to be prioritized. We have recently been getting more requests about how to effectively download histories. Users are asking for familiar methods, like an FTP client, whether just for convenience (the same simple method for getting data in and out) or because they want to download on the command line to capture status/tracking of the transfer through tools they prefer to work with (including scripts). What are the current thoughts?
I wonder if this tool might do quite a bit, if a tool were a solution (or an intermediate step): https://toolshed.g2.bx.psu.edu/repository?repository_id=d077871367f67b47 is a simple tool which handles the compression of multiple datasets through a data collection (it handles BAMs and the BAI index correctly). This idea has come up elsewhere as well: https://biostar.usegalaxy.org/p/22760/
This tool might do all of this: https://toolshed.g2.bx.psu.edu/view/earlhaminst/export_to_cluster/ !!
Any update on this idea? Could it be put into 19.05? It was asked about again today at Galaxy Help: https://help.galaxyproject.org/t/downloading-histories-by-curl-or-wget-and-link-only-downloads-an-html-file/795
Came up again at Galaxy Help today. There is a tool in the MTS that exports datasets to the FTP area; maybe it could be a starting place? It reportedly still works in local installs as of
Downloading compressed data in batch came up again at Galaxy Help today: https://help.galaxyproject.org/t/export-data-as-compressed-file/1593
Came up on our side today too, especially since we had previously implemented download throttling in nginx.
This would be such a big win if we could get it implemented. Data just keeps getting larger and larger. Plus, being able to use an FTP client like FileZilla vastly improves accessibility for people who are not comfortable on the command line. Getting a collection is particularly tedious/technical the way it is now. Update: and getting an entire history is nearly impossible unless it is really small. Ideally, FTP could grab anything in your account, however bundled: datasets/collections, histories, workflows, data-library content (e.g. training data already labeled and organized correctly), whatever.
Compress datasets upon download
User request; see this Biostars post for a usage example: https://biostar.usegalaxy.org/p/19684/
Enable download of history archives by FTP (ideally with FileZilla, but curl or wget would also be nice and would parallel dataset download by URL). Why? Archives >= 1k fail to download from http://usegalaxy.org and maybe other Galaxy flavors.
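A scripted version of this request could ride on Galaxy's history-export API. The sketch below is hedged: the endpoint path, id, and key values are assumptions used to illustrate the flow (request the archive, poll, then fetch it), not a tested recipe for any particular instance.

```shell
# Hypothetical placeholders throughout; check your instance's API docs.
GALAXY_URL="https://usegalaxy.org"
HISTORY_ID="f2db41e1fa331b3e"   # encoded history id
API_KEY="YOUR_API_KEY"

# Compose the export endpoint for this history:
EXPORT_URL="${GALAXY_URL}/api/histories/${HISTORY_ID}/exports"
echo "${EXPORT_URL}"

# 1) Ask Galaxy to prepare the archive (can take a while for large histories):
# curl -X PUT "${EXPORT_URL}?key=${API_KEY}"

# 2) Re-request the same endpoint until it returns a download link,
#    then fetch the archive with curl or wget:
# curl -o history.tar.gz "${GALAXY_URL}/<download-path-from-step-2>?key=${API_KEY}"
```

Something like this would give the "capture status/tracking in scripts" workflow users asked for above, while an FTP staging area would remain the friendlier option for FileZilla users.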
Related to several user posts, including https://biostar.usegalaxy.org/p/22538/