Improve foreign files download #265

Closed
pditommaso opened this issue Dec 14, 2016 · 5 comments

@pditommaso
Member

Currently, foreign input files (e.g. s3://foo/bar) are downloaded by the driver application even when using the distributed Ignite executor. Moreover, an extra copy is needed when using a scratch directory.

This needs to be improved so that foreign files are downloaded by the remote execution nodes and the files are created in the target destination folder without any intermediate copy.
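
For illustration, a minimal DSL1-style sketch of the scenario (the bucket, file names and process are hypothetical): a foreign S3 object is declared as a process input and Nextflow stages it locally before the task script runs; this issue is about where that staging happens.

```groovy
// Hypothetical example: a remote S3 object used as a process input.
// Nextflow downloads ("stages") the foreign file before running the task;
// currently that download is performed by the driver application.
remote_ch = Channel.fromPath('s3://my-bucket/data/sample.fastq.gz')

process countLines {
    input:
    file reads from remote_ch

    output:
    file 'count.txt' into counts_ch

    """
    zcat ${reads} | wc -l > count.txt
    """
}
```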

@ewels
Member

ewels commented May 4, 2017

And to confirm, as discussed on Gitter: it would be great if such files were downloaded only a single time when used in multiple places in a workflow (e.g. the same foreign input file used as the input to a process that is run for many other files).
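
For example (all names are hypothetical), a single remote reference file paired with many per-sample inputs; ideally the reference would be downloaded once and reused by every task of the process, rather than once per task.

```groovy
// Hypothetical example: one foreign reference file, many local samples.
// The same s3:// object feeds every task of the 'align' process, so it
// should ideally be downloaded a single time.
genome_ref = file('s3://my-bucket/ref/genome.fa')
samples_ch = Channel.fromPath('data/*.fastq.gz')

process align {
    input:
    file ref from genome_ref
    file sample from samples_ch

    output:
    file '*.sam' into aligned_ch

    """
    bwa mem ${ref} ${sample} > ${sample.baseName}.sam
    """
}
```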

@ihaque-freenome

would be great if such files are only downloaded a single time if used in multiple places by a workflow

One caveat: this may be impractical in the case of distributed execution. I'm running Nextflow in a shared SLURM environment with limited networked storage but dedicated fast scratch space per node. I definitely would not want large remote files to be staged into the shared mount; rather, they should go into scratch. It would certainly be ideal to schedule jobs dependent on the same file onto the same node (into the same scratch dir?) so they don't need to transfer the file twice, but I'd rather have two transfers from object store to scratch than have the shared filer become a bottleneck.
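
As a rough sketch of that kind of setup (the '/scratch' path is an assumption based on the description above), the executor and scratch directory can be set in nextflow.config so that task work lands on node-local storage rather than the shared mount:

```groovy
// nextflow.config sketch: submit tasks via SLURM and run each task in
// node-local scratch instead of the shared filesystem. The path below
// is an assumption; adjust it to the actual per-node scratch mount.
process {
    executor = 'slurm'
    scratch  = '/scratch'
}
```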

@pditommaso
Member Author

pditommaso commented Jul 8, 2017

Is that a cloud SLURM deployment? Yes, ideally it should be able to cover this use case as well.

@ihaque-freenome

@pditommaso yes, it is a SLURM cluster running on Google Compute Engine. Each node has a local SSD mounted at /scratch and a shared mount at /home.

@emi80
Contributor

emi80 commented Jan 14, 2019

Hello all,
I am reviving this thread since I am now facing the same problem.

Looking at related issues, I found #686 (comment) and #686 (comment) by @ewels. I fully agree, and I think we would cover most of the execution cases this way.

pditommaso added a commit that referenced this issue Jan 29, 2019
This commit adds the ability to cache foreign input files
so that they are staged in the pipeline work directory.

This brings two main benefits:
1) Multiple processes using the same remote input file
   will use the same downloaded copy, without triggering
   multiple downloads of the same file;
2) When resuming the pipeline execution, all remote files
   previously downloaded are retrieved from the execution
   cache.

Moreover, this commit fixes a bug in the download thread pool
that was limiting downloads to one file at a time.

Solves #265, #686. Merge #1006

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>