
Allow the setting of the bootDiskSize with Google pipelines #1331

Closed
daudn opened this issue Oct 8, 2019 · 19 comments

@daudn

daudn commented Oct 8, 2019

Bug report

In the process declaration, the disk directive does not work.
When I run this workflow:

process demo {
    disk '30 GB'
    container 'python:3'

    script:
    """
    sleep 5
    echo Done!
    """
}

I get an instance with a 10 GB local (boot) disk and a 500 GB pipeline-worker persistent disk.

The Docker image I have is 7 GB, and the workflow fails because the instance runs out of space and cannot complete the Docker pull.

Is this issue going to be handled any time soon? I increased my persistent disk quota to accommodate 500 GB for each instance, but there is no way around the large Docker image, which doesn't fit on the 10 GB local disk.

Please help or advise. If this is not going to change, then I need to redesign my entire pipeline.
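
For reference, a run like the one above presumably uses a Google Pipelines config along these lines (a minimal sketch; the bucket, project, and zone values are placeholders, not taken from the report):

process.executor = 'google-pipelines'
workDir = 'gs://my-bucket/work'      // placeholder bucket for the work directory
google.project = 'my-project-id'     // placeholder GCP project ID
google.zone = 'europe-west1-b'       // placeholder zone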

@daudn daudn changed the title Disk directive does not work Disk directive does not work in GCP Oct 8, 2019
@pditommaso
Member

> I'm quite sure it's not that difficult to allow customizability of these features (it already exists but does not work). Is this issue going to be handled any time soon?

Good, looking forward to receiving a code contribution from you.

@daudn
Author

daudn commented Oct 9, 2019

I'm sorry if that sounded offensive. I just meant that if the option is there, the developers have already thought of, or are already developing, the feature.

I would love to contribute; however, I'm sure this is between GCP and Nextflow. In GCP we can customize the disk size, and since Nextflow uses GCP, that functionality is there and just needs to be implemented.

Once again, I apologize for wording it the way I did; I'm frustrated because Nextflow is supposed to be the answer to all my problems.

@pditommaso
Member

No problem, I appreciate the enthusiasm. It should be possible, but we have other priorities. The code is available so that anybody can use it, learn from it, and propose improvements.

@daudn
Author

daudn commented Oct 10, 2019

The Pipelines API definitely has this functionality:

{
  "minimumCpuCores": number,
  "preemptible": boolean,
  "minimumRamGb": number,
  "disks": [
    {
      object (Disk)
    }
  ],
  "zones": [
    string
  ],
  "bootDiskSizeGb": number,
  "noAddress": boolean,
  "acceleratorType": string,
  "acceleratorCount": string
}

bootDiskSizeGb: The size of the boot disk. Defaults to 10 GB.

@daudn
Author

daudn commented Oct 11, 2019

In nextflow/modules/nf-google/src/main/nextflow/cloud/google/pipelines/GooglePipelinesTaskHandler.groovy, if we edit the createPipelineRequest() method, could we not set the boot disk size from there? Like:

def req = new GooglePipelinesSubmitRequest()
req.machineType = machineType
req.project = pipelineConfiguration.project
req.zone = pipelineConfiguration.zone
req.region = pipelineConfiguration.region
req.diskName = diskName
req.diskSizeGb = task.config.disk?.giga
req.bootDiskSizeGb = 30   // <-- proposed change: set the boot disk size here
req.preemptible = pipelineConfiguration.preemptible
req.taskName = "nf-$task.hash"
req.containerImage = task.container
req.fileCopyImage = fileCopyImage
req.stagingScript = stagingScript
req.mainScript = mainScript
req.unstagingScript = unstaging.join("; ").trim()
req.sharedMount = sharedMount
req.accelerator = task.config.getAccelerator()
return req
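
A less hard-coded variant might read the value from the pipeline configuration and fall back to the API's 10 GB default. This is only a sketch: bootDiskSize is a hypothetical field that would have to be added to the configuration class first, and the ?.giga accessor simply mirrors the one already used for task.config.disk above.

// hypothetical: bootDiskSize would need to be added to the pipeline configuration class
req.bootDiskSizeGb = pipelineConfiguration.bootDiskSize?.giga ?: 10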

@pditommaso
Member

It should work.

@daudn
Author

daudn commented Oct 11, 2019

If I were to make changes locally, how could I then install that version of Nextflow? Basically, how do you package Nextflow? I can try getting it to work; if it does, I can fork the repo and push to a feature branch.

@pditommaso
Member

make compile 
./launch.sh 

https://github.com/nextflow-io/nextflow#build-from-source

@daudn
Author

daudn commented Oct 14, 2019

daudnadeem:nextflow daudn$ ./launch.sh 
Picked up _JAVA_OPTIONS: -Xverify:none
Error: Could not find or load main class nextflow.cli.Launcher

@daudn
Author

daudn commented Oct 14, 2019

I just understood that ./launch.sh is to be used in place of the nextflow command, e.g. ./launch.sh run ...

Anyway, I don't think I can get it to work; my Groovy skills are insufficient.

Until this feature is available, I will look for another solution. If the feature is added, I'll come back to implementing the workflow through Nextflow.

@mozack
Contributor

mozack commented Oct 14, 2019

To be clear, the only change needed here is for the boot disk. The scratch disk is already configurable via the process.disk directive in the latest edge release.

I can take a look at implementing this; however, I can't provide a reliable timeframe right now.

@pditommaso Any thoughts on which directive to use for this? There appears to be a cloud.bootStorageSize in use for AWS. Would you like to use that here as well?

@daudn In the meantime, can you comment on what takes up most of the space in your image? 7 GB seems like a lot.

@daudn
Author

daudn commented Oct 14, 2019

@mozack, @pditommaso

Actually, I've implemented a workflow in Kubernetes using RabbitMQ with manual autoscaling of VM instances. I eventually had a look at Nextflow, which seems like the answer if only I can get around the limitation of the 10 GB disk (which is why I'm quite persistent about this feature).

The Docker image is so large because it has a third-party tool installed to do HLA typing. This tool takes up 6 GB of space after being installed (I've tried cleaning up the image); the size of the third-party tool is not in my hands, I'm afraid.

When I was doing this process manually, I had prebuilt VM images on Google Cloud. Whenever a job was to be processed, the instance that booted up already had the previous 'layers' of the Docker image, so it was much quicker to pull the :latest version.

The manual workflow I built is definitely efficient, but it has many moving parts, so there is a higher chance of failure, and it would also be more difficult to maintain in the long run compared to Nextflow.

I would really appreciate it if you could allow customisation of the local disk storage.

@pditommaso
Member

@mozack The idea is to deprecate the cloud context at some point; therefore, I think a google.pipelines.bootDisk option could be added to configure it.
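
In config terms, the proposal would look something like the line below (a sketch only; the option did not exist at this point in the thread, and the name is simply the one suggested above):

google.pipelines.bootDisk = 30.GB   // proposed option, not implemented at this point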

@pditommaso pditommaso changed the title Disk directive does not work in GCP Allow the setting of the bootDiskSize with Google pipelines Oct 17, 2019
@daudn
Author

daudn commented Oct 22, 2019

@mozack any update?

@pditommaso
Member

@daudn We are discussing this with the Google team, though there is no ETA at this time. Stay tuned.

@daudn
Author

daudn commented Dec 3, 2019

Any updates? I had a look at the latest release, but it doesn't seem to include this.

@pditommaso pditommaso added this to the v20.01.0 milestone Dec 3, 2019
@pditommaso
Member

pditommaso commented Dec 3, 2019

It will be included in the next stable release, 20.01.0.

@daudn
Author

daudn commented Dec 3, 2019

@pditommaso thank you for the update, looking forward to the release!

@pditommaso
Member

The new google-lifesciences executor allows the specification of the boot disk size using the following config settings:

process.executor = 'google-lifesciences'
google.lifeSciences.bootDiskSize = 50.GB

You can try it with the latest snapshot:

NXF_VER=19.12.0-SNAPSHOT nextflow run .. <usual cli params>
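
For context, a fuller config using that setting might look like the sketch below; the bucket, project, and zone values are placeholders, not taken from this thread.

process.executor = 'google-lifesciences'
workDir = 'gs://my-bucket/work'              // placeholder bucket
google.project = 'my-project-id'             // placeholder GCP project ID
google.zone = 'europe-west1-b'               // placeholder zone
google.lifeSciences.bootDiskSize = 50.GB     // boot disk large enough to pull a 7 GB image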
