Test fails when using GCP due to missing tools in the basic biocontainer #764

Closed
hnawar opened this issue Feb 10, 2022 · 20 comments
Labels
bug Something isn't working

Comments

@hnawar

hnawar commented Feb 10, 2022

Description of the bug

When I run the pipeline with the profiles test,google, it fails in the preparation steps.
After initial debugging, the issue seems to be the container specified in the untar and gunzip modules, which is the base biocontainer.
When I replaced it in my fork with a container image I built that includes those tools, the pipeline ran further and failed at a later step.
I updated the containers here
https://github.com/nf-core/rnaseq/blob/master/modules/nf-core/modules/gunzip/main.nf#L8
and here
https://github.com/nf-core/rnaseq/blob/master/modules/nf-core/modules/untar/main.nf#L8

The same issue seems to affect a number of pipelines that use these modules, at least on Google Cloud and potentially in other environments as well.

Command used and terminal output

# google is my own profile where I define some additional config for my GCP environment
$ nextflow run hnawar/rnaseq -profile google,gls,test


error executing process > 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz)'
Caused by:
  Process `NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF (genes.gtf.gz)` terminated with an error exit status (9)
Command executed:
  gunzip \
      -f \
       \
      genes.gtf.gz
 
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GUNZIP_GTF":
      gunzip: $(echo $(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*$//')
  END_VERSIONS
Command exit status:
  9
Command output:
  (empty)
Command error:
  Execution failed: generic::failed_precondition: while running "nf-11ec56a6c594fc3a85d97d3ab128fc0b-main": unexpected exit status 1 was not ignored
Work dir:
  gs://mybucket/wd/11/ec56a6c594fc3a85d97d3ab128fc0b
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Unexpected error [AbortedException]

Relevant files

No response

System information

Nextflow version: 21.10.6
Hardware: Cloud
Executor: google (And potentially others)
Container engine: Docker
OS: Linux
Version of nf-core/rnaseq: 3.5

@hnawar hnawar added the bug Something isn't working label Feb 10, 2022
@drpatelh
Member

Thanks @hnawar ! I need to prep a release soon so will try to look into this. Will need to find a way to reproduce on GCP. Quite a few modules will be using the Biocontainers BusyBox base image which seems to work fine on AWS and Azure.

Any ideas why this error is raised?

Execution failed: generic::failed_precondition: while running "nf-11ec56a6c594fc3a85d97d3ab128fc0b-main": unexpected exit status 1 was not ignored

@hnawar
Author

hnawar commented Feb 11, 2022

@drpatelh
The only error I could find was in the stderr of the main action

/bin/bash: /wd/88/3292ba6f8e2c7bb81d029e4cf0ae78/.command.log: Permission denied

Here is the .command.log for this particular action

+ cd /wd/88/3292ba6f8e2c7bb81d029e4cf0ae78
+ gsutil -m -q cp gs://argo-hn-nf-gke1/wd/88/3292ba6f8e2c7bb81d029e4cf0ae78/.command.run .
+ bash .command.run nxf_stage
+ [[ 1 -gt 0 ]]
+ ls -lah /wd/88/3292ba6f8e2c7bb81d029e4cf0ae78
total 37M
drwxr-xr-x 4 root root 4.0K Feb 10 11:06 .
drwxr-xr-x 3 root root 4.0K Feb 10 11:06 ..
-rw-r--r-- 1 root root  228 Feb 10 11:06 .command.log
-rw-r--r-- 1 root root  13K Feb 10 11:06 .command.run
-rw-r--r-- 1 root root  272 Feb 10 11:06 .command.sh
drwx------ 2 root root  16K Feb 10 11:05 lost+found
drwxr-xr-x 2 root root 4.0K Feb 10 11:06 nextflow-bin
-rw-r--r-- 1 root root  37M Feb 10 11:06 star.tar.gz
+ cd /wd/88/3292ba6f8e2c7bb81d029e4cf0ae78
+ bash .command.run nxf_unstage
CommandException: No URLs matched: .command.out
CommandException: 1 file/object could not be transferred.
CommandException: No URLs matched: .command.err
CommandException: 1 file/object could not be transferred.
CommandException: No URLs matched: .command.trace
CommandException: 1 file/object could not be transferred.
CommandException: No URLs matched: .exitcode
CommandException: 1 file/object could not be transferred.
ls: cannot access 'star': No such file or directory
ls: cannot access 'versions.yml': No such file or directory

@hnawar
Author

hnawar commented Feb 11, 2022

I will do some more tests with just the Life Sciences API and the BusyBox container image.

@leipzig

leipzig commented Feb 18, 2022

can you add these to your profile and see if it completes?

    process.memory = '50GB'
    process.cpus = 8

@andrewfrank

I was able to replicate this issue on GCP with the google-lifesciences executor on nf-core/rnaseq 3.6.

Similar to @hnawar's solution, I was able to resolve this by switching the container for any process that used the biocontainers/biocontainers:v1.2.0_cv1 container.

Here's the configuration I used:

process {
    withName: 'GUNZIP_GTF|GUNZIP_ADDITIONAL_FASTA|UNTAR_SALMON_INDEX|UNTAR_STAR_INDEX|CAT_FASTQ|CAT_ADDITIONAL_FASTA' {
        container = 'ubuntu:20.04'
    }
}

Any ideas if there's a more graceful method of changing the container for processes that use the biocontainers/biocontainers:v1.2.0_cv1 container other than hardcoding the process names?

Also, I expect this will affect any workflow using nf-core/modules. Should I open an issue there?
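
On the question of avoiding hardcoded process names: Nextflow's `withName` selector accepts a regular expression, so one pattern can cover a family of processes. The sketch below is an assumption, not a tested fix. The prefixes `GUNZIP_`, `UNTAR_`, and `CAT_` are inferred from the six names listed above, and the pattern would also match any other process with those prefixes, whether or not it uses the biocontainers/biocontainers:v1.2.0_cv1 container.

```groovy
// Hypothetical alternative to listing every process name explicitly.
// withName accepts a regular expression; the prefixes below are assumed
// from the process names in the configuration above and may over-match.
process {
    withName: 'GUNZIP_.*|UNTAR_.*|CAT_.*' {
        container = 'ubuntu:20.04'
    }
}
```

A file like this can be supplied with `-c` next to the existing profiles. It still selects by name rather than by the container a process declares, so it is only a partial answer to the question.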

@pditommaso
Contributor

I wonder what the rationale is for using a fancy biocontainers/biocontainers:v1.2.0_cv1 container just to run gunzip. Why not use a stock debian image or something similar?

@drpatelh
Member

drpatelh commented Apr 21, 2022

Thanks again for looking into this @andrewfrank @hnawar @leipzig. I will replace this container in the next release with one that is compatible with GCP too. The main advantage of using a Biocontainer here is that it comes with both Docker and Singularity images we can use off the shelf. Depending on the execution environment, either of these can be pulled directly.

As it stands, the ubuntu:20.04 image only comes in Docker form which is why I chose the biocontainers/biocontainers:v1.2.0_cv1 image. Everything has been working fine on various other infrastructures until now! I will look into this when I get a moment.

@pditommaso
Contributor

Doesn't Nextflow convert it automatically to Singularity on demand?

@drpatelh
Member

drpatelh commented Apr 21, 2022

It does, but that has had inherent issues in the past where, for example, users' home directories are filled up with temp files due to the conversion from Docker -> Singularity. There is also a time overhead in the conversion, which doesn't always work seamlessly either.

If you can directly download a container and avoid all of that then why not?! Especially if it means pipelines break less and users are on Cloud 9.

@drpatelh
Member

With some digging it turns out that the Galaxy project is also hosting a Singularity image for ubuntu:20.04 🎉
https://depot.galaxyproject.org/singularity/ubuntu:20.04
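
With both images available, a module could pick one depending on the container engine in use. The following is only a sketch under stated assumptions, not necessarily how the pipeline implements it: it assumes `workflow.containerEngine` reports `'singularity'` when Singularity is enabled, and the process name `GUNZIP_SKETCH` and the parameter `params.archive` are hypothetical names introduced for illustration.

```groovy
nextflow.enable.dsl = 2

// Hypothetical process: chooses the Singularity image hosted by the Galaxy
// project when Singularity is the active engine, otherwise the Docker image.
process GUNZIP_SKETCH {
    container "${ workflow.containerEngine == 'singularity' ?
        'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
        'ubuntu:20.04' }"

    input:
    path archive

    output:
    path "$gunzip"

    script:
    // Name of the decompressed file, e.g. genes.gtf.gz -> genes.gtf
    gunzip = archive.toString() - '.gz'
    """
    gunzip -f $archive
    """
}

workflow {
    // params.archive is an assumed parameter pointing at a .gz file
    GUNZIP_SKETCH(Channel.fromPath(params.archive))
}
```

Selecting the image this way means the Singularity image is downloaded directly, without a Docker to Singularity conversion step.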

@andrewfrank

@drpatelh if you wanted to maintain consistency with biocontainers, I confirmed that quay.io/biocontainers/python:3.9--1 solves this issue as well.

@drpatelh
Member

Thanks @andrewfrank. Let's let the size of the container be the deciding factor here:

ubuntu:20.04 -> 72.8MB
quay.io/biocontainers/python:3.9--1 -> 191MB

I may rope you into testing when I have made the changes if that's ok?

@andrewfrank

@drpatelh 👍 happy to help with testing

@drpatelh
Member

@andrewfrank this should now be fixed on dev via #806

It would be great if you could give the pipeline a go with -r dev as the revision. Posting the command you are using along with any custom configuration would be helpful too.

Will leave this issue open until we have managed to get the pipeline running out of the box on GCP.

Thanks!

@andrewfrank

@drpatelh Success!

Config from $HOME/.nextflow/config:

profiles {

    gcp {

        // Set some convenient command line arguments
        params {
            google_zone = 'us-east4-a'
            google_debug = false
            google_preemptible = false
        }

        // Specify google configuration options
        process.executor = 'google-lifesciences'
        google.zone = params.google_zone
        google.lifeSciences.debug = params.google_debug
        google.lifeSciences.preemptible = params.google_preemptible

        // Set google appropriate error strategy
        process.errorStrategy = {
            task.exitStatus in [143,137,104,134,139,14] ? 'retry' : 'finish'
        }
        process.maxRetries = 5

    }
}

Command run:

nextflow run nf-core/rnaseq \
        -revision dev \
        -profile gcp,test \
        -bg \
        -work-dir 'gs://PATH_TO_GCP_BUCKET/work' \
        --outdir 'gs://PATH_TO_GCP_BUCKET/results'

Let me know if you need any other output.

(Another reason ubuntu:20.04 was the right choice: it turns out that using quay.io/biocontainers/python:3.9--1 for these processes breaks DUMPSOFTWAREVERSIONS because of an error obtaining the version of gunzip from that container.)

@drpatelh
Member

Awesome! Thanks for testing so quickly and for all of the infra set-up info. I want to do a little more testing and then we can get a release in the next week or two.

@hnawar
Author

hnawar commented Apr 25, 2022

Thanks @drpatelh and @andrewfrank. I have also tested it, and it worked on the same setup where it was failing before.

I will test a few more nf-core pipelines that had the same issue and open other issues when needed.

@dshinzie

dshinzie commented Apr 26, 2022

I was experiencing the same issues and can confirm using the dev branch resolved the errors. I can also confirm changing biocontainers/biocontainers:v1.2.0_cv1 to ubuntu:20.04 in nf-core/cutandrun fixed similar errors that were occurring in that pipeline.

@andrewfrank

This issue also appears in nf-core/mag, tagging @skrakau on this.

@ptn24

ptn24 commented Dec 25, 2023

Can someone help me understand how using ubuntu:20.04 addresses the problem? What was the problem with biocontainers/biocontainers:v1.2.0_cv1? I ask because I may be running into the same problem. I am using a custom image though, so ubuntu:20.04 does not work for me. I would like to update my image to get it to work, but I am not sure where to begin.

7 participants