MD5SUM hits docker pull limit #2547

Open
berguner opened this issue Oct 26, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@berguner
Contributor

Description of the bug

Hi,

I was running the pipeline on AWS Batch, and the MD5SUM tasks failed because they hit the Docker Hub pull rate limit. I assume this happens when there are more than ~100 FastQ files. The task appears to pull the ubuntu:20.04 image from Docker Hub, so it could be fixed by pointing to an image on quay.io/biocontainers instead.

https://github.com/nf-core/demultiplex/blob/b0a004eb2e79f6fceb9b5d79b91b563b7724ed62/modules/nf-core/md5sum/main.nf#L8

Command used and terminal output

Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM (RNA2015_34_S34_L002)'                                                                                                                                                      
Caused by:
  Task failed to start - CannotPullContainerError: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Command executed:

  md5sum \
       \
      RNA2015_34_S34_L002_R2_001.fastq.gz \
      > RNA2015_34_S34_L002_R2_001.fastq.gz.md5
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM":
      md5sum: $(echo $(md5sum --version 2>&1 | head -n 1| sed 's/^.*) //;' ))
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Relevant files

No response

System information

N E X T F L O W ~ version 22.04.5

@berguner berguner added the bug Something isn't working label Oct 26, 2022
@ewels
Member

ewels commented Oct 26, 2022

Uff. I wonder if we should go back to using an nf-core container for this. We're listed as OSS on Docker Hub, so we shouldn't have any pull limits.

Alternatively, we could use a mirror on quay.io, as that's the same Docker registry that biocontainers uses and it shouldn't have pull limits, e.g. https://quay.io/repository/bedrock/ubuntu

@berguner
Contributor Author

I guess it should be fine to use any Linux image with md5sum in it. For example, I was able to run it with the tabix image, which also has a small footprint. Below is the configuration that I used.

process {
    withName: MD5SUM {
        container = "quay.io/biocontainers/tabix:0.2.6--ha92aebf_0"
    }
}

@edmundmiller edmundmiller transferred this issue from nf-core/demultiplex Nov 23, 2022
@edmundmiller
Contributor

@matthdsm Any objections to using https://quay.io/repository/bedrock/ubuntu?

@matthdsm
Contributor

I don't really care which image you use, just make sure it's not some bloated mess, so we can keep the download times low.

@edmundmiller
Contributor

With that, we could probably get away with alpine for this task, but I'm wondering whether that would work across the board for these minimal containers.

@matthdsm
Contributor

I agree with alpine! We'll have to make sure it uses the same algorithms for everything though, been burned on that before.
