The upper limit of the thread pool size has probably been reached. #2021

Closed
rjpbonnal opened this issue Apr 8, 2021 · 6 comments

@rjpbonnal

Bug report

I am running the nf-core/sarek pipeline from a node with 128 cores, 256 threads, and 1 TB of RAM, using Singularity and SGE.
This is similar to #1871 and related to #92.

This is the first time I have faced this issue. My team successfully runs other nf-core pipelines from the same machine without any problems.

See "Additional context" at the bottom for the solution.

Expected behavior and actual behavior

Run the nf-core/sarek pipeline. The built-in tests ran fine for all of the pipelines.

When running on real data, the pipeline was interrupted and I resumed it several times. Every time I resumed the pipeline, a different step failed with the thread pool size error:

The thread pool executor cannot run the task. The upper limit of the thread pool size has probably been reached. Current pool size: 1000 Maximum pool size: 1000

Steps to reproduce the problem

The distinguishing factor is presumably a node with a high number of cores/threads.

Program output

N E X T F L O W  ~  version 20.10.0
Launching `./main.nf` [insane_newton] - revision: 3a69235604
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
    ____
  .´ _  `.
 /  |\`-_ \      __        __   ___
|   | \  `-|    |__`  /\  |__) |__  |__/
 \ |   \  /     .__| /¯¯\ |  \ |___ |  \
  `|____\´

  nf-core/sarek v2.7
----------------------------------------------------
Run Name          : insane_newton
Max Resources     : 128 GB memory, 16 cpus, 10d time per job
Container         : singularity - nfcore/sarek:2.7
Input             : /storage/workingarea/testing-sarek/input_qwerty.tsv
Step              : mapping
Genome            : GRCh38
Nucleotides/s     : 1000.0
Tools             : ascat, cnvkit, controlfreec, strelka, vep, snpeff
MarkDuplicates    : Options
Java options      : "-Xms4000m -Xmx7g"
GATK Spark        : No
Save BAMs mapped  : No
Skip MarkDuplicates: No
ASCAT             : Options
purity            : 0.8
ploidy            : 2
Control-FREEC     : Options
coefficientOfVariation: 0.05
Tools to annotate : strelka
Annotation cache  : Enabled
AWS iGenomes base : s3://ngi-igenomes/igenomes/
Save Reference    : No
Loci              : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci
Loci GC           : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci.gc
BWA indexes       : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/BWAIndex/Homo_sapiens_assembly38.fasta.64.{alt,amb,ann,bwt,pac,sa}
Chromosomes       : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/Chromosomes
Chromosomes length: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/Length/Homo_sapiens_assembly38.len
dbsnp             : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz
dbsnpIndex        : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz.tbi
dict              : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict
fasta reference   : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta
fasta index       : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai
intervals         : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed
known indels      : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz
known indels index: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi
Mappability       : s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem
snpEff DB         : GRCh38.86
species           : homo_sapiens
VEP cache version : 99
[b1/f50ef4] process > get_software_versions                                  [100%] 1 of 1 ✔
[-        ] process > BuildBWAindexes                                        -
[-        ] process > BuildDict                                              -
[-        ] process > BuildFastaFai                                          -
[-        ] process > BuildDbsnpIndex                                        -
[-        ] process > BuildGermlineResourceIndex                             -
[-        ] process > BuildKnownIndelsIndex                                  -
[-        ] process > BuildPonIndex                                          -
[-        ] process > BuildIntervals                                         -
[74/83ffb5] process > CreateIntervalBeds (wgs_calling_regions.hg38.bed)      [100%] 1 of 1, cached: 1 ✔
[bc/535310] process > FastQCFQ (2002-2002_N1)                                [100%] 2 of 2, cached: 2 ✔
[-        ] process > FastQCBAM                                              -
[-        ] process > TrimGalore                                             -
[-        ] process > UMIFastqToBAM                                          -
[-        ] process > UMIMapBamFile                                          -
[-        ] process > GroupReadsByUmi                                        -
[-        ] process > CallMolecularConsensusReads                            -
[8e/2b20ea] process > MapReads (2002-2002_N1)                                [100%] 2 of 2, cached: 2 ✔
[-        ] process > Sentieon_MapReads                                      -
[-        ] process > MergeBamMapped                                         -
[-        ] process > IndexBamMergedForSentieon                              -
[-        ] process > IndexBamFile                                           -
[22/823708] process > MarkDuplicates (2002-2002_T)                           [100%] 2 of 2, cached: 2 ✔
[-        ] process > Sentieon_Dedup                                         -
[a7/a9c31d] process > BaseRecalibrator (2002-2002_T-chr22_12691731-12726204) [100%] 266 of 266, cached: 266 ✔
[8d/c43000] process > GatherBQSRReports (2002-2002_N)                        [100%] 2 of 2, cached: 2 ✔
[c3/cbb40c] process > ApplyBQSR (2002-2002_N-chr9_43236168-43263290)         [100%] 266 of 266, cached: 266 ✔
[-        ] process > Sentieon_BQSR                                          -
[8e/9329f9] process > MergeBamRecal (2002-2002_T)                            [100%] 2 of 2, cached: 2 ✔
[-        ] process > IndexBamRecal                                          -
[7d/173653] process > SamtoolsStats (2002-2002_T)                            [  0%] 0 of 2
[e7/bd002e] process > BamQC (2002-2002_T)                                    [ 50%] 2 of 4, cached: 2
[-        ] process > HaplotypeCaller                                        -
[-        ] process > GenotypeGVCFs                                          -
[-        ] process > Sentieon_DNAseq                                        -
[-        ] process > Sentieon_DNAscope                                      -
[-        ] process > StrelkaSingle                                          -
[-        ] process > MantaSingle                                            -
[-        ] process > TIDDIT                                                 -
[-        ] process > FreebayesSingle                                        -
[-        ] process > FreeBayes                                              -
[-        ] process > Mutect2                                                -
[-        ] process > MergeMutect2Stats                                      -
[-        ] process > ConcatVCF                                              -
[-        ] process > ConcatVCF_Mutect2                                      -
[-        ] process > PileupSummariesForMutect2                              -
[-        ] process > MergePileupSummaries                                   -
[-        ] process > CalculateContamination                                 -
[-        ] process > FilterMutect2Calls                                     -
[-        ] process > Sentieon_TNscope                                       -
[-        ] process > CompressSentieonVCF                                    -
[-        ] process > Strelka                                                -
[-        ] process > Manta                                                  -
[-        ] process > StrelkaBP                                              -
[-        ] process > CNVkit                                                 -
[-        ] process > MSIsensor_scan                                         -
[-        ] process > MSIsensor_msi                                          -
[-        ] process > AlleleCounter                                          -
[-        ] process > ConvertAlleleCounts                                    -
[-        ] process > Ascat                                                  -
[-        ] process > Mpileup                                                -
[-        ] process > MergeMpileup                                           -
[-        ] process > ControlFREEC                                           -
[-        ] process > ControlFREECSingle                                     -
[-        ] process > ControlFreecViz                                        -
[-        ] process > ControlFreecVizSingle                                  -
[-        ] process > BcftoolsStats                                          -
[-        ] process > Vcftools                                               -
[-        ] process > Snpeff                                                 -
[-        ] process > CompressVCFsnpEff                                      -
[-        ] process > VEP                                                    -
[-        ] process > VEPmerge                                               -
[-        ] process > CompressVCFvep                                         -
[-        ] process > MultiQC                                                -
[55/85074c] process > Output_documentation                                   [100%] 1 of 1, cached: 1 ✔
Error executing process > 'Mpileup'

Caused by:
  The thread pool executor cannot run the task. The upper limit of the thread pool size has probably been reached. Current pool size: 1000 Maximum pool size: 1000
-[nf-core/sarek] Pipeline completed with errors-

Environment

  • Nextflow version: [20.10.0]
  • Java version: [11.0.9.1-internal 2020-11-04, openjdk version "10.0.2" 2018-07-17]
  • Operating system: [RHEL 8.2 (Ootpa)]
  • Bash version: (GNU bash, version 4.4.19(1)-release (x86_64-redhat-linux-gnu))

Additional context

Using

nextflow -Dnxf.pool.type=sync run ...

the pipeline runs smoothly without issues.

@pditommaso
Member

Can you please include the complete error stack trace (look in the .nextflow.log file)?

@rjpbonnal
Author

> Can you please include the complete error stack trace (look in the .nextflow.log file)

Sure,
nextflow.log

@pditommaso
Member

pditommaso commented Apr 14, 2021

Adding here a list of the different thread pool implementations, for reference:

default

        new ThreadPoolExecutor(
              poolSize,  // 1 
              1000, 
              ResizeablePool.KEEP_ALIVE_TIME, 
              TimeUnit.SECONDS, 
              new SynchronousQueue<Runnable>(), 
              new ThreadFactory() 

sync

        new ThreadPoolExecutor(
              1,
              maxThreads, // cpus+1
              KEEP_ALIVE_TIME,
              TimeUnit.SECONDS,
              new SynchronousQueue<Runnable>(),
              newDaemonThreadFactory(),
              new ThreadPoolExecutor.CallerRunsPolicy())

bound

        new ThreadPoolExecutor(
              1,
              maxThreads, // cpus+1
              KEEP_ALIVE_TIME,
              TimeUnit.SECONDS,
              new LinkedBlockingQueue<Runnable>(queueSize), // 1000
              newDaemonThreadFactory(),
              new ThreadPoolExecutor.CallerRunsPolicy())

unbound

        new ThreadPoolExecutor(
              1,
              maxThreads, // cpus+1
              KEEP_ALIVE_TIME,
              TimeUnit.SECONDS,
              new LinkedBlockingQueue<Runnable>(),
              newDaemonThreadFactory(),
              new ThreadPoolExecutor.CallerRunsPolicy())
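
As a rough illustration of why the rejection policy matters in the variants listed above, here is a minimal, self-contained sketch using only the standard `java.util.concurrent` classes. It is not Nextflow's code: the class `PoolSaturationDemo` and the helper `submitOneTooMany` are hypothetical names, the maximum of 4 threads is an arbitrary stand-in for the real limits, and `ThreadPoolExecutor.AbortPolicy` is used as a stand-in for whatever handler the default pool installs (the snippet above does not show it). The sketch fills a `SynchronousQueue`-backed pool with blocked tasks and then submits one more.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSaturationDemo {

    // Hypothetical helper: fills a SynchronousQueue-backed pool with blocking
    // tasks, submits one more, and reports what happened to the extra task.
    public static String submitOneTooMany(int maxThreads, RejectedExecutionHandler handler) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), handler);
        CountDownLatch release = new CountDownLatch(1);
        String[] ranOn = new String[1];
        try {
            // Occupy every worker thread with a task that waits on the latch.
            for (int i = 0; i < maxThreads; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The pool is saturated and a SynchronousQueue cannot hold a
            // waiting task, so this submission goes to the rejection handler.
            try {
                pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
            } catch (RejectedExecutionException e) {
                return "rejected";
            }
            return Thread.currentThread().getName().equals(ranOn[0])
                    ? "ran on caller" : "not run on caller";
        } finally {
            // Unblock the workers and shut the pool down cleanly.
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // Aborting handler: the extra task is refused with an exception.
        System.out.println(submitOneTooMany(4, new ThreadPoolExecutor.AbortPolicy()));
        // CallerRunsPolicy (as in sync/bound/unbound): the submitter runs the task itself.
        System.out.println(submitOneTooMany(4, new ThreadPoolExecutor.CallerRunsPolicy()));
    }
}
```

With an aborting handler the extra submission fails outright, which would be consistent with the error in this report once the limit of 1000 threads is hit. With `CallerRunsPolicy` the submitting thread executes the task itself, which slows submission down instead of failing.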

@rjpbonnal
Author

@pditommaso do you need anything from my side, such as running more tests?

@pditommaso
Member

I'll give it a try ASAP. Thanks!

@pditommaso
Member

Uploaded a patch to the develop branch.
