JVM does not exit after failed tasks are complete #1457

Closed
mahesh-panchal opened this issue Jan 14, 2020 · 6 comments

Comments

@mahesh-panchal
Contributor

Bug report

The JVM sometimes hangs after an error.

Expected behavior and actual behavior

After all the tasks have completed or failed, the JVM should report the error message and exit. Sometimes it does not.

Steps to reproduce the problem

These particular steps cannot be reproduced outside the Swedish computer cluster, but others on the nf-core help channel have also encountered the JVM hanging without using the Swedish cluster-specific profile (uppmax).

nextflow run nf-core/chipseq -profile test,singularity,uppmax

The uppmax profile also requires the --project parameter to be set, so the pipeline failed as expected. However, the JVM did not exit, even after 40 minutes.

Program output

This is the terminal output:

nextflow run nf-core/chipseq -profile test,singularity,uppmax
N E X T F L O W  ~  version 19.10.0
Launching `nf-core/chipseq` [small_mestorf] - revision: 21be314954 [master]
----------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/chipseq v1.1.0
----------------------------------------------------
Run Name            : small_mestorf
Data Type           : Paired-End
Design File         : https://raw.githubusercontent.com/nf-core/test-datasets/chipseq/design.csv
Genome              : Not supplied
Fasta File          : https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genome.fa
GTF File            : https://raw.githubusercontent.com/nf-core/test-datasets/atacseq/reference/genes.gtf
MACS2 Genome Size   : 1.2E+7
Min Consensus Reps  : 1
MACS2 Narrow Peaks  : No
MACS2 Broad Cutoff  : 0.1
Trim R1             : 0 bp
Trim R2             : 0 bp
Trim 3' R1          : 0 bp
Trim 3' R2          : 0 bp
NextSeq Trim        : 0 bp
Fingerprint Bins    : 100
Save Genome Index   : No
Max Resources       : 6 GB memory, 2 cpus, 12h time per job
Container           : singularity - nfcore/chipseq:1.1.0
Output Dir          : ./results
Launch Dir          : /domus/h1/mahesh/test_dir/nf-core-chipseq
Working Dir         : /domus/h1/mahesh/test_dir/nf-core-chipseq/work
Script Dir          : /home/mahesh/.nextflow/assets/nf-core/chipseq
User                : mahesh
Config Profile      : test,singularity,uppmax
Config Description  : Minimal test dataset to check pipeline function
Config Contact      : Phil Ewels (@ewels)
Config URL          : https://www.uppmax.uu.se/
----------------------------------------------------
[-        ] process > CheckDesign            -
[-        ] process > BWAIndex               -
[-        ] process > CheckDesign              -
[-        ] process > BWAIndex                 -
[-        ] process > CheckDesign              -
[-        ] process > BWAIndex                 -
[-        ] process > CheckDesign              -
[-        ] process > BWAIndex                 -
[-        ] process > MakeGeneBED              -
[-        ] process > MakeTSSBED               -
[-        ] process > MakeGenomeFilter         -
[-        ] process > FastQC                   -
[-        ] process > TrimGalore               -
[-        ] process > BWAMem                   -
[-        ] process > SortBAM                  -
[-        ] process > MergeBAM                 -
[-        ] process > MergeBAMFilter           -
[-        ] process > CheckDesign                  -
[-        ] process > BWAIndex                     -
[-        ] process > MakeGeneBED                  -
[-        ] process > MakeTSSBED                   -
[-        ] process > MakeGenomeFilter (genome.fa) -
[-        ] process > FastQC                       -
[-        ] process > TrimGalore                   -
[-        ] process > BWAMem                       -
[-        ] process > SortBAM                      -
[-        ] process > MergeBAM                     -
[-        ] process > MergeBAMFilter               -
[-        ] process > MergeBAMRemoveOrphan         -
[-        ] process > Preseq                       -
[-        ] process > CollectMultipleMetrics       -
[-        ] process > BigWig                       -
[-        ] process > PlotProfile                  -
[-        ] process > PhantomPeakQualTools         -
[-        ] process > PlotFingerprint              -
[-        ] process > MACSCallPeak                 -
[-        ] process > AnnotatePeaks                -
[-        ] process > PeakQC                       -
[-        ] process > ConsensusPeakSet             -
[-        ] process > ConsensusPeakSetAnnotate     -
[-        ] process > ConsensusPeakSetDESeq        -
[-        ] process > IGV                          -
[-        ] process > get_software_versions        -
[-        ] process > MultiQC                      -
[-        ] process > output_documentation         -
Execution cancelled -- Finishing pending tasks before exit
WARN: Access to undefined parameter `project` -- Initialise it to a default value eg. `params.project = some_value`
Error executing process > 'MakeGenomeFilter (genome.fa)'
Caused by:
  Failed to submit process to grid scheduler for execution
Command executed:
  sbatch .command.run
Command exit status:
  1
Command output:
  sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Work dir:
  /domus/h1/mahesh/test_dir/nf-core-chipseq/work/00/146faa743af319b96497a092041fe6
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Nextflow log:
nextflow.log

jstack trace taken at the point where the JVM had hung:
jstack.log

Environment

  • Nextflow version: 19.10.0
  • Java version: openjdk 11.0.1-internal 2018-10-16
  • Operating system: Linux 3.10.0-1062.9.1.el7.x86_64
@jpfeuffer

jpfeuffer commented Feb 25, 2020

I have the same problem. For me, the only way to resolve it was to set "errorStrategy" to "terminate" or to reduce "maxForks" to 1.
As soon as it tries to queue multiple processes that fail, they seem to block each other indefinitely.

Edit: For me this even happens on a local docker setup.
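
The workaround described in this comment could be written in a `nextflow.config` roughly as follows. This is only a sketch: applying the settings to every process through the `process` scope is an assumption, since the comment does not say whether they were set globally or per process. Only one of the two settings should be needed.

```groovy
// nextflow.config (sketch; applying this to all processes is an assumption)
process {
    // Terminate the run as soon as a task reports an error,
    // instead of retrying or ignoring the failure
    errorStrategy = 'terminate'

    // Alternative workaround: allow at most one instance of each
    // process to run in parallel
    // maxForks = 1
}
```

Setting `maxForks = 1` serializes the tasks of each process, so it trades throughput for avoiding the situation where several failing tasks are queued at once.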

@pditommaso
Member

This may be related to #1432.

@pditommaso
Member

I think this is the issue referenced above. Feel free to reopen if it still happens with a version that includes that patch.

@jpfeuffer

jpfeuffer commented May 5, 2020

Hmm, I just tried it with 20.04.1 and it did not solve the problem for me.
I created a (rather) minimal repository where you can replicate the error.
https://github.com/jpfeuffer/nfbug

(You can see that the GH Actions CI runs indefinitely, since the command in the process is not going to work.)

For manual replication, just clone and run nextflow run -profile docker,test

@jpfeuffer

Sorry, after creating this minimal example I think it is just an error in the configuration and resource allocation in retries.

@jpfeuffer

Unfortunately, I still cannot find a real pattern in when the execution stalls after an error.
If I reduce the resources to a bare minimum so that all four processes can be scheduled at the same time, it somehow works.

It would be great if someone could still have a look at the example I gave.
