-
Notifications
You must be signed in to change notification settings - Fork 739
Closed
Description
Nextflow seems to be hanging after the pipeline fails when running on a slurm cluster. Here is the full .nextflow.log. This particular pipeline has a workflow.onComplete block that also isn't being called.
May-31 19:17:58.816 [main] DEBUG nextflow.cli.Launcher - $> /usr/local/bin/nextflow run http://git.l.synthgeno.global/SGI-Pipelines/GeneFlagging.git -resume -hub gitlab -r aws_changes -latest -params-file /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/params.yaml -process.executor slurm
May-31 19:17:58.964 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 0.24.4
May-31 19:17:59.521 [main] DEBUG nextflow.scm.AssetManager - Repository URL: http://git.l.synthgeno.global/SGI-Pipelines/GeneFlagging.git; Project: SGI-Pipelines/GeneFlagging; Hub provider: gitlab
May-31 19:17:59.529 [main] INFO nextflow.cli.CmdRun - Pulling SGI-Pipelines/GeneFlagging ...
May-31 19:17:59.533 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials -:-] -> http://git.l.synthgeno.global/api/v3/projects/SGI-Pipelines%2FGeneFlagging
May-31 19:17:59.884 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials -:-] -> http://git.l.synthgeno.global/api/v3/projects/SGI-Pipelines%2FGeneFlagging/repository/files?file_path=nextflow.config&ref=master
May-31 19:18:00.018 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials -:-] -> http://git.l.synthgeno.global/api/v3/projects/SGI-Pipelines%2FGeneFlagging/repository/files?file_path=main.nf&ref=master
May-31 19:18:00.139 [main] DEBUG nextflow.scm.AssetManager - Pulling SGI-Pipelines/GeneFlagging -- Using remote clone url: http://git.l.synthgeno.global/SGI-Pipelines/GeneFlagging.git
May-31 19:18:01.244 [main] INFO nextflow.cli.CmdRun - downloaded from http://git.l.synthgeno.global/SGI-Pipelines/GeneFlagging.git
May-31 19:18:01.675 [main] DEBUG nextflow.scm.AssetManager - Git config: /tools/nextflow/assets/SGI-Pipelines/GeneFlagging/.git/config; branch: master; remote: origin; url: http://git.l.synthgeno.global/SGI-Pipelines/GeneFlagging.git
May-31 19:18:01.675 [main] INFO nextflow.cli.CmdRun - Launching `SGI-Pipelines/GeneFlagging` [big_jennings] - revision: e8dc1db4ec [aws_changes]
May-31 19:18:01.686 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /tools/nextflow/assets/SGI-Pipelines/GeneFlagging/nextflow.config
May-31 19:18:01.690 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /tools/nextflow/assets/SGI-Pipelines/GeneFlagging/nextflow.config
May-31 19:18:01.825 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
May-31 19:18:01.874 [main] WARN nextflow.config.ConfigBuilder - It seems you never run this project before -- Option `-resume` is ignored
May-31 19:18:01.936 [main] DEBUG nextflow.Session - Session uuid: 57c9f9a1-cc45-41cd-be69-4bc9818b2b3e
May-31 19:18:01.936 [main] DEBUG nextflow.Session - Run name: big_jennings
May-31 19:18:01.936 [main] DEBUG nextflow.Session - Executor pool size: 4
May-31 19:18:01.951 [main] DEBUG nextflow.cli.CmdRun -
Version: 0.24.4 build 4341
Modified: 22-05-2017 11:18 UTC
System: Linux 4.4.0-64-generic
Runtime: Groovy 2.4.10 on OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11
Encoding: UTF-8 (UTF-8)
Process: 4001@ip-172-20-22-81 [172.20.22.81]
CPUs: 4 - Mem: 15.7 GB (14.3 GB) - Swap: 0 (0)
May-31 19:18:02.043 [main] DEBUG nextflow.Session - Work-dir: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work [nfs]
May-31 19:18:02.044 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /tools/nextflow/assets/SGI-Pipelines/GeneFlagging/bin
May-31 19:18:02.428 [main] DEBUG nextflow.Session - Session start invoked
May-31 19:18:02.432 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
May-31 19:18:02.433 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/trace.tsv
May-31 19:18:02.446 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
May-31 19:18:10.194 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-31 19:18:10.197 [main] WARN nextflow.script.ScriptBinding - Access to undefined parameter `clean_up` -- Initialise it to a default value eg. `params.clean_up = some_value`
May-31 19:18:10.234 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
May-31 19:18:10.245 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {region=us-west-2}
May-31 19:18:10.989 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
May-31 19:18:10.989 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
May-31 19:18:11.000 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
May-31 19:18:11.002 [main] INFO nextflow.executor.Executor - [warm up] executor > slurm
May-31 19:18:11.009 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m
May-31 19:18:11.012 [main] DEBUG nextflow.processor.TaskDispatcher - Starting monitor: TaskPollingMonitor
May-31 19:18:11.013 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: slurm)
May-31 19:18:11.016 [main] DEBUG nextflow.executor.Executor - Invoke register for executor: slurm
May-31 19:18:11.017 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-31 19:18:11.056 [main] DEBUG nextflow.Session - >>> barrier register (process: fetch_blast_nr_db)
May-31 19:18:11.069 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fetch_blast_nr_db -- maxForks: 4
May-31 19:18:11.091 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
May-31 19:18:11.091 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
May-31 19:18:11.091 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
May-31 19:18:11.092 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-31 19:18:11.092 [main] DEBUG nextflow.Session - >>> barrier register (process: fetch_pfam_db)
May-31 19:18:11.095 [Actor Thread 2] DEBUG nextflow.processor.TaskProcessor - <fetch_blast_nr_db> Poison pill arrived
May-31 19:18:11.102 [Actor Thread 1] DEBUG nextflow.processor.StateObj - <fetch_blast_nr_db> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
May-31 19:18:11.110 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fetch_pfam_db -- maxForks: 4
May-31 19:18:11.128 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
May-31 19:18:11.128 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
May-31 19:18:11.128 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
May-31 19:18:11.129 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-31 19:18:11.129 [main] DEBUG nextflow.Session - >>> barrier register (process: fetch_superfam_db)
May-31 19:18:11.136 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fetch_superfam_db -- maxForks: 4
May-31 19:18:11.138 [Actor Thread 4] DEBUG nextflow.processor.TaskProcessor - <fetch_pfam_db> Poison pill arrived
May-31 19:18:11.138 [Actor Thread 1] DEBUG nextflow.processor.StateObj - <fetch_pfam_db> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
May-31 19:18:11.157 [Actor Thread 6] DEBUG nextflow.processor.TaskProcessor - <fetch_superfam_db> Poison pill arrived
May-31 19:18:11.162 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
May-31 19:18:11.162 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
May-31 19:18:11.162 [Actor Thread 1] DEBUG nextflow.processor.StateObj - <fetch_superfam_db> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
May-31 19:18:11.163 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
May-31 19:18:11.163 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-31 19:18:11.164 [main] DEBUG nextflow.Session - >>> barrier register (process: order_by_gc)
May-31 19:18:11.172 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > order_by_gc -- maxForks: 4
May-31 19:18:11.197 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
May-31 19:18:11.198 [Actor Thread 8] DEBUG nextflow.processor.TaskProcessor - <order_by_gc> Poison pill arrived
May-31 19:18:11.203 [Actor Thread 1] DEBUG nextflow.processor.StateObj - <order_by_gc> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
May-31 19:18:11.198 [main] DEBUG nextflow.processor.ProcessFactory - >> processorType: 'slurm'
May-31 19:18:11.205 [main] DEBUG nextflow.executor.Executor - Initializing executor: slurm
May-31 19:18:11.205 [main] DEBUG n.executor.AbstractGridExecutor - Creating executor 'slurm' > queue-stat-interval: 1m
May-31 19:18:11.206 [main] DEBUG nextflow.Session - >>> barrier register (process: contig_stats)
May-31 19:18:11.224 [Actor Thread 9] DEBUG nextflow.processor.TaskProcessor - Copying to process workdir foreign file: s3:///sgi-pipeline-dev/geneflagging_data/contigs.fa
May-31 19:18:11.240 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > contig_stats -- maxForks: 4
May-31 19:18:11.269 [Actor Thread 10] DEBUG nextflow.processor.TaskProcessor - <contig_stats> Poison pill arrived
May-31 19:18:11.271 [Actor Thread 12] DEBUG nextflow.processor.TaskProcessor - Copying to process workdir foreign file: s3:///sgi-pipeline-dev/geneflagging_data/contigs.fa
May-31 19:18:11.278 [Actor Thread 1] DEBUG nextflow.processor.StateObj - <contig_stats> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
May-31 19:18:11.288 [Actor Thread 3] INFO nextflow.processor.TaskProcessor - [skipping] Stored process > fetch_blast_nr_db
May-31 19:18:11.293 [Actor Thread 7] INFO nextflow.processor.TaskProcessor - [skipping] Stored process > fetch_superfam_db
May-31 19:18:11.293 [Actor Thread 5] INFO nextflow.processor.TaskProcessor - [skipping] Stored process > fetch_pfam_db
May-31 19:18:11.295 [Actor Thread 1] DEBUG nextflow.processor.TaskProcessor - <fetch_pfam_db> Sending poison pills and terminating process
May-31 19:18:11.298 [Actor Thread 13] DEBUG nextflow.processor.TaskProcessor - <fetch_superfam_db> Sending poison pills and terminating process
May-31 19:18:11.299 [Actor Thread 6] DEBUG nextflow.processor.TaskProcessor - <fetch_superfam_db> After stop
May-31 19:18:11.299 [Actor Thread 2] DEBUG nextflow.processor.TaskProcessor - <fetch_blast_nr_db> After stop
May-31 19:18:11.299 [Actor Thread 14] DEBUG nextflow.processor.TaskProcessor - <fetch_blast_nr_db> Sending poison pills and terminating process
May-31 19:18:11.301 [Actor Thread 4] DEBUG nextflow.processor.TaskProcessor - <fetch_pfam_db> After stop
May-31 19:18:11.303 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: fetch_pfam_db)
May-31 19:18:11.304 [Actor Thread 14] DEBUG nextflow.Session - <<< barrier arrive (process: fetch_blast_nr_db)
May-31 19:18:11.308 [Actor Thread 13] DEBUG nextflow.Session - <<< barrier arrive (process: fetch_superfam_db)
May-31 19:18:11.892 [Actor Thread 9] DEBUG nextflow.executor.GridTaskHandler - Launching process > order_by_gc (1) -- work folder: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/c2/2abdaf28c7057602e6edbf6dd85444
May-31 19:18:12.072 [Actor Thread 12] DEBUG nextflow.executor.GridTaskHandler - Launching process > contig_stats (1) -- work folder: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/c3/9e29e54e126749a4cda9a366422f24
May-31 19:18:12.073 [Actor Thread 9] INFO nextflow.Session - [c2/2abdaf] Submitted process > order_by_gc (1)
May-31 19:18:12.090 [Actor Thread 8] DEBUG nextflow.processor.TaskProcessor - <order_by_gc> After stop
May-31 19:18:12.181 [Actor Thread 12] INFO nextflow.Session - [c3/9e29e5] Submitted process > contig_stats (1)
May-31 19:18:12.185 [Actor Thread 10] DEBUG nextflow.processor.TaskProcessor - <contig_stats> After stop
May-31 19:23:16.031 [Thread-3] DEBUG n.processor.TaskPollingMonitor - !! executor slurm > tasks to be completed: 2 -- first: TaskHandler[jobId: 2; id: 4; name: order_by_gc (1); status: RUNNING; exit: -; workDir: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/c2/2abdaf28c7057602e6edbf6dd85444 started: 1496258571038; exited: -; ]
May-31 19:23:26.058 [Thread-3] WARN nextflow.processor.TaskProcessor - Process `order_by_gc (1)` terminated with an error exit status (125) -- Execution is retried (1)
May-31 19:23:26.077 [pool-2-thread-1] DEBUG nextflow.executor.GridTaskHandler - Launching process > order_by_gc (1) -- work folder: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/88/a767bf3eb8c490cc267e6e25ec21b2
May-31 19:23:26.201 [pool-2-thread-1] INFO nextflow.Session - [88/a767bf] Re-submitted process > order_by_gc (1)
May-31 19:23:26.203 [Thread-3] WARN nextflow.processor.TaskProcessor - Process `contig_stats (1)` terminated with an error exit status (125) -- Execution is retried (1)
May-31 19:23:26.220 [pool-2-thread-2] DEBUG nextflow.executor.GridTaskHandler - Launching process > contig_stats (1) -- work folder: /mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/23/d2a1c73986ef90e0e685cf32def4df
May-31 19:23:26.329 [pool-2-thread-2] INFO nextflow.Session - [23/d2a1c7] Re-submitted process > contig_stats (1)
May-31 19:24:01.085 [Thread-3] ERROR nextflow.processor.TaskProcessor - Error executing process > 'order_by_gc (1)'
Caused by:
Process `order_by_gc (1)` terminated with an error exit status (125)
Command executed [/tools/nextflow/assets/SGI-Pipelines/GeneFlagging/templates/orderFastaByGc.py]:
#!/usr/bin/env python
from gene_annotation.fasta.sortFastaByGc import sortFastaByGc
def main():
inputFile = 'contigs.fa'
outputFile = 'contigsByGc.fa'
sortFastaByGc(inputFile, outputFile)
if __name__ == '__main__':
main()
Command exit status:
125
Command output:
(empty)
Command error:
Unable to find image 'dockreg-dev01.awsv.l.synthgeno.global/compbio/gene_annotation:0.1.0' locally
docker: Error response from daemon: Get https://dockreg-dev01.awsv.l.synthgeno.global/v1/_ping: dial tcp 172.17.0.30:443: getsockopt: no route to host.
See 'docker run --help'.
Work dir:
/mnt/efs/ubuntu/nextflow.run.47342766-b597-4f67-8726-82c4e9a0d267/work/88/a767bf3eb8c490cc267e6e25ec21b2
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
May-31 19:24:01.088 [Actor Thread 13] DEBUG nextflow.processor.TaskProcessor - <order_by_gc> Sending poison pills and terminating process
May-31 19:24:01.088 [Actor Thread 13] DEBUG nextflow.Session - <<< barrier arrive (process: order_by_gc)
May-31 19:24:01.097 [Thread-3] DEBUG nextflow.Session - Session aborted -- Cause: Process `order_by_gc (1)` terminated with an error exit status (125)
May-31 19:24:01.119 [Actor Thread 13] DEBUG nextflow.processor.TaskProcessor - <contig_stats> Sending poison pills and terminating process
May-31 19:24:01.120 [Actor Thread 13] DEBUG nextflow.Session - <<< barrier arrive (process: contig_stats)
May-31 19:24:01.122 [Thread-3] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm)
~
Metadata
Metadata
Assignees
Labels
No labels