
Using nextflow with moab and msub #1224

Closed
MJKampmann opened this issue Jul 12, 2019 · 34 comments

@MJKampmann

I would like to use nextflow on a cluster which uses moab cluster suite as a workload manager to run a RNA-seq analysis pipeline.
In Moab, jobs are submitted using the msub command; the underlying resource manager is Torque.
Is there a way to implement Moab as an executor in Nextflow?
Thank you

@pditommaso
Member

pditommaso commented Jul 12, 2019

It would not be too hard to add it. Could you provide a command-line example to submit a job, delete it, and check the queue status?

@MJKampmann
Author

Sure,
for example for a job named test running the script job.sh
msub -q single -N test -l nodes=1:ppn=1,walltime=3:00:00,pmem=5000mb job.sh
Resources are requested with -l; here the job requires 1 core, 3 h walltime, and 5000 MB memory. -q specifies the queue.
Jobs are deleted by
mjobctl -c <job-id>
And the queue status can be checked with showq
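The three commands above cover essentially everything a grid executor needs: submit, cancel, and poll. As a rough sketch (the function names here are my own, purely illustrative, not Nextflow's), the command lines could be assembled like this:

```python
import shlex

def submit_cmd(name, queue, cpus, walltime, mem_mb, script):
    """Build the msub command line, mirroring the example above."""
    resources = f"nodes=1:ppn={cpus},walltime={walltime},pmem={mem_mb}mb"
    return ["msub", "-q", queue, "-N", name, "-l", resources, script]

def cancel_cmd(job_id):
    """Build the mjobctl cancellation command line."""
    return ["mjobctl", "-c", job_id]

def status_cmd():
    """showq lists active, eligible, and blocked jobs."""
    return ["showq"]

# Reproduces the msub invocation shown above.
print(shlex.join(submit_cmd("test", "single", 1, "3:00:00", 5000, "job.sh")))
```

An executor would run these via a subprocess and parse the output; building the argument list separately keeps that logic testable without a Moab installation.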

@pditommaso
Member

To me it looks very similar to PBS. Have you checked whether there's a qsub command on your cluster?

@MJKampmann
Author

Yes, it is very similar to PBS. It is not possible to submit jobs with qsub on the cluster.

@pditommaso
Member

Then copy and paste here the exact output of the following commands:

msub help

msub -h

msub submission

Submit a job and include the command and its exact output.

showq help

showq -h

showq example

Copy and paste here the exact output of showq reporting at least a couple of jobs.

showq job status codes

Can you find all the possible job status codes that can be reported by showq and report them here?

mjobctl help

mjobctl -h

@MJKampmann
Author

Showq Active Jobs

showq, two active jobs:

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME

881196             hd_ow424    Running    16     2:59:58  Fri Jul 12 11:41:49
881197             hd_ow424    Running    16     2:59:58  Fri Jul 12 11:41:49

2 active jobs          32 of 11920 processors in use by local jobs (0.27%)
                        491 of 705 nodes active      (69.65%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 blocked jobs

Total jobs:  2

Showq help

showq -h :

Usage: showq [FLAGS]
  --about
  --help
  --host=<SERVERHOSTNAME>
  --loglevel=<LOGLEVEL>
  --port=<SERVERPORT>
  --timeout=<SECONDS>
  --version
  --xml

  --blocking

  -b // BLOCKED JOBS
  -c // COMPLETED QUEUE
  -g // DISPLAY SRC. PEER GRID NAME
  -i // IDLE QUEUE
  -l // LOCAL/REMOTE VIEW
  -n // DISPLAY USER/ALTERNATE JOB NAMES
  -N // DISPLAY NODE/TASK USAGE BY JOB
  -o <ORDER> // DISPLAY ACTIVE JOBS IN SPECIFIED SORT ORDER:
             //  REMAINING REVERSEREMAINING JOB USER STATE STARTTIME
  -p <PARTITION> // PARTITION
  -r // ACTIVE QUEUE
  -R <RSVID> // show jobs in reservation
  -s // DISPLAY WORKLOAD SUMMARY
  -S // SYSTEM JOBS
  -v // VERBOSE
  -w {user,group,acct,class,qos,jobgroup}=<VAL> // where constraint

mjobctl help

mjobctl --help:

Usage: mjobctl [FLAGS]
  --about
  --help
  --host=<SERVERHOSTNAME>
  --loglevel=<LOGLEVEL>
  --port=<SERVERPORT>
  --timeout=<SECONDS>
  --version
  --xml

  -c <JOBID>[,<JOBID> ...] // CANCEL
  -e <JOBID> // RERUN (TORQUE RM only)
  -F <JOBID>[,<JOBID> ...] // FORCE CANCEL
  -C <JOBID> // CHECKPOINT
  { -h | -u } [<TYPE>] <JOBID> // HOLD
     <TYPE>=user|system|batch|defer|ALL
  -m <ATTR>{=|+=}<VAL> <JOBID> // MODIFY
  -N [signal=]<SIGID> <JOBID> // NOTIFY
  -p [+=|-=] <VAL> <JOBID> // MODIFY SYSTEM PRIORITY
  -q {diag|hostlist|starttime|wiki|json} { ALL | <JOBID> } [ --flags=COMPLETED ] // QUERY
  -r <JOBID> // RESUME
  -R <JOBID> // REQUEUE
  -s <JOBID> // SUSPEND
  -w <ATTR>=<VAL> // WHERE
  -x <JOBID> // EXECUTE

  <ATTR>={account|advres|allocnodelist|awduration|class|eeduration|env|flags|gres|group|hostlist|jobid|jobname|maxmem|messages|minstarttime|nodecount|qos|releasetime|reqreservation|rmxstring|state|sysprio|tpn|trig|user|userprio|var|wclimit}

msub help

msub --help:

Usage: msub [FLAGS] [<CMDFILE> [<ARG>] [<ARG>]...]
  --about
  --help
  --host=<SERVERHOSTNAME>
  --loglevel=<LOGLEVEL>
  --port=<SERVERPORT>
  --timeout=<SECONDS>
  --version
  --xml

 DATA STAGING
  --stagein=<STAGEIN-SPEC>
  --stageinsize=<STAGEINSIZE-SPEC>
  --stageinfile=<FILENAME>
  --stageout=<STAGEOUT-SPEC>
  --stageoutsize=<STAGEOUTSIZE-SPEC>
  --stageoutfile=<FILENAME>

 Workflow Job IDs
  --workflowjobids

  [-a date_time] [-A account_string] [-b retry_count] [-c interval] [-C directive_prefix] [-d initdir]
  [-e errorpath] [-F "<args>"] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-L task] [-m mail_options]
  [-M user_list] [-n] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-t jobarrays] [-S path_list]
  [-u user_list] [-v variable_list] [-V] [-w path] [-W additional_attributes] [-z] [script]

  -L tasks=#[:lprocs=#|all][:{usecores|usethreads|allowthreads}]
            [:place={node|socket|numanode|core|thread}[=#]][:memory=#]
            [:swap=#][:maxtpn=#][:gpus=#[:<mode>]][:mics=#]
            [:gres=<gres>][:feature=<feature>][[:{cpt|cgroup_per_task}]|[:{cph|cgroup_per_host}]]

@pditommaso
Member

Good. I would also need the job submission output and the complete list of possible job statuses.

@MJKampmann
Author

Job submission example:
msub -N testjob -l 'walltime=00:10:00,nodes=1:ppn=1,pmem=500mb' -o 'output.txt' test.sh
Output:
output.txt

@pditommaso
Member

It looks like there's an XML output option. Could you please include the msub and showq output with the --xml option specified?

@MJKampmann
Author

MJKampmann commented Jul 12, 2019

So in showq, jobs are either active, eligible, or blocked.
Active jobs have a status of either Running or Starting.

Blocked Jobs can be in the following states:

State | Description
-- | --
Idle | Job violates a fairness policy. Use diagnose -q for more      information.
UserHold | A user hold is in place.
SystemHold | An administrative or system hold is in place.
BatchHold | A scheduler batch hold is in place (used when the job cannot be run      because the requested resources are not available in the system or because     the resource manager has repeatedly failed in attempts to start the job).
Deferred | A scheduler defer hold is in place (a temporary hold used when a job      has been unable to start after a specified number of attempts. This hold     is automatically removed after a short period of time).
NotQueued | Job is in the resource manager state NQ (indicating that the job's controlling scheduling daemon is unavailable).
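An executor ultimately has to fold these Moab states into a few coarse categories. As an illustrative sketch based only on the states listed above (the category names are hypothetical, not Nextflow's actual status model):

```python
# Hypothetical coarse-graining of the Moab states listed above.
# The category names on the right are illustrative only.
MOAB_STATES = {
    "Running":    "ACTIVE",
    "Starting":   "ACTIVE",
    "Idle":       "PENDING",
    "UserHold":   "HELD",
    "SystemHold": "HELD",
    "BatchHold":  "HELD",
    "Deferred":   "HELD",
    "NotQueued":  "UNKNOWN",
}

def classify(state: str) -> str:
    """Map a raw Moab job state to a coarse executor category."""
    return MOAB_STATES.get(state, "UNKNOWN")
```

Defaulting unrecognized states to an "unknown" bucket keeps the poller robust if the scheduler ever reports a state not covered by this table.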

@MJKampmann
Author

MJKampmann commented Jul 12, 2019

showq --xml:

<Data><Object>queue</Object><cluster LocalActiveNodes="479" LocalAllocProcs="16" LocalConfigNodes="747" LocalIdleNodes="229" LocalIdleProcs="3910" LocalUpNodes="708" LocalUpProcs="11968" RemoteActiveNodes="0" RemoteAllocProcs="0" RemoteConfigNodes="0" RemoteIdleNodes="0" RemoteIdleProcs="0" RemoteUpNodes="0" RemoteUpProcs="0" time="1562925910"/><queue count="1" option="active"><job AWDuration="1392" Account="bw18k008" Class="single" DRMJID="881196.admin2" GJID="881196" Group="hd_hd" JobID="881196" JobName="JOBNAME.deeptools_bamCompare.cell=HMG007,treat=repl1,chip=PU1,scale=SES,ratio=subtract" MasterHost="m13s0703" PAL="torque" ReqAWDuration="10800" ReqNodes="1" ReqProcs="16" RsvStartTime="1562924509" RunPriority="99" StartPriority="99" StartTime="1562924509" StatPSDed="22271.200000" StatPSUtl="1829.560100" State="Running" SubmissionTime="1562924505" SuspendDuration="0" User="hd_ow424"/></queue><queue count="0" option="eligible"/><queue count="0" option="blocked"/></Data>

Update:

<?xml version="1.0" encoding="UTF-8"?>
<Data>
   <Object>queue</Object>
   <cluster LocalActiveNodes="479" LocalAllocProcs="16" LocalConfigNodes="747" LocalIdleNodes="229" LocalIdleProcs="3910" LocalUpNodes="708" LocalUpProcs="11968" RemoteActiveNodes="0" RemoteAllocProcs="0" RemoteConfigNodes="0" RemoteIdleNodes="0" RemoteIdleProcs="0" RemoteUpNodes="0" RemoteUpProcs="0" time="1562925910" />
   <queue count="1" option="active">
      <job AWDuration="1392" Account="bw18k008" Class="single" DRMJID="881196.admin2" GJID="881196" Group="hd_hd" JobID="881196" JobName="JOBNAME.deeptools_bamCompare.cell=HMG007,treat=repl1,chip=PU1,scale=SES,ratio=subtract" MasterHost="m13s0703" PAL="torque" ReqAWDuration="10800" ReqNodes="1" ReqProcs="16" RsvStartTime="1562924509" RunPriority="99" StartPriority="99" StartTime="1562924509" StatPSDed="22271.200000" StatPSUtl="1829.560100" State="Running" SubmissionTime="1562924505" SuspendDuration="0" User="hd_ow424" />
   </queue>
   <queue count="0" option="eligible" />
   <queue count="0" option="blocked" />
</Data>
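This XML is straightforward to poll programmatically. A minimal sketch, assuming the job elements always carry JobID and State attributes as in the output above (the sample here is a trimmed version of that output):

```python
import xml.etree.ElementTree as ET

# Trimmed version of the showq --xml output shown above; only the
# attributes relevant for status polling are kept.
SHOWQ_XML = """<Data><Object>queue</Object>
<queue count="1" option="active">
  <job JobID="881196" State="Running" User="hd_ow424"/>
</queue>
<queue count="0" option="eligible"/>
<queue count="0" option="blocked"/>
</Data>"""

def parse_showq(xml_text):
    """Map each JobID to its State across all showq queues."""
    root = ET.fromstring(xml_text)
    return {job.get("JobID"): job.get("State")
            for queue in root.iter("queue")
            for job in queue.iter("job")}

print(parse_showq(SHOWQ_XML))  # {'881196': 'Running'}
```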

@MJKampmann
Author

And for
msub -N testjob -l 'walltime=00:10:00,nodes=1:ppn=1,pmem=500mb' -o 'output.txt' --xml test.sh
I get
<Data><job JobID="881218"/></Data>
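So the submitted job ID can be recovered directly from the XML rather than by scraping plain-text output. A minimal sketch of that parsing step, using the exact output shown above:

```python
import xml.etree.ElementTree as ET

def parse_msub_jobid(xml_text):
    """Extract the JobID attribute from msub --xml submission output."""
    job = ET.fromstring(xml_text).find("job")
    return job.get("JobID") if job is not None else None

print(parse_msub_jobid('<Data><job JobID="881218"/></Data>'))  # 881218
```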

@pditommaso
Member

pditommaso commented Jul 12, 2019

Excellent. Regarding job status: is there no status distinguishing a successful completion from a failed one?

@MJKampmann
Author

No, the job status is just given as completed; errors are listed in a separate error file.

@pditommaso
Member

I see. Can you also include an example of the error file?

@MJKampmann
Author

If no errors are raised the error file is empty; otherwise it contains the errors raised by the program.
You can set the error file path with msub -N testjob -l 'walltime=00:10:00,nodes=1:ppn=1,pmem=500mb' -o 'output.txt' -e 'error.txt' test.sh
Here is an example where test.sh ends with an open quotation mark, so a warning is raised:
error.txt

@MJKampmann
Author

Here is a different example. TrimGalore is executed as part of a snakemake pipeline to process the reads in a FASTQ file:
run_trimGalore.cell=HMG017,treat=repl1,chip=PU1,reps=rep1.txt

@pditommaso
Member

I've pushed a possible implementation. There's little chance that it will work on the first try, but if you can manage to test it, it should not be too difficult.

To compile and test it, do the following:

  1. clone this project and check out the maob-executor branch with this command:

    git clone -b maob-executor https://github.com/nextflow-io/nextflow.git
    
  2. compile and assemble the executable:

    make compile pack
    cp build/releases/nextflow-19.08.0-SNAPSHOT-all ./nextflow
    chmod +x ./nextflow
    ./nextflow info
    
  3. use the above binary in place of the stock nextflow launcher, adding this setting to your nextflow.config file:

     process.executor='moab'
    
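For completeness, the executor setting can sit alongside the usual resource directives. A sketch of a nextflow.config using the queue and resource values from the msub example earlier in this thread (the values are illustrative, not requirements):

```groovy
// nextflow.config — values taken from the msub example in this thread
process {
    executor = 'moab'
    queue    = 'single'
    cpus     = 1
    memory   = '5000 MB'
    time     = '3h'
}
```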

@MJKampmann
Author

Hello, I tried the implementation.
I tried to launch the pipeline with the command
nextflow run ~/my-pipelines/nf-core/rnaseq-master
but I get the following error message:

N E X T F L O W  ~  version 19.08.0-SNAPSHOT
Launching `~my-pipelines/nf-core/rnaseq-master/main.nf` [agitated_poitras] - revision: 653dedd4d2
Unable to parse config file: '~/my-pipelines/nf-core/rnaseq-master/nextflow.config'

  Compile failed for sources FixedSetSources[name='/groovy/script/Script93D6D5AA4C3D38DC945138B35C17E9B0']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
  /groovy/script/Script93D6D5AA4C3D38DC945138B35C17E9B0: 13: unexpected char: '#' @ line 13, column 3.
       # executer = 'pbs'
       ^

  1 error

@pditommaso
Member

# is not a valid comment character in the config file.

@MJKampmann
Author

MJKampmann commented Jul 14, 2019

Right, fixed that.
Now I just get the following output:

N E X T F L O W  ~  version 19.08.0-SNAPSHOT
Launching `~/my-pipelines/nf-core/rnaseq-master/main.nf` [exotic_shockley] - revision: 653dedd4d2
originalHostName

The runs' logfile looks as follows:
nextflow.log.txt

@pditommaso
Member

pditommaso commented Jul 16, 2019

Oh, never seen such error before:

ERROR nextflow.cli.Launcher - @unknown
java.lang.NoSuchFieldError: originalHostName
	at java.net.InetAddress.init(Native Method)
	at java.net.InetAddress.<clinit>(InetAddress.java:277)
	at java.net.PlainSocketImpl.initProto(Native Method)
	at java.net.PlainSocketImpl.<clinit>(PlainSocketImpl.java:45)
	at java.net.Socket.setImpl(Socket.java:503)
	at java.net.Socket.<init>(Socket.java:84)
	at javax.net.ssl.SSLSocket.<init>(SSLSocket.java:145)
	at sun.security.ssl.BaseSSLSocketImpl.<init>(BaseSSLSocketImpl.java:61)
	at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:524)
	at sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:72)
	at sun.net.www.protocol.https.HttpsClient.createSocket(HttpsClient.java:409)

Please, include the script and config files you are using.

update: which version of Java are you using?

@MJKampmann
Author

Hello,
yes, I think the error occurred because I had installed a different version of Java than the one running on the cluster by default. Now I am using:
java -version

openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)

and the error doesn't occur, and I can launch nextflow.
However, there appears to be a different issue.
When I launch the pipeline, jobs get submitted to the cluster and also run, but they complete after a couple of seconds and no output files are created. I have attached the .nextflow.log file:
nextflow_log_new.txt

@pditommaso
Member

You need to investigate why it's failing. As the error message suggests, try:

  #
  #  Detailed information about the job (available <24h after job exit):
  #    checkjob -v 884646
  #    checkjob -v -v 884646 

@MJKampmann
Author

The job terminated because of the following error:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

However, I can activate the conda environment with
conda activate <environment-name>
Also if I use
conda init --all
no action is taken.

@pditommaso
Member

Try the following:

  • change to the failing task work dir
  • edit the script .command.run
  • add the directive #MSUB -S /bin/bash
  • try to execute again the job using the command: msub .command.run

Does it solve the problem?

@MJKampmann
Author

MJKampmann commented Jul 16, 2019

No, it does not solve the problem. I have attached a job file from a snakemake pipeline that worked, as an example.
jobfile-796879.txt
If I use #!/bin/sh in the script, the error
/var/spool/torque/mom_priv/jobs/884845.admin2.SC: line 33: syntax error near unexpected token `<'
is returned instead.

@MJKampmann
Author

Using the script:

#!/bin/bash
source activate <environment>

the environments are activated
while

#!/bin/bash
conda activate <environment>

fails with the error message as seen above.
Also, if I modify .command.run accordingly, the job is executed.

@pditommaso
Member

I'm not sure I understand; can you include both .command.run scripts (the original that doesn't work, and the one you modified and executed successfully)?

@MJKampmann
Author

Sure.
This is the original script
command.run.txt
The conda environment is activated in line 274 with conda activate; this is where the execution always failed.
In the modified script I changed that to source activate, which is the command used to activate conda environments in older versions of conda.
command.run_modified.txt
Using the modified script the job is executed.

@pditommaso
Member

pditommaso commented Jul 18, 2019

I see. source activate is a legacy Conda activation style and is no longer supported by NF; see 2bdc925.

@MJKampmann
Author

I have now executed the pipeline on the cluster and everything else appears to be working; job submission etc. works fine.

@pditommaso
Member

Nice to read that.

@pditommaso pditommaso added this to the v19.07.0 milestone Jul 27, 2019
@pditommaso
Member

pditommaso commented Jul 27, 2019

I've included this feature in the latest stable 19.07.0.
