Easily submitting multiple PBS jobs or running local jobs in parallel. Multiple input files supported.

Submitting PBS jobs

easy_qsub submits PBS jobs from a script template, so you do not have to edit PBS scripts repeatedly.

Default template (~/.easy_qsub/default.pbs):

#PBS -S /bin/bash
#PBS -N $name
#PBS -q $queue
#PBS -l ncpus=$ncpus
#PBS -l mem=$mem
#PBS -l walltime=$walltime

echo run on node: $$HOSTNAME >&2


Generated PBS scripts are saved in /tmp/easy_qsub-user. If the jobs are submitted successfully, the PBS scripts are moved to the current directory. If not, they are removed.
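The placeholders in the default template ($name, $queue, ...) and the escaped $$HOSTNAME follow the syntax of Python's string.Template. The sketch below shows how such a template could be filled in. It is an illustration only: the function name render_script is hypothetical, and both the use of string.Template and the appending of the command after the header are assumptions, not a description of easy_qsub internals.

```python
from string import Template

# Copy of the default template shown above.
DEFAULT_TEMPLATE = """#PBS -S /bin/bash
#PBS -N $name
#PBS -q $queue
#PBS -l ncpus=$ncpus
#PBS -l mem=$mem
#PBS -l walltime=$walltime

echo run on node: $$HOSTNAME >&2
"""


def render_script(command, **options):
    """Fill in the template and append the command (hypothetical helper).

    "$$" is string.Template's escape for a literal "$", so $$HOSTNAME
    becomes the shell variable $HOSTNAME in the generated script.
    """
    header = Template(DEFAULT_TEMPLATE).substitute(options)
    # Assumption: the command is placed after the template body.
    return header + command + "\n"


script = render_script(
    "ls -lh",
    name="demo",
    queue="batch",
    ncpus=8,
    mem="5gb",
    walltime="30:00:00:00",
)
print(script)
```

substitute() raises KeyError when a placeholder has no value, which helps catch a template that uses an option the caller did not supply.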

Support for multiple inputs

Inspired by qtask, multiple inputs are supported (see example 2). If "{}" appears in a command, it is replaced with the current filename. Four formats are supported. For example, for a file named "a/b/read_1.fq.gz":

| format | target | result |
|---|---|---|
| {} | full path | a/b/read_1.fq.gz |
| {%} | basename | read_1.fq.gz |
| {^.fq.gz} | remove suffix from full path | a/b/read_1 |
| {%^.fq.gz} | remove suffix from basename | read_1 |
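The four placeholder formats can be reproduced with a small regular-expression substitution. This is a minimal sketch of the documented behaviour, not the code easy_qsub uses; the names PLACEHOLDER and expand are hypothetical.

```python
import os
import re

# Matches {}, {%}, {^suffix} and {%^suffix}.
# Group 1 is "%" when the basename is requested; group 2 is the suffix to clip.
PLACEHOLDER = re.compile(r"\{(%?)(?:\^([^}]*))?\}")


def expand(command, path):
    """Replace every placeholder in `command` using the file `path`."""

    def replace(match):
        value = os.path.basename(path) if match.group(1) else path
        suffix = match.group(2)
        # Clip the suffix only when the value really ends with it.
        if suffix and value.endswith(suffix):
            value = value[: -len(suffix)]
        return value

    return PLACEHOLDER.sub(replace, command)


for fmt in ("{}", "{%}", "{^.fq.gz}", "{%^.fq.gz}"):
    print(fmt, "->", expand(fmt, "a/b/read_1.fq.gz"))
```

Because the pattern only accepts an optional "%" followed by an optional "^suffix" between the braces, other brace constructs in a shell command, such as ${var} or {a,b}, are left untouched.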

Running local jobs in parallel

It also supports running commands locally with the option -lp (in parallel) or -ls (serially). This makes it easy to switch between a cluster and a local machine.
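The difference between the two local modes can be sketched as follows. This is an illustration under assumptions: the helper run_local and its workers parameter are hypothetical, and easy_qsub may schedule local jobs differently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_local(commands, parallel=True, workers=4):
    """Run shell commands locally and return their exit codes in input order.

    parallel=True mimics the idea behind -lp, parallel=False the idea behind -ls.
    """

    def run(command):
        # shell=True so that pipes and ";" in the command string keep working.
        return subprocess.call(command, shell=True)

    if not parallel:
        return [run(command) for command in commands]

    # Threads are enough here: each one only waits for a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


print(run_local(["exit 0", "exit 3"], parallel=True))
print(run_local(["exit 0", "exit 3"], parallel=False))
```

Both modes return the same list of exit codes; only the wall-clock time differs when the commands are slow.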

Best partner: cluster_files

New version:

To make the best use of the support for multiple inputs, the script cluster_files was added. It clusters files into multiple directories by creating symbolic links or moving files (see examples 3 and 4). It is useful for programs that take one directory as input.

Another use case is applying different jobs to the same dataset. A bad directory structure would be:

├── A
├── A.stage1
├── A.stage2
├── B
├── B.stage1
└── B.stage2

A more flexible structure can be organized with cluster_files. Instead of changing the original directory structure, using links can be clearer and more flexible.

├── A
└── B
├── A
└── B
├── A
└── B


  1. Submit a single job

     easy_qsub 'ls -lh'
  2. Submit multiple jobs, running fastqc on many fq.gz files

     easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz

    Executed commands are:

     mkdir -p QC/read_1.fastqc; zcat read_1.fq.gz | fastqc -o QC/read_1.fastqc stdin
     mkdir -p QC/read_2.fastqc; zcat read_2.fq.gz | fastqc -o QC/read_2.fastqc stdin

    Dry run with -vv

     easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz -vv
  3. Suppose the directory rawdata contains paired files as below.

     $ tree rawdata
     ├── A2_1.fq.gz
     ├── A2_1.unpaired.fq.gz
     ├── A2_2.fq.gz
     ├── A2_2.unpaired.fq.gz
     ├── A3_1.fq.gz
     ├── A3_1.unpaired.fq.gz
     ├── A3_2.fq.gz
     ├── A3_2.unpaired.fq.gz

    Suppose also that a program takes a directory as input and does something with the paired files in it, so the command takes a directory such as dirA as its argument.

    Handling A2_*.fq.gz and then A3_*.fq.gz in turn is slow. Instead, we can split the rawdata directory into multiple directories (clustering files by prefix) and submit one job per directory, as in example 2.

     cluster_files -p '(.+?)_\d\.fq\.gz$' rawdata -o rawdata.cluster
     tree rawdata.cluster/
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata/A2_1.fq.gz
     │   └── A2_2.fq.gz -> ../../rawdata/A2_2.fq.gz
     └── A3
         ├── A3_1.fq.gz -> ../../rawdata/A3_1.fq.gz
         └── A3_2.fq.gz -> ../../rawdata/A3_2.fq.gz
     easy_qsub ' {}' rawdata.cluster/*

    Another example (e.g. some assembler can handle unpaired reads too):

     cluster_files -p '(.+?)_\d.*\.fq\.gz$' rawdata -o rawdata.cluster2
     tree rawdata.cluster2
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata/A2_1.fq.gz
     │   ├── A2_1.unpaired.fq.gz -> ../../rawdata/A2_1.unpaired.fq.gz
     │   ├── A2_2.fq.gz -> ../../rawdata/A2_2.fq.gz
     │   └── A2_2.unpaired.fq.gz -> ../../rawdata/A2_2.unpaired.fq.gz
     └── A3
         ├── A3_1.fq.gz -> ../../rawdata/A3_1.fq.gz
         ├── A3_1.unpaired.fq.gz -> ../../rawdata/A3_1.unpaired.fq.gz
         ├── A3_2.fq.gz -> ../../rawdata/A3_2.fq.gz
         └── A3_2.unpaired.fq.gz -> ../../rawdata/A3_2.unpaired.fq.gz
  4. Another example (complex directory structure)

     tree rawdata2
     ├── OtherDir
     │   └── abc.fq.gz.txt
     ├── S1
     │   ├── A2_1.fq.gz
     │   ├── A2_1.unpaired.fq.gz
     │   ├── A2_2.fq.gz
     │   ├── A2_2.unpaired.fq.gz
     │   ├── A4_1.fq.gz
     │   └── A4_2.fq.gz
     └── S2
         ├── A3_1.fq.gz
         ├── A3_1.unpaired.fq.gz
         ├── A3_2.fq.gz
         └── A3_2.unpaired.fq.gz
     cluster_files -p '(.+?)_\d\.fq\.gz$' rawdata2/
     tree rawdata2.cluster/
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata2/S1/A2_1.fq.gz
     │   └── A2_2.fq.gz -> ../../rawdata2/S1/A2_2.fq.gz
     ├── A3
     │   ├── A3_1.fq.gz -> ../../rawdata2/S2/A3_1.fq.gz
     │   └── A3_2.fq.gz -> ../../rawdata2/S2/A3_2.fq.gz
     └── A4
         ├── A4_1.fq.gz -> ../../rawdata2/S1/A4_1.fq.gz
         └── A4_2.fq.gz -> ../../rawdata2/S1/A4_2.fq.gz
     cluster_files -p '(.+?)_\d\.fq\.gz$'  rawdata2/ -k -f  # keep original dir structure
     tree rawdata2.cluster/
     ├── S1
     │   ├── A2
     │   │   ├── A2_1.fq.gz -> ../../../rawdata2/S1/A2_1.fq.gz
     │   │   └── A2_2.fq.gz -> ../../../rawdata2/S1/A2_2.fq.gz
     │   └── A4
     │       ├── A4_1.fq.gz -> ../../../rawdata2/S1/A4_1.fq.gz
     │       └── A4_2.fq.gz -> ../../../rawdata2/S1/A4_2.fq.gz
     └── S2
         └── A3
             ├── A3_1.fq.gz -> ../../../rawdata2/S2/A3_1.fq.gz
             └── A3_2.fq.gz -> ../../../rawdata2/S2/A3_2.fq.gz
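The grouping step behind examples 3 and 4 can be sketched in a few lines: each filename is matched against the pattern, and the first captured group becomes the cluster name. The function name cluster_by_pattern is hypothetical, and matching against the basename with re.search is an assumption about how cluster_files applies the pattern. Creating the symbolic links is left out.

```python
import os
import re
from collections import defaultdict


def cluster_by_pattern(paths, pattern):
    """Group paths by the first capture group of `pattern`.

    Files whose basename does not match the pattern are skipped.
    """
    regex = re.compile(pattern)
    clusters = defaultdict(list)
    for path in paths:
        match = regex.search(os.path.basename(path))
        if match:
            clusters[match.group(1)].append(path)
    return dict(clusters)


files = [
    "rawdata/A2_1.fq.gz",
    "rawdata/A2_1.unpaired.fq.gz",
    "rawdata/A2_2.fq.gz",
    "rawdata/A2_2.unpaired.fq.gz",
    "rawdata/A3_1.fq.gz",
    "rawdata/A3_1.unpaired.fq.gz",
    "rawdata/A3_2.fq.gz",
    "rawdata/A3_2.unpaired.fq.gz",
]

# Paired files only, as in the first command of example 3.
paired = cluster_by_pattern(files, r"(.+?)_\d\.fq\.gz$")
# Paired and unpaired files, as in the second command of example 3.
everything = cluster_by_pattern(files, r"(.+?)_\d.*\.fq\.gz$")

for name, members in sorted(paired.items()):
    print(name, members)
```

With the first pattern the unpaired files are skipped because ".unpaired" sits between the digit and ".fq.gz"; the extra ".*" in the second pattern lets them through.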


easy_qsub and cluster_files are single-file scripts written in Python using only the standard library. They are compatible with Python 2 and 3 (version 2.7 or later).

You can simply save the scripts easy_qsub and cluster_files to a directory included in the PATH environment variable, e.g. /usr/local/bin.


git clone
cd easy_qsub
sudo cp easy_qsub cluster_files /usr/local/bin



usage: easy_qsub [-h] [-lp | -ls] [-N NAME] [-n NCPUS] [-m MEM] [-q QUEUE]
                 [-w WALLTIME] [-t TEMPLATE] [-o OUTFILE] [-v]
                 command [files [files ...]]

Easily submitting PBS jobs with script template. Multiple input files supported.

positional arguments:
  command               command to submit
  files                 input files

optional arguments:
  -h, --help            show this help message and exit
  -lp, --local_p        run commands locally, parallelly
  -ls, --local_s        run commands locally, serially
  -N NAME, --name NAME  job name
  -n NCPUS, --ncpus NCPUS
                        cpu number [logical cpu number]
  -m MEM, --mem MEM     memory [5gb]
  -q QUEUE, --queue QUEUE
                        queue [batch]
  -w WALLTIME, --walltime WALLTIME
                        walltime [30:00:00:00]
  -t TEMPLATE, --template TEMPLATE
                        script template
  -o OUTFILE, --outfile OUTFILE
                        output script
  -v, --verbose         verbosely print information. -vv for just printing
                        command not creating scripts and submitting jobs

Note: if "{}" appears in a command, it will be replaced with the current
filename. More format supported: "{%}" for basename, "{^suffix}" for clipping
"suffix", "{%^suffix}" for clipping suffix from basename. See more:


usage: cluster_files [-h] [-o OUTDIR] [-p PATTERN] [-k] [-m] [-f] indir

clustering files by regular expression [V3.0]

positional arguments:
  indir                 source directory

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        out directory [<indir>.cluster]
  -p PATTERN, --pattern PATTERN
                        pattern (regular expression) of files in indir. if not
                        given, it will be the longest common substring of the
                        files. GROUP (parenthese) should be in the regular
                        expression. Captured group will be the cluster name.
                        e.g. "(.+?)_\d\.fq\.gz"
  -k, --keep            keep original dir structure
  -m, --mv              moving files instead of creating symbolic links
  -f, --force           force file overwriting, i.e. deleting existed out


Copyright (c) 2015-2017, Wei Shen (

MIT License

