updated readme
fstrozzi committed Sep 19, 2012
1 parent 494ba9b commit 6c37175
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -6,8 +6,10 @@ Utility to create and distribute jobs
Usage
=====

-This utility is command line based tools built around the concept of a command line template that can be reused to generate tens, hundreds or thousands of different jobs to be sent on a queue system.
+This utility is a command line based tools built around the concept of a template that can be reused to generate tens, hundreds or thousands of different jobs to be sent on a queue system.

+It is particularly useful when dealing with BigData analysis (e.g. NGS data processing) on a distributed system.
+
The code for now supports only PBS queue systems, but can be easily expanded to account also for other queueing systems.

A typical example
@@ -25,7 +27,7 @@ What is happening here is the following:

* the "-i" options specifies the input files or, as in this case, the location where to find input files based on a typical wildcard expression. You can actually specify as many input files/locations as you need using a comma separated list.
* the "-n" specify the job name
-* the "-c" is the command line to be executed on the cluster / grid system. What BioGrid does is to fill in the <input1>, <input2> and <output> placeholders with the corresponding parameters passed on the command lines. This is done for each input file and BioGrid will generate a unique output file name for each job.
+* the "-c" is the command line to be executed on the cluster / grid system. What BioGrid does is to fill in the '<input1>', '<input2>' and '<output>' placeholders with the corresponding parameters passed on the command lines. This is done for each input file and BioGrid will generate a unique output file name for each job.
* the "-o" just specify the location where output files for each job will be saved
* the "-s" is a key parameter to specify the number of input files (or group files when more than one input is present in the command line) to be used for each job. So, going back to the FastQ example, if -s 1 is specified, each job will be run with exactly one FastQ R1 file and one FastQ R2 file. This gives you a great power to decide how to split the entire dataset analysis across multiple computing nodes.
* the "-p" parameter indicates how many processes we want to use for each job. This number needs to match with the actual number of threads / processes that our command or tool will use for the analysis.
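
The placeholder filling and the "-s" grouping described in the list above can be illustrated with a small sketch. This is not BioGrid's actual code: the function names, the `mapper` command, the comma used to join several files in one group, and the output file naming scheme are all assumptions made for illustration only.

```python
from itertools import islice


def chunk(items, size):
    """Yield consecutive groups of at most `size` items."""
    it = iter(items)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def build_jobs(template, inputs, output_dir, job_name, split=1):
    """Expand one command template into one command line per job.

    `inputs` holds one list of files per <inputN> placeholder,
    e.g. [R1 files, R2 files]. `split` plays the role of "-s".
    """
    # Pair the i-th file of every input list (R1 with its matching R2).
    pairs = list(zip(*inputs))
    jobs = []
    for n, group in enumerate(chunk(pairs, split), start=1):
        cmd = template
        for i in range(len(inputs)):
            # Assumption: several files in one group are comma-joined.
            files = ",".join(pair[i] for pair in group)
            cmd = cmd.replace(f"<input{i + 1}>", files)
        # Assumption: a numbered file name keeps each job's output unique.
        cmd = cmd.replace("<output>", f"{output_dir}/{job_name}_{n:03d}.out")
        jobs.append(cmd)
    return jobs


# Hypothetical paired-end example: two R1 files and two R2 files.
r1 = ["a_R1.fastq", "b_R1.fastq"]
r2 = ["a_R2.fastq", "b_R2.fastq"]
template = "mapper -1 <input1> -2 <input2> -o <output>"

for line in build_jobs(template, [r1, r2], "/results", "mapping", split=1):
    print(line)
```

With `split=1` the sketch produces one command per R1/R2 pair; raising `split` packs more pairs into each job and so yields fewer, larger jobs.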