updated readme

commit 9beb3bb1276285d6cfea4ee5029a45b20f5eccba 1 parent 2c3d40b
Francesco Strozzi authored September 24, 2012

Showing 1 changed file with 34 additions and 4 deletions.

  1. 38  README.md
@@ -25,10 +25,8 @@ What is happening here is the following:
 
 * the ```-i``` option specifies the input files or, as in this case, the location where to find input files based on a typical wildcard expression. You can specify as many input files/locations as you need using a comma-separated list.
 * the ```-n``` option specifies the job name
-* the ```-c``` option is the command line to be executed on the cluster / grid system. BioGrid fills in the ```<input1>```, ```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files). BioGrid will check if the ```<output>``` placeholder has an extension (like .sam, .out etc.) and will generate a unique output file name for each job. IMPORTANT: If no extension is specified for the ```<output>``` placeholder, BioGrid will assume the job will generate more than one output file and that those files will be saved into the folder specified by the "-o" option. Therefore it will manage the output as a whole directory, copying and/or removing the entire folder if the "-r" and "-e" options are present (check the [Other options](https://github.com/fstrozzi/bioruby-grid#other-options) section to see what these options are expected to do).
-
-
-* the ```-o``` option sets the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s); BioGrid will take care of generating a unique file name for the output, if needed.
+* the ```-c``` option is the command line to be executed on the cluster / grid system. BioGrid fills in the ```<input1>```, ```<input2>``` and ```<output>``` placeholders with the corresponding parameters passed on the command line. This is done for each input file (or each group of input files). BioGrid will check if the ```<output>``` placeholder has an extension (like .sam, .out etc.) and will generate a unique output file name for each job.
+* the ```-o``` option sets the location where output files for each job will be saved. Only provide the folder where you want to save the output file(s); BioGrid will take care of generating a unique file name for the output, if needed. Check the [Output management](https://github.com/fstrozzi/bioruby-grid#output-management) section for more details.
 * the ```-s``` option is a key parameter that specifies the granularity of the jobs, setting the number of input files (or groups of files, when more than one input placeholder is present in the command line) to be used for each job. So, going back to the FastQ example, if -s 1 is specified, each job will be run with exactly one FastQ R1 file and one FastQ R2 file. This gives you a lot of control over how to split the entire dataset analysis across multiple computing nodes.
 * the ```-p``` parameter indicates how many processes we want to use for each job. This number needs to match the actual number of threads / processes that our command or tool will use for the analysis.
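The placeholder substitution and job splitting described in the options above can be illustrated with a small Python sketch. This is not BioGrid's implementation: the function names, the `mapper` command, the file names and the output path are all hypothetical, and the sketch assumes that input files from each `-i` location are paired by position.

```python
def fill_placeholders(template, inputs, output_path):
    """Replace <input1>, <input2>, ... and <output> in a command template.

    Hypothetical helper: it only mimics the substitution described in the README.
    """
    command = template
    for index, path in enumerate(inputs, start=1):
        command = command.replace(f"<input{index}>", path)
    return command.replace("<output>", output_path)


def group_inputs(input_lists, step):
    """Pair files across the input lists, then split the pairs into jobs.

    `step` plays the role of the -s option: the number of input groups per job.
    Pairing by position is an assumption of this sketch.
    """
    groups = list(zip(*input_lists))
    return [groups[i:i + step] for i in range(0, len(groups), step)]


# Hypothetical file names standing in for 60 R1 and 60 R2 FastQ files.
r1_files = [f"sample_R1_{n:03d}.fastq" for n in range(1, 61)]
r2_files = [f"sample_R2_{n:03d}.fastq" for n in range(1, 61)]

# With -s 1 every job gets exactly one R1/R2 pair.
jobs = group_inputs([r1_files, r2_files], step=1)

# "mapper" is a made-up tool; the .sam extension marks <output> as a single file.
template = "mapper --r1 <input1> --r2 <input2> --out <output>.sam"
first_command = fill_placeholders(template, jobs[0][0], "/data/out/job_001")
print(len(jobs), first_command)
```

Raising `step` to 2 in this sketch would halve the number of jobs, which is the trade-off the `-s` option exposes.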
 
@@ -45,6 +43,38 @@ mkdir -p /data/Project_X/Sample_Y_mapping
 
 and this will be repeated for every input file, according to the -s parameter. So, in this case, given that each command line takes 2 input files, that we have 60 R1 and 60 R2 FastQ files, and that we have specified "-s 1", 60 different jobs will be created and submitted, each with a specific read pair to be processed by Bowtie.
 
+Output management
+-----------------
+For each job, BioGrid will set an output name according to a UUID generated on the fly and the combination of the job name plus an incremental number. So a typical output file name will look like this:
+
+```shell
+3cb0b800_Bowtie_mapping_001.bam
+```
+If no extension is specified for the ```<output>``` placeholder in the command line definition, BioGrid will assume the job will generate more than one output file and that those files will be saved into the folder specified by the "-o" option. Therefore it will manage the output as a whole directory, copying and/or removing the entire folder if the "-r" and "-e" options are present (check the [Other options](https://github.com/fstrozzi/bioruby-grid#other-options) section to see what these options are expected to do).
+The same output naming rule also applies to an output folder, and the final directory will look like this:
+
+```shell
+3cb0b800_Bowtie_index/
+```
+
+without the incremental number, which is only used for output files.
+
+If you want to do some [Advanced stuff](https://github.com/fstrozzi/bioruby-grid#advanced-stuff) and run parameter testing, the output names will be changed accordingly by BioGrid. So if I am running BioGrid to test some parameter ```-L``` for my favorite tool, and I am sampling it with three different values, let's say 3, 7 and 10, the corresponding output files will be:
+
+```shell
+9ec55d90_tophat_001-param:3.sam
+9ec55d90_tophat_001-param:7.sam
+9ec55d90_tophat_001-param:10.sam
+```
+
+If you are using the ```--param``` option to test non-numerical parameters, the corresponding parameter value (or name) will be appended to the output file name in the same way:
+
+```shell
+9ec55d90_tophat_001-param:--sensitive.sam
+9ec55d90_tophat_001-param:--fast.sam
+```
+
+
 Other options
 -------------
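The naming rules introduced in the Output management section can be summarized with a short Python sketch. It only reproduces the patterns visible in the examples above. The function names are hypothetical, shortening the UUID to eight hexadecimal characters is an assumption based on the sample names, and how BioGrid names a directory during parameter testing is not stated, so the sketch leaves that case out.

```python
import uuid


def new_run_id():
    """Return a short identifier; truncating a UUID to 8 hex characters is an assumption."""
    return uuid.uuid4().hex[:8]


def output_name(run_id, job_name, job_number=None, extension=None, param=None):
    """Build an output name following the patterns shown in the README examples.

    Hypothetical helper, not BioGrid's code:
    - without an extension the output is treated as a directory (no incremental number);
    - with an extension the incremental number is added, zero-padded to three digits;
    - a tested parameter value is appended as "-param:<value>" before the extension.
    """
    name = f"{run_id}_{job_name}"
    if extension is None:
        return name + "/"
    name += f"_{job_number:03d}"
    if param is not None:
        name += f"-param:{param}"
    return f"{name}.{extension}"


# Reproduce the names listed in the section, reusing the identifiers from the examples.
print(output_name("3cb0b800", "Bowtie_mapping", 1, "bam"))
print(output_name("3cb0b800", "Bowtie_index"))
for value in (3, 7, 10, "--sensitive", "--fast"):
    print(output_name("9ec55d90", "tophat", 1, "sam", param=value))
```

Passing the identifier in explicitly keeps the sketch deterministic; a real run would presumably generate one identifier per submission and reuse it for every job.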
 
