# Command interface

DSC2 implements DSC by executing DSC scripts and annotations through the command-line program `dsc`.

## Help message

In [1]:
! dsc -h

usage: dsc [-h] [--version] [-v {0,1,2,3,4}] [-j N] [-b str] [-f]
           [--target str] [-x DSC script] [--sequence str [str ...]]
           [--seeds values [values ...]] [--recover levels] [--ignore-errors]
           [--clean [str [str ...]]] [--host str] [-a DSC Annotation]
           [-e block:variable [block:variable ...]] [--tags str [str ...]]
           [-o str] [--distribute [files [files ...]]]

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3,4}, --verbosity {0,1,2,3,4}
                        trace (4) information. Default to 2.
  -j N                  Number of maximum concurrent processes.
  -b str                Benchmark name. Will overwrite "DSC::run" in DSC
                        configuration file.
  -f                    Force re-run -x / -e commands from scratch.
  --target str          The ultimate target of a DSC benchmark is the name of
                 

## Global options
* `-j` sets the maximum number of concurrent processes. You may run DSC in parallel on a single desktop, or on a distributed computing system that runs a `Redis` server / worker setup (FIXME: link to tutorial), in which case `--host` must also be specified.
* `-v` controls the verbosity level. The default level is 2. If you encounter error messages, increase the verbosity level to print more information for diagnosing the cause.
* `-b` sets the benchmark name, which changes the default output filename / directory specified in the DSC script.
* `-f` forces DSC to execute, or to extract results, without using the existing cache. During a DSC execution, cache files of intermediate output and parameters are saved so that future re-runs of the same computation can be skipped. The `-f` option makes DSC ignore the existing cache and run the entire procedure from scratch.
* `--target` defaults to the name of the last DSC block when the DSC procedure runs only one sequence. Otherwise, the name of the last DSC block must be provided explicitly when the `-a` or `-e` switches are used.
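
As a rough illustration of how the global options combine, the sketch below prints a few candidate command lines. The script name `settings.dsc` and the benchmark name `trial_run` are hypothetical, and the `DSC` variable is only a dry-run convenience introduced here, not part of DSC itself.

```shell
# Dry-run wrapper (an assumption of this sketch): prints each command
# instead of executing it. Set DSC=dsc in the environment to run for real.
DSC=${DSC:-echo dsc}

# Run with at most 4 concurrent processes and a higher verbosity level.
$DSC -x settings.dsc -j 4 -v 3

# Ignore the existing cache and run the entire procedure from scratch.
$DSC -x settings.dsc -f

# Use a different benchmark name for this run.
$DSC -x settings.dsc -b trial_run
```

With the wrapper left at its default, each line is echoed, so the commands can be reviewed before anything is executed.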


## DSC Execution options
* `-x` specifies the DSC configuration script.
* `--sequence` specifies which particular DSC sequence to run. See this [section](DSC_Execution.html#Load-sequence-from-command-line) for an example.
* `--recover` executes DSC using existing output files without checking their properties. This is useful when files are migrated from one computational system to another: `--recover` prevents DSC from re-running because file properties (mostly the absolute path to each file) have changed. This option takes a level of 1 or 2. Level 1 will re-run the benchmark ignoring existing output; level 2 will attempt to recover the output metadata without re-running the benchmark at all (this is useful when the benchmark is not complete but a preview of results is desired).
* `--seeds` is useful in exploratory analysis and debugging. Use `--seeds 1` or `--seeds 1 2` to override all `seed` properties defined in the DSC file, so that only a small number of replicates are executed for testing purposes.
* `--ignore-errors` is a flag that makes DSC ignore errors from user-provided scripts and keep the benchmark running. Third-party software packages often produce errors in rare corner cases that would otherwise stop the benchmark from moving forward. It is suggested to develop the DSC benchmark without this flag, and to add it only for large-scale runs once all code is known to work in most situations. Instead of throwing an error, DSC will **store the problematic chunk of code that resulted in an error to the output, creating a fake output file that keeps DSC moving**. These fake outputs cause a chain of failures in every other step that depends on them. In the end, `dsc -e` deals with this situation by extracting values wherever they are available, and otherwise using the problematic file name in place of the values. Users can then either filter out these bad outputs as if they were missing values, or track down those files and execute them as plain-text scripts to examine what went wrong.
* `--clean` will remove output from specified DSC sequences / steps, instead of executing them.
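
The execution options above might be combined along the following lines. This is a sketch only: `settings.dsc` is a hypothetical script name, and the `DSC` variable is a dry-run wrapper introduced for illustration.

```shell
# Dry-run wrapper (an assumption of this sketch): prints each command
# instead of executing it. Set DSC=dsc in the environment to run for real.
DSC=${DSC:-echo dsc}

# Quick test run: override every `seed` property so only two replicates run.
$DSC -x settings.dsc --seeds 1 2

# After moving files to another system: recover output metadata
# without re-running the benchmark (level 2).
$DSC -x settings.dsc --recover 2

# Large-scale run that keeps going when a user-provided script fails.
$DSC -x settings.dsc --ignore-errors -j 8
```

Developing with the first form and switching to the last one only for large runs follows the suggestion given for `--ignore-errors`.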

## DSC annotation
These options annotate DSC by applying "tags" to DSC results. This is a pre-processing step for DSC results extraction.

* `-a` specifies annotation files. See this [section](DSC_Annotation.html) for an example annotation file.

## DSC results extraction
These options extract annotated DSC results to a separate file for further analysis.

* `-e` specifies the variables to extract, each given as `block:variable`.
* `--tags` specifies the annotations to extract. The intersection of annotations can be extracted using the `&&` operator, and extracted fields can be renamed using the `=` operator. For example, `--tags "case1 = small_sample && large_features && elasticnet"`.
* `-o` specifies the name of the final output data file.
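
A possible annotate-then-extract workflow is sketched below. The annotation file `settings.ann`, the `score:mse` block and variable, and the output name `case1_mse` are all hypothetical. Whether `-a` and `-e` must be combined with other switches, such as `--target` when several sequences are run, depends on the benchmark. The `DSC` variable is a dry-run wrapper introduced for illustration.

```shell
# Dry-run wrapper (an assumption of this sketch): prints each command
# instead of executing it. Set DSC=dsc in the environment to run for real.
DSC=${DSC:-echo dsc}

# Step 1: apply tags to the DSC results using an annotation file.
$DSC -a settings.ann

# Step 2: extract one variable for the intersection of three tags,
# renaming the extracted field to "case1". The quotes keep the shell
# from interpreting `&&` itself.
$DSC -e score:mse --tags "case1 = small_sample && large_features && elasticnet" -o case1_mse
```

Quoting the `--tags` expression matters: unquoted, the shell would treat `&&` as its own command separator rather than passing it to `dsc`.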

## DSC release
The `--distribute` option bundles the DSC benchmark into a tarball that can be uploaded to `shinydsc` for data query and visualization, or transferred to another computational environment. DSC benchmarks should be released with this command.

* `--distribute`, when used without any argument, packs the project meta information, the DSC configuration and annotation scripts (if applicable), and the benchmark output into a tarball. This tarball contains enough information to be used with `shinydsc`. To port the complete benchmark, specify additional files such as the computational scripts and data used, e.g., `--distribute /path/to/scripts`.
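
The two forms of `--distribute` described above might look as follows. This is a sketch that assumes the command is issued from the benchmark's project directory. The `DSC` variable is a dry-run wrapper introduced for illustration, and `/path/to/scripts` is the placeholder path from the text.

```shell
# Dry-run wrapper (an assumption of this sketch): prints each command
# instead of executing it. Set DSC=dsc in the environment to run for real.
DSC=${DSC:-echo dsc}

# Bundle meta information, configuration / annotation scripts and output.
$DSC --distribute

# Also include the computational scripts so the benchmark can be ported.
$DSC --distribute /path/to/scripts
```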
