# Command interface

DSC provides two command-line programs, `dsc` and `dsc-query`, for executing DSC benchmarks and extracting results from executed benchmarks, respectively.

## DSC main program

In [1]:
dsc -h

usage: dsc [--target str [str ...]] [--truncate] [--replicate N] [-o str]
           [-s option] [--touch] [--clean option] [-c N] [--ignore-errors]
           [-v {0,1,2,3,4}] [--host file] [--to-host dir [dir ...]]
           [--version] [-h]
           DSC script

positional arguments:
  DSC script            DSC script to execute.

Customized execution:
  --target str [str ...]
                        This argument can be used in two contexts: 1) When
                        used without "--clean" it overrides "DSC::run" in DSC
                        file. Input should be quoted string(s) defining one or
                        multiple valid DSC pipelines (multiple pipelines
                        should be separated by space). 2) When used along with
                        "--clean" it specifies one or more computational
                        modules, separated by space, whose output are to be
                        removed. Alternatively one can specify path(s) of
        

Here we elaborate on some of the options that the help message does not have space to explain in detail:

*  `-v` controls the verbosity level. The default level is 2 (recommended), which displays necessary runtime information and uses a progress bar to show DSC progress. This level is typically enough to hide trivial information yet still report errors. When these messages are not sufficient to fix a problem, it is very likely that you have run into a software bug. It would be very helpful if you could reproduce the bug with an increased verbosity level (e.g. `-v4`) and post the problem as a [GitHub issue](https://github.com/stephenslab/dsc/issues).

* `--target`: this option takes one or more inputs.
  * When used without `--clean`, it overrides `DSC::run`, where the DSC benchmark is defined. For example, a benchmark defined in the DSC file as
  
      ```
      run: simulate * method * score
      ```
      
      can be redefined with, for example, `--target "simulate * method"` so that the pipeline `simulate * method` will be executed instead. Using this option, one can execute DSC bit by bit to debug, e.g. `"simulate"`, then `"simulate * method"` and finally `"simulate * method * score"`.
  * When used with `--clean`, it specifies the module(s) whose output is to be cleaned up (as defined by the behavior specified in `--clean`). Therefore only names of modules or ensembles are valid input in this context, not benchmark pipelines.
* `--skip`: unlike `Make` / `Snakemake`, which determine whether or not to re-execute based on time stamps, DSC creates a hash for each module that takes into consideration all input parameters, input and output module variables, and the content of the module script if applicable. When all this information agrees with a previous execution, DSC will by default skip those runs. This behavior can be changed with the `--skip` option.
  * `--skip none` will re-execute everything: it will remove and ignore any existing output files. 
  * `--skip all` will not perform any module computations. It will only construct the execution metadata and create a DSC database that one can query. That is, `dsc <script> --skip all` followed by `dsc-query <output> -o` will help one understand the expected results from the specified DSC benchmark. This can also be used as a sanity check when a benchmark is only partially completed. For example, if you are uncertain about the scale of the benchmark, it is recommended to run DSC with `--skip all` and use `dsc-query <output> -o` to view the benchmark structure.
* `-c` configures parallel computing. Since DSC is designed to be executed on either local or remote computers, `-c` configures the number of CPU threads used when computing locally, and the number of jobs to be sent when computing on remote computers with the `--host` option (in which case it overrides `max_running_jobs` in the configuration file). The default value is set to half of the local computer's CPU threads. Thus the default of `4` CPUs displayed in the documentation is the result of running `dsc -h` on a computer with a total of 8 CPU threads. This number should be set explicitly if one wants to use more (or less) computing power on a desktop, and should definitely be set for remote job execution based on the current queue status of the remote computing environment.
* `--ignore-errors` is a flag that makes DSC ignore errors from user-provided scripts and thus keeps the benchmark running. We often use other people's software packages that produce errors in random corner cases, which would stop the benchmark from moving forward; sometimes one cannot `try ... catch` them. With this option, instead of throwing an error, DSC will **store the problematic chunk of code that resulted in an error to the output, creating a dummy output file that keeps DSC moving**. These dummy outputs will naturally result in a chain reaction of failures in every other module that depends on them. The benchmark will nonetheless complete, after which one can check the output and trace back to the most upstream dummy output as the source of errors.
* `--distribute`: **FIXME: removed since version 0.2.2 because of a pending design decision on how DSC is to be shared. A future version may bring back this feature if our DSC server is properly configured and regularly maintained.** This option will bundle the DSC benchmark into a tarball that can be uploaded to the DSC shiny server for data query and visualization, or be transferred to other computational environments. DSC benchmarks should be released and shared with this command. When used without any argument, it will pack the DSC configuration, project metadata and available output into a tarball. To port the complete benchmark, one should specify additional files such as the computational scripts and data used, e.g. `--distribute /path/to/scripts /path/to/data`.
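
The signature-based skipping described under `--skip` can be illustrated with a small Python sketch. This is not DSC's actual implementation: the function names `module_signature` and `should_skip`, the use of SHA-256 over a JSON payload, and the mode name `"default"` are all assumptions made for illustration. It only shows why a change to any parameter, variable name, or script content triggers re-execution, whereas a time stamp change alone does not.

```python
import hashlib
import json


def module_signature(params, inputs, outputs, script_text=""):
    """Return a hex digest that changes whenever any ingredient changes.

    Hypothetical helper: DSC's real hashing scheme may differ.
    """
    payload = json.dumps(
        {
            "params": params,
            "inputs": sorted(inputs),
            "outputs": sorted(outputs),
            "script": script_text,
        },
        sort_keys=True,  # make the digest independent of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_skip(signature, completed, skip="default"):
    """Decide whether a module instance can be skipped.

    skip="none": never skip, always re-execute.
    skip="all": never compute, only build metadata.
    skip="default" (assumed name): skip only previously seen signatures.
    """
    if skip == "none":
        return False
    if skip == "all":
        return True
    return signature in completed
```

For example, two calls to `module_signature` with identical parameters and script text return the same digest, so the second run would be skipped by default; editing the script text changes the digest and forces a re-run.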

## DSC query program

This is a companion program to `dsc` that can be used to extract results from a benchmark.

In [2]:
dsc-query -h

usage: dsc-query [-h] [--version] -o str [--limit N] [--title str]
                 [--description str [str ...]] [-t WHAT [WHAT ...]]
                 [-c WHERE [WHERE ...]] [-g G:A,B [G:A,B ...]]
                 [--language str] [--addon str [str ...]]
                 [--rds {omit,overwrite}] [-v {0,1,2,3}]
                 DSC output folder or a single output file

An internal command to extract meta-table for DSC results (requires 'sos-
essentials' package to use notebook output).

positional arguments:
  DSC output folder or a single output file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -o str                Output notebook / data file name. In query
                        applications if file name ends with ".csv", ".ipynb"
                        or ".xlsx" then only data file will be saved as result
                        of query. Otherwise both data file in ".xlsx" format
 

### To query result

The main goal of `dsc-query` is to extract results from a benchmark given conditions. Although `dsc-query` works as is, we are extending it to meet language-specific demands by creating separate software packages that wrap and enhance this command. The approach differs from language to language and will be documented in a language-specific manner. Therefore we do not provide in-depth documentation of this program here.

Currently supported language-specific libraries are:

- R: the `dsc_query` function in the [`dscrutils` package](https://github.com/stephenslab/dsc/tree/master/dscrutils).

### To dump single file from DSC benchmark to text

`dsc-query` can also be used to browse a single benchmark output file in `rds` or `pkl` format. For example:

```
dsc-query dsc_result/simulate/data_1.pkl -o data.out
INFO: Loading database ...
INFO: Data dumped to text files data.out and data.out.debug.
```
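
To give a sense of what dumping a `pkl` output file to text involves, here is a minimal Python sketch using only the standard library. It is not how `dsc-query` is implemented, and the helper name `dump_pickle_to_text` is hypothetical; it assumes the file is an ordinary Python pickle from a trusted source, since unpickling untrusted data can execute arbitrary code.

```python
import pickle
import pprint


def dump_pickle_to_text(pkl_path, out_path):
    """Load a pickle file and write a pretty-printed text rendering of it.

    Hypothetical helper for illustration; only use on trusted files.
    """
    with open(pkl_path, "rb") as handle:
        data = pickle.load(handle)
    with open(out_path, "w") as handle:
        handle.write(pprint.pformat(data))
        handle.write("\n")
    return data
```

Calling `dump_pickle_to_text("dsc_result/simulate/data_1.pkl", "data.out")` would write a readable text rendering of the stored object to `data.out`, assuming that path exists.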