# Quick start tutorial

In this tutorial we use DSC to compare location parameter estimation methods implemented in [R](https://cran.r-project.org/), based on this DSCR example ([R Markdown version](https://github.com/stephens999/dscr/blob/master/vignettes/one_sample_location.rmd) and [HTML version](dscr_one_sample_location.html)). The material used in this document can be found in the [DSC2 vignettes repo](https://github.com/stephenslab/dsc2/tree/master/vignettes/one_sample_location).


## DSC Specification
The DSC problem is to assess location parameter estimation methods using simulation studies. We simulate data under a normal distribution and a *t* distribution with 2 degrees of freedom, estimate the location parameter using the mean and the median, and finally compare the performance of the estimators using 2 loss functions: squared error and absolute error. The problem is fully specified in the DSC2 language below:

```
normal: normal.R
  n: 100
  $data: x
  $true_mean: 0

t: t.R
  n: 100
  df: 2
  $data: x
  $true_mean: 3

mean: mean.R
  x: $data
  $est_mean: y

median: median.R
  x: $data
  $est_mean: y

sq_err: sq.R
  a: $est_mean
  b: $true_mean
  $error: e
 
abs_err: abs.R
  a: $est_mean
  b: $true_mean
  $error: e 
  
DSC:
    define:
      simulate: normal, t
      analyze: mean, median
      score: abs_err, sq_err
    run: simulate * analyze * score
    exec_path: R
    output: dsc_result
```

All computational routines in this DSC are R scripts, located in directories as specified in the `DSC::exec_path` property of the configuration file. Contents of these R scripts are:

```r
# ==> normal.R <==
x = rnorm(n,0,1)

# ==> t.R <==
x = 3+rt(n,df)

# ==> mean.R <==
y = mean(x)

# ==> median.R <==
y = median(x)

# ==> sq.R <==
e = (a-b)^2

# ==> abs.R <==
e = abs(a-b)
```

It is important that variable names match between the R scripts and the DSC file. For example, the first block defines module `normal`, whose computational routine `normal.R` takes parameter `n` and generates output `x`. The script `normal.R` consists of `x = rnorm(n,0,1)`, which uses parameter `n` on the right-hand side to produce `x` on the left-hand side, consistent with the output `$data: x` specified for module `normal`. The same holds for all other modules.
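
To make the name matching concrete, here is a minimal sketch of what a single pipeline (`normal`, then `mean`, then `sq_err`) amounts to when written out by hand in plain R. It is not how DSC executes modules internally: the `vars` list standing in for the `$`-prefixed DSC variables and the `set.seed` call are assumptions added purely for illustration.

```r
# Hand-written equivalent of the pipeline normal -> mean -> sq_err.
# DSC does this wiring for you; `vars` is a hypothetical stand-in for
# the $-prefixed variables passed between modules.
set.seed(1)            # assumption: fixed seed so the sketch is reproducible
vars <- list()

# Module `normal`: parameter n, script normal.R
n <- 100
x <- rnorm(n, 0, 1)    # body of normal.R
vars$data <- x         # $data: x
vars$true_mean <- 0    # $true_mean: 0

# Module `mean`: input x is taken from $data
x <- vars$data         # x: $data
y <- mean(x)           # body of mean.R
vars$est_mean <- y     # $est_mean: y

# Module `sq_err`: inputs a and b are taken from $est_mean and $true_mean
a <- vars$est_mean     # a: $est_mean
b <- vars$true_mean    # b: $true_mean
e <- (a - b)^2         # body of sq.R
vars$error <- e        # $error: e

print(vars$error)
```

If a script assigned its result to a name other than the one listed in the module definition (say `z` instead of `y` in `mean.R`), the corresponding `$est_mean` would have nothing to pick up, which is why the names must agree.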

The `DSC::run` property reflects a typical DSC setup: `normal` and `t` *simulate* data under different settings, `mean` and `median` are methods that *analyze* the data, and `sq_err` and `abs_err` *score* the performance of the methods. The ensembles `simulate`, `analyze` and `score` are therefore defined in `DSC::define`, and `DSC::run` combines them with the `*` operator to build the benchmark, creating one pipeline for each combination of modules.
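
As a rough illustration of the `*` logic, the combinations can be enumerated in plain R with `expand.grid`. This is only a sketch of the combinatorics, not DSC's own pipeline builder, and the row order produced by `expand.grid` need not match the order in which DSC runs pipelines.

```r
# Ensembles as listed under DSC::define in the configuration
simulate <- c("normal", "t")
analyze  <- c("mean", "median")
score    <- c("abs_err", "sq_err")

# simulate * analyze * score: one pipeline per combination of modules
pipelines <- expand.grid(simulate = simulate,
                         analyze = analyze,
                         score = score,
                         stringsAsFactors = FALSE)

print(pipelines)
print(nrow(pipelines))  # number of pipelines: 2 * 2 * 2
```

Each row corresponds to one pipeline, for example `normal` followed by `mean` followed by `abs_err`.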

In [1]:
%cd ~/GIT/dsc2/vignettes/one_sample_location

/home/gaow/GIT/dsc2/vignettes/one_sample_location

## Run DSC
To execute the DSC on a computer using 30 CPU threads:

In [2]:
! dsc settings.dsc -c 30

INFO: Checking R library dscrutils@stephenslab/dsc2/dscrutils ...
INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 15/15 [00:03<00:00,  3.94it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 6.928 seconds.


In this example the results are stored in the folder `dsc_result/`. We will discuss these results later.

## Re-run DSC
DSC keeps track of completed tasks, so if the same module instance is executed again, DSC skips the computation. For example, if you rerun the same command it finishes quickly because all computations are skipped:

In [3]:
! dsc settings.dsc -c 30

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 15/15 [00:02<00:00,  7.09it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 4.010 seconds.


Notice that the last line of output records an elapsed time of ~4.0 seconds, compared to ~6.9 seconds in the first run. If you want to ignore existing results, use the `--skip none` flag to force DSC to rerun all computations.

In [4]:
! dsc settings.dsc -c 30 --skip none

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 15/15 [00:03<00:00,  4.40it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 5.787 seconds.


## DSC script browser
The DSC command generates a script browser in HTML format. In this example it is [`dsc_result.html`](dsc_result.html) in your working directory, and you can open it in a web browser. This file contains the DSC configuration, the executed pipelines, and the source code for each pipeline in the benchmark.

## DSC results
Results of this DSC are stored in the folder `dsc_result/`, which contains numerous files for each module instance involved in the DSC benchmark. Please continue to the [next tutorial](Explore_Output.html) to extract and analyze the benchmark results.
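
Before moving on, it can be useful to take a quick look at what was written to disk. The sketch below uses only base R to count the files under a directory by extension. The helper `count_files_by_ext` is hypothetical (it is not part of DSC or `dscrutils`), and it makes no assumption about which file types DSC actually produces.

```r
# Hypothetical helper, not part of DSC: count files under `path` by extension.
count_files_by_ext <- function(path) {
  if (!dir.exists(path)) {
    # Nothing to report if the directory has not been created yet
    return(table(character(0)))
  }
  files <- list.files(path, recursive = TRUE)
  ext <- tools::file_ext(files)
  ext[ext == ""] <- "(none)"  # label files that have no extension
  table(ext)
}

# Assumes the working directory is the one in which `dsc` was run
print(count_files_by_ext("dsc_result"))
```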