# Quick start tutorial

In this tutorial we will use DSC to compare methods implemented in [R](https://cran.r-project.org/) for location parameter estimation, based on this DSCR example ([R Markdown version](https://github.com/stephens999/dscr/blob/master/vignettes/one_sample_location.rmd) and [HTML version](dscr_one_sample_location.html)). The material used in this document can be found in the [DSC2 vignettes repo](https://github.com/stephenslab/dsc2/tree/master/vignettes/one_sample_location).


## DSC Specification
The DSC problem is to assess location parameter estimation methods using simulation studies. We will simulate data under a normal distribution and a *t* distribution with 2 degrees of freedom (a fixed parameter), then estimate the location parameter using the mean and the median, and finally compare the performance of the estimators by computing the squared difference between the estimate and the underlying parameter. The problem is fully specified in the DSC2 language below:

```
normal, t: rnorm.R, rt.R
    seed: R(1:10)
    n: 1000
    true_mean: 0, 1
    $x: x
    $true_mean: true_mean

mean, median: mean.R, median.R
    x: $x
    $mean: mean

mse: MSE.R
    mean_est: $mean
    true_mean: $true_mean
    $mse: mse

DSC:
    define:
      simulate: normal, t
      estimate: mean, median
    run: simulate * estimate * mse
    exec_path: R/scenarios, R/methods, R/scores
    output: dsc_result
```

All computational routines in this DSC are R scripts (each with 1 or 2 lines of code!), located in directories as specified in the `DSC::exec_path` property of the configuration file. Contents of these R scripts are:

```r
# ==> ../vignettes/one_sample_location/R/methods/mean.R <==
mean = mean(x)

# ==> ../vignettes/one_sample_location/R/methods/median.R <==
mean = median(x)

# ==> ../vignettes/one_sample_location/R/scores/MSE.R <==
mse = (mean_est-true_mean)^2

# ==> ../vignettes/one_sample_location/R/scenarios/rt.R <==
# produces n random numbers from t with df=2 and with the specified mean
set.seed(seed)
x=true_mean+rt(n,df=2)

# ==> ../vignettes/one_sample_location/R/scenarios/rnorm.R <==
# produces n random numbers from normal with the specified mean
set.seed(seed)
x=rnorm(n,mean=true_mean)
```

It is important to ensure that variable names match between the R scripts and the DSC file. For example, the first syntax block involves the computational routines `rnorm.R` and `rt.R`, both of which take parameters `seed`, `n` and `true_mean` and generate the module output `x` (the other output variable, `true_mean`, already exists as a parameter). The core of `rnorm.R` is `x = rnorm(n, mean = true_mean)`, which uses parameters `n` and `true_mean` on the right-hand side to produce `x` on the left-hand side as the output of module `normal`. The same holds for `rt.R`, whose core is `x = true_mean + rt(n, df = 2)`. Because both `rt.R` and `rnorm.R` are written to produce `x`, there is no need to create special aliases.
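To make the name matching concrete, here is a minimal sketch in plain R of how one pipeline (`normal`, then `mean`, then `mse`) hands variables from module to module. The helper `run_module` is hypothetical and is not part of DSC; it only mimics the idea that each script sees its inputs under the names given in the DSC file and exposes its outputs by name.

```r
# Hypothetical helper (not part of DSC): evaluate a script body in a fresh
# environment that holds the module inputs, then return the named outputs.
run_module <- function(code, inputs, outputs) {
  env <- list2env(inputs, parent = globalenv())
  eval(parse(text = code), envir = env)
  mget(outputs, envir = env)
}

# Script bodies copied from rnorm.R, mean.R and MSE.R.
rnorm_code <- "set.seed(seed); x = rnorm(n, mean = true_mean)"
mean_code  <- "mean = mean(x)"
mse_code   <- "mse = (mean_est - true_mean)^2"

# normal: parameters seed, n, true_mean -> outputs $x and $true_mean
sim <- run_module(rnorm_code,
                  list(seed = 1, n = 1000, true_mean = 0),
                  c("x", "true_mean"))

# mean: input x comes from $x -> output $mean
est <- run_module(mean_code, list(x = sim$x), "mean")

# mse: mean_est comes from $mean, true_mean from $true_mean -> output $mse
score <- run_module(mse_code,
                    list(mean_est = est$mean, true_mean = sim$true_mean),
                    "mse")
print(score$mse)
```

If a script assigned its result to a name other than the one listed in the DSC file, the final lookup of the output would fail, which is the kind of mismatch the paragraph above warns about.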

The `DSC::run` property reflects a typical DSC setup: `normal` and `t` *simulate* data under various settings, `mean` and `median` are both methods to *estimate* the location parameter from the simulated data, and `mse` is a *score* that measures the performance of each method. Ensembles are therefore created using `DSC::define` and combined with the `*` operator to build a benchmark in which all possible combinations of modules are expanded into pipelines.
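The expansion performed by `simulate * estimate * mse` can be pictured with a short plain-R sketch. It assumes that `R(1:10)` expands to the seeds 1 through 10 and that all parameter values are fully crossed with all module choices; the function `run_pipeline` is a hypothetical stand-in for what DSC orchestrates through the separate scripts, not DSC's own code.

```r
# Every combination of module choice and parameter value.
# Assumption: seed takes the values 1..10 and true_mean the values 0 and 1.
grid <- expand.grid(
  simulate  = c("normal", "t"),
  estimate  = c("mean", "median"),
  seed      = 1:10,
  true_mean = c(0, 1),
  stringsAsFactors = FALSE
)

n <- 1000

# Hypothetical stand-in for one pipeline: simulate, estimate, score.
run_pipeline <- function(simulate, estimate, seed, true_mean) {
  set.seed(seed)
  x <- if (simulate == "normal") {
    rnorm(n, mean = true_mean)
  } else {
    true_mean + rt(n, df = 2)
  }
  est <- if (estimate == "mean") mean(x) else median(x)
  (est - true_mean)^2
}

grid$mse <- mapply(run_pipeline,
                   grid$simulate, grid$estimate, grid$seed, grid$true_mean,
                   USE.NAMES = FALSE)

nrow(grid)  # 2 * 2 * 10 * 2 = 80 combinations
# Average score for each simulate/estimate pair.
aggregate(mse ~ simulate + estimate, data = grid, FUN = mean)
```

The `aggregate` call at the end summarizes the scores per pair of modules, which is the kind of comparison the benchmark is designed to support.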

In [1]:
%cd ~/GIT/dsc2/vignettes/one_sample_location

/home/gaow/GIT/dsc2/vignettes/one_sample_location

## Run DSC
To execute the DSC on a computer using 30 CPU threads:

In [2]:
! dsc settings.dsc -c 30

INFO: Checking R library dscrutils@stephenslab/dsc2/dscrutils ...
INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 11/11 [00:07<00:00,  1.24it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 11.643 seconds.


In this example the results are stored in the folder `dsc_result/`. We will discuss these results later.

## Re-run DSC
DSC keeps track of completed tasks, so if the same module instance is executed again, its computation is skipped. For example, if you rerun the same command it finishes quickly because all computations are skipped:

In [3]:
! dsc settings.dsc -c 30

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 11/11 [00:02<00:00,  5.02it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 4.423 seconds.


Notice that the last line of output reports an elapsed time of ~4.4 seconds, compared to ~11.6 seconds in the first run. If you want to ignore existing results, use the `--skip none` flag to force DSC to recompute them:

In [4]:
! dsc settings.dsc -c 30 --skip none

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph & running DSC ...
DSC: 100%|██████████████████████████████████████| 11/11 [00:07<00:00,  1.22it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 10.546 seconds.
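
The skip behaviour shown in this section can be pictured with a small conceptual sketch in plain R. This is not DSC's actual implementation: the in-memory `cache`, the function `run_cached` and the way the signature is built are all assumptions made for illustration. The idea is only that a task identified by its code and parameter values is computed once and reused afterwards, unless skipping is switched off.

```r
# Conceptual illustration only; DSC's real bookkeeping may differ.
cache <- new.env()

run_cached <- function(code, params, skip = TRUE) {
  # Signature built from the script text and its parameter values.
  key <- paste(code,
               paste(names(params), unlist(params), sep = "=", collapse = ";"),
               sep = "|")
  if (skip && exists(key, envir = cache, inherits = FALSE)) {
    # Task already completed: reuse the stored result.
    return(list(value = get(key, envir = cache), recomputed = FALSE))
  }
  env <- list2env(params, parent = globalenv())
  value <- eval(parse(text = code), envir = env)
  assign(key, value, envir = cache)
  list(value = value, recomputed = TRUE)
}

code   <- "set.seed(seed); mean(rnorm(n, mean = true_mean))"
params <- list(seed = 1, n = 1000, true_mean = 0)

first  <- run_cached(code, params)                # computed
second <- run_cached(code, params)                # reused from the cache
forced <- run_cached(code, params, skip = FALSE)  # recomputed, in the spirit of --skip none
```

Because the seed is part of the parameters, the forced rerun reproduces the same value as the first run; only the time spent differs.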


## DSC script browser
The DSC command generates a script browser in HTML format. In this example it is [`dsc_result.html`](dsc_result.html) in your working directory, and you can open it with a web browser. This file contains the DSC configuration, the executed pipelines, and the source code for each pipeline in the benchmark.

## DSC results
The results of this DSC are stored in the folder `dsc_result/`, which contains files for each module instance involved in the benchmark. Please continue to the [next tutorial](Explore_Output.html) to extract and analyze the benchmark results.