# New DSC syntax and query demo

This demo illustrates the new design of DSC syntax and query as a basis for discussion, along the lines of [this note](https://github.com/stephenslab/dsc-wiki/blob/master/development/finalized_terminology_and_extraction.md).

## The new syntax

The new syntax is no longer `yaml` compatible: we dropped compatibility in exchange for a more compact syntax.

### Example for the mean estimation problem

[Link](https://github.com/stephenslab/dsc2/blob/master/vignettes/one_sample_location/settings.dsc)

**Module section (blocks not called `DSC`)**

* Removed `exec`, `input`, `output` tags
* `@` decorations to configure DSC-level module options
  - `@RNG` for replicates
  - `@ALIAS` for swapping parameter names
  - `@FILTER` for parameter combination filtering
  - `@CONF` for cluster configurations
  - `@{module}` for module specific parameters, for example
      ```
      normal, t: rnorm.R, rt.R
          n: 9
          @normal:
              true_mean: 0
          @t:
              true_mean: 1
      ```
* `$` symbol for pipeline variables (module output)
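
The `@{module}` decoration in the example above can be read as "shared parameters, plus per-module additions". The Python sketch below illustrates that reading only. The dictionary layout and the function name `resolve_module_params` are hypothetical, and the rule that module-specific values override shared ones is an assumption, not a description of how DSC is implemented.

```python
# Hypothetical sketch: resolve per-module parameters from one shared block.
# The dict layout and the override rule are assumptions, not DSC internals.

def resolve_module_params(modules, block):
    """Return {module: params}, letting `@module` entries override shared ones."""
    # Keys without a leading "@" are treated as parameters shared by all modules.
    shared = {k: v for k, v in block.items() if not k.startswith("@")}
    resolved = {}
    for name in modules:
        params = dict(shared)                     # start from the shared parameters
        params.update(block.get("@" + name, {}))  # then apply module-specific ones
        resolved[name] = params
    return resolved


# Mirrors the `normal, t` block shown above.
block = {
    "n": 9,
    "@normal": {"true_mean": 0},
    "@t": {"true_mean": 1},
}
params = resolve_module_params(["normal", "t"], block)
print(params["normal"])
print(params["t"])
```

Under these assumptions both modules receive `n = 9`, while `true_mean` differs between `normal` and `t`.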

**Pipeline section (block named `DSC`)**

* `define`: where module ensembles are defined
  - `simulate: normal, t`
  - `preprocess: (filter1, filter2) * normalize`
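
To make the `preprocess` definition concrete, here is a small Python sketch of how such an expression might expand into module sequences. It assumes that `,` lists alternative modules and `*` chains stages one after another. The nested-list representation and the helper name `expand` are hypothetical and only serve the illustration.

```python
from itertools import product


def expand(stages):
    """Each stage is a list of alternative modules; return every sequence."""
    # The Cartesian product picks one alternative per stage, in stage order.
    return [" * ".join(combo) for combo in product(*stages)]


# Assumed reading of `(filter1, filter2) * normalize`:
# stage 1 is either filter1 or filter2, stage 2 is always normalize.
preprocess = [["filter1", "filter2"], ["normalize"]]
sequences = expand(preprocess)
print(sequences)
```

With this reading, the ensemble stands for two sequences: `filter1 * normalize` and `filter2 * normalize`.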

### Example for ash

[link](https://github.com/stephenslab/dsc2/blob/master/vignettes/ash/settings.dsc)

## Command interface

In [2]:
! dsc -h

usage: dsc [-h] [--version] [-o str] [--target str [str ...]]
           [--seed values [values ...]] [--recover option] [--remove option]
           [--host str] [-c N] [--ignore-errors] [-v {0,1,2,3,4}]
           DSC script

positional arguments:
  DSC script            DSC script to execute.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -o str                Benchmark output. It overwrites "DSC::run::output"
                        defined in configuration file. (default: None)
  --target str [str ...]
                        This argument can be used in two contexts: 1) When
                        used without "--remove" it specifies DSC sequences to
                        execute. It overwrites "DSC::run" defined in
                        configuration file. Multiple sequences are allowed.
                        Each input should be a quoted string defining a valid
                

We endeavor to keep the list of command options short; in most cases no additional arguments are needed.

### Command in action

First run:


In [4]:
! dsc settings.dsc

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph ...
INFO: DSC in progress ...
DSC: 100%|██████████████████████████████████████| 11/11 [00:26<00:00,  3.40s/it]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 34.336 seconds.


Second run:

In [5]:
! dsc settings.dsc

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph ...
INFO: DSC in progress ...
DSC: 100%|██████████████████████████████████████| 11/11 [00:02<00:00,  4.88it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 4.509 seconds.


`ash` example run:

In [6]:
%cd ../ash
! echo -e "\nash example"
! dsc settings.dsc

/home/gaow/GIT/dsc2/vignettes/ash
INFO: Checking R library stephens999/ashr ...
INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph ...
INFO: DSC in progress ...
DSC: 100%|████████████████████████████████████████| 5/5 [00:53<00:00, 11.19s/it]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 64.821 seconds.


In [8]:
! dsc settings.dsc

INFO: DSC script exported to dsc_result.html
INFO: Constructing DSC from settings.dsc ...
INFO: Building execution graph ...
INFO: DSC in progress ...
DSC: 100%|████████████████████████████████████████| 5/5 [00:01<00:00,  2.26it/s]
INFO: Building DSC database ...
INFO: DSC complete!
INFO: Elapsed time 4.518 seconds.


In [9]:
%cd ../one_sample_location

/home/gaow/GIT/dsc2/vignettes/one_sample_location

## The new query design

* Query syntax still mimics SQL, but is enhanced with dynamic grouping of tables
* It is not SQL compatible!
  * Supported operations:
    - `=, ==, >, <, >=, <=, !=`
  * Supported logic:
    - `AND`, `OR`
    - Compound logic derived from them, e.g. `(((a AND b) OR c) AND d)`
* After further thought, loading data into the query result has been disabled
  * Mainly due to R/Python data communication problems for complex objects
  * This affects both loading information (matters to developers) and writing information (matters to users)
* Only a "meta" table is saved. The query command should be used as an internal command for a companion R package (and perhaps never openly advertised).
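
The supported operations and compound logic listed above can be pictured with a small evaluator. The Python sketch below is only an illustration of nested `AND` / `OR` conditions applied to rows. The tuple-based condition format, the function name `matches`, and the example rows are all hypothetical, and nothing here reflects how `dsc-query` parses or executes a query.

```python
import operator

# Comparison operators listed above; `=` and `==` are treated as the same test.
OPS = {
    "=": operator.eq, "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le,
}


def matches(row, cond):
    """Evaluate a nested condition against one row (a dict).

    A leaf is (column, op, value); a compound node is ("AND" | "OR", left, right).
    """
    head = cond[0]
    if head in ("AND", "OR"):
        left, right = matches(row, cond[1]), matches(row, cond[2])
        return (left and right) if head == "AND" else (left or right)
    column, op, value = cond
    return OPS[op](row[column], value)


# Made-up rows, used only to exercise the evaluator.
rows = [
    {"true_mean": 1, "n": 9, "score": 0.2},
    {"true_mean": 0, "n": 9, "score": 0.5},
    {"true_mean": 1, "n": 50, "score": 0.9},
]
# ((true_mean = 1 AND n <= 9) OR score > 0.8)
cond = ("OR", ("AND", ("true_mean", "=", 1), ("n", "<=", 9)), ("score", ">", 0.8))
selected = [row for row in rows if matches(row, cond)]
print(selected)
```

The first and third rows are selected: the first satisfies the `AND` branch, the third only the `score > 0.8` branch.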

### Command interface

In [1]:
! dsc-query -h

usage: dsc-query [-h] [--version] -o str [--limit N] [--title str]
                 [--description str [str ...]] [-t WHAT [WHAT ...]]
                 [-c WHERE [WHERE ...]] [-g G:A,B [G:A,B ...]]
                 [--language str] [--addon str [str ...]] [-v {0,1,2,3,4}]
                 DSC output folder

positional arguments:
  DSC output folder     Path to DSC output.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -o str                Output notebook / data file name. In query
                        applications if file name ends with ".rds" then only
                        data file will be saved as result of query. Otherwise
                        both data file and a notebook that displays the data
                        will be saved. (default: None)
  --limit N             Number of rows to display for tables. Default is to
                        display it for all rows (will r

The `-t` (what to extract), `-c` (conditions) and `-g` (grouping) options are the core features.

### Overview of executed DSC

In [3]:
! dsc-query dsc_result -o Overview

INFO: Loading database ...
INFO: Exporting database ...
INFO: Export complete. You can use jupyter notebook Overview.ipynb to open it and run all cells, or run it from command line with jupyter nbconvert --to notebook --execute Overview.ipynb first, then use jupyter notebook Overview.nbconvert.ipynb to open it.


In [4]:
! jupyter nbconvert --to notebook --execute Overview.ipynb

[NbConvertApp] Converting notebook Overview.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: sos
[NbConvertApp] Writing 47851 bytes to Overview.nbconvert.ipynb


[This notebook](../playground/Overview.html) shows which benchmark files are generated and how they are connected to each other.

### Query example: ungrouped tables

In [5]:
! dsc-query dsc_result -o Q1 \
    -t normal.n mean mse.score \
    -c "normal.true_mean = 1"

INFO: Loading database ...
INFO: Running queries ...
INFO: Query results saved to spread sheet Q1.xlsx
INFO: Export complete. You can use jupyter notebook Q1.ipynb to open it and run all cells, or run it from command line with jupyter nbconvert --to notebook --execute Q1.ipynb first, then use jupyter notebook Q1.nbconvert.ipynb to open it.


The output is a [spreadsheet](../playground/Q1.xlsx) of meta information (which can be loaded into `R` for processing) as well as a [notebook](../playground/Q1.html) that loads the data in the spreadsheet, so that Jupyter users can get started right away. The notebook additionally contains an SQL query to help check what is going on under the hood.

### Query example: grouped tables

In [6]:
! dsc-query dsc_result -o Q2 \
    -t simulate.n method mse.score \
    -c "simulate.true_mean = 1" \
    -g "simulate: normal, t" \
       "method: mean, median"

INFO: Loading database ...
INFO: Running queries ...
INFO: Query results saved to spread sheet Q2.xlsx
INFO: Export complete. You can use jupyter notebook Q2.ipynb to open it and run all cells, or run it from command line with jupyter nbconvert --to notebook --execute Q2.ipynb first, then use jupyter notebook Q2.nbconvert.ipynb to open it.


Here, `normal` and `t` are grouped into `simulate`; `mean` and `median` are grouped into `method`. The [output spreadsheet](../playground/Q2.xlsx) therefore differs slightly from the one generated above: an additional column is needed to annotate the merged columns.
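
The need for an annotation column can be seen in a toy version of the grouping step. The Python sketch below is a hypothetical illustration: the function `group_tables`, the row format, and the choice to name the extra column after the group are assumptions made for this example, not the actual layout of `Q2.xlsx`.

```python
def group_tables(tables, groups):
    """Stack the rows of grouped tables, recording each row's source module.

    tables: {module: [row dicts]}; groups: {group: [modules]}.
    """
    merged = {}
    for group, members in groups.items():
        merged[group] = [
            {group: module, **row}  # extra column named after the group
            for module in members
            for row in tables[module]
        ]
    return merged


# Made-up rows: once stacked, nothing but the extra column tells them apart.
tables = {
    "normal": [{"n": 9, "true_mean": 1}],
    "t": [{"n": 9, "true_mean": 1}],
}
merged = group_tables(tables, {"simulate": ["normal", "t"]})
for row in merged["simulate"]:
    print(row)
```

Without the added `simulate` column, the two stacked rows would be indistinguishable.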

The [Q2 notebook](../playground/Q2.html) shows what this grouped query does under the hood.