---
title: Core Scikick Architecture
---

<div hidden>

In [None]:
# Remove previous executions of this tutorial
rm -rf test
mkdir test

</div>

# The Main Objective

Scikick aims to improve projects that contain many related computational notebooks. Simple expressions of dependence between the notebooks are translated into the required execution patterns and a systematically organized report.

This requires Scikick to handle three main areas:

1. Dependence definitions
2. Notebook execution
3. Report compilation

# Suitable Projects for Scikick

Scikick is designed to let loosely structured projects containing notebooks (which may not have been developed with any build system in mind) rapidly adopt systematic execution patterns with minimal changes to analysis development practice. No special variables are defined within notebooks, so notebooks can execute outside of Scikick for development work and then execute with `sk run` for validation.


# Dependence Definitions

Projects with multiple notebooks often have documentation describing the order in which notebooks must be executed. Scikick provides a standardized, minimal [YAML](https://yaml.org/) format for this definition, giving a description of notebook execution order that is both human- and machine-readable. The basic format is as follows:

```
analysis:
  - first_notebook.Rmd: []
  - second_notebook.Rmd: [first_notebook.Rmd]
```

Here, each key is a notebook and its value lists what must execute before it (e.g. above, `first_notebook.Rmd` must execute before `second_notebook.Rmd`).
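As an illustration of how such a definition can be turned into an execution order, here is a minimal Python sketch. It is not Scikick's implementation (Scikick delegates ordering to snakemake). The `analysis` dictionary is a hand-written stand-in for the parsed YAML, `execution_order` is a hypothetical helper name, and the standard-library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory form of the `analysis` field shown above:
# each notebook maps to the notebooks that must execute before it.
analysis = {
    "first_notebook.Rmd": [],
    "second_notebook.Rmd": ["first_notebook.Rmd"],
}


def execution_order(analysis):
    """Return the notebooks in an order that respects every dependency."""
    # TopologicalSorter expects {node: predecessors}, which matches
    # the shape of the dependence definition.
    return list(TopologicalSorter(analysis).static_order())


print(execution_order(analysis))
# prints ['first_notebook.Rmd', 'second_notebook.Rmd']
```

A circular definition (two notebooks that each list the other) would raise `graphlib.CycleError` in this sketch, which is one reason a machine-readable definition is preferable to free-form documentation.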

# Workflow Management

Execution in Scikick is handled by a highly generic [snakemake](https://snakemake.readthedocs.io/en/stable/) workflow for executing computational notebooks. Snakemake is a workflow tool that is well suited to taking input files and executing a prespecified command to generate output files.

For example, in simplified snakemake pseudocode, Scikick executes `.Rmd` input notebooks as follows:

```
rule execute_code:
    input: first_notebook.Rmd
    output: report/out_md/first_notebook.md
    R: knitr::knit(input = {input}, output = {output})
    
rule generate_html:
    input: report/out_md/first_notebook.md
    output: report/out_html/first_notebook.html
    R: rmarkdown::render({input}, output_file = {output})
```

That is:

1. Notebooks are executed to produce a markdown file.
2. Markdown files are converted into a website (`.html` files).

Some supplementary rules support this process.

The above is specific to `.Rmd` files; however, similar rules exist for other file types.
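The two-step mapping from a notebook to its markdown and HTML outputs can be sketched in Python as follows. The `output_paths` helper and its `report_dir` parameter are hypothetical names used for illustration only. The directory names are taken from the pseudocode above, and keeping any subdirectory of the notebook in the output path is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def output_paths(notebook, report_dir="report"):
    """Return the (markdown, html) output paths for one notebook.

    Hypothetical helper: mirrors the pseudocode rules above, where
    execution writes to out_md and site generation writes to out_html.
    """
    relative = PurePosixPath(notebook)
    # Swap the notebook extension for the extension of each stage's output.
    md = PurePosixPath(report_dir) / "out_md" / relative.with_suffix(".md")
    html = PurePosixPath(report_dir) / "out_html" / relative.with_suffix(".html")
    return str(md), str(html)


md_path, html_path = output_paths("first_notebook.Rmd")
print(md_path)    # prints report/out_md/first_notebook.md
print(html_path)  # prints report/out_html/first_notebook.html
```

Because only the file extension is replaced, the same sketch would also cover other notebook types, such as an `.ipynb` file.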

Furthermore, the rules implemented in Scikick specify dependent notebooks as follows:

<pre> <code>
rule execute_code:
    input: second_notebook.Rmd, <b> report/out_md/first_notebook.md </b>
    output: report/out_md/second_notebook.md
    R: ...
</code>
</pre>

Here, `second_notebook.Rmd` requires `first_notebook.Rmd` to have been executed (*i.e.* an up-to-date `report/out_md/first_notebook.md` output must exist) before it executes itself.
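A rough Python sketch of this idea is shown below. Both helpers (`rule_inputs` and `needs_execution`) are hypothetical and are not part of Scikick or snakemake. The timestamp comparison is a deliberately simplified approximation of an "up-to-date" check, and snakemake's own rerun logic considers more than modification times.

```python
import os
from pathlib import PurePosixPath


def rule_inputs(notebook, dependencies, md_dir="report/out_md"):
    """Inputs for executing a notebook: the notebook itself plus the
    markdown output of every notebook it depends on."""
    dep_outputs = [
        str(PurePosixPath(md_dir) / PurePosixPath(dep).with_suffix(".md"))
        for dep in dependencies
    ]
    return [notebook] + dep_outputs


def needs_execution(inputs, output):
    """Simplified staleness check (an assumption of this sketch):
    re-execute when the output is missing, or when any input is
    missing or newer than the output."""
    if not os.path.exists(output):
        return True
    output_mtime = os.path.getmtime(output)
    return any(
        not os.path.exists(path) or os.path.getmtime(path) > output_mtime
        for path in inputs
    )


print(rule_inputs("second_notebook.Rmd", ["first_notebook.Rmd"]))
# prints ['second_notebook.Rmd', 'report/out_md/first_notebook.md']
```

Under these assumptions, re-executing `first_notebook.Rmd` refreshes `report/out_md/first_notebook.md`, which in turn marks `second_notebook.Rmd` as needing execution.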

In order to avoid maintenance of project-specific Snakefiles specifying each of these rules, Scikick accepts a simple configuration file which specifies which notebooks should be executed and in which order.

Below, we will generate this configuration file using the Scikick CLI's [sk init](help.html#init).

In [None]:
# Go to an empty testing directory
cd test
# Get scikick.yml template
sk init -y

Taking a look at the configuration file descriptions:

In [None]:
cat scikick.yml

Keys in the `analysis` field will be executed using the Scikick snakemake workflows, while the values of these keys specify which other files the notebook should depend on (*i.e.* files used as further `input` entries in the snakemake rule).

For further convenience, the Scikick CLI allows for manipulation of this configuration file to increase accessibility and avoid mistakes in its specification.

For example, we will use [sk add](help.html#add) to add a notebook to an analysis.

In [None]:
sk add notebook.Rmd

This adds the notebook to the project under the `analysis` field.

In [None]:
cat scikick.yml

The [sk status](help.html#status) command then uses the workflow rules with snakemake to recognize that the notebook requires execution. It takes this output from snakemake:

In [None]:
# Calling on snakemake with Scikick and passing arguments 
# to snakemake to get status output used by sk status
sk run -v -s -n -r --nocolor 

And parsing it into this summarized output:

In [None]:
sk status

# Notebook Execution

Running with [sk run](help.html#run) will execute the Scikick workflows for notebook execution and website building.

In [None]:
sk run

Scikick outputs short messages for each job that has been started (supplemental tasks are indented to highlight the main execution tasks). We can see that Scikick is executing the notebook and creating the final site. The homepage is created under `report/out_html/index.html`.

`sk: Executing code in *` tasks are executing notebooks with appropriate methods (e.g. [knitr](https://yihui.org/knitr/) for `Rmd` and [nbconvert](https://nbconvert.readthedocs.io/en/latest/) for `ipynb`). This is where code is being executed.

`sk:  Adding project map *` tasks are appending content to the outputs.

`sk:   Converting *` tasks are generating a website from the content (using the R function `rmarkdown::render_site`, which uses [pandoc](https://pandoc.org/)).

See the Scikick [snakemake rule definitions](https://github.com/matthewcarlucci/scikick/tree/master/scikick) source code for further details.

# Report Rendering

Scikick uses pandoc with R Markdown to build a website from the notebook outputs. Additional workflows create the necessary site files and convert the markdown files in the `out_md` directory to the final HTML outputs in `out_html`.
