---
title: Scikick
abstract: "Collections of data analysis notebooks often lose coherence due to the complex and branching nature of investigations. Scikick is a command-line utility for managing ensembles of computational notebooks developed throughout a project by providing simple commands for workflow configuration, report generation, and state management."
---





# Preface: Notebook-Centric Workflows 

A thorough data analysis typically involves multiple computational notebooks (*e.g.,* in [Rmarkdown](https://rmarkdown.rstudio.com/), [Jupyter](https://jupyter.org/), or plain scripts).
Consider this two-stage data analysis, where `QC.Rmd` produces a cleaned dataset
that `model.Rmd` then uses for modeling:

<pre>
|-- input/raw_data.csv
|-- code
<b>|   |-- QC.Rmd</b>
<b>|   |-- model.Rmd</b>
|-- output/QC/QC_data.csv
|-- report/out_html
|   |-- QC.html
|   |-- model.html
</pre>

Each of these notebooks may be internally complex, but the essence of this workflow is:

**`QC.Rmd` must run before `model.Rmd`**

This simple definition can be applied to:

- Re-execute the notebook collection in the correct order.
- Avoid unnecessary execution of `QC.Rmd` when only `model.Rmd` changes.
- Build a shareable report from the rendered notebooks.
- Collect relevant execution logs.
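
As a minimal sketch of how that one ordering rule can drive the first two points, the example below derives an execution order and the set of notebooks that need re-running after a change. It illustrates the idea only and is not Scikick's implementation: the `workflow` dictionary and the helper names `execution_order` and `notebooks_to_rerun` are hypothetical, and the standard-library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definition (not Scikick's configuration format):
# each notebook maps to the set of notebooks that must run before it.
workflow = {
    "QC.Rmd": set(),
    "model.Rmd": {"QC.Rmd"},
}


def execution_order(workflow):
    """Return the notebooks in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


def notebooks_to_rerun(workflow, changed):
    """Return changed notebooks plus everything downstream, in execution order."""
    stale = []
    for notebook in execution_order(workflow):
        depends_on_stale = any(dep in stale for dep in workflow[notebook])
        if notebook in changed or depends_on_stale:
            stale.append(notebook)
    return stale


print(execution_order(workflow))                    # ['QC.Rmd', 'model.Rmd']
print(notebooks_to_rerun(workflow, {"model.Rmd"}))  # ['model.Rmd']
print(notebooks_to_rerun(workflow, {"QC.Rmd"}))     # ['QC.Rmd', 'model.Rmd']
```

Changing only `model.Rmd` leaves `QC.Rmd` untouched, while changing `QC.Rmd` marks both notebooks for re-execution because `model.Rmd` depends on its output.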

These features are key to using notebooks for complex analyses; however,
**too much configuration is currently required to accomplish these goals**.
To remain focused on an investigation, analysts need tools
that streamline the organization of notebook collections.

# Scikick

*Scikick* is a command-line tool for connecting and executing related data analyses.
With a few simple commands, it generates cohesive investigative reports and
helps ensure future reproducibility.

![Figure 1 (reference to be added upon availability).](../../../source/figure1.png)

Common useful features for *ad hoc* data analysis are managed through Scikick:

 - Preset methods for executing a variety of notebook formats to markdown output 
 - Awareness of up-to-date results
 - Website generation with automated navigation based on workflow definition
 - Collection of page metadata (session info, page runtime, git history)
 - Simple ordering of notebook executions (via a user-defined workflow definition)
 - Definition of notebook imports (*e.g.,* importing functions from a source file)

These features allow for easy development of transparent data analysis repositories.

Commands are inspired by git for configuring the workflow: `sk init`, `sk add`, `sk status`, `sk del`, `sk mv`.

Scikick currently contains methods for executing `.R`, `.Rmd`, `.ipynb` (experimental), and `.md` (simple copy) to `.md` output pages. `.md` files are then compiled into a website (currently with `rmarkdown::render_site`).

[See the output website for an implementation of a single cell transcriptomic analysis.](../../single-cell_analysis/report/out_html/index.html)

## Installation

|**Requirements**                                     |                     **Recommended**                                         |
|-----------------------------------------------------|-----------------------------------------------------------------------------|
|python3 (>=3.6)                                      | [git >= 2.0](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) |
|R + packages `install.packages(c("rmarkdown", "knitr", "yaml","git2r"))`   | [singularity >= 2.4](http://singularity.lbl.gov/install-linux)  |
|[pandoc > 2.0](https://pandoc.org/installing.html)   | [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/)   |
|                                                     | [GraphViz (for project maps)](https://graphviz.org/download/) |

With the requirements above installed, Scikick can be installed using pip:

```
pip install scikick
```

# Getting Started

- Follow the short ["hello world"](hello_world.html) usage of Scikick.

- Execute the demo project in a terminal with `sk init --demo`. This will walk through a short demonstration which executes basic Scikick commands and generates a project for inspection. 

- Read a longer realistic usage of [Scikick for single cell transcriptomics](SCRNA_walkthrough.html).

- Read about the [core design](core_design.html) of Scikick.

# Reading a Scikick Report

- Each page corresponds to a computational notebook.
- Notebooks are configured to execute in a specific order.
- The "Project Map" at the bottom of each page illustrates the order in which notebooks were executed (the map below shows that the tutorial notebooks have no required order of execution).