---
title: Scikick for Single Cell Analysis
---

This notebook is an interactively executed demonstration of Scikick usage through an analysis of single-cell transcriptomic (scRNAseq) data.

This analysis is based on work from the [Orchestrating Single Cell Analysis](http://bioconductor.org/books/release/OSCA/) book.

The `notebooks` directory contains a series of notebooks that were developed to analyze scRNAseq datasets. Each notebook will be introduced and added to the Scikick project as if it were developed over the course of a real project.

The final report generated by Scikick for this tutorial project can be seen [here](../../single-cell_analysis/report/out_html/index.html).

Links to `sk <command> --help` outputs are provided throughout this tutorial for review and convenience. Reading at least the ["Hello World"](hello_world.html) usage of Scikick before this page is recommended.

# Preparing the Software Environment

Essential screenshots of the report state are provided throughout this tutorial, so there is no need to execute the commands yourself. However, to view the state of the report interactively at each stage of the tutorial, the commands must be executed in line with the text. Full execution of this notebook takes approximately 10 minutes, and installation of the required software for the project can take up to a couple of hours.

Later in this tutorial, a demonstration will show how to execute this project using [Singularity](https://sylabs.io/docs/) with Scikick such that no project-specific dependencies require installation. 

If you are executing the tutorial commands interactively, you should ensure you have followed the setup instructions in this section. 

If you are reading only, continue to the [Initialize Scikick](#Initialize-Scikick) section.

## Executing Tutorial Commands

This tutorial utilizes a [bash kernel](https://pypi.org/project/bash_kernel/) to execute bash commands in Jupyter. Additionally, this tutorial was developed in [Jupyter Lab](https://jupyter.org/install); when viewing the report site throughout the tutorial, it may be necessary to click "trust html" to enable all site content to be properly displayed.

Commands can also be executed in a shell terminal and the reports opened in any modern web browser.

## Install Scikick

Ensure you have a working installation of [Scikick](https://github.com/matthewcarlucci/scikick).

## Obtain Project Notebooks

To execute the code in this tutorial, download [Scikick's source code repository](https://github.com/matthewcarlucci/scikick) and navigate to the directory containing the analysis code and this Jupyter notebook found under `docs/scikick_documentation/`.

## Install Project Dependencies

The software used throughout the analysis is required for execution. This project uses [R >= 4.0](https://www.r-project.org/) for analysis. Dependencies for the analysis can be installed from the project root (*i.e.* `single-cell_analysis/`) in R with:

```r
# Install BiocManager and remotes prior to this command
BiocManager::install(remotes::local_package_deps(dependencies=TRUE), version="3.12")
```



# Initialize Scikick

In [None]:
# Starting from scikick source code docs/scikick_documentation (this notebook)
pwd

In [None]:
# Move into the scRNAseq project
cd single-cell_analysis

<div hidden>

In [None]:
# (this code cell is hidden by HTML tags) - Remove previous executions of this tutorial
if [ -f scikick.yml ]; then
    rm scikick.yml || echo "Not found"
    rm -rf report || echo "Not found"
    rm -rf output || echo "Not found"
    # If tutorial is executed some notebooks would have been moved
    mv notebooks/nestorowa/* notebooks/ || echo "No notebooks found"
fi

</div>

In the project root directory, the Scikick project is initialized with [sk init](help.html#init).

In [None]:
sk init -y

We will add some optional website styling to the configuration file (`scikick.yml`).

In [None]:
cat _site.yml 
cat _site.yml >> scikick.yml

## Adding to the Homepage

It is often useful to first start with writing some background on the project and to provide an overview of data sources, analysis goals, and the current state of the project. The `index.Rmd` notebook provides an overview of the scRNAseq analysis.

In [None]:
# Inspecting index.Rmd contents
cat index.Rmd

`index.Rmd` is added with [sk add](help.html#add).

In [None]:
sk add index.Rmd

`index.Rmd` is the only special file name in Scikick projects. It will be used for homepage content. We can see in the message above that Scikick recognized this.

An initial website will now be created with [sk run](help.html#run).

In [None]:
sk run

---

*Note: The console outputs seen above for project map, site layout, and html generation will be suppressed throughout the rest of the tutorial by using `-q` with `sk run`.*

We can see the contents of `index.Rmd` were added to the homepage. 

[![](../../single-cell_analysis/imgs/homepage.png)](../../single-cell_analysis/report/out_html/index.html)

Now we are ready to start adding analysis to the project.


# Nestorowa Dataset Analysis

## Modular Notebooks

During data exploration it is often a good idea to develop analyses in separate notebooks when multiple stages of data transformation are being performed such that notebooks:

1. Remain focused on a single topic.
2. Each have manageable namespaces and resource usage for data objects. 
3. Maintain clear checkpoints in the state of main data objects.

To practice this modularized approach, notebooks in this tutorial are small and focus on specific tasks. Scikick provides tools to manage notebooks developed in this modular fashion. Modularization is somewhat exaggerated in this project; a typical project may combine many of these steps into a single notebook at the user's discretion.

## Importing Data

The `notebooks/import.Rmd` notebook was developed to import the scRNAseq dataset from *Nestorowa et al. 2016*. This notebook:

- Describes further details on where the data comes from.
- Makes a few adjustments to the data to make it easier to work with.
- Inspects the object after it has finished downloading.

The data is then saved to a file for later usage.

In [None]:
cat notebooks/import.Rmd

The notebook will now be added to the project.

In [None]:
sk add notebooks/import.Rmd

The new state of the project can be checked with [sk status](help.html#status).

In [None]:
sk status

Scikick has determined that `import.Rmd` must be executed since it is missing its output report file, while `index.Rmd` only requires small changes to its final page output (to include a menu item for `import.Rmd`).

In [None]:
sk run

---

We can see that `import.Rmd` was added to the report site (under the navigation bar as "Import") with no additional configuration necessary.

[![](../../single-cell_analysis/imgs/import.png)](../../single-cell_analysis/report/out_html/notebooks/nestorowa/import.html)

## Quality Control

A notebook was developed which performs quality control on the data downloaded by the `import.Rmd` notebook. That is, `quality_control.Rmd` must be executed after `import.Rmd`.

In [None]:
cat notebooks/quality_control.Rmd

`sk add` with the `-d/--depends-on` flag is used to configure the execution order of these two notebooks. 

In [None]:
sk add notebooks/quality_control.Rmd --depends-on notebooks/import.Rmd

In [None]:
sk status

However, since `import.Rmd` has already run, only `quality_control.Rmd` requires execution.

In [None]:
sk run -q

[![](../../single-cell_analysis/imgs/quality_control.png)](../../single-cell_analysis/report/out_html/notebooks/nestorowa/quality_control.html)

## Inspecting the Project Map

With multiple notebooks present in the final report, the order of execution is no longer immediately clear. The project map generated by Scikick at the bottom of each page rectifies this by clearly outlining the connection made between `quality_control` and `import` (with `-d/--depends-on`). This map can also be used to navigate across pages of the report.

[![](../../single-cell_analysis/imgs/quality_control_project_map.png)](../../single-cell_analysis/report/out_html/notebooks/nestorowa/quality_control.html)

## Normalization 

A notebook is added for implementing normalization and variance modeling of the transcript counts for the samples which survived quality control in `quality_control.Rmd`.

In [None]:
cat notebooks/normalization.Rmd

In [None]:
sk add notebooks/normalization.Rmd -d notebooks/quality_control.Rmd

In [None]:
sk run -q

We have now obtained a meaningful dataset for exploring biological results.

## Further Exploration of the Nestorowa Data

Once the data has been cleaned and normalized into a meaningful format for interpretation, it is common to perform some exploration of the data. Some common scRNAseq data exploration tasks are performed in a `further_exploration.Rmd` notebook.

In [None]:
sk add notebooks/further_exploration.Rmd -d notebooks/normalization.Rmd

In [None]:
sk run -q

## Nestorowa Analysis Summary

We now have a complete analysis of the Nestorowa dataset. Each notebook's respective page is found in the Scikick report. The project map clearly shows the order of execution of these pages. 

[![](../../single-cell_analysis/imgs/nestorowa_final.png)](../../single-cell_analysis/report/out_html/notebooks/nestorowa/further_exploration.html)
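
The order shown in the project map follows directly from the `-d` links added so far. As a rough illustration only (Scikick derives the order through snakemake, not through this tool), the same chain can be fed to the standard `tsort` utility as "upstream downstream" pairs. The `execution_order` function name is hypothetical:

```bash
# Illustration only: list the Nestorowa notebooks in dependency order.
# Each line of input is "upstream downstream", mirroring the -d links above.
execution_order() {
    tsort <<'EOF'
import quality_control
quality_control normalization
normalization further_exploration
EOF
}

execution_order
```

Because the chain is linear, there is only one valid order: `import`, `quality_control`, `normalization`, then `further_exploration`, one name per line.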

Because Scikick has been used throughout this project, we can be confident that, if the report directory is removed, all results can be regenerated with `sk run`.

In [None]:
# E.g. We could remove the report and regenerate from scratch.
# This will be skipped to save computations.
# rm -rf report
# sk run -q

We can take a look at the files required to re-execute this project.

In [None]:
ls notebooks/*Rmd
ls scikick.yml

We can now come back and check on the state of this project at any time in the future to find that there are no pending updates and that the report in `report/out_html` represents a full execution of these notebooks.

In [None]:
# View notebooks, dependencies, and state
sk status -v

## Updating the Dataset

If we now update the original data import (e.g. modify `import.Rmd` to use a new version of the raw data), Scikick detects the change because it tracks the state of, and the relationships between, the notebooks.

In [None]:
# Using touch to simulate an update to import.Rmd
touch notebooks/import.Rmd
sk status

We see that `import` (`s--` indicating an update to "self") and all downstream notebooks (`-e-` indicating an update to an external dependency) must be re-executed.
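
The use of `touch` above suggests that, as in make-like tools, a notebook is treated as out of date when it is newer than its output. A minimal sketch of that idea in plain bash, assuming modification times are the deciding signal (Scikick and snakemake may consider more than this). The scratch files and the `is_stale` helper are hypothetical and not part of Scikick:

```bash
# Hypothetical helper: a source is "stale" when it is newer than its output
is_stale() {
    if [ "$1" -nt "$2" ]; then echo "stale"; else echo "up to date"; fi
}

# Scratch files standing in for a notebook and its rendered page
workdir=$(mktemp -d)
src="$workdir/import.Rmd"
out="$workdir/import.html"

# The output is dated one day after the source: nothing to do
touch -t 202001010000 "$src"
touch -t 202001020000 "$out"
state_before=$(is_stale "$src" "$out")

# Touching the source makes it newer than the output
touch "$src"
state_after=$(is_stale "$src" "$out")

echo "before touch: $state_before"
echo "after touch:  $state_after"
rm -rf "$workdir"
```

The downstream `-e-` state follows the same logic one step removed: once `import` is re-executed, its outputs become newer than everything built from them.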

In [None]:
sk run -q

# Additional Experiments

Scientific projects often develop as a series of experiments where analysis is performed in stages. We will now add two similar parallel workflows for two new scRNAseq experiments.

## Reorganizing Content

As a project moves forward, more datasets may be added, requiring reorganization of the project. Workflow configurations can be difficult to coordinate with file reorganization. Scikick provides features to accommodate this. The [sk mv](help.html#move) command enables dynamic reorganization of projects while attempting to minimize the need for re-execution. It does this by applying a standard shell `mv` command (or `git mv` if using `-g`), while also moving the corresponding report files such that re-execution is not necessary.

For this project, we will use `sk mv` to migrate all analysis of the Nestorowa dataset above to a subfolder for this experiment such that new experiments can be added in a parallel fashion.

In [None]:
mkdir notebooks/nestorowa
sk mv notebooks/*.Rmd notebooks/nestorowa

Note that no code requires re-execution.

In [None]:
sk status

## Add New Notebook Collections

A similar series of notebooks was developed for each of the two additional experiments.

In [None]:
sk add notebooks/grun/import.Rmd
sk add notebooks/grun/quality_control.Rmd -d notebooks/grun/import.Rmd
sk add notebooks/grun/normalization.Rmd -d notebooks/grun/quality_control.Rmd
sk add notebooks/grun/further_exploration.Rmd -d notebooks/grun/normalization.Rmd

sk add notebooks/paul/import.Rmd
sk add notebooks/paul/quality_control.Rmd -d notebooks/paul/import.Rmd
sk add notebooks/paul/normalization.Rmd -d notebooks/paul/quality_control.Rmd
sk add notebooks/paul/further_exploration.Rmd -d notebooks/paul/normalization.Rmd
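
Since both experiments follow the same four-stage chain, the commands above can also be generated rather than typed. A minimal sketch, assuming every dataset directory uses the same stage names as this tutorial. `print_add_chain` is a hypothetical helper, and it only prints the `sk add` commands so they can be reviewed before anything is run:

```bash
# Hypothetical helper: print (not run) the sk add commands for one dataset,
# linking each stage to the previous one with -d
print_add_chain() {
    local dir="$1" prev="" stage
    for stage in import quality_control normalization further_exploration; do
        if [ -z "$prev" ]; then
            echo "sk add $dir/$stage.Rmd"
        else
            echo "sk add $dir/$stage.Rmd -d $dir/$prev.Rmd"
        fi
        prev="$stage"
    done
}

for dataset in grun paul; do
    print_add_chain "notebooks/$dataset"
done
```

Once the printed commands look right, they could be executed by piping the output to `sh`.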

## Merging Experiments

Finally, a set of notebooks is added to perform a combined analysis of the three datasets that were each prepared in parallel. The `merge.Rmd` stage depends on the results of the data preparation performed previously for each dataset (*i.e.,* the `quality_control.Rmd` and `normalization.Rmd` notebooks). However, it does not depend on the biological analysis performed for each dataset (*i.e.,* the analyses in the `further_exploration.Rmd` notebooks are not used by `merge.Rmd`).

In [None]:
sk add notebooks/merged/merge.Rmd -d notebooks/grun/quality_control.Rmd -d notebooks/paul/quality_control.Rmd -d notebooks/nestorowa/normalization.Rmd
sk add notebooks/merged/combined_analysis.Rmd -d notebooks/merged/merge.Rmd

## Utilizing Parallelization

With parallel series of notebooks, as in the current state of this project, notebooks can be executed in parallel by passing a flag (`-j8`) to snakemake through the `-s` flag of [sk run](help.html#run).
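
The value `8` is specific to the machine used for this tutorial. A small sketch of one way to pick the job count from the available cores instead, assuming `nproc` or `getconf` is present on the system. The `jobs_n` variable is hypothetical, and the command is only printed here rather than executed:

```bash
# Detect the number of available cores, falling back to 1 if unknown
if command -v nproc >/dev/null 2>&1; then
    jobs_n=$(nproc)
else
    jobs_n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
fi

# Show the resulting call; everything after -s is passed on to snakemake
echo "sk run -q -s -j${jobs_n}"
```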

The two additional experiments and merged analysis will now be executed.

In [None]:
sk run -q -s -j8

A well-organized final report site is generated. The project map in the report now features branching sets of analyses.

[![](../../single-cell_analysis/imgs/merged_project_map.png)](../../single-cell_analysis/report/out_html/notebooks/merged/combined_analysis.html)

# Summary

This demonstration uses real datasets and real analyses to show how Scikick is used in a practical setting when adapting to new analysis additions. Use of Scikick in a less controlled setting (when the analysis is not predetermined) should offer even more utility for maintaining workflow connections and reporting capabilities.

# Appendix: Robust Archival

## Utilizing the Snakemake Backend

Scikick is implemented through snakemake workflows, which allows many features of snakemake to be used. Users familiar with snakemake may be able to take advantage of many of its flags while using Scikick. Some frequently used examples are provided here.

### Forced re-execution

Passing snakemake arguments with `-s` and utilizing the snakemake `-F` flag will force the entire project to execute from scratch. This can be a useful sanity check when time permits.

In [None]:
# This will be skipped to save computations.
# sk run -s -F

### Execution with Singularity

When readers attempt to reproduce this work or any other computational work, they may run into issues with software or data. In these cases, it is useful for the reader to see: 

1. That a fully reproducible archive exists (i.e. that the work is reproducible if the exact compute environment is used).
2. That, if needed, they could utilize this static software environment to reproduce the results.

The OSCA project maintains a Docker image at [Docker Hub](https://hub.docker.com/r/bioconductor/orchestratingsinglecellanalysis). This image can first be assigned to the Scikick project with:

In [None]:
# Using a fixed image tag for future reproducibility 
sk config --singularity docker://bioconductor/orchestratingsinglecellanalysis:RELEASE_3_12

Providing the flag `--use-singularity` to snakemake will download the container and execute all notebooks within this container.
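
Combined with the `-s` pass-through used earlier, such a call might look like the sketch below. It assumes both `sk` and `singularity` are on the `PATH` and declines to run otherwise. `missing_tools` is a hypothetical helper, not part of Scikick:

```bash
# Hypothetical helper: print each named command that is not found on PATH
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools sk singularity)
if [ -z "$missing" ]; then
    # --use-singularity is handed to snakemake through -s
    sk run -q -s --use-singularity
else
    echo "Not running; missing from PATH:" $missing
fi
```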

## Automated Re-execution (i.e. Continuous Analysis)

Singularity configurations like the one above effectively make it possible to execute any Scikick data analysis project with only the core Scikick dependencies and no analysis software dependencies directly required.

This scRNAseq project additionally requires the image to have write access to the `/home/cache` directory for data downloads, so a writeable location must be provided for execution with the additional argument `--singularity-args`. A short `run.sh` script contains a call to Scikick with these necessary arguments for full execution. The script is also configured to distribute notebook executions across a SLURM cluster.

In [None]:
cat run.sh

### Scikick Continuous Analysis with GitLab

Linking this execution with a continuous integration service implements a version of [Continuous Analysis](https://greenelab.github.io/continuous_analysis/) where the archived reports may be referred to at any time in the future knowing that they are reproducible. A template for such a configuration using GitLab CI is provided below:

In [None]:
cat .gitlab-ci.yml

The analysis codebase will now contain Scikick reports that are fully verified for reproducibility.