# Snakemake tutorial

In this tutorial, we will learn how to use Snakemake to create and run workflows.

## Objectives
1. Understand how Snakemake uses dependencies between files
2. Run Snakemake on the command line
3. Understand why and how map-reduce parallelism is relevant

## 1. Hello, snakemake!
Snakemake executes workflows, which consist of multiple rules. Each rule is one step in the data analysis. A typical data analysis workflow looks like this:
1. Preprocess the dataset
2. Clean and transform the data
3. Analyze the data (compute metrics, train models)
4. Evaluate the results (calculate statistics, cross-validate)
5. Plot the results
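
To make the idea of steps linked by files more concrete, here is a minimal Python sketch of how a workflow tool could derive an execution order from file dependencies. It is an illustration under stated assumptions, not Snakemake's actual implementation: the `RULES` table, the file names, and the `execution_order` helper are all hypothetical and do not come from this tutorial's Snakefile.

```python
# Hypothetical rules: each output file maps to the input files it needs.
# These file names are invented for illustration only.
RULES = {
    "data/raw.csv": [],
    "data/clean.csv": ["data/raw.csv"],
    "results/metrics.csv": ["data/clean.csv"],
    "results/plot.png": ["data/clean.csv", "results/metrics.csv"],
}


def execution_order(target, rules):
    """Return the outputs to build, dependencies first, ending with the target."""
    order = []
    seen = set()

    def visit(output):
        # Skip outputs that are already scheduled.
        if output in seen:
            return
        seen.add(output)
        # Schedule every input before the output that needs it.
        for dependency in rules[output]:
            visit(dependency)
        order.append(output)

    visit(target)
    return order


plan = execution_order("results/plot.png", RULES)
print(plan)
```

Asking for the final plot is enough: walking backwards from the requested file yields every intermediate file that has to exist first. The sketch omits cycle detection and up-to-date checks to stay short.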

We will more or less adhere to this data analysis pipeline. Let's execute our very first rule:

In [2]:
! snakemake hello

Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	hello
	1

[Fri Feb 14 16:35:58 2020]
rule hello:
    jobid: 0

Job counts:
	count	jobs
	1	hello
	1
hello world
[Fri Feb 14 16:35:58 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /Users/mk21womu/code/snakemake-tutorial/.snakemake/log/2020-02-14T163558.090656.snakemake.log


Great, this worked! The rule printed ```hello world```, a classic first example.

Next, let's look up which rules exist in this tutorial:

In [4]:
! snakemake --list

all
generate_data
chunk_dataset
add_country
merge_results
plot_results
hello
clean


Intriguing: the order of the rules suggests that
1. we first generate data,
2. chunk the data into multiple pieces,
3. apply a transform by adding a country to each observation,
4. merge the results,
5. and finally plot the results.
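
The chunk, transform, and merge steps follow the map-reduce pattern. The plain-Python sketch below illustrates that pattern only. The function names mirror the rule names listed above, but their bodies, the sample observations, and the `CITY_TO_COUNTRY` lookup are assumptions made for illustration and are not the tutorial's actual rules.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup table, invented for illustration.
CITY_TO_COUNTRY = {"Berlin": "Germany", "Paris": "France", "Rome": "Italy"}


def generate_data():
    """Create ten made-up observations, each with an id and a city."""
    cities = ["Berlin", "Paris", "Rome"]
    return [{"id": i, "city": cities[i % 3]} for i in range(10)]


def chunk_dataset(rows, chunk_size):
    """Split the rows into consecutive chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def add_country(chunk):
    """Map step: annotate every observation in one chunk with its country."""
    return [{**row, "country": CITY_TO_COUNTRY[row["city"]]} for row in chunk]


def merge_results(chunks):
    """Reduce step: concatenate the processed chunks into one list."""
    return [row for chunk in chunks for row in chunk]


rows = generate_data()
chunks = chunk_dataset(rows, chunk_size=4)

# Chunks are independent of each other, so they can be processed in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    processed = list(pool.map(add_country, chunks))  # results keep chunk order

merged = merge_results(processed)
print(len(chunks), len(merged), merged[0])
```

Because no chunk depends on another, the map step can run on as many workers as there are chunks, and only the merge has to wait for all of them.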

First, let's wipe the data and results from previous Snakemake runs.

In [3]:
! snakemake clean

Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	clean
	1

[Fri Feb 14 16:36:10 2020]
rule clean:
    jobid: 0

Job counts:
	count	jobs
	1	clean
	1
[Fri Feb 14 16:36:10 2020]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /Users/mk21womu/code/snakemake-tutorial/.snakemake/log/2020-02-14T163610.434969.snakemake.log
