leahkemp/RezBaz2020_snakemake_workshop

Workflow languages – your foundation for accuracy and reproducibility in data analysis

This workshop is a part of RezBaz 2020: Pick n Mix
Date: Friday 27 November 2020
Time: 2:30pm - 3:30pm

Are you working with big data? Do you need to pass your data through several different pieces of software? If you’ve ever been in this situation (as I have, in a population genetics master’s project), you know how difficult it can become to maintain reproducibility and accuracy: wait, have I updated this output file? The more manual steps we do, the more human errors inevitably creep into our analysis, hampering both accuracy and reproducibility.

Be lazy, the machine does it better.

Workflow languages automate your data analysis workflow. But that isn’t all: they ensure all your analysis logs are captured in an organised fashion, and they explicitly outline the software (and exact software versions) used, along with the input and output files at each step. Lastly, when your data inevitably becomes big data, you can easily scale up from running your analysis on your laptop to running it on a high performance computing (HPC) cluster such as NeSI.
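To make the "input and output files at each step" idea concrete, here is a minimal sketch in plain Python. It is not Snakemake and not how Snakemake is implemented. It only illustrates, under simplifying assumptions, the kind of bookkeeping a workflow language takes off your hands: re-running a step only when its output is missing or older than its inputs, and logging what happened. The names `needs_update`, `run_step` and `count_lines`, and the file names, are hypothetical and chosen for illustration.

```python
import tempfile
from pathlib import Path


def needs_update(inputs, output):
    """Return True if the output is missing or older than any input."""
    output = Path(output)
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(Path(i).stat().st_mtime > out_mtime for i in inputs)


def run_step(inputs, output, action, log):
    """Run `action` only when needed and record the decision in `log`."""
    if not needs_update(inputs, output):
        log.append(f"skip {Path(output).name}: up to date")
        return False
    action(inputs, output)
    log.append(f"ran {Path(output).name}")
    return True


def count_lines(inputs, output):
    # Hypothetical analysis step: count the lines across all input files.
    total = sum(len(Path(i).read_text().splitlines()) for i in inputs)
    Path(output).write_text(f"{total}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        raw = Path(workdir) / "raw.txt"        # hypothetical input file
        counts = Path(workdir) / "counts.txt"  # hypothetical output file
        raw.write_text("a\nb\nc\n")

        log = []
        run_step([raw], counts, count_lines, log)  # output missing, so it runs
        run_step([raw], counts, count_lines, log)  # output is current, so it is skipped
        print("\n".join(log))
```

A workflow language such as Snakemake lets you declare the inputs and outputs of each step and works out what needs to run for you, so you never have to write or maintain checks like these by hand.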

In this workshop, we will work through an introduction to Snakemake, a workflow language based on the popular programming language Python. This workshop is intended for anyone who has several steps in their data analysis workflow, particularly when many different pieces of software are involved. Book here.

Workshop sections:

The workflows we will create:

Start the workshop!