# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all the required software yet, please do so as soon as possible!

If you have already installed all the software, please start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a community effort to collect a curated set of Nextflow analysis pipelines, developed mostly for bioinformatics use cases. All pipelines meet common standards: they are well documented, have stable releases, and are portable and reproducible.

Ewels, P.A., Peltzer, A., Fillinger, S. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278 (2020). https://doi.org/10.1038/s41587-020-0439-x<br>
Computational workflow seminar 2024: Nf-core and workflows  (seminar slides)

2. How many pipelines are there currently in nf-core?

As of 30.09.2024, there are 112 pipelines in nf-core.

[https://nf-co.re/pipelines/](https://nf-co.re/pipelines/) (30.09.2024)

3. Are there any non-bioinformatic pipelines in nf-core?

Yes. For example, nf-core/meerpipe is an astronomy pipeline.

[https://nf-co.re/pipelines/](https://nf-co.re/pipelines/) (30.09.2024)

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is a series of steps executed in a defined order, where the output of one step becomes the input of the next. We use pipelines to automate multi-step processes such as data analyses.

Computational workflow seminar 2024: Nf-core and workflows  (seminar slides)
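The idea that each step's output becomes the next step's input can be illustrated with a minimal Python sketch. The three step functions are hypothetical toy steps, not part of any nf-core pipeline; a workflow manager such as Nextflow additionally takes care of parallelisation, containers and caching.

```python
from functools import reduce
from typing import Callable, Iterable, List

Step = Callable[[List[str]], List[str]]


def run_pipeline(data: List[str], steps: Iterable[Step]) -> List[str]:
    """Apply the steps in order, feeding each step's output into the next one."""
    return reduce(lambda result, step: step(result), steps, data)


# Hypothetical toy steps operating on a list of sequence strings
def strip_whitespace(seqs: List[str]) -> List[str]:
    return [s.strip() for s in seqs]


def drop_empty(seqs: List[str]) -> List[str]:
    return [s for s in seqs if s]


def to_upper(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


result = run_pipeline(["  acgt ", "", " ttga"], [strip_whitespace, drop_empty, to_upper])
print(result)  # ['ACGT', 'TTGA']
```

Changing the order of the steps changes the result, which is why a pipeline fixes the order explicitly.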

5. Why do you think nf-core adheres to strict guidelines?

Strict guidelines ensure that contributors can understand each other's work, that all components are compatible and that the finished pipelines are easy to use. This is particularly important when many people work together and the resulting tool is used by a broad community, as is the case for nf-core.

6. What are the main features of nf-core pipelines?

The main features of nf-core pipelines are thorough documentation, continuous-integration testing, stable releases, packaged software, portability, reproducibility and cloud readiness.

Computational workflow seminar 2024: Nf-core and workflows  (seminar slides)

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

The nf-core pipeline used to measure differential abundance is nf-core/differentialabundance.

[https://nf-co.re/pipelines/](https://nf-co.re/pipelines/) (30.09.2024)

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/home/jana/UNI/Master/IISemester/compworkflows


In [16]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)


!NXF_VER=23.10.0 nextflow run nf-core/differentialabundance -profile test,docker --outdir DAY1_RUN1/

[sudo] password for jana: Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [suspicious_ampere] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    :
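Note that `NXF_VER` only pins the version of Nextflow itself; the pipeline version is selected with Nextflow's `-r` option. As a sketch, the same command can also be assembled from Python instead of the `!` shell escape. The helper `nextflow_command` is hypothetical, and `1.5.0` is used as the revision only because that is the pipeline version reported in the log above.

```python
import os
from typing import Dict, List, Optional, Tuple


def nextflow_command(
    pipeline: str,
    outdir: str,
    profiles: List[str],
    nxf_ver: Optional[str] = None,
    revision: Optional[str] = None,
    resume: bool = False,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argument list and environment for a `nextflow run` call."""
    cmd = ["nextflow", "run", pipeline, "-profile", ",".join(profiles), "--outdir", outdir]
    if revision:
        cmd += ["-r", revision]  # pipeline release (tag, branch or commit)
    if resume:
        cmd.append("-resume")  # reuse cached task results from a previous run
    env = dict(os.environ)
    if nxf_ver:
        env["NXF_VER"] = nxf_ver  # version of Nextflow itself, not of the pipeline
    return cmd, env


cmd, env = nextflow_command(
    "nf-core/differentialabundance", "DAY1_RUN1/", ["test", "docker"],
    nxf_ver="23.10.0", revision="1.5.0",
)
print(" ".join(cmd))
# nextflow run nf-core/differentialabundance -profile test,docker --outdir DAY1_RUN1/ -r 1.5.0
# To actually launch the run: import subprocess; subprocess.run(cmd, env=env, check=True)
```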

In [17]:
# repeat the run. What changed?
!NXF_VER=23.10.0 nextflow run nf-core/differentialabundance -profile test,docker --outdir DAY1_RUN1/

[sudo] password for jana: Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [angry_snyder] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


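The second run is considerably faster than the first one. Without `-resume`, Nextflow still executes every task again, so the speed-up most likely comes from the Docker images that were already pulled during the first run.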

In [5]:
# now add -resume to the command. What changed?
!NXF_VER=23.10.0 nextflow run nf-core/differentialabundance -profile test,docker --outdir DAY1_RUN1/ -resume

[sudo] password for jana: Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [kickass_gates] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


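The last run is much faster than the first two runs: with `-resume`, Nextflow reuses the results of tasks that already completed successfully in a previous run with the same inputs and code, instead of executing them again.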

Check out the current directory. Apart from the output directory you specified, what else has changed?

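The directory now also contains the folders `work`, `.nextflow` and `null`, as well as additional log files (`.nextflow.log`; logs of earlier runs are rotated to `.nextflow.log.1`, `.nextflow.log.2`, and so on). The `work` folder holds one subdirectory per task with its intermediate files, and `.nextflow` stores the run history and cache metadata that `-resume` relies on.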
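To look inside the `work` folder, the following sketch lists each task directory together with its exit code. It assumes Nextflow's usual layout, `work/<two-character hash prefix>/<rest of the hash>/`, with the exit status written to a `.exitcode` file once a task has finished; the function name `task_exit_codes` is hypothetical. The demo builds a fake work directory with made-up hashes in a temporary folder so that the snippet runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def task_exit_codes(work_dir: Path) -> Dict[str, Optional[int]]:
    """Map each task directory ('ab/cdef12...') to its exit code, or None if it has none yet."""
    codes: Dict[str, Optional[int]] = {}
    for task_dir in sorted(work_dir.glob("*/*")):
        # Task directories sit below a two-character hash prefix; skip anything else
        if not task_dir.is_dir() or len(task_dir.parent.name) != 2:
            continue
        exit_file = task_dir / ".exitcode"
        text = exit_file.read_text().strip() if exit_file.exists() else ""
        codes[f"{task_dir.parent.name}/{task_dir.name}"] = int(text) if text else None
    return codes


# Demo on a fake work directory (hypothetical hashes)
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "ab" / "cdef12").mkdir(parents=True)
    (work / "ab" / "cdef12" / ".exitcode").write_text("0\n")
    (work / "3f" / "9a8b7c").mkdir(parents=True)  # no .exitcode: task not finished
    demo_codes = task_exit_codes(work)

print(demo_codes)  # {'3f/9a8b7c': None, 'ab/cdef12': 0}
```

In the course directory, the same function would be called as `task_exit_codes(Path("work"))`; a non-zero code points to the task directory whose log files are worth inspecting.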

In [10]:
# delete the work directory and run the pipeline again using -resume. What changed?
!NXF_VER=23.10.0 nextflow run nf-core/differentialabundance -profile test,docker --outdir DAY1_RUN1/ -resume

[sudo] password for jana: Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [maniac_joliot] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


What changed?

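The runtime is similar to that of the second run. Deleting the `work` directory removed the cached task results, so `-resume` had nothing to reuse and Nextflow executed every task again.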
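The behaviour of the repeated runs can be reproduced with a toy model of Nextflow's task cache. This is a simplified sketch, not Nextflow's actual implementation: it assumes that a task's cache key is a hash of its process name, script and inputs, and that a task counts as cached only if its output still exists in the work directory. Real Nextflow also keeps cache metadata in `.nextflow/` and, by default, hashes input files by path, size and modification time. The process, script and file names below are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def task_hash(process: str, script: str, inputs: List[str]) -> str:
    """Simplified cache key: a hash of the process name, its script and its inputs."""
    digest = hashlib.sha256()
    for part in [process, script, *inputs]:
        digest.update(part.encode())
        digest.update(b"\0")  # separator so ('ab', 'c') and ('a', 'bc') hash differently
    return digest.hexdigest()[:12]


def run_task(work_dir: Path, process: str, script: str, inputs: List[str], resume: bool) -> str:
    """Return 'cached' if a previous result can be reused, otherwise run the task and return 'executed'."""
    output = work_dir / task_hash(process, script, inputs) / "output.txt"
    if resume and output.exists():
        return "cached"
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(f"{process}: {script} on {', '.join(inputs)}\n")  # stand-in for the real work
    return "executed"


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    task = ("NORMALISE", "normalise.R", ["counts.tsv", "samplesheet.csv"])  # hypothetical task
    first = run_task(work, *task, resume=False)   # first run
    second = run_task(work, *task, resume=False)  # repeated run without -resume
    third = run_task(work, *task, resume=True)    # run with -resume
    shutil.rmtree(work)                           # delete the work directory
    fourth = run_task(work, *task, resume=True)   # -resume after deleting work

print(first, second, third, fourth)  # executed executed cached executed
```

Only the third call skips the work, mirroring the observation that `-resume` is fast only while the `work` directory still holds the earlier results.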

## Let's look at the results

### What is differential abundance analysis?


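Differential abundance analysis tests which features differ in abundance between groups of samples, for example between two experimental conditions. In microbiome data the features are taxa, so the analysis compares taxonomic composition; in nf-core/differentialabundance they are, for example, genes.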

Cappellato M, Baruzzo G, Di Camillo B. Investigating differential abundance methods in microbiome data: A benchmark study. PLoS Comput Biol. 2022 Sep 8;18(9):e1010467. doi: 10.1371/journal.pcbi.1010467. PMID: 36074761; PMCID: PMC9488820.
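The quantity on the x-axis of the volcano plots is the log2 fold change of each feature between two groups. The following sketch computes it from a small, made-up count matrix using group means and a pseudocount of 1; both choices are assumptions for illustration. The pipeline itself relies on dedicated statistical packages that also normalise the data and test for significance.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_changes(
    counts: Dict[str, List[float]],
    group_a: List[int],
    group_b: List[int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """log2((mean_B + pc) / (mean_A + pc)) per feature; positive means higher in group B."""
    lfc = {}
    for feature, values in counts.items():
        mean_a = mean(values[i] for i in group_a)
        mean_b = mean(values[i] for i in group_b)
        # The pseudocount avoids division by zero and log2(0) for features with zero counts
        lfc[feature] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return lfc


# Made-up counts for 4 samples: columns 0-1 = control, columns 2-3 = treatment
counts = {
    "geneA": [6, 8, 30, 32],
    "geneB": [99, 101, 100, 100],
    "geneC": [14, 16, 2, 4],
}
lfc = log2_fold_changes(counts, group_a=[0, 1], group_b=[2, 3])
print(lfc)  # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
```

A log2 fold change of 2 means roughly four times higher abundance in the treatment group, and -2 roughly four times lower; the y-axis of a volcano plot then adds the statistical significance of each difference.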

Give the most important plots from the report:<br>
![pca2d.png](attachment:pca2d.png)

![volcano.png](attachment:volcano.png)![volcano-2.png](attachment:volcano-2.png)

![sample_dendrogram.png](attachment:sample_dendrogram.png)