# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: In the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all required software, please do so as soon as possible!

If you have already installed all software, please start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a global community effort collaborating to build open-source Nextflow components and pipelines. [1]

[1]: The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x


2. How many pipelines are there currently in nf-core?

There are currently 139 pipelines in nf-core.

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, nf-core also contains non-bioinformatic pipelines. For example, rangeland [2] is a pipeline for processing satellite imagery to estimate trends in land-cover change.

[2]: Lehmann, F., Frantz, D., Becker, S., Leser, U., Hostert, P. (2021). FORCE on Nextflow: Scalable Analysis of Earth Observation Data on Commodity Clusters. In CIKM Workshops.

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is a series of computational steps (typically program executions) that run in a defined order, where the output of one step serves as the input of the next. Pipelines are used to facilitate automated data analysis, make research reproducible, and handle large datasets efficiently.
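To make the idea concrete, here is a minimal Python sketch of a pipeline as an ordered list of steps, each consuming the output of the previous one. The step names (`trim`, `count_bases`) and the toy reads are hypothetical and only serve as illustration; real pipelines such as those in nf-core chain external programs rather than Python functions.

```python
from typing import Any, Callable, Sequence


def run_pipeline(data: Any, steps: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply each step in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy steps, for illustration only.
def trim(reads: list[str]) -> list[str]:
    """Remove trailing 'N' characters (unknown bases) from each read."""
    return [read.rstrip("N") for read in reads]


def count_bases(reads: list[str]) -> dict[str, int]:
    """Count how often each base occurs across all reads."""
    counts: dict[str, int] = {}
    for read in reads:
        for base in read:
            counts[base] = counts.get(base, 0) + 1
    return counts


# The order of the steps is fixed: trimming always happens before counting.
result = run_pipeline(["ACGTNN", "GGCAN"], [trim, count_bases])
print(result)
```

Because the order of the steps is written down once and executed the same way every time, repeating the analysis on the same input gives the same result.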

5. Why do you think nf-core adheres to strict guidelines?

nf-core's guidelines follow the FAIR principles (Findable, Accessible, Interoperable, Reusable). They ensure that pipelines are easy to find, that all components are accessible, that pipelines are interoperable across different systems, and that results are reproducible.

6. What are the main features of nf-core pipelines?

All nf-core pipelines:
- have extensive documentation
- provide stable releases
- use only open-source tools
- use continuous-integration testing for changes made to the pipeline code
- can be run anywhere
- are packaged using Docker or other software packaging tools (e.g. Singularity, Conda)

To summarize, nf-core implements the FAIR principles.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

differentialabundance [3]

[3]: WackerO, Jonathan Manning, Azedine Zoufir, nf-core bot, Alexander Peltzer, Cristina Tuñí i Domínguez, Dave Carlson, Steffen Möller, Marcel Ribeiro-Dantas, Harshil Patel, & James A. Fellows Yates. (2023). nf-core/differentialabundance: v1.4.0 - 2023-11-27 (1.4.0). Zenodo. https://doi.org/10.5281/zenodo.10209675

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/mnt/c/Users/NicolaiOswald/OneDrive - UT Cloud/Dokumente/Studium Tübingen/Computational Workflows/computational-workflows-2025/notebooks/day_01


In [None]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)


!nextflow run nf-core/differentialabundance \
    --outdir differentialabundance_test \
    -profile test,docker


 N E X T F L O W   ~  version 25.04.7

Pulling nf-core/differentialabundance ...
 downloaded from https://github.com/nf-core/differentialabundance.git
Launching `https://github.com/nf-core/differentialabundance` [zen_lorenz] DSL2 - revision: 3dd360fed0 [master]

Downloading plugin nf-validation@1.1.3
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
-------

In [None]:
# repeat the run. What changed?

!nextflow run nf-core/differentialabundance \
    --outdir differentialabundance_test \
    -profile test,docker


 N E X T F L O W   ~  version 25.04.7

Launching `https://github.com/nf-core/differentialabundance` [admiring_noether] DSL2 - revision: 3dd360fed0 [master]

WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

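The second run was much faster than the first (2m 48s vs. 11m 4s). The log also no longer shows the pipeline being pulled or the nf-validation plugin being downloaded, since both were already available locally.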

In [None]:
# now add -resume to the command. What changed?

!nextflow run nf-core/differentialabundance \
    --outdir differentialabundance_test \
    -profile test,docker \
    -resume


 N E X T F L O W   ~  version 25.04.7

Launching `https://github.com/nf-core/differentialabundance` [spontaneous_shannon] DSL2 - revision: 3dd360fed0 [master]

WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

This run was even faster (46.5s). The workflow output indicates which tasks were taken from the cache instead of being executed again.

Check out the current directory. Next to the outdir you specified, what else has changed?

- a `.nextflow.log` file has been created for each run of the pipeline (logs of earlier runs are kept as `.nextflow.log.1`, `.nextflow.log.2`, ...)
- a `work` directory containing the intermediate files of every pipeline step
- a `.nextflow` directory for internal Nextflow files and a history of the pipeline runs
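As a quick check, the entries listed above can also be collected programmatically. The following is a small sketch, assuming Nextflow's default names (`work`, `.nextflow`, and log files starting with `.nextflow.log`); the helper `nextflow_artifacts` is a hypothetical name and not part of Nextflow.

```python
from pathlib import Path


def nextflow_artifacts(run_dir: Path) -> list[str]:
    """Return the names of Nextflow-related entries found in run_dir, sorted."""
    found = []
    for entry in run_dir.iterdir():
        # Assumed default names: the work directory, the hidden .nextflow
        # directory, and log files such as .nextflow.log or .nextflow.log.1
        if entry.name in {"work", ".nextflow"} or entry.name.startswith(".nextflow.log"):
            found.append(entry.name)
    return sorted(found)


# Inspect the directory from which the pipeline was launched.
print(nextflow_artifacts(Path.cwd()))
```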

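The `-resume` flag tells Nextflow to continue from a previous run: tasks whose inputs and code have not changed are taken from the cache in the `work` directory instead of being executed again. This is useful after a failed or interrupted run, and also when only part of a pipeline or its input has changed.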
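The caching idea behind `-resume` can be sketched in a few lines of Python. This is a simplified illustration and not Nextflow's actual implementation: the function `cached_task`, the file name `result.txt`, and the way the key is computed are all assumptions made for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cached_task(name: str, inputs: dict, run: Callable[[dict], str],
                work_dir: Path) -> tuple[str, bool]:
    """Run a task, or reuse its stored result if the same task ran before.

    Returns the result and a flag telling whether it came from the cache.
    """
    # The key depends on the task name and its inputs, so changing either
    # one produces a different key and forces the task to run again.
    payload = json.dumps([name, inputs], sort_keys=True).encode()
    key = hashlib.sha256(payload).hexdigest()
    result_file = work_dir / key / "result.txt"
    if result_file.exists():
        return result_file.read_text(), True
    result = run(inputs)
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text(result)
    return result, False


def shout(inputs: dict) -> str:
    """Hypothetical task: upper-case the input text."""
    return inputs["text"].upper()


work = Path(tempfile.mkdtemp()) / "work"
first = cached_task("shout", {"text": "hello"}, shout, work)   # computed
second = cached_task("shout", {"text": "hello"}, shout, work)  # from cache
shutil.rmtree(work)                                            # cache is gone
third = cached_task("shout", {"text": "hello"}, shout, work)   # computed again
print(first, second, third)
```

In this sketch the stored results live entirely inside the work directory, so removing it leaves nothing to reuse.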

In [None]:
# delete the work directory and run the pipeline again using -resume. What changed?

!rm -r work

!nextflow run nf-core/differentialabundance \
    --outdir differentialabundance_test  \
    -profile test,docker \
    -resume

rm: cannot remove 'work': No such file or directory

 N E X T F L O W   ~  version 25.04.7

Launching `https://github.com/nf-core/differentialabundance` [condescending_saha] DSL2 - revision: 3dd360fed0 [master]

WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options

What changed?

The runtime was 2m 59s, which is approximately the runtime of the rerun without the `-resume` flag. Without the `work` directory there are no cached results to reuse, so all tasks are executed again.
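To compare the four runs directly, the reported runtimes can be converted to seconds and related to the first run. The snippet below is a small sketch using only the durations reported in this notebook; the helper `to_seconds` and the run labels are names chosen for this example.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a duration such as '11m 4s' or '46.5s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes or 0) + float(seconds)


# Runtimes as reported for the four runs above.
runs = {
    "first run": "11m 4s",
    "second run": "2m 48s",
    "with -resume": "46.5s",
    "-resume after deleting work": "2m 59s",
}

baseline = to_seconds(runs["first run"])
for label, duration in runs.items():
    seconds = to_seconds(duration)
    ratio = baseline / seconds
    print(f"{label}: {seconds:.1f} s ({ratio:.1f}x relative to the first run)")
```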

## Let's look at the results

### What is differential abundance analysis?

Give the most important plots from the report:

![Volcano_plot](./figures/volcano.png)

Volcano plot visualizing the differentially expressed genes between the conditions.
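How a gene ends up at a given position in a volcano plot can be sketched as follows: the x-axis shows the log2 fold change and the y-axis the negative log10 of the adjusted p-value. The gene names and values below are made up for illustration, and the cutoffs (`fc_cutoff`, `p_cutoff`) are commonly used defaults assumed for this example, not necessarily the ones used in the pipeline report.

```python
import math


def volcano_point(log2_fold_change: float, adjusted_p: float) -> tuple[float, float]:
    """Return the (x, y) position of one gene in a volcano plot."""
    return log2_fold_change, -math.log10(adjusted_p)


def classify(log2_fold_change: float, adjusted_p: float,
             fc_cutoff: float = 1.0, p_cutoff: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'not significant'."""
    if adjusted_p >= p_cutoff or abs(log2_fold_change) < fc_cutoff:
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


# Made-up example values: (log2 fold change, adjusted p-value).
genes = {
    "geneA": (2.3, 0.001),
    "geneB": (-1.8, 0.01),
    "geneC": (0.2, 0.6),
    "geneD": (3.0, 0.2),
}

for name, (lfc, padj) in genes.items():
    x, y = volcano_point(lfc, padj)
    print(f"{name}: x={x:+.1f}, y={y:.2f}, {classify(lfc, padj)}")
```

Genes with a large absolute fold change and a small adjusted p-value end up in the upper left and upper right corners of the plot.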

![PCA_plot](./figures/pca2d.png)

Principal component analysis (PCA) plot of the samples. The samples form recognizable clusters based on their profiles.