# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all required software yet, please do so as soon as possible!


If you have already installed all software, go ahead and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

It is a global community that collaborates to build open-source Nextflow components and pipelines.

2. How many pipelines are there currently in nf-core?

There are 139 pipelines, including released, under-development, and archived ones.

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, there are a few pipelines for other fields, including astronomy, finance, and other topics.

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is an end-to-end sequence of processing steps used to collect, modify, and deliver data. It is typically reusable across different datasets, which makes it efficient: a new structure does not have to be set up every time data needs to be processed.
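
The idea of a reusable sequence of steps can be illustrated with a minimal Python sketch. This is only an analogy and not how Nextflow works internally; the step names (`drop_missing`, `normalize`, `round_values`) and the `run_pipeline` helper are hypothetical and chosen purely for illustration.

```python
from functools import reduce
from typing import Callable, List, Optional

# A step takes a list of values and returns a new list of values.
Step = Callable[[list], list]


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """Remove missing measurements (represented here as None)."""
    return [v for v in values if v is not None]


def normalize(values: List[float]) -> List[float]:
    """Scale the values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def round_values(values: List[float]) -> List[float]:
    """Round every value to two decimal places."""
    return [round(v, 2) for v in values]


def run_pipeline(data: list, steps: List[Step]) -> list:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


# The same step definition is reused for two different datasets.
STEPS = [drop_missing, normalize, round_values]

print(run_pipeline([1.0, None, 3.0], STEPS))       # [0.25, 0.75]
print(run_pipeline([2.0, 2.0, None, 4.0], STEPS))  # [0.25, 0.25, 0.5]
```

The list of steps is defined once and applied unchanged to both datasets, which is the reuse described in the answer.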

5. Why do you think nf-core adheres to strict guidelines?

Strict guidelines ensure that every pipeline offers reproducibility, portability, consistency, quality control, and easy collaboration.
This way, users can rely on these properties when using a pipeline and do not need to enhance it themselves before publishing.

6. What are the main features of nf-core pipelines?

Reproducibility, portability, consistency, quality control, customizability, CLI support, and thorough documentation.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/home/lorena/loboehme1/notebooks/day_01


In [1]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)


!nextflow run nf-core/differentialabundance -profile test,docker --outdir diffabundance1 -r 1.5.0

^C


In [None]:
# repeat the run. What changed?


The first run took 17 minutes and the second only about 7 minutes, most likely because the required files had already been downloaded and cached. Interestingly, the hashes shown in front of the steps changed, which suggests that every task was executed again rather than reused.

Completed at: 29-Sep-2025 12:34:52

Duration    : 7m 19s

CPU hours   : 0.1

Succeeded   : 21

In [None]:
# now add -resume to the command. What changed?


This time, almost every step was reported as cached; only one step was actually recomputed, so the run finished very quickly. The hashes were exactly the same as in the previous run, so Nextflow reused the results of that run. No timing summary was printed to the console.
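
The behaviour observed in the last two runs can be reproduced with a toy model. The sketch below is not Nextflow's actual implementation; it only assumes, in line with the Nextflow documentation on caching, that the session ID is one ingredient of each task hash and that -resume reuses the session of the previous run. `ToyRunner`, `task_hash`, and the task names are hypothetical.

```python
import hashlib
import uuid


def task_hash(session_id: str, name: str, script: str, inputs: list) -> str:
    """Toy task key built from the session ID, task name, script and inputs."""
    payload = "\n".join([session_id, name, script, *map(str, inputs)])
    return hashlib.sha256(payload.encode()).hexdigest()


class ToyRunner:
    def __init__(self) -> None:
        self.work = {}            # stands in for the work folder
        self.last_session = None  # stands in for the history in .nextflow

    def run(self, tasks, resume: bool = False):
        # A plain run starts a new session; a resumed run reuses the last one.
        if resume and self.last_session is not None:
            session = self.last_session
        else:
            session = str(uuid.uuid4())
        self.last_session = session

        report = []
        for name, script, inputs in tasks:
            key = task_hash(session, name, script, inputs)
            if resume and key in self.work:
                status = "cached"
            else:
                self.work[key] = f"output of {name}"
                status = "executed"
            report.append((key[:8], name, status))
        return report


TASKS = [
    ("FILTER", "filter.R", ["counts.tsv"]),
    ("PLOT", "plot.R", ["filtered.tsv"]),
]

runner = ToyRunner()
print(runner.run(TASKS))               # first run: everything executed
print(runner.run(TASKS))               # repeat: new hashes, everything executed
print(runner.run(TASKS, resume=True))  # resume: same hashes as before, cached
```

In this model, repeating the run without -resume produces new hashes because a new session is started, while the resumed run finds its hashes in the stored results and skips the work.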

Check out the current directory. Next to the outdir you specified, what else has changed?

There is now a .nextflow folder and a work folder.

In [None]:
# delete the work directory and run the pipeline again using -resume. What changed?


This time nothing was reported as cached, and the hashes were different again. The work folder must therefore contain the intermediate results that -resume relies on to save time. During this run, the work folder was created again.
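
To get a feeling for what is lost when the work folder is deleted, it can be inspected beforehand. The helper below is a minimal sketch that assumes the usual two-level layout of the work folder (a short hash prefix directory containing one directory per task); `summarize_work_dir` is a hypothetical name, and symbolic links are skipped so that staged input files are not counted twice.

```python
from pathlib import Path


def summarize_work_dir(work_dir: str) -> dict:
    """Count task directories, regular files and bytes below a work folder."""
    root = Path(work_dir)
    if not root.is_dir():
        # A deleted or never-created work folder holds nothing to resume from.
        return {"task_dirs": 0, "files": 0, "bytes": 0}

    # Assumed layout: work/<hash prefix>/<rest of hash>/
    task_dirs = [p for p in root.glob("*/*") if p.is_dir()]
    # Skip symlinks so that linked input files are not counted twice.
    files = [p for p in root.rglob("*") if p.is_file() and not p.is_symlink()]

    return {
        "task_dirs": len(task_dirs),
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
    }


if __name__ == "__main__":
    # Run from the directory in which the pipeline was launched.
    print(summarize_work_dir("work"))
```

Comparing the summary before deleting the folder and after the next run shows that the folder is rebuilt from scratch.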

Completed at: 29-Sep-2025 14:40:00

Duration    : 4m 2s

CPU hours   : 0.1

Succeeded   : 21

## Let's look at the results

### What is differential abundance analysis?

Give the most important plots from the report:

boxplot.png, sample_dendrogram.png