# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all required software yet, please do so now!

If you have already installed all software, please start answering the questions in this notebook. If you have any questions, don't hesitate to ask us.

1. What is nf-core?

nf-core is a global community collaborating to build open-source Nextflow components and pipelines.
All nf-core code is community-owned.
Everyone is welcome to use, contribute to, and help maintain nf-core.

Source: https://nf-co.re/about, https://nf-co.re/


2. How many pipelines are there currently in nf-core?

At the time of writing, there are 139 pipelines available as part of nf-core.

Source: https://nf-co.re/pipelines/

3. Are there any non-bioinformatic pipelines in nf-core?

Most pipelines are bioinformatics pipelines, but there are exceptions. For example, nf-core/rangeland is not bioinformatic in the classical sense: it is a geographical best-practice analysis pipeline for remotely sensed imagery. It processes satellite imagery alongside auxiliary data in multiple steps to produce a set of trend files related to land-cover changes.

Source: https://nf-co.re/rangeland/1.0.0/

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

Pipeline

A pipeline is a set of ordered computational steps that take data as input, perform specific analyses (e.g. alignment, quality control, ...), and produce results as output. The individual steps are usually reusable, and the pipeline ensures that they run in the correct order, with the right dependencies and resources.
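To make "ordered steps with dependencies" concrete, here is a minimal Python sketch (not how Nextflow works internally): each step is a function, `deps` lists which steps must finish first, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) picks a valid execution order. The step names `qc`, `align`, `quantify` and `report` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

# A step receives the outputs of the steps it depends on and returns its own output.
Step = Callable[[dict[str, str]], str]

def run_pipeline(steps: dict[str, Step], deps: dict[str, set[str]]) -> dict[str, str]:
    """Run every step exactly once, only after all of its dependencies have finished."""
    results: dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps.get(name, set())}
        results[name] = steps[name](inputs)
    return results

# Hypothetical four-step workflow: report needs qc and quantify; quantify needs align.
steps: dict[str, Step] = {
    "qc": lambda _: "qc_report",
    "align": lambda _: "sample.bam",
    "quantify": lambda inp: f"counts({inp['align']})",
    "report": lambda inp: f"report({inp['qc']}, {inp['quantify']})",
}
deps = {"qc": set(), "align": set(), "quantify": {"align"}, "report": {"qc", "quantify"}}

results = run_pipeline(steps, deps)
print(results["report"])  # report(qc_report, counts(sample.bam))
```

Nextflow goes further than this sketch: it runs independent steps (here `qc` and `align`) in parallel and caches their results, but ordering steps by their dependencies is the same basic idea.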

What do we use it for?

Instead of manually running each tool and handling intermediate files, the pipeline runs every step in the right order. In nf-core pipelines, the tool versions are pinned and every step is version-controlled, tested, and containerized, so the same analysis can be reproduced anywhere.
They can be run on a laptop, cluster, or cloud without rewriting code, and they follow strict best-practice guidelines, which makes them reliable and easy to share.

Source: introductory lecture

5. Why do you think nf-core adheres to strict guidelines?

nf-core follows strict guidelines because it wants all pipelines to adhere to the FAIR principles (Findable, Accessible, Interoperable, Reusable). Pipelines should be reproducible, consistent, maintainable (following the same coding style), trustworthy (meeting quality standards), and portable and scalable (as mentioned above).

Source: https://nf-co.re/docs/guidelines/pipelines/overview

6. What are the main features of nf-core pipelines?

nf-core provides fully featured pipelines:

- Documentation: covers installation, usage, and description of output files.
- Stable releases: pipelines are released with tagged, stable versions.
- Open source: licensed under the MIT license.
- CI testing: continuous-integration tests run on changes made to pipelines.
- Portability: pipelines are ultra-portable and run anywhere (laptop, cluster, ...).
- Packaged software: dependencies are downloaded and handled automatically.

Source: https://nf-co.re/

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

nf-core/differentialabundance

In [None]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands 
# and simply paste the commands you used in the respective cells!


/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_01


In [None]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)


!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir './results'
# --outdir specifies the output folder

# Other options one might want to use:
# --max_memory '16.GB'  # specify max memory
# --max_cpus 4          # specify max cpus
# --max_time '2.h'      # specify max time
# --email '<address>'   # send a summary email when the run finishes
# --input 'samplesheet.csv'    # sample sheet describing the samples
# --contrasts 'contrasts.csv'  # comparisons to test (e.g. condition A vs. condition B)
# --matrix 'counts.tsv'        # abundance matrix (e.g. gene counts)
# --gtf 'genes.gtf'            # annotation used to map gene IDs to gene names
#
# singularity (or apptainer) can be used instead of docker, e.g. -profile test,singularity
#
# For more options, check the nf-core/differentialabundance documentation: https://nf-co.re/differentialabundance/usage
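If you prefer to launch runs from Python rather than with `!`, the command can be assembled as an argument list. `nextflow_cmd` below is a hypothetical helper, not part of nf-core, and it only uses options that appear in this notebook. It also highlights a convention worth remembering: Nextflow's own options take a single dash (`-r`, `-profile`, `-resume`), while pipeline parameters take two (`--outdir`).

```python
import shlex
import subprocess
from typing import Optional

def nextflow_cmd(pipeline: str, revision: str, profiles: list[str], outdir: str,
                 resume: bool = False, params: Optional[dict[str, str]] = None) -> list[str]:
    """Build a `nextflow run` argument list (hypothetical helper, not part of nf-core)."""
    cmd = ["nextflow", "run", pipeline,
           "-r", revision,                   # pin the pipeline release
           "-profile", ",".join(profiles),   # e.g. test,docker
           "--outdir", outdir]
    for key, value in (params or {}).items():
        cmd += [f"--{key}", value]           # pipeline parameters: double dash
    if resume:
        cmd.append("-resume")                # Nextflow options: single dash
    return cmd

cmd = nextflow_cmd("nf-core/differentialabundance", "1.5.0", ["test", "docker"], "./results", resume=True)
print(shlex.join(cmd))
# nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir ./results -resume

# Uncomment to actually launch the run (requires Nextflow and Docker):
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell-quoting problems; `shlex.join` is only used to print a copy-pasteable version of the command.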

In [None]:
# repeat the run. What did change?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir './results'

# Runtime decreased from 9min 47s to 3min 28s.
# Docker already pulled the container images (with all the software packaged inside) during the first run and cached them locally, so they do not need to be downloaded again
# However, the pipeline still downloads the test data again and re-runs all steps from scratch

In [None]:
# now add -resume to the command. What did change?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir './results' -resume

# -resume reuses the results of tasks that were already completed in the previous runs (cached in the work directory)
# So the test data is not downloaded again and finished steps are not recomputed
# Only tasks that did not complete, or whose inputs, script or parameters changed, are re-run
# Runtime is 1min 52s, much faster than the previous runs
# So -resume is very useful when re-running the same analysis with only small changes
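The behaviour of `-resume` can be illustrated with a simplified Python model. Nextflow identifies each task by a hash computed from its inputs and its script, and keeps the task in a `work/` subfolder named after that hash; on `-resume` it reuses tasks whose hash matches an already finished one. The sketch below mimics this with SHA-256 over a made-up script name and input list; the hashing scheme and the `task_dir`/`run_task` helpers are assumptions for illustration, not Nextflow's implementation.

```python
import hashlib
import tempfile
from pathlib import Path

def task_dir(work: Path, script: str, inputs: list[str]) -> Path:
    """Toy scheme: hash the script and inputs, then use work/<first 2 hex chars>/<rest>."""
    digest = hashlib.sha256("\n".join([script, *inputs]).encode()).hexdigest()
    return work / digest[:2] / digest[2:]

def run_task(work: Path, script: str, inputs: list[str], resume: bool) -> tuple[str, bool]:
    """Return (output, cached). With resume=True, reuse a finished task that has the same hash."""
    out = task_dir(work, script, inputs) / "output.txt"
    if resume and out.exists():
        return out.read_text(), True
    out.parent.mkdir(parents=True, exist_ok=True)
    result = f"{script}({', '.join(inputs)})"  # stand-in for actually running a tool
    out.write_text(result)
    return result, False

work = Path(tempfile.mkdtemp()) / "work"
first = run_task(work, "count", ["a.bam"], resume=False)    # runs the task
rerun = run_task(work, "count", ["a.bam"], resume=False)    # no -resume: runs again
resumed = run_task(work, "count", ["a.bam"], resume=True)   # same hash: reused
changed = run_task(work, "count", ["b.bam"], resume=True)   # new input, new hash: runs
print(first, rerun, resumed, changed)
```

Changing an input (or the script) produces a new hash, which is why only the affected steps are re-run after small changes to the command.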

Check out the current directory. Next to the outdir you specified, what else has changed?

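- During the runs, a `work` folder was created, containing one subfolder per task
- In the launch directory, a hidden `.nextflow.log` file was written for each run (older logs are kept as `.nextflow.log.1`, `.nextflow.log.2`, ...), together with a hidden `.nextflow/` folder holding the run history and cache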
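Each task gets its own folder under `work/`, named like `work/ab/cdef...`; the short form is the hash Nextflow prints next to each process in the console (e.g. `[ab/cdef01]`). Inside, Nextflow keeps the task's files, including `.command.sh` (the script that ran) and `.exitcode`. The sketch below walks such a folder and reports each task's exit code. `summarize_work_dir` is a hypothetical helper and the demo builds a fake `work/` tree; for real runs, the built-in `nextflow log` command is the supported way to query task information.

```python
import tempfile
from pathlib import Path
from typing import Optional

def summarize_work_dir(work: Path) -> dict[str, Optional[int]]:
    """Map each task folder (work/xx/yyyy...) to its recorded exit code, or None if absent."""
    summary: dict[str, Optional[int]] = {}
    for task in sorted(work.glob("*/*")):
        if not task.is_dir():
            continue
        exit_file = task / ".exitcode"
        code = int(exit_file.read_text().strip()) if exit_file.exists() else None
        summary[f"{task.parent.name}/{task.name}"] = code
    return summary

# Demo on a fake work folder; on a real run use summarize_work_dir(Path("work")).
# Note: any non-task helper folders that Nextflow keeps under work/ would be listed too.
work = Path(tempfile.mkdtemp()) / "work"
(work / "ab" / "cdef01").mkdir(parents=True)
(work / "ab" / "cdef01" / ".exitcode").write_text("0")
(work / "12" / "345678").mkdir(parents=True)  # no .exitcode yet, e.g. an unfinished task
print(summarize_work_dir(work))
```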

In [None]:
# delete the work directory and run the pipeline again using -resume. What did change?
# (Instead of deleting it, the work folder was renamed to work_deleted)

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir './results' -resume

# It again starts downloading and computing everything from scratch.
# The work folder holds the results of all tasks that -resume builds on
# So when the work folder is deleted (or renamed), -resume cannot reuse any previous results

What changed?

- It again starts downloading and computing everything from scratch.
- The work folder holds the results that -resume builds on, so without it no previous results can be reused.
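Why removing `work/` defeats `-resume` can be shown with a small toy model: a result is only reusable if its hash folder still exists under `work/`, so renaming the folder, as done in the exercise above, makes every lookup miss. The `task_output`/`cached_output` helpers and the SHA-256 scheme are illustrative assumptions, not Nextflow internals.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

def task_output(work: Path, task_key: str) -> Path:
    """Toy scheme: the task's result lives in work/<2 hex chars>/<rest of hash>/output.txt."""
    digest = hashlib.sha256(task_key.encode()).hexdigest()
    return work / digest[:2] / digest[2:] / "output.txt"

def cached_output(work: Path, task_key: str) -> Optional[str]:
    """Return the stored result if its folder is still present, else None (task must rerun)."""
    out = task_output(work, task_key)
    return out.read_text() if out.exists() else None

work = Path(tempfile.mkdtemp()) / "work"
out = task_output(work, "quantify sample1")
out.parent.mkdir(parents=True)
out.write_text("counts")

before = cached_output(work, "quantify sample1")  # "counts": a resumed run could reuse it
work.rename(work.with_name("work_deleted"))        # same effect as deleting work/
after = cached_output(work, "quantify sample1")   # None: the task has to run again
print(before, after)  # counts None
```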

## Let's look at the results

### What is differential abundance analysis?

Give the most important plots from the report: