# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: In the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not yet installed all required software, please do so as soon as possible!


If you have already installed all software, please go ahead and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a community-driven effort that provides standardized guidelines and curated, pre-built workflows to improve productivity and ensure reproducibility in biomedical data analysis.

2. How many pipelines are there currently in nf-core?

139 pipelines (84 Released, 43 Under development, 12 Archived)

3. Are there any non-bioinformatic pipelines in nf-core?

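Yes, a few. Although the vast majority of nf-core pipelines target bioinformatics, some address other domains, for example nf-core/rangeland (satellite remote sensing) and nf-core/meerpipe (pulsar astronomy).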

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

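A pipeline is a series of data-processing steps in which the output of one step serves as the input of the next. Defining these steps as reusable code makes an analysis reproducible and easy to rerun. Nextflow pipelines are written in a domain-specific language (DSL) based on Groovy, which runs on the Java Virtual Machine.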

A process generally represents a single execution of a tool, which takes n inputs and produces m outputs. A workflow organizes multiple processes into a higher-level structure, while a pipeline can combine several workflows into a complete analysis.

Workflows can be reused across different pipelines (e.g., a workflow for quality control of FASTA files). Pipelines themselves can be shared, further enhancing reusability and supporting the execution of complete analysis workflows.
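The process/workflow/pipeline hierarchy described above can be sketched in plain Python. This is only an analogy, not Nextflow code: the helper `make_workflow` and the processes `count_reads`, `mean_length`, and `summarize` are hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List

# In this toy model, a "process" is any function that maps a dict of
# named inputs to a dict of named outputs (n inputs -> m outputs).
Process = Callable[[Dict[str, object]], Dict[str, object]]


def make_workflow(steps: List[Process]) -> Process:
    """Chain steps: the outputs of each step are visible to all later steps."""
    def workflow(data: Dict[str, object]) -> Dict[str, object]:
        result = dict(data)
        for step in steps:
            result.update(step(result))
        return result
    return workflow


# Hypothetical processes, each standing in for one tool invocation.
def count_reads(data):
    return {"n_reads": len(data["reads"])}


def mean_length(data):
    reads = data["reads"]
    return {"mean_length": sum(len(r) for r in reads) / len(reads)}


def summarize(data):
    return {"summary": f"{data['n_reads']} reads, mean length {data['mean_length']:.1f}"}


# Workflows group processes; a pipeline groups workflows.
qc_workflow = make_workflow([count_reads, mean_length])
report_workflow = make_workflow([summarize])
pipeline = make_workflow([qc_workflow, report_workflow])

result = pipeline({"reads": ["ACGT", "ACGTAC", "AC"]})
print(result["summary"])
```

Because a workflow has the same shape as a process in this sketch, `qc_workflow` could be reused unchanged in a different pipeline, which mirrors the reuse argument made above.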

5. Why do you think nf-core adheres to strict guidelines?

nf-core pipelines are developed by an open-source community. Strict guidelines ensure consistent development and usage practices, enhancing both quality and reusability. Standardization further guarantees a uniform level of quality across the framework, improving reliability and reproducibility.

6. What are the main features of nf-core pipelines?

The main purpose of nf-core pipelines is to enhance reproducibility. They can be run on different machines and allow workload distribution across various computing environments. All pipelines are open source, so users can not only configure them by changing parameters but also modify the code and actively contribute to the community. Additionally, many pipeline parameters are standardized, and the pipelines themselves follow strict quality guidelines, with updates and improvements reviewed through community pull requests.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

nf-core/differentialabundance

In [None]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.
!pwd

# fails with: gtf2featureAnnotation.R: command not found -> let's use conda
!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test --outdir task_01

# fails with: Conda environment file does not exist: modules/nf-core/gunzip/environment.yml -> let's use docker (next task)
!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,conda --outdir task_01

# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


In [None]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir task02

In [None]:
# repeat the run. What changed?
# A new log file is generated. The run is much faster because the containers are already downloaded and cached.
# The output files are overwritten, but their content is the same.

In [None]:
# now add -resume to the command. What changed?
!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir task02 -resume
# Most of the steps are skipped because their results are taken from the cache.
# Only the final steps are executed again, and the log file is updated.

Check out the current directory. Next to the outdir you specified, what else has changed?

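Running the pipeline generates a .nextflow directory (containing the run history and the cache metadata that -resume relies on), a work directory (containing the staged inputs, intermediate results, and scripts and logs of every task), and a .nextflow.log file for each run (logs of earlier runs are rotated to .nextflow.log.1, .nextflow.log.2, and so on).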
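To check which of these entries exist in a launch directory, a small helper like the following can be used. It is a minimal sketch: the function name `nextflow_artifacts` is hypothetical, and the demo runs on a throwaway directory that merely mimics a launch directory after two runs.

```python
from pathlib import Path
import tempfile


def nextflow_artifacts(launch_dir):
    """Return the Nextflow bookkeeping entries present in launch_dir."""
    launch_dir = Path(launch_dir)
    candidates = {
        "state_dir": launch_dir / ".nextflow",  # run history and cache metadata
        "work_dir": launch_dir / "work",        # per-task intermediate results
    }
    present = {name: path for name, path in candidates.items() if path.is_dir()}
    # .nextflow.log belongs to the latest run; older logs carry a numeric suffix.
    present["logs"] = sorted(
        p.name for p in launch_dir.glob(".nextflow.log*") if p.is_file()
    )
    return present


# Demo on a temporary directory (assumption: two runs have happened).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".nextflow").mkdir()
    (root / "work").mkdir()
    (root / ".nextflow.log").touch()
    (root / ".nextflow.log.1").touch()
    demo = nextflow_artifacts(root)
    demo_summary = (sorted(demo), demo["logs"])

print(demo_summary)
```

In the notebook, calling `nextflow_artifacts(".")` after a run should report the entries for the current directory.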

# delete the work directory and run the pipeline again using -resume. What changed?


All processes were executed again. Nothing could be taken from the cache, because the cached task results are stored in the work directory, which had been deleted.
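The behaviour observed in the last three runs can be illustrated with a toy model of result caching. This is a deliberately simplified sketch, not Nextflow's actual implementation: the functions `task_hash` and `run_tasks`, the task names, and the use of a dict in place of the work directory are all assumptions made for illustration.

```python
import hashlib


def task_hash(name, script, inputs):
    """Hash everything that defines a task; any change yields a new key."""
    payload = repr((name, script, sorted(inputs.items())))
    return hashlib.sha256(payload.encode()).hexdigest()[:8]


def run_tasks(tasks, work, resume):
    """Run tasks, reusing results stored in `work` only when resume is True.

    `work` is a dict standing in for the work directory (hash -> result).
    Returns the names of the tasks that were actually executed.
    """
    executed = []
    for name, script, inputs in tasks:
        key = task_hash(name, script, inputs)
        if resume and key in work:
            continue  # cached result found: skip execution
        work[key] = f"result of {name}"
        executed.append(name)
    return executed


# Hypothetical tasks; names and scripts are placeholders.
tasks = [
    ("QC", "run_qc reads.fq", {"reads": "reads.fq"}),
    ("ALIGN", "run_align reads.fq", {"reads": "reads.fq"}),
]

work = {}
first = run_tasks(tasks, work, resume=False)       # everything runs
resumed = run_tasks(tasks, work, resume=True)      # everything is cached
work.clear()                                       # "delete the work directory"
after_delete = run_tasks(tasks, work, resume=True) # everything runs again

print(first, resumed, after_delete)
```

Unlike this toy model, the real pipeline still re-executed a few final steps on resume, so the sketch only captures the general idea that cached results are found by a task hash and disappear together with the work directory.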

## Let's look at the results

### What is differential abundance analysis?

Differential abundance analysis compares the relative abundance of specific features across different groups to identify those showing significant changes (e.g., a gene whose expression differs markedly between healthy and diseased patients).
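As a rough illustration of the idea, the following sketch computes a log2 fold change between two groups for made-up counts. The gene names, count values, and the pseudocount of 1 are assumptions for illustration only; real tools additionally normalize the data and test whether a change is statistically significant.

```python
import math
from statistics import mean


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Made-up counts for two hypothetical genes, three samples per group.
counts = {
    "geneA": {"control": [98, 99, 100], "treated": [398, 399, 400]},
    "geneB": {"control": [49, 50, 51], "treated": [49, 50, 51]},
}

fold_changes = {
    gene: log2_fold_change(groups["control"], groups["treated"])
    for gene, groups in counts.items()
}

for gene, lfc in fold_changes.items():
    print(f"{gene}: log2FC = {lfc:.2f}")
```

A log2 fold change of 2 means the feature is about four times more abundant in the treated group, while a value of 0 means no change. The x-axis of a volcano plot shows this quantity.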

Give the most important plots from the report:

![volcano-plot](./plots/differential/treatment_mCherry_hND6_/png/volcano.png)

![pca-plot](./plots/exploratory/treatment/png/pca2d.png)