# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now look at the processes behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    default
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /usr/local/lib/docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /usr/local/lib/docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /usr/local/lib/docker/cli-plugins/docker
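
The same check can be sketched from Python inside the notebook. This is a minimal sketch, not part of the exercise: the helper name `command_on_path` is made up for illustration, and it only tests whether a `docker` executable is on the PATH. It does not prove that the Docker daemon is running, which is what `docker info` additionally checks.

```python
import shutil


def command_on_path(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


# Prints True on a machine where the Docker CLI is installed, False otherwise.
print("docker found on PATH:", command_on_path("docker"))
```

If this prints False, the `!docker ...` cells in this notebook will fail with a "command not found" error.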

### What is a container?

A container is a lightweight, standalone package that includes everything needed to run a piece of software: the code, runtime, system tools, libraries, and settings. It can be described as a self-contained mini-computer that runs the same way on any host system, whether it’s a laptop, a server, or the cloud.  

Source: https://www.docker.com/resources/what-container/

### Why do we use containers?

We use containers because they solve a lot of common problems in software development and deployment:

- Consistency Across Environments: “It works on my machine” problems disappear.

- Isolation: This prevents conflicts between software, like two apps needing different versions of the same library.

- Portability: Containers can run anywhere Docker (or another container runtime) is installed.

- Efficiency: Containers are lightweight compared to virtual machines because they share the host OS kernel.

- Scalability: Containers make it easy to run multiple instances of an app to handle more users or traffic.

- Reproducibility: You can create an exact snapshot of an app and its environment.

In short: containers make software reliable, portable, efficient, and easy to manage.

### What is a Docker image?

A Docker image is like the blueprint or template for a container. It’s a read-only file that contains everything needed to run an application:
- The application code
- Libraries and dependencies
- System tools
- Settings and configurations

When you run a Docker image, it becomes a container.  

Source: introduction lecture

### Let's run our first Docker image:

### Log in to Docker

In [2]:
# Do this directly on the command line the first time, because docker login may prompt for input

!docker login
# Console output:
# Authenticating with existing credentials... [Username: cp1998]
# i Info → To login with a different account, run 'docker logout' followed by 'docker login'
# Login Succeeded

Authenticating with existing credentials... [Username: cp1998]
i Info → To login with a different account, run 'docker logout' followed by 'docker login'
Login Succeeded


### Run your first Docker container

In [3]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



In [None]:
# Console output the first time this command was run (the image is pulled before the greeting shown above is printed):

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
17eec7bbc9d7: Pull complete 
Digest: sha256:54e66cc1dd1fcb1c3c58bd8017914dbed8701e2d8c74d9262e26bd9cc1642d31
Status: Downloaded newer image for hello-world:latest


### Find the container ID

In [4]:
!docker ps -a

# Output from a previous run:
# CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                     PORTS     NAMES
# d4c681913bee   hello-world   "/hello"   2 minutes ago   Exited (0) 2 minutes ago             hardcore_jones

CONTAINER ID   IMAGE         COMMAND    CREATED          STATUS                      PORTS     NAMES
543eecd37e4c   hello-world   "/hello"   26 seconds ago   Exited (0) 25 seconds ago             recursing_cerf


### Delete the container again and prove that it is deleted

In [6]:
# Remove the container:

!docker rm 543eecd37e4c

# Remove all containers (if there are many, be careful with this command):
# !docker rm $(docker ps -a -q)
# was not run, as there was only one container

543eecd37e4c


In [7]:
# run docker ps -a again to see if the container was removed:

!docker ps -a

# Output:
# CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                     PORTS     NAMES

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
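
The proof above can also be checked programmatically. The following is a small sketch that works on the text printed by `docker ps -a` (copied from the outputs in this notebook) rather than calling Docker itself. The helper name `container_ids` is hypothetical, and the sketch assumes the default table layout in which the container ID is the first column.

```python
def container_ids(ps_output: str) -> list[str]:
    """Extract the CONTAINER ID column from the text printed by `docker ps -a`."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    # The first non-empty line is the header row; every later row starts with the ID.
    return [line.split()[0] for line in lines[1:]]


# Output copied from the cells above: before and after `docker rm 543eecd37e4c`.
before = """\
CONTAINER ID   IMAGE         COMMAND    CREATED          STATUS                      PORTS     NAMES
543eecd37e4c   hello-world   "/hello"   26 seconds ago   Exited (0) 25 seconds ago             recursing_cerf
"""
after = """\
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
"""

print("543eecd37e4c" in container_ids(before))  # True
print("543eecd37e4c" in container_ids(after))   # False
```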


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Go to the website and click the "Download Now" button.
2. On the download page, download the FastQC v0.12.1 (Win/Linux zip file).
3. Extract the contents of the zip file.
4. Find the Install.txt file, which explains the installation.
5. Ensure that a suitable Java Runtime Environment (JRE) is installed.
6. If a JRE is installed, unzipping is all the installation that is needed.
7. Double-click the run_fastqc.bat file.
8. Click File -> Open and choose your FASTQ file.



### Very well, now let's try to make use of its Docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/).
2. Use the container to generate a FastQC HTML report for the example FASTQ file.

In [None]:
# pull the container

!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

# Console output:
# 0.12.1--af7a5314d5015c29: Pulling from library/fastqc
# 4f4fb700ef54: Pull complete 
# 0ea1a16bbe82: Pull complete 
# 92dc97a3ef36: Pull complete 
# bb36d6c3110d: Pull complete 
# dafa2b0c44d2: Pull complete 
# 030a47592a0a: Pull complete 
# f3c4c6865366: Pull complete 
# 10b8c00c10a5: Pull complete 
# 17dc7ea432cc: Pull complete 
# dec6b097362e: Pull complete 
# 0f93acc3b8ff: Pull complete 
# f88da01cff0b: Pull complete 
# 403f74b0f85e: Pull complete 
# Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1
# Status: Downloaded newer image for community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
# community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

!docker images

# Console output:
# REPOSITORY                                                 TAG                        IMAGE ID       CREATED         SIZE
# hello-world                                                latest                     54e66cc1dd1f   7 weeks ago     20.3kB
# community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago   1.37GB
# quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago   1.99GB
# quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago   1.82GB
# quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago     110MB
# quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago     493MB
# quay.io/biocontainers/gsea                                 4.3.2--hdfd78af_0          0010041fff53   2 years ago     849MB
# quay.io/biocontainers/r-base                               4.2.1                      6721ee8bfba2   2 years ago     1.17GB
# quay.io/biocontainers/p7zip                                16.02                      731aaeef376a   3 years ago     47.7MB
# quay.io/biocontainers/python                               3.9--1                     d97d2b329b4e   4 years ago     275MB
# quay.io/biocontainers/gawk                                 5.1.0                      9d300f3d0a35   5 years ago     53.4MB
# quay.io/biocontainers/bioconductor-deseq2                  1.34.0--r41hc247a5b_3      c06884d353ef   55 years ago    452MB
# quay.io/biocontainers/wget                                 1.20.1                     7e5ba9e87a25   55 years ago    26.9MB

In [None]:
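# Run the container and write the results to the "day_03_part2" directory.
# -v mounts the host notebooks directory at /data inside the container, so every path passed to fastqc starts with /data.
# FastQC does not create the output directory, so it must already exist.

!docker run --rm -v /home/chrissi/BioPrak/computational-workflows-2025/notebooks:/data community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc /data/day_02/SRR_data_fetch/fastq/SRX19144486_SRR23195516_1.fastq.gz \
    -o /data/day_03_part2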

# Console output:
# application/gzip
# Started analysis of SRX19144486_SRR23195516_1.fastq.gz
# Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 80% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 85% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 90% complete for SRX19144486_SRR23195516_1.fastq.gz
# Approx 95% complete for SRX19144486_SRR23195516_1.fastq.gz
# Analysis complete for SRX19144486_SRR23195516_1.fastq.gz
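
The path handling in the command above is the part that is easiest to get wrong: the host directory is mounted at `/data`, and FastQC only sees paths below that mount point. The sketch below assembles the same command from its parts to make the mapping explicit. The function name `fastqc_docker_command` and its parameters are illustrative assumptions, and the sketch only builds and prints the command, it does not run Docker.

```python
import shlex
from pathlib import PurePosixPath


def fastqc_docker_command(host_dir, fastq_rel, out_rel, image, mount_point="/data"):
    """Build a `docker run` argument list that runs FastQC on a file below host_dir."""
    mount = PurePosixPath(mount_point)
    return [
        "docker", "run", "--rm",
        # host_dir on the host becomes mount_point inside the container
        "-v", f"{host_dir}:{mount_point}",
        image,
        # paths given to fastqc are container paths, relative to the mount point
        "fastqc", str(mount / fastq_rel),
        "-o", str(mount / out_rel),
    ]


cmd = fastqc_docker_command(
    host_dir="/home/chrissi/BioPrak/computational-workflows-2025/notebooks",
    fastq_rel="day_02/SRR_data_fetch/fastq/SRX19144486_SRR23195516_1.fastq.gz",
    out_rel="day_03_part2",
    image="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where Docker is available.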

### Now that you know how to use a Docker container, which approach was easier and which approach will be easier in the future?

Using the Docker container was easier and will stay easier in the future than installing and running FastQC manually, because the image already brings everything needed for the analysis, so nothing has to be installed by hand.

### What would you say, which approach is more reproducible?

Using the Docker container is much more reproducible, because it does not depend on a manual installation of the tool: the image tag pins the tool version together with its dependencies.

### Compare the file to last week's FastQC results, are they identical?
### Is the FastQC version identical?

Since we could not download the FASTQ files ourselves yesterday and also could not run nf-core/rnaseq, we have no earlier FastQC results to compare with. In general, however, the reports should be identical if the same version of FastQC was used in the nf-core pipeline and here. This is the big advantage of Docker: it makes analyses more reproducible.
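
Once two reports are available, the version question can be answered from the reports themselves. The sketch below assumes that the `fastqc_data.txt` file inside the FastQC zip output starts with a line of the form `##FastQC<TAB><version>`. The helper name `fastqc_version` and the sample text are made up for illustration and are not taken from a real report of this course.

```python
def fastqc_version(fastqc_data_text: str) -> str:
    """Return the FastQC version recorded on the first line of fastqc_data.txt."""
    lines = fastqc_data_text.splitlines()
    first_line = lines[0] if lines else ""
    marker, _, version = first_line.partition("\t")
    if marker != "##FastQC" or not version.strip():
        raise ValueError("text does not look like a fastqc_data.txt file")
    return version.strip()


# Illustrative sample, shortened to the first two lines of such a file.
sample = "##FastQC\t0.12.1\n>>Basic Statistics\tpass\n"
print(fastqc_version(sample))  # 0.12.1
```

Comparing `fastqc_version(...)` for last week's report and today's report would show whether both runs used the same FastQC release.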

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay).

Hints:
1. Docker containers are Linux-based, so with a Debian or Ubuntu base image you need the apt-get command to install "cowsay"

In [5]:
!pwd

!mkdir cowsay-docker
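!cd cowsay-docker
# Note: "!cd" runs in a subshell and does not change the notebook's working directory; "%cd cowsay-docker" would.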

/home/chrissi/BioPrak/computational-workflows-2025/notebooks/day_03_part2


In [None]:
# open the file "my_dockerfile" in a text editor

# Adjust the file with the following:

# this is the base image the container is built on. 
# In this case, it is a slim version of the Debian operating system. 
# FROM debian:bullseye-slim 
# Changed to:
FROM ubuntu:22.04

# these are the labels that are added to the image. 
# They are metadata that can be used to identify the author of the image. 

LABEL image.author.name="Christina Parpoulas"
LABEL image.author.email="christina.parpoulas@student.uni-tuebingen.de"

# !TODO: add the command that is run to install the dependencies for the image. 
# In this case, it should be updating the package list and installing curl and cowsay. 
# Install dependencies: update package list, install curl and cowsay
RUN apt-get update && \
    apt-get install -y curl cowsay && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# !TODO: add an ENV line to set environmental variables. 
# In this case, it should set the PATH variable to /usr/games. 
# Explain in the notebook why this is necessary.
# Set environmental variables
ENV PATH="/usr/games:${PATH}"

# Set default command
CMD ["cowsay", "Hello from Docker!"]

# Then save and close the file.

### Explain the RUN and ENV lines you added to the file

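## RUN
- apt-get update: refreshes the list of available packages and their versions.
- apt-get install -y curl cowsay: installs two packages, with -y automatically confirming the installation prompts:
  1. curl: a tool to transfer data from or to a server (useful for downloads or testing web requests).
  2. cowsay: the fun ASCII cow program we want to use.
- apt-get clean: removes cached package files to reduce image size.
- rm -rf /var/lib/apt/lists/*: removes the downloaded package lists to further shrink the image.

All four commands are chained with && in a single RUN instruction, so they end up in one image layer and the deleted files do not take up space in the image.

## ENV
Sets an environment variable inside the container.
Here, we are prepending /usr/games to the PATH. This is necessary because Debian and Ubuntu install the cowsay executable into /usr/games, which is not on the default PATH of the base image. Without this line, the shell and the CMD instruction would not find "cowsay".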
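
To make the role of the PATH concrete, here is a small sketch of how command lookup works. It does not touch Docker: it creates a throwaway executable called `cowsay-demo` (a hypothetical stand-in for `/usr/games/cowsay`) in a temporary directory and looks it up with and without that directory on the search path. It assumes a POSIX system with no `cowsay-demo` in `/usr/bin` or `/bin`.

```python
import os
import shutil
import stat
import tempfile


def path_lookup_demo():
    """Show that a command is only found when its directory is listed in PATH."""
    with tempfile.TemporaryDirectory() as games_dir:
        # Hypothetical stand-in for /usr/games/cowsay: a tiny executable script.
        tool = os.path.join(games_dir, "cowsay-demo")
        with open(tool, "w") as handle:
            handle.write("#!/bin/sh\necho moo\n")
        os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

        base_path = os.pathsep.join(["/usr/bin", "/bin"])
        # Same idea as ENV PATH="/usr/games:${PATH}": prepend the extra directory.
        extended_path = os.pathsep.join([games_dir, base_path])

        found_without = shutil.which("cowsay-demo", path=base_path)
        found_with = shutil.which("cowsay-demo", path=extended_path)
        return found_without, found_with, tool


found_without, found_with, tool = path_lookup_demo()
print("found without the extra directory:", found_without is not None)
print("found with the extra directory:   ", found_with == tool)
```

The lookup fails with the base search path and succeeds once the directory is prepended, which is the effect the ENV line has for /usr/games inside the image.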

In [14]:
# build the docker image
!docker build -t my-cowsay -f my_dockerfile .

# -t my-cowsay: this names the image "my-cowsay"
# -f my_dockerfile: this specifies the name of the Dockerfile to use.

[+] Building 0.3s (1/2)                                          docker:default
 => [internal] load build definition from my_dockerfile                    0.0s
 => => transferring dockerfile: 1.08kB                                     0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04            0.3s

In [15]:
# make sure that the image has been built

!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED              SIZE
my-cowsay                                                  latest                     ba22a2164322   About a minute ago   192MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago          20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago        1.37GB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago        1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago        1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago          110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   

In [16]:
# run a container from the image we just built

!docker run --rm my-cowsay

 ____________________
< Hello from Docker! >
 --------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Docker and create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new Dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

### Use the file "salmon_docker" in this directory to build a new Docker image:

FROM debian:bullseye-slim  

LABEL image.author.name="Christina Parpoulas"  
LABEL image.author.email="christina.parpoulas@student.uni-tuebingen.de"  

### Install dependencies: curl and tar
RUN apt-get update && \  
    apt-get install -y curl tar && \  
    apt-get clean && \  
    rm -rf /var/lib/apt/lists/*  

### Download and install Salmon from the link given in the notebook  
RUN curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \  
    -o /tmp/salmon.tar.gz && \  
    tar -xzf /tmp/salmon.tar.gz -C /opt && \  
    rm /tmp/salmon.tar.gz && \  
    ln -s /opt/salmon-1.5.2_linux_x86_64/bin/salmon /usr/local/bin/salmon  

### Set the PATH environment variable (to /usr/bin)  
ENV PATH="/usr/bin:${PATH}"  



In [17]:
# build the image
!docker build -t salmon_docker -f salmon_docker .
!docker images

[+] Building 0.3s (2/3)                                          docker:default
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 779B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
 => [auth] library/debian:pull token for registry-1.docker.io

In [18]:
# run the Docker image to print the version of salmon
!docker run --rm salmon_docker salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

No, many images already exist and can be downloaded and reused.  


Find the salmon docker image online and run it on your computer. (not done)

What is https://biocontainers.pro/ ?  

BioContainers is an open-source project that builds and distributes Docker, Singularity (Apptainer), and Conda packages for thousands of bioinformatics tools like FastQC, Salmon, STAR, BWA, etc.
It’s part of the ELIXIR infrastructure, aiming to make bioinformatics software portable, reproducible, and easy to deploy.

## Are there other ways to create Docker (or Apptainer) images?

If your tools are available on Bioconda, you can create a minimal Dockerfile that installs everything through Conda.  
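
As a rough illustration of such a Conda-based Dockerfile, the sketch below renders one as text. Everything in it is an assumption rather than part of the course material: the helper name `conda_dockerfile`, the choice of the `mambaorg/micromamba` base image (one possible Conda-capable base image), and the package spec `salmon=1.5.2`, which is only used to mirror the Salmon example above.

```python
def conda_dockerfile(packages, base_image="mambaorg/micromamba:latest"):
    """Render a minimal Dockerfile that installs Bioconda packages into the base environment."""
    specs = " ".join(packages)
    lines = [
        f"FROM {base_image}",
        # Install from conda-forge and bioconda, then remove caches to keep the image small.
        f"RUN micromamba install -y -n base -c conda-forge -c bioconda {specs} && \\",
        "    micromamba clean --all --yes",
        "",
    ]
    return "\n".join(lines)


dockerfile = conda_dockerfile(["salmon=1.5.2"])
print(dockerfile)
```

For a reproducible image it would be better to replace the `latest` tag with a fixed version of the base image, in the same way the tool version is pinned.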


What is https://seqera.io/containers/ ?  

Seqera Containers is a catalog of prebuilt, versioned Docker and Apptainer images for tools used in bioinformatics pipelines, especially nf-core and Nextflow workflows.