# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatic questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    default
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /usr/local/lib/docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /usr/local/lib/docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /usr/local/lib/docker/cli-plugins/docker
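The same check can be scripted, for example at the top of a workflow script. The following is a minimal sketch: `docker_status` is a hypothetical helper name, and it relies on `docker info` typically exiting with a non-zero status when the client cannot reach the daemon.

```shell
# Hypothetical helper: report whether the Docker CLI and daemon are usable.
docker_status() {
    # Is a `docker` command available at all?
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found"
        return 1
    fi
    # `docker info` is expected to fail when the daemon is not reachable.
    if ! docker info >/dev/null 2>&1; then
        echo "docker CLI found, daemon not reachable"
        return 2
    fi
    echo "docker is ready"
}

# Print the status; `|| true` keeps a calling script from aborting here.
docker_status || true
```

A script could instead call `docker_status || exit 1` to stop early when Docker is unavailable.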

### What is a container?

It's a lightweight, portable, and isolated environment that packages software and all its dependencies together.

### Why do we use containers?

Containers let applications run reliably across different systems. They share the host system's OS kernel instead of booting a full guest operating system, which keeps them lightweight. Processes, filesystems, and network settings are kept separate from other containers and the host.

### What is a docker image?

It's a read-only blueprint for creating containers: a container is a running (or stopped) instance of an image.

### Let's run our first docker image:

### Log in to Docker

In [2]:
# Do this directly on the command line, since `docker login` prompts for credentials

### Run your first docker container

In [3]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [5]:
!docker ps -a

CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED          STATUS                         PORTS     NAMES
353efb73ce30   hello-world                                                                   "/hello"                 30 seconds ago   Exited (0) 28 seconds ago                nostalgic_kare
032586fbb074   combinelab/salmon:latest                                                      "salmon --version"       15 minutes ago   Exited (0) 15 minutes ago                sad_satoshi
28126d898eaf   salmon:latest                                                                 "salmon --version"       15 minutes ago   Exited (0) 15 minutes ago                clever_fermat
31aef920d8f0   combinelab/salmon:latest                                                      "salmon --help"          20 minutes ago   Exited (0) 20 minutes ago                vigorous_chebyshev
0b0254db7b58   combinelab/salmon:latest               

In [None]:
# get the container ID by image name
!docker ps -aq --filter ancestor=hello-world

353efb73ce30


### Delete the container again and prove that it is deleted

In [18]:
!docker container rm 353efb73ce30

353efb73ce30


In [19]:
!docker ps -a

CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED             STATUS                         PORTS     NAMES
032586fbb074   combinelab/salmon:latest                                                      "salmon --version"       20 minutes ago      Exited (0) 20 minutes ago                sad_satoshi
28126d898eaf   salmon:latest                                                                 "salmon --version"       20 minutes ago      Exited (0) 20 minutes ago                clever_fermat
31aef920d8f0   combinelab/salmon:latest                                                      "salmon --help"          26 minutes ago      Exited (0) 26 minutes ago                vigorous_chebyshev
0b0254db7b58   combinelab/salmon:latest                                                      "/bin/bash"              26 minutes ago      Exited (0) 26 minutes ago                upbeat_mendel
e937f4733cf5   combinelab/salmon:latest 
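Exited containers accumulate in `docker ps -a` unless they are removed. The following is a minimal sketch of two ways to keep the list tidy. The helper names `run_once` and `remove_containers_of` are hypothetical; the flags are standard Docker CLI options.

```shell
# Option 1: let Docker delete the container as soon as it exits (--rm).
run_once() {
    docker run --rm "$@"
}

# Option 2 (hypothetical helper): remove all exited containers
# that were created from a given image.
remove_containers_of() {
    local image="$1" ids
    # -a: include stopped containers, -q: print container IDs only
    ids=$(docker ps -aq --filter "ancestor=${image}" --filter "status=exited")
    if [ -z "$ids" ]; then
        echo "no containers found for ${image}"
        return 0
    fi
    # Word splitting is intended here: one argument per container ID.
    docker container rm $ids
}
```

For example, `run_once hello-world` leaves no container behind, and `remove_containers_of hello-world` cleans up containers from earlier runs.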

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

Install the tool

1. download the FastQC zip file
2. extract it
3. make the launcher executable: `chmod 755 fastqc`
4. add the FastQC directory to PATH

Run FastQC

5. download the fastq file
6. run `fastqc <file.fastq> -o <out_dir>`
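The steps above can be sketched as a small script. Several details are assumptions rather than part of the original answer: the download URL pattern, the `~/tools` install directory, the `FastQC/` directory name inside the archive, and the helper names `install_fastqc` and `run_fastqc`. FastQC also needs a Java runtime on the host.

```shell
# Assumed values; adjust them to your system.
FASTQC_VERSION="0.12.1"
FASTQC_URL="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v${FASTQC_VERSION}.zip"
INSTALL_DIR="${HOME}/tools"

install_fastqc() {
    mkdir -p "$INSTALL_DIR"
    # 1. download the zip file
    wget -O "${INSTALL_DIR}/fastqc.zip" "$FASTQC_URL"
    # 2. extract it (assumed to unpack into a FastQC/ directory)
    unzip -o "${INSTALL_DIR}/fastqc.zip" -d "$INSTALL_DIR"
    # 3. make the launcher script executable
    chmod 755 "${INSTALL_DIR}/FastQC/fastqc"
    # 4. add it to PATH for the current shell session
    export PATH="${INSTALL_DIR}/FastQC:${PATH}"
}

run_fastqc() {
    local fastq="$1" out_dir="$2"
    # FastQC expects the output directory to exist
    mkdir -p "$out_dir"
    # 6. run FastQC on the given file
    fastqc "$fastq" -o "$out_dir"
}
```

Calling `install_fastqc` once and then `run_fastqc <file.fastq> <out_dir>` mirrors the manual procedure. Every step depends on what is already installed on the host, which is the part a container removes.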

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [21]:
# pull the image
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1
Status: Image is up to date for community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29


In [23]:
# run the container and save the results to a new "fastqc_results" directory
!mkdir fastqc_results

!docker run -v "${PWD}/../day_02/fetchngs/fastq:/data" \
    -v "${PWD}/fastqc_results:/output" \
    community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
    fastqc /data/SRX19144486_SRR23195516_1.fastq.gz --outdir /output

mkdir: cannot create directory ‘fastqc_results’: File exists
application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
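Each `-v host_dir:container_dir` option in the command above makes a host directory visible inside the container, so FastQC reads from `/data` and writes to `/output`. The sketch below wraps this in a hypothetical helper, `fastqc_docker`. It uses `mkdir -p` so that reruns do not fail with "File exists", mounts the input read-only, and adds `--rm` so that no exited container is left behind.

```shell
IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# Hypothetical helper: run FastQC from the container on one FASTQ file.
fastqc_docker() {
    local fastq="$1" out_dir="$2"
    local in_dir file_name
    # -p: no error if the directory already exists
    mkdir -p "$out_dir"
    # Bind mounts need absolute host paths.
    in_dir=$(cd "$(dirname "$fastq")" && pwd)
    out_dir=$(cd "$out_dir" && pwd)
    file_name=$(basename "$fastq")
    # The host directories appear inside the container as /data and /output.
    docker run --rm \
        -v "${in_dir}:/data:ro" \
        -v "${out_dir}:/output" \
        "$IMAGE" \
        fastqc "/data/${file_name}" --outdir /output
}
```

With this helper, the call from the cell above would become `fastqc_docker ../day_02/fetchngs/fastq/SRX19144486_SRR23195516_1.fastq.gz fastqc_results`.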

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

### What would you say, which approach is more reproducible?

Creating the Docker container is an easy process using Seqera Containers (seqera.io). All dependencies are included, and the image, once built, can be pulled and run on any system with a compatible container runtime. This makes the results more reproducible.

Installing a tool locally with conda can be faster in the short term, but later changes to its dependencies can break the tool.

### Compare the file to last week's fastqc results: are they identical?

TODO
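One way to approach this comparison is a byte-for-byte check. The sketch below uses two hypothetical helpers, `same_file` and `same_fastqc_data`; no file paths from this course are assumed. HTML reports may embed run-specific details, so the second helper compares the `fastqc_data.txt` table stored inside the FastQC zip archives instead, which may be the more informative check.

```shell
# Hypothetical helper: report whether two files are byte-for-byte identical.
same_file() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Hypothetical helper: compare the fastqc_data.txt inside two FastQC zip files.
same_fastqc_data() {
    local zip_a="$1" zip_b="$2" tmp result
    tmp=$(mktemp -d)
    # unzip -p writes the matching archive member to stdout
    unzip -p "$zip_a" '*/fastqc_data.txt' > "${tmp}/a.txt"
    unzip -p "$zip_b" '*/fastqc_data.txt' > "${tmp}/b.txt"
    result=$(same_file "${tmp}/a.txt" "${tmp}/b.txt")
    rm -rf "$tmp"
    echo "$result"
}
```

Usage would look like `same_fastqc_data <last_week_fastqc.zip> <docker_fastqc.zip>`, with the two paths filled in for your own results.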

### Is the fastqc version identical?

The FastQC versions installed in the conda environment and in the Docker container are the same (v0.12.1).

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay).

Hints:
1. Docker containers run Linux; on a Debian- or Ubuntu-based base image, packages such as "cowsay" are installed with the apt-get command

In [24]:
# open the file "my_dockerfile" in a text editor
!cat my_dockerfile

FROM debian:bullseye-slim

# install cowsay
RUN apt-get update && apt-get install -y cowsay

# Ensure cowsay is on PATH
ENV PATH="/usr/games:${PATH}"

# Default command
CMD ["cowsay", "--help"]

### Explain the RUN and ENV lines you added to the file

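`RUN`: executes a command while the image is being built and stores the result in the image. Here it refreshes the apt package index and installs cowsay.

`ENV`: sets an environment variable that persists in the image and in every container started from it. Here it prepends `/usr/games`, the directory where Debian installs cowsay, to PATH, so the tool can be called by name from anywhere in the container.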
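The effect of the `ENV PATH=...` line can be reproduced in a plain shell without Docker. The sketch below is only an analogy: a throwaway directory stands in for `/usr/games`, and `mytool` is a hypothetical program name standing in for `cowsay`.

```shell
# Create a throwaway directory holding a tiny executable
# (stand-in for /usr/games/cowsay).
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "${tool_dir}/mytool"
chmod 755 "${tool_dir}/mytool"

# Before: the shell cannot find the program by name.
command -v mytool >/dev/null 2>&1 || echo "mytool: not found on PATH"

# Same idea as ENV PATH="/usr/games:${PATH}": prepend the directory.
PATH="${tool_dir}:${PATH}"
export PATH

# After: the program is found without typing its full path.
mytool
```

Without the `ENV` line, the Dockerfile's `CMD` would need the full path `/usr/games/cowsay` instead of the bare name.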

In [1]:
!pwd

/mnt/c/Users/NicolaiOswald/OneDrive - UT Cloud/Dokumente/Studium Tübingen/Computational Workflows/computational-workflows-2025/notebooks/day_03_part2


In [8]:
# build the docker image
!docker build -f my_dockerfile -t my_dockerfile:test .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                          docker:default
[34m => [internal] load build definition from my_dockerfile                    0.1s
[0m[34m => => transferring dockerfile: 238B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.4s (1/2)                                          docker:default
[34m => [internal] load build definition from my_dockerfile                    0.1s
[0m[34m => => transferring dockerfile: 238B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)              

In [9]:
# list all containers; the built image itself can be checked with `docker image ls`
!docker ps -a

CONTAINER ID   IMAGE                                                                         COMMAND                  CREATED              STATUS                          PORTS     NAMES
1f069c85a7a1   b465b107bfc1                                                                  "cowsay Mooh!"           About a minute ago   Exited (0) About a minute ago             hopeful_germain
7be793bd0c35   a112e7c5f39f                                                                  "cowsay --help"          2 minutes ago        Exited (0) 2 minutes ago                  busy_khorana
f2dbb852f663   community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29              "/usr/local/bin/_ent…"   20 minutes ago       Exited (130) 5 minutes ago                blissful_bell
82ef260fe8a9   quay.io/biocontainers/r-shinyngs:1.8.8--r43hdfd78af_0                         "/usr/local/env-exec…"   3 hours ago          Exited (0) 3 hours ago                    nxf-w2rL9HSdD4Ph3KB8orvdbexH
594d7f616a7c   qu

In [10]:
# run a container from the image
!docker run my_dockerfile:test

 ___________
< Moo, moo! >
 -----------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in rnaseq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [11]:
# use the file "salmon_docker" in this directory to build a new docker image
!cat salmon_docker

FROM debian:bullseye-slim

LABEL image.author.name="Nicolai Oswald"
LABEL image.author.email="nicolai.oswald@student.uni-tuebingen.de"

# Install dependencies: wget for downloading the binaries
RUN apt-get update && apt-get install -y \
    wget

# Download and install Salmon
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
    && tar -xzf salmon-1.5.2_linux_x86_64.tar.gz \
    && mv salmon-1.5.2_linux_x86_64 /opt/salmon \
    && rm salmon-1.5.2_linux_x86_64.tar.gz

# Add Salmon binary to PATH
ENV PATH="/opt/salmon/bin:${PATH}"

# Default command to run when container starts
CMD ["salmon", "--help"]

In [12]:
# build the image
!docker build -t salmon:latest -f salmon_docker .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                          docker:default
[34m => [internal] load build definition from salmon_docker                    0.1s
[0m[34m => => transferring dockerfile: 643B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.4s (1/2)                                          docker:default
[34m => [internal] load build definition from salmon_docker                    0.1s
[0m[34m => => transferring dockerfile: 643B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)              

In [13]:
# run the image to print the salmon version
!docker run salmon:latest salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

A salmon Docker image is available on Docker Hub as `combinelab/salmon`.

BioContainers is an open-source registry providing ready-to-use containers for bioinformatics software, enabling easy and reproducible analyses. You can find and use containers for many popular tools directly from BioContainers.

In [14]:
!docker pull combinelab/salmon:latest
!docker run combinelab/salmon:latest salmon --version

latest: Pulling from combinelab/salmon
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Image is up to date for combinelab/salmon:latest
docker.io/combinelab/salmon:latest
salmon 1.10.3
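The `latest` tag reports salmon 1.10.3 here, while the self-built image contains 1.5.2, so `latest` does not identify a fixed version. For reproducible analyses it may help to reference an image by its digest instead. The following sketch reuses the digest printed by `docker pull` above; the helper names `pinned_ref` and `salmon_version_pinned` are hypothetical.

```shell
REPO="combinelab/salmon"
# Digest copied from the `docker pull` output above.
DIGEST="sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370"

# Hypothetical helper: build an immutable image reference of the form repo@digest.
pinned_ref() {
    printf '%s@%s\n' "$1" "$2"
}

# Run salmon from exactly that image content, regardless of where `latest` points.
salmon_version_pinned() {
    docker run --rm "$(pinned_ref "$REPO" "$DIGEST")" salmon --version
}
```

A digest refers to one specific image content, whereas a tag such as `latest` can be moved to a newer build at any time.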


## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Besides writing a Dockerfile yourself, you can also use automated build services like https://seqera.io/containers/ (as we did at the beginning).

Seqera Containers is a platform that provides ready-to-use containers for bioinformatics and scientific workflows. It offers a catalog of containers for popular tools which can be used and integrated into workflow systems.
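For Apptainer, an existing Docker image can usually be converted into a local SIF file through a `docker://` URI. The following is a minimal sketch, assuming Apptainer is installed; the helper name `fastqc_apptainer_version` and the default file name `fastqc.sif` are hypothetical.

```shell
IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"

# Hypothetical helper: convert the Docker image to a SIF file once,
# then run FastQC from it.
fastqc_apptainer_version() {
    local sif="${1:-fastqc.sif}"
    # Pull (and convert) only if the SIF file is not there yet.
    [ -f "$sif" ] || apptainer pull "$sif" "docker://${IMAGE}"
    apptainer exec "$sif" fastqc --version
}
```

The SIF file is a single regular file, so it can be copied to systems such as HPC clusters where running a Docker daemon is often not permitted.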