# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatic questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/sophiag/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /Users/sophiag/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /Users/sophiag/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /Users/sophiag/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/sophiag/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/sophiag/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /Users/sophiag/.docker/cli-p
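
`docker info` prints a long report. For a quick yes/no answer, a shorter check along the following lines may be enough. This is a minimal sketch that assumes a POSIX shell; the variable name `docker_status` is only illustrative. In a notebook it can be placed in a `%%bash` cell.

```shell
# Report whether the Docker CLI is on PATH and whether the daemon answers.
if ! command -v docker >/dev/null 2>&1; then
  docker_status="cli-missing"          # docker is not installed or not on PATH
elif ! docker info >/dev/null 2>&1; then
  docker_status="daemon-unreachable"   # CLI found, but the daemon is not running
else
  docker_status="ok"
  docker --version                            # client version
  docker info --format '{{.ServerVersion}}'   # daemon (server) version
fi
echo "docker status: $docker_status"
```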

### What is a container?

A container is a lightweight, portable, and self-sufficient package that contains everything needed to run a piece of software consistently across different environments. It bundles the application code with its dependencies, libraries, configuration files, and runtime environment.

### Why do we use containers?

We use containers because they make software behave consistently across machines, use fewer resources and start faster than virtual machines, scale easily, and are portable between systems.

### What is a Docker image?

A Docker image is a blueprint for a container: a read-only snapshot that defines everything the application needs to run.

### Let's run our first Docker image:

### Log in to Docker

In [None]:
# You need to do this directly on the command line (docker login)

### Run your first Docker container

In [2]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [7]:
!docker ps -l

CONTAINER ID   IMAGE         COMMAND    CREATED        STATUS                    PORTS     NAMES
c7cb70a3b596   hello-world   "/hello"   47 hours ago   Exited (0) 47 hours ago             mystifying_bouman


### Delete the container again and prove that it is deleted

In [8]:
!docker rm $(docker ps -lq)

c7cb70a3b596


In [9]:
!docker ps -a

CONTAINER ID   IMAGE                                                             COMMAND                  CREATED      STATUS                    PORTS     NAMES
b251b22b31bc   hello-world                                                       "/hello"                 2 days ago   Exited (0) 2 days ago               friendly_poincare
460a738477d8   quay.io/biocontainers/bioconductor-deseq2:1.34.0--r41hc247a5b_3   "/usr/local/env-exec…"   2 days ago   Exited (255) 2 days ago             nxf-trN73ETGqThwA3JB0Y7NgP4T
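
The `docker ps -a` output above still lists a `hello-world` container, but its ID (`b251b22b31bc`) differs from the removed one (`c7cb70a3b596`), so it stems from a different run. To check for the removed container only, the listing can be filtered by ID. The following is a sketch, assuming the ID printed by `docker rm` above; the variable names are illustrative.

```shell
removed_id="c7cb70a3b596"   # ID printed by `docker rm` above

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  # -q prints only IDs; empty output means no container (running or stopped) matches.
  remaining="$(docker ps -a -q --filter "id=${removed_id}")"
  if [ -z "$remaining" ]; then
    echo "container ${removed_id} is gone"
  else
    echo "container ${removed_id} still exists"
  fi
  # Other hello-world containers left over from earlier runs:
  docker ps -a --filter "ancestor=hello-world" --format '{{.ID}}  {{.Names}}  {{.Status}}'
else
  echo "docker is not available; skipping the check"
fi
```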


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. Download FastQC using curl: `curl -O https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.1.zip`
2. Unzip the downloaded file: `unzip fastqc_v0.12.1.zip`
3. Navigate to the FastQC directory: `cd FastQC/`
4. Make the FastQC script executable: `chmod +x fastqc`
5. Return to the parent directory with `cd ..` and run FastQC on your FASTQ file: `./FastQC/fastqc ./notebooks/day_02/SRFetch_results/fastq/SRX19144486_SRR23195516_1.fastq.gz`
6. View the generated HTML report in a web browser
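
The steps above can be collected into a small script. This is a sketch rather than an official installer: the function names `install_fastqc` and `run_fastqc` are hypothetical, and nothing is downloaded until the functions are called.

```shell
# Manual FastQC installation, wrapped in functions so that nothing runs
# until they are called explicitly.
FASTQC_VERSION="0.12.1"
FASTQC_ZIP="fastqc_v${FASTQC_VERSION}.zip"
FASTQC_URL="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/${FASTQC_ZIP}"

install_fastqc() {
  curl -O "$FASTQC_URL" &&    # download the release archive
  unzip -o "$FASTQC_ZIP" &&   # unpacks into ./FastQC/
  chmod +x FastQC/fastqc      # the wrapper script must be executable
}

run_fastqc() {
  # $1: FASTQ file; by default FastQC writes its report next to the input file
  ./FastQC/fastqc "$1"
}

# Usage (FastQC also needs a Java runtime on the host):
#   install_fastqc
#   run_fastqc ./notebooks/day_02/SRFetch_results/fastq/SRX19144486_SRR23195516_1.fastq.gz
```

Every machine that runs this script still needs `curl`, `unzip`, and Java installed, which is exactly the kind of hidden dependency a container image bundles.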

### Very well, now let's try to make use of its Docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report of the example FASTQ file

In [10]:
# pull the container
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc

c6865366: Pulling fs layer
b700ef54: Pulling fs layer
a01cff0b: Pulling fs layer
2b0c44d2: Pulling fs layer
97a3ef36: Pulling fs layer
a16bbe82: Pulling fs layer
7ea432cc: Pulling fs layer
d6c3110d: Pulling fs layer
c00c10a5: Pulling fs layer
acc3b8ff: Pulling fs layer
47592a0a: Pulling fs layer
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1

In [12]:
# create the output directory "fastqc_results" first
!mkdir -p fastqc_results

In [16]:
!pwd

/Users/sophiag/Documents/Dokumente/Studium/Master/FS4/ComputationalWorkflows/computational-workflows-2025/notebooks/day_03_part2


In [None]:
# run the container and save the results to the new "fastqc_results" directory
!docker run --rm -v /Users/sophiag/Documents/Dokumente/Studium/Master/FS4/ComputationalWorkflows/computational-workflows-2025/notebooks:/data \
  community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
  fastqc /data/day_02/SRFetch_results/fastq/SRX19144486_SRR23195516_1.fastq.gz \
  -o /data/day_03_part2/fastqc_results

application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
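
The `-v host_dir:container_dir` option of `docker run` makes a host directory visible inside the container, so every path passed to `fastqc` has to be written the way the container sees it (here, below `/data`). The sketch below illustrates that translation. The helper `to_container_path` and the example directories are hypothetical and only serve to show the mapping.

```shell
# Translate a host path below the mounted directory into the path
# that the container sees below the mount point.
#   $1: host directory given to -v   $2: mount point in the container
#   $3: host path to translate
to_container_path() {
  host_root=$1
  container_root=$2
  host_path=$3
  case "$host_path" in
    "$host_root"/*)
      # Strip the host prefix and prepend the mount point
      printf '%s/%s\n' "$container_root" "${host_path#"$host_root"/}"
      ;;
    *)
      echo "path is outside the mounted directory: $host_path" >&2
      return 1
      ;;
  esac
}

# Example with made-up directories:
to_container_path /home/user/notebooks /data /home/user/notebooks/day_02/reads.fastq.gz
# prints /data/day_02/reads.fastq.gz
```

Files outside the mounted directory are invisible to the container, which is why the input FASTQ file and the output directory both have to live below the mounted `notebooks` directory.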

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

Although pulling the container image took longer at first, the Docker approach will be faster and more robust in the future, because the image is cached locally and needs no manual installation. Running FastQC both via the manual command-line install and via Docker needed some trial and error.

### What would you say, which approach is more reproducible?

I think using Docker is more reproducible. The image pins the tool version and its dependencies, so it needs little extra documentation and still runs the same way at a later time point.

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?

Yes, the files look similar (manual inspection)
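
The answer above rests on manual inspection. The version question could also be checked directly, for example by asking both installations for their version string. This is a sketch under assumptions: it expects the manual install at `./FastQC/fastqc` relative to the current directory, and it skips any side that is not available.

```shell
IMAGE="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"
container_version=""
local_version=""

# Version reported by FastQC inside the container (needs a running Docker daemon)
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  container_version="$(docker run --rm "$IMAGE" fastqc --version)"
fi

# Version reported by the manually installed copy, if it exists at this path
if [ -x ./FastQC/fastqc ]; then
  local_version="$(./FastQC/fastqc --version)"
fi

echo "container: ${container_version:-not checked}"
echo "local:     ${local_version:-not checked}"

if [ -n "$container_version" ] && [ "$container_version" = "$local_version" ]; then
  echo "versions match"
else
  echo "versions differ or could not be compared"
fi
```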

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux, so on a Debian- or Ubuntu-based image you need the apt-get command to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

RUN line: This command updates the package list (apt-get update), installs cowsay and curl (apt-get install -y cowsay curl), and then cleans up package cache to reduce image size (apt-get clean && rm -rf /var/lib/apt/lists/*).

ENV line: The PATH variable is set to include /usr/games because cowsay is installed in /usr/games/cowsay on Debian systems. By adding /usr/games to the PATH, we can run cowsay directly without specifying the full path /usr/games/cowsay.
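
A Dockerfile matching this description could look roughly like the sketch below. It is a reconstruction from the explanation and from the base image named in the build log (`debian:bullseye-slim`), not necessarily the exact content of `my_dockerfile`. It is written to a hypothetical file name, `my_dockerfile.sketch`, so that the real file is not overwritten.

```shell
# Write the sketch to a separate file; the quoted 'EOF' keeps ${PATH} literal.
cat > my_dockerfile.sketch <<'EOF'
# Base image, as reported in the build log
FROM debian:bullseye-slim

# Install cowsay and curl, then drop the apt cache to keep the image small
RUN apt-get update && \
    apt-get install -y cowsay curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# cowsay is installed to /usr/games, which is not on PATH by default
ENV PATH="/usr/games:${PATH}"
EOF

cat my_dockerfile.sketch
```

Chaining the `apt-get` commands in a single `RUN` keeps the cleanup in the same image layer as the install, which is what actually reduces the image size.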

In [20]:
# build the docker image
!docker build -t my-cowsay-image -f my_dockerfile .

[+] Building 0.4s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from my_dockerfile                    0.1s
 => => transferring dockerfile: 844B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [22]:
# make sure that the image has been built
!docker images my-cowsay-image

REPOSITORY        TAG       IMAGE ID       CREATED              SIZE
my-cowsay-image   latest    f2bdfcc93251   About a minute ago   197MB


In [23]:
# run a container from the image
!docker run --rm my-cowsay-image cowsay "Yam yam yam"

 _____________
< Yam yam yam >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Dockerfiles: create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new Dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image
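
One possible shape for such a Dockerfile is sketched below. It is not the content of `salmon_docker` itself: the install location `/opt/salmon`, the file name `salmon_docker.sketch`, and the assumption that the archive holds `bin/salmon` below a single top-level directory are all illustrative choices.

```shell
# Write the sketch to a separate file; the quoted 'EOF' keeps ${PATH} literal.
cat > salmon_docker.sketch <<'EOF'
FROM debian:bullseye-slim

# curl downloads the release; ca-certificates lets curl verify HTTPS
RUN apt-get update && \
    apt-get install -y curl ca-certificates && \
    mkdir -p /opt/salmon && \
    curl -fsSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
      | tar -xz --strip-components=1 -C /opt/salmon && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Assumption: the archive contains bin/salmon below one top-level directory
ENV PATH="/opt/salmon/bin:${PATH}"
EOF

cat salmon_docker.sketch
```

Using `--strip-components=1` avoids hard-coding the name of the directory inside the archive. The release is built for x86_64, so on a machine with a different architecture the build may additionally need `--platform linux/amd64`.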

In [37]:
# build the image
!docker build -t my-salmon-image -f salmon_docker .

[+] Building 0.3s (1/2)                                    docker:desktop-linux
 => [internal] load build definition from salmon_docker                    0.0s
 => => transferring dockerfile: 858B                                       0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s

In [38]:
# run the Docker image to print the salmon version

!docker run --rm my-salmon-image salmon --version

salmon 1.5.2


## Do you think bioinformaticians have to create a Docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

No, most tools already have pre-built Docker images available.

In [32]:
!docker pull community.wave.seqera.io/library/salmon:1.10.3--fcd0755dd8abb423

1.10.3--fcd0755dd8abb423: Pulling from library/salmon

9ad3be4b: Pulling fs layer
Digest: sha256:b4519ea6d76868516e8c545fd709bf900638cb4b9130206730e09038b2e9e274

In [33]:
!docker run --rm community.wave.seqera.io/library/salmon:1.10.3--fcd0755dd8abb423 salmon --version

salmon 1.10.3


BioContainers (https://biocontainers.pro/) is a community effort that provides ready-made Docker containers for bioinformatics tools.

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Yes, for example with GUI tools or via GitHub.

Seqera Containers is a container provisioning service. You can search for the desired Docker image and then pull it easily via the command line.