# A short introduction to containerized software

After spending time using nf-core pipelines to answer bioinformatics questions, we now turn to the machinery behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [40]:
!docker info

Client:
 Version:    28.3.3
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/Nikita/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.27.0-desktop.1
    Path:     /Users/Nikita/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.21
    Path:     /Users/Nikita/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.2-desktop.1
    Path:     /Users/Nikita/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/Nikita/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/Nikita/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /Users/Nikita/.docker/cli-plugins/
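
`docker info` only succeeds when both the Docker CLI and the Docker daemon are available. The following is a minimal sketch that separates these two cases; the function name `docker_status` and its three status strings are made up for this illustration.

```bash
# docker_status: report whether the Docker CLI is installed and whether the
# daemon answers. Hypothetical helper; the status strings are arbitrary.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "cli-missing"          # no docker executable on PATH
    elif ! docker info >/dev/null 2>&1; then
        echo "daemon-unreachable"   # CLI found, but the daemon did not respond
    else
        echo "ready"
    fi
}

status="$(docker_status)"
echo "Docker status: $status"
```

If the result is `daemon-unreachable`, starting Docker Desktop (or the Docker service) is usually the next thing to try.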

### What is a container?

A container is a lightweight, portable package that includes an application and all its dependencies (libraries, system tools, code, runtime, settings) needed to run it. Unlike virtual machines, containers share the host OS kernel, making them more efficient. Think of it as a "shipping container" for software - it ensures the application runs consistently across different environments.

### Why do we use containers?

We use containers for several key reasons:
- **Reproducibility**: Same environment everywhere (dev, testing, production)
- **Portability**: Runs consistently across different systems and platforms
- **Isolation**: Applications don't interfere with each other
- **Efficiency**: Lightweight compared to virtual machines
- **Version control**: Can manage different versions of software easily
- **Dependency management**: All dependencies bundled together

A Docker image is a read-only template used to create containers. It's like a "blueprint" that contains:
- The application code
- Runtime environment
- Libraries and dependencies
- Environment variables
- Configuration files

Images are built in layers, making them efficient to store and transfer. When you run an image, it creates a container instance.

### Log in to Docker

In [41]:
# Do this directly on the command line (for example with `docker login`)

### Run your first Docker container

In [42]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [43]:
# List all containers (running and stopped)
!docker ps -a

CONTAINER ID   IMAGE                                            COMMAND                  CREATED          STATUS                              PORTS     NAMES
eaf6b408ee55   hello-world                                      "/hello"                 1 second ago     Exited (0) Less than a second ago             elegant_wing
480d8fe75aa4   quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1   "/usr/local/env-exec…"   3 minutes ago    Exited (0) About a minute ago                 goofy_curie
96b56c23e048   my-salmon                                        "salmon --version"       14 minutes ago   Exited (255) About a minute ago               suspicious_blackburn
e61fcee0bb89   my-salmon                                        "salmon --version"       19 minutes ago   Exited (255) About a minute ago               quirky_bose


### Delete the container again and prove that it is deleted

In [66]:
# Delete container by ID (replace CONTAINER_ID with actual ID from above)
!docker rm fdce05fc6bb5

fdce05fc6bb5


In [67]:
# Verify container is deleted
!docker ps -a

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
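
Every `docker run` leaves a stopped container behind unless it is removed afterwards. As a sketch of an alternative to deleting by hand, the `--rm` flag removes the container as soon as it exits. The helper `count_containers` is hypothetical; it counts containers created from an image using the `ancestor` filter of `docker ps`. The demonstration only runs when a Docker daemon is reachable.

```bash
# count_containers: number of containers (running or stopped) created from an
# image. Hypothetical helper built on `docker ps -a -q --filter ancestor=...`.
count_containers() {
    docker ps -a -q --filter "ancestor=$1" 2>/dev/null | grep -c . || true
}

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    before="$(count_containers hello-world)"
    # --rm deletes the container as soon as it exits, so nothing is left behind.
    docker run --rm hello-world >/dev/null
    after="$(count_containers hello-world)"
    echo "hello-world containers before: $before, after: $after"
fi
```

With `--rm`, the two counts should be equal, because the container created by the run is cleaned up automatically.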


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe below the steps you took to download and run the software on last week's example FASTQ file:

1. Download FastQC from the website (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
2. Extract the downloaded archive
3. Make the fastqc script executable (`chmod +x fastqc`)
4. Install Java (required dependency)
5. Run FastQC on the example file: `./fastqc example.fastq`
6. View the generated HTML report

### Very well, now let's try to make use of its Docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [68]:
# Pull the FastQC image (BioContainers build hosted on quay.io)
!docker pull quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0

0.12.1--hdfd78af_0: Pulling from biocontainers/fastqc
Digest: sha256:e194048df39c3145d9b4e0a14f4da20b59d59250465b6f2a9cb698445fd45900
Status: Image is up to date for quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0


In [69]:
!docker images

REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
my-cowsay                                                  latest                  9a9cbcbbcd65   21 minutes ago   188MB
my-salmon                                                  latest                  d92193ea5e48   22 minutes ago   400MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      16.9kB
quay.io/biocontainers/samtools                             1.21--h50ea8bc_0        783c6646029a   12 months ago    108MB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0      74b59572f1d0   15 months ago    20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0    e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0       099d0e113ec8   18 months ago    1.82GB
combinelab/salmon              

In [48]:
!pwd


/Users/Nikita/Desktop/Studium/Master/Comp Workflows/computational-workflows-2025/notebooks/day_03_part2


In [49]:
# Create output directory and run FastQC on the actual FASTQ file
!mkdir -p fastqc_results    
!docker run -v "$PWD/fastqc_results:/output" -v "computational-workflows-2025/notebooks/day_02/results/fastq/md5:/data" quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 fastqc /data/SRX19144488_SRR23195511_1.fastq.gz -o /output


docker: Error response from daemon: create computational-workflows-2025/notebooks/day_02/results/fastq/md5: "computational-workflows-2025/notebooks/day_02/results/fastq/md5" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path

Run 'docker run --help' for more information
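
The error above occurs because Docker treats a `-v` source that does not start with `/` as the name of a volume, not as a host directory; the message itself suggests using an absolute path. Below is a minimal sketch of one way to do that. The helper `abs_dir` is hypothetical, and `.` is only a placeholder for the directory that actually holds the FASTQ file, which has to be filled in for your own layout. The container is started only if Docker and the input file are both present.

```bash
# abs_dir: print the absolute path of an existing directory (hypothetical helper).
abs_dir() {
    (cd -- "$1" 2>/dev/null && pwd)
}

# Assumption: "." stands in for the directory that holds the FASTQ file.
data_dir="$(abs_dir .)"
mkdir -p fastqc_results
out_dir="$(abs_dir ./fastqc_results)"

# Both mount sources now start with "/", so Docker treats them as host
# directories. The quotes keep paths that contain spaces in one piece.
echo "data mount:   $data_dir:/data"
echo "output mount: $out_dir:/output"

# Run FastQC only if Docker and the input file are actually available.
fastq="SRX19144488_SRR23195511_1.fastq.gz"
if command -v docker >/dev/null 2>&1 && [ -f "$data_dir/$fastq" ]; then
    docker run --rm \
        -v "$data_dir:/data" \
        -v "$out_dir:/output" \
        quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0 \
        fastqc "/data/$fastq" -o /output
fi
```

Quoting the mount arguments matters here because the working directory shown earlier contains a space.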


### Now that you know how to use a Docker container, which approach was easier, and which will be easier in the future?

**Docker approach is easier and more future-proof:**
- **Initial setup**: Docker is easier - just pull and run, no manual installation
- **Future use**: Docker is much easier - same command works everywhere
- **Maintenance**: Dependencies are bundled in the image, so updating a tool means pulling a newer image tag
- **Sharing**: Others can reproduce results with a single command
- **Multiple versions**: Can easily switch between different tool versions

**Docker is significantly more reproducible:**
- **Exact environment**: Same OS, libraries, and tool versions every time
- **No dependency conflicts**: Isolated from host system variations
- **Version locked**: Specific container version ensures identical results
- **Cross-platform**: Behaves the same across operating systems, provided the image is available for (or can be emulated on) the host CPU architecture
- **Time-proof**: Results reproducible months/years later with same container

Manual installation varies between systems, OS versions, and available dependencies.
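
The "version locked" point can be made stricter by referring to an image by its digest instead of its tag: a tag can later be moved to a different image, whereas a digest identifies one exact image. The sketch below builds both reference forms from the FastQC `docker pull` output shown earlier in this notebook; the variable names are only illustrative.

```bash
# Image coordinates copied from the `docker pull` output earlier in this notebook.
repo="quay.io/biocontainers/fastqc"
tag="0.12.1--hdfd78af_0"
digest="sha256:e194048df39c3145d9b4e0a14f4da20b59d59250465b6f2a9cb698445fd45900"

by_tag="$repo:$tag"        # human-readable, but a tag can be re-pointed
by_digest="$repo@$digest"  # content-addressed: refers to one exact image

echo "docker pull $by_tag"
echo "docker pull $by_digest"
```

Recording the digest alongside your results is a cheap way to document exactly which image produced them.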

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux, so on a Debian-based image you need the apt-get command to install "cowsay"

In [50]:
# Display the contents of my_dockerfile
!cat my_dockerfile

# This is the base image the container is built on: a slim version of the Debian operating system.
FROM debian:bullseye-slim

# Labels are metadata attached to the image; here they identify the author.
LABEL image.author.name="Mykyta Borodin"
LABEL image.author.email="mykyta.borodin@student.uni-tuebingen.de"

# Update package list and install dependencies
RUN apt-get update && apt-get install -y curl cowsay && rm -rf /var/lib/apt/lists/*

# Set PATH to include /usr/games where cowsay is installed
ENV PATH="/usr/games:$PATH"


### Explain the RUN and ENV lines you added to the file

**RUN instruction:**
- Executes commands during image build process
- Each RUN creates a new layer in the image
- Used to install software, update packages, create files
- Example: `RUN apt-get update && apt-get install -y cowsay`

**ENV instruction:**
- Sets environment variables in the container
- Variables persist when container runs
- Used to configure application behavior
- Example: `ENV PATH="/usr/games:$PATH"` (adds cowsay to PATH)
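
What the `ENV PATH=...` line achieves can be reproduced in an ordinary shell. In the sketch below, a temporary directory stands in for `/usr/games` and a dummy script called `hello-tool` (a made-up name) stands in for `cowsay`: the bare command name is only found once its directory has been prepended to `PATH`.

```bash
# Create a throwaway directory with a tiny executable (stand-in for /usr/games/cowsay).
tool_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "moo"\n' > "$tool_dir/hello-tool"
chmod +x "$tool_dir/hello-tool"

# Before: the directory is not on PATH, so the bare name cannot be resolved.
if command -v hello-tool >/dev/null 2>&1; then found_before=yes; else found_before=no; fi

# Same idea as `ENV PATH="/usr/games:$PATH"`: prepend the directory to PATH.
PATH="$tool_dir:$PATH"
export PATH

# After: the bare name resolves and the tool runs.
found_after="$(command -v hello-tool)"
output="$(hello-tool)"
echo "before: $found_before, after: $found_after, output: $output"

# Clean up the temporary directory.
rm -rf "$tool_dir"
```

Without the `ENV` line in the Dockerfile, `cowsay` would still be installed, but it would have to be called by its full path.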

In [51]:
# Build the Docker image from the Dockerfile
!docker build -t my-cowsay -f my_dockerfile .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 640B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 640B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition

In [52]:
# List Docker images to verify it was built
!docker images

REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
my-cowsay                                                  latest                  9a9cbcbbcd65   19 minutes ago   188MB
my-salmon                                                  latest                  c3ee6bf1c079   20 minutes ago   400MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      16.9kB
quay.io/biocontainers/samtools                             1.21--h50ea8bc_0        783c6646029a   12 months ago    108MB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0      74b59572f1d0   15 months ago    20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0    e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0       099d0e113ec8   18 months ago    1.82GB
combinelab/salmon              

In [53]:
# Run the cowsay container
!docker run my-cowsay cowsay "Hello Docker!"

 _______________
< Hello Docker! >
 ---------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the Salmon tool used in RNA-seq

To do so, use "curl" in your new Dockerfile to get Salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [54]:
# Display the contents of salmon_docker file
!cat salmon_docker

FROM debian:bullseye-slim

LABEL image.author.name="Mykyta Borodin"
LABEL image.author.email="mykyta.borodin@student.uni-tuebingen.de"

# Install dependencies
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Download and install Salmon
RUN curl -L https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz -o salmon.tar.gz && \
    tar -xzf salmon.tar.gz && \
    mv salmon-1.5.2_linux_x86_64 /opt/salmon && \
    rm salmon.tar.gz

# Set the PATH environment variable to include Salmon
ENV PATH="/opt/salmon/bin:$PATH"



In [55]:
# Build the Salmon Docker image
!docker build -t my-salmon -f salmon_docker .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 631B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 631B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[0m[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.4s (7/7) FINISHED                           docker:desktop-linux
[34m => [internal] load build definition from salmon

In [None]:
!docker pull combinelab/salmon:latest
!docker run --rm combinelab/salmon:latest salmon --version

zsh:1: command not found: ocker
salmon 1.10.3


In [57]:
!docker images

REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
my-cowsay                                                  latest                  9a9cbcbbcd65   19 minutes ago   188MB
my-salmon                                                  latest                  d92193ea5e48   20 minutes ago   400MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      16.9kB
quay.io/biocontainers/samtools                             1.21--h50ea8bc_0        783c6646029a   12 months ago    108MB
quay.io/biocontainers/fq                                   0.12.0--h9ee0642_0      74b59572f1d0   15 months ago    20MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0    e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0       099d0e113ec8   18 months ago    1.82GB
combinelab/salmon              

In [58]:
# Instead, let's use the pre-built BioContainers Salmon image.
# On an ARM host it may need to be run with --platform linux/amd64.

# Pull the BioContainers Salmon image
!docker pull quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1

1.9.0--h7e5ed60_1: Pulling from biocontainers/salmon
Digest: sha256:e56485bfa26913aebaa6351b2ddb1308d0dc0352bf15e7f5431bc58ba5465809
Status: Image is up to date for quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1
quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1


In [70]:
!docker run --platform linux/amd64 quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1 salmon --version

salmon 1.9.0
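
The `--platform linux/amd64` flag asks Docker for the amd64 variant of the image. This matters on ARM hosts: the `hello-world` output earlier reports `arm64v8`, while the Salmon release downloaded in the Dockerfile is an x86_64 build. The sketch below maps the machine name printed by `uname -m` to Docker's platform string and prints a hint. The helper `docker_platform` is hypothetical, and the assumption that the image is published only for amd64 is marked in the code.

```bash
# docker_platform: translate a machine name (as printed by `uname -m`) into the
# platform string Docker uses. Hypothetical helper covering two common cases.
docker_platform() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        arm64|aarch64) echo "linux/arm64" ;;
        *)             echo "unknown" ;;
    esac
}

host_platform="$(docker_platform "$(uname -m)")"
image_platform="linux/amd64"   # assumption: the image is only published for amd64

if [ "$host_platform" = "$image_platform" ]; then
    echo "Host matches the image platform; no --platform flag should be needed."
else
    echo "Host is $host_platform; try: docker run --platform $image_platform <image>"
fi
```

Running an amd64 image on an ARM host relies on emulation, so it is typically slower than running a native image.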


**No, bioinformaticians don't need to create Docker images every time!**

**BioContainers (https://biocontainers.pro/) are:**
- A community-driven project providing Docker containers for bioinformatics tools
- Automatically builds containers for tools in Bioconda
- Provides standardized, tested containers for 1000+ bioinformatics tools
- Ensures reproducible bioinformatics analyses
- Integrated with workflow managers like Nextflow and Snakemake

**Example using existing Salmon container:**
```bash
# Pull official Salmon container
docker pull quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1

# Run Salmon version
docker run quay.io/biocontainers/salmon:1.9.0--h7e5ed60_1 salmon --version
```

**Yes, there are other ways to create container images:**

**Seqera Containers (https://seqera.io/containers/) provides:**
- **Multi-architecture support**: Containers for linux/amd64 (x86_64) and linux/arm64
- **Multiple container formats**: Docker and Singularity/Apptainer images
- **Enhanced BioContainers**: Extended versions with additional optimizations
- **Enterprise features**: Security scanning, compliance, and registry management
- **Wave service**: On-demand container building and augmentation

**Other tools and services include:**
- **Singularity/Apptainer**: HPC-focused containers that don't require root privileges
- **Podman**: Daemonless Docker alternative that can run rootless
- **Buildah**: Build OCI-compliant images without Docker daemon
- **Cloud registries**: AWS ECR, Google Container Registry, and Azure Container Registry store and distribute images rather than build them
- **GitHub Actions**: Automated container building in CI/CD pipelines