# A short introduction to containerized software

Having spent time using nf-core pipelines to answer bioinformatics questions, we now turn to the processes behind these pipelines.

Today we focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/fbarlow/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /Users/fbarlow/.docker/cli-p
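
The same check can be scripted so that a notebook or workflow stops early when Docker is unavailable. This is a minimal sketch, assuming a POSIX shell; the variable names `cli_status` and `daemon_status` are illustrative and not part of Docker.

```shell
#!/bin/sh
# Report whether the docker CLI is on PATH and whether the daemon answers.
if command -v docker >/dev/null 2>&1; then
    cli_status="installed"
    docker --version
    # `docker info` exits non-zero when the daemon cannot be reached
    if docker info >/dev/null 2>&1; then
        daemon_status="running"
    else
        daemon_status="not reachable"
    fi
else
    cli_status="missing"
    daemon_status="not reachable"
fi
echo "docker CLI: ${cli_status}; daemon: ${daemon_status}"
```

A client that is installed while the daemon is not reachable usually means Docker Desktop (or the Docker service) has not been started yet.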

### What is a container?
A container is a fully functional, portable computing environment that wraps an application and keeps it isolated from other environments running in parallel on the same host. [1]

[1] https://en.wikipedia.org/wiki/Containerization_(computing)

### Why do we use containers?
Containers are used because they make it easy to share CPU, memory, storage and network resources at the OS level and offer a logical packaging mechanism for applications. [2]

[2] https://cloud.google.com/learn/what-are-containers

### What is a docker image?
Docker images are read-only templates that contain instructions for creating a container. [3]

[3] https://aws.amazon.com/compare/the-difference-between-docker-images-and-containers/
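
The relationship between an image and its containers can be seen by creating several containers from one image. The sketch below assumes a reachable Docker daemon and uses the public `hello-world` image; the variable names `first`, `second` and `demo_ran` are illustrative.

```shell
#!/bin/sh
# One image can back many containers: create two containers from hello-world
# and list them. Runs only when the Docker daemon is reachable.
demo_ran="no"
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # `docker create` prints the ID of the new (not yet started) container
    first=$(docker create hello-world)
    second=$(docker create hello-world)
    # Both containers come from the same read-only image
    docker container ls -a --filter "ancestor=hello-world"
    # Remove the two demo containers; the image itself stays in place
    docker container rm "$first" "$second"
    demo_ran="yes"
fi
echo "demo ran: ${demo_ran}"
```

Removing the containers does not remove the image, which is why `docker images` and `docker container ls -a` report different things.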

### Let's run our first docker image:

### Log in to Docker

In [2]:
# You need to do this directly on the command line (for example with `docker login`)

### Run your first docker container

In [3]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [4]:
!docker container ls -a
# id noted on an earlier run: 7b5717ed564a (every `docker run` creates a new container with a new ID)

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED              STATUS                              PORTS     NAMES
f219a38c3956   hello-world                                                        "/hello"                 1 second ago         Exited (0) Less than a second ago             fervent_dirac
f1fdd534a5b0   312ccfaa7abf                                                       "salmon --version"       38 seconds ago       Exited (0) 37 seconds ago                     affectionate_curie
79d642d44241   1f751b69ce84                                                       "salmon --version"       About a minute ago   Exited (0) 58 seconds ago                     sweet_merkle
7063c254258c   545290230b77                                                       "salmon --version"       3 minutes ago        Exited (133) 3 minutes ago                    lucid_wu
d6994f9f65ee   b339c3c68396                                          

### Delete the container again and prove that it is deleted

In [None]:
!docker container rm magical_mclean # worked on the first run with the container name; re-running fails because the container no longer exists

Error response from daemon: No such container: magical_mclean


In [17]:
!docker container ls -a

CONTAINER ID   IMAGE                                                              COMMAND                  CREATED              STATUS                           PORTS     NAMES
f219a38c3956   hello-world                                                        "/hello"                 2 seconds ago        Exited (0) 1 second ago                    fervent_dirac
f1fdd534a5b0   312ccfaa7abf                                                       "salmon --version"       39 seconds ago       Exited (0) 38 seconds ago                  affectionate_curie
79d642d44241   1f751b69ce84                                                       "salmon --version"       About a minute ago   Exited (0) 59 seconds ago                  sweet_merkle
7063c254258c   545290230b77                                                       "salmon --version"       3 minutes ago        Exited (133) 3 minutes ago                 lucid_wu
d6994f9f65ee   b339c3c68396                                                       "s
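
Because Docker assigns a random name and ID to each new container, deleting and verifying is easier to repeat when the container gets an explicit name. The following is a sketch under that assumption; `hello-demo` is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# Run hello-world under a fixed name, remove it, then verify it is gone.
name="hello-demo"   # hypothetical container name
result="skipped"
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # Clear any leftover container with the same name from an earlier run
    docker container rm -f "$name" >/dev/null 2>&1 || true
    docker run --name "$name" hello-world >/dev/null
    docker container rm "$name"
    # `docker container inspect` exits non-zero when the container does not exist
    if docker container inspect "$name" >/dev/null 2>&1; then
        result="still present"
    else
        result="deleted"
    fi
fi
echo "container ${name}: ${result}"
```

For one-off runs, `docker run --rm` removes the container automatically when it exits, so no manual cleanup is needed.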

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. I clicked on the download section and picked the correct version for my architecture.
2. To run it, I chose File -> Open and selected a FASTQ file from an earlier notebook.

### Very well, now let's try to make use of its docker container

1. Create a container holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [25]:
# pull the image, then create a container from it (created, not started)
!docker image pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
!docker container create community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29

0.12.1--af7a5314d5015c29: Pulling from library/fastqc
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1
Status: Image is up to date for community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29
67500e1a575d6424dc229bff26a193d833ed47c83752e22ec38d56522329e899


In [34]:
# run the container and save the results to a new "fastqc_results" directory
!mkdir fastqc_results

!docker run --rm \
  -v /Users/fbarlow/Desktop/VSCODE-MAIN/CWBD/computational-workflows-2025/notebooks/day_02/fetchngs-out/fastq:/data \
  -v $(pwd)/fastqc_results:/out \
  community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 \
  fastqc -o /out /data/SRX19144486_SRR23195516_1.fastq.gz

mkdir: cannot create directory ‘fastqc_results’: File exists
^C
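
The `mkdir` error above appears whenever the cell is re-run, because the directory already exists. A re-runnable variant might look like the sketch below. It assumes the FASTQ folder is supplied through a hypothetical `FASTQ_DIR` environment variable (falling back to a relative `fetchngs-out/fastq` folder) and mounts the input read-only.

```shell
#!/bin/sh
# Re-runnable FastQC invocation: idempotent output directory, quoted paths.
image="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"
sample="SRX19144486_SRR23195516_1.fastq.gz"
# Assumption: FASTQ_DIR points at the folder that holds the sample file
fastq_dir="${FASTQ_DIR:-$PWD/fetchngs-out/fastq}"
out_dir="$PWD/fastqc_results"

# -p makes mkdir succeed even when the directory already exists
mkdir -p "$out_dir"

if [ -f "$fastq_dir/$sample" ] && command -v docker >/dev/null 2>&1; then
    # /data is mounted read-only (:ro); results are written to /out
    docker run --rm \
        -v "$fastq_dir":/data:ro \
        -v "$out_dir":/out \
        "$image" \
        fastqc -o /out "/data/$sample"
else
    echo "skipping FastQC: need docker and $fastq_dir/$sample"
fi
```

Quoting the mount paths also protects against spaces in directory names, which an unquoted `$(pwd)` does not.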


### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

### What would you say, which approach is more reproducible?
The container is more reproducible, as it is an isolated system with its own dependencies.
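
Reproducibility can be tightened further by referring to an image by its digest instead of its tag, since a tag can later be moved to a different image while a digest cannot. The sketch below reuses the digest printed by the `docker image pull` cell above; the variable names are illustrative.

```shell
#!/bin/sh
# Build a digest-pinned image reference from the values shown in the pull output.
repo="community.wave.seqera.io/library/fastqc"
tag="0.12.1--af7a5314d5015c29"
digest="sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1"
pinned="${repo}@${digest}"

echo "tag reference:    ${repo}:${tag}"
echo "digest reference: ${pinned}"

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # Show the digests recorded for the locally stored image, if it was pulled
    docker image inspect --format '{{.RepoDigests}}' "${repo}:${tag}" \
        || echo "image not available locally; pull it first"
fi
```

The digest reference can be used anywhere a tag reference is accepted, for example in `docker pull` or `docker run`.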

### Compare the file to last week's FastQC results. Are they identical?
### Is the FastQC version identical?

## Dockerfiles

We have now used Docker containers and images directly to support our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers typically run a Linux distribution; on a Debian- or Ubuntu-based image you install "cowsay" with the apt-get command

In [26]:
# open the file "my_dockerfile" in a text editor


### Explain the RUN and ENV lines you added to the file

RUN specifies commands that are executed while the image is being built, whereas ENV sets environment variables that persist in the image and in every container started from it.
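
The contents of `my_dockerfile` are not shown in this notebook, so the following is only a sketch of what such a file could contain. It assumes the `debian:bullseye-slim` base image that appears in the build log and relies on Debian installing cowsay under `/usr/games`; the file name `cowsay.Dockerfile` and the tag `mycowsay-sketch` are hypothetical.

```shell
#!/bin/sh
# Write a small Dockerfile for cowsay into a scratch directory and build it
# when Docker is available. File and tag names are illustrative.
workdir=$(mktemp -d)
dockerfile="$workdir/cowsay.Dockerfile"

# Quoted 'EOF' keeps ${PATH} literal so Docker, not this shell, expands it
cat > "$dockerfile" <<'EOF'
FROM debian:bullseye-slim

# RUN executes at build time; its result is stored as an image layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends cowsay \
    && rm -rf /var/lib/apt/lists/*

# ENV persists into containers; Debian installs cowsay under /usr/games
ENV PATH="/usr/games:${PATH}"

# Default command when the container starts without arguments
CMD ["cowsay", "Hello from a container"]
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    docker build -t mycowsay-sketch -f "$dockerfile" "$workdir"
    docker run --rm mycowsay-sketch cowsay "moo"
else
    echo "docker not available; Dockerfile written to $dockerfile"
fi
```

Without the ENV line, the `cowsay` command would have to be called by its full path, because `/usr/games` is not on the default PATH of the base image.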

In [27]:
# build the docker image
!docker build -t mycowsay -f my_dockerfile .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 859B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 859B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition

In [29]:
# make sure that the image has been built
!docker images

REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
mycowsay                                                   latest                     5f8d4eeac047   21 seconds ago   183MB
mysalmon                                                   latest                     f2188558e7b7   19 hours ago     440MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago      16.9kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago    1.37GB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago    1.82GB
combinelab/salmon                                          latest                     cefd8bb0b2ed   18 months ago    152MB
quay.

In [31]:
# run a container from the image
!docker run mycowsay cowsay "wo ist mein Hund?"

 ___________________
< wo ist mein Hund? >
 -------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in RNA-seq analysis

To do so, use "curl" in your new Dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz

In [11]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker buildx build --platform linux/amd64 -t mysalmon -f salmon_docker .

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 546B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 546B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker  

In [12]:
# run the image to print the salmon version
!docker run --platform linux/amd64 mysalmon

salmon 1.5.2
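
The `salmon_docker` file itself is not reproduced here. A Dockerfile along the lines described above might look like the following sketch. It assumes the release tarball unpacks into a single top-level folder containing `bin/salmon` (hence `--strip-components=1`), and the install location `/opt/salmon`, the file name `salmon.Dockerfile` and the tag `mysalmon-sketch` are hypothetical choices.

```shell
#!/bin/sh
# Write a Dockerfile that downloads salmon 1.5.2 with curl, then build it
# for linux/amd64 when Docker is available. Names are illustrative.
workdir=$(mktemp -d)
dockerfile="$workdir/salmon.Dockerfile"

# Quoted 'EOF' keeps ${PATH} literal so Docker, not this shell, expands it
cat > "$dockerfile" <<'EOF'
FROM debian:bullseye-slim

# curl needs CA certificates to download over HTTPS
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Assumption: the tarball has one top-level folder with bin/ inside it
RUN mkdir -p /opt/salmon \
    && curl -fsSL https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz \
       | tar -xzf - --strip-components=1 -C /opt/salmon

ENV PATH="/opt/salmon/bin:${PATH}"

# Print the version when the container starts without arguments
CMD ["salmon", "--version"]
EOF

if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    # The binary is built for x86_64, so request that platform explicitly
    docker buildx build --platform linux/amd64 -t mysalmon-sketch -f "$dockerfile" "$workdir"
    docker run --rm --platform linux/amd64 mysalmon-sketch
else
    echo "docker not available; Dockerfile written to $dockerfile"
fi
```

The `--platform linux/amd64` flag matters on ARM machines such as the one used in this notebook, where the x86_64 binary runs under emulation.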


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

It is not necessary to build a Docker image every time: prebuilt images for many tools are already published and can be pulled from https://hub.docker.com/.

BioContainers is a project that aims to create, manage and distribute bioinformatics packages and containers based on the frameworks Conda, Docker and Singularity. [4]

[4] https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html

In [13]:
!docker pull combinelab/salmon

Using default tag: latest
latest: Pulling from combinelab/salmon

9485d7ab: Pulling fs layer
893c1bc1: Pulling fs layer
0bdd40c3: Pulling fs layer
1ed9ab84: Pulling fs layer
Digest: sha256:cefd8bb0b2ed9b07f22b5f0fc317ddda540e5b0dc00810d1ff0d92fee5d80370
Status: Downloaded newer image for combinelab/salmon:latest
docker.io/combinelab/salmon:latest


In [25]:
!docker run --platform linux/amd64 combinelab/salmon salmon

salmon v1.10.3

Usage:  salmon -h|--help or 
        salmon -v|--version or 
        salmon -c|--cite or 
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index      : create a salmon index
     quant      : quantify a sample
     alevin     : single cell analysis
     swim       : perform super-secret operation
     quantmerge : merge multiple quantifications into a single file


## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?

Docker images can be created by selecting the desired packages at https://seqera.io/containers/ and adding them to a container. Once we are satisfied with the selection, we request the container, and Seqera builds an image containing all selected packages, which we can then pull to our machine.
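
For the Apptainer side of the question, an image hosted in a Docker registry can usually be converted on pull. The sketch below assumes Apptainer is installed and reuses the FastQC image from earlier in this notebook; the output file name `fastqc.sif` is hypothetical.

```shell
#!/bin/sh
# Convert a registry-hosted Docker image into a local Apptainer image file.
image="community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29"
sif="fastqc.sif"            # hypothetical output file name
uri="docker://${image}"     # docker:// tells Apptainer to read from a Docker registry

if command -v apptainer >/dev/null 2>&1; then
    apptainer pull "$sif" "$uri"
    # Run a command inside the converted image
    apptainer exec "$sif" fastqc --version
else
    echo "apptainer not found; would run: apptainer pull $sif $uri"
fi
```

This matters on shared HPC systems, where Docker is often unavailable and Apptainer is offered instead.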