# A short introduction to containerized software

After using nf-core pipelines to answer bioinformatic questions, we will now look at the technology that runs underneath these pipelines.

Today, we will focus on containerization, specifically with Docker.

1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /Users/leo/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /Users/leo/.docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /Users/leo/.docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /Users/leo/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /Users/leo/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /Users/leo/.docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /Users/leo/.docker/cli-plug
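`docker info` fails both when the CLI is missing and when the Docker daemon (e.g. Docker Desktop) is not running. The following is a minimal Python sketch that distinguishes the two cases from inside the notebook. The helper names are hypothetical. The sketch assumes that current Docker releases return a non-zero exit status from `docker info` when they cannot reach the daemon.

```python
import shutil
import subprocess
from typing import Optional


def docker_cli_path() -> Optional[str]:
    """Full path of the docker executable, or None if it is not on PATH."""
    return shutil.which("docker")


def docker_daemon_reachable(timeout: float = 10.0) -> bool:
    """True if `docker info` exits with status 0, i.e. the CLI can reach a running daemon."""
    exe = docker_cli_path()
    if exe is None:
        return False  # CLI not installed (or not on PATH)
    try:
        result = subprocess.run([exe, "info"], capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


cli = docker_cli_path()
daemon_ok = docker_daemon_reachable()
print(f"docker CLI: {cli or 'not found'}; daemon reachable: {daemon_ok}")
```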

### What is a container?

A container is a standard unit of software that packages code and all of its dependencies so that the application runs quickly and reliably from one computing environment to another. A container is a running (or stopped) instance of an image: Docker creates a new container from an image every time you call docker run.

### Why do we use containers?

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer's laptop to a test environment or from a staging environment into production.

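### What is a Docker image?

A Docker image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Images are read-only templates. They are built from a Dockerfile and shared through registries such as Docker Hub (docker.io) or quay.io.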
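Images are referred to by names such as `quay.io/biocontainers/fastqc:0.11.9--0`, which combine a registry, a repository and a tag. The sketch below splits such a reference into those parts. It is a simplified reading of Docker's naming rules and does no validation. The defaults it fills in (`docker.io`, `library/`, `latest`) mirror how Docker expands short names such as `debian:bullseye-slim`, which appears as `docker.io/library/debian:bullseye-slim` in the build logs further down. The function and type names are hypothetical.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference such as 'quay.io/biocontainers/fastqc:0.11.9--0'."""
    digest = None
    if "@" in ref:  # e.g. ubuntu@sha256:...
        ref, digest = ref.split("@", 1)
    tag = None
    last_slash, last_colon = ref.rfind("/"), ref.rfind(":")
    if last_colon > last_slash:  # a colon after the last '/' introduces the tag
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    first, _, rest = ref.partition("/")
    # The first component is a registry only if it looks like a host name.
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", ref
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository  # official Docker Hub images
    if tag is None and digest is None:
        tag = "latest"
    return ImageRef(registry, repository, tag, digest)


print(parse_image_ref("quay.io/biocontainers/fastqc:0.11.9--0"))
# ImageRef(registry='quay.io', repository='biocontainers/fastqc', tag='0.11.9--0', digest=None)
```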

### Let's run our first Docker image

### Log in to Docker

In [3]:
# Run this in a terminal: docker login may prompt for a username and password, which a notebook cell cannot answer
!docker login

Authenticating with existing credentials... [Username: skzw]

[1m[106m[30mi[0m[0m [96mInfo → [0m[0m[3mTo login with a different account, run 'docker logout' followed by 'docker login'[0m


Login Succeeded


### Run your first docker container

In [8]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [5]:
!docker ps -a
# the container ID is 92ae47af51a3

CONTAINER ID   IMAGE                                 COMMAND                  CREATED          STATUS                      PORTS     NAMES
92ae47af51a3   hello-world                           "/hello"                 22 seconds ago   Exited (0) 22 seconds ago             jolly_franklin
80d8d833cd77   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exec…"   22 hours ago     Exited (255) 2 hours ago              nxf-6d0YItbab874y0T8TebPnAlr
70ba5de57850   quay.io/biocontainers/wget:1.20.1     "/bin/bash -c 'eval …"   24 hours ago     Exited (137) 23 hours ago             nxf-XBb0768BNUBDQS2SVyU4Pc3K
e3b9b050d1f6   quay.io/biocontainers/wget:1.20.1     "/bin/bash -c 'eval …"   24 hours ago     Exited (137) 23 hours ago             nxf-J27Y8WVESHou1jHtKbbttwlc
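Docker can filter the listing itself: `docker ps -aq --filter ancestor=hello-world` prints only the matching IDs. If you instead want to pick the ID out of a table like the one above in Python, here is a minimal sketch. It assumes that columns are separated by at least two spaces, as in the default `docker ps` layout. The sample lines are copied from the output above, and the function name is hypothetical.

```python
import re
from typing import List


def container_ids_for_image(ps_output: str, image: str) -> List[str]:
    """CONTAINER IDs of all rows in `docker ps -a` output whose IMAGE column equals `image`."""
    ids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())  # columns are padded with 2+ spaces
        if len(fields) >= 2 and fields[1] == image:
            ids.append(fields[0])
    return ids


SAMPLE = """\
CONTAINER ID   IMAGE                                 COMMAND                  CREATED          STATUS                      PORTS     NAMES
92ae47af51a3   hello-world                           "/hello"                 22 seconds ago   Exited (0) 22 seconds ago             jolly_franklin
70ba5de57850   quay.io/biocontainers/wget:1.20.1     "/bin/bash -c 'eval …"   24 hours ago     Exited (137) 23 hours ago             nxf-XBb0768BNUBDQS2SVyU4Pc3K
"""

print(container_ids_for_image(SAMPLE, "hello-world"))  # ['92ae47af51a3']
```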


### Delete the container again and prove that it is deleted

In [10]:
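!docker ps -a
# delete the container; use the ID from the listing above, because every docker run creates a new container with a new ID
!docker rm 0d02280d0b49
# check that it's deleted: 0d02280d0b49 should no longer appear
!docker ps -a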

CONTAINER ID   IMAGE                                 COMMAND                  CREATED          STATUS                      PORTS     NAMES
0d02280d0b49   hello-world                           "/hello"                 27 seconds ago   Exited (0) 26 seconds ago             jolly_mclean
80d8d833cd77   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exec…"   22 hours ago     Exited (255) 3 hours ago              nxf-6d0YItbab874y0T8TebPnAlr
70ba5de57850   quay.io/biocontainers/wget:1.20.1     "/bin/bash -c 'eval …"   24 hours ago     Exited (137) 23 hours ago             nxf-XBb0768BNUBDQS2SVyU4Pc3K
e3b9b050d1f6   quay.io/biocontainers/wget:1.20.1     "/bin/bash -c 'eval …"   24 hours ago     Exited (137) 23 hours ago             nxf-J27Y8WVESHou1jHtKbbttwlc
0d02280d0b49
CONTAINER ID   IMAGE                                 COMMAND                  CREATED        STATUS                      PORTS     NAMES
80d8d833cd77   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exe
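The two listings above provide the proof: `0d02280d0b49` appears before `docker rm` and is absent afterwards. Here is the same check in Python as a sketch. It compares the CONTAINER ID column of two listings, using shortened excerpts of the output above and the same two-space column assumption. From the shell, an equivalent check is that `docker ps -a --filter id=0d02280d0b49 -q` prints nothing. The function names are hypothetical.

```python
import re
from typing import Set


def container_ids(ps_output: str) -> Set[str]:
    """CONTAINER ID column of `docker ps -a` output (header row skipped)."""
    ids = set()
    for line in ps_output.splitlines()[1:]:
        fields = re.split(r"\s{2,}", line.strip())
        if fields[0]:  # ignore blank lines
            ids.add(fields[0])
    return ids


def removed_ids(before: str, after: str) -> Set[str]:
    """IDs listed before `docker rm` that are gone afterwards."""
    return container_ids(before) - container_ids(after)


HEADER = "CONTAINER ID   IMAGE                                 COMMAND                  CREATED          STATUS                      PORTS     NAMES"
ROW_HELLO = '0d02280d0b49   hello-world                           "/hello"                 27 seconds ago   Exited (0) 26 seconds ago             jolly_mclean'
ROW_PYTHON = '80d8d833cd77   quay.io/biocontainers/python:3.9--1   "/usr/local/env-exec…"   22 hours ago     Exited (255) 3 hours ago              nxf-6d0YItbab874y0T8TebPnAlr'

before = "\n".join([HEADER, ROW_HELLO, ROW_PYTHON])
after = "\n".join([HEADER, ROW_PYTHON])
print(removed_ids(before, after))  # {'0d02280d0b49'}
```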

### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe below the steps you took to download the software and run it on last week's example FASTQ file:

1. Go to the FastQC website, download the zip file for command-line use and unzip it
2. Add execution permissions to the fastqc file (chmod +x fastqc)
3. Make it available everywhere by adding its directory to the PATH
...
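Steps 2 and 3 correspond to `chmod +x fastqc` and to extending `PATH`. In practice you would run the shell commands. As a Python illustration only, the sketch below adds execute permission (roughly like `chmod +x`) and builds an environment whose `PATH` starts with the tool's folder. This affects only processes launched with that environment, not your shell. The sketch is demonstrated on a throwaway stand-in script, not on the real FastQC download, and the function names are hypothetical.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Dict, Mapping


def make_executable(path: Path) -> None:
    """Add execute permission for user, group and others (roughly `chmod +x`)."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def with_dir_on_path(directory: Path, env: Mapping[str, str]) -> Dict[str, str]:
    """Copy of env whose PATH starts with `directory` (only affects processes given this env)."""
    new_env = dict(env)
    new_env["PATH"] = str(directory) + os.pathsep + env.get("PATH", "")
    return new_env


# Demonstrate on a stand-in script instead of the real FastQC download.
with tempfile.TemporaryDirectory() as tmp:
    tool = Path(tmp) / "fastqc"
    tool.write_text("#!/bin/sh\necho 'FastQC stand-in'\n")
    make_executable(tool)
    env = with_dir_on_path(Path(tmp), os.environ)
    found = shutil.which("fastqc", path=env["PATH"])
    print(found)
```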

In [11]:
!fastqc --help


            FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

	fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] 
           [-c contaminant file] seqfile1 .. seqfileN

DESCRIPTION

    FastQC reads a set of sequence files and produces from each one a quality
    control report consisting of a number of different modules, each one of 
    which will help to identify a different potential type of problem in your
    data.
    
    If no files to process are specified on the command line then the program
    will start as an interactive graphical application.  If files are provided
    on the command line then the program will run with no user interaction
    required.  In this mode it is suitable for inclusion into a standardised
    analysis pipeline.
    
    The options for the program as as follows:
    
    -h --help       Print this help file and exit
    
    -v --version    Print the vers

### Very well, now let's use its Docker container

1. Create a container image holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ file

In [13]:
# pull a prebuilt FastQC image (here the BioContainers image from quay.io)
!docker pull quay.io/biocontainers/fastqc:0.11.9--0

0.11.9--0: Pulling from biocontainers/fastqc

Digest: sha256:70de12400206b9c1784c8dfd019cfe4e42eed9a42eabf6c61eb68342843bdaab

In [27]:
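# mount the current directory as /data inside the container; --rm deletes the container when FastQC exits
# FastQC does not create the -o directory itself, so create fastqc_results first (e.g. mkdir -p fastqc_results)
!docker run --rm -v "$(pwd)":/data quay.io/biocontainers/fastqc:0.11.9--0 fastqc /data/SRX19144486_SRR23195516_1.fastq.gz -o /data/fastqc_results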



Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR231955
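Here is the same call assembled in Python as a sketch. The helper name is hypothetical. Passing an argument list to `subprocess` bypasses the shell, so a working directory that contains spaces cannot break the `-v` option the way an unquoted `$(pwd)` can. As in the cell above, the results directory should exist before FastQC runs.

```python
from pathlib import Path
from typing import List

FASTQC_IMAGE = "quay.io/biocontainers/fastqc:0.11.9--0"


def fastqc_docker_argv(host_dir: Path, fastq_name: str,
                       out_subdir: str = "fastqc_results",
                       image: str = FASTQC_IMAGE) -> List[str]:
    """Argument list for `docker run` that runs FastQC on host_dir/fastq_name."""
    host_dir = host_dir.resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",        # delete the container when FastQC exits
        "-v", f"{host_dir}:/data",      # host_dir is visible as /data in the container
        image,
        "fastqc", f"/data/{fastq_name}",
        "-o", f"/data/{out_subdir}",    # results appear in host_dir/out_subdir
    ]


argv = fastqc_docker_argv(Path("."), "SRX19144486_SRR23195516_1.fastq.gz")
print(" ".join(argv))
# To actually run it (requires Docker):
# (Path(".") / "fastqc_results").mkdir(exist_ok=True)
# subprocess.run(argv, check=True)
```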

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

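### Which approach would you say is more reproducible?
The Docker approach is more reproducible. The image pins the exact FastQC version together with all of its dependencies (such as the Java runtime), so anyone who pulls the same image tag, or the same digest, runs the same software.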

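### Compare the file to last week's FastQC results: are they identical?
### Is the FastQC version identical?
They are probably not identical. Last week's report may have been produced by a different FastQC version than the 0.11.9 pinned in this image (each report records the version that produced it). Even with the same version, the HTML report can differ in run-specific details such as the date.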
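To answer the version question, compare the output of `fastqc --version` from both setups: run it locally, and run `docker run --rm quay.io/biocontainers/fastqc:0.11.9--0 fastqc --version`. Below is a small helper for that comparison. It assumes the `FastQC vX.Y.Z` format that `fastqc --version` prints. The function names are hypothetical, and the example strings are illustrative, not recorded output.

```python
import re
from typing import Optional


def fastqc_version(version_output: str) -> Optional[str]:
    """Pull the version number out of text like 'FastQC v0.11.9'."""
    match = re.search(r"FastQC v(\d+(?:\.\d+)+)", version_output)
    return match.group(1) if match else None


def same_fastqc_version(output_a: str, output_b: str) -> bool:
    """True only if both outputs contain a FastQC version and the versions agree."""
    a, b = fastqc_version(output_a), fastqc_version(output_b)
    return a is not None and a == b


print(fastqc_version("FastQC v0.11.9"))  # 0.11.9
```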

## Dockerfiles

So far we have used existing Docker images and containers directly in our analyses.

Let's create our own toy Dockerfile that installs the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay).

Hints:
1. Docker containers run Linux; on a Debian or Ubuntu base image, you install "cowsay" with the apt-get package manager

In [None]:
# open the file "my_dockerfile" in a text editor


### Explain the RUN and ENV lines you added to the file
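Below is one possible `my_dockerfile`, shown as a Python string so that it could be written from the notebook with `Path("my_dockerfile").write_text(COWSAY_DOCKERFILE)`. It is a sketch, not the file used for the output below. The base image is taken from the build log, and the `CMD` message is a hypothetical placeholder.

`RUN` executes a command while the image is built and stores the result as a new layer. `ENV` sets an environment variable that persists into every container started from the image. `ENV` is needed here because Debian installs cowsay into `/usr/games`, which is not on the default `PATH` of the debian image.

```python
# Hypothetical my_dockerfile contents (the CMD message is a placeholder).
COWSAY_DOCKERFILE = """\
FROM debian:bullseye-slim

# RUN: executed at build time; the installed package becomes part of the image.
RUN apt-get update \\
    && apt-get install -y cowsay \\
    && rm -rf /var/lib/apt/lists/*

# ENV: persists into every container; Debian puts cowsay in /usr/games.
ENV PATH="/usr/games:${PATH}"

# Default command run by `docker run mycowsay`.
CMD ["cowsay", "Hello from my first image"]
"""

print(COWSAY_DOCKERFILE)
```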

In [28]:
# build the docker image
!docker build -t mycowsay -f my_dockerfile .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 943B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 943B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m =>

In [29]:
# make sure that the image has been built
!docker images


REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
mycowsay                                                   latest                  51872f9f9aad   15 seconds ago   197MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      20.3kB
quay.io/biocontainers/samtools                             1.21--h50ea8bc_0        783c6646029a   12 months ago    108MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0    e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0       099d0e113ec8   18 months ago    1.82GB
quay.io/biocontainers/pandas                               1.5.2                   cbb54fcf8730   2 years ago      493MB
quay.io/biocontainers/r-base                               4.2.1                   6721ee8bfba2   2 years ago      1.17GB
quay.io/biocontainers
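From the shell, `docker image inspect mycowsay` is a quick yes/no check, because it exits with an error if the image does not exist. If you prefer to check a listing like the one above in Python, the sketch below works as long as REPOSITORY and TAG are the first two whitespace-separated columns, as in the default `docker images` layout. The sample is copied from the output above, and the function name is hypothetical.

```python
def has_image(images_output: str, repository: str, tag: str = "latest") -> bool:
    """True if `docker images` output lists repository:tag."""
    for line in images_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[0] == repository and fields[1] == tag:
            return True
    return False


SAMPLE = """\
REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
mycowsay                                                   latest                  51872f9f9aad   15 seconds ago   197MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      20.3kB
"""

print(has_image(SAMPLE, "mycowsay"))  # True
```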

In [30]:
# run the image: this starts a new container from mycowsay and executes its default command
!docker run mycowsay


 __
<  >
 --
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds salmon, the quantification tool used in nf-core/rnaseq

To do so, use "curl" in your new Dockerfile to download salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
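Below is one possible `salmon_docker`, again as a Python string and again a sketch rather than the file used below. The following assumptions are also labelled in the comments:
- It uses the Debian base image from the build log.
- It uses `curl -L` to follow GitHub's download redirect.
- It assumes the tarball has a single top-level folder containing `bin/`, which is why `--strip-components=1` is used.

The release URL is the one given above.

```python
SALMON_VERSION = "1.5.2"
SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    f"v{SALMON_VERSION}/salmon-{SALMON_VERSION}_linux_x86_64.tar.gz"
)

# Hypothetical salmon_docker contents; ${{PATH}} becomes ${PATH} in the f-string.
SALMON_DOCKERFILE = f"""\
FROM debian:bullseye-slim

# curl downloads the release; ca-certificates lets it verify GitHub's HTTPS certificate.
RUN apt-get update \\
    && apt-get install -y curl ca-certificates \\
    && rm -rf /var/lib/apt/lists/*

# Assumption: the tarball has one top-level folder containing bin/ and lib/.
RUN mkdir -p /opt/salmon \\
    && curl -fsSL {SALMON_URL} | tar -xz -C /opt/salmon --strip-components=1

ENV PATH="/opt/salmon/bin:${{PATH}}"
"""

print(SALMON_DOCKERFILE)
```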

In [37]:
# use the file "salmon_docker" in this directory to build a new docker image 
! docker build -t mysalmon -f salmon_docker .



[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 542B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/3)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 542B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
 => [auth] library/debian:pull token for registry-1.docker.io              0.0s
[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.4s (2/3)   

In [38]:
# list the images to check that mysalmon has been built
!docker images


REPOSITORY                                                 TAG                     IMAGE ID       CREATED          SIZE
mysalmon                                                   latest                  5948c446ed3e   35 seconds ago   407MB
mycowsay                                                   latest                  51872f9f9aad   9 minutes ago    197MB
hello-world                                                latest                  54e66cc1dd1f   7 weeks ago      20.3kB
quay.io/biocontainers/samtools                             1.21--h50ea8bc_0        783c6646029a   12 months ago    108MB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0    e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0       099d0e113ec8   18 months ago    1.82GB
quay.io/biocontainers/pandas                               1.5.2                   cbb54fcf8730   2 years ago      493MB
quay.io/biocontainers/

In [39]:
# run salmon inside the image; without arguments it prints its usage, which starts with the version
!docker run mysalmon salmon


salmon v1.5.2

Usage:  salmon -h|--help or 
        salmon -v|--version or 
        salmon -c|--cite or 
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index      : create a salmon index
     quant      : quantify a sample
     alevin     : single cell analysis
     swim       : perform super-secret operation
     quantmerge : merge multiple quantifications into a single file
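Checking that an image really contains the version you asked for is a cheap reproducibility test. The sketch below extracts the version from a usage banner like the one above. Accepting the shorter `salmon 1.5.2` form is an assumption about what `salmon --version` prints, and the function name is hypothetical. With Docker, you would pass it the captured output of `docker run --rm mysalmon salmon`.

```python
import re
from typing import Optional


def salmon_version(output: str) -> Optional[str]:
    """Version from salmon's usage banner ('salmon v1.5.2'); also accepts 'salmon 1.5.2'."""
    match = re.search(r"^salmon v?(\d+\.\d+\.\d+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


BANNER = "salmon v1.5.2\n\nUsage:  salmon -h|--help or\n"
print(salmon_version(BANNER))  # 1.5.2
```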


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.
https://hub.docker.com/r/combinelab/salmon

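What is https://biocontainers.pro/ ?
BioContainers is "a community-driven project to create and manage bioinformatics software containers" (description from its GitHub page). Its images are built automatically from Bioconda packages and published under quay.io/biocontainers, the source of the FastQC image used above.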

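## Are there other ways to create Docker (or Apptainer) images?
Yes. Instead of writing a Dockerfile by hand, you can generate images from conda/mamba package specifications. For example, BioContainers are built automatically from Bioconda packages. Apptainer can also convert an existing Docker image, e.g. apptainer build fastqc.sif docker://quay.io/biocontainers/fastqc:0.11.9--0.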

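What is https://seqera.io/containers/ ?
"A registry for container images optimized for bioinformatics" (from their website). It builds an image on demand from the conda packages you select, for example packages from Bioconda.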
