# A short introduction to containerized software

After spending some time using nf-core pipelines to answer bioinformatic questions, we will now focus on the processes that lie behind these pipelines.

Today, we will focus on containerization, specifically with Docker.



1. Check if Docker is installed.

In [1]:
!docker info

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
    Path:     /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
    Path:     /usr/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.34
    Path:     /usr/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /usr/lib/docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /usr/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.25
    Path:     /usr/lib/docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /usr
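The same check can be scripted from Python, for example at the top of a notebook. This is a minimal sketch. It assumes only that the `docker` CLI, if installed, is on `PATH`. `docker_available` is a hypothetical helper name. `docker info` should exit with a non-zero status when the client cannot reach the Docker daemon, so the return code separates "CLI installed" from "daemon running".

```python
import shutil
import subprocess


def docker_available(timeout: float = 30.0) -> bool:
    """Return True if the docker CLI is on PATH and `docker info` succeeds."""
    # Step 1: is the docker client installed at all?
    if shutil.which("docker") is None:
        return False
    # Step 2: can the client reach the Docker daemon?
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("Docker available:", docker_available())
```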

### What is a container?

A Docker container is a standardized unit of software that packages an application's code together with all of the dependencies it needs to run [[source]](https://www.docker.com/resources/what-container/).

### Why do we use containers?

We use containers because they bundle everything an application needs to run. As a result, the application behaves the same way in different computing environments, which makes results reproducible [[source]](https://www.docker.com/resources/what-container/).

### What is a docker image?

A Docker container needs files, binaries, libraries, and configurations to run [[source]](https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-an-image/). A Docker image is the standardized package that contains all of this information; a container is a running instance created from an image.

### Let's run our first docker image:

### Log in to Docker

In [2]:
# This step has to be done directly on the command line (`docker login`)

### Run your first docker container

In [131]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [132]:
!docker ps --all 

CONTAINER ID   IMAGE         COMMAND    CREATED         STATUS                    PORTS     NAMES
e27a1ad13e52   hello-world   "/hello"   2 seconds ago   Exited (0) 1 second ago             serene_aryabhata
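Instead of reading the table by eye, `docker ps` accepts a `--format` Go template, which makes its output easy to parse. The sketch below assumes the command `docker ps --all --format '{{.ID}}\t{{.Image}}\t{{.Names}}'`. The sample string reuses the container listed above, and `parse_ps_output` is a hypothetical helper.

```python
from typing import Dict, List

# Output of: docker ps --all --format '{{.ID}}\t{{.Image}}\t{{.Names}}'
# (sample built from the container listed above)
SAMPLE_PS_OUTPUT = "e27a1ad13e52\thello-world\tserene_aryabhata\n"


def parse_ps_output(text: str) -> List[Dict[str, str]]:
    """Turn tab-separated `docker ps --format` lines into dictionaries."""
    containers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines, e.g. when no containers exist
        container_id, image, name = line.split("\t")
        containers.append({"id": container_id, "image": image, "name": name})
    return containers


for c in parse_ps_output(SAMPLE_PS_OUTPUT):
    print(c["id"], c["name"])
```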


### Delete the container again and prove that it has been deleted

In [134]:
!docker rm serene_aryabhata

serene_aryabhata


In [135]:
!docker ps --all

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
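Deleting short-lived containers by hand can be avoided. `docker run --rm` removes the container automatically once it exits, so `docker ps --all` stays clean. Below is a minimal sketch of how such a call could be assembled from Python. `docker_run_command` is a hypothetical helper, and the command is only built here, not executed.

```python
import shlex
from typing import List, Sequence


def docker_run_command(image: str, args: Sequence[str] = (), remove: bool = True) -> List[str]:
    """Build a `docker run` argument list; with remove=True the container is deleted on exit."""
    cmd = ["docker", "run"]
    if remove:
        cmd.append("--rm")  # let Docker delete the container after it exits
    cmd.append(image)
    cmd.extend(args)
    return cmd


cmd = docker_run_command("hello-world")
print(shlex.join(cmd))  # docker run --rm hello-world
# To actually run it: subprocess.run(cmd, check=True)
```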


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe below the steps you took to download FastQC and run it on the example FASTQ files from last week:

1. Check whether Java is installed.
2. Download `fastqc_v0.12.1.zip`.
3. Create a directory for the installation.
4. Unzip the file.
5. Add the FastQC directory to `$PATH`.
6. Create an output directory: `mkdir DAY4_fastqc`
7. Run FastQC on the four FASTQ files: `fastqc fastq/SRX19144486_SRR23195516_1.fastq.gz fastq/SRX19144486_SRR23195516_2.fastq.gz fastq/SRX19144488_SRR23195511_1.fastq.gz fastq/SRX19144488_SRR23195511_2.fastq.gz -o DAY4_fastqc`
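Steps 5 to 7 can also be scripted. The sketch below assumes FastQC was unzipped to `~/tools/FastQC`, which is a hypothetical location. It only prepares the environment and the command rather than running FastQC. The output directory is created first because FastQC expects the directory passed to `-o` to exist.

```python
import os
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

# Hypothetical location of the unzipped FastQC folder (steps 3 and 4).
FASTQC_DIR = Path.home() / "tools" / "FastQC"

FASTQ_FILES = [
    "fastq/SRX19144486_SRR23195516_1.fastq.gz",
    "fastq/SRX19144486_SRR23195516_2.fastq.gz",
    "fastq/SRX19144488_SRR23195511_1.fastq.gz",
    "fastq/SRX19144488_SRR23195511_2.fastq.gz",
]


def prepare_fastqc(
    fastq_files: Sequence[str], outdir: str, fastqc_dir: Path = FASTQC_DIR
) -> Tuple[List[str], Dict[str, str]]:
    """Create the output directory and return the fastqc command plus its environment."""
    # Step 5: add the FastQC folder to PATH, only for the child process.
    env = dict(os.environ)
    parts = [env.get("PATH", ""), str(fastqc_dir)]
    env["PATH"] = os.pathsep.join(p for p in parts if p)
    # Step 6: FastQC expects the -o directory to exist already.
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # Step 7: a single fastqc call for all input files.
    cmd = ["fastqc", *fastq_files, "-o", outdir]
    return cmd, env


cmd, env = prepare_fastqc(FASTQ_FILES, "DAY4_fastqc")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, env=env, check=True)
```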


### Very well, now let's try to use its Docker container

1. Create a container image holding FastQC using Seqera Containers (https://seqera.io/containers/)
2. Use the container to generate a FastQC HTML report for the example FASTQ files

In [145]:
# pull the container image
!docker pull community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42

0.12.1--5cfd0f3cb6760c42: Pulling from library/fastqc

6e1d0b98: Pulling fs layer
f787139d: Pulling fs layer
b700ef54: Pulling fs layer
62c12ca7: Pulling fs layer
cbe24e91: Pulling fs layer
55536720: Pulling fs layer
b3717211: Pulling fs layer
1e94977b: Pulling fs layer
f7ad9b3c: Pulling fs layer
529b9f20: Pulling fs layer
ca300600: Pulling fs layer
d418774c: Pulling fs layer
e1d0b98: Download complete

In [283]:
# run the container and save the results to a new "fastqc_results" directory
!docker run -v $(pwd):$(pwd) -w $(pwd) community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc fastq/SRX19144486_SRR23195516_1.fastq.gz fastq/SRX19144486_SRR23195516_2.fastq.gz fastq/SRX19144488_SRR23195511_1.fastq.gz fastq/SRX19144488_SRR23195511_2.fastq.gz -o fastqc_results


/home/jana/UNI/Master/IISemester/compworkflows/day4
application/gzip
application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
application/gzip
application/gzip
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
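The `docker run` call above relies on two options. `-v $(pwd):$(pwd)` bind-mounts the current directory into the container at the same path, and `-w $(pwd)` makes it the working directory. Together they let the relative paths `fastq/...` and `fastqc_results` resolve inside the container exactly as on the host. The sketch below assembles the same call from Python, assuming the image tag used above. `containerized_fastqc` is a hypothetical helper. It also creates the output folder on the host beforehand, since FastQC expects the `-o` directory to exist.

```python
import os
import shlex
from pathlib import Path
from typing import List, Optional, Sequence

IMAGE = "community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42"


def containerized_fastqc(
    fastq_files: Sequence[str], outdir: str, workdir: Optional[str] = None
) -> List[str]:
    """Build the `docker run` command used above for the given FASTQ files."""
    workdir = os.path.abspath(workdir or os.getcwd())
    # Create the output folder on the host; the bind mount makes it visible
    # inside the container at the same path.
    Path(workdir, outdir).mkdir(parents=True, exist_ok=True)
    return [
        "docker", "run",
        "-v", f"{workdir}:{workdir}",  # mount host dir at the same path
        "-w", workdir,                 # resolve relative paths from here
        IMAGE,
        "fastqc", *fastq_files, "-o", outdir,
    ]


cmd = containerized_fastqc(["fastq/SRX19144486_SRR23195516_1.fastq.gz"], "fastqc_results")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```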


### Now that you know how to use a Docker container: which approach was easier, running everything manually or using Docker, and which approach will be easier in the future?

Pulling the Docker image was easier than going through the seven steps of the manual installation. It will also be easier in the future, because the same image can simply be reused.

### Which approach would you say is more reproducible?

The Docker approach is more reproducible. Everyone uses the same computing environment, defined by the container image, so local differences, for example in the operating system or in installed libraries, largely do not affect the results.
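Reproducibility also depends on referring to an exact image version. A reference without a tag, or with `latest`, can point to a newer build over time; the two salmon images at the end of this notebook report different versions for this reason. The sketch below is a simplified check that assumes the usual `registry/name:tag` or `name@sha256:...` form. `split_reference` and `is_pinned` are hypothetical helpers. Strictly speaking, tags can be re-pushed, so only a digest is fully immutable.

```python
from typing import Optional, Tuple


def split_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, _, digest = ref.partition("@")
    # A ':' only starts a tag if it comes after the last '/', because a
    # registry host may carry a port (e.g. localhost:5000/tool).
    tag = None
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest or None


def is_pinned(ref: str) -> bool:
    """True if the reference names a digest or an explicit tag other than 'latest'."""
    _, tag, digest = split_reference(ref)
    return digest is not None or (tag is not None and tag != "latest")


for ref in [
    "community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42",
    "combinelab/salmon:latest",
    "hello-world",
]:
    print(ref, "->", "pinned" if is_pinned(ref) else "not pinned")
```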

### Compare the files to last week's FastQC results. Are they identical?

Yes, they are identical.
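One way to check "identical" objectively is to compare checksums. Below is a minimal sketch using SHA-256 from Python's standard library. `files_identical` is a hypothetical helper, and the demo uses two made-up temporary files rather than the real reports. An HTML report may embed run-specific details such as the analysis date. For FastQC, it can therefore be more meaningful to compare the extracted `fastqc_data.txt` files than the HTML reports byte by byte.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(a, b) -> bool:
    """True if both files contain exactly the same bytes."""
    return file_sha256(a) == file_sha256(b)


# Demo with made-up temporary files standing in for two reports.
with tempfile.TemporaryDirectory() as tmp:
    report_a = Path(tmp, "report_a.txt")
    report_b = Path(tmp, "report_b.txt")
    report_a.write_bytes(b"example content\n")
    report_b.write_bytes(b"example content\n")
    same = files_identical(report_a, report_b)
    report_b.write_bytes(b"other content\n")
    different = files_identical(report_a, report_b)

print(same, different)  # True False
```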

### Is the FastQC version identical?

nf-core/rnaseq used FastQC version 0.12.1, and the Docker container created above uses the same FastQC version:

In [51]:
!docker run -v $(pwd):$(pwd) -w $(pwd) community.wave.seqera.io/library/fastqc:0.12.1--5cfd0f3cb6760c42 fastqc --version

FastQC v0.12.1


## Dockerfiles

So far, we have used existing Docker containers and images directly in our analyses.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers run Linux (here a Debian-based image), so you can install "cowsay" with the `apt-get` command

In [52]:
# open the file "my_dockerfile" in a text editor

### Explain the RUN and ENV lines you added to the file

RUN apt-get update<br>
RUN apt-get install -y curl<br>
RUN apt-get install -y cowsay<br>
updates the package lists used by `apt-get` and then installs curl and cowsay (`-y` answers the installation prompt with "yes" automatically)


ENV PATH="$PATH:/usr/games"<br>
appends /usr/games to `PATH`, because Debian installs the cowsay executable in /usr/games, which is not on the default `PATH`
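For comparison, a minimal Dockerfile along these lines could combine the package commands into a single `RUN` instruction. This is a common Docker practice: `apt-get update` and `apt-get install` then always run in the same layer, and the package lists can be deleted afterwards to keep the image small. To stay in this notebook's Python, the sketch keeps the file content in a string. It is not the original `my_dockerfile`. The base image `debian:bullseye-slim` is the one shown in the build log, the file name `cowsay.Dockerfile` is hypothetical, and curl is left out because cowsay does not need it.

```python
from pathlib import Path

# Sketch of an equivalent Dockerfile (not the original "my_dockerfile").
COWSAY_DOCKERFILE = """\
FROM debian:bullseye-slim
# Update the package lists and install cowsay in one layer, then clean up.
RUN apt-get update \\
    && apt-get install -y --no-install-recommends cowsay \\
    && rm -rf /var/lib/apt/lists/*
# Debian installs cowsay into /usr/games, which is not on the default PATH.
ENV PATH="$PATH:/usr/games"
"""


def write_dockerfile(path: str, content: str = COWSAY_DOCKERFILE) -> Path:
    """Write the Dockerfile text to disk and return its path."""
    target = Path(path)
    target.write_text(content)
    return target


# Usage: write_dockerfile("cowsay.Dockerfile"), then on the command line:
#   docker build -f cowsay.Dockerfile -t jana/cowsay .
```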

In [53]:
# build the docker image
!pwd
!docker build -f ./my_dockerfile . -t jana/cowsay

/home/jana/UNI/Master/IISemester/compworkflows/day4
[1A[1B[0G[?25l[+] Building 0.0s (0/0)  docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.0s (0/0)  docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 857B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from my_dockerfile                    0.0s
[0m[34m => => transferring dockerfile: 857B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-sli

In [54]:
# check whether the image has been built
!docker image ls

REPOSITORY                                TAG                        IMAGE ID       CREATED          SIZE
jana/salmon                               latest                     fd962ac93118   13 minutes ago   820MB
jana/salmon_docker                        latest                     4401540b17c9   13 minutes ago   820MB
jana/cowsay                               latest                     787b98791d60   2 hours ago      230MB
<none>                                    <none>                     18624251e7df   2 hours ago      230MB
community.wave.seqera.io/library/fastqc   0.12.1--5cfd0f3cb6760c42   0c524d3abe26   4 months ago     1.39GB
combinelab/salmon                         latest                     cefd8bb0b2ed   6 months ago     152MB
hello-world                               latest                     91fb4b041da2   17 months ago    24.4kB
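The `<none>` entry is a dangling image. Such images are typically left behind when a tag such as `jana/cowsay` is reassigned to a newer build. Dangling images can be listed with `docker image ls --filter dangling=true` and removed with `docker image prune`. The sketch below finds them from `docker image ls --format '{{.Repository}}\t{{.Tag}}\t{{.ID}}'` output, using three rows from the listing above as sample data. `dangling_images` is a hypothetical helper.

```python
from typing import List

# Output of: docker image ls --format '{{.Repository}}\t{{.Tag}}\t{{.ID}}'
# (three rows from the listing above)
SAMPLE_IMAGES = """\
jana/cowsay\tlatest\t787b98791d60
<none>\t<none>\t18624251e7df
hello-world\tlatest\t91fb4b041da2
"""


def dangling_images(text: str) -> List[str]:
    """Return the IDs of images that have neither a repository nor a tag."""
    ids = []
    for line in text.splitlines():
        if not line.strip():
            continue
        repository, tag, image_id = line.split("\t")
        if repository == "<none>" and tag == "<none>":
            ids.append(image_id)
    return ids


print(dangling_images(SAMPLE_IMAGES))  # ['18624251e7df']
```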


In [55]:
# run the Docker image
!docker run jana/cowsay cowsay muh

 _____
< muh >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics: create a new Dockerfile that holds the salmon tool used in nf-core/rnaseq

To do so, use "curl" in your new Dockerfile to download salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
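Below is a minimal sketch of such a Dockerfile, again kept in a Python string. It is not the `salmon_docker` file used below and has not been built here. It assumes the same `debian:bullseye-slim` base as in the build logs and a hypothetical install prefix `/opt/salmon`. It also assumes the release archive unpacks to a single top-level folder containing `bin/salmon`, which is why it uses `--strip-components=1`. `curl -L` follows GitHub's redirect to the actual download, `-f` makes HTTP errors fail the build, and `ca-certificates` is installed so that curl can verify the HTTPS connection in the slim image.

```python
SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    "v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz"
)

# Sketch only; /opt/salmon is a hypothetical install prefix.
SALMON_DOCKERFILE = f"""\
FROM debian:bullseye-slim
RUN apt-get update \\
    && apt-get install -y --no-install-recommends curl ca-certificates \\
    && rm -rf /var/lib/apt/lists/*
# Download the release and unpack it without its top-level folder.
RUN mkdir -p /opt/salmon \\
    && curl -fsSL {SALMON_URL} \\
    | tar -xzf - --strip-components=1 -C /opt/salmon
ENV PATH="/opt/salmon/bin:$PATH"
"""

print(SALMON_DOCKERFILE)
```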

In [56]:
# use the file "salmon_docker" in this directory to build a new docker image
!docker build -f ./salmon_docker . -t jana/salmon_docker

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 676B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.1s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 676B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.5s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker  

In [57]:
# build the image
!docker build -f ./salmon_docker . -t jana/salmon

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                    docker:desktop-linux
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 676B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/2)                                    docker:desktop-linux
[34m => [internal] load build definition from salmon_docker                    0.0s
[0m[34m => => transferring dockerfile: 676B                                       0.0s
[0m[34m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[0m[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.4s (12/13)                                  docker:desktop-linux
[34m => [internal] load build definition from salmon

In [58]:
# run the Docker image to print the salmon version and usage

!docker run jana/salmon salmon

salmon v1.5.2

Usage:  salmon -h|--help or 
        salmon -v|--version or 
        salmon -c|--cite or 
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index      : create a salmon index
     quant      : quantify a sample
     alevin     : single cell analysis
     swim       : perform super-secret operation
     quantmerge : merge multiple quantifications into a single file


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

In [59]:
#!docker pull combinelab/salmon:latest
!docker run combinelab/salmon:latest salmon

salmon v1.10.3

Usage:  salmon -h|--help or 
        salmon -v|--version or 
        salmon -c|--cite or 
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index      : create a salmon index
     quant      : quantify a sample
     alevin     : single cell analysis
     swim       : perform super-secret operation
     quantmerge : merge multiple quantifications into a single file


BioContainers is a community project that provides guidelines for developing reproducible pipelines and good software [[source]](https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html). It supports reproducibility through Conda packages as well as Docker and Singularity containers. In addition, BioContainers provides ready-to-use containers for many bioinformatics tools.
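The two salmon runs above report different versions: v1.5.2 from our own image and v1.10.3 from `combinelab/salmon:latest`. When checking tool versions programmatically, they should be compared numerically, because as plain strings "1.10.3" sorts before "1.5.2". Below is a minimal sketch. `parse_version` is a hypothetical helper that extracts the first dotted version number from a tool's output.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first version like 'v1.10.3' from tool output as a tuple of ints."""
    match = re.search(r"v?(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


own_build = parse_version("salmon v1.5.2")
latest = parse_version("salmon v1.10.3")

print(own_build, latest)   # (1, 5, 2) (1, 10, 3)
print(latest > own_build)  # True: numeric comparison
print("1.10.3" > "1.5.2")  # False: string comparison is misleading
```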
