# A short introduction to containerized software

Having used nf-core pipelines to answer bioinformatics questions, we will now look at the processes that lie behind these pipelines.

Today, we will focus on containerization, namely via Docker. 



1. Check if Docker is installed.

In [1]:
!docker info

Client:
 Version:    28.4.0
 Context:    default
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v1.9.11
    Path:     /usr/local/lib/docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.28.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  cloud: Docker Cloud (Docker Inc.)
    Version:  v0.4.29
    Path:     /usr/local/lib/docker/cli-plugins/docker-cloud
  compose: Docker Compose (Docker Inc.)
    Version:  v2.39.4-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.42
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Docker Inc.)
    Version:  v0.2.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-desktop
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.31
    Path:     /usr/local/lib/docker/cli-plugins/docker
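The same check can be scripted from the notebook's Python kernel. The sketch below is only one possible approach: the function name `docker_status` and its three return values are made up for illustration, and it assumes that `docker info` exits with a non-zero code when the daemon cannot be reached, as recent Docker versions do.

```python
import shutil
import subprocess


def docker_status() -> str:
    """Return 'missing', 'no-daemon', or 'ok' for the local Docker setup.

    Hypothetical helper for illustration, not part of Docker itself.
    """
    # shutil.which searches PATH much like the shell does.
    if shutil.which("docker") is None:
        return "missing"
    # Assumption: `docker info` fails (non-zero exit) if the daemon is unreachable.
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return "no-daemon"
    return "ok" if result.returncode == 0 else "no-daemon"


print(docker_status())
```

Separating "client not installed" from "daemon not reachable" helps, because `docker info` can print the client section shown above even when no daemon is running.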

### What is a container?

In [None]:
# Skipped: we were told to ignore this question.

### Why do we use containers?

In [None]:
# We use containers to run the same code in different environments without having to
# rely on virtual machines. A container shares the host's OS kernel instead of booting its own guest OS.
# This makes it more lightweight and efficient than a VM.

### What is a docker image?

In [None]:
# A Docker image is a package which contains everything needed to run a piece of software,
# including the code, runtime, libraries, environment variables, and config files.
# It is divided into layers, each layer representing an instruction in the image's Dockerfile.
# It is lightweight and read-only, as well as portable, meaning that it can be run on different systems
# in the same way.
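To make the idea of layers more concrete, here is a toy model in Python. It is a deliberate simplification and not how Docker computes its real layer digests: each step gets an ID derived from its parent's ID and the instruction text. The function name `layer_ids` and the three instructions are illustrative assumptions. The point is that changing one instruction changes that step and every step after it, while the steps before it can be reused.

```python
import hashlib


def layer_ids(instructions):
    """Toy model: derive one short ID per instruction from its parent ID and text."""
    ids = []
    parent = ""
    for instruction in instructions:
        digest = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        ids.append(digest[:12])
        parent = digest  # the next step depends on everything before it
    return ids


# Illustrative instructions only; any Dockerfile-like lines would do.
original = [
    "FROM debian:bullseye-slim",
    "RUN apt-get update && apt-get install -y cowsay curl",
    'ENV PATH="/usr/games:${PATH}"',
]
# Same file with the second instruction edited.
changed = original[:1] + ["RUN apt-get update && apt-get install -y cowsay"] + original[2:]

for before, after, text in zip(layer_ids(original), layer_ids(changed), original):
    status = "reused" if before == after else "rebuilt"
    print(f"{status:8} {text}")
```

This is why rarely changing instructions are usually placed near the top of a Dockerfile: everything below an edited line has to be rebuilt.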

### Let's run our first docker image:

### Login to docker

In [None]:
# This needs to be done directly on the command line.
# I am already logged in to Docker automatically.

### Run your first docker container

In [2]:
!docker run hello-world


Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/



### Find the container ID

In [3]:
!docker ps -a

CONTAINER ID   IMAGE                               COMMAND                  CREATED              STATUS                          PORTS     NAMES
13c115c429a1   hello-world                         "/hello"                 About a minute ago   Exited (0) About a minute ago             distracted_shannon
112f0a5f7ca5   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   25 hours ago         Exited (4) 25 hours ago                   nxf-q0zow7CTGlZGcsxa42VIK2wZ
fe77f786ce8a   hello-world                         "/hello"                 2 days ago           Exited (0) 2 days ago                     confident_faraday


In [None]:
# The container ID is 13c115c429a1 for the hello-world container
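Reading the ID off the table works for a handful of containers. For longer lists, `docker ps` accepts a `--format` template, which makes the output easier to parse. The sketch below assumes the tab-separated template given in the comment and uses the rows from the table above as sample text, so it runs without Docker; `parse_ps` is a hypothetical helper name.

```python
def parse_ps(text):
    """Parse tab-separated `ID<TAB>Image<TAB>Status` lines into dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        container_id, image, status = line.split("\t", 2)
        rows.append({"id": container_id, "image": image, "status": status})
    return rows


# Sample text in the shape assumed to be produced by:
#   docker ps -a --format "{{.ID}}\t{{.Image}}\t{{.Status}}"
# The values are copied from the table above.
sample = (
    "13c115c429a1\thello-world\tExited (0) About a minute ago\n"
    "112f0a5f7ca5\tquay.io/biocontainers/wget:1.20.1\tExited (4) 25 hours ago\n"
    "fe77f786ce8a\thello-world\tExited (0) 2 days ago\n"
)

hello_ids = [row["id"] for row in parse_ps(sample) if row["image"] == "hello-world"]
print(hello_ids)
```

Two containers were created from the hello-world image, so the image name alone does not identify a single container; the ID or the generated name does.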

### Delete the container again and prove that it is deleted

In [None]:
!docker rm 13c115c429a1

# deletes the container with this ID (the hello-world container)

13c115c429a1


In [None]:
!docker ps -a

# here we can see that the container is not there anymore

CONTAINER ID   IMAGE                               COMMAND                  CREATED        STATUS                    PORTS     NAMES
112f0a5f7ca5   quay.io/biocontainers/wget:1.20.1   "/bin/bash -c 'eval …"   25 hours ago   Exited (4) 25 hours ago             nxf-q0zow7CTGlZGcsxa42VIK2wZ
fe77f786ce8a   hello-world                         "/hello"                 2 days ago     Exited (0) 2 days ago               confident_faraday


### FastQC is a very useful tool, as you learned last week. Let's try to run it from the command line

Link to the software: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Please describe the steps you took to download and run the software for the example fastq file from last week below:

1. I installed FastQC into my conda environment (conda install fastqc).
2. Then I ran fastqc sample.fastq.gz on the RNA-seq FASTQ files from day 2.

...

In [None]:
!fastqc ../day_02/usb_data/fastq/SRX19144486_SRR23195516_1.fastq.gz -o fastqc_out_1

### Very well, now let's try to make use of its docker container

1. create a container holding fastqc using seqera containers (https://seqera.io/containers/)
2. use the container to generate a fastqc html of the example fastq file

In [10]:
# pull the container

!docker pull community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29


0.12.1--af7a5314d5015c29: Pulling from library/fastqc

c6865366: Pulling fs layer
Digest: sha256:b7f6caf359264cf86da901b0aa5f66735a6506fcfbf103c66db6987253ad44c1

In [17]:
!pwd

/home/lorena/loboehme1/notebooks/day_03_part2


In [20]:
# run the container and save the results to the "fastqc_out_2" directory

!docker run -v /home/lorena/loboehme1/notebooks:/data community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29 fastqc /data/day_02/usb_data/fastq/SRX19144486_SRR23195516_1.fastq.gz -o /data/day_03_part2/fastqc_out_2


application/gzip
Started analysis of SRX19144486_SRR23195516_1.fastq.gz
Approx 5% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 10% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 15% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 20% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 25% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 30% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 35% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 40% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 45% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 50% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 55% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 60% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 65% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 70% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 75% complete for SRX19144486_SRR23195516_1.fastq.gz
Approx 80% complete for SRX19144486_SRR23195
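The part of this command that is easiest to get wrong is the bind mount: `-v host_dir:container_dir` makes the host directory visible inside the container under a different path, so every file argument has to use the container-side path. The sketch below shows that translation in Python. The helper names `to_container_path` and `fastqc_command` are hypothetical, and `--rm` (remove the container after it exits) is an addition that the command above does not use.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_root, mount_point):
    """Translate a host path below host_root into its path inside the container."""
    # relative_to raises ValueError if host_path is not below host_root,
    # which is exactly the case where the container could not see the file.
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(mount_point) / relative)


def fastqc_command(image, host_root, fastq, out_dir, mount_point="/data"):
    """Build the argument list for `docker run` with a single bind mount."""
    return [
        "docker", "run", "--rm",              # --rm is an assumption, see lead-in
        "-v", f"{host_root}:{mount_point}",   # host directory : container directory
        image,
        "fastqc", to_container_path(fastq, host_root, mount_point),
        "-o", to_container_path(out_dir, host_root, mount_point),
    ]


root = "/home/lorena/loboehme1/notebooks"
argv = fastqc_command(
    "community.wave.seqera.io/library/fastqc:0.12.1--af7a5314d5015c29",
    root,
    f"{root}/day_02/usb_data/fastq/SRX19144486_SRR23195516_1.fastq.gz",
    f"{root}/day_03_part2/fastqc_out_2",
)
print(" ".join(argv))
```

Mounting the shared parent directory `notebooks` is what lets one mount cover both the day_02 input and the day_03_part2 output.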

### Now that you know how to use a docker container, which approach was easier and which approach will be easier in the future?

In [None]:
# At first, running FastQC directly is still easier because I already knew how to do it. In the future,
# once we are used to containers, I expect them to be easier because we do not have to install anything
# on our local machine and we can be sure that the code will run the same way on different machines.
# We can also use different versions of the same software without installing them locally
# and without having to worry about dependencies.

### What would you say, which approach is more reproducible?

In [None]:
# Docker is definitely more reproducible and more portable, as stated above, because the image
# pins the tool version together with its dependencies.

### Compare the file to last week's fastqc results, are they identical?
### Is the fastqc version identical?

In [32]:
# Yes, it seems to be identical which is to be expected as we did the same thing.
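The version question can also be checked programmatically instead of by eye. The sketch below assumes that each FastQC zip archive contains a `fastqc_data.txt` whose first line has the form `##FastQC<TAB>version`. The function names are hypothetical, and `demo_archive` builds a small in-memory stand-in so the example runs without the real reports.

```python
import io
import zipfile


def fastqc_version(zip_source):
    """Return the version string from the first line of fastqc_data.txt."""
    with zipfile.ZipFile(zip_source) as archive:
        # Assumed layout: <sample>_fastqc/fastqc_data.txt inside the archive.
        member = next(n for n in archive.namelist() if n.endswith("fastqc_data.txt"))
        with archive.open(member) as handle:
            first_line = handle.readline().decode().strip()
    # Assumed header shape: "##FastQC<TAB>0.12.1"
    return first_line.split("\t")[-1]


def demo_archive():
    """Build a tiny in-memory stand-in for a FastQC zip (illustrative content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo_fastqc/fastqc_data.txt",
            "##FastQC\t0.12.1\n>>Basic Statistics\tpass\n",
        )
    buffer.seek(0)
    return buffer


versions = [fastqc_version(demo_archive()), fastqc_version(demo_archive())]
print("identical" if len(set(versions)) == 1 else "different", versions)
```

To compare the real runs, pass the paths of the two `_fastqc.zip` files from the conda run and the Docker run instead of the demo archives.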

## Dockerfiles

We have now used Docker containers and images directly to boost our research.

Let's create our own toy Dockerfile including the "cowsay" tool (https://en.wikipedia.org/wiki/Cowsay)

Hints:
1. Docker containers typically run Linux; with a Debian-based image, apt-get is the command you need to install "cowsay"

In [None]:
# open the file "my_dockerfile" in a text editor

[1B[m[m a slim version of the Debian operating system.the dependencies for the image.[m[7m>[13;1H[m[36m# !TODO: add an ENV line to set environmental variables. In this case, it shoul[m[7m>[2;1H[m    [1;79H[m[22;11H[7m[ line  1/13 ( 7%), col  1/119 (  0%), char   0/675 ( 0%) ][m[22;11H                  [7m[ Justified paragraph ][m[K[1;51H[7m*[27C[m[3;21r[3;1HM[1;24r[2;73H[K

### Explain the RUN and ENV lines you added to the file

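RUN apt-get update && apt-get install -y cowsay curl: refreshes the package index, then installs cowsay and curl inside the image's Linux environment. The -y flag answers the installation prompt automatically, which is needed because a build is non-interactive.

ENV PATH="/usr/games:${PATH}": prepends /usr/games to the existing PATH. Debian installs cowsay into /usr/games, which is not on the default PATH, so without this line the cowsay command would not be found. Because the directory is prepended, it is searched before the rest of the PATH.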
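The effect of the ENV line can be reproduced outside Docker, since `shutil.which` looks a command up along a PATH string much as the shell does. In the sketch below, two temporary directories stand in for /usr/games and for a directory on the default PATH. All names are illustrative, and the example assumes a POSIX system.

```python
import os
import shutil
import stat
import tempfile


def make_tool(directory, name):
    """Create a small executable file so that shutil.which can find it."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as games, tempfile.TemporaryDirectory() as usr_bin:
    cowsay_in_games = make_tool(games, "cowsay")   # only exists in "games"
    make_tool(usr_bin, "tool")                     # exists in both directories
    tool_in_games = make_tool(games, "tool")

    default_path = usr_bin                          # stands in for the default PATH
    extended_path = games + os.pathsep + usr_bin    # stands in for "/usr/games:${PATH}"

    found_default = shutil.which("cowsay", path=default_path)
    found_extended = shutil.which("cowsay", path=extended_path)
    first_match = shutil.which("tool", path=extended_path)

print(found_default)                      # not found on the default PATH
print(found_extended == cowsay_in_games)  # found once "games" is on the PATH
print(first_match == tool_in_games)       # the prepended directory wins
```

The last check shows the ordering: when a command exists in several PATH directories, the first directory listed is the one that is used.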

In [30]:
# build the docker image
!docker build -t my-cowsay -f Dockerfile .

# docker build builds an image from a Dockerfile
# -t names (tags) the image
# -f specifies the Dockerfile to use
# . specifies the build context (here the current directory)

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.2s (1/2)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 803B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.2s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (1/2)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 803B                                       0.0s
[0m => [internal] load metadata for docker.io/library/debian:bullseye-slim    0.3s
[?25h[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (2/2)                                          docker:default
[34m => [internal] load build definition

In [26]:
# make sure that the image has been built

!docker images


REPOSITORY                                                 TAG                        IMAGE ID       CREATED          SIZE
my-cowsay                                                  latest                     8b562a3d9e65   44 seconds ago   229MB
hello-world                                                latest                     54e66cc1dd1f   7 weeks ago      20.3kB
community.wave.seqera.io/library/fastqc                    0.12.1--af7a5314d5015c29   b7f6caf35926   11 months ago    1.37GB
quay.io/biocontainers/r-shinyngs                           1.8.8--r43hdfd78af_0       e0de72408557   17 months ago    1.99GB
quay.io/biocontainers/atlas-gene-annotation-manipulation   1.1.1--hdfd78af_0          099d0e113ec8   18 months ago    1.82GB
quay.io/nf-core/ubuntu                                     20.04                      59e9d08d8dc1   2 years ago      110MB
quay.io/biocontainers/pandas                               1.5.2                      cbb54fcf8730   2 years ago      493MB
quay.

In [28]:
# run a container from the image

!docker run my-cowsay cowsay "Hello from my custom docker image!"


 ____________________________________
< Hello from my custom docker image! >
 ------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


## Let's do some bioinformatics with Docker and create a new Dockerfile that holds the salmon tool used in RNA-seq

To do so, use "curl" in your new dockerfile to get salmon from https://github.com/COMBINE-lab/salmon/releases/download/v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz
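One way such a Dockerfile could look is sketched below, held in a Python string so it can be inspected from the notebook. Several things here are assumptions rather than part of the exercise: the debian:bullseye-slim base image, the layout of the release tarball (a top-level salmon-* directory with bin/ and lib/ subdirectories), and the image name my-salmon. The sketch writes the file to a temporary directory so that it does not overwrite the course file.

```python
import tempfile
from pathlib import Path

SALMON_URL = (
    "https://github.com/COMBINE-lab/salmon/releases/download/"
    "v1.5.2/salmon-1.5.2_linux_x86_64.tar.gz"
)

# Assumption: the tarball unpacks to /salmon-*/ containing bin/ and lib/.
DOCKERFILE = f"""\
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y curl
RUN curl -sSL {SALMON_URL} | tar xz \\
    && mv /salmon-*/bin/* /usr/bin/ \\
    && mv /salmon-*/lib/* /usr/lib/
"""

with tempfile.TemporaryDirectory() as workdir:
    dockerfile_path = Path(workdir) / "salmon_docker"
    dockerfile_path.write_text(DOCKERFILE)
    written = dockerfile_path.read_text()

print(written)
# Commands one might then run in the directory holding the file
# ("my-salmon" is a hypothetical image name):
print("docker build -t my-salmon -f salmon_docker .")
print("docker run my-salmon salmon --version")
```

Piping curl straight into tar avoids leaving the downloaded archive inside the image layer.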

In [None]:
# use the file "salmon_docker" in this directory to build a new docker image

In [None]:
# build the image


In [1]:
# run the docker image to give out the version of salmon


## Do you think bioinformaticians have to create a docker image every time they want to run a tool?

Find the salmon docker image online and run it on your computer.

What is https://biocontainers.pro/ ?

## Are there other ways to create Docker (or Apptainer) images?

What is https://seqera.io/containers/ ?