Lab 05: Finding and Using Containers

Beant Kapoor edited this page Jan 23, 2024 · 5 revisions

Finding and Using Containers

One of the first steps in using existing hosted images is to find them. There are many sites hosting images, but it's important to select signed images from a trusted source. Fortunately, many of the commonly used images in bioinformatics are hosted via the BioContainers community project or similarly curated groups.

Good places to find bioinformatics containers include:

Practice Dataset

For this pipeline, we'll be using a practice RNA-Seq dataset from the Griffith Lab. This dataset is ideal for practice because it consists of 6 human RNA samples subset down to the 22nd chromosome only.

If you haven't already done so, navigate to the /lustre/isaac/proj/UTK0262/<your_username>/NF_workshop/raw_data/fastq_seqs directory and run the get_fastq_seqs.sh script to start downloading the raw fastq reads. While they download, continue below.

Find FastQC Image

Let's find the first application we want to use in our pipeline, FastQC.

quay.io BioContainers FastQC

Although we could just pull the latest image, it's good practice to look at the recent releases on the "tags" page and check for major security concerns. Additionally, linking to a specific release should keep your data analysis pipeline reproducible: a pinned version cannot change underneath you, so a package update cannot suddenly produce output that clashes with downstream processes.

After assessing the tags and finding the desired version, click on "Fetch Tag" and choose a platform to get a link to the container that is easy to copy and paste. For example, quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0.

Because ISAAC-NG already has Singularity available, we can go ahead and pull this image using the singularity pull command. Because this is a Docker image, we need to prefix the above "quay" path with docker:// when pulling it:

singularity pull docker://quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0

However, you don't need to run this command yourself: it has already been saved in a bash script for you at <your_username>/NF_workshop/exercises/05_singularity/pull_fastqc.sh. Change to the 05_singularity directory and run:

bash pull_fastqc.sh

Now we should have "fastqc_0.12.1--hdfd78af_0.sif" sitting in our current directory. Let's take a look at it!
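The file name comes from the image reference: by default, singularity pull names the file after the last path component of the reference, with the colon before the tag replaced by an underscore. The sketch below reproduces that rule with plain bash parameter expansion so you can predict the name before pulling. The sif_name_for function is a hypothetical helper written for this lab, not part of Singularity, and it assumes the reference includes an explicit tag.

```shell
#!/usr/bin/env bash
# Sketch: predict the default file name written by `singularity pull`.
# sif_name_for is a hypothetical helper, not a Singularity command.
# Assumption: the image reference carries an explicit tag (name:tag).
sif_name_for() {
  local uri="$1"
  local ref="${uri#*://}"   # drop the transport prefix, e.g. docker://
  local base="${ref##*/}"   # keep the last path component, i.e. name:tag
  printf '%s.sif\n' "${base/:/_}"   # colon before the tag becomes an underscore
}

sif_name_for docker://quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
```

For the FastQC reference used here, this prints fastqc_0.12.1--hdfd78af_0.sif, which is the file name used in the commands that follow.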

Interactive Container

Let's run our container in such a way that we can "hop inside" the container and see what it has to offer. Run the following:

singularity shell -B <full path to fastq sequences> fastqc_0.12.1--hdfd78af_0.sif

⚠️ By default, Singularity mounts only a few host locations into the container, such as your home directory. To access other locations, pass them with the "-B" (bind) option. If your directory contains symlinks whose targets are outside the mounted directories, Singularity will be unable to locate those files.
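To check for the symlink problem before starting the container, you can list the symlinks in a directory whose targets resolve outside it. This is a minimal sketch that assumes GNU readlink (for the -f option) is available on the host; resolves_inside and list_escaping_links are hypothetical helper names, not Singularity commands.

```shell
#!/usr/bin/env bash
# Sketch: find symlinks whose targets fall outside the directory you plan to bind.
# Assumption: GNU readlink with -f is available on the host.

# resolves_inside DIR PATH: succeed if PATH's resolved target lies inside DIR.
resolves_inside() {
  local bind_dir target
  bind_dir="$(readlink -f -- "$1")" || return 2
  target="$(readlink -f -- "$2")" || return 2
  case "$target" in
    "$bind_dir"|"$bind_dir"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# list_escaping_links DIR: print each symlink directly in DIR that points outside DIR.
list_escaping_links() {
  local dir="$1" link
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    resolves_inside "$dir" "$link" || printf '%s\n' "$link"
  done
}
```

Running list_escaping_links on your fastq directory prints nothing if every symlink stays inside it. Any path it does print needs its target directory bound as well, for example with an additional -B option.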

Once the image loads, try the following commands:

cat /etc/os-release

The output tells us that the operating system inside this container is "Debian GNU/Linux 10 (buster)". This is pretty cool because, at the time of writing, our host node on ISAAC-NG is running "Red Hat Enterprise Linux 8.7 (Ootpa)".

Now let's make sure fastqc is available in our path by running it with "--version". Let's also change to our "fastq_seqs" directory to actually test out the image.

cd <path to fastq seqs>

fastqc --version

mkdir shell_test
fastqc -o shell_test \
  HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
  HBR_Rep2_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz

To exit the interactive shell, simply type:

exit

Running Container Commands with "Exec"

Now that we've seen how to run a Singularity container interactively, we can run the same FastQC command from the host shell, without entering the container, using exec.

cd <path to fastq seqs>

mkdir exec_test
singularity exec -B "$PWD" ../../exercises/05_singularity/fastqc_0.12.1--hdfd78af_0.sif \
fastqc -o exec_test \
  HBR_Rep1_ERCC-Mix2_Build37-ErccTranscripts-chr22.read1.fastq.gz \
  HBR_Rep2_ERCC-Mix2_Build37-ErccTranscripts-chr22.read2.fastq.gz

Here we used the $PWD environment variable to tell Singularity to bind our current working directory.
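Note that $PWD is expanded by the host shell before Singularity starts, so Singularity receives an absolute path. One way to see this is a dry run that assembles the command and prints it instead of executing it. The sketch below does that; fastqc_exec_cmd is a hypothetical helper for illustration only, and it uses just the singularity exec and -B options shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the `singularity exec` command for a set of fastq files
# without running it. fastqc_exec_cmd is a hypothetical helper name.
fastqc_exec_cmd() {
  local sif="$1" outdir="$2"
  shift 2
  # "$PWD" is expanded here, by the host shell, to an absolute path.
  local cmd=(singularity exec -B "$PWD" "$sif" fastqc -o "$outdir" "$@")
  printf '%q ' "${cmd[@]}"   # %q shell-quotes each argument for safe copy and paste
  printf '\n'
}
```

For example, from the fastq_seqs directory, fastqc_exec_cmd ../../exercises/05_singularity/fastqc_0.12.1--hdfd78af_0.sif exec_test *.fastq.gz prints the full command with the bind path and every matching file spelled out, which you can inspect before running it for real.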