<a href="https://colab.research.google.com/github/eharrelson17/metagenomics_qiime2/blob/main/qiime2_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a quick tutorial on how to use QIIME 2 on Google Colab. We will be using preprocessed FASTA files whose primers and barcodes have already been trimmed.

Note: Google Colab is a Python notebook that runs on a Linux virtual machine. To run terminal commands from the notebook instead of Python code, we have to prefix them so the notebook knows the difference. That is why commands here start with **!** (run a shell command) or **%** (a notebook "magic" such as %cd). When doing command line work locally in a terminal, you do not need these prefixes.
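As a rough illustration of what the two prefixes do, the sketch below imitates them in plain Python. The helper name `run_shell` is made up for this example, and this is only an approximation of the notebook's behavior, not how Colab implements it. The point to notice is that a shell command runs in its own short-lived process, so changing directory with **!** would not carry over to later cells, which is why **%cd** is used for that.

```python
import os
import subprocess
import tempfile


def run_shell(command):
    """Run a shell command and return its standard output as text."""
    result = subprocess.run(
        command, shell=True, check=True, capture_output=True, text=True
    )
    return result.stdout


# Roughly what `!echo hello` does in a notebook cell.
print(run_shell("echo hello"), end="")

# Roughly what `%cd some_dir` does: change the working directory of the
# notebook process itself, so that later cells start from there.
start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    print(os.getcwd())
    os.chdir(start_dir)  # go back before the temporary directory is deleted
```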

The first step is to download the files needed for this tutorial.

In [1]:
!git clone https://github.com/eharrelson17/metagenomics_qiime2.git

Cloning into 'metagenomics_qiime2'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 20 (delta 5), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (20/20), 2.53 MiB | 3.16 MiB/s, done.
Resolving deltas: 100% (5/5), done.


This next command will take us into the directory with the correct files.

In [2]:
%cd metagenomics_qiime2/

/content/metagenomics_qiime2


Do not be scared of this next part; all you have to do is run the code. It installs Miniforge (a minimal conda installer) and then uses it to install QIIME 2. Installing locally is very different and does not require a Python script. Do not try to run this code outside of Google Colab. You can find this code in the file labeled qiime2_installation.py.

In [3]:
#!/usr/bin/env python3

"""Set up Qiime 2 on Google colab.

Do not use this on a local machine, especially not as an admin!
"""

import os
import sys
import shutil
from subprocess import Popen, PIPE

r = Popen(["pip", "install", "rich"])
r.wait()
from rich.console import Console  # noqa
con = Console()

PREFIX = "/usr/local/miniforge3/"

has_conda = "conda version" in os.popen("%s/bin/conda info" % PREFIX).read()
has_qiime = "QIIME 2 release:" in os.popen("qiime info").read()


MINICONDA_PATH = (
    "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
)

QIIME_YAML_TEMPLATE = (
    "https://data.qiime2.org/distro/amplicon/qiime2-amplicon-{version}-py{python}-linux-conda.yml"
)

if len(sys.argv) == 2:
    version = sys.argv[1]
else:
    version = "2023.9"

if tuple(float(v) for v in version.split(".")) < (2021, 4):
    pyver = "36"
else:
    pyver = "38"

CONDA = "mamba"
CONDA_ARGS = ["-q"]

if tuple(float(v) for v in version.split(".")) < (2023, 9):
    QIIME_YAML_TEMPLATE = (
        "https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2023.9-py38-linux-conda.yml"
    )

QIIME_YAML_URL = QIIME_YAML_TEMPLATE.format(version=version, python=pyver)
QIIME_YAML = os.path.basename(QIIME_YAML_URL)


def cleanup():
    """Remove downloaded files."""
    if os.path.exists(os.path.basename(MINICONDA_PATH)):
        os.remove(os.path.basename(MINICONDA_PATH))
    if os.path.exists(QIIME_YAML):
        os.remove(QIIME_YAML)
    if os.path.exists("/content/sample_data"):
        shutil.rmtree("/content/sample_data")
    con.log(":broom: Cleaned up unneeded files.")


def run_and_check(args, check, message, failure, success, console=con):
    """Run a command and check that it worked."""
    console.log(message)
    r = Popen(args, env=os.environ, stdout=PIPE, stderr=PIPE,
              universal_newlines=True)
    o, e = r.communicate()
    out = o + e
    if r.returncode == 0 and check in out:
        console.log("[blue]%s[/blue]" % success)
    else:
        console.log("[red]%s[/red]" % failure, out)
        cleanup()
        sys.exit(1)


if __name__ == "__main__":
    if not has_conda:
        run_and_check(
            ["wget", MINICONDA_PATH],
            "saved",
            ":snake: Downloading miniforge...",
            "failed downloading miniforge :sob:",
            ":snake: Done."
        )

        run_and_check(
            ["bash", os.path.basename(MINICONDA_PATH), "-bfp", PREFIX],
            "installation finished.",
            ":snake: Installing miniforge...",
            "could not install miniforge :sob:",
            ":snake: Installed miniforge to `/usr/local`."
        )
    else:
        con.log(":snake: Miniforge is already installed. Skipped.")

    if not has_qiime:
        run_and_check(
            ["wget", QIIME_YAML_URL],
            "saved",
            ":mag: Downloading Qiime 2 package list...",
            "could not download package list :sob:",
            ":mag: Done."
        )

        run_and_check(
            [PREFIX + "bin/" + CONDA, "env", "create", *CONDA_ARGS, "--prefix", "/usr/local", "--file", QIIME_YAML],
            "Verifying transaction: ...working... done",
            ":mag: Installing Qiime 2. This may take a little bit.\n :clock1:",
            "could not install Qiime 2 :sob:",
            ":mag: Done."
        )

        run_and_check(
            ["pip", "install", "empress"],
            "Successfully installed empress-",
            ":evergreen_tree: Installing Empress...",
            "could not install Empress :sob:",
            ":evergreen_tree: Done."
        )
    else:
        con.log(":mag: Qiime 2 is already installed. Skipped.")

    run_and_check(
        ["qiime", "info"],
        "QIIME 2 release:",
        ":bar_chart: Checking that Qiime 2 command line works...",
        "Qiime 2 command line does not seem to work :sob:",
        ":bar_chart: Qiime 2 command line looks good :tada:"
    )

    if sys.version_info[0:2] == (int(pyver[0]), int(pyver[1])):
        sys.path.append("/usr/local/lib/python3.{}/site-packages".format(pyver[1]))
        con.log(":mag: Fixed import paths to include Qiime 2.")

        con.log(":bar_chart: Checking if Qiime 2 import works...")
        try:
            import qiime2  # noqa
        except Exception:
            con.log("[red]Qiime 2 can not be imported :sob:[/red]")
            sys.exit(1)
        con.log("[blue]:bar_chart: Qiime 2 can be imported :tada:[/blue]")

    cleanup()

    con.log("[green]Everything is A-OK. "
            "You can start using Qiime 2 now :thumbs_up:[/green]")


Now let's make sure QIIME 2 is really installed properly!

When installing anything that is run from the terminal, it is helpful to check the installation by running the tool with its --help option.

In [4]:
!qiime tools --help

Usage: qiime tools [OPTIONS] COMMAND [ARGS]...

  Tools for working with QIIME 2 files.

Options:
  --help      Show this message and exit.

Commands:
  cache-create              Create an empty cache at the given location.
  cache-fetch               Fetches an artifact out of a cache into a .qza.
  cache-garbage-collection  Runs garbage collection on the cache at the
                            specified location.
  cache-remove              Removes a given key from a cache.
  cache-status              Checks the status of the cache.
  cache-store               Stores a .qza in the cache under a key.
  cast-metadata             Designate metadata column types.
  citations                 Print citations for a QIIME 2 result.
  export                    Export data from a QIIME 2 Artifact or a
                            Visualization
  extract

Now we will do some actual work with QIIME 2!
The first step is to import our FASTA file into QIIME 2 and create a QIIME 2 artifact. Artifacts are the file type that QIIME 2 reads and writes. To learn more about QIIME 2 artifacts, please visit: https://docs.qiime2.org/2023.9/concepts/#data-files-qiime-2-artifacts

QIIME 2 files end in either .qza or .qzv. A .qzv file is a visualization: it can be opened in [qiime2 view](https://view.qiime2.org/), where the visuals can be explored, adjusted, and downloaded. A .qza file is an artifact: it is not a visual, but it holds the data that QIIME 2 commands take as input and produce as output.
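If you are curious what is inside one of these files, both .qza and .qzv files are ordinary zip archives. The sketch below is a minimal way to peek at one without QIIME 2, assuming the usual layout in which the archive holds a single top-level folder named after the result's UUID with a metadata.yaml inside it. The function name `read_archive_metadata` is made up for this example, and the parsing is deliberately simple (plain "key: value" lines only).

```python
import zipfile


def read_archive_metadata(path):
    """Return the key/value pairs from the top-level metadata.yaml of a .qza/.qzv."""
    with zipfile.ZipFile(path) as archive:
        # Assumed layout: <uuid>/metadata.yaml directly under the single
        # top-level folder. Deeper metadata.yaml files are ignored.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError("no top-level metadata.yaml found in %s" % path)
        text = archive.read(candidates[0]).decode("utf-8")

    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Hypothetical usage once an artifact such as seqs.qza exists:
# print(read_archive_metadata("seqs.qza"))
```

For everyday use, the qiime tools commands are the supported way to inspect artifacts; this is only meant to show that there is nothing mysterious about the file format.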

In [5]:
!qiime tools import \
  --input-path practice.fasta \
  --output-path seqs.qza \
  --type 'SampleData[Sequences]'


Imported practice.fasta as QIIME1DemuxDirFmt to seqs.qza

Now that our sequences are in a QIIME 2 artifact, we can move on with our analysis. This next command dereplicates the sequences (collapses identical sequences into one and counts how often each occurs) using vsearch, a tool available through QIIME 2. This gives us a feature table that we will use later in the analysis.

In [6]:
!qiime vsearch dereplicate-sequences \
  --i-sequences seqs.qza \
  --o-dereplicated-table table.qza \
  --o-dereplicated-sequences rep-seqs.qza

Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza

You might notice that QIIME 2 prints this message:

Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza

To find out more about the different data types in QIIME 2, please go [here](https://docs.qiime2.org/2023.9/semantic-types/), but I will explain these two quickly.

FeatureTable[Frequency]: A feature table (e.g., samples by OTUs) where each value indicates the frequency of an OTU in the corresponding sample expressed as raw counts.

FeatureData[Sequence]: A single unaligned sequence associated with a feature identifier (e.g. a representative sequence).
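To make these two outputs more concrete, here is a toy sketch of dereplication in plain Python. It is not how vsearch is implemented. It assumes read headers look like "<sample-id>_<read-number>", and it uses a SHA-1 hash of each sequence as the feature identifier purely as an illustrative choice. The reads themselves are invented.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(records):
    """Collapse identical sequences and count them per sample.

    `records` is an iterable of (header, sequence) pairs whose headers look
    like "<sample-id>_<read-number>" (an assumption about the input naming).
    """
    table = defaultdict(Counter)   # feature id -> {sample id: count}
    rep_seqs = {}                  # feature id -> sequence
    for header, sequence in records:
        sample_id = header.rsplit("_", 1)[0]
        # Hashing the sequence gives every identical read the same id.
        feature_id = hashlib.sha1(sequence.encode("ascii")).hexdigest()
        rep_seqs[feature_id] = sequence
        table[feature_id][sample_id] += 1
    return table, rep_seqs


# Made-up reads: three copies of one sequence and one copy of another.
reads = [
    ("sampleA_1", "ACGTACGT"),
    ("sampleA_2", "ACGTACGT"),
    ("sampleB_1", "ACGTACGT"),
    ("sampleB_2", "TTGGCCAA"),
]
table, rep_seqs = dereplicate(reads)
for feature_id, counts in table.items():
    print(rep_seqs[feature_id], dict(counts))
```

Here `table` plays the role of the FeatureTable[Frequency] (counts of each feature in each sample) and `rep_seqs` plays the role of the FeatureData[Sequence] (one sequence per feature identifier).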

The next step is de novo clustering. This type of clustering does not use a reference database; sequences are grouped only by how similar they are to one another. Clustering into OTUs (operational taxonomic units) makes our analysis easier to handle computationally. We will cluster our sequences into OTUs at 99% identity.
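The idea behind de novo clustering can be sketched with a simple greedy procedure. This is a simplified stand-in and not the vsearch algorithm: it compares only equal-length sequences position by position, whereas real tools align the sequences first. The function names and the example sequences are invented, and the example uses a 90% threshold only because the toy sequences are ten bases long.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        return 0.0  # simplification: no alignment, so lengths must match
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_de_novo(abundances, threshold):
    """Greedy centroid clustering.

    `abundances` maps each unique sequence to its total count. Sequences are
    visited from most to least abundant; each joins the first centroid it
    matches at or above `threshold`, otherwise it founds a new cluster.
    Returns a dict mapping centroid sequence -> summed count.
    """
    clusters = {}
    for seq, count in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count  # no centroid was close enough
    return clusters


# Made-up counts: the second sequence differs from the first at one base.
abundances = {"ACGTACGTAC": 10, "ACGTACGTAT": 3, "TTTTTTTTTT": 2}
clusters = cluster_de_novo(abundances, threshold=0.9)
print(clusters)
```

The two near-identical sequences end up in one OTU with their counts added together, which is why the clustered table is smaller than the dereplicated one.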

The cluster-features-de-novo command below takes our feature table and representative sequences and creates a clustered version of each.

In [7]:
!qiime vsearch cluster-features-de-novo \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-dn-99.qza \
  --o-clustered-sequences rep-seqs-dn-99.qza

Saved FeatureTable[Frequency] to: table-dn-99.qza
Saved FeatureData[Sequence] to: rep-seqs-dn-99.qza

Now that we have our clustered sequences, we can do taxonomic analysis using the Greengenes 16S database. This is a small database that is not as comprehensive as some alternatives, but because it is smaller it is easier to use with the limited computational power we have through Google Colab.
QIIME 2 provides pre-trained classifier files for different databases on its website. You can find this classifier and others [here](https://docs.qiime2.org/2023.9/data-resources/). This next command downloads the classifier.

In [11]:
!wget https://data.qiime2.org/2023.9/common/gg-13-8-99-nb-weighted-classifier.qza

--2024-02-02 17:52:40--  https://data.qiime2.org/2023.9/common/gg-13-8-99-nb-weighted-classifier.qza
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2023.9/common/gg-13-8-99-nb-weighted-classifier.qza [following]
--2024-02-02 17:52:40--  https://s3-us-west-2.amazonaws.com/qiime2-data/2023.9/common/gg-13-8-99-nb-weighted-classifier.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.92.148.112, 52.92.229.152, 52.92.153.224, ...
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.92.148.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104964916 (100M) [binary/octet-stream]
Saving to: ‘gg-13-8-99-nb-weighted-classifier.qza’


2024-02-02 17:52:43 (36.5 MB/s) - ‘gg-13-8-99-nb-weighted-classifier.qza’ saved [104964916/1049649

Now let's run the classifier on our sequences.

In [12]:
!qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-nb-weighted-classifier.qza \
  --i-reads rep-seqs-dn-99.qza \
  --o-classification taxonomy.qza

Saved FeatureData[Taxonomy] to: taxonomy.qza

This next step makes a visualization file you can look at on the [qiime view website](https://view.qiime2.org/).

After the command finishes, download the .qzv file and drag and drop it into the viewer.

In [13]:
!qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

Saved Visualization to: taxonomy.qzv

Now we can make a taxonomy bar plot! For this we need a metadata table.
A metadata table is made in a spreadsheet program (such as Excel) and saved as a tab-delimited text file. This file lists each sample by its ID, so the program can match the sequences to the samples that they came from. It also gives the program information about the samples, which can be used to group or sort the bar plot by different sample characteristics. I recommend looking at the metadata table provided in this tutorial (metadata1.txt), and also checking out the QIIME 2 metadata documentation [here.](https://docs.qiime2.org/2023.9/tutorials/metadata/)
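If you would rather build a metadata table in code than in a spreadsheet, the sketch below writes and reads a small tab-delimited table with Python's csv module. The sample IDs and the "site" and "treatment" columns are invented for illustration and are not the contents of metadata1.txt. The first column is named sample-id, which is one of the ID headers QIIME 2 accepts, and the IDs in a real file need to match the sample IDs in your feature table.

```python
import csv
import io

# Hypothetical sample IDs and columns, for illustration only.
rows = [
    {"sample-id": "sampleA", "site": "gut", "treatment": "control"},
    {"sample-id": "sampleB", "site": "soil", "treatment": "treated"},
]


def write_metadata(rows, handle):
    """Write rows as a tab-delimited table with the ID column first."""
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_metadata(handle):
    """Read a tab-delimited metadata table into a dict keyed by sample ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    return {row[id_column]: row for row in reader}


# Use an in-memory buffer here; a real file opened with open(...) works the same way.
buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue(), end="")

buffer.seek(0)
metadata = read_metadata(buffer)
print(metadata["sampleB"]["site"])
```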

In [14]:
!qiime taxa barplot \
  --i-table table-dn-99.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file metadata1.txt \
  --o-visualization taxa-bar-plots.qzv

Saved Visualization to: taxa-bar-plots.qzv

You can then download the taxa-bar-plots.qzv file and drag it into [qiime view](https://view.qiime2.org/) to see the bacteria found in your FASTA file!
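When reading the taxonomy table or the bar plot, it helps to know how Greengenes-style taxonomy strings are laid out: each rank carries a short prefix (k__ for kingdom through s__ for species), and ranks are separated by semicolons. The sketch below splits such a string into named ranks. The function name `parse_taxon` is made up, and the example string is an illustrative one in that style, not output copied from this tutorial's data.

```python
RANK_NAMES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxon(taxon):
    """Split a Greengenes-style taxonomy string into a rank -> name dict."""
    ranks = {}
    for part in taxon.split(";"):
        prefix, sep, name = part.strip().partition("__")
        # Skip anything without a known prefix, and ranks left empty (e.g. "s__").
        if sep and prefix in RANK_NAMES and name:
            ranks[RANK_NAMES[prefix]] = name
    return ranks


# Illustrative string only; the species rank is left unassigned.
example = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
           "f__Lactobacillaceae; g__Lactobacillus; s__")
print(parse_taxon(example))
```

In the bar plot viewer, the taxonomic level selector (Level 1 through Level 7) corresponds to these ranks in order, from kingdom down to species.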