# qiime2 v.2019.4 dunfield lab tutorial

* Relative paths are used
* All commands assume the terminal is in the current working directory
* Knowledge of the Linux terminal is highly encouraged

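## Activate qiime2 environment
Run this in a terminal before starting Jupyter: each `!` command runs in its own subshell, so activating the environment from a notebook cell does not carry over to later cells.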

In [None]:
!conda activate qiime2-2019.4

## Optionally enable qiime-specific autocompletion (only useful in an interactive terminal)

In [None]:
!source tab-qiime

## Navigate to your working directory
* The qiime2_lab_tutorial folder already contains the raw data (raw_data folder)
* The lab_pipeline folder contains the trained Naive Bayes classifier and the mapping file

The next step assumes that this folder exists (the path below is an example).

In [None]:
%cd ~/Desktop/qiime2_lab_tutorial/lab_pipeline

## Let's view the directory content

In [7]:
!ls -l

total 216512
-rwxrwxrwx 1 root root     51124 May 27 20:50 dunfield_lab_qiime2_pipeline.ipynb
-rwxrwxrwx 1 root root      2580 Mar 25 12:39 mappingfile_upd4.csv
-rwxrwxrwx 1 root root 221649684 May 18 09:59 v3v4_silva132_classifier_wps2_2groups.qza


## Trim primers

* Various primer trimming tools exist
    * cutadapt
    * bbmap
    * qiime2 native tools
    * trimmomatic
    * manual removal...

It is critical to remove non-biological sequences (such as primers) from the data.<br>
We will remove the primer sequences of our 16S V3-V4 region (Bacteria-specific primer set) using cutadapt:<br>
* f-primer CCTACGGGNGGCWGCAG
* r-primer GACTACHVGGGTATCTAATCC
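Both primers contain IUPAC degenerate bases (N, W, H, V), so each one matches several concrete sequences. The sketch below is a simplified illustration, not how cutadapt works internally. It expands those codes into a regular expression and removes a primer only when it matches exactly at the very start of the read. Otherwise the read is dropped, in the spirit of `--discard-untrimmed`. The helper names `primer_regex` and `trim_primer` are hypothetical.

```python
import re

# IUPAC nucleotide codes -> the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Compile a degenerate primer into a regex anchored at the read start."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("^" + "".join(parts))


def trim_primer(read, primer):
    """Return the read without its 5' primer, or None if the primer is absent
    (exact match at the read start only: no mismatches, no indels)."""
    match = primer_regex(primer).match(read.upper())
    return read[match.end():] if match else None


if __name__ == "__main__":
    print(trim_primer("CCTACGGGAGGCAGCAGTGGGGAATATTG", FWD_PRIMER))  # TGGGGAATATTG
    print(trim_primer("AAAACCTACGGGAGGCAGCAG", FWD_PRIMER))          # None
```

cutadapt's own matching is more permissive (for example, it tolerates a configurable error rate). Use this only to see which sequences the degenerate primers accept.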

### Making directories

In [8]:
!mkdir -p primer_trimmed_fastqs cutadapt_logs

### Primer trimming w/ cutadapt with the help of a little script :)
#### ! Expects the data to be in the <font color=red>raw_data</font> folder in the parent directory
* cutadapt logs can be found in ./cutadapt_logs

In [9]:
%%bash
for file1 in ../raw_data/*_R1_*.fastq.gz; do
    # derive the R2 file name from the R1 file name
    file2="${file1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    fname1=$(basename "$file1")
    fname2=$(basename "$file2")
    # trim forward (-g) and reverse (-G) primers; with --pair-filter any,
    # a pair is discarded if either read lacks its primer
    cutadapt --pair-filter any -j 4 --no-indels --discard-untrimmed \
        -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC \
        -o "primer_trimmed_fastqs/$fname1" -p "primer_trimmed_fastqs/$fname2" \
        "$file1" "$file2" \
        > "cutadapt_logs/${fname1}_cutadapt_log.txt"
done

## Import trimmed FASTQs as a QIIME2 artifact

To keep the directory clean you can put the artifact files in a new directory

In [11]:
!mkdir paired_reads_qza

### Casava 1.8 paired-end demultiplexed fastq
Format description

In the Casava 1.8 demultiplexed (paired-end) format, there are two fastq.gz files for each sample in the study, one containing the forward reads and one containing the reverse reads for that sample. The file name includes the sample identifier; the forward and reverse read files of one sample might look like L2S357_15_L001_R1_001.fastq.gz and L2S357_15_L001_R2_001.fastq.gz. The underscore-separated fields in these file names are:

* the sample identifier,
* the barcode sequence or a barcode identifier,
* the lane number,
* the direction of the read (R1 for forward, R2 for reverse), and
* the set number.
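Before importing, it can help to check that every sample has both an R1 and an R2 file named according to this convention. Below is a minimal sketch. It assumes that only the sample identifier may contain extra underscores, because the four trailing fields never do. `parse_casava_name` and `pair_reads` are hypothetical helpers, not part of QIIME 2.

```python
from collections import defaultdict

SUFFIX = ".fastq.gz"


def parse_casava_name(filename):
    """Split a Casava 1.8 file name into its five underscore-separated fields.

    Splits from the right, assuming only the sample identifier may contain
    underscores.
    """
    if not filename.endswith(SUFFIX):
        raise ValueError("not a .fastq.gz file: " + filename)
    fields = filename[: -len(SUFFIX)].rsplit("_", 4)
    if len(fields) != 5 or fields[3] not in ("R1", "R2"):
        raise ValueError("unexpected Casava file name: " + filename)
    sample_id, barcode, lane, direction, set_number = fields
    return {"sample_id": sample_id, "barcode": barcode, "lane": lane,
            "direction": direction, "set": set_number}


def pair_reads(filenames):
    """Map each sample to its (R1, R2) files; fail if a mate is missing."""
    samples = defaultdict(dict)
    for name in filenames:
        fields = parse_casava_name(name)
        samples[fields["sample_id"]][fields["direction"]] = name
    incomplete = sorted(s for s, reads in samples.items() if set(reads) != {"R1", "R2"})
    if incomplete:
        raise ValueError("samples missing a mate: " + ", ".join(incomplete))
    return {s: (reads["R1"], reads["R2"]) for s, reads in samples.items()}
```

For example, `pair_reads(["L2S357_15_L001_R1_001.fastq.gz", "L2S357_15_L001_R2_001.fastq.gz"])` maps `"L2S357"` to that pair of files. cutadapt keeps the input file names, so the primer_trimmed_fastqs folder follows the same convention.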

### Importing...

In [12]:
!qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
                   --input-path primer_trimmed_fastqs \
                   --output-path paired_reads_qza/reads_trimmed.qza \
                   --input-format CasavaOneEightSingleLanePerSampleDirFmt

Imported primer_trimmed_fastqs as CasavaOneEightSingleLanePerSampleDirFmt to paired_reads_qza/reads_trimmed.qza


* Our reads are now ready to be used by qiime2

## Quality control w/ deblur:
Deblur does not use paired-end reads (its help text notes that only forward reads are supported), so the read pairs are joined first.<br>
### Using VSEARCH for joining:

In [13]:
!qiime vsearch join-pairs \
--i-demultiplexed-seqs paired_reads_qza/reads_trimmed.qza \
--o-joined-sequences paired_reads_qza/reads_trimmed_joined.qza

Saved SampleData[JoinedSequencesWithQuality] to: paired_reads_qza/reads_trimmed_joined.qza
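Joining works because the forward and reverse reads overlap in the middle of the amplicon. The reverse read is reverse-complemented, and the overlapping stretch is used to stitch both reads into one sequence. The sketch below shows only this core idea, using exact matching. VSEARCH also tolerates mismatches and uses quality scores to resolve them, so `join_pair` (a hypothetical name) is not a substitute for it.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_pair(r1, r2, min_overlap=10):
    """Merge a read pair using the longest exact overlap between the end of R1
    and the start of the reverse-complemented R2; None if the overlap is too short."""
    r2_rc = reverse_complement(r2)
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-overlap:] == r2_rc[:overlap]:
            return r1 + r2_rc[overlap:]
    return None
```

Trying the longest overlap first avoids accepting a short, spurious match when a longer true overlap exists.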


### Filter out low-quality reads.

This command will filter out low-quality reads based on the default options.<br>
(this step may take a while)

In [15]:
!qiime quality-filter q-score-joined \
--i-demux paired_reads_qza/reads_trimmed_joined.qza \
--o-filter-stats filt_stats.qza \
--o-filtered-sequences paired_reads_qza/reads_trimmed_joined_filt.qza

Saved SampleData[JoinedSequencesWithQuality] to: paired_reads_qza/reads_trimmed_joined_filt.qza
Saved QualityFilterStats to: filt_stats.qza
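My understanding of the filter, from the plugin's help text, is as follows. A read is truncated where too many consecutive low-quality bases occur. It is then discarded if it has become too short or if it contains ambiguous bases. The sketch assumes the defaults `min_quality=4`, `quality_window=3`, `min_length_fraction=0.75` and `max_ambiguous=0`; check `qiime quality-filter q-score-joined --help` for the exact parameters and defaults of your version. `q_score_filter` is a hypothetical name, and this is not the plugin's code.

```python
def q_score_filter(seq, quals, min_quality=4, quality_window=3,
                   min_length_fraction=0.75, max_ambiguous=0):
    """Return the (possibly truncated) sequence, or None if the read is discarded."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    cut, run_start, run_length = len(seq), 0, 0
    for i, q in enumerate(quals):
        if q < min_quality:                 # low-quality base
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length > quality_window:  # too many in a row: truncate here
                cut = run_start
                break
        else:
            run_length = 0
    kept = seq[:cut]
    if len(kept) < min_length_fraction * len(seq):
        return None                          # too short after truncation
    if kept.upper().count("N") > max_ambiguous:
        return None                          # too many ambiguous bases
    return kept
```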


### Deblur Workflow

This workflow is for 16S sequences; for other amplicon regions, use the denoise-other command instead and specify a reference database.

Note that all sequences must be trimmed to the same length with the --p-trim-length option. To determine the appropriate length, run the following QC:

### To find appropriate deblur parameters we need to summarize our joined reads

In [16]:
!qiime demux summarize \
--i-data paired_reads_qza/reads_trimmed_joined_filt.qza \
--o-visualization reads_trimmed_joined_filt_summary.qzv

Saved Visualization to: reads_trimmed_joined_filt_summary.qzv


### View the obtained visualization

In [17]:
!qiime tools view reads_trimmed_joined_filt_summary.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Qiime help on importing/exporting/viewing artefacts

In [52]:
!qiime tools --help

Usage: [34m[24mqiime tools[0m [OPTIONS] COMMAND [ARGS]...

  Tools for working with QIIME 2 files.

[1mOptions[0m:
  [34m[24m--help[0m      Show this message and exit.

[1mCommands[0m:
  [34m[24mcitations[0m         Print citations for a QIIME 2 result.
  [34m[24mexport[0m            Export data from a QIIME 2 Artifact or a Visualization
  [34m[24mextract[0m           Extract a QIIME 2 Artifact or Visualization archive.
  [34m[24mimport[0m            Import data into a new QIIME 2 Artifact.
  [34m[24minspect-metadata[0m  Inspect columns available in metadata.
  [34m[24mpeek[0m              Take a peek at a QIIME 2 Artifact or Visualization.
  [34m[24mvalidate[0m          Validate data in a QIIME 2 Artifact.
  [34m[24mview[0m              View a QIIME 2 Visualization.


### Explore provenance w/ https://view.qiime2.org

#### Example: the help page of denoise-16S

In [26]:
!qiime deblur denoise-16S --help

Usage: [34m[24mqiime deblur denoise-16S[0m [OPTIONS]

  Perform sequence quality control for Illumina data using the Deblur
  workflow with a 16S reference as a positive filter. Only forward reads are
  supported at this time. The specific reference used is the 88% OTUs from
  Greengenes 13_8. This mode of operation should only be used when data were
  generated from a 16S amplicon protocol on an Illumina platform. The
  reference is only used to assess whether each sequence is likely to be 16S
  by a local alignment using SortMeRNA with a permissive e-value; the
  reference is not used to characterize the sequences.

[1mInputs[0m:
  [34m[4m--i-demultiplexed-seqs[0m ARTIFACT [32mSampleData[SequencesWithQuality |[0m
    [32mPairedEndSequencesWithQuality | JoinedSequencesWithQuality][0m
                         The demultiplexed sequences to be denoised.
                                                                    [35m[required][0m
[1mParameters[0m:

### Denoising w/ deblur
* Here I'm using the default behaviour of --p-min-reads = 10
* Reads are trimmed to 402 nt, which retains at least 98% of the reads<br>
(this step may take a while depending on the size of your data ...)
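Deblur trims every read to the same length and drops reads that are shorter. Choosing `--p-trim-length` is therefore a trade-off between read length and the number of reads kept. The sketch below picks the longest trim length that still keeps a given fraction of the reads. The read lengths themselves would come from the summary visualization above, and both function names are hypothetical.

```python
def fraction_retained(lengths, trim_length):
    """Fraction of reads that are at least `trim_length` long, i.e. that survive
    when every read is trimmed to that length and shorter reads are dropped."""
    if not lengths:
        raise ValueError("no read lengths given")
    return sum(1 for n in lengths if n >= trim_length) / len(lengths)


def longest_trim_length(lengths, min_fraction=0.98):
    """Largest observed read length that still keeps `min_fraction` of the reads."""
    for candidate in sorted(set(lengths), reverse=True):
        if fraction_retained(lengths, candidate) >= min_fraction:
            return candidate
    raise ValueError("min_fraction must be between 0 and 1")
```

For instance, with 98 reads of 410 nt and 2 reads of 300 nt, trimming to 410 nt keeps exactly 98% of the reads.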

In [18]:
!qiime deblur denoise-16S \
--i-demultiplexed-seqs paired_reads_qza/reads_trimmed_joined_filt.qza \
--p-trim-length 402 \
--p-sample-stats \
--p-jobs-to-start 8 \
--p-min-reads 10 \
--output-dir deblur_output

Saved FeatureTable[Frequency] to: deblur_output/table.qza
Saved FeatureData[Sequence] to: deblur_output/representative_sequences.qza
Saved DeblurStats to: deblur_output/stats.qza


### Output is saved in the deblur_output folder
#### let's summarise our deblur output

In [19]:
!qiime deblur visualize-stats \
  --i-deblur-stats deblur_output/stats.qza \
  --o-visualization deblur_output/deblur-stats.qzv

Saved Visualization to: deblur_output/deblur-stats.qzv


In [20]:
!qiime tools view deblur_output/deblur-stats.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

In [21]:
!qiime feature-table summarize \
--i-table deblur_output/table.qza \
--o-visualization deblur_output/deblur_table_summary.qzv

Saved Visualization to: deblur_output/deblur_table_summary.qzv


In [14]:
!qiime tools view deblur_output/deblur_table_summary.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Tabulate representative sequences

In [25]:
!qiime feature-table tabulate-seqs \
  --i-data deblur_output/representative_sequences.qza \
  --o-visualization representative_sequences.qzv

Saved Visualization to: representative_sequences.qzv


In [26]:
!qiime tools view representative_sequences.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Building phylogeny with FastTree
### Making multiple-sequence alignment

We'll need to make a multiple-sequence alignment of the ASVs before running FastTree.

In [1]:
!mkdir fast_tree_out

In [3]:
!qiime alignment mafft \
--i-sequences deblur_output/representative_sequences.qza \
--p-n-threads 8 \
--o-alignment fast_tree_out/rep_seqs_mafft.qza


Saved FeatureData[AlignedSequence] to: fast_tree_out/rep_seqs_mafft.qza


### Filtering multiple-sequence alignment

Highly variable positions in the alignment, which mostly add noise to the tree, should be masked before FastTree is run. This can be done with the following command:

In [9]:
!qiime alignment mask --i-alignment fast_tree_out/rep_seqs_mafft.qza \
  --o-masked-alignment fast_tree_out/rep_seqs_mafft_masked.qza

Saved FeatureData[AlignedSequence] to: fast_tree_out/rep_seqs_mafft_masked.qza
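Masking keeps only the alignment columns that are informative. The sketch below assumes the two criteria described in the `qiime alignment mask` help. The first is a maximum gap frequency. The second is a minimum conservation, i.e. the relative frequency of the most common non-gap character. The defaults assumed here are 1.0 and 0.4; verify them with `qiime alignment mask --help`. `mask_alignment` is a hypothetical name.

```python
from collections import Counter

GAPS = set("-.")


def mask_alignment(seqs, max_gap_frequency=1.0, min_conservation=0.4):
    """Drop alignment columns that are too gappy or too variable."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    n = len(seqs)
    keep = []
    for col in range(len(seqs[0])):
        column = [s[col].upper() for s in seqs]
        gap_freq = sum(c in GAPS for c in column) / n
        counts = Counter(c for c in column if c not in GAPS)
        # relative frequency of the most common non-gap character (over all sequences)
        top_freq = max(counts.values()) / n if counts else 0.0
        if gap_freq <= max_gap_frequency and top_freq >= min_conservation:
            keep.append(col)
    return ["".join(s[i] for i in keep) for s in seqs]
```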


### Running FastTree

Finally FastTree can be run on this masked multiple-sequence alignment:

In [12]:
!qiime phylogeny fasttree \
--i-alignment fast_tree_out/rep_seqs_mafft_masked.qza \
--p-n-threads 4 \
--o-tree fast_tree_out/rep_seqs_aligned_masked_tree.qza

Saved Phylogeny[Unrooted] to: fast_tree_out/rep_seqs_aligned_masked_tree.qza


### Add root to tree

Use midpoint rooting:

In [13]:
!qiime phylogeny midpoint-root \
--i-tree fast_tree_out/rep_seqs_aligned_masked_tree.qza \
--o-rooted-tree fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza

Saved Phylogeny[Rooted] to: fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza
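Midpoint rooting places the root halfway along the longest tip-to-tip path in the tree. The sketch below is not QIIME 2's implementation. It finds that path with two farthest-node searches, which works for trees with non-negative branch lengths, and reports the branch the root falls on. The edge-list input format and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def _farthest(adj, start):
    """Depth-first search from `start`; return the farthest node, its distance
    and the parent of every visited node."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint_root(edges):
    """Locate the midpoint root of an unrooted tree.

    `edges` is a list of (node_a, node_b, branch_length) tuples. Returns
    (u, v, offset): the root lies on edge u-v at distance `offset` from u.
    """
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    tip1, _, _ = _farthest(adj, next(iter(adj)))   # one end of the longest path
    tip2, total, parent = _farthest(adj, tip1)     # the other end and path length
    half = total / 2
    node, walked = tip2, 0.0
    while parent[node] is not None:                # walk from tip2 back to tip1
        up = parent[node]
        length = next(w for n, w in adj[node] if n == up)
        if walked + length >= half:
            return up, node, walked + length - half
        walked += length
        node = up
    raise ValueError("tree has no branches")


if __name__ == "__main__":
    tree = [("A", "X", 1), ("B", "X", 1), ("X", "Y", 2), ("Y", "C", 6)]
    print(midpoint_root(tree))  # ('C', 'Y', 4.5): on the Y-C branch, 1.5 from Y
```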


### Generate rarefaction curves

* Useful QC step
* Determine the maximum rarefaction depth from the feature table summary (I'm using 8000):
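Rarefaction subsamples each sample, without replacement, to a fixed number of reads. A rarefaction curve repeats this at increasing depths and records a diversity metric, here the number of observed features. The sketch below uses hypothetical helpers and computes one subsample per depth instead of several `iterations`. Because it expands counts into a list of reads, it is only suitable for small examples.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's {feature: count} to `depth` reads without
    replacement; return None if the sample has fewer reads (it would be dropped)."""
    if sum(counts.values()) < depth:
        return None
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    subsample = {}
    for feature in random.Random(seed).sample(reads, depth):
        subsample[feature] = subsample.get(feature, 0) + 1
    return subsample


def rarefaction_curve(counts, max_depth, steps=10, seed=None):
    """(depth, observed features) at `steps` evenly spaced depths up to max_depth."""
    curve = []
    for i in range(1, steps + 1):
        depth = max(1, round(max_depth * i / steps))
        subsample = rarefy(counts, depth, seed)
        if subsample is not None:
            curve.append((depth, len(subsample)))
    return curve
```

A curve that flattens before `max_depth` suggests that additional sequencing would reveal few new features in that sample.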


In [15]:
!qiime tools view deblur_output/deblur_table_summary.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

In [22]:
!qiime diversity alpha-rarefaction --help

Usage: [34m[24mqiime diversity alpha-rarefaction[0m [OPTIONS]

  Generate interactive alpha rarefaction curves by computing rarefactions
  between `min_depth` and `max_depth`. The number of intermediate depths to
  compute is controlled by the `steps` parameter, with n `iterations` being
  computed at each rarefaction depth. If sample metadata is provided,
  samples may be grouped based on distinct values within a metadata column.

Inputs:
  --i-table ARTIFACT FeatureTable[Frequency]
                          Feature table to compute rarefaction curves from.
                                                                    [required]
  --i-phylogeny ARTIFACT  Optional phylogeny for phylogenetic metrics.
    Phylogeny[Rooted]                                               [optional]
Parameters:
  --p-max-depth INTEGER   The maximum rarefaction depth. Must be greater than
    

In [24]:
!qiime diversity alpha-rarefaction \
--i-table deblur_output/table.qza \
--p-max-depth 8000 \
--p-metrics simpson \
--p-metrics faith_pd \
--p-metrics dominance \
--p-metrics chao1 \
--p-metrics observed_otus \
--p-metrics shannon \
--p-steps 20 \
--i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
--o-visualization rarefaction_curves.qzv

Saved Visualization to: rarefaction_curves.qzv


In [25]:
!qiime tools view rarefaction_curves.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Using metadata w/ rarefaction

In [29]:
!qiime diversity alpha-rarefaction \
--i-table deblur_output/table.qza \
--p-max-depth 8000 \
--p-steps 20 \
--i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
--m-metadata-file mappingfile_upd4.csv \
--o-visualization rarefaction_metadata_curves.qzv

[31m[1mPlugin error from diversity:

  The following IDs are not present in the metadata: 'nm1-9a', 'o1', 'o29'

Debug info has been saved to /tmp/qiime2-q2cli-err-b4nza0qd.log[0m


### Oops! We seem to have an error
* We need to remove those samples from the FeatureTable
* Also, we have 2 samples that contain no data (o20 and o7); let's remove them as well
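The error boils down to a set difference: every sample ID in the feature table that is absent from the metadata has to go. The sketch below builds the exclusion list and the one-column file. The helpers and the example IDs are hypothetical, and reading the IDs out of the artifacts is not shown.

```python
def ids_to_exclude(table_ids, metadata_ids, empty_ids=()):
    """Sample IDs to drop from the feature table: those absent from the
    metadata, plus any explicitly listed (e.g. empty) samples."""
    missing = set(table_ids) - set(metadata_ids)
    return sorted(missing | set(empty_ids))


def exclusion_tsv(ids, header="SampleID"):
    """Render a one-column metadata file like samples-to-exclude.tsv."""
    return "\n".join([header, *ids]) + "\n"


if __name__ == "__main__":
    ids = ids_to_exclude(["s1", "s2", "x9"], ["s1", "s2"], empty_ids=["s2"])
    print(exclusion_tsv(ids), end="")  # SampleID / s2 / x9, one per line
```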

In [30]:
!echo SampleID > samples-to-exclude.tsv
!echo nm1-9a >> samples-to-exclude.tsv
!echo o1 >> samples-to-exclude.tsv
!echo o29 >> samples-to-exclude.tsv
!echo o20 >> samples-to-exclude.tsv
!echo o7 >> samples-to-exclude.tsv

### Filtering out samples

In [31]:
!qiime feature-table filter-samples \
  --p-exclude-ids \
  --i-table deblur_output/table.qza \
  --m-metadata-file samples-to-exclude.tsv \
  --o-filtered-table id-filtered-deblur-table.qza


Saved FeatureTable[Frequency] to: id-filtered-deblur-table.qza


### Let's run it again
* Note that we are now supplying the filtered FeatureTable as the --i-table argument

In [32]:
!qiime diversity alpha-rarefaction \
--i-table id-filtered-deblur-table.qza \
--p-max-depth 8000 \
--p-steps 20 \
--i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
--m-metadata-file mappingfile_upd4.csv \
--o-visualization rarefaction_metadata_curves.qzv

Saved Visualization to: rarefaction_metadata_curves.qzv


In [33]:
!qiime tools view rarefaction_metadata_curves.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Optional step to re-summarize our filtered FeatureTable

In [34]:
!qiime feature-table summarize \
  --i-table id-filtered-deblur-table.qza \
  --o-visualization id-filtered-deblur-table.qzv \
  --m-sample-metadata-file mappingfile_upd4.csv


Saved Visualization to: id-filtered-deblur-table.qzv


In [35]:
!qiime tools view id-filtered-deblur-table.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Assign taxonomy
* Taxonomy can be assigned to ASVs using a Naive Bayes classifier
* This classifier was trained on the SILVA 132 database and is specific to the V3-V4 region
* It contains edits for WPS-2 (Rubrimentiphilales and AS-11)
* It could be trained <i>de novo</i>, but training is RAM-intensive
* Classifiers are version-sensitive: use one trained with the same QIIME 2 (and scikit-learn) version

(this step may take a long time to complete ...)

In [36]:
!qiime feature-classifier classify-sklearn \
--i-reads deblur_output/representative_sequences.qza \
--i-classifier v3v4_silva132_classifier_wps2_2groups.qza \
--output-dir taxonomy

Saved FeatureData[Taxonomy] to: taxa/classification.qza
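To give a feel for what a Naive Bayes classifier does with sequences, here is a toy version that scores each taxon by its k-mer frequencies. It is a simplified illustration under my own assumptions (k = 4, Laplace smoothing, the hypothetical class name `KmerNaiveBayes`). It is not the q2-feature-classifier implementation, which is built on scikit-learn and also reports a confidence for each assignment.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mer counts with Laplace smoothing."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, seqs, labels):
        self.class_counts = Counter(labels)        # reference sequences per taxon
        self.kmers = {}                            # taxon -> Counter of k-mers
        for seq, label in zip(seqs, labels):
            self.kmers.setdefault(label, Counter()).update(kmer_counts(seq, self.k))
        self.vocab_size = len(set().union(*self.kmers.values()))
        self.totals = {label: sum(c.values()) for label, c in self.kmers.items()}
        return self

    def log_scores(self, seq):
        """Unnormalised log posterior of each taxon for a query sequence."""
        n_refs = sum(self.class_counts.values())
        query = kmer_counts(seq, self.k)
        scores = {}
        for label, counts in self.kmers.items():
            denom = self.totals[label] + self.alpha * self.vocab_size
            score = math.log(self.class_counts[label] / n_refs)
            for kmer, times in query.items():
                score += times * math.log((counts[kmer] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)
```

The same idea explains why training is RAM-intensive. A real classifier stores k-mer statistics for every reference sequence in the SILVA database, not for two toy sequences.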


### Our taxonomy folder now contains the classification.qza file
Let's explore the results...

#### The following command exports the classification as a TSV file


In [37]:
!qiime tools export --input-path taxonomy/classification.qza --output-path taxonomy

Exported taxonomy/classification.qza as TSVTaxonomyDirectoryFormat to directory taxonomy


### At last..., Our Beloved Bar-Chart :)

In [38]:
!qiime taxa barplot \
--i-table id-filtered-deblur-table.qza \
--i-taxonomy taxonomy/classification.qza \
--m-metadata-file mappingfile_upd4.csv \
--o-visualization taxonomy/taxa_barplot.qzv

Saved Visualization to: taxonomy/taxa_barplot.qzv


In [39]:
!qiime tools view taxonomy/taxa_barplot.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Finally, let's calculate core diversity metrics
* For this step we need to select a reasonable rarefaction value
* Let's have a look at our FeatureTable again

In [40]:
!qiime tools view id-filtered-deblur-table.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

#### 400 seems to be a good number in this case: we are losing only 2 samples
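The choice of sampling depth trades samples for sequences. Every sample with fewer reads than the depth is dropped, and every retained sample contributes exactly `depth` reads. Below is a sketch of that arithmetic, using a hypothetical function and hypothetical per-sample totals; in practice the totals come from the table summary.

```python
def depth_tradeoff(sample_totals, depth):
    """Samples dropped and sequences kept when rarefying every sample to `depth`."""
    dropped = sorted(s for s, n in sample_totals.items() if n < depth)
    kept = len(sample_totals) - len(dropped)
    total = sum(sample_totals.values())
    return {
        "dropped": dropped,
        "samples_kept": kept,
        "sequences_kept": kept * depth,
        "fraction_of_sequences_kept": kept * depth / total,
    }
```

A low depth keeps more samples but discards most of the sequences in the deeply sequenced ones.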

In [42]:
!qiime diversity core-metrics-phylogenetic \
--i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
--i-table id-filtered-deblur-table.qza \
--p-sampling-depth 400 \
--m-metadata-file mappingfile_upd4.csv \
--output-dir core-metrics

Saved FeatureTable[Frequency] to: core-metrics/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: core-metrics/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: core-metrics/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: core-metrics/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: core-metrics/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: core-metrics/weighted_unifrac_pcoa_results.qza
Saved PCoAResults to: core-metr

#### let's view an ordination plot

In [43]:
!qiime tools view core-metrics/weighted_unifrac_emperor.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Alpha diversity group significance test
* An example of just one metric

In [45]:
!qiime diversity alpha-group-significance \
--i-alpha-diversity core-metrics/faith_pd_vector.qza \
--m-metadata-file mappingfile_upd4.csv \
--o-visualization core-metrics/faith-pd-group-significance.qzv

Saved Visualization to: core-metrics/faith-pd-group-significance.qzv


In [46]:
!qiime tools view core-metrics/faith-pd-group-significance.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

### Beta diversity group significance test
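* Let's test weighted UniFrac distances across Location groups (despite its name, the output file below holds these weighted UniFrac results)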

In [47]:
!qiime diversity beta-group-significance \
--i-distance-matrix core-metrics/weighted_unifrac_distance_matrix.qza \
--m-metadata-file mappingfile_upd4.csv \
--m-metadata-column Location \
--p-pairwise \
--o-visualization core-metrics/unweighted-unifrac-bodysite-significance.qzv

Saved Visualization to: core-metrics/unweighted-unifrac-bodysite-significance.qzv
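beta-group-significance tests whether distances between groups are larger than distances within groups. PERMANOVA, which I believe is the plugin's default method, does this with a pseudo-F statistic and a permutation test. The sketch below implements the textbook PERMANOVA calculation for illustration only; it is not the plugin's implementation, and the function names are hypothetical.

```python
import itertools
import random


def pseudo_f(dm, groups):
    """PERMANOVA pseudo-F for a square distance matrix `dm` (list of lists)
    and one group label per row."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or a >= n:
        raise ValueError("need at least 2 groups and more samples than groups")
    ss_total = sum(dm[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dm[i][j] ** 2 for i, j in itertools.combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dm, groups, permutations=999, seed=None):
    """Return (pseudo-F, permutation p-value) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With `--p-pairwise`, a test of this kind is repeated for each pair of groups.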


In [49]:
!qiime tools view core-metrics/unweighted-unifrac-bodysite-significance.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Bonus part: Exporting FeatureTables (biom files)
* qiime2 stores taxonomy separately from the feature table
* therefore, exporting BIOM files with taxonomy requires some additional steps

In [50]:
!sed -i -e '1 s/Feature ID/#OTUID/' -e '1 s/Taxon/taxonomy/' taxonomy/taxonomy.tsv
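The header rewrite done with sed can also be done explicitly in Python, by renaming whole column names instead of substituting substrings. This sketch assumes that the exported header is `Feature ID`, `Taxon`, `Confidence`. It targets the `#OTUID`/`taxonomy`/`confidence` header commonly used with `biom add-metadata`. `biom_taxonomy_header` is a hypothetical name.

```python
def biom_taxonomy_header(tsv_text):
    """Rename the header columns of an exported taxonomy.tsv for biom
    add-metadata; data rows are left untouched."""
    header, sep, rest = tsv_text.partition("\n")
    renames = {"Feature ID": "#OTUID", "Taxon": "taxonomy", "Confidence": "confidence"}
    columns = [renames.get(col, col) for col in header.split("\t")]
    return "\t".join(columns) + sep + rest
```

Renaming whole columns avoids partial matches, for example a substring substitution of `Feature` that leaves a stray `ID` behind.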

In [51]:
!qiime tools export \
--input-path id-filtered-deblur-table.qza \
--output-path id-filtered-deblur-table-exported

Exported id-filtered-deblur-table.qza as BIOMV210DirFmt to directory id-filtered-deblur-table-exported


In [54]:
!biom add-metadata \
-i id-filtered-deblur-table-exported/feature-table.biom \
-o id-filtered-deblur-table-exported/feature-table_w_tax.biom \
--observation-metadata-fp taxonomy/taxonomy.tsv \
--sc-separated taxonomy

### And finally a familiar biom-convert :)

In [55]:
!biom convert \
-i id-filtered-deblur-table-exported/feature-table_w_tax.biom \
-o id-filtered-deblur-table-exported/feature-table.tsv \
--to-tsv --header-key taxonomy

## Little transformations to get fractions at a given taxonomic level

## And for BIOM w/ taxonomy:

In [24]:
#this seems to differ https://forum.qiime2.org/t/exporting-and-modifying-biom-tables-e-g-adding-taxonomy-annotations/3630
!sed -i -e '1 s/Feature ID/#OTUID/' -e '1 s/Taxon/taxonomy/' taxa/taxonomy.tsv

In [20]:
!qiime tools export --input-path id-filtered-deblur-table.qza --output-path id-filtered-deblur-table-exported


Exported id-filtered-deblur-table.qza as BIOMV210DirFmt to directory id-filtered-deblur-table-exported


In [28]:
!biom add-metadata -i id-filtered-deblur-table-exported/feature-table.biom -o id-filtered-deblur-table-exported/feature-table_w_tax.biom --observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy

In [30]:
!biom convert -i id-filtered-deblur-table-exported/feature-table_w_tax.biom -o id-filtered-deblur-table-exported/feature-table.tsv --to-tsv --header-key taxonomy

## Making feature table w/ fractions

In [48]:
!qiime feature-table relative-frequency \
--i-table id-filtered-deblur-table.qza \
--o-relative-frequency-table frac-id-filtered-deblur-table.qza

Saved FeatureTable[RelativeFrequency] to: frac-id-filtered-deblur-table.qza
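relative-frequency divides every count by its sample's total, so each sample sums to 1. Below is a sketch on a plain nested dictionary; both the dictionary (a stand-in for the feature table) and the function name are hypothetical.

```python
def relative_frequency(table):
    """Convert {sample: {feature: count}} to per-sample fractions."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError("sample %r has no reads" % sample)
        result[sample] = {feature: n / total for feature, n in counts.items()}
    return result
```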


In [49]:
!qiime tools export --input-path frac-id-filtered-deblur-table.qza --output-path frac-id-filtered-deblur-table

Exported frac-id-filtered-deblur-table.qza as BIOMV210DirFmt to directory frac-id-filtered-deblur-table


In [51]:
!biom add-metadata \
-i frac-id-filtered-deblur-table/feature-table.biom \
-o frac-id-filtered-deblur-table/feature-table_w_tax.biom \
--observation-metadata-fp taxa/taxonomy.tsv --sc-separated taxonomy

In [52]:
!biom convert \
-i frac-id-filtered-deblur-table/feature-table_w_tax.biom \
-o frac-id-filtered-deblur-table/feature-table.tsv \
--to-tsv \
--header-key taxonomy

In [17]:
!qiime tools export --help

Usage: qiime tools export [OPTIONS]

  Exporting extracts (and optionally transforms) data stored inside an
  Artifact or Visualization. Note that Visualizations cannot be transformed
  with --output-format

Options:
  --input-path FILE     Path to file that should be exported  [required]
  --output-path PATH    Path to file or directory where data should be
                        exported to  [required]
  --output-format TEXT  Format which the data should be exported as. This
                        option cannot be used with Visualizations
  --help                Show this message and exit.


In [7]:
!qiime diversity alpha-rarefaction --i-table id-filtered-deblur-table.qza \
                                  --p-max-depth 8000 \
                                  --p-steps 20 \
                                  --i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
                                  --m-metadata-file mappingfile_upd4.csv \
                                  --o-visualization rarefaction_curves.qzv

Saved Visualization to: rarefaction_curves.qzv


In [8]:
!qiime tools view rarefaction_curves.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

When a metadata file is supplied (as in the curves above), QIIME 2 by default shows only the curves grouped by each metadata column, not each sample's individual rarefaction curve. This is odd, because per-sample display is the default later on in the stacked bar plots! Per-sample curves are important in QC for spotting inconsistent samples. We therefore rerun the command above without the metadata file, using the same maximum depth as before (8000).

In [10]:
!qiime diversity alpha-rarefaction --i-table id-filtered-deblur-table.qza \
                                  --p-max-depth 8000 \
                                  --p-steps 20 \
                                  --i-phylogeny fast_tree_out/rep_seqs_mafft_masked_tree_rooted.qza \
                                  --o-visualization rarefaction_curves_eachsample.qzv

Saved Visualization to: rarefaction_curves_eachsample.qzv


In [11]:
!qiime tools view rarefaction_curves_eachsample.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Assign taxonomy

You can assign taxonomy to your ASVs with a Naive Bayes approach implemented in the scikit-learn Python library and the SILVA database. Note that we have already trained classifiers for a few different amplicon regions (available in the /home/shared/taxa_classifiers folder), but you will need to train your own if your region of interest isn't there. The classifier filename below is for the 

In [6]:
!rmdir taxa

Note: this step can run out of memory; it seems to work with a single thread (`--p-n-jobs 1`, the default).

In [1]:
!qiime feature-classifier classify-sklearn --i-reads deblur_output/representative_sequences.qza \
                                          --i-classifier v3v4_silva132_classifier_wps2_2groups.qza \
                                          --output-dir taxa  


Saved FeatureData[Taxonomy] to: taxa/classification.qza


In [3]:
!qiime tools export --input-path taxa/classification.qza --output-path taxa

Exported taxa/classification.qza as TSVTaxonomyDirectoryFormat to directory taxa


In [32]:
!qiime taxa barplot --i-table id-filtered-deblur-table.qza \
                   --i-taxonomy taxa/classification.qza \
                   --m-metadata-file mappingfile_upd4.csv \
                   --o-visualization taxa/taxa_barplot.qzv

Saved Visualization to: taxa/taxa_barplot.qzv


In [33]:
!qiime tools view taxa/taxa_barplot.qzv

Press the 'q' key, Control-C, or Control-D to quit. This view may no longer be accessible or work correctly after quitting.
Opening in existing browser session.

## Making BIOM tables with fractions by taxonomic level

In [34]:
!mkdir taxa-levels

In [35]:
!qiime taxa collapse \
--i-table id-filtered-deblur-table.qza \
--i-taxonomy taxa/classification.qza \
--p-level 2 \
--o-collapsed-table taxa-levels/table-l2.qza

Saved FeatureTable[Frequency] to: taxa-levels/table-l2.qza
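Collapsing truncates each feature's taxonomy to the requested level and sums the counts of features that share the same truncated taxonomy. Below is a sketch on plain dictionaries; the data structures and the function name are hypothetical. Taxonomy strings that stop short of `level` are not padded here, and the plugin may handle such cases differently.

```python
def collapse(table, taxonomy, level):
    """Sum counts per sample after truncating each feature's semicolon-separated
    taxonomy string to its first `level` ranks."""
    collapsed = {}
    for sample, counts in table.items():
        out = collapsed.setdefault(sample, {})
        for feature, n in counts.items():
            ranks = [r.strip() for r in taxonomy[feature].split(";")]
            key = ";".join(ranks[:level])
            out[key] = out.get(key, 0) + n
    return collapsed
```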


### Convert counts to fractions

In [36]:
!qiime feature-table relative-frequency \
--i-table taxa-levels/table-l2.qza \
--o-relative-frequency-table taxa-levels/frac-table-l2.qza

Saved FeatureTable[RelativeFrequency] to: taxa-levels/frac-table-l2.qza


## Convert to TSV w/ taxonomy

In [37]:
!qiime tools export \
--input-path taxa-levels/frac-table-l2.qza \
--output-path taxa-levels/frac-table-l2


Exported taxa-levels/frac-table-l2.qza as BIOMV210DirFmt to directory taxa-levels/frac-table-l2


## This step seems unnecessary, since the collapsed features are already named by their taxonomy

In [38]:
!biom add-metadata \
-i taxa-levels/frac-table-l2/feature-table.biom \
-o taxa-levels/frac-table-l2/feature-table_w_tax.biom \
--observation-metadata-fp taxa/taxonomy.tsv \
--sc-separated taxonomy


In [39]:
!biom convert \
-i taxa-levels/frac-table-l2/feature-table_w_tax.biom \
-o taxa-levels/frac-table-l2/feature-table.tsv \
--to-tsv \
--header-key taxonomy

## Level 4

In [41]:
!qiime taxa collapse \
--i-table id-filtered-deblur-table.qza \
--i-taxonomy taxa/classification.qza \
--p-level 4 \
--o-collapsed-table taxa-levels/table-l4.qza

Saved FeatureTable[Frequency] to: taxa-levels/table-l4.qza


In [43]:
!qiime feature-table relative-frequency \
--i-table taxa-levels/table-l4.qza \
--o-relative-frequency-table taxa-levels/frac-table-l4.qza

Saved FeatureTable[RelativeFrequency] to: taxa-levels/frac-table-l4.qza


In [44]:
!qiime tools export \
--input-path taxa-levels/frac-table-l4.qza \
--output-path taxa-levels/frac-table-l4

Exported taxa-levels/frac-table-l4.qza as BIOMV210DirFmt to directory taxa-levels/frac-table-l4


In [46]:
!biom convert \
-i taxa-levels/frac-table-l4/feature-table.biom \
-o taxa-levels/frac-table-l4/feature-table.tsv \
--to-tsv \
--header-key taxonomy

## Level 5

In [47]:
!qiime taxa collapse \
--i-table id-filtered-deblur-table.qza \
--i-taxonomy taxa/classification.qza \
--p-level 5 \
--o-collapsed-table taxa-levels/table-l5.qza

!qiime feature-table relative-frequency \
--i-table taxa-levels/table-l5.qza \
--o-relative-frequency-table taxa-levels/frac-table-l5.qza

!qiime tools export \
--input-path taxa-levels/frac-table-l5.qza \
--output-path taxa-levels/frac-table-l5

!biom convert \
-i taxa-levels/frac-table-l5/feature-table.biom \
-o taxa-levels/frac-table-l5/feature-table.tsv \
--to-tsv \
--header-key taxonomy

Saved FeatureTable[Frequency] to: taxa-levels/table-l5.qza
Saved FeatureTable[RelativeFrequency] to: taxa-levels/frac-table-l5.qza
Exported taxa-levels/frac-table-l5.qza as BIOMV210DirFmt to directory taxa-levels/frac-table-l5
