# QIIME 2 enables comprehensive end-to-end analysis of diverse microbiome data and comparative studies with publicly available data

This QIIME 2 Artifact CLI notebook replicates the analyses in the QIIME 2 protocol.

**environment:** qiime2-2019.10

## How to use this notebook:

1. Activate the `qiime2-2019.10` conda environment.
    ```
    conda activate qiime2-2019.10
    ```
      
2. Install additional dependencies:
    ```
    conda install songbird -c conda-forge
    conda install -c conda-forge redbiom
    conda install -c bioconda bowtie2
    pip install https://github.com/knights-lab/SHOGUN/archive/master.zip
    pip install https://github.com/qiime2/q2-shogun/archive/master.zip
    conda install cytoolz
    qiime dev refresh-cache
    ```  

3. Restart and run the notebook

In [1]:
## Hide excessive warnings (optional):
import warnings
warnings.filterwarnings('ignore')

## Acquire data from ECAM study 

In [2]:
!mkdir qiime2-ecam-tutorial

In [3]:
# `!cd` runs in a throwaway subshell, so the notebook's directory would not change; `%cd` persists
%cd qiime2-ecam-tutorial

In [4]:
!wget -O 81253.zip "https://qiita.ucsd.edu/public_artifact_download/?artifact_id=81253"

--2020-01-14 14:21:06--  https://qiita.ucsd.edu/public_artifact_download/?artifact_id=81253
Resolving qiita.ucsd.edu... 169.228.46.38
Connecting to qiita.ucsd.edu|169.228.46.38|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1116152564 (1.0G) [application/zip]
Saving to: ‘81253.zip’


2020-01-14 14:26:13 (3.47 MB/s) - ‘81253.zip’ saved [1116152564/1116152564]



In [5]:
!unzip 81253.zip

Archive:  81253.zip
 extracting: per_sample_FASTQ/81253/10249.C018.01SS.r.fastq.gz   bad CRC 408b78ca  (should be 82882250)
 extracting: per_sample_FASTQ/81253/10249.C030.01SS.r.fastq.gz   bad CRC 2746113c  (should be 58903356)
 extracting: per_sample_FASTQ/81253/10249.C031.01SS.r.fastq.gz   bad CRC 32768cb1  (should be 46630065)
 extracting: per_sample_FASTQ/81253/10249.C022.01SS.r.fastq.gz   bad CRC c9d8e76c  (should be 86435436)
 extracting: per_sample_FASTQ/81253/10249.C010.01SS.r.fastq.gz   bad CRC 56f95c51  (should be 59182673)
 extracting: per_sample_FASTQ/81253/10249.C014.01SS.r.fastq.gz   bad CRC 549966a6  (should be 19339430)
 extracting: per_sample_FASTQ/81253/10249.C043.01SS.r.fastq.gz   bad CRC 7da85b6d  (should be 08185453)
 extracting: per_sample_FASTQ/81253/10249.C020.01SS.r.fastq.gz   bad CRC 045eb75a  (should be 73316186)
 extracting: per_sample_FASTQ/81253/10249.C035.01SS.r.fastq.gz   bad CRC 4dff8d5c  (should be 08593500)
 extracting: per_sample_FASTQ/81253/10249.C0

 extracting: per_sample_FASTQ/81253/10249.C032.15SS.fastq.gz   bad CRC 3ce7b59f  (should be 21818271)
 extracting: per_sample_FASTQ/81253/10249.M055.01SS.fastq.gz   bad CRC 84748e24  (should be 22231076)
 extracting: per_sample_FASTQ/81253/10249.M043.01SS.fastq.gz   bad CRC 2c4f52b3  (should be 43396019)
 extracting: per_sample_FASTQ/81253/10249.C016.09SS.fastq.gz   bad CRC 0abe65e5  (should be 80250085)
 extracting: per_sample_FASTQ/81253/10249.C044.01SS.fastq.gz   bad CRC d1cd7637  (should be 19903287)
 extracting: per_sample_FASTQ/81253/10249.C007.23SD.fastq.gz   bad CRC 73c8d981  (should be 42542721)
 extracting: per_sample_FASTQ/81253/10249.C010.20SD.fastq.gz   bad CRC 09912103  (should be 60506115)
 extracting: per_sample_FASTQ/81253/10249.C049.09SS.fastq.gz   bad CRC 804ee66f  (should be 52654447)
 extracting: per_sample_FASTQ/81253/10249.M044.01SS.fastq.gz   bad CRC 46ceef24  (should be 87966756)
 extracting: per_sample_FASTQ/81253/10249.C016.01SS.fastq.gz   bad CRC ac09ef79  (

 extracting: per_sample_FASTQ/81253/10249.C046.07SS.fastq.gz   bad CRC a5b95e82  (should be 80388994)
 extracting: per_sample_FASTQ/81253/10249.C037.01SS.fastq.gz   bad CRC fd345391  (should be 48064913)
 extracting: per_sample_FASTQ/81253/10249.C001.01SS.fastq.gz   bad CRC 51be6d84  (should be 71434372)
 extracting: per_sample_FASTQ/81253/10249.C001.34SD.fastq.gz   bad CRC 4691b47c  (should be 83954044)
 extracting: per_sample_FASTQ/81253/10249.M022.01SS.fastq.gz   bad CRC 0519cb82  (should be 85576578)
 extracting: per_sample_FASTQ/81253/10249.C035.01SS.fastq.gz   bad CRC 6f57a5bb  (should be 68015035)
 extracting: per_sample_FASTQ/81253/10249.C034.14SS.fastq.gz   bad CRC 62d2673d  (should be 57956157)
 extracting: per_sample_FASTQ/81253/10249.M024.01SS.fastq.gz   bad CRC 1c50bc17  (should be 75053079)
 extracting: per_sample_FASTQ/81253/10249.C025.08SS.fastq.gz   bad CRC b4b3fce0  (should be 31694560)
 extracting: per_sample_FASTQ/81253/10249.C043.01SS.fastq.gz   bad CRC 7561fe2e  (

 extracting: per_sample_FASTQ/81253/10249.M001.03R.fastq.gz   bad CRC 15859c19  (should be 61077785)
 extracting: per_sample_FASTQ/81253/10249.M021.03V.fastq.gz   bad CRC 5e2b2c4d  (should be 79887693)
 extracting: per_sample_FASTQ/81253/10249.M043.03R.fastq.gz   bad CRC 515cc2f1  (should be 65033713)
 extracting: per_sample_FASTQ/81253/10249.M044.03V.fastq.gz   bad CRC 27ac7c08  (should be 65615368)
 extracting: per_sample_FASTQ/81253/10249.M034.03V.fastq.gz   bad CRC 6ff40f29  (should be 78265641)
 extracting: per_sample_FASTQ/81253/10249.M053.02V.fastq.gz   bad CRC 4c6c16ce  (should be 82152142)
 extracting: per_sample_FASTQ/81253/10249.M050.01V.fastq.gz   bad CRC 88668210  (should be 88419344)
 extracting: per_sample_FASTQ/81253/10249.M019.02V.fastq.gz   bad CRC 329e2b45  (should be 49226565)
 extracting: per_sample_FASTQ/81253/10249.M035.02V.fastq.gz   bad CRC c84e0bf6  (should be 60558070)
 extracting: per_sample_FASTQ/81253/10249.M001.04V.fastq.gz   bad CRC b5485300  (should be 

 extracting: per_sample_FASTQ/81253/10249.M046.02V.fastq.gz   bad CRC dcc78032  (should be 04062002)
 extracting: per_sample_FASTQ/81253/10249.M034.01R.fastq.gz   bad CRC 0c70c9d9  (should be 08718297)
 extracting: per_sample_FASTQ/81253/10249.M018.03V.fastq.gz   bad CRC e06f4a1c  (should be 65389852)
 extracting: per_sample_FASTQ/81253/10249.M027.01R.fastq.gz   bad CRC 8c0b551c  (should be 49552924)
 extracting: per_sample_FASTQ/81253/10249.M027.03V.fastq.gz   bad CRC 99e5a6ec  (should be 81964524)
 extracting: per_sample_FASTQ/81253/10249.M037.03R.fastq.gz   bad CRC dec271bb  (should be 37285051)
 extracting: per_sample_FASTQ/81253/10249.M007.03R.fastq.gz   bad CRC ece8edba  (should be 74688186)
 extracting: per_sample_FASTQ/81253/10249.M027.03R.fastq.gz   bad CRC 375a07fc  (should be 28647164)
 extracting: per_sample_FASTQ/81253/10249.M046.03V.fastq.gz   bad CRC 90ca2ccb  (should be 29168843)
 extracting: per_sample_FASTQ/81253/10249.M010.01R.fastq.gz   bad CRC 62769629  (should be 

 extracting: per_sample_FASTQ/81253/10249.M051.01R.fastq.gz   bad CRC 2eb2e715  (should be 83476501)
 extracting: per_sample_FASTQ/81253/10249.M018.01R.fastq.gz   bad CRC 1bb9352e  (should be 65122606)
 extracting: per_sample_FASTQ/81253/10249.M025.02R.fastq.gz   bad CRC bbfa9af6  (should be 53763062)
 extracting: per_sample_FASTQ/81253/10249.M019.01V.fastq.gz   bad CRC 8c031782  (should be 49012866)
 extracting: per_sample_FASTQ/81253/10249.M015.01R.fastq.gz   bad CRC be59db03  (should be 93559811)
 extracting: per_sample_FASTQ/81253/10249.M043.01R.fastq.gz   bad CRC 0986a258  (should be 59818328)
 extracting: per_sample_FASTQ/81253/10249.M050.01R.fastq.gz   bad CRC 7817e683  (should be 14832259)
 extracting: per_sample_FASTQ/81253/10249.M042.01V.fastq.gz   bad CRC 6d869133  (should be 37535539)
 extracting: per_sample_FASTQ/81253/10249.M020.03V.fastq.gz   bad CRC 5e5d1198  (should be 83157656)
 extracting: per_sample_FASTQ/81253/10249.M031.01R.fastq.gz   bad CRC b11341d4  (should be 
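The `unzip` log above reports `bad CRC` for the extracted entries. The notebook does not say whether this affects the data, so one cautious check is to confirm that each extracted `.fastq.gz` still decompresses cleanly. The sketch below is an illustration using only the Python standard library; the helper names `gzip_is_intact` and `find_damaged` are hypothetical and not part of QIIME 2 or Qiita.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path):
    """Return True if the gzip stream at `path` decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read in 1 MiB chunks so large FASTQ files are never held in memory.
            while handle.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header or checksum, EOFError a truncated file.
        return False


def find_damaged(fastq_dir):
    """Return the names of .gz files in `fastq_dir` that fail the check above."""
    return sorted(
        p.name for p in Path(fastq_dir).glob("*.gz") if not gzip_is_intact(p)
    )
```

Calling `find_damaged("per_sample_FASTQ/81253")` from the tutorial directory would list any files worth downloading again; an empty list suggests the gzip streams themselves are readable.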

In [6]:
!mv mapping_files/81253_mapping_file.txt metadata.tsv

## Import DNA sequence data into QIIME 2 & create a visual summary

### 1. Create the manifest file with the required column headers

In [7]:
# printf expands \t in every shell; echo only does so in some
!printf "sample-id\tabsolute-filepath\n" > manifest.tsv

### 2. Use a shell loop to write each sample name to the sample-id column and the full path of its sequence file to the absolute-filepath column

In [8]:
!for f in per_sample_FASTQ/81253/*.gz; \
do n=$(basename "$f"); printf "12802.%s\t%s\n" "${n%.fastq.gz}" "$PWD/$f"; done >> manifest.tsv
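The two manifest steps above can also be done in Python, which avoids shell quoting inside a notebook cell. This is only a sketch of the same idea: the function name `write_manifest` is hypothetical, and the `12802.` prefix and `.fastq.gz` suffix are copied from the shell loop rather than derived from the metadata.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path, prefix="12802."):
    """Write a two-column, tab-separated manifest and return the number of samples."""
    fastq_dir = Path(fastq_dir).resolve()  # the manifest needs absolute paths
    suffix = ".fastq.gz"
    rows = 0
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for path in sorted(fastq_dir.glob("*.gz")):
            name = path.name
            # Mirror ${n%.fastq.gz}: strip the suffix only when it is present.
            sample = name[: -len(suffix)] if name.endswith(suffix) else name
            writer.writerow([prefix + sample, str(path)])
            rows += 1
    return rows
```

For example, `write_manifest("per_sample_FASTQ/81253", "manifest.tsv")` would produce a file with the same layout as the one built by the shell commands.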

### 3. Use the manifest file to import the sequences into QIIME 2

In [9]:
!qiime tools import \
  --input-path manifest.tsv \
  --type 'SampleData[SequencesWithQuality]' \
  --input-format SingleEndFastqManifestPhred33V2 \
  --output-path se-demux.qza

Imported manifest.tsv as SingleEndFastqManifestPhred33V2 to se-demux.qza


### 4. Create a summary of the demultiplexed artifact

In [12]:
!qiime demux summarize \
  --i-data se-demux.qza \
  --o-visualization se-demux.qzv

Saved Visualization to: se-demux.qzv


## Sequence quality control and feature table construction

### 1. Apply initial quality filtering

In [13]:
!qiime quality-filter q-score \
 --i-demux se-demux.qza \
 --o-filtered-sequences demux-filtered.qza \
 --o-filter-stats demux-filter-stats.qza

Saved SampleData[SequencesWithQuality] to: demux-filtered.qza
Saved QualityFilterStats to: demux-filter-stats.qza

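The reads were imported as Phred33, which means each character of a FASTQ quality line encodes a quality score as its ASCII code minus 33. The sketch below decodes such a line and counts low-scoring bases. It illustrates the encoding only and is not the algorithm `quality-filter q-score` uses; the function names and the `threshold` parameter are hypothetical.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - 33 for ch in quality_line]


def count_low_quality(quality_line, threshold):
    """Count bases whose Phred score is at or below `threshold`."""
    return sum(1 for score in phred33_scores(quality_line) if score <= threshold)
```

A Phred score of Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 20 means roughly one expected error in 100 base calls.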

### 2. Apply Deblur workflow

In [14]:
!qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-filtered.qza \
  --p-trim-length 150 \
  --p-sample-stats \
  --p-jobs-to-start 1 \
  --o-stats deblur-stats.qza \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza

Saved FeatureTable[Frequency] to: table-deblur.qza
Saved FeatureData[Sequence] to: rep-seqs-deblur.qza
Saved DeblurStats to: deblur-stats.qza

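The `--p-trim-length 150` option makes every read the same length before denoising. As a rough sketch, assuming (as Deblur is understood to behave) that reads shorter than the trim length are discarded rather than padded, the effect on a list of sequences looks like this. `trim_reads` is a hypothetical helper, not a Deblur function.

```python
def trim_reads(reads, trim_length=150):
    """Truncate reads to `trim_length` and drop any read shorter than that."""
    return [seq[:trim_length] for seq in reads if len(seq) >= trim_length]
```

Fixed-length reads matter here because Deblur compares sequences position by position, so a shorter trim length keeps more reads at the cost of less sequence per read.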

### 3. Create a visualization summary of deblur statistics

In [15]:
!qiime deblur visualize-stats \
  --i-deblur-stats deblur-stats.qza \
  --o-visualization deblur-stats.qzv

Saved Visualization to: deblur-stats.qzv


### 4. Visualize representative sequences

In [16]:
!qiime feature-table tabulate-seqs \
  --i-data rep-seqs-deblur.qza \
  --o-visualization rep-seqs-deblur.qzv

Saved Visualization to: rep-seqs-deblur.qzv


### 5. Visualize feature table

In [17]:
!qiime feature-table summarize \
  --i-table table-deblur.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization table-deblur.qzv 

Saved Visualization to: table-deblur.qzv


## Generate a phylogenetic tree

### 1. Download a backbone tree

In [18]:
!wget \
  -O "sepp-refs-gg-13-8.qza" \
  "https://data.qiime2.org/2019.10/common/sepp-refs-gg-13-8.qza"

--2020-01-14 17:18:12--  https://data.qiime2.org/2019.10/common/sepp-refs-gg-13-8.qza
Resolving data.qiime2.org... 52.35.38.247
Connecting to data.qiime2.org|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.10/common/sepp-refs-gg-13-8.qza [following]
--2020-01-14 17:18:12--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.10/common/sepp-refs-gg-13-8.qza
Resolving s3-us-west-2.amazonaws.com... 52.218.245.192
Connecting to s3-us-west-2.amazonaws.com|52.218.245.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50161069 (48M) [binary/octet-stream]
Saving to: ‘sepp-refs-gg-13-8.qza’


2020-01-14 17:18:24 (4.28 MB/s) - ‘sepp-refs-gg-13-8.qza’ saved [50161069/50161069]
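QIIME 2 artifacts (`.qza`) and visualizations (`.qzv`) are zip archives, so a downloaded file such as the backbone tree can be inspected without QIIME 2 itself. The sketch below reads the top-level `metadata.yaml`, assuming the usual layout of one root directory per archive; `read_artifact_metadata` is a hypothetical helper, and `qiime tools peek` is the supported way to get the same information.

```python
import zipfile


def read_artifact_metadata(path):
    """Return the text of the top-level metadata.yaml in a .qza/.qzv archive."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Assumed layout: <root-dir>/metadata.yaml; deeper copies belong
            # to the provenance records and are skipped.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                return archive.read(name).decode("utf-8")
    raise ValueError("no top-level metadata.yaml found in " + str(path))
```

Printing `read_artifact_metadata("sepp-refs-gg-13-8.qza")` should show the artifact's UUID, semantic type, and format.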



### 2. Create an insertion tree

In [19]:
!qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-deblur.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --p-threads 1 \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza 

Saved Phylogeny[Rooted] to: insertion-tree.qza
Saved Placements to: insertion-placements.qza


### 3. Filter feature table

In [20]:
!qiime fragment-insertion filter-features \
  --i-table table-deblur.qza \
  --i-tree insertion-tree.qza \
  --o-filtered-table filtered-table-deblur.qza \
  --o-removed-table removed-table.qza

Saved FeatureTable[Frequency] to: filtered-table-deblur.qza
Saved FeatureTable[Frequency] to: removed-table.qza

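Conceptually, `filter-features` splits the feature table in two: features that were placed in the insertion tree go to the filtered table, and the rest go to the removed table. A minimal sketch of that split, using plain dictionaries and a set of tip names as stand-ins for the real artifacts (all names here are hypothetical):

```python
def split_by_tree(table, tree_tips):
    """Split {feature_id: counts} into features found / not found among tree tips."""
    kept = {fid: counts for fid, counts in table.items() if fid in tree_tips}
    removed = {fid: counts for fid, counts in table.items() if fid not in tree_tips}
    return kept, removed
```

Keeping the table and the tree consistent matters because phylogenetic metrics computed later need every feature in the table to have a position in the tree.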

## Taxonomic classification

### 1. Download required files

In [21]:
!wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/human-stool.qza

--2020-01-14 21:29:02--  https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/human-stool.qza
Resolving github.com... 192.30.255.113
Connecting to github.com|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/human-stool.qza [following]
--2020-01-14 21:29:02--  https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/human-stool.qza
Resolving raw.githubusercontent.com... 151.101.196.133
Connecting to raw.githubusercontent.com|151.101.196.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 209813 (205K) [application/octet-stream]
Saving to: ‘human-stool.qza’


2020-01-14 21:29:03 (5.20 MB/s) - ‘human-stool.qza’ saved [209813/209813]



In [22]:
!wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-seqs-v4.qza

--2020-01-14 21:29:03--  https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-seqs-v4.qza
Resolving github.com... 192.30.255.113
Connecting to github.com|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/ref-seqs-v4.qza [following]
--2020-01-14 21:29:04--  https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/ref-seqs-v4.qza
Resolving raw.githubusercontent.com... 151.101.196.133
Connecting to raw.githubusercontent.com|151.101.196.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9319426 (8.9M) [application/octet-stream]
Saving to: ‘ref-seqs-v4.qza’


2020-01-14 21:29:05 (8.12 MB/s) - ‘ref-seqs-v4.qza’ saved [9319426/9319426]



In [23]:
!wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-tax.qza

--2020-01-14 21:29:06--  https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-tax.qza
Resolving github.com... 192.30.255.113
Connecting to github.com|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/ref-tax.qza [following]
--2020-01-14 21:29:06--  https://raw.githubusercontent.com/BenKaehler/readytowear/master/data/gg_13_8/515f-806r/ref-tax.qza
Resolving raw.githubusercontent.com... 151.101.196.133
Connecting to raw.githubusercontent.com|151.101.196.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2604632 (2.5M) [application/octet-stream]
Saving to: ‘ref-tax.qza’


2020-01-14 21:29:07 (7.56 MB/s) - ‘ref-tax.qza’ saved [2604632/2604632]



### 2. Train a classifier

In [24]:
!qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs-v4.qza \
  --i-reference-taxonomy ref-tax.qza \
  --i-class-weight human-stool.qza \
  --o-classifier gg138_v4_human-stool_classifier.qza

Saved TaxonomicClassifier to: gg138_v4_human-stool_classifier.qza


### 3. Assign taxonomy

In [25]:
!qiime feature-classifier classify-sklearn \
  --i-reads rep-seqs-deblur.qza \
  --i-classifier gg138_v4_human-stool_classifier.qza \
  --o-classification bespoke-taxonomy.qza

Saved FeatureData[Taxonomy] to: bespoke-taxonomy.qza


### 4. Visualize taxonomies

In [26]:
!qiime metadata tabulate \
  --m-input-file bespoke-taxonomy.qza \
  --m-input-file rep-seqs-deblur.qza \
  --o-visualization bespoke-taxonomy.qzv

Saved Visualization to: bespoke-taxonomy.qzv


## Filter ECAM data to contain child samples only

### 1. Filter feature table

In [27]:
!qiime feature-table filter-samples \
  --i-table filtered-table-deblur.qza \
  --m-metadata-file metadata.tsv \
  --p-where "[mom_or_child]='C'" \
  --o-filtered-table child-table.qza

Saved FeatureTable[Frequency] to: child-table.qza


### 2. Visualize new feature table

In [28]:
!qiime feature-table summarize \
  --i-table child-table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization child-table.qzv

Saved Visualization to: child-table.qzv


## Alpha rarefaction plots

In [29]:
!qiime diversity alpha-rarefaction \
  --i-table child-table.qza \
  --i-phylogeny insertion-tree.qza \
  --p-max-depth 10000 \
  --m-metadata-file metadata.tsv \
  --o-visualization child-alpha-rarefaction.qzv

Saved Visualization to: child-alpha-rarefaction.qzv


## Basic data exploration and diversity analyses

### 0. Filter feature table to include only one sample per subject per month

In [30]:
!qiime feature-table filter-samples \
  --i-table child-table.qza \
  --m-metadata-file metadata.tsv \
  --p-where "[month_replicate]='no'" \
  --o-filtered-table child-table-norep.qza

# create a visualization summary of new table
!qiime feature-table summarize \
  --i-table child-table-norep.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization child-table-norep.qzv

Saved FeatureTable[Frequency] to: child-table-norep.qza
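The `--p-where` expressions used for filtering, such as `"[month_replicate]='no'"`, follow SQLite `WHERE`-clause syntax, with column names in square brackets. To preview which samples an expression would keep, the metadata can be loaded into an in-memory SQLite table. The sketch below is a toy illustration: `select_sample_ids` is a hypothetical helper, every column is treated as text, and the expression is pasted into the query unescaped, so it should only be used with expressions you wrote yourself.

```python
import sqlite3


def select_sample_ids(rows, where):
    """Return the sample IDs of the metadata rows matching a SQLite WHERE clause.

    `rows` is a list of dicts that share the same keys, including "sample-id".
    """
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join('"{}" TEXT'.format(c) for c in columns)
    conn.execute("CREATE TABLE metadata ({})".format(column_defs))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        "INSERT INTO metadata VALUES ({})".format(placeholders),
        [tuple(row[c] for c in columns) for row in rows],
    )
    query = 'SELECT "sample-id" FROM metadata WHERE ' + where + " ORDER BY rowid"
    ids = [record[0] for record in conn.execute(query)]
    conn.close()
    return ids
```

Conditions can be combined with `AND` and `OR`, for example `"[mom_or_child]='C' AND [month_replicate]='no'"`.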

### 1. Generate taxonomic barplot

In [31]:
!qiime taxa barplot \
  --i-table child-table-norep.qza \
  --i-taxonomy bespoke-taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization child-bar-plots.qzv

Saved Visualization to: child-bar-plots.qzv

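The barplot visualization lets you choose the taxonomic level to display. Conceptually, that amounts to summing the counts of all features that share the same lineage prefix. The sketch below shows this for Greengenes-style strings such as `k__Bacteria; p__Firmicutes`; the `collapse` function and its inputs are hypothetical stand-ins for one sample's counts and the taxonomy artifact.

```python
from collections import defaultdict


def collapse(counts, taxonomy, level):
    """Sum feature counts by the first `level` ranks of each feature's lineage."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        ranks = [rank.strip() for rank in taxonomy[feature_id].split(";")]
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)
```

With `level=2` the result is a phylum-level profile, and with `level=6` a genus-level one, provided the lineage strings go that deep.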

### 2. Compute alpha and beta diversity

In [32]:
!qiime diversity core-metrics-phylogenetic \
  --i-table child-table-norep.qza \
  --i-phylogeny insertion-tree.qza \
  --p-sampling-depth 3400 \
  --m-metadata-file metadata.tsv \
  --p-n-jobs 1 \
  --output-dir child-norep-core-metrics-results

Saved FeatureTable[Frequency] to: child-norep-core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: child-norep-core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: child-norep-core-metrics-results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: child-norep-c

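The `--p-sampling-depth 3400` option controls rarefaction: each sample is randomly subsampled to 3,400 sequences without replacement, and samples with fewer sequences are dropped from the analysis. The sketch below illustrates that idea on a toy table; `rarefy` is a hypothetical helper, not the QIIME 2 implementation, and it expands counts into a list, which is only practical for small examples.

```python
import random


def rarefy(table, depth, seed=0):
    """Subsample each sample in {sample_id: {feature_id: count}} to `depth` reads."""
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        # One list entry per observed read, labelled by its feature.
        pool = [fid for fid, n in counts.items() for _ in range(n)]
        if len(pool) < depth:
            continue  # too few reads: the sample is excluded
        subsample = {}
        for fid in rng.sample(pool, depth):  # sampling without replacement
            subsample[fid] = subsample.get(fid, 0) + 1
        rarefied[sample_id] = subsample
    return rarefied
```

Choosing the depth is a trade-off: a higher value retains more sequences per sample but excludes more samples, which is why the feature table summary and the alpha rarefaction plot are worth checking first.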
## Perform statistical tests on diversity and generate interactive visualization

### 1. Statistical test on alpha diversity

#### A. Across all time points

In [33]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity child-norep-core-metrics-results/shannon_vector.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization child-norep-core-metrics-results/shannon-group-significance.qzv

Saved Visualization to: child-norep-core-metrics-results/shannon-group-significance.qzv

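The `shannon_vector.qza` tested above holds one Shannon diversity value per sample. As a reminder of what the metric measures, here is a small sketch computing it from a vector of feature counts. The logarithm base of 2 is an assumption made for this illustration, and `shannon` is a hypothetical helper rather than the function QIIME 2 calls.

```python
import math


def shannon(counts, base=2):
    """Shannon entropy of a vector of non-negative feature counts."""
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:  # features with zero count contribute nothing
            proportion = count / total
            entropy -= proportion * math.log(proportion, base)
    return entropy
```

A sample dominated by a single feature scores near 0, while a sample with many evenly abundant features scores higher, so the metric reflects both richness and evenness.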

#### B. At last time point (month 24)

In [34]:
# filter the feature table to have the final time point only
!qiime feature-table filter-samples \
  --i-table child-table-norep.qza \
  --m-metadata-file metadata.tsv \
  --p-where "[month]='24'" \
  --o-filtered-table table-norep-C24.qza

Saved FeatureTable[Frequency] to: table-norep-C24.qza


In [35]:
# recalculate diversities
!qiime diversity core-metrics-phylogenetic \
  --i-table table-norep-C24.qza \
  --i-phylogeny insertion-tree.qza \
  --p-sampling-depth 3400 \
  --m-metadata-file metadata.tsv \
  --p-n-jobs 1 \
  --output-dir norep-C24-core-metrics-results

Saved FeatureTable[Frequency] to: norep-C24-core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: norep-C24-core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: norep-C24-core-metrics-results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: norep-C24-core-metr

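`--p-sampling-depth 3400` rarefies every sample to 3,400 sequences before the metrics are computed, and samples with fewer sequences are dropped. A rough sketch of subsampling without replacement, with toy counts and a toy depth of 5; `rarefy` is a hypothetical helper, not the QIIME 2 implementation.

```python
import random
from collections import Counter

def rarefy(table, depth, seed=0):
    """Subsample each sample to `depth` reads without replacement."""
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        # expand counts into one entry per read, e.g. {"a": 2} -> ["a", "a"]
        reads = [feature for feature, n in counts.items() for _ in range(n)]
        if len(reads) < depth:
            continue  # too shallow: the sample is excluded
        rarefied[sample_id] = dict(Counter(rng.sample(reads, depth)))
    return rarefied

# made-up feature table: sample -> {feature: count}
toy_table = {
    "S1": {"featA": 6, "featB": 4},
    "S2": {"featA": 1, "featC": 2},  # only 3 reads, below the toy depth
}
rarefied = rarefy(toy_table, depth=5)
print(rarefied)
```

The depth is a trade-off: a higher value keeps more sequences per sample but excludes more samples.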
In [36]:
# statistical test on alpha diversity at last time point
!qiime diversity alpha-group-significance \
  --i-alpha-diversity norep-C24-core-metrics-results/shannon_vector.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization norep-C24-core-metrics-results/shannon-group-significance.qzv

Saved Visualization to: norep-C24-core-metrics-results/shannon-group-significance.qzv

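`alpha-group-significance` compares alpha diversity between metadata groups with a Kruskal-Wallis test. The sketch below computes the H statistic on made-up Shannon values; `kruskal_h` is a hypothetical helper that omits the tie correction and the p-value, so it illustrates the rank-based idea rather than reproducing the plugin's output.

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    # assign each distinct value the average of the 1-based ranks it occupies
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    rank_term = sum(sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

# made-up Shannon values for two delivery groups
vaginal = [4.0, 5.0, 6.0]
cesarean = [1.0, 2.0, 3.0]
h_stat = kruskal_h(vaginal, cesarean)
print(h_stat)
```

Because only ranks enter the statistic, the test makes no normality assumption about the diversity values.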

### 2. Statistical test on beta diversity

In [37]:
!qiime diversity beta-group-significance \
  --i-distance-matrix norep-C24-core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column delivery \
  --p-pairwise \
  --o-visualization norep-C24-core-metrics-results/uw_unifrac-delivery-significance.qzv

Saved Visualization to: norep-C24-core-metrics-results/uw_unifrac-delivery-significance.qzv

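By default `beta-group-significance` runs PERMANOVA: a pseudo-F statistic is computed from the distance matrix and compared against the values obtained after shuffling the group labels. A compact sketch under stated assumptions: the six-sample distance matrix is made up, and `pseudo_f` and `permanova` are hypothetical helpers rather than the scikit-bio routines that QIIME 2 calls.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in groups:
        members = [i for i in range(n) if labels[i] == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# toy data: six samples placed on a line, two well-separated groups
positions = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_labels = ["Vaginal"] * 3 + ["Cesarean"] * 3
f_stat, p_value = permanova(toy_dist, toy_labels)
print(f_stat, p_value)
```

With only three samples per group the p-value cannot become very small, because few distinct label arrangements exist; real analyses need larger groups.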

## Longitudinal data analysis

### 1. Linear mixed effects models

In [38]:
# recalculate diversities based on the full dataset
!qiime diversity core-metrics-phylogenetic \
  --i-table child-table.qza \
  --i-phylogeny insertion-tree.qza \
  --p-sampling-depth 3400 \
  --m-metadata-file metadata.tsv \
  --p-n-jobs 1 \
  --output-dir child-core-metrics-results

Saved FeatureTable[Frequency] to: child-core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: child-core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: child-core-metrics-results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: child-core-metrics-results/shan

In [39]:
# LME model testing the effects of delivery mode and diet
!qiime longitudinal linear-mixed-effects \
  --m-metadata-file metadata.tsv \
  --m-metadata-file child-core-metrics-results/shannon_vector.qza \
  --p-metric shannon \
  --p-random-effects day_of_life \
  --p-group-columns delivery,diet \
  --p-state-column day_of_life \
  --p-individual-id-column host_subject_id \
  --o-visualization lme-shannon.qzv

Saved Visualization to: lme-shannon.qzv


### 2. Volatility visualization

In [40]:
!qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --m-metadata-file child-core-metrics-results/shannon_vector.qza \
  --p-default-metric shannon \
  --p-default-group-column delivery \
  --p-state-column day_of_life \
  --p-individual-id-column host_subject_id \
  --o-visualization shannon-volatility.qzv

Saved Visualization to: shannon-volatility.qzv


## Differential abundance testing

### Option 1: ANCOM

In [41]:
# Create a new feature table that contains only samples from children at 6 months
!qiime feature-table filter-samples \
  --i-table child-table-norep.qza \
  --m-metadata-file metadata.tsv \
  --p-where "[month]='6'" \
  --o-filtered-table table-norep-C6.qza

Saved FeatureTable[Frequency] to: table-norep-C6.qza


In [42]:
# filter out low-abundance features
!qiime feature-table filter-features \
  --i-table table-norep-C6.qza \
  --p-min-samples 5 \
  --p-min-frequency 20 \
  --o-filtered-table filtered-table-C6.qza

Saved FeatureTable[Frequency] to: filtered-table-C6.qza

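The two thresholds act together: a feature is kept only if its total frequency is at least 20 (`--p-min-frequency`) and it is observed in at least 5 samples (`--p-min-samples`). A small sketch of that rule on a made-up table; `filter_features` is a hypothetical helper.

```python
def filter_features(table, min_frequency, min_samples):
    """Keep features that pass both the total-count and the prevalence threshold."""
    kept = {}
    for feature, per_sample in table.items():
        total = sum(per_sample.values())
        prevalence = sum(1 for count in per_sample.values() if count > 0)
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature] = per_sample
    return kept

# made-up table: feature -> {sample: count}
toy_table = {
    "featA": {"S1": 10, "S2": 5, "S3": 3, "S4": 1, "S5": 1},  # total 20, in 5 samples
    "featB": {"S1": 100, "S2": 0, "S3": 0, "S4": 0, "S5": 0},  # abundant but in 1 sample
    "featC": {"S1": 1, "S2": 1, "S3": 1, "S4": 1, "S5": 1},    # prevalent but total 5
}
kept = filter_features(toy_table, min_frequency=20, min_samples=5)
print(sorted(kept))
```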

In [43]:
# add a pseudocount so that no count is zero (ANCOM takes logarithms of ratios)
!qiime composition add-pseudocount \
  --i-table filtered-table-C6.qza \
  --o-composition-table comp-table-C6.qza

Saved FeatureTable[Composition] to: comp-table-C6.qza

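ANCOM compares log-ratios of feature abundances, and a zero count makes a log-ratio undefined; `add-pseudocount` avoids this by adding a constant (1 by default) to every count. A sketch with made-up counts; `add_pseudocount` and `log_ratio` are hypothetical helpers.

```python
import math

def add_pseudocount(table, pseudocount=1):
    """Add a constant to every count so that no entry is zero."""
    return {
        sample: {feature: count + pseudocount for feature, count in counts.items()}
        for sample, counts in table.items()
    }

def log_ratio(counts, numerator, denominator):
    """Natural log of the ratio between two features within one sample."""
    return math.log(counts[numerator] / counts[denominator])

# made-up counts: featB is absent from S1, so its raw log-ratio is undefined
toy_table = {"S1": {"featA": 9, "featB": 0}, "S2": {"featA": 4, "featB": 4}}
composition = add_pseudocount(toy_table)
ratio_s1 = log_ratio(composition["S1"], "featA", "featB")
print(composition, ratio_s1)
```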

In [44]:
# run ANCOM
!qiime composition ancom \
  --i-table comp-table-C6.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column delivery \
  --o-visualization ancom-C6-delivery.qzv

Saved Visualization to: ancom-C6-delivery.qzv


### Option 2: songbird

In [45]:
# make a folder to store songbird results
!mkdir songbird-results

In [46]:
# run songbird
!qiime songbird multinomial \
  --i-table table-norep-C6.qza \
  --m-metadata-file metadata.tsv \
  --p-formula "delivery+abx_exposure+diet+sex" \
  --p-epochs 10000 \
  --p-differential-prior 0.5 \
  --o-differentials songbird-results/differentials6monthControlled.qza \
  --o-regression-stats songbird-results/regression-stats6monthControlled.qza \
  --o-regression-biplot songbird-results/regression-biplot6monthControlled.qza

Saved FeatureData[Differential] to: songbird-results/differentials6monthControlled.qza
Saved SampleData[SongbirdStats] to: songbird-results/regression-stats6monthControlled.qza
Saved PCoAResults % Properties('biplot') to: songbird-results/regression-biplot6monthControlled.qza


In [47]:
# examine estimated coefficients
!qiime tools export \
  --input-path songbird-results/differentials6monthControlled.qza \
  --output-path songbird-results/exported-differentials6monthControlled

Exported songbird-results/differentials6monthControlled.qza as DifferentialDirectoryFormat to directory songbird-results/exported-differentials6monthControlled

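Once exported, the coefficients can be ranked to see which features are most associated with a covariate. The sketch below is only an illustration: it assumes the export is a tab-separated table with a feature ID column, one column per model term, and possibly a `#q2:types` directive row. The column name `delivery[T.Vaginal]`, the feature IDs, and the values are made up, and `rank_features` is a hypothetical helper.

```python
import csv
import io

def rank_features(tsv_text, column):
    """Sort features by one coefficient column, lowest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]
    ranked = []
    for row in reader:
        if row[id_column].startswith("#"):
            continue  # skip directive rows such as "#q2:types"
        ranked.append((row[id_column], float(row[column])))
    return sorted(ranked, key=lambda item: item[1])

# made-up differentials, for illustration only
toy_tsv = (
    "featureid\tIntercept\tdelivery[T.Vaginal]\n"
    "#q2:types\tnumeric\tnumeric\n"
    "feat1\t0.1\t1.5\n"
    "feat2\t-0.2\t-2.0\n"
    "feat3\t0.0\t0.3\n"
)
ranked = rank_features(toy_tsv, "delivery[T.Vaginal]")
print(ranked)
```

To apply this to the real export, read the TSV file in `songbird-results/exported-differentials6monthControlled` into a string and check its header for the actual column names first.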

## Meta-analysis through Qiita database using redbiom

In [48]:
# list the available contexts and the number of samples and features indexed in each
!redbiom summarize contexts

ContextName	SamplesWithData	FeaturesWithData	Description
Pick_closed-reference_OTUs-Greengenes-Illumina-16S-V34-5c6506	72	2221	Pick closed-reference OTUs (reference-seq: |databases|gg|13_8|rep_set|97_otus.fasta) | Split libraries FASTQ
Deblur-Illumina-16S-V3-V4-150nt-780653	60	869	Deblur (Reference phylogeny for SEPP: Greengenes_13.8, BIOM: reference-hit.biom) | Trimming (length: 150)
Pick_closed-reference_OTUs-Greengenes-IonTorrent-16S-V3-100nt-a243a1	32	1089	Pick closed-reference OTUs (reference-seq: |databases|gg|13_8|rep_set|97_otus.fasta) | Trimming (length: 100)
Pick_closed-reference_OTUs-SILVA-Illumina-16S-V3-54d83f	138	1014	Pick closed-reference OTUs (reference-seq: |projects|qiita_data|reference|silva_119_Silva_119_rep_set97.fna) | Split libraries FASTQ
Pick_closed-reference_OTUs-Greengenes-Illumina-16S-V4-150nt-bd7d4d	137343	71900	Pick closed-reference OTUs (reference-seq: |databases|gg|13_8|rep_set|97_otus.fasta) | Trimming (length: 150)
Pick_closed-reference_OTUs-Gree

In [49]:
# identify samples in which the sequence of interest was observed
!redbiom search features --context Deblur-Illumina-16S-V4-150nt-780653 \
TACGTAGGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGTCCATCGCTTAACGGTGGATCTGCGCCGGGTACGGGCGGGCTGGAGTGCGGTAGGGGAGACTGGAATTCCCGG > observed_samples.txt


In [50]:
# summarize the observed samples by EMP ontology level 3 (empo_3) category
!redbiom summarize samples \
  --category empo_3 \
  --from observed_samples.txt

Animal distal gut	7124
Animal surface	331
Surface (non-saline)	204
Sterile water blank	102
Animal secretion	91
animal distal gut	68
Animal corpus	58
Water (non-saline)	15
Plant corpus	13
Animal proximal gut	12
Aerosol (non-saline)	9
Soil (non-saline)	6
Water (saline)	6
Single strain	6
not provided	2
Sediment (saline)	2
Surface (saline)	1

Total samples	8050


In [51]:
# keep only infant samples (age < 3), excluding Qiita study 10249
!redbiom select samples-from-metadata \
  --context Deblur-Illumina-16S-V4-150nt-780653 \
  --from observed_samples.txt "where (host_age < 3 or age < 3) and qiita_study_id != 10249" > infant_samples.txt


In [52]:
# summarize the metadata of infant samples
!redbiom search metadata \
  --categories birth

!redbiom summarize metadata birth_method birth_mode

!redbiom summarize samples \
     --category birth_mode \
     --from infant_samples.txt

birth
child_3_preterm_birth
birth_year
child_3_birth_weight
child_1_birth_weight_unit
child_2_birth_weight_unit
birth_head_cir
birth_length
birth_date
child_2_birth_length
birth_weight_units
year_of_birth
birth_length_units
antibiotics_at_birth
birth_head_cir_units
live_births
birth_ga_w_units
birth_complications
place_of_birth
child_3_birth_weight_unit
country_of_birth
birth_route_2cat
birth_ga_d_units
birth_mode
child_3_birth_length
date_of_birth
mouse_birth
type_birth_location
child_1_birth_weight
child_1_birth_length
birth_ga_d
child_2_birth_length_unit
birth_ga_w
birth_season
birth_days
birth_wt
child_2_birth_weight
antibiotics_after_birth
child_2_preterm_birth
child_1_preterm_birth
weight_at_birth
child_1_birth_length_unit
birth_control
still_births
birth_weight
baby_birth_date
birth_method
child_3_birth_length_unit
birth_wt_sd
birth_location
birth_method	72
birth_mode	2176
Vaginal	38
Cesarea	16
Vag	3
CSseed	1

Total samples	58


In [53]:
# check sample balance in modes of delivery
!redbiom summarize metadata-category \
  --counter \
  --category birth_mode

Category value	count
Cesarea	47
Vaginal	135
CSseed	335
Vag	689
CS	970
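
The `birth_mode` labels are not harmonized across studies, so the balance is easier to judge after merging synonyms. A small sketch using the counts printed above; the mapping is an assumption (that `Vag` means vaginal and that `Cesarea` and `CS` mean cesarean section), and `CSseed` is kept as its own category because its meaning is not stated here.

```python
from collections import Counter

# counts copied from the redbiom summary above
birth_mode_counts = {"Cesarea": 47, "Vaginal": 135, "CSseed": 335, "Vag": 689, "CS": 970}

# assumed synonym mapping (hypothetical, verify against each study's metadata)
harmonize = {
    "Cesarea": "cesarean",
    "CS": "cesarean",
    "Vaginal": "vaginal",
    "Vag": "vaginal",
    "CSseed": "CSseed",
}

balance = Counter()
for label, n in birth_mode_counts.items():
    balance[harmonize[label]] += n

print(dict(balance))  # {'cesarean': 1017, 'vaginal': 824, 'CSseed': 335}
```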


In [54]:
# summarize samples over study id category
!redbiom summarize samples \
  --category qiita_study_id \
  --from infant_samples.txt

10581	54
10918	30
11076	19
1064	15
11947	10
11358	10
2010	4
10512	3
11284	1

Total samples	146


## Support protocols: Exporting QIIME 2 data

In [55]:
!qiime tools export \
  --input-path insertion-tree.qza \
  --output-path extracted-insertion-tree

Exported insertion-tree.qza as NewickDirectoryFormat to directory extracted-insertion-tree


## Support protocols: Analysis of shotgun metagenomic data

### Download all the required example files

In [56]:
!for i in query refseqs taxonomy bt2-database; \
do wget https://github.com/qiime2/q2-shogun/raw/master/q2_shogun/tests/data/$i.qza; done

--2020-01-14 21:47:24--  https://github.com/qiime2/q2-shogun/raw/master/q2_shogun/tests/data/query.qza
Resolving github.com... 192.30.255.112
Connecting to github.com|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/qiime2/q2-shogun/master/q2_shogun/tests/data/query.qza [following]
--2020-01-14 21:47:24--  https://raw.githubusercontent.com/qiime2/q2-shogun/master/q2_shogun/tests/data/query.qza
Resolving raw.githubusercontent.com... 151.101.196.133
Connecting to raw.githubusercontent.com|151.101.196.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12700 (12K) [application/octet-stream]
Saving to: ‘query.qza’


2020-01-14 21:47:25 (5.52 MB/s) - ‘query.qza’ saved [12700/12700]

--2020-01-14 21:47:25--  https://github.com/qiime2/q2-shogun/raw/master/q2_shogun/tests/data/refseqs.qza
Resolving github.com... 192.30.255.112
Connecting to github.com|192.30.255.112|:443... connected.
HTTP 

### Run shotgun metagenomics pipeline

In [57]:
!qiime shogun nobunaga \
  --i-query query.qza \
  --i-reference-reads refseqs.qza \
  --i-reference-taxonomy taxonomy.qza \
  --i-database bt2-database.qza \
  --o-taxa-table taxatable.qza

Saved FeatureTable[Frequency] to: taxatable.qza
