# Mackerel Data Analysis
Roughly following the structure of the QIIME 2 "moving pictures" tutorial -- this focuses on just getting the data ready for analysis in Songbird and Qurro.

The data is from [study 11721 on Qiita](https://qiita.ucsd.edu/study/description/11721), and is associated with a manuscript currently in preparation (Minich et al. 2019, preprint available on bioRxiv [here](https://www.biorxiv.org/content/10.1101/721555v1)).

The input data in this notebook is 150nt sOTU data, corresponding to artifact ID `56427` on Qiita. These data were processed on Qiita using QIIME 1.9.1 (`Split libraries FASTQ` and `Trimming`), then denoised using Deblur 1.1.0. The `reference-hit` BIOM table and FASTA file (both located in the `input/` directory) are used as the starting point for this analysis.

**If you run into any trouble trying to replicate this analysis, feel free to raise an issue [on this analysis' GitHub repository](https://github.com/knightlab-analyses/qurro-mackerel-analysis) and I'll do my best to help troubleshoot the problem.**

(Or, alternatively, if you don't like how I did something here, feel free to raise an issue about that too ;)

## 0. Setting up
### 0.1. Declare the locations of the input files and output directory
Before rerunning this notebook on your system, there are two (hopefully painless) things you'll need to do:

1. Download the SILVA 132 database ([link here](https://www.arb-silva.de/download/archive/qiime)), and
2. Update the absolute filepaths in the cell below according to where on your system this notebook and/or the SILVA 132 database files are located.

For reference, the taxonomic classification steps in this notebook use the SILVA 132 99% reference 16S sequences and majority 7-level taxonomy (downloaded from [here](https://www.arb-silva.de/download/archive/qiime) on November 10, 2019).

In [1]:
# Input Data Locations (trimmed-to-150-nt and deblurred BIOM table and sequences,
# as well as sample metadata)
%env INPUT_BIOM_TABLE_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.biom
%env INPUT_REP_SEQS_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.seqs.fa
%env INPUT_SAMPLE_METADATA_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/11721_prep_4638_qiime_20190722-104633.txt
%env INPUT_REFERENCE_SEQS_PATH=/home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna
%env INPUT_REFERENCE_TAXONOMY_PATH=/home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/taxonomy/16S_only/99/majority_taxonomy_7_levels.txt

# Output directory (will contain all .qza and .qzv files generated by this analysis)
%env OUTPUT_DIRECTORY=/home/mfedarko/analyses/qurro-mackerel-analysis/output

env: INPUT_BIOM_TABLE_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.biom
env: INPUT_REP_SEQS_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.seqs.fa
env: INPUT_SAMPLE_METADATA_PATH=/home/mfedarko/analyses/qurro-mackerel-analysis/input/11721_prep_4638_qiime_20190722-104633.txt
env: INPUT_REFERENCE_SEQS_PATH=/home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna
env: INPUT_REFERENCE_TAXONOMY_PATH=/home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/taxonomy/16S_only/99/majority_taxonomy_7_levels.txt
env: OUTPUT_DIRECTORY=/home/mfedarko/analyses/qurro-mackerel-analysis/output


### 0.2. Move into the output directory

In [2]:
import os
odir = os.environ["OUTPUT_DIRECTORY"]
os.chdir(odir)
print("Moved into output directory: {}".format(odir))

Moved into output directory: /home/mfedarko/analyses/qurro-mackerel-analysis/output


### 0.3. Get information about the current QIIME 2 environment
(For future reference.)

We assume that a QIIME 2 2019.7 environment is already activated when you start up this notebook.

In [38]:
!qiime info

System versions
Python version: 3.6.7
QIIME 2 release: 2019.7
QIIME 2 version: 2019.7.0
q2cli version: 2019.7.0

Installed plugins
alignment: 2019.7.0
composition: 2019.7.0
cutadapt: 2019.7.0
dada2: 2019.7.0
deblur: 2019.7.0
deicode: 0.2.3
demux: 2019.7.0
diversity: 2019.7.0
emperor: 2019.7.0
feature-classifier: 2019.7.0
feature-table: 2019.7.0
fragment-insertion: 2019.7.0
gneiss: 2019.7.0
longitudinal: 2019.7.0
metadata: 2019.7.0
phylogeny: 2019.7.0
quality-control: 2019.7.0
quality-filter: 2019.7.0
qurro: 0.4.0
sample-classifier: 2019.7.0
songbird: 1.0.1
taxa: 2019.7.0
types: 2019.7.0
vsearch: 2019.7.0

Application config directory
/home/mfedarko/.config/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org


## 1. Import data into QIIME 2 artifacts
See [the QIIME 2 documentation on importing data](https://docs.qiime2.org/2019.4/tutorials/importing/) for context on why this is necessary and useful.

### 1.1. Import the study's data
Note that this dataset doesn't just contain data about the microbiota of Pacific chub mackerel: it also contains samples taken from other species of fish, as well as environmental samples. We'll filter some of these samples out of the dataset soon.

In [39]:
!qiime tools import \
    --type "FeatureTable[Frequency]" \
    --input-path $INPUT_BIOM_TABLE_PATH \
    --output-path table-unfiltered.qza
!qiime tools import \
    --type "FeatureData[Sequence]" \
    --input-path $INPUT_REP_SEQS_PATH \
    --output-path rep-seqs-unfiltered.qza

Imported /home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.biom as BIOMV210DirFmt to table-unfiltered.qza
Imported /home/mfedarko/analyses/qurro-mackerel-analysis/input/reference-hit.seqs.fa as DNASequencesDirectoryFormat to rep-seqs-unfiltered.qza


### 1.2. Summarize the imported study table and representative sequence data
This gives us information about the number of samples and sequences present in these files. It's useful for sanity-checking the filtering that will be done in the next section.

In [40]:
!qiime feature-table summarize \
    --i-table table-unfiltered.qza \
    --o-visualization table-unfiltered-summary.qzv \
    --m-sample-metadata-file $INPUT_SAMPLE_METADATA_PATH

!qiime feature-table tabulate-seqs \
    --i-data rep-seqs-unfiltered.qza \
    --o-visualization rep-seqs-unfiltered-summary.qzv

Saved Visualization to: table-unfiltered-summary.qzv
Saved Visualization to: rep-seqs-unfiltered-summary.qzv


## 2. Filter the study's data
In particular, we'll filter the feature table (and representative sequences) to only contain samples that satisfy both of the following conditions:

1. have a `host_common_name` of `pacific chub mackerel` OR have a `sample_type_body_site` of `sea water`
2. contain at least 1,370 sequences

### 2.1. Why do we filter to just Pacific chub mackerel and sea water samples?
If you examine `table-unfiltered-summary.qzv` (in particular the "Interactive Sample Detail" tab), you should see that only 1,167 / 1,522 samples have a `host_common_name` of `pacific chub mackerel`. We're going to look at how samples taken from various body sites of these mackerel differ from environmental samples (in particular, just samples taken from sea water).

In order to perform this analysis, we filter the table to just samples where `host_common_name` is `pacific chub mackerel` *or* samples where `sample_type_body_site` is `sea water`.

### 2.2. Why do we filter out samples with fewer than 1,370 sequences?
This isn't rarefaction (the remaining samples keep all of their sequences); samples with fewer than 1,370 sequences are simply excluded from the rest of the analysis.

This number was obtained using the KatharoSeq protocol, which involves looking at the positive controls processed alongside these samples: see [Minich et al. 2018](https://msystems.asm.org/content/3/3/e00218-17.abstract) for a description of how KatharoSeq works. (The actual sample exclusion threshold number for this dataset, on the `reference-hit` table processed through Deblur 1.1.0, was 1,369.5, but we round up here since "half a count" doesn't really make sense.)
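To make the two conditions concrete, here's a minimal pure-Python sketch of what this filtering amounts to. It assumes a toy table stored as a dict of per-sample feature counts plus toy metadata; all sample IDs, body sites, and counts are hypothetical, and this is not how q2-feature-table implements `filter-samples`. It rounds the KatharoSeq threshold up with `math.ceil` and keeps samples whose total count is at least that value, which is our reading of `--p-min-frequency` as an inclusive minimum.

```python
import math

# Hypothetical per-sample feature counts: {sample_id: {feature_id: count}}
table = {
    "S1": {"f1": 900, "f2": 600},   # 1,500 reads
    "S2": {"f1": 1000, "f3": 300},  # 1,300 reads (below the threshold)
    "S3": {"f2": 2000},             # 2,000 reads
    "S4": {"f1": 5000},             # 5,000 reads
}
# Hypothetical metadata: {sample_id: (host_common_name, sample_type_body_site)}
metadata = {
    "S1": ("pacific chub mackerel", "gill"),
    "S2": ("pacific chub mackerel", "skin"),
    "S3": ("not applicable", "sea water"),
    "S4": ("some other fish", "gill"),
}

# KatharoSeq threshold for this dataset, rounded up to a whole count
threshold = math.ceil(1369.5)

def keep(sample_id):
    """Apply both conditions: the metadata 'where' clause and the depth cutoff."""
    host, site = metadata[sample_id]
    passes_where = host == "pacific chub mackerel" or site == "sea water"
    passes_depth = sum(table[sample_id].values()) >= threshold
    return passes_where and passes_depth

filtered = {sid: counts for sid, counts in table.items() if keep(sid)}
print(threshold)         # 1370
print(sorted(filtered))  # ['S1', 'S3']
```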

In [41]:
!qiime feature-table filter-samples \
    --i-table table-unfiltered.qza \
    --m-metadata-file $INPUT_SAMPLE_METADATA_PATH \
    --p-where "host_common_name='pacific chub mackerel' OR sample_type_body_site='sea water'" \
    --p-min-frequency 1370 \
    --o-filtered-table table-partially-filtered.qza

# Filter rep-seqs-unfiltered.qza to only include sequences present in the partially-filtered table
!qiime feature-table filter-seqs \
    --i-table table-partially-filtered.qza \
    --i-data rep-seqs-unfiltered.qza \
    --o-filtered-data rep-seqs-partially-filtered.qza

Saved FeatureTable[Frequency] to: table-partially-filtered.qza
Saved FeatureData[Sequence] to: rep-seqs-partially-filtered.qza


## 3. Filter the study's data again, but in a less important way
Songbird automatically filters out features with fewer than 10 counts (this is configurable with the `--p-min-feature-count` parameter in `qiime songbird multinomial`). To avoid wasting time and resources assigning taxonomy to these features, we filter them out upstream. This step should have *no impact* on the actual results of this analysis.

(Songbird also automatically filters out samples with fewer than 1,000 counts, but our KatharoSeq-based threshold of 1,370 already removed all of these samples from the table, so that behavior won't make a difference.)
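A rough pure-Python sketch of the two commands below: drop features whose total count across all samples is under 10, then keep only the representative sequences whose IDs are still in the table. The table, feature IDs, and sequences are all hypothetical, and this only mirrors what we understand `filter-features --p-min-frequency` and `filter-seqs` to do; it is not their implementation.

```python
from collections import Counter

MIN_FEATURE_FREQUENCY = 10  # mirrors --p-min-frequency 10 below

# Hypothetical table {sample_id: {feature_id: count}} and representative sequences
table = {
    "S1": {"f1": 900, "f2": 6, "f3": 2},
    "S3": {"f1": 40, "f2": 3, "f4": 15},
}
rep_seqs = {"f1": "TACGGAGG", "f2": "TACGTAGG", "f3": "TACAGAGG",
            "f4": "TACGGAAG", "f5": "TACGGGGG"}

# Total count of each feature across all samples (f2 totals 9, so it is dropped)
totals = Counter()
for counts in table.values():
    totals.update(counts)

kept = {f for f, total in totals.items() if total >= MIN_FEATURE_FREQUENCY}
filtered_table = {
    sid: {f: c for f, c in counts.items() if f in kept}
    for sid, counts in table.items()
}
# Like filter-seqs: keep only sequences whose feature IDs remain in the table
filtered_seqs = {f: s for f, s in rep_seqs.items() if f in kept}
print(sorted(kept))           # ['f1', 'f4']
print(sorted(filtered_seqs))  # ['f1', 'f4']
```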

In [42]:
!qiime feature-table filter-features \
    --i-table table-partially-filtered.qza \
    --p-min-frequency 10 \
    --o-filtered-table table.qza

!qiime feature-table filter-seqs \
    --i-table table.qza \
    --i-data rep-seqs-partially-filtered.qza \
    --o-filtered-data rep-seqs.qza

Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza


## 4. Summarize the filtered data
This will let us double-check that the filtering steps above were done properly.

In [43]:
!qiime feature-table summarize \
    --i-table table.qza \
    --m-sample-metadata-file $INPUT_SAMPLE_METADATA_PATH \
    --o-visualization table-summary.qzv

!qiime feature-table tabulate-seqs \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs-summary.qzv

Saved Visualization to: table-summary.qzv
Saved Visualization to: rep-seqs-summary.qzv


## 5. Taxonomic classification
You don't *need* taxonomy information (also known as "feature metadata", in the sense that taxonomy is information about your features, provided that, uh, your features are sequences) to run Qurro (or Songbird or DEICODE, for that matter). However, having this information available is extremely useful for interpreting a Qurro visualization, which is why we'll perform taxonomic classification on our dataset's features.

We're going to do this taxonomic classification using q2-feature-classifier's `classify-sklearn` Naive Bayes classifier (see [Bokulich et al. 2018](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0470-z)) and based on the SILVA 132 database (see [Quast et al. 2012](https://academic.oup.com/nar/article/41/D1/D590/1069277)). Rather than use [the pre-trained QIIME 2 SILVA classifier that targets the V4 region](https://docs.qiime2.org/2019.7/data-resources/), we train our own because our samples' primers are very slightly different:

Forward primers:
```
 This study's samples: GTGYCAGCMGCCGCGGTAA
Pretrained Classifier: GTGCCAGCMGCCGCGGTAA
```

Reverse primers:
```
 This study's samples: GGACTACNVGGGTWTCTAAT
Pretrained Classifier: GGACTACHVGGGTWTCTAAT
```

So we can see that there are two differences in the primers (`Y` vs. `C` in the forward primer, and `N` vs. `H` in the reverse primer). As explained on the [EMP page describing 16S protocols](http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/16s/), these differences seem to be due to an update to the EMP primers that target the V4 region: this study uses the "updated" primers, while the pre-trained QIIME 2 SILVA classifier uses the "original" primers (at least, as of QIIME 2 2019.7).
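One way to see why this discrepancy is minor: at both differing positions, the updated primer uses a more degenerate IUPAC code (`Y` matches C or T, while `C` matches only C; `N` matches any base, while `H` matches A, C, or T). The sketch below lists the differing positions and checks whether the study primer's code covers every base the pre-trained primer's code covers. The helper name `primer_differences` is ours, purely for illustration.

```python
# IUPAC nucleotide codes -> the set of bases each code matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_differences(study, pretrained):
    """List (1-based position, study code, pretrained code, study covers pretrained?)
    for every position where two equal-length primers differ."""
    if len(study) != len(pretrained):
        raise ValueError("primers must be the same length")
    diffs = []
    for pos, (s, p) in enumerate(zip(study, pretrained), start=1):
        if s != p:
            diffs.append((pos, s, p, set(IUPAC[p]) <= set(IUPAC[s])))
    return diffs

forward = primer_differences("GTGYCAGCMGCCGCGGTAA", "GTGCCAGCMGCCGCGGTAA")
reverse = primer_differences("GGACTACNVGGGTWTCTAAT", "GGACTACHVGGGTWTCTAAT")
print(forward)  # [(4, 'Y', 'C', True)]
print(reverse)  # [(8, 'N', 'H', True)]
```

In other words, any template that the original primers match at these positions is also matched by the updated primers; the reverse is not true.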

This discrepancy [really shouldn't make a difference](https://forum.qiime2.org/t/primer-sequences-used-for-trained-silva119-515f-806r/2888/6), but I figure we might as well account for this anyway just so we can be super extra confident about our results.

Note that most of the steps and parameters used to import the SILVA files and train the classifier were chosen to replicate the provenance of the SILVA 515F/806R pre-trained QIIME 2 Naive Bayes classifier available at https://docs.qiime2.org/2019.7/data-resources/ (as well as to follow the [tutorial on training feature classifiers](https://docs.qiime2.org/2019.10/tutorials/feature-classifier/), of course).

### 5.1. Import SILVA database files

#### 5.1.1. Why do we use the `rep_set` sequence data, rather than the `rep_set_aligned` sequence data?
I wasn't sure about this, so I did some digging (see [this post](https://forum.qiime2.org/t/classifier-training-questions/1162/3)) and found out that `rep_set` is what should be used with QIIME 2's classifiers. (That particular post describes Greengenes, not SILVA, but the SILVA 132 documentation makes it clear that these folders are mostly analogous to those in Greengenes.)

#### 5.1.2. Why do we use the majority 7-level taxonomy?
We use the 7-level taxonomy because we're working with 16S data (i.e. our sequences should all be derived from bacterial or archaeal genomes), and, as I understand it, many classifiers need the 7-level taxonomy in order to work properly.

The use of the "majority" taxonomy (instead of the raw or consensus taxonomy) is a bit more subjective; since I am by no means a phylogenetics expert, I went with the majority taxonomy because it seems to be a reasonable thing to do, based on comments from both [QIIME 2](https://forum.qiime2.org/t/silva-taxonomy-classifier-clarification/5087/3) and [SILVA](https://forum.qiime2.org/t/silva-132-classifiers/3698/11) folks.
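To make "7-level" concrete: each line of the taxonomy file we import below (as `HeaderlessTSVTaxonomyFormat`) is a reference ID, a tab, and a semicolon-separated lineage. The sketch below parses one such line and checks that it has seven ranks. The example ID and lineage are made up, and the `D_<n>__` rank prefixes are our assumption about how SILVA 132's QIIME release labels its levels; `parse_taxonomy_line` is a hypothetical helper, not part of QIIME 2.

```python
def parse_taxonomy_line(line, expected_levels=7):
    """Split one line of a headerless, tab-separated taxonomy file into
    (feature_id, list_of_ranks), checking the number of ranks."""
    feature_id, taxonomy = line.rstrip("\n").split("\t", 1)
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    if len(ranks) != expected_levels:
        raise ValueError(
            f"{feature_id}: expected {expected_levels} levels, got {len(ranks)}"
        )
    return feature_id, ranks

# Hypothetical example line (made-up ID; illustrative lineage)
line = (
    "REF0001\tD_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;"
    "D_3__Vibrionales;D_4__Vibrionaceae;D_5__Photobacterium;"
    "D_6__uncultured bacterium\n"
)
fid, ranks = parse_taxonomy_line(line)
print(fid, ranks[5])  # REF0001 D_5__Photobacterium
```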

#### 5.1.3. Why did you bother explaining that? That was all obvious to me!

Well, a lot of it wasn't obvious to me! ;) I'm trying to be this explicit for my own sanity, so I can come back to this notebook in the future and understand why I did the things I did. Furthermore, a lot of this is information that isn't laid out in a single clear place (at least that I could find), which is why I had to go searching for it... maybe seeing a description like this will help someone with their analysis.

That all being said, I'm sure you could rerun taxonomic classification (or really, any other part of this notebook) with slightly different files/parameters and maybe get slightly different results, but I'll be honest: at this point I've rerun this notebook upwards of 5 times (I think; I haven't been keeping track) because I keep finding things to improve. So I've rerun it with different classifiers, different reference databases, ..., and things look mostly the same regardless.

In [51]:
!qiime tools import \
    --type "FeatureData[Sequence]" \
    --input-path $INPUT_REFERENCE_SEQS_PATH \
    --output-path silva-132-99-16S-seqs.qza

!qiime tools import \
    --type "FeatureData[Taxonomy]" \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path $INPUT_REFERENCE_TAXONOMY_PATH \
    --output-path silva-132-99-16S-majority-7-level-taxonomy.qza

Imported /home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna as DNASequencesDirectoryFormat to silva-132-99-16S-seqs.qza
Imported /home/mfedarko/analyses/silva-132/SILVA_132_QIIME_release/taxonomy/16S_only/99/majority_taxonomy_7_levels.txt as HeaderlessTSVTaxonomyFormat to silva-132-99-16S-majority-7-level-taxonomy.qza


### 5.2. Extract reads based on primers
We use the exact same primers used in this study (being able to use our exact primers is why we're going through the trouble of training a classifier in the first place...), and otherwise use default parameters. (These default parameters seem to match the parameters used when extracting reads for the aforementioned pre-trained SILVA 515F/806R QIIME 2 classifier, based on that classifier's provenance.)
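Conceptually, extracting reads means finding where the forward primer and the reverse complement of the reverse primer land on each reference sequence, and keeping the region in between. The sketch below does that with an exact, degeneracy-aware match on one toy sequence. It is a simplification we wrote for illustration (hypothetical helper names, made-up toy sequence): the real `extract-reads` command uses its own alignment-based matching and tolerates some mismatches, and we don't rely here on exactly how it handles the primer sites.

```python
# IUPAC codes -> the bases each code matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. Y = C/T pairs with R = G/A)
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_primer(seq, primer):
    """Index of the first exact (degeneracy-aware) match of primer in seq, or -1."""
    for i in range(len(seq) - len(primer) + 1):
        window = seq[i:i + len(primer)]
        if all(base in IUPAC[code] for base, code in zip(window, primer)):
            return i
    return -1

def extract_amplicon(seq, f_primer, r_primer):
    """Return the region between the forward primer and the reverse complement
    of the reverse primer (primers excluded), or None if either is missing."""
    f_start = find_primer(seq, f_primer)
    if f_start < 0:
        return None
    insert_start = f_start + len(f_primer)
    r_offset = find_primer(seq[insert_start:], revcomp(r_primer))
    if r_offset < 0:
        return None
    return seq[insert_start:insert_start + r_offset]

# Hypothetical toy reference: padding + primer site + insert + primer site + padding
toy = "AAAA" + "GTGTCAGCAGCCGCGGTAA" + "CCCCGGGG" + "ATTAGATACCCGTGTAGTCC" + "TTTT"
print(extract_amplicon(toy, "GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT"))  # CCCCGGGG
```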

In [58]:
!qiime feature-classifier extract-reads \
    --i-sequences silva-132-99-16S-seqs.qza \
    --p-f-primer GTGYCAGCMGCCGCGGTAA \
    --p-r-primer GGACTACNVGGGTWTCTAAT \
    --o-reads silva-132-99-16S-extracted-seqs.qza

Saved FeatureData[Sequence] to: silva-132-99-16S-extracted-seqs.qza


### 5.3. Train the classifier
I haven't split this across multiple jobs because I couldn't find an option to do that for this command.
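The training log below shows a two-step pipeline ("feat_ext", then "classify"): sequences are turned into features, then a classifier is fit on them. As a rough illustration of the general idea behind a k-mer based multinomial Naive Bayes classifier, here's a tiny pure-Python version trained on four made-up reads. The value of k, the smoothing constant, the uniform prior, and the helper names are all our own choices for this toy; this is not q2-feature-classifier's implementation or its default settings.

```python
import math
from collections import Counter, defaultdict

K = 4  # k-mer length, chosen only to keep this toy example small

def kmer_counts(seq, k=K):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def fit(reads, labels, alpha=1.0):
    """'Train' by summing k-mer counts per taxon label."""
    per_label = defaultdict(Counter)
    for seq, label in zip(reads, labels):
        per_label[label].update(kmer_counts(seq))
    vocab = set()
    for counts in per_label.values():
        vocab.update(counts)
    return per_label, len(vocab), alpha

def predict_proba(seq, model):
    """Posterior probability of each label for one query (uniform prior)."""
    per_label, vocab_size, alpha = model
    query = kmer_counts(seq)
    log_scores = {}
    for label, counts in per_label.items():
        denom = sum(counts.values()) + alpha * vocab_size
        log_scores[label] = sum(
            n * math.log((counts[kmer] + alpha) / denom) for kmer, n in query.items()
        )
    # Normalize log-scores into probabilities (subtract the max for stability)
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical toy reference reads and taxonomy labels
reads = ["ACGTACGTAC", "ACGTACGTTT", "GGGCCCGGGC", "GGGCCCAAAC"]
labels = ["taxonA", "taxonA", "taxonB", "taxonB"]
model = fit(reads, labels)
probs = predict_proba("ACGTACG", model)
print(max(probs, key=probs.get))  # taxonA
```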

In [59]:
!qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads silva-132-99-16S-extracted-seqs.qza \
    --i-reference-taxonomy silva-132-99-16S-majority-7-level-taxonomy.qza \
    --p-verbose \
    --verbose \
    --o-classifier silva-132-99-16S-v4-classifier.qza

[Pipeline] .......... (step 1 of 2) Processing feat_ext, total=  41.1s
[Pipeline] ......... (step 2 of 2) Processing classify, total=100.2min
Saved TaxonomicClassifier to: silva-132-99-16S-v4-classifier.qza


### 5.4. Actually use the classifier to perform taxonomic classification!
Phew.

In [3]:
!qiime feature-classifier classify-sklearn \
    --i-reads rep-seqs.qza \
    --i-classifier silva-132-99-16S-v4-classifier.qza \
    --p-n-jobs 4 \
    --verbose \
    --o-classification taxonomy.qza

Saved FeatureData[Taxonomy] to: taxonomy.qza


## 6. Run Songbird
This will generate feature differentials, which we'll visualize in Qurro.
For details on what Songbird does and how it works, please see [Songbird's GitHub page](https://github.com/biocore/songbird/), as well as [Morton and Marotz et al. 2019](https://www.nature.com/articles/s41467-019-10656-5).

### 6.1. Why do we manually set certain Songbird parameters?
These parameters were chosen by consulting the `songbird-regression-summary.qzv` visualization generated below, to ensure that they resulted in a reasonable model fit. See [this section of Songbird's README](https://github.com/biocore/songbird#interpreting-model-fitting) for details on model fitting.

#### `--p-formula`
This parameter is used by Songbird to determine what sample metadata fields should be used as covariates when generating differentials. Here, we generate differentials relative to the `sample_type_body_site` field (using the `sea water` values of this field as a reference), but there are of course plenty of other options for fields that could be used here if we'd like to ask different questions about this data.
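`Treatment('sea water')` means treatment (dummy) coding with sea water as the reference level: the design matrix gets an intercept plus one 0/1 indicator column per non-reference body site, so each body site's differentials are read relative to sea water. Here's a small pure-Python sketch of that coding. The body-site names are hypothetical stand-ins, and the column names only loosely imitate the `[T.level]` style that formula libraries such as patsy use; this is not Songbird's code.

```python
def treatment_code(values, reference):
    """Dummy-code a categorical column against a reference level:
    an intercept column plus one 0/1 indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    columns = ["Intercept"] + [f"T.{level}" for level in levels]
    rows = [[1] + [int(v == level) for level in levels] for v in values]
    return columns, rows

# Hypothetical body-site values (the real field has five mackerel body sites plus sea water)
sites = ["sea water", "gill", "skin", "sea water"]
columns, rows = treatment_code(sites, reference="sea water")
print(columns)  # ['Intercept', 'T.gill', 'T.skin']
print(rows)     # [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
```

Sea water samples get all-zero indicators, which is exactly what makes sea water the baseline.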

#### `--p-epochs`, `--p-learning-rate`
These parameters, in particular, are closely related to Songbird's runtime:

- We've increased `--p-epochs` from the default of 1,000 to 10,000 to make Songbird run longer (we're working with a fairly large dataset).
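- We've decreased `--p-learning-rate` from the default of 0.001 to 0.0001, so that Songbird takes smaller optimization steps. This doesn't make each iteration slower, but it does mean the model needs more iterations (and thus more epochs) to converge.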
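A quick back-of-the-envelope check on how epochs relate to the progress bar in the Songbird output below, which counts 140,000 iterations. We *assume* here (this is not taken from Songbird's documentation) that each epoch runs floor(training samples / batch size) iterations. Under that assumption, 10,000 epochs and 140,000 iterations imply 14 batches per epoch, i.e. somewhere between 560 and 599 training samples at a batch size of 40. Note that the learning rate doesn't enter this calculation at all.

```python
def total_iterations(epochs, n_train_samples, batch_size):
    """ASSUMED relation: floor(n_train / batch_size) iterations per epoch."""
    return epochs * (n_train_samples // batch_size)

# The Songbird progress bar reports 140,000 iterations for 10,000 epochs
batches_per_epoch = 140_000 // 10_000
print(batches_per_epoch)  # 14

# Any training-set size from 560 to 599 gives 14 batches of 40 under this assumption
print(total_iterations(10_000, 560, 40), total_iterations(10_000, 599, 40))  # 140000 140000
```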

#### `--p-batch-size`
We've increased `--p-batch-size` from the default of 5 to 40 to make Songbird process a larger number of samples at once in each iteration. Since we have a lot of samples in this dataset, and since our samples fall into six "categories" (the five mackerel body sites, plus sea water samples), using a larger batch size (which stands a better chance of reflecting this diversity) makes sense.
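One way to quantify "stands a better chance": the probability that a single batch contains at least one sample from each of the six categories. The sketch below computes this exactly by inclusion-exclusion, *assuming* samples are drawn independently and the six categories are equally common. Neither assumption holds exactly for this dataset (or necessarily for how Songbird draws batches), so treat the numbers as illustrative. One conclusion doesn't depend on the assumptions, though: a batch of 5 samples can never contain all six categories.

```python
from fractions import Fraction
from math import comb

def prob_all_categories(batch_size, n_categories):
    """P(every category appears at least once in a batch), ASSUMING independent
    draws and equally common categories. Exact, via inclusion-exclusion."""
    total = Fraction(0)
    for k in range(n_categories + 1):
        missing_k = Fraction(n_categories - k, n_categories) ** batch_size
        total += (-1) ** k * comb(n_categories, k) * missing_k
    return total

for b in (5, 40):
    print(b, float(prob_all_categories(b, 6)))
```

With these assumptions, a batch size of 5 gives a probability of exactly 0, while a batch size of 40 gives a probability above 0.99.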

#### `--p-num-random-test-examples`
Quoting Songbird's documentation, this is "\[the number\] of random samples to hold out for cross-validation if `training-column` is not specified." The default for this is 5 (i.e. use just 5 samples for cross-validation); since we have the luxury of having a lot of samples in this dataset, we can afford to hold out more. This is why we've increased this to 40 samples, analogously to how and why we increased `--p-batch-size`.
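For intuition, here's what "holding out random samples for cross-validation" means, as a minimal sketch: pick `n_test` sample IDs at random to evaluate on and train on the rest. The sample IDs, the seed, and the `split_train_test` helper are hypothetical; this isn't Songbird's code, and we're not describing how Songbird seeds its random choice.

```python
import random

def split_train_test(sample_ids, n_test, seed=0):
    """Randomly hold out n_test samples for testing; the rest are for training."""
    rng = random.Random(seed)
    test = set(rng.sample(sorted(sample_ids), n_test))
    train = [s for s in sample_ids if s not in test]
    return train, sorted(test)

# Hypothetical sample IDs
ids = [f"sample{i}" for i in range(100)]
train, test = split_train_test(ids, n_test=40)
print(len(train), len(test))  # 60 40
```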

#### `--p-summary-interval`
This is the interval (in seconds) at which Songbird saves model fitting statistics in the `--o-regression-stats` output artifact. Saving these statistics more frequently helps us more accurately diagnose whether Songbird's model is fitting the dataset reasonably.

In [4]:
!qiime songbird multinomial \
    --i-table table.qza \
    --m-metadata-file $INPUT_SAMPLE_METADATA_PATH \
    --p-formula "C(sample_type_body_site, Treatment('sea water'))" \
    --p-epochs 10000 \
    --p-learning-rate 0.0001 \
    --p-num-random-test-examples 40 \
    --p-batch-size 40 \
    --p-summary-interval 1 \
    --verbose \
    --o-differentials songbird-differentials.qza \
    --o-regression-stats songbird-regression-stats.qza \
    --o-regression-biplot songbird-regression-biplot.qza

W1110 22:41:00.718637 140255813809984 deprecation_wrapper.py:119] From /home/mfedarko/.conda/envs/qiime2-2019.7/lib/python3.6/site-packages/songbird/q2/_method.py:53: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-11-10 22:41:00.719544: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-11-10 22:41:00.729424: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599980000 Hz
2019-11-10 22:41:00.732982: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a48580e220 executing computations on platform Host. Devices:
2019-11-10 22:41:00.733019: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #212:

2019-11-10 22:41:01.251438: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15189 thread 1 bound to OS proc set 1
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15190 thread 2 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15193 thread 4 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15192 thread 3 bound to OS proc set 3
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15196 thread 7 bound to OS proc set 7
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15195 thread 6 bound to OS proc set 6
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15194 thread 5 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15197 thread 8 bound to OS proc set 8
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15198 thread 9 bound to OS proc set 9
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15199 thread 10 bound to OS proc set 10
OMP: Info #250: KMP_AFFINITY: pid 15070 tid 15201 thread 12 bound to

 25%|████████▊                          | 35111/140000 [00:58<02:53, 604.54it/s]2019-11-10 22:41:59.066325: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 26%|████████▉                          | 35721/140000 [00:59<02:51, 608.10it/s]2019-11-10 22:42:00.067823: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 26%|█████████                          | 36329/140000 [01:00<02:52, 599.87it/s]2019-11-10 22:42:01.068911: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 26%|█████████▏                         | 36872/140000 [01:00<02:52, 597.23it/s]2019-11-10 22:42:02.070368: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 27%|█████████▎                         | 37474/140000 [01:01<02:50, 600.41it/s]2019-11-10 22:42:03.071251: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 27%|█████████▌                         | 38083/14

 62%|█████████████████████▋             | 86581/140000 [02:24<01:29, 595.39it/s]2019-11-10 22:43:25.141264: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 62%|█████████████████████▊             | 87189/140000 [02:25<01:28, 599.73it/s]2019-11-10 22:43:26.142262: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 63%|█████████████████████▉             | 87796/140000 [02:26<01:26, 600.80it/s]2019-11-10 22:43:27.143829: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 63%|██████████████████████             | 88408/140000 [02:27<01:24, 607.20it/s]2019-11-10 22:43:28.146079: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 64%|██████████████████████▎            | 89012/140000 [02:28<01:25, 597.57it/s]2019-11-10 22:43:29.146470: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 64%|██████████████████████▍            | 89618/14

 98%|█████████████████████████████████▍| 137872/140000 [03:50<00:03, 612.78it/s]2019-11-10 22:44:51.219211: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 99%|█████████████████████████████████▋| 138493/140000 [03:51<00:02, 616.34it/s]2019-11-10 22:44:52.219903: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
 99%|█████████████████████████████████▊| 139108/140000 [03:52<00:01, 610.10it/s]2019-11-10 22:44:53.220695: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
100%|█████████████████████████████████▉| 139726/140000 [03:53<00:00, 612.27it/s]2019-11-10 22:44:54.222009: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
100%|██████████████████████████████████| 140000/140000 [03:53<00:00, 599.23it/s]
Saved FeatureData[Differential] to: songbird-differentials.qza
Saved SampleData[SongbirdStats] to: songbird-regression-stats.qza
Saved P

### 6.2. Visualize Songbird model fitting statistics
For more information about how to interpret this visualization, check out [this section of the Songbird README](https://github.com/biocore/songbird#interpreting-model-fitting).

In [5]:
!qiime songbird summarize-single \
    --i-regression-stats songbird-regression-stats.qza \
    --o-visualization songbird-regression-summary.qzv

Saved Visualization to: songbird-regression-summary.qzv


## 7. Run Qurro!
We're doing this using Qurro v0.4.0. Note that the version of the "mackerel demo" on [Qurro's website](https://biocore.github.io/qurro) will be updated as Qurro itself is updated (so although the underlying data should remain the same, the visualization interface might look a bit different or contain a few more features in a few months).

In [6]:
!qiime qurro differential-plot \
    --i-table table.qza \
    --i-ranks songbird-differentials.qza \
    --m-sample-metadata-file $INPUT_SAMPLE_METADATA_PATH \
    --m-feature-metadata-file taxonomy.qza \
    --verbose \
    --o-visualization qurro-plot.qzv

7955 feature(s) in the BIOM table were not present in the feature rankings.
These feature(s) have been removed from the visualization.
1248 sample(s) in the sample metadata file were not present in the BIOM table.
These sample(s) have been removed from the visualization.
Saved Visualization to: qurro-plot.qzv
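The two messages above come from matching IDs across the inputs. The sample message is expected, since the metadata file describes every sample in the study, including the ones we filtered out of `table.qza` in section 2. The feature message means some features in `table.qza` have no Songbird differentials, presumably because Songbird applied its own filtering. Here's a sketch of that set logic on made-up IDs; `match_ids` is a hypothetical helper for illustration, not Qurro's code, and the printed messages just imitate the log text above.

```python
def match_ids(table_features, ranked_features, table_samples, metadata_samples):
    """Keep features that have rankings and samples that are in the table;
    also report what was dropped from each side."""
    kept_features = set(table_features) & set(ranked_features)
    kept_samples = set(table_samples) & set(metadata_samples)
    dropped_features = set(table_features) - set(ranked_features)
    dropped_samples = set(metadata_samples) - set(table_samples)
    return kept_features, kept_samples, dropped_features, dropped_samples

# Hypothetical IDs
kept_f, kept_s, dropped_f, dropped_s = match_ids(
    table_features={"f1", "f2", "f3", "f4"},
    ranked_features={"f1", "f2"},
    table_samples={"S1", "S3"},
    metadata_samples={"S1", "S2", "S3", "S4", "S5"},
)
print(f"{len(dropped_f)} feature(s) in the BIOM table were not present in the feature rankings.")
print(f"{len(dropped_s)} sample(s) in the sample metadata file were not present in the BIOM table.")
```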
