# Feature Volatility

Identify features that are predictive of a given state column (time) and plot their relative frequencies across states using interactive volatility plots.

First, produce volatility plots for the seven pre-selected families listed below. Then explore the taxa that a random forest regressor identifies as important.

1. Mycobacteriaceae
2. Legionellaceae
3. Burkholderiaceae
4. Azospirillaceae
5. Nitrosomonadaceae
6. Nitrospiraceae
7. Pseudomonadaceae

## Collapse the table to family level and convert to relative frequencies

In [None]:
%%bash
qiime taxa collapse \
    --i-table AR-filtered-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 5 \
    --o-collapsed-table family-level-table.qza

# Create table of relative frequencies
qiime feature-table relative-frequency \
    --i-table family-level-table.qza \
    --o-relative-frequency-table family-level-relative-frequency-table.qza
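For intuition, `relative-frequency` divides each feature's count by the total count of its sample, so every sample sums to 1. The sketch below reproduces that arithmetic with `awk` on a tiny hand-made tab-separated table. The table, the function name `to_relative_frequency` and the plain-TSV layout are illustrative only. They are not how QIIME 2 stores `.qza` artifacts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# to_relative_frequency FILE  (hypothetical helper, illustration only)
# FILE: tab-separated; first row = sample IDs, first column = feature IDs, cells = counts.
# Prints the table with each count divided by its sample (column) total, to 2 decimals.
to_relative_frequency() {
  awk -F'\t' -v OFS='\t' '
    # Pass 1 (first read of FILE): sum each sample column, skipping the header row.
    NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
    # Pass 2 (second read): print the header unchanged ...
    FNR == 1 { print; next }
    # ... then replace every count with count / sample total.
    { for (i = 2; i <= NF; i++) $i = sprintf("%.2f", $i / total[i]); print }
  ' "$1" "$1"
}

# Toy table: two families, two samples (S1 total = 100, S2 total = 20).
demo=$(mktemp)
printf 'feature\tS1\tS2\nFamA\t30\t5\nFamB\t70\t15\n' > "$demo"
to_relative_frequency "$demo"
rm -f "$demo"
```

On the toy table, S1 becomes 0.30 / 0.70 and S2 becomes 0.25 / 0.75, so each sample column sums to 1.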

## Create family-level volatility plots for the selected families

These plots are identical except for the family shown by default. In other words, you can select Burkholderiaceae from the plot's metric selector after opening the Mycobacteriaceae plot. Having one file per family simply makes it easier to open a plot with the desired default.

In [None]:
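%%bash
# Create the output directory used by --o-visualization (no-op if it already exists)
mkdir -p longitudinal

# Mycobacteriaceae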
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Corynebacteriales;D_4__Mycobacteriaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Mycobacteriaceae-volatility.qzv

# Legionellaceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Legionellales;D_4__Legionellaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Legionellaceae-volatility.qzv

# Burkholderiaceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Burkholderiaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Burkholderiaceae-volatility.qzv

# Azospirillaceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Azospirillales;D_4__Azospirillaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Azospirillaceae-volatility.qzv

# Nitrosomonadaceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Nitrosomonadaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Nitrosomonadaceae-volatility.qzv

# Nitrospiraceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Nitrospirae;D_2__Nitrospira;D_3__Nitrospirales;D_4__Nitrospiraceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Nitrospiraceae-volatility.qzv

# Pseudomonadaceae
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Pseudomonadaceae-volatility.qzv

# The following families were identified with the `feature-volatility` pipeline of the `longitudinal` plugin (see below).
# Rhodocyclaceae - Cast Iron dominant
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Rhodocyclaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Rhodocyclaceae-volatility.qzv

# Chitinophagaceae - Cast Iron dominant
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Bacteroidetes;D_2__Bacteroidia;D_3__Chitinophagales;D_4__Chitinophagaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Chitinophagaceae-volatility.qzv

# Beijerinckiaceae - Cement dominant
qiime longitudinal volatility \
    --m-metadata-file sample-metadata.tsv \
    --i-table family-level-relative-frequency-table.qza \
    --p-default-metric "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rhizobiales;D_4__Beijerinckiaceae" \
    --p-default-group-column Pipe_Material \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --o-visualization longitudinal/AR-Beijerinckiaceae-volatility.qzv
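The seven plots for the pre-selected families differ only in `--p-default-metric` and the output file name, so they could also be generated in a loop. The following is a minimal bash sketch. `DRY_RUN`, `run` and `make_volatility_plots` are hypothetical conveniences. By default the script prints the commands instead of executing them, and it only calls QIIME 2 when `DRY_RUN=0` is set. The family name in each output file is taken from the text after `D_4__` in the taxonomy string.

```bash
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Full taxonomy strings for the seven pre-selected families (copied from the cell above).
TAXA=(
  "D_0__Bacteria;D_1__Actinobacteria;D_2__Actinobacteria;D_3__Corynebacteriales;D_4__Mycobacteriaceae"
  "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Legionellales;D_4__Legionellaceae"
  "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Burkholderiaceae"
  "D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Azospirillales;D_4__Azospirillaceae"
  "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Betaproteobacteriales;D_4__Nitrosomonadaceae"
  "D_0__Bacteria;D_1__Nitrospirae;D_2__Nitrospira;D_3__Nitrospirales;D_4__Nitrospiraceae"
  "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria;D_3__Pseudomonadales;D_4__Pseudomonadaceae"
)

make_volatility_plots() {
  local taxon family
  for taxon in "${TAXA[@]}"; do
    family="${taxon##*D_4__}"   # strip everything up to the last "D_4__"
    run qiime longitudinal volatility \
      --m-metadata-file sample-metadata.tsv \
      --i-table family-level-relative-frequency-table.qza \
      --p-default-metric "$taxon" \
      --p-default-group-column Pipe_Material \
      --p-state-column Months_Since_Start \
      --p-individual-id-column Sample_Identifier \
      --o-visualization "longitudinal/AR-${family}-volatility.qzv"
  done
}

# Only create the output directory when the commands will really run.
[[ "$DRY_RUN" == 1 ]] || mkdir -p longitudinal
make_volatility_plots
```

To cover the three families identified later, append the Rhodocyclaceae, Chitinophagaceae and Beijerinckiaceae strings to `TAXA`.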

## Identify Important Features with a Random Forest Regressor

Instead of selecting features ourselves, we can use a random forest regressor to find the features that best predict the state column (`Months_Since_Start`) within each pipe material. First, we split the table by pipe material and collapse each subset to the desired taxonomic level. Then `qiime longitudinal feature-volatility` trains the regressor on each table and reports the most important features.

## Split Tables

In [None]:
%%bash
qiime feature-table filter-samples \
    --i-table AR-filtered-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-where "Pipe_Material = 'Cast Iron'" \
    --o-filtered-table cast-iron-table.qza

qiime feature-table filter-samples \
    --i-table AR-filtered-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-where "Pipe_Material = 'Cement'" \
    --o-filtered-table cement-table.qza
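If more pipe materials were added to the metadata, the `filter-samples` calls could be generated from a list of `Pipe_Material` values. The following is a minimal bash sketch. `DRY_RUN`, `run`, `slugify` and `split_by_material` are hypothetical helpers. The output names follow the `cast-iron-table.qza` / `cement-table.qza` pattern used above, and with the default `DRY_RUN=1` the commands are only printed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Turn a metadata value into a file-name prefix, e.g. "Cast Iron" -> "cast-iron".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
}

# Pipe_Material values to split on (extend if the metadata has more).
MATERIALS=("Cast Iron" "Cement")

split_by_material() {
  local material
  for material in "${MATERIALS[@]}"; do
    run qiime feature-table filter-samples \
      --i-table AR-filtered-table.qza \
      --m-metadata-file sample-metadata.tsv \
      --p-where "Pipe_Material = '${material}'" \
      --o-filtered-table "$(slugify "$material")-table.qza"
  done
}

split_by_material
```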

## Collapse each to family and genus level

In [None]:
%%bash
# family level
qiime taxa collapse \
    --i-table cast-iron-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 5 \
    --o-collapsed-table cast-iron-family-table.qza

qiime taxa collapse \
    --i-table cement-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 5 \
    --o-collapsed-table cement-family-table.qza

# genus level
qiime taxa collapse \
    --i-table cast-iron-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 6 \
    --o-collapsed-table cast-iron-genus-table.qza

qiime taxa collapse \
    --i-table cement-table.qza \
    --i-taxonomy taxonomy.qza \
    --p-level 6 \
    --o-collapsed-table cement-genus-table.qza
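The four `taxa collapse` calls follow a single material-by-rank pattern. The sketch below generates them, assuming the SILVA-style levels used in this notebook (5 = family, 6 = genus). `DRY_RUN`, `run`, `level_for` and `collapse_all` are hypothetical names. It uses a `case` statement instead of an associative array so that it also runs on older bash 3.2.

```bash
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [[ "$DRY_RUN" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Map a rank name to its --p-level for this taxonomy (D_4__ = family, D_5__ = genus).
level_for() {
  case "$1" in
    family) echo 5 ;;
    genus)  echo 6 ;;
    *)      echo "unknown rank: $1" >&2; return 1 ;;
  esac
}

collapse_all() {
  local material rank
  for material in cast-iron cement; do
    for rank in family genus; do
      run qiime taxa collapse \
        --i-table "${material}-table.qza" \
        --i-taxonomy taxonomy.qza \
        --p-level "$(level_for "$rank")" \
        --o-collapsed-table "${material}-${rank}-table.qza"
    done
  done
}

collapse_all
```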

## Perform feature volatility on each pipe material at the family level

In [None]:
%%bash
qiime longitudinal feature-volatility \
    --i-table cast-iron-family-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --p-n-jobs 8 \
    --output-dir longitudinal/cast-iron-feature-volatility
     
qiime longitudinal feature-volatility \
    --i-table cement-family-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --p-n-jobs 8 \
    --output-dir longitudinal/cement-feature-volatility
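The random forest's rankings end up in the feature-importance output of each `--output-dir` (presumably saved as `feature_importance.qza`). Suppose those scores have been exported, for example with `qiime tools export`, to a tab-separated file with one header row followed by `feature<TAB>importance` rows. A short helper can then list the top-ranked families. Both that layout and the default `IMPORTANCE_TSV` path below are assumptions, so check the exported file before relying on the result. `top_features` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# top_features FILE N
# FILE: tab-separated, one header row, then "feature<TAB>importance" rows (assumed layout).
top_features() {
  # Drop the header row, sort column 2 as general numbers (handles values like 1e-05)
  # in descending order, then keep the first N rows. awk reads all of its input, so
  # the pipeline cannot fail from SIGPIPE under `set -o pipefail` the way `head` can.
  tail -n +2 "$1" | LC_ALL=C sort -t $'\t' -k2,2gr | awk -v n="$2" 'NR <= n'
}

# Hypothetical path to an exported importance table; override via the environment.
IMPORTANCE_TSV="${IMPORTANCE_TSV:-exported/cast-iron-family-importance/importance.tsv}"

if [[ -f "$IMPORTANCE_TSV" ]]; then
  top_features "$IMPORTANCE_TSV" 10
else
  echo "No file at $IMPORTANCE_TSV; export the feature importance artifact first." >&2
fi
```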

## Perform feature volatility on each pipe material at the genus level

In [None]:
%%bash
qiime longitudinal feature-volatility \
    --i-table cast-iron-genus-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --p-n-jobs 4 \
    --output-dir longitudinal/cast-iron-feature-volatility2
     
qiime longitudinal feature-volatility \
    --i-table cement-genus-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --p-n-jobs 4 \
    --output-dir longitudinal/cement-feature-volatility2

## Run feature volatility on the combined (unsplit) family-level table to compare with the per-material results

In [None]:
%%bash
qiime longitudinal feature-volatility \
    --i-table family-level-table.qza \
    --m-metadata-file sample-metadata.tsv \
    --p-state-column Months_Since_Start \
    --p-individual-id-column Sample_Identifier \
    --p-n-jobs 4 \
    --output-dir longitudinal/AR-feature-volatility
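These runs can take a while, so a quick check that every `--output-dir` contains the expected files can save a rerun. The file names below assume q2cli's usual convention of naming each file after a pipeline output (`filtered_table`, `feature_importance`, `volatility_plot`, `accuracy_results`, `sample_estimator`). `check_fv_outputs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Files each feature-volatility --output-dir is assumed to contain.
FV_OUTPUTS=(filtered_table.qza feature_importance.qza volatility_plot.qzv accuracy_results.qzv sample_estimator.qza)

# check_fv_outputs DIR...: print each missing file to stderr; return 1 if any is missing.
check_fv_outputs() {
  local dir f missing=0
  for dir in "$@"; do
    for f in "${FV_OUTPUTS[@]}"; do
      if [[ ! -f "$dir/$f" ]]; then
        echo "missing: $dir/$f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}

if check_fv_outputs \
    longitudinal/cast-iron-feature-volatility \
    longitudinal/cement-feature-volatility \
    longitudinal/cast-iron-feature-volatility2 \
    longitudinal/cement-feature-volatility2 \
    longitudinal/AR-feature-volatility; then
  echo "All feature-volatility outputs are present."
else
  echo "Some outputs are missing; see the messages above." >&2
fi
```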