# Global Foodomics Visualization

This figure re-visualizes the food metabolome dataset used in [Tripathi and Vazquez-Baeza et al. 2020](https://www.biorxiv.org/content/10.1101/2020.05.04.077636v1). The dataset comes from an untargeted metabolomics study of food-associated samples, shown here in the context of a cladogram created using [*q2-qemistree*](https://github.com/biocore/q2-qemistree).

The main goal of this figure is to demonstrate how easily feature prevalence can be visualized on an EMPress tree. This lets us quickly see, for example, which types of samples contain the features present in certain clades.

## Requirements
Run this notebook from within a QIIME 2 conda environment (version 2019.10 or later) with EMPress installed. Please see [EMPress' README](https://github.com/biocore/empress) for the most up-to-date installation instructions.

In [1]:
import qiime2 as q2
import pandas as pd
import numpy as np
import seaborn as sns
import biom

In [2]:
!mkdir -p fig2b/output

In [3]:
fm = q2.Artifact.load('fig2b/input/feature-data-classified.qza').view(pd.DataFrame)

In [4]:
fm.superclass.value_counts()

unclassified                               246
Lipids and lipid-like molecules            144
Organic acids and derivatives               61
Phenylpropanoids and polyketides            52
Benzenoids                                  36
Organoheterocyclic compounds                33
Organic oxygen compounds                    29
Organic nitrogen compounds                  25
Alkaloids and derivatives                    4
Nucleosides, nucleotides, and analogues      3
SMILE parse error                            1
Name: superclass, dtype: int64

Using q2-qemistree and [ClassyFire](http://classyfire.wishartlab.com), we can obtain annotations for the MS features in a dataset. However, these annotations are not available for every feature. To narrow the scope of this analysis, we will remove all features that lack a superclass annotation: those labeled `unclassified`, plus the one feature labeled `SMILE parse error`.

In [5]:
observations = fm.query('superclass != "SMILE parse error" and superclass != "unclassified"').index.tolist()
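As a minimal sketch of what this filter does, the example below applies the same `query` expression to a small made-up annotation table and compares it with an equivalent `isin`-based filter. The names `toy_fm` and `excluded` and the feature IDs are hypothetical and only serve the illustration; the real annotations come from `feature-data-classified.qza`.

```python
import pandas as pd

# Hypothetical toy annotations standing in for the real feature data
toy_fm = pd.DataFrame(
    {
        "superclass": [
            "Benzenoids",
            "unclassified",
            "SMILE parse error",
            "Lipids and lipid-like molecules",
        ]
    },
    index=["f1", "f2", "f3", "f4"],
)

# Same expression as in the cell above
via_query = toy_fm.query(
    'superclass != "SMILE parse error" and superclass != "unclassified"'
).index.tolist()

# Equivalent filter that lists the excluded labels once
excluded = ["unclassified", "SMILE parse error"]
via_isin = toy_fm[~toy_fm["superclass"].isin(excluded)].index.tolist()

print(via_query)
print(via_isin)
```

Both approaches keep only the features with an informative superclass label; the `isin` form may be easier to extend if more labels need to be excluded.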

Similarly, we'll also focus only on the samples with a *common meal type* annotation.

In [6]:
mf = q2.Metadata.load('fig2b/input/mapping-file.txt').to_dataframe()
samples = mf.query('common_meal_type != "not applicable"').index.tolist()

Using the selected samples and observations, we can subset the feature table before visualizing the tree.

In [7]:
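bt = q2.Artifact.load('fig2b/input/feature-table-hashed.qza').view(biom.Table)

# Keep only the selected samples that are also present in the table
bt.filter(set(samples) & set(bt.ids()), axis='sample', inplace=True)
# Keep only the features with a superclass annotation
bt.filter(observations, axis='observation', inplace=True)

# Save the filtered table as a QIIME 2 artifact for use with EMPress
q2.Artifact.import_data('FeatureTable[Frequency]', bt).save('fig2b/output/filtered-feature-table.qza')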

'fig2b/output/filtered-feature-table.qza'
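The two-axis subsetting above can be illustrated with a plain pandas table, which makes the result easy to inspect. This is only an analogy under assumed toy data, not the `biom` API itself: rows stand for features (observations), columns for samples, and all names prefixed with `toy_` are hypothetical.

```python
import pandas as pd

# Hypothetical toy table: rows are features (observations), columns are samples
toy_table = pd.DataFrame(
    [[5, 0, 2], [0, 3, 0], [1, 1, 1]],
    index=["f1", "f2", "f3"],
    columns=["s1", "s2", "s3"],
)

# "s9" appears in the metadata selection but not in the table
toy_samples = ["s1", "s3", "s9"]
toy_observations = ["f1", "f3"]

# Intersect the selected samples with the table's own sample IDs,
# then subset both axes at once
shared_samples = sorted(set(toy_samples) & set(toy_table.columns))
subset = toy_table.loc[toy_observations, shared_samples]

print(subset)
print(subset.shape)
```

Checking the shape of the subset against the number of selected features and shared samples is a quick way to confirm that the filter did what was intended.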

In [8]:
!qiime empress community-plot \
--i-tree fig2b/input/qemistree.qza \
--i-feature-table fig2b/output/filtered-feature-table.qza \
--m-sample-metadata-file fig2b/input/mapping-file.txt \
--m-feature-metadata-file fig2b/input/feature-data-classified.qza \
--o-visualization fig2b/output/fig2b.qzv

Saved Visualization to: fig2b/output/fig2b.qzv
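To make the idea of feature prevalence concrete, the sketch below computes, for a made-up table, the fraction of samples in each meal type that contain each feature. It is an illustration in pandas under assumed toy data, not a description of how EMPress computes its display; `toy_counts` and `toy_meal_type` are hypothetical names, and the meal type labels are invented.

```python
import pandas as pd

# Hypothetical toy counts: rows are features, columns are samples
toy_counts = pd.DataFrame(
    [[5, 0, 2, 0], [0, 3, 0, 4]],
    index=["f1", "f2"],
    columns=["s1", "s2", "s3", "s4"],
)

# Hypothetical sample metadata column, indexed by sample ID
toy_meal_type = pd.Series(
    ["breakfast", "dinner", "breakfast", "dinner"],
    index=["s1", "s2", "s3", "s4"],
    name="common_meal_type",
)

# Presence/absence per sample (samples as rows), then the mean within
# each meal type gives the fraction of samples containing each feature
present = (toy_counts > 0).astype(int).T
prevalence = present.groupby(toy_meal_type).mean()

print(prevalence)
```

In this toy case `f1` is found only in breakfast samples and `f2` only in dinner samples, which is the kind of pattern the tree visualization is meant to reveal at the clade level.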
