# Evaluating Metagenomic Prediction of the Metaproteome in a 6-Year Study of a Crohn's Patient

Date: 8/24/2018 <br>
Author: Robert Mills <br>
Environment: qiime2-2018.4 & qiime1 as designated

Project Abstract: <br><br>
Although genetic approaches are the standard in microbiome analysis, proteome-level information is largely absent. This discrepancy warrants a better understanding of the relationship between genetic copy number and protein abundance, as this is crucial information for inferring protein level changes from metagenomic data. As it is unknown how these systems are altered during disease states, we leverage a six-year fecal time series of a single patient with Colonic Crohn’s disease. Utilizing Tandem Mass Tag (TMT) multiplexed proteomics and shotgun metagenomic sequencing, we quantify over 29,000 protein groups and 110,000 genes and compare them to the clinical diagnostics of serum C-reactive protein, fecal calprotectin, and lysozyme. Results indicate that many broad scale observations were consistent between data types, including fluctuations in Gene Ontology (GO) terms related to IBD severity such as formate oxidation and nitrate utilization. By applying linear regression we determined genes and proteins related to clinical metrics, and observed many conserved taxonomic differences relevant to Crohn’s disease such as negative correlation of Faecalibacterium and positive correlation of Escherichia to fecal calprotectin. Despite consistent genera associations, the specific genes correlated with these metrics were almost entirely different between the two data types. Unique protein-level functional changes were observed relating to clinical markers, and the metaproteome revealed unique functional relationships not seen in the metagenome. These relationships include a previously established connection between urease enzymes, amino acid metabolism and local inflammation. This proof-of-concept metagenomic-metaproteomic approach prompts further investigation of the metaproteome and its relations to the metagenome in larger cohorts.

### Load all Dependencies

In [4]:
# Initializes the notebook with inline display
%matplotlib inline

from os import mkdir
import os
import copy
from os.path import abspath, join as pjoin, exists
from shutil import copy2, move
from time import strftime, strptime
from numpy import nan, isnan, arange
from pandas import read_csv, Series, DataFrame
from IPython.display import Image
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

### Convert text files to Biom Files

In [5]:
#Convert summed MG table into biom format

#Convert tab-separated file to biom file
!biom convert -i ./Shotgun/Salmon_CPMs_Sum_per_date.txt \
-o ./Shotgun/MG_sums.biom \
-m ./LS_Metadata.txt \
--table-type="OTU table" --to-hdf5

In [6]:
#Convert MG table into biom format

#Convert tab-separated file to biom file
!biom convert -i ./Shotgun/Salmon_CPMs_w0s.txt \
-o ./Shotgun/MG.biom \
-m ./LS_Metadata_triplicates.txt \
--table-type="OTU table" --to-hdf5

In [21]:
#Convert pDB commonreps table into biom format

#Convert tab-separated file to biom file
!biom convert -i ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedCommonReps.txt \
-o ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedCommonReps.biom \
-m ./LS_Metadata_triplicates.txt \
--table-type="OTU table" --to-hdf5

In [8]:
#Convert pDB commonreps avgs table into biom format

#Convert tab-separated file to biom file
!biom convert -i ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedDataAll_Avgs.txt \
-o ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedDataAll_Avgs.biom \
-m ./LS_Metadata.txt \
--table-type="OTU table" --to-hdf5
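`biom convert -m` attaches metadata by sample ID, so it can help to confirm that the column headers of each table match the IDs in the mapping file before converting. The following is a minimal sketch under stated assumptions: the first line of the table is a header row whose first cell labels the feature column, and the first column of the mapping file holds sample IDs. The helper names and the toy strings are hypothetical and are not part of the original workflow.

```python
import csv
import io


def header_sample_ids(handle):
    """Sample IDs from the header row of a tab-separated feature table."""
    header = next(csv.reader(handle, delimiter="\t"))
    return header[1:]  # assumption: the first cell labels the feature ID column


def metadata_sample_ids(handle):
    """Sample IDs from the first column of a tab-separated mapping file."""
    rows = csv.reader(handle, delimiter="\t")
    next(rows)  # skip the header row
    return [row[0] for row in rows if row]


def compare_ids(table_ids, metadata_ids):
    """Return (IDs only in the table, IDs only in the metadata)."""
    table_set, meta_set = set(table_ids), set(metadata_ids)
    return sorted(table_set - meta_set), sorted(meta_set - table_set)


# Toy stand-ins for a table and a mapping file (not the study's data)
table_text = "#OTU ID\tS1\tS2\tS3\ngeneA\t10\t0\t5\n"
metadata_text = "#SampleID\tInflammation_State\nS1\tHigh\nS2\tLow\nS4\tLow\n"

only_in_table, only_in_metadata = compare_ids(
    header_sample_ids(io.StringIO(table_text)),
    metadata_sample_ids(io.StringIO(metadata_text)),
)
print("Only in table:", only_in_table)
print("Only in metadata:", only_in_metadata)
```

With real files, the `io.StringIO` objects would be replaced by `open(path)` handles for the `.txt` table and the mapping file.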

### Import all as Qiime2 artifacts

In [9]:
!qiime tools import \
  --input-path ./Shotgun/MG.biom \
  --type 'FeatureTable[Frequency]' \
  --output-path Shotgun_biom.qza

In [10]:
!qiime tools import \
  --input-path ./Shotgun/MG_sums.biom \
  --type 'FeatureTable[Frequency]' \
  --output-path Shotgun_sums_biom.qza

In [11]:
!qiime tools import \
  --input-path ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedCommonReps.biom \
  --type 'FeatureTable[Frequency]' \
  --output-path ./NormalizedCommonReps_biom.qza

In [12]:
!qiime tools import \
  --input-path ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedDataAll_Avgs.biom \
  --type 'FeatureTable[Frequency]' \
  --output-path ./Average_per_date_MP_biom.qza

### Feature table summarize

In [13]:
!qiime feature-table summarize \
  --i-table Shotgun_biom.qza \
  --o-visualization Shotgun_biom.qzv \
  --m-sample-metadata-file ./LS_Metadata_triplicates.txt

Saved Visualization to: Shotgun_biom.qzv


In [14]:
!qiime feature-table summarize \
  --i-table ./NormalizedCommonReps_biom.qza \
  --o-visualization ./NormalizedCommonReps_biom.qzv \
  --m-sample-metadata-file ./LS_Metadata_triplicates.txt

Saved Visualization to: ./NormalizedCommonReps_biom.qzv


In [15]:
!qiime feature-table summarize \
  --i-table ./Average_per_date_MP_biom.qza \
  --o-visualization ./Average_per_date_MP_biom.qzv \
  --m-sample-metadata-file ./LS_Metadata.txt

Saved Visualization to: ./Average_per_date_MP_biom.qzv


In [16]:
!qiime feature-table summarize \
  --i-table ./Shotgun_sums_biom.qza \
  --o-visualization ./Shotgun_sums_biom.qzv \
  --m-sample-metadata-file ./LS_Metadata.txt

Saved Visualization to: ./Shotgun_sums_biom.qzv
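Among other things, each summary visualization reports the total frequency per sample. The smallest of those totals is a common upper bound for an even sampling depth, because every sample can still be retained at that depth. The sketch below computes the same totals on a toy table. The function name and the numbers are hypothetical, and the notebook does not state how its own sampling depths were chosen.

```python
def per_sample_totals(sample_ids, rows):
    """Sum each sample's column over all features."""
    totals = {sid: 0.0 for sid in sample_ids}
    for _feature_id, values in rows:
        for sid, value in zip(sample_ids, values):
            totals[sid] += value
    return totals


# Toy table: three features measured in three samples (not the study's data)
sample_ids = ["S1", "S2", "S3"]
rows = [
    ("geneA", [10, 0, 5]),
    ("geneB", [3, 7, 5]),
    ("geneC", [0, 1, 2]),
]

totals = per_sample_totals(sample_ids, rows)
shallowest = min(totals, key=totals.get)  # sample with the smallest total
print(totals)
print("Smallest total:", shallowest, totals[shallowest])
```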


### Qiime 1 PCoAs

In [2]:
#Use Qiime1 for creating distance matrices for Procrustes and Mantel test

In [19]:
!beta_diversity_through_plots.py -i ./Shotgun/MG.biom -m ./LS_Metadata_triplicates.txt -o ./Qiime/PCoA_MG -p ./paramaters2.txt

  if rank(datamtx) != 2:
  if rank(datamtx) != 2:
  if rank(datamtx) != 2:


In [23]:
!beta_diversity_through_plots.py -i ./NEW_DATA/Duplicate_Peptide_Filter/pDB_CSVs/NormalizedCommonReps.biom -m ./LS_Metadata_triplicates.txt -o ./Qiime/PCoA_pDB_CommonReps -p ./paramaters2.txt

  if rank(datamtx) != 2:
  if rank(datamtx) != 2:
  if rank(datamtx) != 2:
  proportion_explained = eigvals / eigvals.sum()
Traceback (most recent call last):
  File "/Users/rhmills/miniconda3/envs/qiime1/bin/beta_diversity_through_plots.py", line 4, in <module>
    __import__('pkg_resources').run_script('qiime==1.9.1', 'beta_diversity_through_plots.py')
  File "/Users/rhmills/miniconda3/envs/qiime1/lib/python2.7/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/rhmills/miniconda3/envs/qiime1/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1527, in run_script
    exec(code, namespace, namespace)
  File "/Users/rhmills/miniconda3/envs/qiime1/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/beta_diversity_through_plots.py", line 153, in <module>
    main()
  File "/Users/rhmills/miniconda3/envs/qiime1/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/beta_diver
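The `bray_curtis_dm.txt` files written by `beta_diversity_through_plots.py` hold pairwise Bray-Curtis dissimilarities between samples. For two abundance vectors u and v, the dissimilarity is sum(|u_i - v_i|) / sum(u_i + v_i). Here is a minimal pure-Python sketch of that formula on toy data. The function names are hypothetical, and this is not the QIIME 1 implementation.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("Bray-Curtis is undefined for two all-zero samples")
    return numerator / denominator


def distance_matrix(samples):
    """Return (sorted sample IDs, square matrix of pairwise dissimilarities)."""
    ids = sorted(samples)
    matrix = [
        [0.0 if a == b else bray_curtis(samples[a], samples[b]) for b in ids]
        for a in ids
    ]
    return ids, matrix


# Toy abundance vectors over three features (not the study's data)
samples = {
    "S1": [10, 3, 0],
    "S2": [0, 7, 1],
    "S3": [5, 5, 2],
}

ids, matrix = distance_matrix(samples)
for sid, row in zip(ids, matrix):
    print(sid, [round(d, 3) for d in row])
```

A value of 0 means identical composition, and 1 means the two samples share no features.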

#### Procrustes - Qiime 1 with triplicates

In [24]:
#Qiime1 Mantel test: MG vs CommonReps Bray-Curtis distance matrices, 999 permutations
!compare_distance_matrices.py --method mantel -i ./Qiime/PCoA_MG/bray_curtis_dm.txt,./Qiime/PCoA_pDB_CommonReps/bray_curtis_dm.txt -o ./Qiime/Procrustes/MG_pDB_CommonReps -n 999
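A Mantel test asks whether two distance matrices over the same samples are correlated. It takes the Pearson correlation of their off-diagonal entries and compares it with the correlations obtained after shuffling the sample order of one matrix. The sketch below illustrates the idea in pure Python under a few assumptions: both matrices are symmetric and list samples in the same order, the test is two-sided, and the toy matrices are invented for illustration. The function names are hypothetical, and this is not the QIIME 1 code.

```python
import math
import random


def upper_triangle(dm, order):
    """Flatten the upper triangle of dm after reordering samples by `order`."""
    n = len(dm)
    return [dm[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (observed r, two-sided permutation p-value)."""
    n = len(dm1)
    rng = random.Random(seed)
    identity = list(range(n))
    flat2 = upper_triangle(dm2, identity)
    observed = pearson(upper_triangle(dm1, identity), flat2)
    hits = 0
    order = identity[:]
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of dm1 together
        if abs(pearson(upper_triangle(dm1, order), flat2)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy symmetric distance matrices over four samples (not the study's data)
dm_a = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
dm_b = [
    [0.0, 0.1, 0.5, 0.9],
    [0.1, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.2],
    [0.9, 0.7, 0.2, 0.0],
]

r_obs, p_value = mantel(dm_a, dm_b, permutations=999)
print("Mantel r:", round(r_obs, 3), "p:", p_value)
```

Rows and columns are permuted together, rather than shuffling individual entries, so each permuted matrix remains a valid distance matrix.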

In [25]:
!transform_coordinate_matrices.py -i ./Qiime/PCoA_MG/bray_curtis_pc.txt,./Qiime/PCoA_pDB_CommonReps/bray_curtis_pc.txt -r 999 -o ./Qiime/Procrustes/MG_pDB_CommonReps_out

In [26]:
!make_emperor.py -c -i ./Qiime/Procrustes/MG_pDB_CommonReps_out/ -o ./Qiime/Procrustes/MG_pDB_CommonReps_out/plots/ -m ./LS_Metadata_triplicates.txt

### Core metrics - Qiime2

In [4]:
# Core metrics for the MG table (triplicates), at a sampling depth of 998500
!qiime diversity core-metrics \
  --i-table ./Shotgun_biom.qza \
  --p-sampling-depth 998500 \
  --m-metadata-file ./LS_Metadata_triplicates.txt \
  --output-dir core-metrics-results/MG

Saved FeatureTable[Frequency] to: core-metrics-results/MG/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/MG/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/MG/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/MG/evenness_vector.qza
Saved DistanceMatrix to: core-metrics-results/MG/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/MG/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results/MG/jaccard_pcoa_results.qza
Saved PCoAResults to: core-metrics-results/MG/bray_curtis_pcoa_results.qza
Saved Visualization to: core-metrics-results/MG/jaccard_emperor.qzv
Saved Visualization to: core-metrics-results/MG/bray_curtis_emperor.qzv


In [5]:
# Core metrics for the pDB CommonReps table (triplicates), at a sampling depth of 1704085
!qiime diversity core-metrics \
  --i-table ./NormalizedCommonReps_biom.qza \
  --p-sampling-depth 1704085 \
  --m-metadata-file ./LS_Metadata_triplicates.txt \
  --output-dir core-metrics-results/pDB_common

Saved FeatureTable[Frequency] to: core-metrics-results/pDB_common/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_common/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_common/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_common/evenness_vector.qza
Saved DistanceMatrix to: core-metrics-results/pDB_common/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/pDB_common/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results/pDB_common/jaccard_pcoa_results.qza
Saved PCoAResults to: core-metrics-results/pDB_common/bray_curtis_pcoa_results.qza
Saved Visualization to: core-metrics-results/pDB_common/jaccard_emperor.qzv
Saved Visualization to: core-metrics-results/pDB_common/bray_curtis_emperor.qzv


In [42]:
# Core metrics for the per-date averaged pDB table, at a sampling depth of 3188991
!qiime diversity core-metrics \
  --i-table ./Average_per_date_MP_biom.qza \
  --p-sampling-depth 3188991 \
  --m-metadata-file ./LS_Metadata.txt \
  --output-dir core-metrics-results/pDB_averages

Saved FeatureTable[Frequency] to: core-metrics-results/pDB_averages/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_averages/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_averages/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/pDB_averages/evenness_vector.qza
Saved DistanceMatrix to: core-metrics-results/pDB_averages/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/pDB_averages/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results/pDB_averages/jaccard_pcoa_results.qza
Saved PCoAResults to: core-metrics-results/pDB_averages/bray_curtis_pcoa_results.qza
Saved Visualization to: core-metrics-results/pDB_averages/jaccard_emperor.qzv
Saved Visualization to: core-metrics-results/pDB_averages/bray_curtis_emperor.qzv


In [46]:
# Core metrics for the per-date summed MG table, at a sampling depth of 2605204
!qiime diversity core-metrics \
  --i-table ./Shotgun_sums_biom.qza \
  --p-sampling-depth 2605204 \
  --m-metadata-file ./LS_Metadata.txt \
  --output-dir core-metrics-results/Shotgun_sums

Saved FeatureTable[Frequency] to: core-metrics-results/Shotgun_sums/rarefied_table.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/Shotgun_sums/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/Shotgun_sums/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/Shotgun_sums/evenness_vector.qza
Saved DistanceMatrix to: core-metrics-results/Shotgun_sums/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/Shotgun_sums/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results/Shotgun_sums/jaccard_pcoa_results.qza
Saved PCoAResults to: core-metrics-results/Shotgun_sums/bray_curtis_pcoa_results.qza
Saved Visualization to: core-metrics-results/Shotgun_sums/jaccard_emperor.qzv
Saved Visualization to: core-metrics-results/Shotgun_sums/bray_curtis_emperor.qzv
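Each `core-metrics` run first subsamples every sample to the requested depth (`rarefied_table.qza`) and then derives the alpha diversity vectors from the rarefied counts. The sketch below illustrates those steps on toy integer counts: subsampling without replacement, observed features, Shannon entropy, and Pielou evenness. The function names are hypothetical, and the base-2 logarithm is an assumption of this sketch. This is not the QIIME 2 implementation.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample integer counts without replacement to exactly `depth`."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample total; sample would be dropped")
    # One pool entry per observation; fine for toy data, wasteful for real tables
    pool = [i for i, count in enumerate(counts) for _ in range(count)]
    rarefied = [0] * len(counts)
    for i in random.Random(seed).sample(pool, depth):
        rarefied[i] += 1
    return rarefied


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for count in counts if count > 0)


def shannon(counts, base=2):
    """Shannon entropy of the relative abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum for the observed richness."""
    richness = observed_features(counts)
    if richness < 2:
        return float("nan")  # evenness is undefined for fewer than two features
    return shannon(counts) / math.log(richness, 2)


# Toy sample with four features (not the study's data)
counts = [40, 30, 20, 10]
rarefied = rarefy(counts, depth=50)
print("Rarefied:", rarefied)
print("Observed features:", observed_features(rarefied))
print("Shannon:", round(shannon(rarefied), 3))
print("Evenness:", round(pielou_evenness(rarefied), 3))
```

Evenness reaches 1 when all observed features are equally abundant, which makes it comparable across samples with different richness.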


### Beta group significance - Qiime2

In [17]:
!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/MG/bray_curtis_distance_matrix.qza \
  --m-metadata-file ./LS_Metadata_triplicates.txt \
  --m-metadata-column Inflammation_State \
  --o-visualization core-metrics-results/MG/bray_curtis-Inflammation-significance.qzv \
  --p-pairwise

Saved Visualization to: core-metrics-results/MG/bray_curtis-Inflammation-significance.qzv


In [18]:
!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/pDB_common/bray_curtis_distance_matrix.qza \
  --m-metadata-file ./LS_Metadata_triplicates.txt \
  --m-metadata-column Inflammation_State \
  --o-visualization core-metrics-results/pDB_common/bray_curtis-Inflammation-significance.qzv \
  --p-pairwise

Saved Visualization to: core-metrics-results/pDB_common/bray_curtis-Inflammation-significance.qzv


In [44]:
!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/pDB_averages/bray_curtis_distance_matrix.qza \
  --m-metadata-file ./LS_Metadata.txt \
  --m-metadata-column Inflammation_State \
  --o-visualization core-metrics-results/pDB_averages/bray_curtis-Inflammation-significance.qzv \
  --p-pairwise

Saved Visualization to: core-metrics-results/pDB_averages/bray_curtis-Inflammation-significance.qzv


In [47]:
!qiime diversity beta-group-significance \
  --i-distance-matrix core-metrics-results/Shotgun_sums/bray_curtis_distance_matrix.qza \
  --m-metadata-file ./LS_Metadata.txt \
  --m-metadata-column Inflammation_State \
  --o-visualization core-metrics-results/Shotgun_sums/bray_curtis-Inflammation-significance.qzv \
  --p-pairwise

Saved Visualization to: core-metrics-results/Shotgun_sums/bray_curtis-Inflammation-significance.qzv
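`beta-group-significance` tests whether samples from different groups are farther apart than samples within the same group. Its default method is PERMANOVA, which no `--p-method` override changes in the calls above. The sketch below computes the PERMANOVA pseudo-F statistic from a distance matrix and group labels, then estimates a p-value by shuffling the labels. The function names are hypothetical, and the toy distances and `Low`/`High` labels are invented for illustration. This is not the QIIME 2 implementation.

```python
import random


def pseudo_f(dm, labels):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances
    ss_total = sum(dm[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(
            dm[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        ) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dm, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dm, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the link between samples and groups
        if pseudo_f(dm, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data: six samples on a line, two well-separated groups (not the study's data)
positions = [0, 1, 2, 10, 11, 12]
dm = [[abs(a - b) for b in positions] for a in positions]
labels = ["Low", "Low", "Low", "High", "High", "High"]

f_stat, p_value = permanova(dm, labels)
print("pseudo-F:", f_stat, "p:", p_value)
```

With only six samples there are few distinct label arrangements, so the permutation p-value cannot become very small however well the groups separate. The same limit applies to any small per-group sample count.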
