## Generate a DataFrame with the percentage of Unknown and Mitochondrial reads

For each compartment and reference taxonomy, taxonomic annotations are already available from previous steps. We now want to know, for each sample, what proportion of total reads were annotated as mitochondria (at level 5 of the taxonomic classification scheme) and what proportion were 'Unknown' (the Unassigned column) at the most general level of classification (level 1 of the scheme). The values are computed as fractions between 0 and 1 rather than as percentages.
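
As a minimal sketch of the calculation described above, the toy table below stands in for an exported level-1 table. The sample IDs, the `index` column name, and the counts are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-in for a level-1 table: one row per sample, one column per taxon.
# Sample IDs and read counts are made up for this example.
toy_level1 = pd.DataFrame(
    {
        "index": ["sample_a", "sample_b"],
        "k__Bacteria": [90, 30],
        "Unassigned": [10, 70],
    }
)

# Sum only the numeric count columns to get total reads per sample.
total_reads = toy_level1.sum(axis=1, numeric_only=True)

# Per-sample proportion of reads that could not be assigned at level 1.
toy_proportion_unassigned = toy_level1["Unassigned"] / total_reads
print(toy_proportion_unassigned.tolist())  # [0.1, 0.7]
```

The same division, applied to the mitochondria column of a level-5 table, gives the mitochondrial proportion.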

**Requirements**
It is assumed that the taxonomy bar plot .qzv files have been generated and are stored in the top level of the ../output/ directory. It is further assumed that each file is named *compartment*_*reference*_tbp.qzv; for example, M_greengenes_tbp.qzv holds the mucus samples annotated according to Greengenes 13_8.
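
The naming convention can be checked before running the analysis. The sketch below is one possible pre-flight check, not part of the original workflow; the helper name `expected_tbp_paths` is hypothetical, and the compartment and reference labels are the ones used later in this notebook.

```python
import itertools
import os

# Same compartment and reference labels as the notebook.
compartments = ['M', 'T', 'S']
references = ['greengenes', 'silva', 'greengenes_metaxa2', 'silva_metaxa2']


def expected_tbp_paths(output_dir):
    """Return the bar plot path expected for every reference/compartment pair."""
    return [
        os.path.join(output_dir, f'{compartment}_{reference}_tbp.qzv')
        for reference, compartment in itertools.product(references, compartments)
    ]


# Hypothetical check: report any expected file that is not on disk yet.
output_dir = os.path.join(os.path.abspath('../'), 'output')
missing = [p for p in expected_tbp_paths(output_dir) if not os.path.exists(p)]
for path in missing:
    print('missing:', path)
```

Running this first surfaces a misnamed or absent file up front, rather than partway through the main loop.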

**Output**
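Two .csv files, lvl1.csv and lvl5.csv, are written to the ../output/ directory. For each sample, lvl1.csv records the proportion of Unassigned reads and lvl5.csv records the proportion of mitochondrial reads, together with the reference taxonomy and tissue compartment.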


In [1]:
import tempfile
from qiime2 import Visualization
import pandas as pd
import os

In [2]:
working_dir = os.path.abspath('../')

In [3]:
compartments = ['M', 'T', 'S']
references = ['greengenes', 'silva', 'greengenes_metaxa2',
              'silva_metaxa2']

In [4]:
proportion_unassigned = []
references1 = []
compartments1 = []
proportion_mitochondria = []
references5 = []
compartments5 = []
for reference in references:
    for compartment in compartments:
        with tempfile.TemporaryDirectory() as temp_dir:
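            # Build the path to this dataset's bar plot, e.g. M_greengenes_tbp.qzv.
            dataset = compartment + '_' + reference
            tbp_path = os.path.join(working_dir, 'output', dataset + '_tbp.qzv')
            # Export the visualization so its per-level CSV tables can be read.
            tbp = Visualization.load(tbp_path)
            tbp.export_data(temp_dir)
            df1 = pd.read_csv(os.path.join(temp_dir, 'level-1.csv'))
            # Sum numeric columns only, so non-numeric columns such as sample IDs
            # are skipped. Numeric metadata columns, if any, still count toward the total.
            df1['total'] = df1.sum(axis=1, numeric_only=True)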
            df1['compartment'] = compartment
            df1['reference'] = reference
            proportion_unassigned.extend(list(df1['Unassigned'] / df1['total']))
            references1.extend(list(df1['reference']))
            compartments1.extend(df1['compartment'])
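            df5 = pd.read_csv(os.path.join(temp_dir, 'level-5.csv'))
            # Sum numeric columns only, so non-numeric columns such as sample IDs are skipped.
            df5['total'] = df5.sum(axis=1, numeric_only=True)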
            df5['compartment'] = compartment
            df5['reference'] = reference
            # Greengenes and SILVA label the mitochondria family differently, so
            # pick the level-5 column that matches the reference in use.
            if 'greengenes' in reference:
                proportion_mitochondria.extend(list(df5['k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__mitochondria'] / df5['total']))
            elif 'silva' in reference:
                proportion_mitochondria.extend(list(df5['D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Mitochondria'] / df5['total']))
            references5.extend(list(df5['reference']))
            compartments5.extend(df5['compartment'])
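
The reference-specific branch in the loop above could also be expressed as a lookup. The following is a sketch only: the names `MITOCHONDRIA_COLUMNS` and `mitochondria_column` are hypothetical, while the two column labels are copied from the cell above.

```python
# Column labels copied from the notebook's level-5 lookups.
MITOCHONDRIA_COLUMNS = {
    'greengenes': 'k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__mitochondria',
    'silva': 'D_0__Bacteria;D_1__Proteobacteria;D_2__Alphaproteobacteria;D_3__Rickettsiales;D_4__Mitochondria',
}


def mitochondria_column(reference):
    """Return the level-5 mitochondria column label for a reference name."""
    for database, column in MITOCHONDRIA_COLUMNS.items():
        # 'greengenes_metaxa2' matches 'greengenes', 'silva_metaxa2' matches 'silva'.
        if database in reference:
            return column
    raise ValueError(f'no mitochondria label known for reference {reference!r}')


print(mitochondria_column('silva_metaxa2'))
```

Raising an error for an unrecognized reference, instead of silently skipping it, keeps the list of proportions the same length as the lists of references and compartments that are zipped together later.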

#### Export results to lvl1.csv and lvl5.csv files

In [5]:
lvl1 = list(zip(proportion_unassigned, references1, compartments1))
lvl1_df = pd.DataFrame(lvl1, columns = ['proportion unassigned', 'reference taxonomy', 'tissue compartment'])
lvl1_df.to_csv(working_dir + '/output/lvl1.csv')

In [6]:
lvl5 = list(zip(proportion_mitochondria, references5, compartments5))
lvl5_df = pd.DataFrame(lvl5, columns = ['proportion mitochondria', 'reference taxonomy', 'tissue compartment'])
lvl5_df.to_csv(working_dir + '/output/lvl5.csv')
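
Because `to_csv` is called with its defaults, the row index is written as an unnamed first column. The sketch below uses an invented toy frame and an in-memory buffer to show how that column behaves when the file is read back. It illustrates pandas behaviour and is not a step in the original analysis.

```python
import io

import pandas as pd

# Toy frame with the same column names as lvl1_df; the values are invented.
toy_df = pd.DataFrame(
    {
        'proportion unassigned': [0.25, 0.5],
        'reference taxonomy': ['greengenes', 'silva'],
        'tissue compartment': ['M', 'T'],
    }
)

# Write to an in-memory buffer so the example does not touch the file system.
buffer = io.StringIO()
toy_df.to_csv(buffer)  # the row index becomes an unnamed first column

# A default read turns the saved index into a data column named 'Unnamed: 0'.
buffer.seek(0)
naive = pd.read_csv(buffer)

# Passing index_col=0 restores the original column layout.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(naive.columns))
print(list(restored.columns))
```

Anyone loading lvl1.csv or lvl5.csv downstream can pass `index_col=0` to avoid carrying the extra column around.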

#### Show the final results as HTML

In [7]:
from IPython.display import display
print("Proportion Unassigned")
display(lvl1_df)
print("Proportion Mitochondria")
display(lvl5_df)

Proportion Unassigned


| | proportion unassigned | reference taxonomy | tissue compartment |
|---|---|---|---|
| 0 | 0.228988 | greengenes | M |
| 1 | 0.558824 | greengenes | M |
| 2 | 0.931076 | greengenes | M |
| 3 | 0.021456 | greengenes | M |
| 4 | 0.082127 | greengenes | M |
| ... | ... | ... | ... |
| 5083 | 0.001788 | silva_metaxa2 | S |
| 5084 | 0.000000 | silva_metaxa2 | S |
| 5085 | 0.000826 | silva_metaxa2 | S |
| 5086 | 0.000000 | silva_metaxa2 | S |


Proportion Mitochondria


| | proportion mitochondria | reference taxonomy | tissue compartment |
|---|---|---|---|
| 0 | 0.000000 | greengenes | M |
| 1 | 0.000000 | greengenes | M |
| 2 | 0.000038 | greengenes | M |
| 3 | 0.000000 | greengenes | M |
| 4 | 0.000000 | greengenes | M |
| ... | ... | ... | ... |
| 5083 | 0.025550 | silva_metaxa2 | S |
| 5084 | 0.363636 | silva_metaxa2 | S |
| 5085 | 0.002811 | silva_metaxa2 | S |
| 5086 | 0.029424 | silva_metaxa2 | S |
