# Dead man's teeth

## Part 1. Amplicon sequencing

In [6]:
!cat data/sample-metadata.tsv

#SampleID	BarcodeSequence	LinkerPrimerSequence	Type
calculus	ATCAGACACG	GTATTACCGCGGCTGCTGGCAC	Calculus
bone	ACGAGTGCGT	GTATTACCGCGGCTGCTGGCAC	Bone


In [8]:
print('Barcode + primer length: ', len('ATCAGACACGGTATTACCGCGGCTGCTGGCAC'))

Barcode + primer length:  32
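
The value 32 is the combined length of the barcode and the linker primer, and it is the number later passed to `--p-trim-left`. As a minimal sketch, the same number can be derived per sample from the metadata instead of from a hand-concatenated string. The inline `METADATA` string is a copy of the file contents printed above, and `prefix_lengths` is an illustrative helper name, not part of QIIME 2.

```python
import csv
import io

# Copy of data/sample-metadata.tsv as printed above (tab-separated).
METADATA = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tType\n"
    "calculus\tATCAGACACG\tGTATTACCGCGGCTGCTGGCAC\tCalculus\n"
    "bone\tACGAGTGCGT\tGTATTACCGCGGCTGCTGGCAC\tBone\n"
)


def prefix_lengths(tsv_text):
    """Return {sample_id: len(barcode) + len(primer)} for each metadata row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["#SampleID"]: len(row["BarcodeSequence"]) + len(row["LinkerPrimerSequence"])
        for row in reader
    }


lengths = prefix_lengths(METADATA)
print(lengths)
```

If the samples ever had barcodes of different lengths, this would show it immediately, whereas a single hard-coded string would not.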


### 1. Import data

In [3]:
!qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path data/manifest.tsv --output-path data/sequences.qza --input-format SingleEndFastqManifestPhred33V2

Imported data/manifest.tsv as SingleEndFastqManifestPhred33V2 to data/sequences.qza


### 2. Demultiplexing and QC

In [4]:
!qiime demux summarize --i-data data/sequences.qza --o-visualization data/sequences.qzv

Saved Visualization to: data/sequences.qzv


### 3. Feature table construction

In [23]:
!qiime dada2 denoise-single --i-demultiplexed-seqs data/sequences.qza --p-trim-left 32 --p-trunc-len 140 --o-representative-sequences data/rep-seqs.qza --o-table data/table.qza --o-denoising-stats data/stats.qza

Saved FeatureTable[Frequency] to: data/table.qza
Saved FeatureData[Sequence] to: data/rep-seqs.qza
Saved SampleData[DADA2Stats] to: data/stats.qza
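
As a rough illustration of what the two positional parameters do to a single read: `--p-trunc-len 140` cuts each read at position 140 and discards reads shorter than that, and `--p-trim-left 32` removes the barcode and primer from the 5' end, so surviving reads are 140 - 32 = 108 bases long. This is not DADA2's implementation, which also filters on quality and then denoises. The function name `trim_read` and the toy reads are assumptions made for the example.

```python
from typing import Optional


def trim_read(read: str, trim_left: int = 32, trunc_len: int = 140) -> Optional[str]:
    """Mimic the positional trimming only: truncate, then drop the 5' prefix.

    Returns None for reads shorter than trunc_len, which would be discarded.
    """
    if len(read) < trunc_len:
        return None
    return read[:trunc_len][trim_left:]


# Toy reads: the 32-base barcode + primer prefix followed by filler sequence.
prefix = "ATCAGACACGGTATTACCGCGGCTGCTGGCAC"
long_read = prefix + "ACGT" * 40   # 192 bases, long enough to keep
short_read = prefix + "ACGT" * 10  # 72 bases, shorter than trunc_len

kept = trim_read(long_read)
print(len(kept))              # 108
print(trim_read(short_read))  # None
```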


In [24]:
!qiime metadata tabulate --m-input-file data/stats.qza --o-visualization data/stats.qzv

Saved Visualization to: data/stats.qzv


### 4. Feature summaries

In [25]:
!qiime feature-table summarize --i-table data/table.qza --o-visualization data/table.qzv --m-sample-metadata-file data/sample-metadata.tsv

Saved Visualization to: data/table.qzv


In [26]:
!qiime feature-table tabulate-seqs --i-data data/rep-seqs.qza --o-visualization data/rep-seqs.qzv

Saved Visualization to: data/rep-seqs.qzv


### 5. Taxonomic analysis

In [27]:
!qiime feature-classifier classify-sklearn --i-classifier data/gg-13-8-99-nb-classifier.qza --i-reads data/rep-seqs.qza --o-classification data/taxonomy.qza

Saved FeatureData[Taxonomy] to: data/taxonomy.qza


In [28]:
!qiime metadata tabulate --m-input-file data/taxonomy.qza --o-visualization data/taxonomy.qzv

Saved Visualization to: data/taxonomy.qzv


In [29]:
!qiime taxa barplot --i-table data/table.qza --i-taxonomy data/taxonomy.qza --m-metadata-file data/sample-metadata.tsv --o-visualization data/taxa-bar-plots.qzv

Saved Visualization to: data/taxa-bar-plots.qzv


### 6. Bacterial teamwork

The three members of the red complex are:

* *Porphyromonas gingivalis*
* *Tannerella forsythia*
* *Treponema denticola*
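
To look for these organisms in the classification from step 5, one option is to scan the taxonomy strings for the three genera. The sketch below assumes Greengenes-style strings with `g__` genus labels separated by semicolons. The `example_taxa` values are invented for illustration and are not results from this dataset, and `red_complex_genus` is a hypothetical helper. Matching at genus level will also pick up other species of the same genera, so any hit still needs a species-level check.

```python
RED_COMPLEX_GENERA = ("Porphyromonas", "Tannerella", "Treponema")


def red_complex_genus(taxon):
    """Return the red-complex genus named in a taxonomy string, or None."""
    for rank in taxon.split(";"):
        rank = rank.strip()
        if rank.startswith("g__") and rank[3:] in RED_COMPLEX_GENERA:
            return rank[3:]
    return None


# Invented example strings (not taken from data/taxonomy.qza).
example_taxa = [
    "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; "
    "f__Porphyromonadaceae; g__Porphyromonas; s__",
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Streptococcaceae; g__Streptococcus; s__",
    "k__Bacteria; p__Spirochaetes; c__Spirochaetes; o__Spirochaetales; "
    "f__Spirochaetaceae; g__Treponema; s__",
]

hits = [red_complex_genus(t) for t in example_taxa]
print(hits)
```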

## Part 2. Shotgun sequencing

### 1. Shotgun sequence data profiling


In [None]:
!metaphlan data/G12_assembly.fna --input_type fasta --nproc 2 > data/meta_output.txt

In [None]:
!merge_metaphlan_tables.py -o hmp/merged_profile.txt hmp/*_profile.txt

In [36]:
import pandas as pd

In [40]:
df = pd.read_csv('hmp/merged_profile.txt', skiprows=1, sep='\t')
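
In the merged table every taxonomic rank gets its own row, with the lineage joined by `|` in `clade_name`, so per-species comparisons need only the species-level rows. A minimal sketch, assuming species rows are those whose last lineage element starts with `s__`. The tiny inline table reuses two sample column names from the merged profile, but its clade names and abundances are made up, and `species_rows` is an illustrative helper.

```python
import io

import pandas as pd

# Made-up miniature of a merged profile (values are not real results).
TOY = (
    "clade_name\tSRS014459-Stool_profile\tSRS014470-Tongue_dorsum_profile\n"
    "k__Bacteria\t100.0\t100.0\n"
    "k__Bacteria|p__Firmicutes\t60.0\t40.0\n"
    "k__Bacteria|p__Firmicutes|s__Species_A\t60.0\t40.0\n"
    "k__Bacteria|p__Bacteroidetes|s__Species_B\t40.0\t60.0\n"
)


def species_rows(df):
    """Keep rows whose deepest rank is species, indexed by species name."""
    # rsplit on the literal '|' and take the last element of each lineage.
    last_rank = df["clade_name"].str.rsplit("|", n=1).str[-1]
    mask = last_rank.str.startswith("s__")
    out = df.loc[mask].copy()
    out.index = last_rank[mask].str[3:]  # strip the 's__' prefix
    return out.drop(columns=["clade_name"])


toy = pd.read_csv(io.StringIO(TOY), sep="\t")
species = species_rows(toy)
print(species)
```

The same function could be applied to `df` once it is loaded, provided its `clade_name` column follows the same `|`-separated layout.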

In [42]:
print(list(df))
print(df.head())

['clade_name', 'NCBI_tax_id', 'SRS014494-Posterior_fornix_profile', 'SRS014476-Supragingival_plaque_profile', 'SRS014472-Buccal_mucosa_profile', 'SRS014470-Tongue_dorsum_profile', 'SRS014464-Anterior_nares_profile', 'SRS014459-Stool_profile']
                                          clade_name  \
0                                        k__Bacteria   
1                      k__Bacteria|p__Actinobacteria   
2    k__Bacteria|p__Actinobacteria|c__Actinobacteria   
3  k__Bacteria|p__Actinobacteria|c__Actinobacteri...   
4  k__Bacteria|p__Actinobacteria|c__Actinobacteri...   

                NCBI_tax_id  SRS014494-Posterior_fornix_profile  \
0                         2                               100.0   
1                  2|201174                                 0.0   
2             2|201174|1760                                 0.0   
3       2|201174|1760|85007                                 0.0   
4  2|201174|1760|85007|1653                                 0.0   

   SRS014476-Supr