# Gut Microbiome of HIV+ Individuals in Rural Appalachia

In [1]:
!qiime info

System versions
Python version: 3.8.12
QIIME 2 release: 2021.11
QIIME 2 version: 2021.11.0
q2cli version: 2021.11.0

Installed plugins
alignment: 2021.11.0
composition: 2021.11.0
cutadapt: 2021.11.0
dada2: 2021.11.0
deblur: 2021.11.0
demux: 2021.11.0
diversity: 2021.11.0
diversity-lib: 2021.11.0
emperor: 2021.11.0
feature-classifier: 2021.11.0
feature-table: 2021.11.0
fragment-insertion: 2021.11.0
gneiss: 2021.11.0
longitudinal: 2021.11.0
metadata: 2021.11.0
phylogeny: 2021.11.0
quality-control: 2021.11.0
quality-filter: 2021.11.0
sample-classifier: 2021.11.0
taxa: 2021.11.0
types: 2021.11.0
vsearch: 2021.11.0

Application config directory
/Users/johnsterrett/mambaforge/envs/qiime2-2021.11/var/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org


In [2]:
import os
import sys
import subprocess

from qiime2 import Artifact, Visualization

## Initial processing
### Unzip files

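```python
# Extract each .zip archive in hiv-fastqs/ into the same directory
outputs = []
for zipped in os.listdir("hiv-fastqs/"):
    if not zipped.endswith(".zip"):
        continue  # skip anything that is not an archive
    # Passing arguments as a list avoids splitting on spaces in filenames
    result = subprocess.run(
        ["unzip", f"hiv-fastqs/{zipped}", "-d", "hiv-fastqs/"],
        capture_output=True,
    )
    outputs.append((result.stdout, result.stderr))
```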
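If you would rather not shell out to `unzip`, Python's standard-library `zipfile` module can do the same extraction and fold in the `rm` step that follows. This is a minimal sketch, not what was actually run here; `unzip_all` is a hypothetical helper name. Note that `extractall` overwrites existing files without asking, whereas `unzip` prompts.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_all(directory: str, remove_archives: bool = False) -> List[Path]:
    """Extract every .zip archive in `directory` into that same directory.

    Returns the sorted list of archives that were extracted.
    """
    folder = Path(directory)
    archives = sorted(folder.glob("*.zip"))  # only archives; other files are ignored
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            # Roughly `unzip <archive> -d <directory>`, except existing files
            # are overwritten without prompting
            zf.extractall(folder)
        if remove_archives:
            archive.unlink()  # stands in for the separate `rm hiv-fastqs/*.zip` step
    return archives
```

Usage would be `unzip_all("hiv-fastqs/", remove_archives=True)`, replacing both the loop and the `rm` cell.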

`! rm hiv-fastqs/*.zip`

### Import data as a QIIME 2 artifact

```
! qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path hiv-fastqs/ \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path demux-paired-end.qza
```
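`CasavaOneEightSingleLanePerSampleDirFmt` expects one forward (R1) and one reverse (R2) file per sample, named in the Casava 1.8 style. A quick pre-import sanity check might look like the sketch below. The regular expression is our approximation of that naming convention rather than QIIME 2's own validator, and `check_casava_names` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Approximates the Casava 1.8 naming convention, e.g. "sampleA_S1_L001_R1_001.fastq.gz":
# <sample-id>_<barcode>_L<3-digit lane>_R<1|2>_001.fastq.gz
CASAVA_RE = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[^_]+)_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def check_casava_names(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Report files that don't fit the pattern and samples lacking an R1 or R2 file."""
    reads_by_sample: Dict[str, Set[str]] = defaultdict(set)
    unmatched = []
    for name in filenames:
        match = CASAVA_RE.match(name)
        if match is None:
            unmatched.append(name)
        else:
            reads_by_sample[match.group("sample")].add(match.group("read"))
    # A properly paired sample has exactly one R1 and one R2 file
    unpaired = sorted(s for s, reads in reads_by_sample.items() if reads != {"1", "2"})
    return {"unmatched": sorted(unmatched), "unpaired": unpaired}
```

Running `check_casava_names(os.listdir("hiv-fastqs/"))` before the import and getting two empty lists back suggests the directory is in shape; leftover `.zip` or other stray files show up under `unmatched`.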

### Summarize the imported data

```
! qiime demux summarize \
--i-data demux-paired-end.qza \
--o-visualization paired-end-summarized.qzv
```

In [3]:
viz = Visualization.load("paired-end-summarized.qzv")
viz

**Thoughts** - forward

- Overall sequence quality looks awesome. 
- Quality dips over the first 19 bases, which could be primer sequence, so I'll just trim those off.
- It dips at the very end too. At position 269 the 25th percentile of the PHRED score is still above 30, so I'll truncate there.

Trimming the first 19 bases and truncating at position 269 still leaves 250 bp (269 - 19), well more than we need, so that's good!
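The rule used above (truncate where a chosen percentile of the per-position PHRED scores falls below 30) can be reproduced in code from raw quality strings. This is a sketch under assumptions: qualities are Phred+33 encoded, percentiles use the nearest-rank method (the `demux summarize` plots may interpolate differently and may be computed from a random subsample of reads), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; Casava 1.8+ output uses Phred+33."""
    return [ord(ch) - offset for ch in quality]


def percentile(values: Sequence[int], pct: float) -> int:
    """Nearest-rank percentile (no interpolation), for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def truncation_length(qualities: Sequence[str], pct: float,
                      threshold: int = 30, start: int = 0) -> int:
    """Scan positions from `start` (0-based) and return how many leading bases to keep,
    i.e. the index of the first position whose `pct` percentile is below `threshold`.
    If no position fails, the shortest read length is returned."""
    scores = [phred_scores(q) for q in qualities]
    length = min(len(s) for s in scores)  # only positions that every read covers
    for pos in range(start, length):
        if percentile([s[pos] for s in scores], pct) < threshold:
            return pos
    return length
```

With forward-read quality strings in `fwd_quals`, `truncation_length(fwd_quals, 25, start=19)` mirrors the forward-read choice while skipping the dip in the trimmed region; the reverse reads used the 50th percentile instead.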

**Thoughts** - reverse

- Overall sequence quality looks good.
- Not sure whether the reverse primer sequence is still present in these reads, so I'll trim the first 19 bases again to be safe. Need to check in on this with the sequencing folks.
- Quality declines along the read. At position 240 the median (50th percentile) PHRED score starts to drop below 30, so I'll truncate there.

Trimming the first 19 bases and truncating at position 240 still leaves 221 bp (240 - 19), well more than we need, so that's good!
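Read length alone does not guarantee that the pairs will merge: DADA2 needs the trimmed forward and reverse reads to overlap. The arithmetic can be sketched as follows, assuming (as the notes above suspect but have not confirmed) that the 19 trimmed bases are primer, so both reads start at the ends of the region between the primers. The function names are hypothetical, and any amplicon length you plug in should be the real value for the primer pair used.

```python
# The DADA2 R package's mergePairs defaults to minOverlap = 12.
MIN_OVERLAP = 12


def retained_length(trim_left: int, trunc_len: int) -> int:
    """Read length after DADA2 filtering: truncate at trunc_len, drop the first trim_left bases."""
    if not 0 <= trim_left < trunc_len:
        raise ValueError("need 0 <= trim_left < trunc_len")
    return trunc_len - trim_left


def expected_overlap(fwd_len: int, rev_len: int, amplicon_len: int) -> int:
    """Bases shared by a merged pair, assuming both trimmed reads start at opposite ends
    of an `amplicon_len`-bp region (measured after the trimmed bases are removed)."""
    return fwd_len + rev_len - amplicon_len


fwd = retained_length(19, 269)  # 250, matching the forward-read arithmetic above
rev = retained_length(19, 240)  # 221, matching the reverse-read arithmetic above
longest_mergeable = fwd + rev - MIN_OVERLAP  # longest region that still overlaps enough
```

With 250 + 221 = 471 retained bases, any region up to 471 - 12 = 459 bp between the trimmed ends should still merge under the default overlap requirement.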

## Denoise using DADA2

```
! qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired-end.qza \
--p-trim-left-f 19 \
--p-trunc-len-f 269 \
--p-trim-left-r 19 \
--p-trunc-len-r 240 \
--output-dir dada2-out \
--verbose
```
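One way to check whether these trim and truncation settings cost too many reads is to look at DADA2's per-sample denoising statistics. The sketch below assumes that `--output-dir` also wrote `dada2-out/denoising_stats.qza`, that it has been exported to a TSV (for example with `qiime tools export --input-path dada2-out/denoising_stats.qza --output-path dada2-out/stats`), and that the table has `input` and `non-chimeric` columns; `fraction_retained` is a hypothetical helper.

```python
import csv
from typing import Dict, TextIO


def fraction_retained(stats_tsv: TextIO) -> Dict[str, float]:
    """Map sample ID -> fraction of input reads still present after chimera removal.

    Assumes 'input' and 'non-chimeric' columns; a '#q2:types' directive row, if
    present, is skipped.
    """
    lines = (line for line in stats_tsv if not line.startswith("#q2:types"))
    reader = csv.DictReader(lines, delimiter="\t")
    id_column = reader.fieldnames[0]  # first column holds the sample IDs
    result = {}
    for row in reader:
        n_input = float(row["input"])
        # Guard against samples with zero input reads
        result[row[id_column]] = float(row["non-chimeric"]) / n_input if n_input else 0.0
    return result
```

Samples with an unusually low fraction are the first place to look; if the exported table also has `denoised` and `merged` columns, comparing them would show whether merging, and hence the truncation lengths, is the bottleneck.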

In [9]:
! qiime feature-table summarize \
--i-table dada2-out/table.qza \
--m-sample-metadata-file metadata.txt \
--o-visualization dada2-out/table.qzv

Saved Visualization to: dada2-out/table.qzv

In [10]:
viz = Visualization.load("dada2-out/table.qzv")
viz
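`feature-table summarize` reads sample metadata from `metadata.txt`. A lightweight pre-flight check of that file is sketched below. The set of accepted ID-column headers reflects our reading of the QIIME 2 metadata format documentation and should be double-checked there; `metadata_problems` is a hypothetical helper, and QIIME 2's own parser (for example via `qiime metadata tabulate`) remains the authoritative check.

```python
import csv
from typing import Iterable, List

# ID-column headers as we understand the QIIME 2 metadata format (verify against the docs)
CASE_INSENSITIVE_IDS = {"id", "sampleid", "sample id", "sample-id",
                        "featureid", "feature id", "feature-id"}
CASE_SENSITIVE_IDS = {"#SampleID", "#Sample ID", "#OTUID", "#OTU ID", "sample_name"}


def is_id_header(cell: str) -> bool:
    return cell in CASE_SENSITIVE_IDS or cell.lower() in CASE_INSENSITIVE_IDS


def metadata_problems(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems with a tab-separated metadata file (empty list = OK)."""
    problems: List[str] = []
    header_seen = False
    seen_ids = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # blank line
        first = row[0].strip()
        if not header_seen:
            if is_id_header(first):
                header_seen = True
            elif first.startswith("#"):
                continue  # comment line before the header
            else:
                problems.append(f"unrecognized ID header: {first!r}")
                header_seen = True
            continue
        if first.startswith("#"):
            continue  # comment or '#q2:types' directive
        if first in seen_ids:
            problems.append(f"duplicate ID: {first!r}")
        seen_ids.add(first)
    if not header_seen:
        problems.append("no header row found")
    return problems
```

For example, `with open("metadata.txt") as fh: print(metadata_problems(fh))` prints an empty list when none of these issues are found.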

## SEPP fragment insertion for phylogenetic tree

In [None]:
! qiime fragment-insertion sepp \
--i-representative-sequences dada2-out/representative_sequences.qza \
--i-reference-database sepp-refs-silva-128.qza \
--o-tree insertion-tree.qza \
--o-placements insertion-placements.qza