# 01 - Generate ASVs using DADA2
## Pipeline Description
This is the 16s rRNA amplicon soil data from CeMiST, re-analyzed using QIIME2 to get Amplicon Sequence Variants (ASVs) for downstream analysis. The V3V4 region of bacterial 16s rRNA was amplified using these primers (the product should be around 464 bp): 
* _341F (5’-CCTACGGGNGGCWGCAG-3')_ : len 17
* _805R (5’-GACTACHVGGGTATCTAATCC-3')_ len 21

There are three runs, each containing a paired end FASTQ in this folders:
* ../data/raw_files/Psoil_1_L001-ds.be0c008785964521859130b2f2ead9be
* ../data/raw_files/Psoil_2_L001-ds.6255278a9f7448a0b3e7c48b80f6d25f
* ../data/raw_files/Psoil_3_L001-ds.e31e51f5248b4af4ad90603ebbf46e08

And the metadata in this files:
* ../data/raw_files/Pool1.barcodes
* ../data/raw_files/Pool2.barcodes
* ../data/raw_files/Pool3.barcodes

Here, Amplicon Sequence Variants was generated from raw FASTQ data using DADA2 in [QIIME2 version 2020.11](https://docs.qiime2.org/2020.11/). 

Analysis was conducted within Conda environment in NBC Shared machine.

## Steps: 
1. [Check input](#section1)
    - [ ] QC Raw Paired End FastQ 
    - [x] Build Metadata - make it compatible with QIIME2
    - [x] Import to QIIME2
2. [Demultiplexing](#section2)
    - [ ] QC Demultiplexed samples
    - [X] Run demultiplexing with cutadapt in QIIME2
3. [Denoising with DADA2](#section3)
    - [ ] QC denoised data 
    - [x] Run DADA2
4. [Merge outputs](#section4)
    - [x] Merge QZA files into 1

Notes: 
* _metadata in Pool3 suggested that the 16s run was mixed with other experiments. Will these affect DADA2 capabilites to denoise the samples?_
* _fastq files are mixed forward and reverse reads_

In [1]:
_341F = 'CCTACGGGNGGCWGCAG'
_805R = 'GACTACHVGGGTATCTAATCC'
x = 461 -len(_341F) - len(_805R)
x

423

In [1]:
# Load Library
import pandas as pd
from qiime2 import Artifact, Visualization
import os

import tempfile
from shutil import copyfile

In [3]:
from sys import executable 
print(executable)  

/home/WIN.DTU.DK/matinnu/miniconda3/envs/qiime2-2021.2/bin/python


<a id='section1'></a>
## Check Input
### Build metadata

In [13]:
# makedir
! mkdir ../data/metadata

# create demultiplexing barcodes containing barcodes + the first 5 nucleotides of the forward primer
pool = [1, 2, 3]
for i in pool:
    df_demux = pd.read_csv('../data/raw_files/Pool'+str(i)+'.barcodes', sep='\t')
    try:
        df_demux = df_demux.rename(columns={'ID ':'ID'})
    except:
        pass
    df_demux = pd.DataFrame({'#SampleID' : [i for i in df_demux['ID']],
                             'BarcodeSequence' : [i for num, i in enumerate(df_demux['#F-tag'])],
                             'Sample' : [i.split('_')[0] for i in df_demux['ID']]
                             })
    df_demux.to_csv('../data/metadata/psoil'+str(i)+'_metadata_fwd.tsv', sep='\t', index=False) #write into tsv file

# metadata is already available
[i for i in os.listdir('../data/metadata') if i.endswith('metadata_fwd.tsv')]

mkdir: cannot create directory ‘../data/metadata’: File exists


['psoil1_metadata_fwd.tsv',
 'psoil2_metadata_fwd.tsv',
 'psoil3_metadata_fwd.tsv']

In [6]:
# takes in average 5-6 minutes each
! mkdir ../data/qiime2
# import raw data into qza format
#path = '../data/raw_files/'
path = '../../clean_run/data/raw_files/'
path = [os.path.join(path, i) for i in os.listdir(path) if i.startswith('Psoil')]

# create manifest files
filepath = []
for i in path:
    file = os.listdir(i)
    try:
        file.remove('.ipynb_checkpoints')
    except:
        pass
    
    file = [os.path.join(i, x) for x in file]
    filepath.append(file)

for num, i in enumerate(path):
    try:
        os.rmdir(os.path.join(i, '.ipynb_checkpoints'))
    except:
        pass
    os.rename(filepath[num][0], os.path.join(i, 'forward.fastq.gz'))
    os.rename(filepath[num][1], os.path.join(i, 'reverse.fastq.gz'))
    
    # Import raw files into QIIME 
    ! qiime tools import --type MultiplexedPairedEndBarcodeInSequence --input-path {i} --input-format MultiplexedPairedEndBarcodeInSequenceDirFmt --output-path ../data/qiime2/Psoil{num+1}_PE.qza 
    os.rename(os.path.join(i, 'forward.fastq.gz'), filepath[num][0]) 
    os.rename(os.path.join(i, 'reverse.fastq.gz'), filepath[num][1]) 

[32mImported ../../clean_run/data/raw_files/Psoil_1_L001-ds.be0c008785964521859130b2f2ead9be as MultiplexedPairedEndBarcodeInSequenceDirFmt to ../data/qiime2/Psoil1_PE.qza[0m
[32mImported ../../clean_run/data/raw_files/Psoil_2_L001-ds.6255278a9f7448a0b3e7c48b80f6d25f as MultiplexedPairedEndBarcodeInSequenceDirFmt to ../data/qiime2/Psoil2_PE.qza[0m
[32mImported ../../clean_run/data/raw_files/Psoil_3_L001-ds.e31e51f5248b4af4ad90603ebbf46e08 as MultiplexedPairedEndBarcodeInSequenceDirFmt to ../data/qiime2/Psoil3_PE.qza[0m


<a id='section2'></a>
## Demultiplexing

In [7]:
# demultiplex PE based on metadata
def demux(path, metadata, out):
    ! mkdir {out}
    ! qiime cutadapt demux-paired \
        --i-seqs {path} \
        --m-forward-barcodes-file {metadata} \
        --m-forward-barcodes-column BarcodeSequence \
        --p-error-rate 0.1 \
        --p-minimum-length 150 \
        --p-mixed-orientation \
        --o-per-sample-sequences {out}/demux-0.1.qza \
        --o-untrimmed-sequences {out}/untrimmed-0.1.qza \
        --verbose \
        > {out}/cutadapt-0.1.log
    return

In [16]:
paths = [os.path.join('../data/qiime2', i) for i in os.listdir('../data/qiime2') if i.endswith('PE.qza')]
for num, i in enumerate(paths):
    metadata = '../data/metadata/psoil'+str(num+1)+'_metadata_fwd.tsv'
    out = i.replace('.qza', '')
    if os.path.isdir(out):
        print(out)
        pass
    else:
        demux(i, metadata, out)
        
#[       8=---] 00:26:18     6,318,906 reads  @    249.9 µs/read;   0.24 M reads/minute
#[------>8    ] 00:15:15     3,389,998 reads  @    270.2 µs/read;   0.22 M reads/minute
#[=8          ] 00:37:51     8,795,488 reads  @    258.2 µs/read;   0.23 M reads/minute
#[---->8      ] 00:22:08     4,747,122 reads  @    279.8 µs/read;   0.21 M reads/minute
#[-------->8  ] 00:19:56     6,869,491 reads  @    174.1 µs/read;   0.34 M reads/minute
#[------->8   ] 00:12:45     3,801,926 reads  @    201.4 µs/read;   0.30 M reads/minute

#[       8=---] 00:18:02     6,318,906 reads  @    171.3 µs/read;   0.35 M reads/minute
#[---------->8] 00:00:42       228,836 reads  @    185.9 µs/read;   0.32 M reads/minute
#[=8          ] 00:25:22     8,795,488 reads  @    173.1 µs/read;   0.35 M reads/minute
#[--------=8  ] 00:00:53       274,200 reads  @    194.0 µs/read;   0.31 M reads/minute
#[-------->8  ] 00:14:03     6,869,491 reads  @    122.8 µs/read;   0.49 M reads/minute
#[---------=8 ] 00:00:37       255,531 reads  @    145.6 µs/read;   0.41 M reads/minute

[       8=---] 00:18:02     6,318,906 reads  @    171.3 µs/read;   0.35 M reads/minute
[---------->8] 00:00:42       228,836 reads  @    185.9 µs/read;   0.32 M reads/minute
[=8          ] 00:25:22     8,795,488 reads  @    173.1 µs/read;   0.35 M reads/minute
[--------=8  ] 00:00:53       274,200 reads  @    194.0 µs/read;   0.31 M reads/minute
[-------->8  ] 00:14:03     6,869,491 reads  @    122.8 µs/read;   0.49 M reads/minute
[---------=8 ] 00:00:37       255,531 reads  @    145.6 µs/read;   0.41 M reads/minute


In [17]:
# summarize
for i in range(3):
    ! qiime demux summarize \
        --i-data ../data/qiime2/Psoil{i+1}_PE/demux-0.1.qza \
        --o-visualization ../data/qiime2/Psoil{i+1}_PE/demux-0.1.qzv

[32mSaved Visualization to: ../data/qiime2/Psoil1_PE/demux-0.1.qzv[0m
[32mSaved Visualization to: ../data/qiime2/Psoil2_PE/demux-0.1.qzv[0m
[32mSaved Visualization to: ../data/qiime2/Psoil3_PE/demux-0.1.qzv[0m


In [18]:
Visualization.load('../data/qiime2/Psoil3_PE/demux-0.1.qzv')

In [20]:
# export
for i in range(3):
    ! qiime tools export \
        --input-path ../data/qiime2/Psoil{i+1}_PE/demux-0.1.qza \
        --output-path ../data/qiime2/Psoil{i+1}_PE/raw_demux

[32mExported ../data/qiime2/Psoil1_PE/demux-0.1.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory ../data/qiime2/Psoil1_PE/raw_demux[0m
[32mExported ../data/qiime2/Psoil2_PE/demux-0.1.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory ../data/qiime2/Psoil2_PE/raw_demux[0m
[32mExported ../data/qiime2/Psoil3_PE/demux-0.1.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory ../data/qiime2/Psoil3_PE/raw_demux[0m


In [4]:
def fix_mixed_orientation(path, primer, outdir, error_rate=0.1, minlength=150):
    '''
    Given a directory of demultiplexed FastQ files, re-orient and trim mixed orientation reads according to a given primer. 
    Utilizes QIIME2 version 20 above.
    import pandas as pd
    import tempfile
    import os
    from shutil import copyfile
    '''
   
    #error_rate = 0.1 
    #minlength = 150
    
    def make_manifest(path, out):
        manifest = sorted([i for i in os.listdir(path) if i.endswith('.fastq.gz')])
        ctr = 0
        df = pd.DataFrame(columns=['sample-id','forward-absolute-filepath','reverse-absolute-filepath'])
    
        for i in range(int(len(manifest)/2)):
            forward = manifest[ctr]
            reverse = manifest[ctr+1]
            sampleid = (manifest[ctr].rsplit('_', 4)[0])
            df.loc[i] = [sampleid, os.path.join(path, forward), os.path.join(path, reverse)]
            ctr = ctr+2
        df.to_csv(out, sep='\t', index=False)
        return

    def make_reorient_metadata(sampleid, fprimer, outpath):
        df = pd.DataFrame(columns=['#SampleID','BarcodeSequence'])
        df.loc[0] = [sampleid, fprimer]
        df.to_csv(outpath+'/reorient.tsv', sep='\t', index=False)
        return

    manifest = os.path.join(outdir, 'manifest.txt')
    make_manifest(path, manifest)
        
    outdir = os.path.join(outdir, 'reorient')
    ! mkdir {outdir}
    df = pd.read_csv(manifest, sep='\t')
    
    for i in range(len(df)):
        with tempfile.TemporaryDirectory() as tmp:
            print(df.loc[i, 'sample-id'])
            copyfile(df.loc[i, 'forward-absolute-filepath'], os.path.join(tmp, 'forward.fastq.gz'))
            copyfile(df.loc[i, 'reverse-absolute-filepath'], os.path.join(tmp, 'reverse.fastq.gz'))
            ! qiime tools import \
                --type 'MultiplexedPairedEndBarcodeInSequence' \
                --input-path {tmp} \
                --output-path {tmp}/paired-end-demux.qza \
                --input-format MultiplexedPairedEndBarcodeInSequenceDirFmt 
            make_reorient_metadata(df.loc[i, 'sample-id'], primer, tmp)
            path = os.path.join(tmp,'paired-end-demux.qza')
            metadata = os.path.join(tmp, 'reorient.tsv')
            sampleid = df.loc[i, 'sample-id']
            ! qiime cutadapt demux-paired \
                --i-seqs {path} \
                --m-forward-barcodes-file {metadata} \
                --m-forward-barcodes-column BarcodeSequence \
                --p-error-rate {error_rate} \
                --p-minimum-length {minlength} \
                --p-mixed-orientation \
                --o-per-sample-sequences {outdir+'/'+sampleid}-demux.qza \
                --o-untrimmed-sequences {outdir+'/'+sampleid}-untrimmed.qza \
                --verbose > {outdir+'/'+sampleid}-cutadapt.log
            #print(df.loc[i, 'sample-id'], os.listdir(tmp))
        # directory and contents have been removed
    return

mkdir: cannot create directory ‘../data/qiime2/Psoil1_PE/reorient’: File exists
S1_rep1
[32mImported /tmp/tmpc5e2htkm as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmpc5e2htkm/paired-end-demux.qza[0m
[  8<--------] 00:00:07        94,849 reads  @     83.3 µs/read;   0.72 M reads/minute
[ 8=---------] 00:00:05        65,766 reads  @     81.1 µs/read;   0.74 M reads/minute
S1_rep2
[32mImported /tmp/tmpfbn0z8bh as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmpfbn0z8bh/paired-end-demux.qza[0m
[    8=------] 00:00:14       182,017 reads  @     81.3 µs/read;   0.74 M reads/minute
[  8=--------] 00:00:09       114,274 reads  @     79.6 µs/read;   0.75 M reads/minute
S1_rep3
[32mImported /tmp/tmpqisxg8bc as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmpqisxg8bc/paired-end-demux.qza[0m
[   8<-------] 00:00:11       134,229 reads  @     81.9 µs/read;   0.73 M reads/minute
[  8<--------] 00:00:07        89,235 reads  @     82.2 µs/read;   0.73 M reads/minute
[    

In [5]:
for i in range(2):
    path = '../data/qiime2/Psoil'+str(i+2)+'_PE/raw_demux/'
    primer = 'CCTACGGGNGGCWGCAG'
    outdir = '../data/qiime2/Psoil'+str(i+2)+'_PE/'
    print(path)
    fix_mixed_orientation(path, primer, outdir)

../data/qiime2/Psoil2_PE/raw_demux/
P5_rep1
[32mImported /tmp/tmprg_w3hbq as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmprg_w3hbq/paired-end-demux.qza[0m
[    8=------] 00:00:15       183,934 reads  @     82.0 µs/read;   0.73 M reads/minute
[  8=--------] 00:00:09       113,843 reads  @     81.3 µs/read;   0.74 M reads/minute
P5_rep2
[32mImported /tmp/tmp3697aq7q as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmp3697aq7q/paired-end-demux.qza[0m
[       8=---] 00:00:25       309,088 reads  @     82.7 µs/read;   0.73 M reads/minute
[    8=------] 00:00:15       196,720 reads  @     80.4 µs/read;   0.75 M reads/minute
P5_rep3
[32mImported /tmp/tmpqnxomm9f as MultiplexedPairedEndBarcodeInSequenceDirFmt to /tmp/tmpqnxomm9f/paired-end-demux.qza[0m
[   8<-------] 00:00:11       135,587 reads  @     83.3 µs/read;   0.72 M reads/minute
[  8<--------] 00:00:07        87,705 reads  @     81.4 µs/read;   0.74 M reads/minute
P5_rep4
[32mImported /tmp/tmpmae9srju as Multipl

In [None]:
def clean_reverse_primer(path, primer):
    df = pd.read_csv(os.path.join(path, 'manifest.txt'), sep='\t')

    ! mkdir {path}/cleaned

    for i in df['sample-id']:
        print(i)
        with tempfile.TemporaryDirectory() as tmp:
            ! echo {path}/reorient/{i}-demux.qza
            ! qiime tools export \
                --input-path {path}/reorient/{i}-demux.qza \
                --output-path tmp
            ! fastqc tmp/*.fastq.gz -o {path}/reorient -q
            print('cutadapt')
            ! qiime cutadapt trim-paired \
                --i-demultiplexed-sequences {path}/reorient/{i}-demux.qza \
                --p-front-r {primer} \
                --p-discard-untrimmed \
                --p-minimum-length 150 \
                --p-error-rate 0.1 \
                --o-trimmed-sequences {path}/cleaned/{i}-trimmed.qza \
                --verbose > {path}/cleaned/{i}-cutadapt.log

        with tempfile.TemporaryDirectory() as tmp2:
            ! qiime tools export \
                --input-path {path}/cleaned/{i}-trimmed.qza \
                --output-path tmp2
            ! fastqc tmp2/*.fastq.gz -q -o {path}/cleaned -q
    return

for i in range(2):
    path = '../data/qiime2/Psoil'+str(i+2)+'_PE'
    primer = 'GACTACHVGGGTATCTAATCC'
    print(path)
    clean_reverse_primer(path, primer)

../data/qiime2/Psoil2_PE
mkdir: cannot create directory ‘../data/qiime2/Psoil2_PE/cleaned’: File exists
P5_rep1
../data/qiime2/Psoil2_PE/reorient/P5_rep1-demux.qza
[32mExported ../data/qiime2/Psoil2_PE/reorient/P5_rep1-demux.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory tmp[0m
cutadapt
[   8=-------] 00:00:11       151,668 reads  @     77.8 µs/read;   0.77 M reads/minute
[32mExported ../data/qiime2/Psoil2_PE/cleaned/P5_rep1-trimmed.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory tmp2[0m
P5_rep2
../data/qiime2/Psoil2_PE/reorient/P5_rep2-demux.qza
[32mExported ../data/qiime2/Psoil2_PE/reorient/P5_rep2-demux.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory tmp[0m
cutadapt
[      8=----] 00:00:21       262,531 reads  @     80.5 µs/read;   0.75 M reads/minute
[32mExported ../data/qiime2/Psoil2_PE/cleaned/P5_rep2-trimmed.qza as SingleLanePerSamplePairedEndFastqDirFmt to directory tmp2[0m
P5_rep3
../data/qiime2/Psoil2_PE/reorient/P5_rep3-demux.qza


In [5]:
# merge
! qiime feature-table merge-seqs \
    --i-data ../data/qiime2/Psoil1_PE/cleaned/S6_rep1-trimmed.qza ../data/qiime2/Psoil1_PE/cleaned/S6_rep2-trimmed.qza \
    --o-merged-data ../data/tryout.qza

Usage: [34mqiime feature-table merge-seqs[0m [OPTIONS]

  Combines feature data objects which may or may not contain data for the
  same features. If different feature data is present for the same feature
  id in the inputs, the data from the first will be propagated to the
  result.

[1mInputs[0m:
  [34m[4m--i-data[0m ARTIFACTS... [32mList[FeatureData[Sequence]][0m
                         The collection of feature sequences to be merged.
                                                                    [35m[required][0m
[1mOutputs[0m:
  [34m[4m--o-merged-data[0m ARTIFACT [32mFeatureData[Sequence][0m
                         The resulting collection of feature sequences
                         containing all feature sequences provided. [35m[required][0m
[1mMiscellaneous[0m:
  [34m--output-dir[0m PATH      Output unspecified results to a directory
  [34m--verbose[0m / [34m--quiet[0m    Display verbose output to stdout and/or stderr
                       

<a id='section2'></a>
## Denoising

In [25]:
def denoise(path):
    ! qiime dada2 denoise-paired \
        --i-demultiplexed-seqs {path}/demux-0.1.qza \
        --p-trim-left-f 12 \
        --p-trim-left-r 29 \
        --p-trunc-len-f 200 \
        --p-trunc-len-r 250 \
        --p-n-threads 6 \
        --o-table {path}/table-0.1.qza \
        --o-representative-sequences {path}/rep-seqs-0.1.qza \
        --o-denoising-stats {path}/denoising-stats-0.1.qza \
        --verbose
    return

In [9]:
# sample 3 contains runs that are not part of the data, so we need to clean it up first
## create metadata for filtering
df_psoil3 = pd.read_csv('../data/metadata/psoil3_metadata_fwd.tsv', delimiter='\t')
sample = ['P9', 'Pmix', 'Neg']
df_psoil3 = df_psoil3[df_psoil3.Sample.isin(sample)]
df_psoil3.to_csv('../data/metadata/psoil3_metadata_fwd_filtered.tsv', sep='\t', index=False) #write into tsv file
## filtering using qiime2
! mv ../data/qiime2/Psoil3_PE/demux-0.1.qza ../data/qiime2/Psoil3_PE/demux-0.1-unfiltered.qza
! mv ../data/metadata/psoil3_metadata_fwd.tsv ../data/metadata/psoil3_metadata_fwd_unfiltered.tsv
! mv ../data/metadata/psoil3_metadata_fwd_filtered.tsv ../data/metadata/psoil3_metadata_fwd.tsv

In [None]:
## filtering using qiime2
! qiime demux filter-samples \
    --i-demux ./data/qiime2/Psoil3_PE/demux-0.1-unfiltered.qza \
    --m-metadata-file ../data/metadata/psoil3_metadata_fwd.tsv \
    --o-filtered-demux ../data/qiime2/Psoil3_PE/demux-0.1.qza \
    --verbose

In [30]:
# run denoising for all samples
paths = [os.path.join('../data/qiime2', i) for i in os.listdir('../data/qiime2') if os.path.isdir(os.path.join('../data/qiime2', i))]
paths = [i for i in paths if i.endswith('PE')]
for num, i in enumerate(paths):
    pe = os.path.join(i, 'demux-0.1.qza')
    if 'rep-seqs-0.1.qza' in os.listdir(i):
        print('already denoised', i)
        pass
    else:
        print('denoising', i)
        denoise(i)

already denoised ../data/qiime2/Psoil2_PE
already denoised ../data/qiime2/Psoil1_PE
already denoised ../data/qiime2/Psoil3_PE


In [22]:
# summarizing denoising results
# summarize
for i in range(3):   
    ! qiime feature-table summarize \
        --i-table ../data/qiime2/Psoil{i+1}_PE/table-0.1.qza \
        --o-visualization ../data/qiime2/Psoil{i+1}_PE/table-0.1.qzv \
        --m-sample-metadata-file ../data/metadata/psoil{i+1}_metadata_fwd.tsv

    ! qiime feature-table tabulate-seqs \
        --i-data ../data/qiime2/Psoil{i+1}_PE/rep-seqs-0.1.qza \
        --o-visualization ../data/qiime2/Psoil{i+1}_PE/rep-seqs-0.1.qzv

    ! qiime metadata tabulate \
        --m-input-file ../data/qiime2/Psoil{i+1}_PE/denoising-stats-0.1.qza \
        --o-visualization ../data/qiime2/Psoil{i+1}_PE/denoising-stats-0.1.qzv

[32mSaved Visualization to: qiime2/Psoil1_PE/table-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil1_PE/rep-seqs-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil1_PE/denoising-stats-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil2_PE/table-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil2_PE/rep-seqs-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil2_PE/denoising-stats-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil3_PE/table-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil3_PE/rep-seqs-0.1.qzv[0m
[32mSaved Visualization to: qiime2/Psoil3_PE/denoising-stats-0.1.qzv[0m


<a id='section4'></a>
## Merge Outputs

In [24]:
! qiime feature-table merge \
  --i-tables ../data/qiime2/Psoil1_PE/table-0.1.qza \
  --i-tables ../data/qiime2/Psoil2_PE/table-0.1.qza \
  --i-tables ../data/qiime2/Psoil3_PE/table-0.1.qza \
  --o-merged-table ../data/qiime2/table.qza

! qiime feature-table merge-seqs \
  --i-data ../data/qiime2/Psoil1_PE/rep-seqs-0.1.qza \
  --i-data ../data/qiime2/Psoil2_PE/rep-seqs-0.1.qza \
  --i-data ../data/qiime2/Psoil3_PE/rep-seqs-0.1.qza \
  --o-merged-data ../data/qiime2/rep-seqs.qza

[32mSaved FeatureTable[Frequency] to: qiime2/table.qza[0m
[32mSaved FeatureData[Sequence] to: qiime2/rep-seqs.qza[0m


In [32]:
df1 = pd.read_csv('../data/metadata/psoil1_metadata_fwd.tsv', sep='\t')
df2 = pd.read_csv('../data/metadata/psoil2_metadata_fwd.tsv', sep='\t')
df3 = pd.read_csv('../data/metadata/psoil3_metadata_fwd.tsv', sep='\t')
df = df1.append([df2, df3])
df.to_csv('../data/metadata/sample-metadata.tsv', sep='\t', index=False) #write into tsv file

In [33]:
! qiime feature-table summarize \
    --i-table ../data/qiime2/table.qza \
    --o-visualization ../data/qiime2/table.qzv \
    --m-sample-metadata-file ../data/metadata/sample-metadata.tsv

! qiime feature-table tabulate-seqs \
    --i-data ../data/qiime2/rep-seqs.qza \
    --o-visualization ../data/qiime2/rep-seqs.qzv

[32mSaved Visualization to: qiime2/table.qzv[0m
[32mSaved Visualization to: qiime2/rep-seqs.qzv[0m


In [31]:
Visualization.load('../data/qiime2/table.qzv')

In [32]:
Visualization.load('../data/qiime2/rep-seqs.qzv')

In [6]:
! biom add-metadata \
    -i ../../clean_run/data/qiime2/filtered/exported-feature-table/feature-table.biom \
    -o ../../clean_run/data/qiime2/filtered/exported-feature-table/mix.biom \
    --observation-metadata-fp ../../clean_run/data/qiime2/filtered/exported-feature-table/taxonomy.tsv \
    --observation-header OTUID,taxonomy,confidence

In [7]:
! biom convert \
    -i ../../clean_run/data/qiime2/filtered/exported-feature-table/mix.biom \
    -o ../../clean_run/data/qiime2/filtered/exported-feature-table/json.biom \
    --table-type="OTU table" --to-json