# Starting again from fastq.gz

There are 1st and 2nd files. I'm not sure what the difference between them is, but I'll separate them by metadata and try both.

> All results go in the output folder.


# Creating the qza files
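The import cells below read manifest files (`1st_manifest`, `2nd_manifest`) whose contents are not shown in this notebook. As a rough sketch, a `PairedEndFastqManifestPhred33V2` manifest is a tab-separated file with the columns `sample-id`, `forward-absolute-filepath` and `reverse-absolute-filepath`. The helper name `build_paired_manifest` and the `_1.fastq.gz` / `_2.fastq.gz` file suffixes are assumptions for illustration, not the naming actually used for these data.

```python
import csv
from pathlib import Path


def build_paired_manifest(fastq_dir, manifest_path,
                          fwd_suffix="_1.fastq.gz", rev_suffix="_2.fastq.gz"):
    """Write a paired-end V2 manifest (TSV) and return the sample IDs found.

    The suffixes are hypothetical; adjust them to the real file names.
    """
    fastq_dir = Path(fastq_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(fastq_dir.glob(f"*{fwd_suffix}")):
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = fwd.with_name(sample_id + rev_suffix)
        if not rev.exists():
            raise FileNotFoundError(f"no reverse read file for sample {sample_id}")
        rows.append((sample_id, str(fwd), str(rev)))

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id",
                         "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return [row[0] for row in rows]
```

Calling `build_paired_manifest("some_fastq_dir", "1st_manifest")` (directory name hypothetical) would produce a file that `qiime tools import` can read with `--input-format PairedEndFastqManifestPhred33V2`.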

In [1]:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path 1st_manifest \
  --output-path output/paired-end-demux-1.qza \
  --input-format PairedEndFastqManifestPhred33V2

Imported 1st_manifest as PairedEndFastqManifestPhred33V2 to output/paired-end-demux-1.qza


In [2]:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path 2nd_manifest \
  --output-path output/paired-end-demux-2.qza \
  --input-format PairedEndFastqManifestPhred33V2

Imported 2nd_manifest as PairedEndFastqManifestPhred33V2 to output/paired-end-demux-2.qza


In [3]:
!qiime demux summarize \
  --i-data output/paired-end-demux-1.qza \
  --o-visualization output/demux_seqs_1.qzv

Saved Visualization to: output/demux_seqs_1.qzv


In [4]:
!qiime demux summarize \
  --i-data output/paired-end-demux-2.qza \
  --o-visualization output/demux_seqs_2.qzv

Saved Visualization to: output/demux_seqs_2.qzv


In [5]:
from qiime2 import Visualization
Visualization.load('output/demux_seqs.qzv')

In [6]:
Visualization.load('output/demux_seqs_1.qzv')

In [7]:
Visualization.load('output/demux_seqs_2.qzv')

The quality scores of the reverse reads look wrong. Let's find out how to handle this.
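One way to look at the scores outside QIIME 2 is to decode them directly from a FASTQ file. The sketch below assumes Phred+33 encoding (as in the manifest format used above) and plain four-line FASTQ records; the function name `mean_quality_by_position` is made up for this example.

```python
import gzip
from itertools import islice


def mean_quality_by_position(fastq_gz_path, max_reads=10000):
    """Return the mean Phred score at each read position (Phred+33 assumed)."""
    totals, counts = [], []
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Each FASTQ record is four lines; the quality string is the fourth.
        quality_lines = islice(handle, 3, None, 4)
        for line in islice(quality_lines, max_reads):
            scores = [ord(ch) - 33 for ch in line.rstrip("\n")]
            if len(scores) > len(totals):
                extra = len(scores) - len(totals)
                totals.extend([0] * extra)
                counts.extend([0] * extra)
            for pos, score in enumerate(scores):
                totals[pos] += score
                counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]
```

Running this on one forward file and its reverse mate should show whether the reverse scores are uniformly low, drop off at a certain position, or take values that do not fit Phred+33 at all.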

# In that case, go with single-end for now


In [34]:
!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path single_end_manifest \
  --output-path output/single_end_demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

Imported single_end_manifest as SingleEndFastqManifestPhred33V2 to output/single_end_demux.qza


In [35]:
!qiime demux summarize \
  --i-data output/single_end_demux.qza \
  --o-visualization output/demux_single.qzv

Saved Visualization to: output/demux_single.qzv


In [36]:
Visualization.load('output/demux_single.qzv')

- Here the quality seems relatively low in the first few bases.
- We’ll therefore trim the first 6 bases from each sequence and truncate the sequences at 270 bases.
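As a small illustration of what these two settings mean for a single read: DADA2 truncates at position 270, discards reads shorter than that, and then removes the first 6 bases, so every surviving read is 264 bases long. The function below is only a sketch of that arithmetic, not DADA2's implementation, and it ignores DADA2's other quality filters.

```python
def trim_and_truncate(read, trim_left=6, trunc_len=270):
    """Return read[trim_left:trunc_len], or None if the read is too short."""
    if len(read) < trunc_len:
        return None  # reads shorter than the truncation length are discarded
    return read[trim_left:trunc_len]


long_read = "ACGT" * 75    # 300 bases
short_read = "ACGT" * 50   # 200 bases

kept = trim_and_truncate(long_read)
print(len(kept))                      # 264
print(trim_and_truncate(short_read))  # None
```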

# Sequence quality control

Using DADA2.

In [38]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs output/single_end_demux.qza \
  --p-trim-left 6 \
  --p-trunc-len 270 \
  --p-n-threads 0 \
  --o-representative-sequences output/rep_seqs_dada2.qza \
  --o-table output/table_dada2.qza \
  --o-denoising-stats output/stats_dada2.qza

Saved FeatureTable[Frequency] to: output/table_dada2.qza
Saved FeatureData[Sequence] to: output/rep_seqs_dada2.qza
Saved SampleData[DADA2Stats] to: output/stats_dada2.qza


We can also review the denoising statistics using the qiime metadata tabulate command.

In [39]:
!qiime metadata tabulate \
  --m-input-file output/stats_dada2.qza  \
  --o-visualization output/stats_dada2.qzv

Saved Visualization to: output/stats_dada2.qzv


In [40]:
Visualization.load('output/stats_dada2.qzv')

## Feature table summary

After we finish denoising the data, we can check the results by looking at the summary of the feature table. This will provide us with the counts associated with each sample and each feature, as well as other useful plots and metrics.

In [41]:
!qiime feature-table summarize \
  --i-table output/table_dada2.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table_dada2.qzv

Saved Visualization to: output/table_dada2.qzv


In [42]:
Visualization.load('output/table_dada2.qzv')

# Generating a phylogenetic tree for diversity analysis

In [43]:
!qiime fragment-insertion sepp \
  --i-representative-sequences output/rep_seqs_dada2.qza \
  --o-tree output/tree.qza \
  --o-placements output/tree_placements.qza \
  --p-threads 8  # update to a higher number if you can

Saved Phylogeny[Rooted] to: output/tree.qza
Saved Placements to: output/tree_placements.qza


# Alpha Rarefaction and Selecting a Rarefaction Depth

Current best practices suggest the use of rarefaction, a normalization via sub-sampling without replacement. Rarefaction occurs in two steps: first, samples which are below the rarefaction depth are filtered out of the feature table. Then, all remaining samples are subsampled without replacement to get to the specified sequencing depth. 
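The two steps can be sketched in a few lines of plain Python. This is a toy version for intuition only: the table layout (`{sample_id: {feature_id: count}}`) and the function name `rarefy` are assumptions for this example, and QIIME 2's own implementation works on BIOM tables.

```python
import random


def rarefy(table, depth, seed=0):
    """Rarefy a {sample_id: {feature_id: count}} table to a fixed depth."""
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        if total < depth:
            continue  # step 1: drop samples below the rarefaction depth
        # step 2: expand counts into individual reads, draw without replacement
        reads = [feature for feature, n in counts.items() for _ in range(n)]
        drawn = rng.sample(reads, depth)
        subsampled = {}
        for feature in drawn:
            subsampled[feature] = subsampled.get(feature, 0) + 1
        rarefied[sample_id] = subsampled
    return rarefied
```

With `depth=2000`, any sample with fewer than 2000 reads disappears from the result, and every remaining sample sums to exactly 2000.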

In [44]:
!qiime diversity alpha-rarefaction \
  --i-table output/table_dada2.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/alpha_rarefaction_curves.qzv \
  --p-min-depth 10 \
  --p-max-depth 4970

Saved Visualization to: output/alpha_rarefaction_curves.qzv


In [45]:
Visualization.load('output/alpha_rarefaction_curves.qzv')

# Diversity analysis
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. 

In [60]:
!qiime diversity core-metrics-phylogenetic \
  --i-table output/table_dada2.qza \
  --i-phylogeny output/tree.qza \
  --m-metadata-file metadata.tsv \
  --p-sampling-depth 2000 \
  --output-dir output/core-metrics-results

Saved FeatureTable[Frequency] to: output/core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: output/core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: output/core-metrics-results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: output/core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: output/core-metrics-results/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: output/core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: output/core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: output/core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: output/core-metrics-results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: output/core-metrics-results/unwe

## Alpha diversity

Alpha diversity asks whether the distribution of features within a sample (or groups of samples) differs between different conditions.
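For intuition, the non-phylogenetic alpha metrics that core-metrics wrote above (observed features, Shannon, evenness) can be computed from one sample's feature counts as in the sketch below. Faith's PD also needs the tree and is left out. The function names are illustrative, and returning 0.0 for the evenness of a single-feature sample is a choice made here, since the ratio is undefined in that case.

```python
import math


def observed_features(counts):
    """Number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon entropy in bits (log base 2) of the count distribution."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in props)


def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, log2(observed features)."""
    s = observed_features(counts)
    if s <= 1:
        return 0.0  # undefined for a single feature; 0.0 is a convention here
    return shannon(counts) / math.log2(s)


sample = [10, 10, 10, 10]
print(observed_features(sample))  # 4
print(shannon(sample))            # 2.0
print(pielou_evenness(sample))    # 1.0
```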

In [61]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity output/core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/core-metrics-results/faiths_pd_statistics.qzv

Saved Visualization to: output/core-metrics-results/faiths_pd_statistics.qzv


In [62]:
Visualization.load('output/core-metrics-results/faiths_pd_statistics.qzv')

## Beta diversity
Next, we’ll compare the structure of the microbiome communities using beta diversity.
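The cells below test the UniFrac distances, which need the phylogenetic tree. The two non-phylogenetic distances that core-metrics also produced (Bray-Curtis and Jaccard) are simple enough to sketch directly. The sample layout (`{feature_id: count}`) and the function names are assumptions for this example, and both samples are assumed to be non-empty.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {feature_id: count} samples."""
    features = set(a) | set(b)
    diff = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return diff / total


def jaccard(a, b):
    """Jaccard distance on presence/absence of features."""
    present_a = {f for f, n in a.items() if n > 0}
    present_b = {f for f, n in b.items() if n > 0}
    union = present_a | present_b
    return 1 - len(present_a & present_b) / len(union)
```

Both values range from 0 (identical samples) to 1 (no shared features). Bray-Curtis uses abundances, while Jaccard only asks whether a feature is present.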

In [63]:
!qiime diversity beta-group-significance \
  --i-distance-matrix output/core-metrics-results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column cisplatine \
  --o-visualization output/core-metrics-results/unweighted-unifrac-donor-significance.qzv

Saved Visualization to: output/core-metrics-results/unweighted-unifrac-donor-significance.qzv


In [64]:
Visualization.load('output/core-metrics-results/unweighted-unifrac-donor-significance.qzv')

In [65]:
!qiime diversity beta-group-significance \
  --i-distance-matrix output/core-metrics-results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column cisplatine \
  --o-visualization output/core-metrics-results/weighted-unifrac-donor-significance.qzv

Saved Visualization to: output/core-metrics-results/weighted-unifrac-donor-significance.qzv


In [66]:
Visualization.load('output/core-metrics-results/weighted-unifrac-donor-significance.qzv')

# Taxonomic classification

For this analysis, we’ll use a pre-trained naive Bayes machine-learning classifier that was trained to differentiate taxa present in the 99% Greengenes 13_8 reference set trimmed to 250 bp of the V4 hypervariable region.

In [53]:
!wget \
  -O "gg-13-8-99-515-806-nb-classifier.qza" \
  "https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza"

--2019-09-16 22:20:00--  https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza [following]
--2019-09-16 22:20:01--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.217.192
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.217.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28373760 (27M) [application/x-www-form-urlencoded]
Saving to: `gg-13-8-99-515-806-nb-classifier.qza'


2019-09-16 22:20:07 (5.15 MB/s) - `gg-13-8-99-515-806-nb-classifier.qza' saved [28373760/28373760]



In [67]:
!qiime feature-classifier classify-sklearn \
  --i-reads output/rep_seqs_dada2.qza  \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --o-classification output/taxonomy.qza

Saved FeatureData[Taxonomy] to: output/taxonomy.qza


Now, let’s review the taxonomy associated with the sequences using the qiime metadata tabulate method.

In [68]:
!qiime metadata tabulate \
  --m-input-file output/taxonomy.qza \
  --o-visualization output/taxonomy.qzv

Saved Visualization to: output/taxonomy.qzv


Let’s also tabulate the representative sequences (FeatureData[Sequence]). Tabulating the representative sequences will allow us to see the sequence assigned to the identifier and interactively blast the sequence against the NCBI database.

In [69]:
!qiime feature-table tabulate-seqs \
  --i-data output/rep_seqs_dada2.qza \
  --o-visualization output/dada2_rep_set.qzv

Saved Visualization to: output/dada2_rep_set.qzv


# Taxonomy barchart
Since we saw a difference in diversity in this dataset, we may want to look at the taxonomic composition of these samples. To visualize this, we will build a taxonomic barchart of the samples we analyzed in the diversity dataset.


Before doing this, we will first filter out any samples with fewer sequences than our rarefaction depth (2000).


In [74]:
!qiime feature-table filter-samples \
  --i-table output/table_dada2.qza \
  --p-min-frequency 2000 \
  --o-filtered-table output/table_2k.qza

Saved FeatureTable[Frequency] to: output/table_2k.qza


Now, let’s use the filtered table to build an interactive barplot of the taxonomy in each sample.


In [75]:
!qiime taxa barplot \
  --i-table output/table_2k.qza \
  --i-taxonomy output/taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/taxa_barplot.qzv

Saved Visualization to: output/taxa_barplot.qzv


In [76]:
Visualization.load('output/taxa_barplot.qzv')
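What the barplot shows at a given taxonomic level is roughly the feature table collapsed by taxonomy and converted to relative abundance. The sketch below illustrates that idea under simplifying assumptions: the table is a plain `{sample_id: {feature_id: count}}` dict, taxonomy strings are Greengenes-style (`k__...; p__...`), every sample has at least one read, and the function name `collapse_to_level` is made up. QIIME 2's own collapse also pads missing ranks, which is skipped here.

```python
def collapse_to_level(table, taxonomy, level=2):
    """Sum counts per taxon at the given rank depth, as relative abundance.

    level=2 keeps kingdom and phylum from strings like 'k__A; p__B; c__C'.
    """
    collapsed = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        per_taxon = {}
        for feature_id, n in counts.items():
            lineage = taxonomy.get(feature_id, "Unassigned")
            ranks = [rank.strip() for rank in lineage.split(";")]
            label = ";".join(ranks[:level])
            per_taxon[label] = per_taxon.get(label, 0) + n
        collapsed[sample_id] = {t: n / total for t, n in per_taxon.items()}
    return collapsed
```

Comparing the collapsed profile of one sample against the others is a quick way to check whether an odd-looking bar in the plot reflects a genuinely different composition or just a very small number of reads.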

Mouse 6 has very little data, and its condition also looks off.
