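Cisplatin is an anticancer drug that binds to DNA and inhibits its synthesis, which prevents cancer cells from dividing. This notebook is a comparative analysis of the gut bacteria of mice fed cisplatin.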


# 1. Demultiplex

The `manifest` and `fastq` files are used to create `demux.qza`, which the rest of the analysis requires.
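For reference, a `SingleEndFastqManifestPhred33V2` manifest is a tab-separated file with a `sample-id` column and an `absolute-filepath` column. The sketch below builds the text of such a file from a list of FASTQ paths. The helper name `build_manifest`, the example paths, and the rule that the sample ID is the file name up to the first dot are assumptions for illustration, not this notebook's actual setup.

```python
import csv
import io
from pathlib import Path


def build_manifest(fastq_paths):
    """Return the text of a single-end V2 manifest for the given FASTQ paths.

    Assumption: the sample ID is the file name up to the first dot.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "absolute-filepath"])
    for path in sorted(Path(p) for p in fastq_paths):
        sample_id = path.name.split(".")[0]
        writer.writerow([sample_id, str(path.absolute())])
    return buffer.getvalue()


# Hypothetical file names, for illustration only.
example = build_manifest(["/data/fastq/mouse1.fastq.gz", "/data/fastq/mouse2.fastq.gz"])
print(example)
```

Writing the returned text to `single_end_manifest` would give a file in the shape the import command below expects.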

In [10]:
!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path single_end_manifest \
  --output-path output/demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

Imported single_end_manifest as SingleEndFastqManifestPhred33V2 to output/demux.qza


In [11]:
!qiime demux summarize \
  --i-data output/demux.qza \
  --o-visualization output/demux.qzv

Saved Visualization to: output/demux.qzv


In [12]:
from qiime2 import Visualization
Visualization.load('output/demux.qzv')

- Quality scores are low through the 5th base.
- Therefore, only the 6th through the 270th bases will be used.
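To make the effect of positional trimming concrete, here is a minimal sketch. It assumes the usual DADA2 semantics: reads shorter than the truncation length are discarded, the rest are cut at that length, and the trim value is the number of leading bases removed. The function name `trim_read` and the example read are hypothetical.

```python
def trim_read(sequence, trim_left, trunc_len):
    """Mimic positional trimming of one read (illustrative only).

    Returns None for reads shorter than trunc_len, which are discarded.
    """
    if len(sequence) < trunc_len:
        return None
    return sequence[trim_left:trunc_len]


read = "ACGT" * 75  # a made-up 300-base read
trimmed = trim_read(read, trim_left=6, trunc_len=270)
print(len(trimmed))  # 264 bases remain: 1-based positions 7 through 270
```

Under this reading, a trim value of 6 removes the first six bases, so the retained region starts at position 7. Keeping the 6th base would correspond to a trim value of 5.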

# 2. Denoising

Sequence quality control is performed with `DADA2`. The `p-trim-left` and `p-trunc-len` values are chosen from the quality plot above.

In [13]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs output/demux.qza \
  --p-trim-left 6 \
  --p-trunc-len 270 \
  --p-n-threads 0 \
  --o-representative-sequences output/rep_seqs.qza \
  --o-table output/table.qza \
  --o-denoising-stats output/stats.qza

Saved FeatureTable[Frequency] to: output/table.qza
Saved FeatureData[Sequence] to: output/rep_seqs.qza
Saved SampleData[DADA2Stats] to: output/stats.qza


## 2.1. Denoising stats

The command below tabulates summary statistics for the denoising results.

In [14]:
!qiime metadata tabulate \
  --m-input-file output/stats.qza  \
  --o-visualization output/stats.qzv

Saved Visualization to: output/stats.qzv


In [15]:
Visualization.load('output/stats.qzv')

Mouse 6 appears to have too little data. For the large-intestine samples, the difference is roughly ninefold.
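One way to make this kind of observation less subjective is to flag samples whose read count falls far below the median. The sketch below is only an illustration: the helper name `flag_low_depth`, the 25% threshold, and the read counts are all made up and are not values from this run.

```python
from statistics import median


def flag_low_depth(read_counts, fraction=0.25):
    """Return sample IDs whose read count is below `fraction` of the median."""
    cutoff = fraction * median(read_counts.values())
    return sorted(s for s, n in read_counts.items() if n < cutoff)


# Made-up non-chimeric read counts, not values from this run.
toy_counts = {"m1": 9000, "m2": 8500, "m3": 10000, "m4": 9500, "m5": 8000, "m6": 1000}
flagged = flag_low_depth(toy_counts)
print(flagged)
```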


## 2.2. Feature table summary

After we finish denoising the data, we can check the results by looking at the summary of the feature table. This will provide us with the counts associated with each sample and each feature, as well as other useful plots and metrics.

In [16]:
!qiime feature-table summarize \
  --i-table output/table.qza \
  --m-sample-metadata-file metadata.tsv \
  --o-visualization output/table.qzv

Saved Visualization to: output/table.qzv


In [18]:
Visualization.load('output/table.qzv')

At a sampling depth of 4989, the table retains 24,945 (45.28%) features in 5 (50.00%) samples.
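The retained count follows directly from the depth: every kept sample contributes exactly the sampling depth, so 5 samples at 4989 give 24,945. A minimal sketch of that arithmetic is below. The per-sample totals are made up (only the depth and the number of retained samples come from the summary above), and `retained_at_depth` is a hypothetical helper.

```python
def retained_at_depth(sample_totals, depth):
    """Summarize what survives rarefying to `depth` reads per sample."""
    kept = sorted(s for s, total in sample_totals.items() if total >= depth)
    retained = depth * len(kept)
    fraction = retained / sum(sample_totals.values())
    return kept, retained, fraction


# Made-up per-sample totals: five samples reach the depth, five do not.
toy_totals = {
    "s01": 4989, "s02": 6100, "s03": 7200, "s04": 8300, "s05": 9400,
    "s06": 4000, "s07": 3500, "s08": 3000, "s09": 2500, "s10": 900,
}
kept, retained, fraction = retained_at_depth(toy_totals, depth=4989)
print(len(kept), retained)  # 5 samples, 5 * 4989 = 24945 reads
```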

# 3. Analysis

## 3.1. Generating a phylogenetic tree for diversity analysis

In [19]:
!qiime fragment-insertion sepp \
  --i-representative-sequences output/rep_seqs.qza \
  --o-tree output/tree.qza \
  --o-placements output/tree_placements.qza \
  --p-threads 8  # update to a higher number if you can

Saved Phylogeny[Rooted] to: output/tree.qza
Saved Placements to: output/tree_placements.qza


## 3.2. Alpha Rarefaction and Selecting a Rarefaction Depth

Current best practices suggest the use of rarefaction, a normalization via sub-sampling without replacement. Rarefaction occurs in two steps: first, samples which are below the rarefaction depth are filtered out of the feature table. Then, all remaining samples are subsampled without replacement to get to the specified sequencing depth. 
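The two steps described above can be sketched for a single sample as follows. This is an illustration using only the standard library, not the implementation QIIME 2 uses; the function name `rarefy`, the fixed seed, and the toy counts are assumptions.

```python
import random
from collections import Counter


def rarefy(feature_counts, depth, seed=0):
    """Subsample one sample's counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, which
    corresponds to the sample being filtered out of the table.
    """
    if sum(feature_counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw without replacement.
    pool = [f for f, n in feature_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


toy_sample = {"otu1": 6, "otu2": 3, "otu3": 1}  # made-up counts
rarefied = rarefy(toy_sample, depth=5)
print(rarefied)
```

Because the draw is without replacement, no feature can end up with more reads than it had originally.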

In [20]:
!qiime diversity alpha-rarefaction \
  --i-table output/table.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/alpha_rarefaction_curves.qzv \
  --p-max-depth 4989

Saved Visualization to: output/alpha_rarefaction_curves.qzv


In [21]:
Visualization.load('output/alpha_rarefaction_curves.qzv')

## 3.3. Core metric analysis

Diversity analysis
The first step in hypothesis testing in microbial ecology is typically to look at within- (alpha) and between-sample (beta) diversity. 
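As a concrete picture of within-sample diversity, the sketch below computes two simple alpha metrics for one sample: the number of observed features and Shannon entropy. It assumes log base 2 for Shannon, and the function names and counts are hypothetical. Faith's PD is not shown because it also needs the phylogenetic tree.

```python
import math


def observed_features(counts):
    """Number of features with a nonzero count in one sample."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon entropy in bits (log base 2) of one sample's counts."""
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


even_sample = {"otu1": 5, "otu2": 5, "otu3": 0}  # made-up counts
print(observed_features(even_sample), shannon(even_sample))
```

A sample dominated by one feature scores close to 0, while a sample spread evenly over many features scores higher.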

In [23]:
!qiime diversity core-metrics-phylogenetic \
  --i-table output/table.qza \
  --i-phylogeny output/tree.qza \
  --m-metadata-file metadata.tsv \
  --p-sampling-depth 1500 \
  --output-dir output/core_metrics_results

Saved FeatureTable[Frequency] to: output/core_metrics_results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: output/core_metrics_results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: output/core_metrics_results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: output/core_metrics_results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: output/core_metrics_results/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: output/core_metrics_results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: output/core_metrics_results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: output/core_metrics_results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: output/core_metrics_results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: output/core_metrics_results/unwe

## 3.4. Alpha diversity

Alpha diversity asks whether the distribution of features within a sample (or groups of samples) differs between different conditions.
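Group comparisons of an alpha diversity metric are commonly made with a rank-based Kruskal-Wallis test. The sketch below computes only the H statistic, without a tie correction or a p-value, to show the idea. The function names and the example values are assumptions, not output from this dataset.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (no tie correction applied)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Made-up diversity values for two hypothetical groups.
h_stat = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(h_stat)
```

A larger H means the groups' ranks are more separated; turning H into a p-value requires the chi-squared distribution, which is omitted here.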

In [24]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity output/core_metrics_results/faith_pd_vector.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/core_metrics_results/faiths_pd_statistics.qzv

Saved Visualization to: output/core_metrics_results/faiths_pd_statistics.qzv


In [62]:
Visualization.load('output/core_metrics_results/faiths_pd_statistics.qzv')

In [25]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity output/core_metrics_results/observed_otus_vector.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/core_metrics_results/observed_otus_vector.qzv

Saved Visualization to: output/core_metrics_results/observed_otus_vector.qzv


In [27]:
Visualization.load('output/core_metrics_results/observed_otus_vector.qzv')

## 3.5. Beta diversity
Next, we’ll compare the structure of the microbiome communities using beta diversity.
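The core metrics step also produces two non-phylogenetic distance matrices, Jaccard and Bray-Curtis. A minimal sketch of both distances for a pair of samples is below. The function names and counts are hypothetical, and the UniFrac distances are not shown because they also depend on the tree.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' feature counts."""
    features = set(a) | set(b)
    shared = sum(min(a.get(f, 0), b.get(f, 0)) for f in features)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


def jaccard(a, b):
    """Jaccard distance based on presence or absence of features."""
    present_a = {f for f, n in a.items() if n > 0}
    present_b = {f for f, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


# Made-up counts for two samples that share one feature.
sample_a = {"x": 3, "y": 1}
sample_b = {"x": 1, "z": 3}
print(bray_curtis(sample_a, sample_b), jaccard(sample_a, sample_b))
```

Both distances are 0 for identical samples and 1 for samples with no features in common; Bray-Curtis additionally weighs abundances.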

In [63]:
!qiime diversity beta-group-significance \
  --i-distance-matrix output/core_metrics_results/unweighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column cisplatine \
  --o-visualization output/core_metrics_results/unweighted-unifrac-donor-significance.qzv

Saved Visualization to: output/core_metrics_results/unweighted-unifrac-donor-significance.qzv


In [64]:
Visualization.load('output/core_metrics_results/unweighted-unifrac-donor-significance.qzv')

In [65]:
!qiime diversity beta-group-significance \
  --i-distance-matrix output/core_metrics_results/weighted_unifrac_distance_matrix.qza \
  --m-metadata-file metadata.tsv \
  --m-metadata-column cisplatine \
  --o-visualization output/core_metrics_results/weighted-unifrac-donor-significance.qzv

Saved Visualization to: output/core_metrics_results/weighted-unifrac-donor-significance.qzv


In [66]:
Visualization.load('output/core_metrics_results/weighted-unifrac-donor-significance.qzv')

## 3.6. Taxonomic classification

For this analysis, we’ll use a pre-trained naive Bayes machine-learning classifier that was trained to differentiate taxa present in the 99% Greengenes 13_8 reference set trimmed to 250 bp of the V4 hypervariable region.
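To give a feel for how a naive Bayes classifier assigns a sequence to a taxon, here is a deliberately tiny sketch that scores k-mer counts with add-one smoothing and a uniform prior. It is a simplification and not the pre-trained classifier's actual implementation; the function names, the choice of k = 4, and the toy reference sequences are assumptions.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=4):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=4):
    """Count k-mers per taxon from (taxon, sequence) pairs."""
    model = defaultdict(Counter)
    for taxon, seq in references:
        model[taxon].update(kmers(seq, k))
    return model


def classify(model, seq, k=4):
    """Return the taxon with the highest smoothed log-likelihood."""
    vocab_size = len({km for counts in model.values() for km in counts})
    best_taxon, best_score = None, -math.inf
    for taxon, counts in model.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[km] + 1) / (total + vocab_size))
            for km in kmers(seq, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference sequences, for illustration only.
toy_model = train([("taxon_A", "AAAAAAAAAA"), ("taxon_B", "CCCCCCCCCC")])
print(classify(toy_model, "AAAAAA"))
```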

In [53]:
!wget \
  -O "gg-13-8-99-515-806-nb-classifier.qza" \
  "https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza"

--2019-09-16 22:20:00--  https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza [following]
--2019-09-16 22:20:01--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.217.192
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.217.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28373760 (27M) [application/x-www-form-urlencoded]
Saving to: `gg-13-8-99-515-806-nb-classifier.qza'


2019-09-16 22:20:07 (5.15 MB/s) - `gg-13-8-99-515-806-nb-classifier.qza' saved [28373760/28373760]



In [67]:
!qiime feature-classifier classify-sklearn \
  --i-reads output/rep_seqs.qza \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --o-classification output/taxonomy.qza

Saved FeatureData[Taxonomy] to: output/taxonomy.qza


Now, let’s review the taxonomy associated with the sequences using the qiime metadata tabulate method.

In [68]:
!qiime metadata tabulate \
  --m-input-file output/taxonomy.qza \
  --o-visualization output/taxonomy.qzv

Saved Visualization to: output/taxonomy.qzv


Let’s also tabulate the representative sequences (FeatureData[Sequence]). Tabulating the representative sequences will allow us to see the sequence assigned to the identifier and interactively blast the sequence against the NCBI database.

In [69]:
!qiime feature-table tabulate-seqs \
  --i-data output/rep_seqs.qza \
  --o-visualization output/dada2_rep_set.qzv

Saved Visualization to: output/dada2_rep_set.qzv


## 3.7. Taxonomy barchart
Since we saw a difference in diversity in this dataset, we may want to look at the taxonomic composition of these samples. To visualize this, we will build a taxonomic barchart of the samples we analyzed in the diversity dataset.


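Before doing this, we will first filter out any samples whose total feature count is below 2000. Note that this cutoff is higher than the sampling depth of 1500 passed to `core-metrics-phylogenetic` earlier.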


In [74]:
!qiime feature-table filter-samples \
  --i-table output/table.qza \
  --p-min-frequency 2000 \
  --o-filtered-table output/table_2k.qza

Saved FeatureTable[Frequency] to: output/table_2k.qza


Now, let’s use the filtered table to build an interactive barplot of the taxonomy in each sample.
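Conceptually, a taxonomy barplot sums feature counts that share the same taxonomic label at a chosen rank. The sketch below shows that collapsing step for one sample, assuming Greengenes-style labels separated by semicolons. The function name `collapse_to_level`, the feature IDs, and the counts are hypothetical.

```python
from collections import defaultdict


def collapse_to_level(sample_counts, taxonomy, level):
    """Sum one sample's feature counts by the first `level` taxonomy ranks."""
    collapsed = defaultdict(int)
    for feature_id, count in sample_counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)


# Made-up features and counts, for illustration only.
toy_taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "f2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}
toy_sample_counts = {"f1": 10, "f2": 5, "f3": 7}
phylum_counts = collapse_to_level(toy_sample_counts, toy_taxonomy, level=2)
print(phylum_counts)
```

Level 2 corresponds to phylum here; raising the level splits the bars into finer groups.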


In [75]:
!qiime taxa barplot \
  --i-table output/table_2k.qza \
  --i-taxonomy output/taxonomy.qza \
  --m-metadata-file metadata.tsv \
  --o-visualization output/taxa_barplot.qzv

Saved Visualization to: output/taxa_barplot.qzv


In [76]:
Visualization.load('output/taxa_barplot.qzv')

Mouse 6 has little data, and its taxonomic profile also looks unusual.


In [38]:
!qiime feature-table heatmap \
    --i-table output/table.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column cisplatine \
    --o-visualization output/heatmap.qzv

Saved Visualization to: output/heatmap.qzv


In [39]:
Visualization.load('output/heatmap.qzv')