# Raw data cleanup

First, the data need to be cleaned up.

```
1L.2nd.fastq
1L_1.fastq
1L_1.qual
1L.2nd.fasta 
```

What is the difference between these files?

`1L.2nd.fastq` and `1L_1.fastq` are identical apart from their names. Assuming the FASTQ files are complete, I wrote a `Fastq manifest` file.
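A quick way to confirm that two FASTQ files really are identical is to compare content hashes. This rules out files that merely have the same size. Below is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `same_content` are hypothetical and not part of QIIME 2.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical bytes (checked via size, then SHA-256)."""
    if Path(path_a).stat().st_size != Path(path_b).stat().st_size:
        return False  # files of different sizes cannot be identical
    return sha256_of(path_a) == sha256_of(path_b)


# Usage with the file names listed above:
# same_content("1L.2nd.fastq", "1L_1.fastq")
```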

# Creating the qza file

Convert the reads into the artifact format used by QIIME 2.
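The `SingleEndFastqManifestPhred33V2` import format expects a tab-separated manifest with a `sample-id` column and an `absolute-filepath` column. The sketch below writes such a file from a mapping of sample IDs to FASTQ paths. The actual contents of `cisplatine_manifest` are not shown here, so the helper `write_manifest` and the example mapping are hypothetical placeholders.

```python
import csv
from pathlib import Path


def write_manifest(samples, out_path):
    """Write a single-end manifest in the tab-separated V2 layout.

    samples: dict mapping sample ID -> path to that sample's FASTQ file.
    Paths are converted to absolute paths, since the format requires them.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        for sample_id, fastq in sorted(samples.items()):
            writer.writerow([sample_id, str(Path(fastq).resolve())])


# Hypothetical example: one sample "1L" pointing at 1L_1.fastq
# write_manifest({"1L": "1L_1.fastq"}, "cisplatine_manifest")
```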

In [1]:
!qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path cisplatine_manifest \
  --output-path single-end-demux.qza \
  --input-format SingleEndFastqManifestPhred33V2

Imported cisplatine_manifest as SingleEndFastqManifestPhred33V2 to single-end-demux.qza


In [2]:
!qiime demux summarize \
  --i-data single-end-demux.qza \
  --o-visualization demux_seqs.qzv

Saved Visualization to: demux_seqs.qzv


In [3]:
from qiime2 import Visualization
Visualization.load('demux_seqs.qzv')

Judging by the quality scores, only the first 140 bp of each read look usable.
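The 140 bp cutoff was read off the interactive quality plot. The same judgement can be written as a rule: compute the median Phred score at each read position, then truncate where it first drops below a threshold. Below is a minimal sketch, assuming Phred+33 encoding (as the import format states). The threshold of 25 and the function names are hypothetical, and this is not how `demux summarize` builds its plot.

```python
from statistics import median


def per_position_medians(quality_strings, offset=33):
    """Median Phred score at each position across reads (Phred+33 by default).

    Reads shorter than a given position do not contribute to it.
    """
    max_len = max(len(q) for q in quality_strings)
    medians = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quality_strings if len(q) > pos]
        medians.append(median(scores))
    return medians


def suggest_trunc_len(quality_strings, threshold=25, offset=33):
    """Length of the longest prefix whose per-position median stays >= threshold."""
    for pos, med in enumerate(per_position_medians(quality_strings, offset)):
        if med < threshold:
            return pos  # positions 0..pos-1 are kept
    return max(len(q) for q in quality_strings)


def read_quality_lines(fastq_path, limit=10000):
    """Collect the quality line (4th line of each record) from up to `limit` reads."""
    quals = []
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:
                quals.append(line.rstrip("\n"))
                if len(quals) >= limit:
                    break
    return quals


# Usage: suggest_trunc_len(read_quality_lines("1L_1.fastq"))
```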


# Sequence quality control

In [4]:
!qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demux.qza \
  --p-trunc-len 140 \
  --o-table dada2_table.qza \
  --o-representative-sequences dada2_rep_set.qza \
  --o-denoising-stats dada2_stats.qza

Saved FeatureTable[Frequency] to: dada2_table.qza
Saved FeatureData[Sequence] to: dada2_rep_set.qza
Saved SampleData[DADA2Stats] to: dada2_stats.qza


In [5]:
!qiime metadata tabulate \
  --m-input-file dada2_stats.qza  \
  --o-visualization dada2_stats.qzv

Saved Visualization to: dada2_stats.qzv


In [7]:
Visualization.load('dada2_stats.qzv')
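The DADA2 stats table reports, per sample, how many reads survive each step. A rough check is the fraction of input reads left after chimera removal. Below is a minimal sketch, assuming the stats were exported to a TSV (e.g. with `qiime tools export`) with at least `sample-id`, `input` and `non-chimeric` columns. The code skips any `#q2:types` row. The 50% cutoff is an arbitrary illustration.

```python
import csv


def retention_by_sample(lines):
    """Fraction of input reads kept after chimera removal, per sample.

    lines: iterable of TSV lines whose header contains 'sample-id',
    'input' and 'non-chimeric'. Rows whose first field starts with '#'
    (e.g. '#q2:types') are skipped.
    """
    retention = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        sample_id = row["sample-id"]
        if sample_id.startswith("#"):
            continue
        n_input = float(row["input"])
        n_kept = float(row["non-chimeric"])
        retention[sample_id] = n_kept / n_input if n_input else 0.0
    return retention


def flag_low_retention(retention, min_fraction=0.5):
    """Sample IDs whose retained fraction is below min_fraction (hypothetical cutoff)."""
    return sorted(s for s, frac in retention.items() if frac < min_fraction)


# Usage, assuming an exported stats.tsv:
# with open("stats.tsv", newline="") as fh:
#     print(flag_low_retention(retention_by_sample(fh)))
```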

# Feature table


In [25]:
!qiime feature-table summarize \
  --i-table dada2_table.qza \
  --m-sample-metadata-file cisplatine_metadata.tsv \
  --o-visualization dada2_table.qzv

Saved Visualization to: dada2_table.qzv


In [26]:
Visualization.load('dada2_table.qzv')

# Generating a phylogenetic tree for diversity analysis

In [10]:
!qiime fragment-insertion sepp \
  --i-representative-sequences dada2_rep_set.qza \
  --o-tree tree.qza \
  --o-placements tree_placements.qza \
  --p-threads 8  # update to a higher number if you can

Saved Phylogeny[Rooted] to: tree.qza
Saved Placements to: tree_placements.qza


# Alpha Rarefaction and Selecting a Rarefaction Depth

In [None]:
!qiime diversity alpha-rarefaction \
  --i-table ./dada2_table.qza \
  --m-metadata-file ./cisplatine_metadata.tsv \
  --o-visualization ./alpha_rarefaction_curves.qzv \
  --p-min-depth 10 \
  --p-max-depth 4250
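A common way to pick a rarefaction depth is to check how many samples each candidate depth would keep, because samples with fewer reads than the depth are dropped. Below is a minimal sketch, assuming the per-sample total frequencies (as listed in the `dada2_table.qzv` summary) are available as a dict. The function names and the counts in the usage line are made up for illustration.

```python
def samples_retained(sample_depths, depth):
    """Sample IDs with at least `depth` reads; the rest would be dropped at this depth."""
    return sorted(s for s, n in sample_depths.items() if n >= depth)


def retention_table(sample_depths, candidate_depths):
    """For each candidate depth: (depth, samples kept, total reads kept after subsampling)."""
    rows = []
    for depth in candidate_depths:
        kept = samples_retained(sample_depths, depth)
        rows.append((depth, len(kept), depth * len(kept)))
    return rows


# Made-up example:
# retention_table({"S1": 5200, "S2": 3100, "S3": 2800}, [1000, 3000, 4250])
```

The trade-off is visible in the third column. A higher depth keeps more reads per sample but loses whole samples.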

# Alpha and beta diversity analysis

In [12]:
!qiime diversity core-metrics-phylogenetic \
  --i-phylogeny tree.qza \
  --i-table dada2_table.qza \
  --p-sampling-depth 3000 \
  --m-metadata-file cisplatine_metadata.tsv \
  --output-dir core-metrics-results

Saved FeatureTable[Frequency] to: core-metrics-results/rarefied_table.qza
Saved SampleData[AlphaDiversity] % Properties('phylogenetic') to: core-metrics-results/faith_pd_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/observed_otus_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/shannon_vector.qza
Saved SampleData[AlphaDiversity] to: core-metrics-results/evenness_vector.qza
Saved DistanceMatrix % Properties('phylogenetic') to: core-metrics-results/unweighted_unifrac_distance_matrix.qza
Saved DistanceMatrix % Properties('phylogenetic') to: core-metrics-results/weighted_unifrac_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/jaccard_distance_matrix.qza
Saved DistanceMatrix to: core-metrics-results/bray_curtis_distance_matrix.qza
Saved PCoAResults to: core-metrics-results/unweighted_unifrac_pcoa_results.qza
Saved PCoAResults to: core-me
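`--p-sampling-depth 3000` rarefies the table. Each sample is randomly subsampled without replacement to exactly 3000 reads, and samples with fewer reads are excluded. Below is a minimal sketch of that subsampling for one sample, assuming its counts are a dict of feature ID to count. It illustrates the general technique and is not QIIME 2's own implementation.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample a sample's feature counts to exactly `depth` reads without replacement.

    counts: dict mapping feature ID -> read count.
    Returns None if the sample has fewer than `depth` reads (it would be dropped).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand to one entry per read, then draw `depth` reads without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))
```

Fixing `seed` makes the subsample reproducible. Different seeds give slightly different rarefied tables.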

In [13]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
  --m-metadata-file cisplatine_metadata.tsv \
  --o-visualization core-metrics-results/faith-pd-group-significance.qzv

Saved Visualization to: core-metrics-results/faith-pd-group-significance.qzv


In [14]:
Visualization.load('core-metrics-results/faith-pd-group-significance.qzv')
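Faith's phylogenetic diversity, compared across groups above, is the total branch length of the part of the tree that connects a sample's observed features to the root. Below is a minimal sketch, assuming a rooted tree given as a hypothetical parent-pointer dict (`node -> (parent, branch_length)`) rather than the SEPP `tree.qza` artifact itself.

```python
def faith_pd(tree, observed_tips):
    """Faith's PD: sum of branch lengths on the paths from observed tips to the root.

    tree: dict mapping node -> (parent, length_of_branch_to_parent); the root
    does not appear as a key. Shared branches are counted only once.
    """
    visited = set()
    total = 0.0
    for node in observed_tips:
        # Walk towards the root until reaching it or a branch already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical tree: root -> X (1.0) -> A (0.5), B (0.25); root -> C (3.0)
# faith_pd({"A": ("X", 0.5), "B": ("X", 0.25), "X": ("root", 1.0), "C": ("root", 3.0)}, ["A", "B"])
```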

In [15]:
!qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/evenness_vector.qza \
  --m-metadata-file cisplatine_metadata.tsv \
  --o-visualization core-metrics-results/evenness-group-significance.qzv

Saved Visualization to: core-metrics-results/evenness-group-significance.qzv


In [16]:
Visualization.load('core-metrics-results/evenness-group-significance.qzv')
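Pielou's evenness divides Shannon diversity by its maximum possible value for the number of observed features: J = H / ln(S). It therefore ranges from 0 (one feature dominates) to 1 (all features equally abundant). Below is a minimal sketch of that textbook formula for one sample's counts. It uses the natural log (the ratio does not depend on the log base) and is not QIIME 2's own code.

```python
import math


def shannon(counts):
    """Shannon entropy H (natural log) of a sample's feature counts; zeros are ignored."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H / ln(S), where S is the number of observed (non-zero) features."""
    observed = [c for c in counts if c > 0]
    if len(observed) < 2:
        return float("nan")  # undefined when ln(S) == 0
    return shannon(observed) / math.log(len(observed))
```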

# Taxonomic analysis

In [17]:
!wget \
  -O "gg-13-8-99-515-806-nb-classifier.qza" \
  "https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza"

--2019-09-09 15:49:22--  https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza [following]
--2019-09-09 15:49:23--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.248.160
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.248.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28373760 (27M) [application/x-www-form-urlencoded]
Saving to: `gg-13-8-99-515-806-nb-classifier.qza'


2019-09-09 15:49:34 (2.62 MB/s) - `gg-13-8-99-515-806-nb-classifier.qza' saved [28373760/28373760]



In [18]:
!qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads dada2_rep_set.qza \
  --o-classification taxonomy.qza

Saved FeatureData[Taxonomy] to: taxonomy.qza


In [19]:
!qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

Saved Visualization to: taxonomy.qzv
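With the Greengenes classifier, each assignment in the taxonomy table is a semicolon-separated lineage such as `k__Bacteria; p__Firmicutes; ...`. To group features at one rank, the lineage can be cut at that level. Below is a minimal sketch, assuming that string format. The helper name and the rank list are illustrative.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def truncate_lineage(lineage, rank):
    """Cut a 'k__...; p__...; ...' lineage string at the given rank.

    Empty levels such as 'c__' are kept. If the lineage is shorter than the
    requested rank (classification stopped early), all available levels are returned.
    """
    levels = [part.strip() for part in lineage.split(";") if part.strip()]
    depth = RANKS.index(rank) + 1
    return ";".join(levels[:depth])


# truncate_lineage("k__Bacteria; p__Firmicutes; c__Clostridia", "phylum")
```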


In [23]:
!qiime taxa barplot \
  --i-table dada2_table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file cisplatine_metadata.tsv \
  --o-visualization taxa-bar-plots.qzv

Saved Visualization to: taxa-bar-plots.qzv
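Each bar in a taxa bar plot shows one sample's relative abundance of each taxon at the selected level. Feature counts are summed per taxon and divided by the sample total. Below is a minimal sketch of that calculation. Its inputs are a per-sample dict of feature counts and a dict mapping feature IDs to taxon labels already cut to the desired level. Both structures are hypothetical stand-ins for `dada2_table.qza` and `taxonomy.qza`.

```python
from collections import defaultdict


def relative_abundance(feature_counts, feature_taxon):
    """Collapse one sample's feature counts to taxa and normalise to proportions.

    feature_counts: dict feature ID -> read count in this sample.
    feature_taxon: dict feature ID -> taxon label; missing features become 'Unassigned'.
    """
    per_taxon = defaultdict(int)
    for feature, count in feature_counts.items():
        per_taxon[feature_taxon.get(feature, "Unassigned")] += count
    total = sum(per_taxon.values())
    if total == 0:
        return {}
    return {taxon: count / total for taxon, count in per_taxon.items()}
```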


In [24]:
Visualization.load('taxa-bar-plots.qzv')