Source: https://docs.qiime2.org/2019.7/tutorials/atacama-soils/

# Atacama soil microbiome tutorial

This tutorial serves two purposes.

- First, it illustrates the initial processing steps of paired-end read analysis, up to the point where the analysis steps are identical to single-end read analysis.
- Second, it is a self-guided exercise that can be run after the Moving Pictures tutorial to gain more experience with QIIME 2.

In this tutorial you’ll use QIIME 2 to perform an analysis of soil samples from the Atacama Desert in northern Chile.

# Obtain the data

Start by creating a directory to work in.

In [1]:
!pwd

/home/partrita/jupyternotebook/Tutorial_Atacama_microbiome


In [2]:
!wget \
  -O "sample-metadata.tsv" \
  "https://data.qiime2.org/2019.7/tutorials/atacama-soils/sample_metadata.tsv"

--2019-09-05 10:46:31--  https://data.qiime2.org/2019.7/tutorials/atacama-soils/sample_metadata.tsv
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://docs.google.com/spreadsheets/d/1y3yM50tW_23H7fXeou9XwyM92VNd8dCtgk8ndHOMSMs/export?gid=0&format=tsv [following]
--2019-09-05 10:46:33--  https://docs.google.com/spreadsheets/d/1y3yM50tW_23H7fXeou9XwyM92VNd8dCtgk8ndHOMSMs/export?gid=0&format=tsv
Resolving docs.google.com (docs.google.com)... 172.217.161.142, 2404:6800:4005:806::200e
Connecting to docs.google.com (docs.google.com)|172.217.161.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/tab-separated-values]
Saving to: `sample-metadata.tsv'

    [ <=>                                   ] 9,433       --.-K/s   in 0.007s  

2019-09-05 10:46:33 (1.26 MB/s) - `sample-metadata.tsv' saved [94

Next, you’ll download the multiplexed reads. You will download three fastq.gz files, corresponding to the forward, reverse, and barcode (i.e., index) reads. 

# 1% subsample data

In [3]:
!mkdir emp-paired-end-sequences

In [4]:
!wget \
  -O "emp-paired-end-sequences/forward.fastq.gz" \
  "https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/forward.fastq.gz"

--2019-09-05 10:47:25--  https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/forward.fastq.gz
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/forward.fastq.gz [following]
--2019-09-05 10:47:26--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/forward.fastq.gz
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.238.48
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.238.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14579008 (14M) [binary/octet-stream]
Saving to: `emp-paired-end-sequences/forward.fastq.gz'


2019-09-05 10:47:30 (4.37 MB/s) - `emp-paired-end-sequences/forward.fastq.gz' saved [14579008/14579008]



In [5]:
!wget \
  -O "emp-paired-end-sequences/reverse.fastq.gz" \
  "https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/reverse.fastq.gz"

--2019-09-05 10:47:32--  https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/reverse.fastq.gz
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/reverse.fastq.gz [following]
--2019-09-05 10:47:32--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/reverse.fastq.gz
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.245.0
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.245.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16296324 (16M) [binary/octet-stream]
Saving to: `emp-paired-end-sequences/reverse.fastq.gz'


2019-09-05 10:47:39 (2.50 MB/s) - `emp-paired-end-sequences/reverse.fastq.gz' saved [16296324/16296324]



In [6]:
!wget \
  -O "emp-paired-end-sequences/barcodes.fastq.gz" \
  "https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/barcodes.fastq.gz"

--2019-09-05 10:47:41--  https://data.qiime2.org/2019.7/tutorials/atacama-soils/1p/barcodes.fastq.gz
Resolving data.qiime2.org (data.qiime2.org)... 52.35.38.247
Connecting to data.qiime2.org (data.qiime2.org)|52.35.38.247|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/barcodes.fastq.gz [following]
--2019-09-05 10:47:42--  https://s3-us-west-2.amazonaws.com/qiime2-data/2019.7/tutorials/atacama-soils/1p/barcodes.fastq.gz
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.230.0
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.230.0|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2105968 (2.0M) [binary/octet-stream]
Saving to: `emp-paired-end-sequences/barcodes.fastq.gz'


2019-09-05 10:47:44 (1.58 MB/s) - `emp-paired-end-sequences/barcodes.fastq.gz' saved [2105968/2105968]
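
The three downloads above differ only in the file name, so they can also be scripted. The following is a minimal Python sketch, not part of the original tutorial: the helper names `build_downloads` and `download_all` are hypothetical, and the URLs are built from the same base address used in the `wget` commands. The download call is left commented out because it needs network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base address taken from the wget commands in this tutorial.
BASE_URL = "https://data.qiime2.org/2019.7/tutorials/atacama-soils"
# The forward, reverse, and barcode (index) read files.
READ_FILES = ("forward.fastq.gz", "reverse.fastq.gz", "barcodes.fastq.gz")


def build_downloads(subsample="1p", out_dir="emp-paired-end-sequences"):
    """Map each local output path to the URL it should be fetched from."""
    return {
        f"{out_dir}/{name}": f"{BASE_URL}/{subsample}/{name}"
        for name in READ_FILES
    }


def download_all(downloads):
    """Fetch every file, creating the output directory if it is missing."""
    for local_path, url in downloads.items():
        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, local_path)


downloads = build_downloads()
for local_path, url in downloads.items():
    print(local_path, "<-", url)

# download_all(downloads)  # uncomment to download (requires network access)
```

Keeping the path-to-URL mapping separate from the download step makes it easy to check the targets before any data is transferred.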



# Paired-end read analysis commands

To analyze these data, the sequences that you just downloaded must first be imported into an artifact of type EMPPairedEndSequences.

In [7]:
!qiime tools import \
   --type EMPPairedEndSequences \
   --input-path emp-paired-end-sequences \
   --output-path emp-paired-end-sequences.qza

Imported emp-paired-end-sequences as EMPPairedEndDirFmt to emp-paired-end-sequences.qza


Next, you can demultiplex the sequence reads. This requires the sample metadata file, and you must indicate which column in that file contains the per-sample barcodes.

In [8]:
!qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --p-rev-comp-mapping-barcodes \
  --i-seqs emp-paired-end-sequences.qza \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

Saved SampleData[PairedEndSequencesWithQuality] to: demux.qza
Saved ErrorCorrectionDetails to: demux-details.qza


In [9]:
!qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

Saved Visualization to: demux.qzv


In [10]:
from qiime2 import Visualization
Visualization.load('demux.qzv')
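
The demultiplexing command above passes `--p-rev-comp-mapping-barcodes`, which reverse complements the barcodes listed in the metadata file before matching them against the barcode reads. As a rough illustration of that transformation (a sketch only, not QIIME 2's own code), the snippet below applies it to a made-up barcode; `reverse_complement` and the example sequence are hypothetical.

```python
# Complement table for the four DNA bases; N (ambiguous) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(barcode):
    """Return the reverse complement of a DNA barcode."""
    return barcode.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical barcode, not taken from sample-metadata.tsv.
example_barcode = "ACGGTA"
print(example_barcode, "->", reverse_complement(example_barcode))
# prints: ACGGTA -> TACCGT
```

If demultiplexing assigns almost no reads to your samples, a barcode orientation mismatch of this kind is one possible cause worth checking.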

After demultiplexing reads, we’ll look at the sequence quality based on ten thousand randomly selected reads, and then denoise the data.

In this example we have 150-base forward and reverse reads. Because the reads must remain long enough to overlap when the paired ends are joined, only the first thirteen bases of the forward and reverse reads are trimmed. The 3' ends are left untruncated so that the read length is not reduced by too much.
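
To make the length reasoning concrete, here is a small arithmetic sketch. It assumes that each read keeps `trunc_len - trim_left` bases and that the forward and reverse reads start at opposite ends of the amplicon. The amplicon length of 253 bases is an illustrative assumption, not a value stated in this tutorial, and both helper functions are hypothetical.

```python
def retained_read_length(trunc_len, trim_left):
    """Bases kept per read after truncating at trunc_len and trimming the start."""
    return trunc_len - trim_left


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads of one amplicon.

    Trimming the 5' ends shortens the outer edges of the merged sequence,
    so under this model it does not change the overlap; truncation does.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


AMPLICON_LEN = 253  # ASSUMPTION: illustrative amplicon length, not from the tutorial

print(retained_read_length(150, 13))             # 137 bases kept per read
print(expected_overlap(150, 150, AMPLICON_LEN))  # 47 bases of overlap
print(expected_overlap(120, 120, AMPLICON_LEN))  # -13: these reads would not overlap
```

Under these assumptions, truncating both reads to 120 bases would leave a gap between them, which is why the 3' ends are left alone in the command below.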

In [11]:
!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza
Saved SampleData[DADA2Stats] to: denoising-stats.qza


At this stage, you will have artifacts containing the feature table and corresponding feature sequences. You can generate summaries of those as follows.

In [13]:
!qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file sample-metadata.tsv

Saved Visualization to: table.qzv


In [14]:
!qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv

Saved Visualization to: rep-seqs.qzv


In [15]:
Visualization.load('table.qzv')

In [16]:
Visualization.load('rep-seqs.qzv')

You can also visualize the denoising stats by running:

In [17]:
!qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv

Saved Visualization to: denoising-stats.qzv


In [18]:
Visualization.load('denoising-stats.qzv')

From this point, analysis of paired-end read data progresses in the same way as analysis of single-end read data.

# Questions to guide data analysis

Use the following questions to guide your further analyses of these data.

- What value would you choose to pass for --p-sampling-depth? How many samples will be excluded from your analysis based on this choice? Approximately how many total sequences will you be analyzing in the core-metrics-phylogenetic command?

- What sample metadata or combinations of sample metadata are most strongly associated with the differences in microbial composition of the samples? Are these associations stronger with unweighted UniFrac or with Bray-Curtis? Based on what you know about these metrics, what does that difference suggest? For exploring associations between continuous metadata and sample composition, the commands qiime metadata distance-matrix in combination with qiime diversity mantel and qiime diversity bioenv will be useful. These were not covered in the Moving Pictures tutorial, but you can learn about them by running them with the --help parameter.

- What do you conclude about the associations between continuous sample metadata and the richness and evenness of these samples? For exploring associations between continuous metadata and richness or evenness, the command qiime diversity alpha-correlation will be useful. This was not covered in the Moving Pictures tutorial, but you can learn about it by running it with the --help parameter.

- Which categorical sample metadata columns are most strongly associated with the differences in microbial community richness or evenness? Are these differences statistically significant?

- In taxonomic composition bar plots, sort the samples by their average soil relative humidity, and visualize them at the phylum level. What are the dominant phyla in these samples? Which phyla increase and which decrease with increasing average soil relative humidity?

- What phyla differ in abundance across vegetated and unvegetated sites?
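
For the first question above, the trade-off behind `--p-sampling-depth` can be explored with a few lines of Python. This is a sketch under stated assumptions: the per-sample counts are invented for illustration (read the real ones from `table.qzv`), `rarefaction_summary` is a hypothetical helper, and it assumes that samples with fewer sequences than the chosen depth are dropped while every retained sample contributes exactly that many sequences.

```python
def rarefaction_summary(sample_counts, sampling_depth):
    """Summarize the effect of rarefying to sampling_depth sequences per sample."""
    retained = [s for s, n in sample_counts.items() if n >= sampling_depth]
    excluded = [s for s, n in sample_counts.items() if n < sampling_depth]
    return {
        "retained_samples": len(retained),
        "excluded_samples": len(excluded),
        # Each retained sample is subsampled to exactly sampling_depth sequences.
        "total_sequences": sampling_depth * len(retained),
    }


# Hypothetical per-sample sequence counts, for illustration only.
counts = {"s1": 1200, "s2": 950, "s3": 40, "s4": 2100, "s5": 700}

print(rarefaction_summary(counts, sampling_depth=900))
# {'retained_samples': 3, 'excluded_samples': 2, 'total_sequences': 2700}
```

Trying several depths this way shows how a higher depth keeps more sequences per sample but excludes more samples.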