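- Understand what read mapping (alignment) means
- Understand the difference between global and local alignment, and when each is used
- Learn to use bwa, minimap2, and samtools
- Understand the SAM and BAM file formats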
- Global alignment
- Local alignment
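The familiar tools blast and blat both belong to the second category (local alignment).
The alignment strategy also depends on read length: for short reads, general-purpose local-alignment tools such as blast and blat are not suitable, mainly because they are too slow for the millions of reads a sequencing run produces.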
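To make the global/local distinction concrete, the sketch below scores two sequences both ways with a small dynamic-programming table written in awk. The function name `align_score` and the scoring scheme (match +1, mismatch -1, gap -1) are illustrative assumptions, not the parameters used by blast, blat, or bwa. It reports only the score, not the alignment itself.

```shell
# align_score MODE SEQ1 SEQ2   (MODE is "global" or "local")
# Hypothetical helper for illustration only.
align_score() {
  awk -v mode="$1" -v a="$2" -v b="$3" 'BEGIN {
    match_s = 1; mismatch = -1; gap = -1   # assumed toy scoring scheme
    n = length(a); m = length(b); best = 0

    # Global: leading gaps are penalised. Local: the table starts at zero.
    for (i = 0; i <= n; i++) H[i, 0] = (mode == "global") ? i * gap : 0
    for (j = 0; j <= m; j++) H[0, j] = (mode == "global") ? j * gap : 0

    for (i = 1; i <= n; i++) {
      for (j = 1; j <= m; j++) {
        s = (substr(a, i, 1) == substr(b, j, 1)) ? match_s : mismatch
        v = H[i-1, j-1] + s                          # match or mismatch
        if (H[i-1, j] + gap > v) v = H[i-1, j] + gap # gap in SEQ2
        if (H[i, j-1] + gap > v) v = H[i, j-1] + gap # gap in SEQ1
        if (mode == "local" && v < 0) v = 0          # local: restart anywhere
        H[i, j] = v
        if (v > best) best = v
      }
    }
    # Global score is the bottom-right cell; local score is the table maximum.
    result = (mode == "global") ? H[n, m] : best
    print result
  }'
}

align_score global CGT AACGTAA
align_score local  CGT AACGTAA
```

The global score is dragged down by the four unmatched flanking bases, while the local score only reflects the matching core. This is why a short read placed against a long reference is never scored end-to-end over the whole reference.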
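Placing short reads back onto a long reference genome is called mapping. The reads are typically numerous and short while the reference is long, so a mapping tool must meet two requirements:
- Speed
- Accuracy
There are many mapping tools; well-known ones include bwa, soap, bowtie, and novoalign.
Transcriptome data has different requirements. Mature eukaryotic mRNA contains no introns, so RNA-seq reads can span exon-exon junctions and must be split across introns when aligned to the genome, which general DNA mappers are not designed to do. Transcriptome mapping therefore uses splice-aware tools; well-known ones include STAR and hisat.
This lab covers the use of general-purpose DNA mapping tools.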
- Volume of data
- Garbage reads
- Errors in reads, and quality scores
- Repeat elements and multicopy sequence
- SNPs/SNVs
- Indels
- Splicing (transcriptome)
- How many mismatches to allow?
- Report how many matches?
- Require best match, or first/any that fit criteria?
mkdir lab02
cd lab02
mkdir data
mkdir results
/data/lab/genomic/lab02/data/REL606.fa (reference sequence)
/data/lab/genomic/lab02/data/reads_1.fq.gz, /data/lab/genomic/lab02/data/reads_2.fq.gz (illumina reads)
/data/lab/genomic/lab02/data/pb_ecoli_0001.fastq (pacbio reads)
$ cd data
$ ln -s /data/lab/genomic/lab02/data/REL606.fa /data/lab/genomic/lab02/data/reads_* /data/lab/genomic/lab02/data/pb_ecoli_0001.fastq ./
$ samtools faidx REL606.fa
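`samtools faidx` writes an index file, `REL606.fa.fai`, whose first two columns are the sequence name and its length. As a rough cross-check, the sketch below computes the same two columns directly from a FASTA file with awk. The function name `fasta_lengths` and the toy sequences are invented for this example.

```shell
# fasta_lengths FILE: print "name<TAB>length" for each FASTA record.
# Hypothetical helper for illustration only.
fasta_lengths() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # flush the previous record
      name = substr($1, 2)                  # name = header up to first whitespace
      len = 0
      next
    }
    { len += length($0) }                   # sequence lines may be wrapped
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Toy FASTA file, invented for this example
tmp_fa=$(mktemp)
printf '>chr_demo toy sequence\nACGTACGT\nACG\n>plasmid_demo\nGGCC\n' > "$tmp_fa"
fasta_lengths "$tmp_fa"
```

On the lab data, the output of `fasta_lengths REL606.fa` should agree with `cut -f1,2 REL606.fa.fai`.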
$ mkdir index
$ cd index
$ ln -s ../REL606.fa ./
work_bwaIndex.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N INDEX
#$ -j y
#$ -cwd
bwa index REL606.fa
cd ../../results
work_bwa.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N bwa
#$ -j y
#$ -cwd
bwa mem ../data/index/REL606.fa ../data/reads_1.fq.gz ../data/reads_2.fq.gz > mapping.sam
samtools view -b mapping.sam > mapping.bam
samtools sort -o mapping.sort.bam mapping.bam
samtools index mapping.sort.bam
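The second column of every alignment line in `mapping.sam` is the FLAG, an integer whose bits describe the read (paired, unmapped, reverse strand, and so on). The sketch below decodes a FLAG value into words. The bit values follow the SAM specification; the function name `decode_flag` and the short labels are shorthand chosen for this example.

```shell
# decode_flag N: list the SAM FLAG bits that are set in N.
# Hypothetical helper; the labels are informal shorthand.
decode_flag() {
  flag=$1
  out=""
  for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
              16:reverse 32:mate_reverse 64:read1 128:read2 \
              256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
    bit=${pair%%:*}      # numeric bit value
    name=${pair#*:}      # label for that bit
    if [ $(( flag & bit )) -ne 0 ]; then
      out="$out $name"
    fi
  done
  echo "${out# }"        # drop the leading space
}

decode_flag 99
decode_flag 147
```

FLAG values 99 and 147 are the typical pair seen for a properly paired forward/reverse read pair in paired-end data.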
work_bwa2.sh (using pipe)
#!/bin/bash
#$ -S /bin/bash
#$ -N bwa_pipe
#$ -j y
#$ -cwd
bwa mem ../data/index/REL606.fa ../data/reads_1.fq.gz ../data/reads_2.fq.gz | \
samtools view -b - | \
samtools sort -o mapping.sort.2.bam -
samtools index mapping.sort.2.bam
work_minimap2.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N minimap2
#$ -j y
#$ -cwd
minimap2 -ax sr ../data/REL606.fa ../data/reads_1.fq.gz ../data/reads_2.fq.gz |\
samtools view -b - |\
samtools sort -o mapping.sort.mm.bam -
samtools index mapping.sort.mm.bam
work_minimap_pb.sh
#!/bin/bash
#$ -S /bin/bash
#$ -N minimap2_pb
#$ -j y
#$ -cwd
minimap2 -ax map-pb ../data/REL606.fa ../data/pb_ecoli_0001.fastq |\
samtools view -b - |\
samtools sort -o mapping.sort.pb.bam -
samtools index mapping.sort.pb.bam
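Once the BAM files exist, a quick sanity check is to count how many reads mapped and how many mapped with a high mapping quality (MAPQ, column 5). The sketch below does this for SAM text on standard input. The function name `sam_summary`, the default threshold of 30, and the toy records are assumptions made for this example.

```shell
# sam_summary [MIN_MAPQ]: summarise SAM records read from stdin.
# Hypothetical helper; counts primary records only.
sam_summary() {
  awk -F '\t' -v minq="${1:-30}" '
    /^@/ { next }                                  # skip header lines
    {
      flag = $2
      # skip secondary (256) and supplementary (2048) records
      if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next
      total++
      if (int(flag / 4) % 2 == 0) {                # bit 4 unset means mapped
        mapped++
        if ($5 >= minq) confident++
      }
    }
    END { printf "total=%d mapped=%d mapq_ge_%d=%d\n", total, mapped, minq, confident }
  '
}

# Toy SAM records, invented for this example
toy_sam=$(mktemp)
{
  printf '@SQ\tSN:ref_demo\tLN:100\n'
  printf 'r1\t99\tref_demo\t1\t60\t8M\t=\t20\t27\tACGTACGT\tIIIIIIII\n'
  printf 'r2\t0\tref_demo\t5\t3\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
  printf 'r2\t256\tref_demo\t50\t0\t8M\t*\t0\t0\t*\t*\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
} > "$toy_sam"
sam_summary 30 < "$toy_sam"
```

On the lab results, the same function can be fed from a BAM file, for example `samtools view mapping.sort.bam | sam_summary 30`; `samtools flagstat mapping.sort.bam` gives a more complete report.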
- First assemble the reads into contigs: assemble the short reads with SPAdes, and the PacBio long reads with canu, mecat, or miniasm
- Then map the contigs to the reference genome with bwa mem
- View the alignments in IGV