Software

Progressive Approach for SNP Calling and Haplotype Assembly Using SMRT Data

About

We developed a progressive approach for SNP calling and haplotype assembly that works well on SMRT data. The method takes as input a set of pre-aligned reads, the reference sequence, dbSNP, and a chromosome ID (1-22). It produces two output files, info.snp and group.mtx. On Chromosome 1, the method handles more than 200 million non-N bases covered by 4 million reads, split into more than 100 blocks, each containing 2 million bases and more than 3000 SNP sites on average.

Software

To run our program, put the following files in the same directory: run.class, block.sam, Homo_sapiens_assembly19.fasta, and 00-All.vcf. Here run.class is the executable file, block.sam contains the pre-aligned reads in a block, Homo_sapiens_assembly19.fasta is the reference sequence, and 00-All.vcf is the dbSNP file.

Download software package: run.class
Download small example: block.sam
Download reference sequence: Homo_sapiens_assembly19.fasta
Download dbSNP: 00-All.vcf

Try command: java -Xmx5G run block.sam Homo_sapiens_assembly19.fasta 00-All.vcf result/ 1

The output contains two files under the same directory: info.snp and group.mtx

SNP calling: info.snp
Haplotype assembly: group.mtx
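
The run steps above can be sketched as a small Python helper that checks the four input files are in place and assembles the documented command. This is only an illustrative sketch: the names REQUIRED_FILES, missing_inputs and build_command are hypothetical and not part of the software package, and it assumes that the fourth argument (result/) is the output directory and the last argument is the chromosome ID.

```python
from pathlib import Path

# File names taken from the instructions above.
REQUIRED_FILES = (
    "run.class",
    "block.sam",
    "Homo_sapiens_assembly19.fasta",
    "00-All.vcf",
)


def missing_inputs(workdir):
    """Return the required files that are not present in workdir."""
    base = Path(workdir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def build_command(chromosome, out_dir="result/", heap="5G"):
    """Build the argument list for the documented java invocation.

    Assumption: out_dir is the output directory and the final
    argument is the chromosome ID (1-22).
    """
    chrom = int(chromosome)
    if not 1 <= chrom <= 22:
        raise ValueError("chromosome ID must be between 1 and 22")
    return [
        "java", f"-Xmx{heap}", "run",
        "block.sam", "Homo_sapiens_assembly19.fasta", "00-All.vcf",
        out_dir, str(chrom),
    ]


# Prints the same command line shown above for chromosome 1.
print(" ".join(build_command(1)))
```

Calling missing_inputs(".") before launching gives an early, readable error if one of the downloads is absent. The list returned by build_command can be passed to subprocess.run from the directory that holds the files.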

Environment

Linux
Intel Xeon CPU
32GB RAM

Data

PacBio sequencing data on NA12878 (Accession: SRX638310): here
NIST Human data on NA24385 (Ashkenazim Trio Son: HG002): here

Result

Simulation data: chromosome 1

We used six servers (72 CPUs, 1536 GB RAM) to complete the experiment.

Share Data: Link : akvm
