
Software

GIAB edited this page Jul 26, 2018 · 60 revisions

Progressive Approach for SNP Calling and Haplotype Assembly Using SMRT Data

About

We developed a progressive approach for SNP calling and haplotype assembly that works well on SMRT data. The method takes four inputs: a set of pre-aligned reads, the reference sequence, dbSNP, and a chromosome ID (1-22). It produces two output files, info.snp and group.mtx. On Chromosome 1 the method handles more than 200 million non-N bases with 4 million reads, split into more than 100 blocks, each of which contains about 2 million bases and more than 3000 SNP sites on average.

Software

To run our program, put the following files in the same directory: run.class, block.sam, Homo_sapiens_assembly19.fasta, and 00-All.vcf. Here run.class is the executable file, block.sam contains the pre-aligned reads of one block, Homo_sapiens_assembly19.fasta is the reference sequence, and 00-All.vcf is the dbSNP file.

  1. Download software package: run.class

  2. Download small example: block.sam

  3. Download reference sequence: Homo_sapiens_assembly19.fasta

  4. Download dbSNP: 00-All.vcf

Try command: java -Xmx5G run block.sam Homo_sapiens_assembly19.fasta 00-All.vcf result/ 1

The output contains two files under the same directory: info.snp and group.mtx
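The run described above can be wrapped in a small pre-flight check. The following is only a sketch, not part of the distributed package: the class name RunLauncher and its methods are hypothetical, and it assumes that the fourth argument (result/) names an output directory and that the program may not create that directory itself. It verifies that the four required files are present and then launches the same command shown above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not shipped with run.class): checks inputs, then starts the run.
class RunLauncher {
    // File names exactly as listed in the instructions above.
    static final List<String> REQUIRED = Arrays.asList(
            "run.class", "block.sam", "Homo_sapiens_assembly19.fasta", "00-All.vcf");

    // Returns the required files that are not present in dir.
    static List<String> missingFiles(Path dir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Builds the documented command line; the chromosome ID must be 1-22.
    static List<String> buildCommand(String outDir, int chromosome) {
        if (chromosome < 1 || chromosome > 22) {
            throw new IllegalArgumentException("chromosome ID must be in 1-22");
        }
        return Arrays.asList("java", "-Xmx5G", "run", "block.sam",
                "Homo_sapiens_assembly19.fasta", "00-All.vcf",
                outDir, String.valueOf(chromosome));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(".");
        List<String> missing = missingFiles(dir);
        if (!missing.isEmpty()) {
            System.err.println("Missing input files: " + missing);
            return;
        }
        // Assumption: result/ is an output directory that may need to exist beforehand.
        Files.createDirectories(dir.resolve("result"));
        Process p = new ProcessBuilder(buildCommand("result/", 1))
                .directory(dir.toFile())
                .inheritIO()   // forward the program's own output to this console
                .start();
        System.exit(p.waitFor());
    }
}
```

Compile it with javac RunLauncher.java and start it with java RunLauncher from the directory that holds the four files. To process another chromosome, change the second argument of buildCommand.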

Environment

  • Linux

  • Intel Xeon CPU

  • 32GB RAM

Data

  • PacBio sequencing data on NA12878 (Accession: SRX638310): here

  • NIST Human data on NA24385 (Ashkenazim Trio Son: HG002): here

Result

We used six servers (72 CPUs, 1536 GB RAM) to complete the experiment.

Share Data: Link : akvm
