Ultra-fast de novo assembler using long noisy reads
Note from ruanjue (latest commit 5cc1356, Feb 19, 2018): many people have asked about consensus sequences. To get them, please add -c 1 to the smartdenovo.pl command.

Getting Started

# Download sample PacBio data from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
awk 'NR%4==1||NR%4==2' selfSampleData/pacbio_filtered.fastq | sed 's/^@/>/g' > reads.fa
# Install SMARTdenovo
git clone https://github.com/ruanjue/smartdenovo.git && (cd smartdenovo; make)
# Assemble (raw unitigs in wtasm.lay.utg; consensus unitigs: wtasm.cns)
smartdenovo/smartdenovo.pl -c 1 reads.fa > wtasm.mak
make -f wtasm.mak
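
The FASTQ-to-FASTA step above can be tried on a toy input before running it on real data. The sketch below is only illustrative: the file names toy.fastq and toy.fa and the two reads are made up for this example, and the one-liner assumes every FASTQ record occupies exactly four lines (no wrapped sequence or quality lines).

```shell
# Build a two-record FASTQ file (hypothetical toy data, not part of the sample set).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCA\n+\nIIIII\n' > toy.fastq

# Keep lines 1 and 2 of every 4-line record (header and sequence),
# then turn the leading '@' of each header into '>'.
awk 'NR%4==1||NR%4==2' toy.fastq | sed 's/^@/>/' > toy.fa

cat toy.fa
```

Because the quality lines are dropped by awk before sed runs, a quality string that happens to start with '@' cannot be mistaken for a header.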

Introduction

SMARTdenovo is a de novo assembler for PacBio and Oxford Nanopore (ONT) data. It produces an assembly from all-vs-all raw read alignments without an error correction stage. It also provides tools to generate accurate consensus sequences, though platform-dependent consensus polishing tools (e.g. Quiver for PacBio or Nanopolish for ONT) are still required for higher accuracy.

SMARTdenovo consists of several separate command line tools: wtzmo for read overlapping, wtgbo to rescue missing overlaps, wtclp for identifying low-quality regions and chimeras, and wtcns or wtmsa to produce better unitig consensus. The smartdenovo.pl script provides a convenient interface to call these programs in one go. If you do not care about the internals of SMARTdenovo, you may simply run:

/path/to/smartdenovo/smartdenovo.pl -p prefix -c 1 reads.fa > prefix.mak
make -f prefix.mak

The script calls the other SMARTdenovo executables from the directory that contains smartdenovo.pl. After assembly, the raw unitigs are reported in the file prefix.lay.utg and the consensus unitigs in prefix.cns. For details on how SMARTdenovo works, please see README-tools.md.
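
Once the consensus file exists, a quick summary of its contents is a useful sanity check. The following is a minimal sketch, not part of SMARTdenovo: fasta_stats is a hypothetical helper written for this example, and toy.cns is made-up data standing in for prefix.cns. It assumes a plain FASTA file with Unix line endings.

```shell
# Toy FASTA standing in for prefix.cns (hypothetical data).
printf '>utg1\nACGTACGTAC\n>utg2\nACGTAC\n>utg3\nACGT\n' > toy.cns

# Hypothetical helper: print sequence count, total bases and N50 of a FASTA file.
fasta_stats() {
    awk '
        /^>/ { if (len) lens[++n] = len; len = 0; next }   # header: store previous length
             { len += length($0) }                         # sequence line: accumulate length
        END {
            if (len) lens[++n] = len
            for (i = 1; i <= n; i++) total += lens[i]
            # Sort lengths in descending order (insertion sort, adequate for unitig counts).
            for (i = 2; i <= n; i++) {
                v = lens[i]
                for (j = i - 1; j >= 1 && lens[j] < v; j--) lens[j + 1] = lens[j]
                lens[j + 1] = v
            }
            # N50: length at which the running sum reaches half of the total bases.
            for (i = 1; i <= n; i++) {
                acc += lens[i]
                if (acc * 2 >= total) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bases=%d N50=%d\n", n, total, n50
        }' "$1"
}

fasta_stats toy.cns   # prints: sequences=3 total_bases=20 N50=10
```

On a real run the same function would be called as fasta_stats prefix.cns, and on prefix.lay.utg as well if that file is plain FASTA.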

New development

Most of the assembly time is spent on Smith-Waterman alignment, which may not be necessary for long-read assembly. We are developing a novel algorithm, called dot matrix alignment, which is Smith-Waterman free.

wtzmo now supports dot matrix alignment when the options -U -1 -m 0.1 are added. run_dmo.sh works well on E. coli and yeast PacBio datasets, the bacterial dataset ERS554120, and Drosophila.