Skip to content
A De Novo Next Generation Genomic Sequence Assembler Based on String Graph and MapReduce Cloud Computing Framework
Java Perl
Find file
Latest commit 8ee7084 Jul 29, 2013 @ice91 readme update
Failed to load latest commit information.
data
hadoop_config hadoop Aug 27, 2012
src/Brush
COPYING
CloudBrush.jar high kmer filter May 8, 2013
LEGAL license Aug 19, 2012
README.md

README.md

CloudBrush

CloudBrush is a de novo genome assembler based on the string graph and mapreduce framework, it is released under Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0) as a Free and Open Source Software Project.

More details about CloudBrush Project you can get under: https://github.com/ice91/CloudBrush


requirement: hadoop cluster

step 1: convert *.fastq to *.sfa

e.g. perl data/preprocessor.pl -renum data/Ec10k.sim.fastq > data/Ec10k.sim.sfa

step 2: upload *.sfa to HDFS

e.g. hadoop fs -put data/Ec10k.sim.sfa Ec10k

(You can use CloudRS(ReadStackCorrector) as preprocesser to correct sequencing errors and prepare *.sfa in HDFS. More details about CloudRS Project you can get under: https://github.com/ice91/ReadStackCorrector)

step 3 perform the CloudBrush using hadoop cluster

e.g. hadoop jar CloudBrush.jar -reads Ec10k -asm Ec10k_Brush -k 21 -readlen 36

step 4 download *.fasta from HDFS

e.g. hadoop fs -cat Ec10k_Brush/* > Ec10k_Brush.fasta

[Reference] Yu-Jung Chang, Chien-Chih Chen, Chuen-Liang Chen and Jan-Ming Ho, "De Novo Next Generation Genomic Sequence Assembler Based on String Graph and MapReduce Cloud Computing Framework," BMC Genomics, volume 13, number Suppl 7, pages S28, December 2012.

Something went wrong with that request. Please try again.