
PAired-eND Assembler for DNA sequences


PANDASEQ

PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.

INSTALLATION


Binary packages are available for recent versions of Windows, MacOS and Linux. Installing from source is not too difficult. See Installation instructions for details.

Development packages for zlib and libbz2 are needed, as is a standard compiler environment. On Ubuntu, these can be installed via

sudo apt-get install build-essential libtool automake zlib1g-dev libbz2-dev

On MacOS, the Apple Developer tools and Fink (or MacPorts or Brew) must be installed, then

sudo fink install bzip2-dev

After the support packages are installed, one should be able to do:

./autogen.sh && ./configure && make && sudo make install

If you receive an error that libpandaseq.so.[number] is not found on Linux, try running:

sudo ldconfig

USAGE

Please consult the manual page by invoking

man pandaseq

or visiting the online PANDAseq manual page.

The short version is

pandaseq -f forward.fastq -r reverse.fastq

REPORTING BUGS

Before filing a bug, consult how to file a bug.

Please run:

curl https://raw.github.com/neufeld/pandaseq/master/pandabug | sh

or

wget -O- https://raw.github.com/neufeld/pandaseq/master/pandabug | sh

to create a header with basic details about your system. Please include:

  1. The output of the above script.
  2. The exact error message. If this is a compilation error, do not truncate the output. If this is a problem when assembling, keep the INFO ARG lines, and the last few lines, but you may truncate the middle.
  3. If you have tried multiple different things, please list them all.
  4. Your sequencing data may be requested. This usually does not necessitate all the reads.

BINDING

PANDAseq may be used in other programs via a programmatic interface. Consult the header file pandaseq.h for more details. The C interface is pseudo-object oriented and documented in the header. The library provides pkg-config information, so compiling against it can be done using something like:

cc mycode.c `pkg-config --cflags --libs pandaseq-2`

or using, in configure.ac:

PKG_CHECK_MODULES(PANDASEQ, [ pandaseq-2 >= 2.5 ])

A Vala binding is also included.

Other language bindings are welcome.

FAQ

Can I insist that PANDAseq only assemble perfect sequences?

Yes, but you shouldn't want to do it. The whole point is to fix sequences which are probably good. There is no quality setting that will achieve this effect. You can use the plugin completely_miss_the_point, but this really does miss the point. Moreover, assuming that the sequencer is right in the overlap region and in the non-overlapping regions requires an unsound leap in statistics.

Can I use SAM/BAM files as input without converting them to FASTQ?

Yes. PANDAseq-sam extends PANDAseq to do this. SAM/BAM files do not guarantee that sequences will be in the right order, so processing them may be slower and PANDAseq will use more memory.
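
To see why unordered input costs memory, here is a minimal sketch of mate pairing. It is not how PANDAseq-sam is implemented; the function `pair_mates` and the `(read_id, is_forward, sequence)` record layout are hypothetical and exist only for illustration. Each read whose mate has not yet appeared must be held until the mate turns up, so the further apart the mates are in the file, the more reads are held at once.

```python
def pair_mates(records):
    """Pair forward and reverse reads that may arrive in any order.

    `records` is an iterable of (read_id, is_forward, sequence) tuples.
    This layout is hypothetical and is not PANDAseq-sam's data model.
    Returns (pairs, peak), where pairs is a list of
    (read_id, forward_seq, reverse_seq) and peak is the largest number
    of unmatched reads held in memory at any one time.
    """
    pending = {}  # read_id -> (is_forward, sequence) still waiting for a mate
    pairs = []
    peak = 0
    for read_id, is_forward, seq in records:
        if read_id in pending:
            mate_is_forward, mate_seq = pending.pop(read_id)
            if mate_is_forward == is_forward:
                raise ValueError("two reads with the same direction for " + read_id)
            if is_forward:
                pairs.append((read_id, seq, mate_seq))
            else:
                pairs.append((read_id, mate_seq, seq))
        else:
            pending[read_id] = (is_forward, seq)
            peak = max(peak, len(pending))
    return pairs, peak


if __name__ == "__main__":
    in_order = [("a", True, "AC"), ("a", False, "GT"),
                ("b", True, "TT"), ("b", False, "AA")]
    out_of_order = [("a", True, "AC"), ("b", False, "AA"),
                    ("b", True, "TT"), ("a", False, "GT")]
    print("in order, peak held:", pair_mates(in_order)[1])
    print("out of order, peak held:", pair_mates(out_of_order)[1])
```

With paired FASTQ files the mates arrive together, so nothing needs to be held back.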

The scores of the output bases seem really low. What's wrong?

Nothing. The quality scores of the output do not have any similarity to the original quality scores and are not uniform across the sequence (i.e., the overlap is scored differently from the unpaired ends).

In the overlap region where there is a mismatch, the score is the probability that one base was sequenced correctly and the other was sequenced incorrectly. If both bases have high scores (i.e., are probably correct), the score of the resulting base is low (i.e., it is probably incorrect). For more information, see the paper. Also, remember that the PHRED-to-probability conversion is not linear, so most scores are relatively high. It's also not uncommon to see the PHRED score !, which is zero, but in this context, it means less than " (PHRED = 1, P = 0.20567).

Again, these scores are not meant to be interpreted as regular scores and should not be processed by downstream applications expecting PHRED scores from Illumina sequences.
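
The conversion mentioned above can be sketched as follows. This is a minimal illustration, not code from PANDAseq: the helper names are made up, and it assumes PHRED+33 quality characters and that P is the probability that the base call is correct. It reproduces the P value quoted for " and shows how quickly P approaches 1 as the score rises.

```python
PHRED_OFFSET = 33  # assumption: PHRED+33 (Sanger-style) quality encoding


def phred_to_prob(q):
    """Probability that a base call is correct, given PHRED score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def char_to_phred(c):
    """Decode one FASTQ quality character (PHRED+33 assumed)."""
    return ord(c) - PHRED_OFFSET


if __name__ == "__main__":
    # '!' and '"' are the two lowest characters; the rest are PHRED 10..40.
    for c in '!"+5?I':
        q = char_to_phred(c)
        print(f"{c!r}: PHRED {q:2d} -> P(correct) = {phred_to_prob(q):.5f}")
```

Every 10 PHRED points removes a factor of ten from the error probability, which is why the scale is not linear.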

The scores of the non-overlapping regions are not the same as the original reads. Why?

The PHRED scores from the input are not copied directly to the output when using FASTQ (-F) output. They go through a transformation from PHRED scores into probabilities, which is how PANDAseq uses them. When output as FASTQ, the probability is converted back to a PHRED score. The rounding error involved can cause a score to jump by one.
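
The effect can be illustrated with a round trip through floating point. This is only a sketch: it does not reproduce PANDAseq's internal arithmetic or its rounding rule, and the function names are hypothetical. The point is that the recovered value is a real number very close to the original integer, so the choice of integer conversion decides whether a tiny error becomes a change of one.

```python
import math


def phred_to_prob(q):
    """Probability that a base call is correct, given PHRED score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def prob_to_phred(p, to_int=round):
    """Convert P(correct) back to an integer PHRED score.

    `to_int` selects the integer conversion: `round` (nearest) or
    `int` (truncation). Requires p < 1, since log10(0) is undefined.
    """
    return to_int(-10.0 * math.log10(1.0 - p))


if __name__ == "__main__":
    # Scores whose round trip changes when the result is truncated.
    changed = [q for q in range(42)
               if prob_to_phred(phred_to_prob(q), int) != q]
    print("scores altered by truncation:", changed)
```

In this sketch, rounding to the nearest integer recovers every score from 0 to 41, while truncation can lose one whenever the recovered value falls just below the original.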

ALTERNATIVES

PEAR (Paired-End AssembleR)
FLASH (Fast Length Adjustment of SHort reads)
COPE (Connecting Overlapped Pair-End reads)
XORRO (Rapid Pair-end Read Overlapper)

CITATION

Andre P Masella, Andrea K Bartram, Jakub M Truszkowski, Daniel G Brown and Josh D Neufeld. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 2012, 13:31. http://www.biomedcentral.com/1471-2105/13/31
