Pacbio Error Correction with Mummer

This is a pipeline developed around Mummer to correct Pacbio reads using high-identity short reads

Files

correct.sh - Runs the pipeline (local or sge)
correct.py - Corrects pb reads given output from 'show-snps'
partition.py - Convenience script to partition input Pacbio reads (for cluster)

Dependencies

Mummer must be installed and in your PATH. In particular the following subprograms are required to be in your path:

nucmer
delta-filter
show-snps

Workflow

The high level workflow is as follows:

Build Unitigs of short reads using your favorite assembler
Update file paths in correct.sh
If running on a cluster, partition pacbio reads using partition.py
Run correct.sh locally or on a cluster
Concatenate the output (*.corrected) files
Assemble concatenated reads using Celera or another Overlap Assembler.

Example

To run the example:

$> cd test
$> ../correct.sh pbread.test.fa

A file pbread.test.fa.corrected will be produced. The pipeline is using the short-read unitigs in test/utg.test.fa for correction. If you blast both the pbread.test.fa (uncorrected read) and pbread.test.fa.corrected to the Oryza Sativa genome you will find that the uncorrected read has ~85% identity while the corrected read has >99% identity.

Running on a Cluster

1.a Build unitigs with Celera or another assembler. Refer to chosen assembler's manual for details

1.b Begin by splitting the Pacbio reads into bins using the partition.py script. It is suggested that
you put anywhere between 5 and 50 pacbio reads in a file, depending on how much data you have.

2.Edit the correct.sh script to point to the unitigs created in 1.a and set the path to the correct.py
script.

3. The partition.py script will have created many directories with files in them in the form 0001/p0001,
0001/p0002 etc. To launch the correction with SGE on a particular bin cd into the bin and run:

$> qsub -cwd -t 1:${FILES_PER_BIN} ../correct.sh

where ${FILES_PER_BIN} is how many files per bin you gave as a parameter to the partition.py script.

4. Wait

5. Concatenate all files. Usually can do something like this from the top level dir:

$> cat `find . | grep corrected$` > corrected.fa

6. Launch your favorate overlap assembler on corrected.fa

Checking Results

Internally, corrected reads are generally blasted against a reference genome to determine their percent identity. If a reference does not exist, using blast to align directly to unitigs or a draft assembly is another option.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pacbio Error Correction with Mummer

Files

Dependencies

Workflow

Example

Running on a Cluster

Checking Results

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
test		test
README.md		README.md
correct.py		correct.py
correct.sh		correct.sh
partition.py		partition.py

Folders and files

Latest commit

History

Repository files navigation

Pacbio Error Correction with Mummer

Files

Dependencies

Workflow

Example

Running on a Cluster

Checking Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages