
Generating error and qscore models

Ryan Wick edited this page Feb 26, 2019 · 3 revisions

Badread comes with two error/qscore models: one that I built with Oxford Nanopore reads (MinION, R9.4 flowcell) and one that I built with PacBio reads (PacBio RS II, CLR). If you'd like to build your own model, keep reading!

Requirements:

  • Long reads (at least a Gbp would be good)
  • A high-quality reference FASTA (ideally an Illumina-polished assembly of the same genome the reads came from)
  • minimap2 (my favourite long-read aligner)
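
To see whether a read set is in the region of the suggested Gbp, you can total the sequence lengths in the FASTQ. This is only a rough sketch: the count_bases function name is made up for illustration, and it assumes a gzipped FASTQ with exactly four lines per record (no wrapped sequences).

```shell
# Hypothetical helper: print the total number of bases in a gzipped FASTQ.
# Assumes 4 lines per record, so sequences sit on lines 2, 6, 10, ...
count_bases() {
    gzip -cd "$1" | awk 'NR % 4 == 2 { total += length($0) }
                         END { printf "%.0f\n", total }'
}

# Usage: count_bases reads.fastq.gz
```

A result of 1000000000 or more corresponds to at least a Gbp of reads.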

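First, align your long reads to your reference. Make sure to use minimap2's -c option so it includes the CIGAR string in the output. The command below uses the map-ont preset, which suits Nanopore reads; for PacBio CLR reads, minimap2's map-pb preset is the equivalent: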

minimap2 -c -x map-ont reference.fasta.gz reads.fastq.gz | gzip > alignments.paf.gz
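
Before moving on, it can be worth confirming that the alignments really do carry CIGAR strings. In PAF output, minimap2 writes the CIGAR as a cg:Z: tag after the 12 mandatory columns, so a quick count of lines with that tag will show whether -c took effect. The check_cigars name below is hypothetical, and the sketch assumes a gzipped, tab-separated PAF file.

```shell
# Hypothetical helper: report how many PAF lines carry a cg:Z: (CIGAR) tag.
# PAF has 12 mandatory columns, so optional tags start at column 13.
check_cigars() {
    gzip -cd "$1" | awk -F'\t' '
        {
            total++
            has = 0
            for (i = 13; i <= NF; i++) if ($i ~ /^cg:Z:/) has = 1
            with_cigar += has
        }
        END { printf "%d of %d alignments have a CIGAR tag\n", with_cigar, total }'
}

# Usage: check_cigars alignments.paf.gz
```

If the first number is zero, the alignment was most likely run without -c.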

Now build the models with Badread (this can take a long time, especially for large read sets):

badread error_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_error_model
badread qscore_model --reference reference.fasta.gz --reads reads.fastq.gz --alignment alignments.paf.gz > new_qscore_model
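
Because the commands above redirect stdout to a file, the shell creates the output file even if Badread exits early, which can leave an empty model behind. A small sanity check like the following catches that case. The check_models name is made up for illustration, and it only tests that each file exists and is non-empty, not that the contents are a valid model.

```shell
# Hypothetical helper: fail if any of the given model files is missing or empty.
check_models() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "all model files are non-empty"
}

# Usage: check_models new_error_model new_qscore_model
```

Once both files look sensible, they can be given by filename to badread simulate through its --error_model and --qscore_model options.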

If it's taking too long or running out of RAM, try limiting the number of alignments used with the --max_alignments option.
