For each PacBio dataset (Movie ID), we compared yield at Q30 for ccs (baseline), DeepConsensus v0.2, and DeepConsensus v0.3.
Movie ID | Sample | Chemistry | Mean insert size |
---|---|---|---|
m64011_181218_235052 | HG002 | 1 | 11 kb |
m64008_201124_002822 | HG002 | 2.2 | 15 kb |
m64014_200920_132517 | HG002 | 2.2 | 24 kb |

version | movie | dataset | num_reads_ccs | num_reads | yield@emQ20 | yield@emQ20/ccs | yield@emQ30 | yield@emQ30/ccs | yield@emQ40 | yield@emQ40/ccs | hours |
---|---|---|---|---|---|---|---|---|---|---|---|
v0.3 | m64011_181218_235052 | chem1_11kb | 1,393,202 | 1,533,357 | 16.86 Gb | 108.74% | 11.16 Gb | 121.78% | 4.06 Gb | 167.33% | 277.68 |
v0.3 | m64008_201124_002822 | chem2.2_15kb | 2,689,147 | 2,864,908 | 42.41 Gb | 106.09% | 30.41 Gb | 115.70% | 7.54 Gb | 191.51% | 683.97 |
v0.3 | m64014_200920_132517 | chem2.2_24kb | 1,919,192 | 2,064,266 | 48.99 Gb | 107.02% | 27.64 Gb | 149.24% | 1.60 Gb | 462.97% | 925.01 |

`yield@emQ30/ccs`, or "Yield at empirical Q30 relative to CCS", is calculated as follows:
- Filter DeepConsensus output to predicted Q20.
- For each read, align it to the truth and calculate identity from that alignment: identity = # matches / (# matches + # mismatches + # insertions + # deletions).
- Take all the reads that have identity >= 0.999 (this is Q30).
- Because longer reads are more useful than shorter reads, we count the total bases and not just the number of reads.
- Repeat the steps above for the original CCS reads (run with default parameters, which filter to Q20), then subtract the CCS yield from the DeepConsensus yield and divide by the CCS yield to get a percentage. For example, 40% means that DeepConsensus increased the yield of high-quality reads, measured in bases, by 40% over CCS.
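
The steps above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used to produce the table: the `AlignedRead` container and the function names are hypothetical, the alignment counts are assumed to have been extracted already (for example from a BAM file), reads are assumed to be pre-filtered to predicted Q20, and "total bases" is assumed to mean the sum of read lengths.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Alignment summary for one read against the truth (hypothetical container)."""
    length: int       # read length in bases
    matches: int
    mismatches: int
    insertions: int
    deletions: int


def identity(read: AlignedRead) -> float:
    """identity = matches / (matches + mismatches + insertions + deletions)."""
    aligned = read.matches + read.mismatches + read.insertions + read.deletions
    return read.matches / aligned if aligned else 0.0


def yield_at_identity(reads, min_identity=0.999):
    """Total bases in reads whose identity is at least min_identity (0.999 is Q30)."""
    return sum(r.length for r in reads if identity(r) >= min_identity)


def relative_yield_change(dc_reads, ccs_reads, min_identity=0.999):
    """(DeepConsensus yield - CCS yield) / CCS yield, as a percentage."""
    dc = yield_at_identity(dc_reads, min_identity)
    ccs = yield_at_identity(ccs_reads, min_identity)
    return 100.0 * (dc - ccs) / ccs
```

With a CCS yield of 10,000 bases and a DeepConsensus yield of 14,000 bases at the same identity cutoff, `relative_yield_change` returns 40.0. Expressed as a plain ratio (DeepConsensus yield divided by CCS yield), the same data would read 140%.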

These were run on GCP `n1-standard-16` machines with no GPU (in 500 shards, combined above), with `--batch_zmws=100 --batch_size=1024`, which is generally what we recommend. For more information on compute setups, see the runtime metrics page.

The `--skip_windows_above` option (new in v0.3) allows DeepConsensus to skip windows whose average CCS base qualities are already above a certain quality threshold. The windows that are skipped just adopt the CCS sequence without correction. This saves runtime, but there is a yield tradeoff, shown in this chart for m64014_200920_132517-chr20:

The default in v0.3 is Q45, but you can adjust this level using `--skip_windows_above`.
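
To make the skipping rule concrete, here is a rough Python sketch of the decision it describes. It is not DeepConsensus's implementation: the function names are hypothetical, the average is assumed to be a plain arithmetic mean of Phred scores, and "above" is assumed to mean strictly greater than the threshold.

```python
def mean_quality(quals):
    """Arithmetic mean of per-base Phred qualities (assumed averaging method)."""
    return sum(quals) / len(quals)


def choose_windows(window_quals, skip_windows_above=45):
    """Split window indices into those to correct and those to skip.

    window_quals: one list of CCS base qualities per window.
    Skipped windows would keep the CCS sequence unchanged.
    """
    to_correct, skipped = [], []
    for i, quals in enumerate(window_quals):
        if quals and mean_quality(quals) > skip_windows_above:
            skipped.append(i)
        else:
            to_correct.append(i)
    return to_correct, skipped
```

Under these assumptions, lowering the threshold moves more windows into `skipped`, which is where the runtime saving and the yield tradeoff both come from.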