Yield improvement versus ccs on various sequencing runs

We evaluate on 3 different datasets

For each PacBio dataset (Movie ID), we compared yield at Q30 for ccs (baseline), DeepConsensus v0.2, and DeepConsensus v0.3.

| Movie ID | Sample | Chemistry | Mean insert size |
| --- | --- | --- | --- |
| m64011_181218_235052 | HG002 | 1 | 11 kb |
| m64008_201124_002822 | HG002 | 2.2 | 15 kb |
| m64014_200920_132517 | HG002 | 2.2 | 24 kb |

Yield versus runtime

| version | movie | dataset | num_reads_ccs | num_reads | yield@emQ20 | yield@emQ20/ccs | yield@emQ30 | yield@emQ30/ccs | yield@emQ40 | yield@emQ40/ccs | hours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| v0.3 | m64011_181218_235052 | chem1_11kb | 1,393,202 | 1,533,357 | 16.86 Gb | 108.74% | 11.16 Gb | 121.78% | 4.06 Gb | 167.33% | 277.68 |
| v0.3 | m64008_201124_002822 | chem2.2_15kb | 2,689,147 | 2,864,908 | 42.41 Gb | 106.09% | 30.41 Gb | 115.70% | 7.54 Gb | 191.51% | 683.97 |
| v0.3 | m64014_200920_132517 | chem2.2_24kb | 1,919,192 | 2,064,266 | 48.99 Gb | 107.02% | 27.64 Gb | 149.24% | 1.60 Gb | 462.97% | 925.01 |

yield@emQ30/ccs or "Yield at empirical Q30 relative to CCS" is calculated as follows:

1. Filter DeepConsensus output to predicted Q20.
2. For each read, align it to the truth and calculate identity from that alignment: identity = # matches / (# matches + # mismatches + # insertions + # deletions).
3. Take all the reads that have identity >= 0.999 (this is Q30).
4. Because longer reads are more useful than shorter reads, count the total bases and not just the number of reads.
5. Repeat the steps above for the original CCS reads (run with default parameters, i.e. Q20-filtered), then subtract the CCS yield from the DeepConsensus yield and divide by the CCS yield to get a percentage. For example, 40% means that DeepConsensus increased the yield of high-quality reads, measured in bases, by 40% over CCS.
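
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the table: the `AlignedRead` container, the function names, and the toy reads are all hypothetical, and the sketch assumes that the predicted-Q20 filter and the alignment to the truth have already been done upstream.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Per-read alignment counts against the truth (hypothetical container)."""
    matches: int
    mismatches: int
    insertions: int
    deletions: int
    length: int  # read length in bases


def identity(read: AlignedRead) -> float:
    """identity = matches / (matches + mismatches + insertions + deletions)."""
    columns = read.matches + read.mismatches + read.insertions + read.deletions
    return read.matches / columns if columns else 0.0


def yield_in_bases(reads, min_identity=0.999):
    """Total bases in reads whose identity is at least min_identity (0.999 = Q30)."""
    return sum(r.length for r in reads if identity(r) >= min_identity)


def relative_yield_increase(dc_reads, ccs_reads, min_identity=0.999):
    """Percentage increase in yield of DeepConsensus reads over CCS reads."""
    dc = yield_in_bases(dc_reads, min_identity)
    ccs = yield_in_bases(ccs_reads, min_identity)
    return 100.0 * (dc - ccs) / ccs


# Toy data, made up purely for illustration.
ccs_reads = [
    AlignedRead(9995, 3, 1, 1, 10000),    # identity 0.9995, counted
    AlignedRead(9900, 50, 25, 25, 10000), # identity 0.99, not counted
]
dc_reads = [
    AlignedRead(9995, 3, 1, 1, 10000),    # identity 0.9995, counted
    AlignedRead(3998, 1, 1, 0, 4000),     # identity 0.9995, counted
]

print(f"{relative_yield_increase(dc_reads, ccs_reads):.1f}%")  # prints 40.0%
```

In the toy data, CCS yields 10,000 bases at Q30 and DeepConsensus yields 14,000, so the relative increase is 40%. Passing a different `min_identity` (for example 0.99 for Q20 or 0.9999 for Q40) gives the other empirical-quality thresholds.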

These were run on GCP n1-standard-16 machines with no GPU (in 500 shards, combined above), with --batch_zmws=100 --batch_size=1024, which is generally what we recommend. For more information on compute setups, see the runtime metrics page.
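
To relate the `hours` column to per-machine cost, the following sketch derives two rough figures from the table: average hours per shard and Q30 yield per compute-hour. It assumes that `hours` is the total summed over all 500 shards, which is how "combined above" reads but is not stated explicitly; the dictionary layout and function name are illustrative only.

```python
# Values copied from the v0.3 rows of the table above.
RUNS = {
    "chem1_11kb": {"yield_q30_gb": 11.16, "hours": 277.68},
    "chem2.2_15kb": {"yield_q30_gb": 30.41, "hours": 683.97},
    "chem2.2_24kb": {"yield_q30_gb": 27.64, "hours": 925.01},
}
NUM_SHARDS = 500  # from the text; assumes "hours" is summed over all shards


def summarize(run):
    """Average hours per shard and Q30 gigabases produced per compute-hour."""
    return {
        "hours_per_shard": run["hours"] / NUM_SHARDS,
        "q30_gb_per_hour": run["yield_q30_gb"] / run["hours"],
    }


for name, run in RUNS.items():
    stats = summarize(run)
    print(
        f"{name}: {stats['hours_per_shard']:.2f} h/shard, "
        f"{stats['q30_gb_per_hour']:.3f} Gb of Q30 reads per hour"
    )
```

If the shards run in parallel, the average hours per shard gives a rough lower bound on wall-clock time; actual shard runtimes will vary.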

Runtime-yield tradeoffs with `--skip_windows_above`

The --skip_windows_above option (new in v0.3) allows DeepConsensus to skip windows whose average CCS base qualities are already above a certain quality threshold. The windows that are skipped just adopt the CCS sequence without correction. This saves runtime, but there is a yield tradeoff, shown in this chart for m64014_200920_132517-chr20:

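[Chart: runtime versus yield at different --skip_windows_above thresholds for m64014_200920_132517-chr20]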

The default in v0.3 is Q45, but you can adjust this level using --skip_windows_above.
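
The skipping behavior can be pictured with a small sketch. This is not DeepConsensus code: the function names and the `correct_window` callback (a stand-in for the model) are hypothetical, and the sketch assumes that "average" means the arithmetic mean of the Phred scores and that a window is skipped only when the mean is strictly above the threshold.

```python
from statistics import mean


def should_skip_window(ccs_base_qualities, skip_windows_above=45):
    """True if the window's average CCS base quality exceeds the threshold.

    Assumption: arithmetic mean of Phred scores, strict comparison.
    """
    if not ccs_base_qualities:
        return False  # nothing to average, so do not skip
    return mean(ccs_base_qualities) > skip_windows_above


def polish_read(windows, correct_window, skip_windows_above=45):
    """Build a read from windows given as (ccs_sequence, ccs_base_qualities).

    Skipped windows adopt the CCS sequence unchanged; the rest are passed to
    correct_window, a placeholder for the model's correction step.
    """
    pieces = []
    for sequence, qualities in windows:
        if should_skip_window(qualities, skip_windows_above):
            pieces.append(sequence)
        else:
            pieces.append(correct_window(sequence, qualities))
    return "".join(pieces)


# Toy example: lowercase marks the windows that went through "correction".
windows = [
    ("ACGT", [50, 50, 50, 50]),  # mean 50, above Q45, skipped
    ("TTGA", [20, 30, 25, 40]),  # mean 28.75, corrected
]
print(polish_read(windows, lambda seq, quals: seq.lower()))  # prints ACGTttga
```

Lowering the threshold skips more windows and saves more runtime, at the cost of leaving more CCS sequence uncorrected.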
