rel5 (genomic DNA) now deprecated

rel5 was a merger of NA12878 DNA sequencing data from rel3 (regular sequencing protocols) and rel4 (ultra-long read set), re-basecalled with Albacore 2.1 and Guppy 0.3.

Notes on chunk size

Mike Schatz (Johns Hopkins) and Fritz Sedlazeck (CSHL) noticed that Albacore 2.1 produced a high frequency of long false-positive deletions that were confounding SV prediction. With the help of Chris Wright and Tim Massingham at ONT, this was traced to the "chunk size" setting and the way signal scaling is computed. Setting the chunk size to 10000 should remove the problem; this value was used for the Guppy calls.
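Because the symptom showed up as long deletions in the alignments, one way to screen a SAM file for it is to scan each CIGAR string for long `D` operations. The Python sketch below does that; the 50 bp cutoff (a common SV size convention) and the function names are assumptions for illustration, not part of the consortium's pipeline. Alignments written with minimap2 -L that exceed 65535 CIGAR operations keep their real CIGAR in the CG tag, which this sketch does not read.

```python
import re

# Assumption: 50 bp is used here as the SV size cutoff; adjust as needed.
MIN_SV_DEL = 50

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def long_deletions(pos, cigar, min_len=MIN_SV_DEL):
    """Return (ref_start, length) for every deletion of at least min_len bp.

    `pos` is the 1-based SAM POS field; ref_start is 1-based as well.
    """
    hits = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D" and n >= min_len:
            hits.append((ref, n))
        if op in REF_CONSUMING:
            ref += n  # advance along the reference
    return hits


def scan_sam(lines, min_len=MIN_SV_DEL):
    """Yield (read_name, chrom, ref_start, length) from SAM text lines."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read or no CIGAR
        for start, n in long_deletions(pos, cigar, min_len):
            yield qname, chrom, start, n
```

For example, `for hit in scan_sam(open("aln.sam")): print(*hit)` (with a hypothetical `aln.sam`) lists every long deletion, which can then be compared between the Albacore 2.1 and Guppy alignments.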

Reference

GRCh38 with decoys was used as the reference file: GRCh38_full_analysis_set_plus_decoy_hla.fa.

Guppy

Data was downloaded from the ENA raw submission, and Guppy was run on the GridION X5. Calling took approximately 48 hours on dual GPUs (1080 Ti), giving a basecalling speed of ~2.4 Gb/hour.
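The quoted speed is total yield divided by wall-clock time, so taken together the two figures imply roughly 2.4 × 48 ≈ 115 Gb of basecalled sequence. A small Python check of that arithmetic follows; the even split of work across the two GPUs is an assumption, and the implied total is only as accurate as the rounded figures it is derived from.

```python
# Figures quoted in the text above.
HOURS = 48              # approximate wall-clock basecalling time
RATE_GB_PER_HOUR = 2.4  # quoted basecalling speed
N_GPUS = 2              # dual 1080 Ti


def implied_yield_gb(rate_gb_per_hour, hours):
    """Total bases (Gb) implied by a constant rate over the whole run."""
    return rate_gb_per_hour * hours


def per_gpu_rate(rate_gb_per_hour, n_gpus):
    """Per-GPU rate, assuming the work was split evenly across GPUs."""
    return rate_gb_per_hour / n_gpus


if __name__ == "__main__":
    print(f"implied yield: {implied_yield_gb(RATE_GB_PER_HOUR, HOURS):.1f} Gb")
    print(f"per-GPU rate:  {per_gpu_rate(RATE_GB_PER_HOUR, N_GPUS):.1f} Gb/hour")
```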

Downloads

Minimap2 alignments (minimap2 -t 12 -ax map-ont -L GRCh38_full_analysis_set_plus_decoy_hla.fa) and samtools 1.6 with new -L flag:

Please note that all fast5 files for this project are available from the European Nucleotide Archive under the following project.
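The minimap2 alignment command above can be scripted. The Python sketch below reproduces the quoted minimap2 flags and pipes the SAM output into `samtools sort`, then indexes the result. The reads path, output name and sort thread count are placeholders, and the sort/index steps are an assumed post-processing step rather than the consortium's documented procedure. Per the minimap2 manual, -L moves CIGARs with more than 65535 operations into the CG tag so that older tools can still convert the alignments to BAM.

```python
import subprocess

REFERENCE = "GRCh38_full_analysis_set_plus_decoy_hla.fa"


def minimap2_argv(reads, reference=REFERENCE, threads=12):
    """minimap2 flags as quoted above; `reads` is a placeholder path."""
    return ["minimap2", "-t", str(threads), "-ax", "map-ont", "-L", reference, reads]


def samtools_sort_argv(out_bam, threads=4):
    """Read SAM from stdin ("-") and write a coordinate-sorted BAM.

    The thread count here is an assumption, not from the original setup.
    """
    return ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]


def run_pipeline(reads, out_bam):
    """Equivalent of `minimap2 ... | samtools sort ...` followed by indexing."""
    mm2 = subprocess.Popen(minimap2_argv(reads), stdout=subprocess.PIPE)
    sort = subprocess.Popen(samtools_sort_argv(out_bam), stdin=mm2.stdout)
    mm2.stdout.close()  # lets minimap2 get SIGPIPE if samtools exits early
    sort_rc, mm2_rc = sort.wait(), mm2.wait()
    if sort_rc != 0 or mm2_rc != 0:
        raise RuntimeError(f"pipeline failed (minimap2={mm2_rc}, samtools={sort_rc})")
    subprocess.run(["samtools", "index", out_bam], check=True)
```

Calling `run_pipeline("reads.fastq", "rel5.bam")` (both names hypothetical) requires minimap2 and samtools on the PATH; the two `*_argv` helpers can be inspected without either tool installed.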

Albacore 2.1

These basecalls are not recommended because of the above-mentioned chunk size problem, but are included for completeness.

Downloads

Minimap2 alignments (minimap2 -t 12 -ax map-ont -L GRCh38_full_analysis_set_plus_decoy_hla.fa):

Assembly

Adam Phillippy and Sergey Koren posted a new Canu 1.7 + WTDBG + Nanopolish assembly on their blog, built from a dataset equivalent to the Albacore 2.1 reads above.