Number of classified reads significantly differ between ont-dorado-server and dorado standalone #659

Closed
billytcl opened this issue Feb 29, 2024 · 6 comments

Comments

@billytcl

Issue Report

Please describe the issue:

I am using the Native Barcoding 96 kit with R10.4.1 at 5 kHz, and I'm observing a large difference between ont-dorado-server and standalone dorado: roughly 10-20% more reads are classified by ont-dorado-server, and correspondingly more end up unclassified with standalone dorado. The two runs use different methylation models, but I highly doubt that's the culprit. Could read splitting behave differently in standalone dorado and interact with barcode classification?

ont-dorado-server v7.0.8

ont_basecall_client -p 5555 -i pod5/ -s dorado_5mc_5khz_prom_pod5/ -c dna_r10.4.1_e8.2_400bps_5khz_modbases_5mc_sup_prom.cfg --recursive --compress_fastq --barcode_kits EXP-NBD196 --align_ref hs38_naa.mmi --bam_out --min_qscore 7 --do_read_splitting --max_read_split_depth 4 --index --progress_stats_frequency 1000 --read_batch_size 200000

samtools flagstat -@ 20 ../bam_sup/P10574_27431.D01_control.merged.sorted.bam
3143504 + 0 in total (QC-passed reads + QC-failed reads)
2542461 + 0 primary
515667 + 0 secondary
85376 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2701131 + 0 mapped (85.93% : N/A)
2100088 + 0 primary mapped (82.60% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

dorado 0.5.3

dorado basecaller sup,5mC_5hmC Seq_Output/20231003_1702_2F_PAO83072_97d6f8ad/ --recursive --min-qscore 7 --kit-name EXP-NBD196 --reference hs38_naa.mmi

samtools flagstat -@ 20 P10574_27431.D01_control.merged.sorted.bam
2679896 + 0 in total (QC-passed reads + QC-failed reads)
2120676 + 0 primary
467276 + 0 secondary
91944 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2426897 + 0 mapped (90.56% : N/A)
1867677 + 0 primary mapped (88.07% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
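
To put a number on the gap, the two flagstat outputs above can be compared directly. The following is a minimal sketch, not part of the original report: it assumes the primary record count is a reasonable proxy for the number of reads assigned to this sample (secondary and supplementary lines are extra alignment records for the same reads), and the helper names `parse_flagstat` and `percent_fewer` are hypothetical.

```python
import re

# Counts copied from the two samtools flagstat outputs in this report.
SERVER = """\
3143504 + 0 in total (QC-passed reads + QC-failed reads)
2542461 + 0 primary
2100088 + 0 primary mapped (82.60% : N/A)
"""

STANDALONE = """\
2679896 + 0 in total (QC-passed reads + QC-failed reads)
2120676 + 0 primary
1867677 + 0 primary mapped (88.07% : N/A)
"""


def parse_flagstat(text):
    """Return {category: QC-passed count} from samtools flagstat text (hypothetical helper)."""
    counts = {}
    for line in text.strip().splitlines():
        # Lines look like "<passed> + <failed> <category> (<optional details>)".
        m = re.match(r"(\d+) \+ (\d+) (.+?)(?: \(.*)?$", line.strip())
        if m:
            counts[m.group(3)] = int(m.group(1))
    return counts


def percent_fewer(reference, other):
    """Percentage by which `other` falls short of `reference`."""
    return 100.0 * (reference - other) / reference


server = parse_flagstat(SERVER)
standalone = parse_flagstat(STANDALONE)

for key in ("in total", "primary", "primary mapped"):
    drop = percent_fewer(server[key], standalone[key])
    print(f"{key}: {drop:.1f}% fewer with standalone dorado")
```

Under that assumption, the primary count is the most relevant of the three lines, since it counts each read once regardless of how many alignments it produced.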

Run environment:

  • Dorado version: ont-dorado-server 7.0.8 and dorado 0.5.3
  • Dorado command: see above
  • Operating system: Linux
  • Hardware (CPUs, Memory, GPUs): A100
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.):
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): R10.4.1
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
@tijyojwad
Collaborator

Hi @billytcl - can you try running the dorado basecaller command with the --no-trim option? I suspect that adapter trimming may be getting in the way of barcode classification, so running with --no-trim will help validate that theory.

@billytcl
Author

billytcl commented Mar 5, 2024 via email

@tijyojwad
Collaborator

Hi @billytcl - I want to confirm that's indeed the case for your situation. We have some improvements along that line internally for the next release already, and your input would help determine if more changes are needed.

@billytcl
Author

billytcl commented Mar 5, 2024

Ok! This may take a few days for the run to finish basecalling.

@ezherman

@billytcl did you get a chance to give this a go? I am interested in your result too!

@tijyojwad
Collaborator

tijyojwad commented Jun 6, 2024

Closing due to inactivity. FYI @billytcl, we have improved the interplay between adapter trimming and barcoding within dorado since the 0.6.0 release, so this should be less of a problem now.
