Additional info in fastq header? #433
Comments
Hi @jfnjdoh - that's intended behavior. The BAM output (default) from dorado contains more information in the header. Can you share your motivation for using |
Hi @tijyojwad, they ran the sample for 24 hours and wanted to know how long they need to run a sample in order to reach a particular level of assembly quality. So I was going to run assemblies on all reads collected within the first 2 hours, 4 hours, 6 hours, etc., and see how that affected the final assembly. Hence I needed the time each read was collected. If it matters, the assembler is Flye. The live-called data has that info in the fastq headers, so I assumed dorado would more or less mimic that output. Seeing as how that's intended behavior, running If that's all there is to it and my approach to doing what I'm trying to do is correct, then you can close this issue. |
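The time-slicing approach described above can be sketched as follows. This is only a minimal illustration, not anything dorado provides: it assumes each FASTQ header carries a `start_time=<ISO 8601 timestamp>` field among space-separated `key=value` pairs (as live-called files typically do), treats the earliest read as the start of the run, and reads "collected after N hours" as the cumulative set of reads up to N hours. The helper names `parse_start_time`, `read_fastq`, and `reads_within` are hypothetical.

```python
from datetime import datetime, timedelta


def parse_start_time(header):
    """Return the start_time field of a FASTQ header as a datetime, or None.

    Assumes space-separated key=value fields, e.g.
    '@read1 runid=abc start_time=2023-01-01T00:00:00Z flow_cell_id=XYZ'.
    """
    for field in header.split():
        if field.startswith("start_time="):
            value = field.split("=", 1)[1]
            # Older Python versions reject a trailing 'Z', so normalise it.
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return None


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        sep = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, sep, qual


def reads_within(records, hours):
    """Keep reads that started within `hours` of the earliest read.

    Assumption: the earliest start_time stands in for the start of the run.
    Reads without a start_time field are dropped.
    """
    records = list(records)
    times = [parse_start_time(r[0]) for r in records]
    known = [t for t in times if t is not None]
    if not known:
        return []
    cutoff = min(known) + timedelta(hours=hours)
    return [r for r, t in zip(records, times) if t is not None and t <= cutoff]
```

Each subset (2 hours, 4 hours, 6 hours, and so on) could then be written back out as its own FASTQ file and assembled separately. Headers that contain only the read ID yield no reads here, which is why the timestamp has to be present in the header first.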
Hi @jfnjdoh - thanks for the details. That's quite an interesting use case! For now using |
If I could add one more comment, knowing how quickly you could get your data seems to be an important factor, see the following paper https://www.nature.com/articles/s41586-023-06615-2. Our use case is more of a "time is critical, how quickly can we accurately identify what's in this sample?". It's also useful for more mundane things like "it's late, should I start this now or just let it run overnight?" |
Hi @jfnjdoh, I am running into the same issue as you. I gave a try to your Could you please elaborate a bit more on how you ran it to have the complete fasta headers? Best, |
@NikoLichi The full command is to use |
Thank you for your help and fast reply. I'll give it a try :) |
I received some data that was called using live base calling on the machine. The fastq files have lots of information in the headers for each read like the start time, the flow cell id, etc. However, I called the same files from their pod5's using
dorado basecaller --emit-fastq <model> <pod5> > <fastq>
and the headers only have the read ID. Is that the intended behavior? Can I get that additional info out of dorado or is it not stored in the pod5?