Extract reads from BAM #6

Closed
pkerbs opened this issue Jan 19, 2024 · 3 comments · Fixed by #7
Labels
enhancement New feature or request

Comments

pkerbs commented Jan 19, 2024

Hi,
thank you for developing this tool. It has worked great for me so far.
I am using it to extract reads from a BAM file generated by Dorado (basecalling with modified bases, followed by mapping).

However, I need the reads to stay mapped so that I can analyze the captured mC at certain positions over sequencing time.
At the moment this takes three steps: extracting the reads from the BAM to a FASTQ file, extracting the reads for a specific time period with ontime, and remapping them with minimap (using the -y parameter to keep the MM:Z tag).

Would it be possible to add a function that extracts reads by time from BAM to BAM, preserving the mapping and the methylation tag?

I think this would be highly useful for others too.

Cheers,
Paul
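
The request above amounts to filtering alignment records by read start time while leaving every other field untouched. Below is a minimal Python sketch of that idea, not ontime's implementation. It works on SAM text lines (for example the output of `samtools view -h`) and assumes each record stores its start time in an `st:Z:` tag as an ISO 8601 timestamp. The function names are hypothetical.

```python
from datetime import datetime


def parse_start_time(sam_line):
    """Return the start time from the record's st:Z tag, or None if it has none.

    Assumption: the tag holds an ISO 8601 timestamp such as
    2024-02-08T15:30:00.000+00:00 (or with a trailing 'Z').
    """
    # Optional tags start at the 12th tab-separated column of a SAM record
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("st:Z:"):
            # Replace a trailing 'Z' so older Python versions can parse it
            return datetime.fromisoformat(field[5:].replace("Z", "+00:00"))
    return None


def filter_by_time(sam_lines, earliest, latest):
    """Yield header lines plus every record that started within [earliest, latest]."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through so the output stays valid SAM
            yield line
            continue
        start = parse_start_time(line)
        if start is not None and earliest <= start <= latest:
            # The record is emitted verbatim: mapping columns and tags are untouched
            yield line
```

Because matching records are passed through unchanged, the alignment columns and tags such as MM:Z are kept exactly as they were in the input.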

@mbhall88
Owner

This sounds like a good idea. I'll try and get around to it this week or next.

@mbhall88 mbhall88 added the enhancement New feature or request label Jan 21, 2024
mbhall88 added a commit that referenced this issue Feb 6, 2024
* feat: support SAM/BAM [#6]

* ci: update actions
@mbhall88
Owner

mbhall88 commented Feb 6, 2024

This is now supported in v0.3.0

@pkerbs
Author

pkerbs commented Feb 12, 2024

Hi @mbhall88,
thank you for your work on that. I finally found some time to test it and unfortunately ran into an issue with the new version.
I ran it on a BAM file like this:

ontime test.bam --to 2h > test_2h.bam

and got this error message:

[2024-02-12T20:25:09Z INFO ] Extracting read start times...
[2024-02-12T20:25:14Z INFO ] Gathered start times for 146140 reads
[2024-02-12T20:25:14Z INFO ] First and last timestamps in the input are 2024-02-08T15:24:22.872Z and 2024-02-09T16:35:34.702Z
[2024-02-12T20:25:14Z INFO ] Extracting reads with a start time between 2024-02-08 15:24:22.872 and 2024-02-08 17:24:22.872...
Error: Could not read the header of the input file

Caused by:
    invalid BAM header
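
The extraction window in the log above is the first timestamp in the input plus the `--to` offset. The following Python sketch shows that arithmetic using the values from the log. `parse_offset` is a hypothetical helper that only handles a single number followed by `s`, `m`, `h` or `d`. It is not ontime's own parser, which may accept other forms.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit for the simplified offset syntax assumed here
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_offset(text):
    """Turn an offset such as '2h' or '30m' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported offset: {text!r}")
    return timedelta(seconds=float(match.group(1)) * UNIT_SECONDS[match.group(2)])


# First timestamp reported in the log
first = datetime.fromisoformat("2024-02-08T15:24:22.872+00:00")

# Upper bound of the window for --to 2h
window_end = first + parse_offset("2h")
print(window_end.isoformat(timespec="milliseconds"))
# 2024-02-08T17:24:22.872+00:00
```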

The BAM file was generated by Dorado v0.5.2 like this:

dorado basecaller \
  sup,5mCG_5hmCG \
  <path to pod5 files> \
  --reference ref/chm13v2.fa \
  --bandwidth 500,20000 \
  -Y

Did I run it incorrectly, or did I miss a parameter for the header?
Thanks in advance.
Kind regards,
Paul
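
One quick sanity check for an "invalid BAM header" error is to confirm that the input really starts with the BAM magic bytes. This is not a diagnosis of this particular report. A BAM file is BGZF (gzip-compatible) compressed, and its decompressed data begins with `BAM\x01`. Below is a minimal Python sketch; `looks_like_bam` is a hypothetical helper name.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file decompresses to data starting with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            # The first four decompressed bytes of a BAM file are 'B', 'A', 'M', 0x01
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip/BGZF data (for example plain-text SAM), unreadable, or truncated
        return False
```

A `False` result for a file with a `.bam` extension suggests it is actually SAM text or was truncated. A `True` result only shows that the first bytes are valid, not that the rest of the header is.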
