Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about bin start-end used for samtools invocation #9

Closed
vineetbansal opened this issue Jul 30, 2020 · 0 comments
Closed

Question about bin start-end used for samtools invocation #9

vineetbansal opened this issue Jul 30, 2020 · 0 comments

Comments

@vineetbansal
Copy link
Collaborator

vineetbansal commented Jul 30, 2020

I notice that in the BAMBinning.py script, the chromosome range is split up into bins (of size -b as passed in to the script). Assuming this is 50kb, and -q is 11, the script seems to be invoking samtools repeatedly like so:

  • samtools view input.bam -c -q 11 chr11:0-50000
  • samtools view input.bam -c -q 11 chr11:50000-100000
    ...

My understanding is that samtools uses 1-based indexing, with both boundaries inclusive. Although passing in a 0 instead of a 1 in the first case seems to work (and gives identical results to the 1 case), this seems to be something that is handled as a special case by samtools, but fails when one tries to use 0 in the underlying htslib library (as I found when trying to use pysam). So perhaps this may need fixing?

Also, creating the ranges like above means that position 50000 is repeated between the first invocation and the second. Maybe this is intentional (I don't know much about the underlying science), in which case I can emulate this 1-position overlap in the pysam case too.

Overall, I'm finding that utilizing pysam (a thin python wrapper around the underlying htslib library) instead of samtools through popen seems to reduce runtime on a ~8GB BAM (I'm using SRR5906250.bam), from 220s to just 40s, while producing identical .bin files, so I think this might be worth pursuing..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant