You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I notice that in the BAMBinning.py script, the chromosome range is split up into bins (of size -b as passed in to the script). Assuming this is 50kb, and -q is 11, the script seems to be invoking samtools repeatedly like so:
My understanding is that samtools uses 1-based indexing, with both boundaries inclusive. Although passing in a 0 instead of a 1 in the first case seems to work (and gives identical results to the 1 case), this seems to be something that is handled as a special case by samtools, but fails when one tries to use 0 in the underlying htslib library (as I found when trying to use pysam). So perhaps this may need fixing?
Also, creating the ranges like above means that position 50000 is repeated between the first invocation and the second. Maybe this is intentional (I don't know much about the underlying science), in which case I can emulate this 1-position overlap in the pysam case too.
Overall, I'm finding that utilizing pysam (a thin python wrapper around the underlying htslib library) instead of samtools through popen seems to reduce runtime on a ~8GB BAM (I'm using SRR5906250.bam), from 220s to just 40s, while producing identical .bin files, so I think this might be worth pursuing..
The text was updated successfully, but these errors were encountered:
I notice that in the
BAMBinning.py
script, the chromosome range is split up into bins (of size-b
as passed in to the script). Assuming this is50kb
, and-q
is11
, the script seems to be invokingsamtools
repeatedly like so:samtools view input.bam -c -q 11 chr11:0-50000
samtools view input.bam -c -q 11 chr11:50000-100000
...
My understanding is that
samtools
uses 1-based indexing, with both boundaries inclusive. Although passing in a0
instead of a1
in the first case seems to work (and gives identical results to the1
case), this seems to be something that is handled as a special case bysamtools
, but fails when one tries to use0
in the underlyinghtslib
library (as I found when trying to usepysam
). So perhaps this may need fixing?Also, creating the ranges like above means that position
50000
is repeated between the first invocation and the second. Maybe this is intentional (I don't know much about the underlying science), in which case I can emulate this 1-position overlap in thepysam
case too.Overall, I'm finding that utilizing
pysam
(a thin python wrapper around the underlyinghtslib
library) instead ofsamtools
throughpopen
seems to reduce runtime on a ~8GB BAM (I'm usingSRR5906250.bam
), from220s
to just40s
, while producing identical.bin
files, so I think this might be worth pursuing..The text was updated successfully, but these errors were encountered: