[Feature Request] mean_baq #528

caleblareau · 2017-08-16T04:12:31Z

Any interest / thoughts in a mean_baq function? The idea would be per base pair to, well, compute the average BAQ. It'd be really useful for my workflow at least.

I'd imagine that one could gently modify the count_coverage function to implement this quickly.

Please let me know if there's a straightforward way that I'm missing to do this with the existing functionality.

Thanks!

The text was updated successfully, but these errors were encountered:

AndreasHeger · 2017-08-21T09:42:25Z

Hi @caleblareau , I will take a look if there is easily accessible functionality in htslib.

caleblareau · 2017-08-26T02:20:07Z

Hey @AndreasHeger thanks for the thought. I did a bit of digging too and here's something that I'd like to do:

################
# Quality scores
################
n = int(maxBP)

counts = [0.00000001] * n # initialize with a pseudo count to avoid dividing by zero
qualities = [0.0] * n

baq_bam = bamfile.replace(".qc.bam", ".baq.bam")
pysam.calmd("-bAr", bamfile, fasta_file, ">", baq_bam) # this seems like it should work but it doesn't

# The rest is python code for calculating the mean quality but would be faster in C naturally
bam = pysam.AlignmentFile(baq_bam, "rb")
for read in bam:
	seq = read.seq
	quality = read.query_qualities
	for qpos, refpos in read.get_aligned_pairs(True):
		if qpos is not None and refpos is not None:
			counts[refpos] += 1
			qualities[refpos] += qpos

meanQual = [round(x/y,3) for x, y in zip(qualities, counts)]

qoutpre = outpre.replace("/temp/sparse_matrices/","/temp/quality/")
with open(qoutpre + ".quality.txt", 'w') as f:
	f.write(sample + "\n")
	for b in range(0,n):
		f.write(str(meanQual[b]) + "\n")

However, this doesn't produce the output to the .baq.bam file that I'm hoping to generate. It looks like you all have turned off unit testing. See:
https://github.com/pysam-developers/pysam/blob/master/tests/samtools_test.py#L89

For now, I can make a system call to samtools calmd:

samtools calmd -bAr in.qc.bamhg19.fasta > out.baq.bam

(see "Dump BAQ applied alignment for other SNP callers" http://www.htslib.org/doc/samtools.html)

Then, I loop over the qualities, but it'd be great of course to do this internally. Let me know if this makes any sense / any further details would help.

Obviously, it'd also be awesome to compute this mean quality without spitting out the intermediate .bam file.

AndreasHeger · 2017-12-15T13:30:40Z

This functionality is now part of the new pileup-API:

for column in samfile.pileup("chr1"):
     average_qualities = column.get_query_qualities() / len(column)

Please let me know if this is sufficient.

caleblareau · 2017-12-15T19:32:45Z

@AndreasHeger you're awesome; this will be super useful. Thanks!

AndreasHeger added the enhancement label Nov 10, 2017

AndreasHeger added the SAM/BAM/CRAM label Dec 8, 2017

AndreasHeger closed this as completed in 0e5cb84 Dec 15, 2017

caleblareau mentioned this issue Dec 15, 2017

samtools external dependency / fastq mean BAQ caleblareau/mgatk#7

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] mean_baq #528

[Feature Request] mean_baq #528

caleblareau commented Aug 16, 2017

AndreasHeger commented Aug 21, 2017

caleblareau commented Aug 26, 2017 •

edited

Loading

AndreasHeger commented Dec 15, 2017 •

edited

Loading

caleblareau commented Dec 15, 2017

[Feature Request] mean_baq #528

[Feature Request] mean_baq #528

Comments

caleblareau commented Aug 16, 2017

AndreasHeger commented Aug 21, 2017

caleblareau commented Aug 26, 2017 • edited Loading

AndreasHeger commented Dec 15, 2017 • edited Loading

caleblareau commented Dec 15, 2017

caleblareau commented Aug 26, 2017 •

edited

Loading

AndreasHeger commented Dec 15, 2017 •

edited

Loading