Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] mean_baq #528

Closed
caleblareau opened this issue Aug 16, 2017 · 4 comments
Closed

[Feature Request] mean_baq #528

caleblareau opened this issue Aug 16, 2017 · 4 comments

Comments

@caleblareau
Copy link

Any interest / thoughts in a mean_baq function? The idea would be per base pair to, well, compute the average BAQ. It'd be really useful for my workflow at least.

I'd imagine that one could gently modify the count_coverage function to implement this quickly.

Please let me know if there's a straightforward way that I'm missing to do this with the existing functionality.

Thanks!

@AndreasHeger
Copy link
Contributor

Hi @caleblareau , I will take a look if there is easily accessible functionality in htslib.

@caleblareau
Copy link
Author

caleblareau commented Aug 26, 2017

Hey @AndreasHeger thanks for the thought. I did a bit of digging too and here's something that I'd like to do:

################
# Quality scores
################
n = int(maxBP)

counts = [0.00000001] * n # initialize with a pseudo count to avoid dividing by zero
qualities = [0.0] * n

baq_bam = bamfile.replace(".qc.bam", ".baq.bam")
pysam.calmd("-bAr", bamfile, fasta_file, ">", baq_bam) # this seems like it should work but it doesn't

# The rest is python code for calculating the mean quality but would be faster in C naturally
bam = pysam.AlignmentFile(baq_bam, "rb")
for read in bam:
	seq = read.seq
	quality = read.query_qualities
	for qpos, refpos in read.get_aligned_pairs(True):
		if qpos is not None and refpos is not None:
			counts[refpos] += 1
			qualities[refpos] += qpos

meanQual = [round(x/y,3) for x, y in zip(qualities, counts)]

qoutpre = outpre.replace("/temp/sparse_matrices/","/temp/quality/")
with open(qoutpre + ".quality.txt", 'w') as f:
	f.write(sample + "\n")
	for b in range(0,n):
		f.write(str(meanQual[b]) + "\n")

However, this doesn't produce the output to the .baq.bam file that I'm hoping to generate. It looks like you all have turned off unit testing. See:
https://github.com/pysam-developers/pysam/blob/master/tests/samtools_test.py#L89

For now, I can make a system call to samtools calmd:

samtools calmd -bAr in.qc.bamhg19.fasta > out.baq.bam

(see "Dump BAQ applied alignment for other SNP callers" http://www.htslib.org/doc/samtools.html)

Then, I loop over the qualities, but it'd be great of course to do this internally. Let me know if this makes any sense / any further details would help.

Obviously, it'd also be awesome to compute this mean quality without spitting out the intermediate .bam file.

@AndreasHeger
Copy link
Contributor

AndreasHeger commented Dec 15, 2017

This functionality is now part of the new pileup-API:

for column in samfile.pileup("chr1"):
     average_qualities = column.get_query_qualities() / len(column)

Please let me know if this is sufficient.

@caleblareau
Copy link
Author

@AndreasHeger you're awesome; this will be super useful. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants