Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alignedread.aend is None sometimes and wellington_footprints.py crashes #16

Closed
JohnReid opened this issue Jul 23, 2016 · 2 comments
Closed
Assignees

Comments

@JohnReid
Copy link

JohnReid commented Jul 23, 2016

Hi,
I have been running into a problem where alignedread.aend returns None occasionally and wellington_footprints.py exists with the following stack trace:

+ wellington_footprints.py -fdrlimit -5 /home/jer15/Dev/Saturn/Data/DNASE/peaks/relaxed/DNASE.K562.relaxed.bed /home/jer15/Dev/Saturn/Data/DNASE/bams/DNASE.K562.biorep2.techrep16.bam /home/jer15/Dev/Saturn/Data/DNASE/bams/footprints/K562/relaxed/2/16/                                           
Reading BED File...                                                                                                                                                                                                                                                                                   
Calculating footprints...                                                                                                                                                                                                                                                                             
Traceback (most recent call last):                                                                                                                                                                                                                                                                    
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 150, in <module>                                                                                                                                                                                             
    multiWellington(orderedbychr,reads, shoulder_sizes = clargs.shoulder_sizes ,footprint_sizes = clargs.footprint_sizes, FDR_cutoff=clargs.FDR_cutoff,FDR_iterations=clargs.FDR_iterations,bonferroni = clargs.bonferroni)                                                                           
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 140, in multiWellington                                                                                                                                                                                      
    fp = footprinting.wellington(i,reads,**kwargs)                                                                                                                                                                                                                                                    
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/footprinting/__init__.py", line 46, in __init__                                                                                                                                                              
    self.reads        = reads[self.interval]                                                                                                                                                                                                                                                          
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 192, in __getitem__                                                                                                                                                                       
    return self.get_cut_values(vals)                                                                                                                                                                                                                                                                  
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 182, in get_cut_values                                                                                                                                                                    
    retval = self.__lookupReadsWithoutCache(startbp,endbp,chrom)                                                                                                                                                                                                                                      
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 122, in __lookupReadsWithoutCache                                                                                                                                                         
    a = int(alignedread.aend)                                                                                                                                                                                                                                                                         
TypeError: int() argument must be a string or a number, not 'NoneType'

The docs for pysam say

aligned reference position of the read on the reference genome.

reference_end points to one past the last aligned residue. Returns None if not available (read is unmapped or no cigar alignment present).

I thought all the reads were aligned but I could be wrong (I am a bit new to read mapping and DNase-seq and the data has come from a third party).

Is there any way for wellington_footprints.py to handle these problems gracefully rather than crashing? Can it just ignore these reads? This only seems to happen for reads mapped to the reverse strand: alignedread.pos is never None.

And thanks for your nice software :)

@jpiper
Copy link
Owner

jpiper commented Jul 23, 2016

Thanks for the bug report John!

I've noticed that pySAM has actually deprecated aend, so I'll fix that as well so pyDNase doesn't suddenly stop working.

By default, pyDNase just takes all the reads provided to it in the BAM - letting the user decide whether badly mapped/secondary alignments should be utilised. I (stupidly) never handled the situation where unmapped reads may be present in the BAM, and likewise, because not a lot of people are doing paired end alignment for DNase-seq (at least, none of the public data is), I didn't handle ignoring non-primary paired alignments.

For the minimum, I've changed it to only process mapped reads (see 4aca97f) and I'll try to release this to master today, which should fix your issue.

For paired end reads, there's a lot of considerations. I'm thinking a command line flag to ignore secondary alignments or not, but I'll think about this a bit more before implementing something here -opinions and input welcome!

@jpiper jpiper self-assigned this Jul 23, 2016
@JohnReid
Copy link
Author

Great thanks for the quick response! I can't say I have any opinions on the secondary alignments as I'm a bit new to this read mapping and still trying to get the hang of it. If I suddenly form some strong opinions I'll be sure to let you know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants