New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{AH} separate header into AlignmentHeader from AlignmentFile, closes … #518

Merged
merged 8 commits into from Dec 1, 2017

Conversation

Projects
None yet
2 participants
@AndreasHeger
Contributor

AndreasHeger commented Jul 24, 2017

#517

This PR separates header functionality into a separate class (AlignmentHeader, cf. VariantHeader) and propagates the information through to iterators and AlignedSegments. The benefit is that:

  1. reference names can be looked up even after a file has been closed or is out-of-scope
  2. reference names are always available and no need to look up via tid.

At the moment AlignmentHeaders are read-only objects, i.e. the header needs to be created before any derived AlignedSegments are being created.

A few backwards incompatible changes are the consequence of this refactoring, most notable that AlignmentFile.header now returns AlignmentHeader and not a dictionary.

@AndreasHeger AndreasHeger requested a review from bioinformed Jul 24, 2017

@bioinformed

PR looks great -- I left some comments inline and there are some missing changes to AlignedSegment:

  • AlignedSegment.reference_name needs a validating setter
  • AlignedSegment.reference_id needs a validating setter
  • AlignedSegment.next_reference_name needs a validating setter
  • AlignedSegment.next_reference_id needs a validating setter
elif header:
self.header = build_header_from_dict(header)
self.header = makeAlignmentHeader_from_dict(header)

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

Allow header to be an AlignmentHeader

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

###########################################################################
# methods/properties referencing the header. These are mostly for backwards
# compatibility for pysam < 0.12
def is_valid_tid(self, int tid):

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

The first few are to support parse_region, which is defined in the HTSFile base class.

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

ok, removed misleading comment

with nogil:
self.header = sam_hdr_read(self.htsfile)
hdr = sam_hdr_read(self.htsfile)
assert hdr != NULL

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

This needs to raise a real exception on failure.

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

return header

header = load_bam()
self.assertTrue(header != None)

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

Can be just self.assertTrue(header)

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

cdef class AlignedSegment
cdef makeAlignedSegment(bam1_t * src, AlignmentFile alignment_file):
cdef makeAlignedSegment(bam1_t * src, AlignmentHeader header):

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

I prefer the convention where there is no space after *. It seems the most common convention in well written C code and what I've been emulating in the BCF code.

This comment has been minimized.

@AndreasHeger
@@ -1348,7 +1541,7 @@ cdef class AlignmentFile(HTSFile):

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

Add tid translation here for when read.header is not self.header? And add a translate method to AlignedSegment.

This is a breaking change that needs to happen at some point, because it breaks apps that have already translated their tids.

'''return an AlignedSegment object constructed from `src`'''
# note that the following does not call __init__
cdef AlignedSegment dest = AlignedSegment.__new__(AlignedSegment)
dest._delegate = bam_dup1(src)
dest._alignment_file = alignment_file
dest.header = header

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

must raise an exception if header is false (None or destroyed).

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

def copy(self):
return makeAlignmentHeader(bam_hdr_dup(self.ptr))

def clear(self):

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

Strongly recommend not allowing a header to be cleared unless deallocated. Allowing ptr = NULL in a live header will do more than invalidate objects, it will cause segfaults when accessing those objects.

This comment has been minimized.

@AndreasHeger

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

Removed clear()

This is a read-only attribute."""
def __get__(self):
if self.ptr:

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

No need to check self.ptr if live AlignmentHeader objects require it.

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

def __get__(self):
t = []
if self.ptr:
for x from 0 <= x < self.ptr.n_targets:

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

probably also worth updating to range() syntax, rather than the archaic pyrex syntax.

This comment has been minimized.

@AndreasHeger

AndreasHeger Jul 31, 2017

Contributor

done

@@ -57,15 +57,15 @@ cdef class AlignedSegment:
cpdef has_tag(self, tag)

# returns a valid sam alignment string
cpdef tostring(self, AlignmentFile_t handle)
cpdef tostring(self, htsfile=*)

This comment has been minimized.

@bioinformed

bioinformed Jul 24, 2017

Member

Not clear why this method needs to be cpdef. Do we call it often from a cdef function?

AndreasHeger added some commits Jul 31, 2017

@@ -1766,7 +1761,8 @@ cdef class IteratorRow:
# better: use seek/tell?
with nogil:
hdr = sam_hdr_read(self.htsfile)
assert hdr != NULL
if hdr is NULL:

This comment has been minimized.

@bioinformed

bioinformed Jul 31, 2017

Member

I'm not sure how Cython treats tests for object identity with NULL. I typically use a boolean test instead: if not hdr.

add_sq_text=True):

self.ptr = bam_hdr_init()
if self.ptr is NULL:

This comment has been minimized.

@bioinformed

bioinformed Aug 1, 2017

Member

is -> ==

@pyup-bot pyup-bot referenced this pull request Aug 24, 2017

Closed

Update pysam to 0.12 #48

@pyup-bot pyup-bot referenced this pull request Sep 7, 2017

Merged

Update pysam to 0.12.0.1 #52

@AndreasHeger

This comment has been minimized.

Contributor

AndreasHeger commented Dec 1, 2017

TODO, @bioinformed suggestion:

... add an option to the AlignmentFile constructor which allows or restricts passing the AlignmentHeader to AlignedSegments. If the header is None, then tids are handled as they are now. If not, there will be intelligent setting and remapping as aligned segments are passed between AlignmentFiles

@AndreasHeger AndreasHeger merged commit 3ea6e3a into master Dec 1, 2017

1 check was pending

continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@AndreasHeger AndreasHeger deleted the AH-AddAlignmentHeader branch Dec 1, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment