Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash bam features #1145

Merged
merged 4 commits into from Jul 30, 2018

Conversation

Projects
None yet
2 participants
@cmdcolin
Copy link
Contributor

cmdcolin commented Jul 26, 2018

This is an attempt at fixing #1108

The problem of having 100% exactly the same data for reads is still not solved because external data like byte offsets leads to unmatched ID when the same bam read spans multiple chunks hence why #1087 was reverted

@wafflebot wafflebot bot added the in progress label Jul 26, 2018

@rbuels rbuels added this to the 1.15.1 milestone Jul 28, 2018

@@ -22,6 +22,26 @@ var Crc32 = {
return re;
},

crc32_raw: function( bytes, crc ) {
if( crc == window.undefined ) crc = 0;

This comment has been minimized.

@rbuels

rbuels Jul 28, 2018

Collaborator

you want a === here i think.

This comment has been minimized.

@cmdcolin
@@ -100,7 +101,7 @@ var Feature = Util.fastDeclare(
},

id: function() {
return this._get('name')+'/'+this._get('md')+'/'+this._get('cigar')+'/'+this._get('start')+'/'+this._get('multi_segment_next_segment_reversed');
return Crc32.crc32_raw(this.bytes.byteArray.slice(this.bytes.start, this.bytes.end));

This comment has been minimized.

@rbuels

rbuels Jul 28, 2018

Collaborator

do you think it would be worth it performance-wise to avoid the slice and instead make crc32_raw accept bounds for the calculation? e.g. crc32_raw(this.bytes.byteArray, this.bytes.start, this.bytes.end)

This comment has been minimized.

@rbuels

rbuels Jul 28, 2018

Collaborator

also, is id() being memoized? We need to make sure this only gets calculated once per feature.

This comment has been minimized.

@cmdcolin

cmdcolin Jul 30, 2018

Author Contributor

id() is memoized, the _get() arguments stores all lazyily parsed attributes in this.data and does not again parse it

also I changed the crc to avoid slice now :)

@rbuels rbuels merged commit 36931f7 into dev Jul 30, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@wafflebot wafflebot bot removed the in progress label Jul 30, 2018

@cmdcolin cmdcolin deleted the hash_bam_features branch Aug 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.