@nmousavi
Contributor

@nmousavi nmousavi commented Jan 31, 2018

Hash of reference_bases and alternate_bases is used for the merge key.

fixes #102
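
A minimal sketch of the idea, not the PR's actual code: the merge key joins the variant's position fields with hex digests of the bases rather than the raw strings, so the key length no longer grows with the length of the bases. The `Variant` tuple, the `get_merge_key` function, the `:` separator, and the `reference_name`/`start` fields are hypothetical stand-ins; `end`, `reference_bases`, and `alternate_bases` follow the diff in this PR. It is written for Python 3, so strings are encoded before hashing.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for the pipeline's variant record.
Variant = namedtuple(
    'Variant',
    ['reference_name', 'start', 'end', 'reference_bases', 'alternate_bases'])


def get_merge_key(variant):
    """Builds a string key; the bases are replaced by fixed-length digests."""
    ref_digest = hashlib.sha256(
        (variant.reference_bases or '').encode('utf-8')).hexdigest()
    alt_digest = hashlib.sha256(
        ','.join(variant.alternate_bases or []).encode('utf-8')).hexdigest()
    # The ':' separator and the leading position fields are assumptions.
    return ':'.join(str(x) for x in [
        variant.reference_name or '',
        variant.start or '',
        variant.end or '',
        ref_digest,
        alt_digest])


short = Variant('chr1', 1000, 1001, 'A', ['T'])
long_ = Variant('chr1', 1000, 6000, 'A' * 5000, ['A'])

# The two keys differ, but have the same length even though one variant
# carries 5000 reference bases.
short_key = get_merge_key(short)
long_key = get_merge_key(long_)
```

Two variants with identical fields still produce identical keys, so grouping by key behaves as it would with the raw strings, barring hash collisions.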

Contributor

@arostamianfar arostamianfar left a comment

Thanks! Let's run it on platinum genomes to ensure everything works as expected.
@bashir2 FYI, this would be a great candidate for the 'large' integration test based on platinum genomes.

- variant.reference_bases or '',
- ','.join(variant.alternate_bases or [])]])
+ hashlib.sha256(variant.reference_bases or '').hexdigest(),
+ hashlib.sha256(','.join(variant.alternate_bases or [])).hexdigest()]])
Contributor

Consider making a private method _get_hash_key and use it here.

Contributor Author

Done!

variant.end or '',
- variant.reference_bases or '',
- ','.join(variant.alternate_bases or [])]])
+ hashlib.sha256(variant.reference_bases or '').hexdigest(),
Contributor

I wonder if md5 is good enough here. SHA-256 seems like overkill :p

Contributor Author

Used md5 instead.

def test_get_merge_keys(self):
strategy = move_to_calls_strategy.MoveToCallsStrategy(None, None, None)

empty_string_hash = hashlib.sha256('').hexdigest()
Contributor

Consider making a helper method here as well. I think it's ok to use the one in the private lib too.

Contributor Author

Done!

@nmousavi nmousavi force-pushed the hash-merge branch 2 times, most recently from a162a85 to 30c6b03 on January 31, 2018 21:44
@coveralls

coveralls commented Jan 31, 2018

Pull Request Test Coverage Report for Build 283

  • 11 of 11 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 90.629%

Totals Coverage Status
Change from base Build 277: 0.02%
Covered Lines: 2766
Relevant Lines: 3052

💛 - Coveralls

Hash of reference_bases and alternate_bases is used for the merge key.

fixes googlegenomics#102
Contributor

@arostamianfar arostamianfar left a comment

Thanks!

@nmousavi
Contributor Author

Job finished successfully on platinum dataset.

Thanks for the review!

@nmousavi nmousavi merged commit 9c0e492 into googlegenomics:master Jan 31, 2018
@nmousavi nmousavi deleted the hash-merge branch April 26, 2018 18:10

Successfully merging this pull request may close these issues.

Use a hash for reference/alternate_bases when making the merge key
