Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add export_bgen #6462

Merged
merged 10 commits into from Jun 27, 2019
Merged

add export_bgen #6462

merged 10 commits into from Jun 27, 2019

Conversation

@cseed
Copy link
Collaborator

@cseed cseed commented Jun 25, 2019

Exports BGEN 1.2 with 8 bits per probability.

Needs docs and more tests, not quite ready yet.

Cotton Seed and others added 3 commits Jun 25, 2019
Copy link
Collaborator

@tpoterba tpoterba left a comment

I'll take review for this PR. Two minor comments, let me know when you've written docs/tests

Loading

tmp = new_temp_file()
hl.export_bgen(bgen, tmp)
hl.index_bgen(tmp + '.bgen')
self.assertTrue(bgen._same(hl.import_bgen(tmp + '.bgen',
Copy link
Collaborator

@tpoterba tpoterba Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer plain python assert here -- pytest error messages are better than unittest's

Loading

uncompressedData(samplePloidyStart + i) = ploidy
roundWithConstantSum(gpResized, fractional, index, indexInverse, uncompressedData, totalProb)
} else {
warn(s"GP sum was not in the range [0.999, 1.001]. Found $gpSum. Emitting missing probabilities.")
Copy link
Collaborator

@tpoterba tpoterba Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this warning will go to the worker logs, creating massive logs if something is wrong in the data but not propagating back to the user. Prefer an incremented count of bad GPs per partition which is collected with the partition file name.

Loading

@tpoterba tpoterba self-assigned this Jun 25, 2019
@cseed
Copy link
Collaborator Author

@cseed cseed commented Jun 27, 2019

Thanks, @tpoterba! I wasn't done so I hadn't assigned anyone, just wanted to run the tests, but it should be good to go now.

Loading

@cseed cseed dismissed tpoterba’s stale review Jun 27, 2019

addressed

@cseed
Copy link
Collaborator Author

@cseed cseed commented Jun 27, 2019

Note, @jigold gets the credit for this, ExportBGEN.scala was basically taken from an old PR of hers that never made it in. I just simplified it to only support the 8-bit case.

Loading

@danking danking merged commit 0eef603 into hail-is:master Jun 27, 2019
1 check passed
Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants