[hail] Implement concordance in Python. #6224
Merged
danking merged 4 commits into hail-is:master from May 31, 2019
This is unfortunately about 2x slower, partly because the column and global concordance calculations are not fused, and partly because AggArrayPerElement seems pretty slow right now and is dragging down the per-sample concordance.
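To illustrate what "not fused" costs, here is a minimal pure-Python sketch. It is not Hail code and not the implementation in this PR. The toy `calls` data, the integer category codes, and the function names `unfused` and `fused` are all hypothetical. It assumes that "fused" means computing the per-sample and global counts in a single pass over the entries rather than in two separate passes.

```python
from collections import Counter

# Hypothetical toy data: calls[sample] holds one (left_code, right_code)
# pair per variant. The integer codes are illustrative category labels only.
calls = {
    "s1": [(2, 2), (3, 3), (4, 3)],
    "s2": [(2, 2), (1, 3), (4, 4)],
}


def unfused(calls):
    """Compute both aggregations with two independent passes over the entries."""
    per_sample = {s: Counter(pairs) for s, pairs in calls.items()}   # pass 1
    overall = Counter(p for pairs in calls.values() for p in pairs)  # pass 2
    return per_sample, overall


def fused(calls):
    """Compute both aggregations in one pass, updating each per entry."""
    per_sample = {s: Counter() for s in calls}
    overall = Counter()
    for s, pairs in calls.items():
        for p in pairs:
            per_sample[s][p] += 1
            overall[p] += 1
    return per_sample, overall
```

Both functions return the same counts; the unfused version simply reads every entry twice, which is one plausible source of the slowdown described above.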
patrick-schultz
previously requested changes
May 31, 2019
def concordance2(left, right, *, _localize_global_statistics=True):
    print('conc2')
Member
The print is clearly debugging code. Is concordance2 itself also?
Contributor
Author
whoops! Yes, the whole function can be removed.
patrick-schultz
requested changes
May 31, 2019
Member
patrick-schultz
left a comment
Just one thought, which you're free to ignore. Otherwise looks good.
lit = hl.literal(included, dtype=hl.tset(hl.tstr))
left = left.filter_cols(lit.contains(left.col_key[0]))
right = right.filter_cols(lit.contains(right.col_key[0]))
Member
This might be faster using semi_join_cols. I assume the semi-join is an ordered merge, so linear, while this is n log n, doing a binary search for each column.
Contributor
Author
semi join can't do an ordered merge -- we don't store the columns ordered
Member
Oh of course. Never mind!
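The trade-off in this thread can be sketched in plain Python. This is an illustration only, not Hail's implementation of `filter_cols` or `semi_join_cols`; the function names and sample keys below are hypothetical. It shows why a per-key binary search works on columns in any order, while a linear ordered merge gives correct results only when both sides are sorted.

```python
from bisect import bisect_left


def filter_by_binary_search(keys, sorted_included):
    """Keep keys present in sorted_included.

    Costs O(log m) per key; `keys` may be in any order.
    """
    kept = []
    for k in keys:
        i = bisect_left(sorted_included, k)
        if i < len(sorted_included) and sorted_included[i] == k:
            kept.append(k)
    return kept


def filter_by_ordered_merge(sorted_keys, sorted_included):
    """Keep keys present in sorted_included.

    Linear in the combined length, but correct only if BOTH inputs are sorted.
    """
    kept, j = [], 0
    for k in sorted_keys:
        # Advance the cursor past included keys smaller than the current key.
        while j < len(sorted_included) and sorted_included[j] < k:
            j += 1
        if j < len(sorted_included) and sorted_included[j] == k:
            kept.append(k)
    return kept


included = ["s1", "s3"]              # sorted set of sample IDs to keep
unordered_cols = ["s3", "s1", "s2"]  # column keys stored in arbitrary order

print(filter_by_binary_search(unordered_cols, included))          # ['s3', 's1']
print(filter_by_ordered_merge(sorted(unordered_cols), included))  # ['s1', 's3']
print(filter_by_ordered_merge(unordered_cols, included))          # ['s3'] (drops 's1')
```

The last call shows the failure mode: once the cursor has advanced past `s1`, an out-of-order `s1` in the column keys can no longer be matched, so the merge would need a sort first.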
patrick-schultz
approved these changes
May 31, 2019