[hail] Implement concordance in Python. #6224

Merged
merged 4 commits into from May 31, 2019


Collaborator

@tpoterba tpoterba commented May 30, 2019

This is unfortunately about 2x slower than the existing implementation, partly because
the column and global concordance calculations are not fused,
and partly because the AggArrayPerElement aggregation seems pretty
slow right now and is dragging down the per-sample concordance.
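To make the "not fused" point concrete, here is a plain-Python sketch (not Hail code) contrasting a fused pass, which updates the global and per-sample matrices from a single scan of the entries, with an unfused version that scans the entries twice. The 0-4 category layout is assumed to follow the one documented for `hl.concordance` (no data, no call, hom ref, het, hom var); the tuple input format, the alt-allele-count genotype encoding, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed category layout (as documented for hl.concordance):
# 0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var.
NO_DATA, NO_CALL, HOM_REF, HET, HOM_VAR = range(5)

# Hypothetical genotype encoding: alt-allele count 0/1/2, None = no call.
Genotype = Optional[int]
# Hypothetical entry: (sample, left_present, left_gt, right_present, right_gt)
Entry = Tuple[str, bool, Genotype, bool, Genotype]
Matrix = List[List[int]]


def category(present: bool, gt: Genotype) -> int:
    """Map one side of an entry to a concordance category index."""
    if not present:
        return NO_DATA
    if gt is None:
        return NO_CALL
    return HOM_REF + gt  # 0 -> 2 (hom ref), 1 -> 3 (het), 2 -> 4 (hom var)


def empty_matrix() -> Matrix:
    return [[0] * 5 for _ in range(5)]


def fused_concordance(entries: Iterable[Entry]) -> Tuple[Matrix, Dict[str, Matrix]]:
    """One scan over the entries updates both the global and per-sample matrices."""
    global_conc = empty_matrix()
    per_sample: Dict[str, Matrix] = {}
    for sample, lp, lg, rp, rg in entries:
        i, j = category(lp, lg), category(rp, rg)
        global_conc[i][j] += 1
        per_sample.setdefault(sample, empty_matrix())[i][j] += 1
    return global_conc, per_sample


def unfused_concordance(entries: Iterable[Entry]) -> Tuple[Matrix, Dict[str, Matrix]]:
    """Same result, but the entries are scanned twice (once per statistic)."""
    entries = list(entries)  # materialized because it is traversed twice
    global_conc = empty_matrix()
    for _, lp, lg, rp, rg in entries:
        global_conc[category(lp, lg)][category(rp, rg)] += 1
    per_sample: Dict[str, Matrix] = {}
    for sample, lp, lg, rp, rg in entries:
        m = per_sample.setdefault(sample, empty_matrix())
        m[category(lp, lg)][category(rp, rg)] += 1
    return global_conc, per_sample
```

Both functions return identical matrices; the only difference is how many passes they make over the data, which is the cost the unfused version pays when the entries come from an expensive distributed scan.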

@@ -285,9 +285,51 @@ def has_field_of_type(name, dtype):
return mt.annotate_rows(**{name: result})


def concordance2(left, right, *, _localize_global_statistics=True):
print('conc2')
Collaborator

@patrick-schultz patrick-schultz May 31, 2019

The print is clearly debugging code. Is concordance2 itself also?

Collaborator Author

@tpoterba tpoterba May 31, 2019

whoops! Yes, the whole function can be removed.

Collaborator

@patrick-schultz patrick-schultz left a comment

Just one thought, which you're free to ignore. Otherwise looks good.


lit = hl.literal(included, dtype=hl.tset(hl.tstr))
left = left.filter_cols(lit.contains(left.col_key[0]))
right = right.filter_cols(lit.contains(right.col_key[0]))
Collaborator

@patrick-schultz patrick-schultz May 31, 2019

This might be faster using semi_join_cols. I assume the semi-join is an ordered merge, so linear, while this is n log n, doing a binary search for each column.

Collaborator Author

@tpoterba tpoterba May 31, 2019

semi join can't do an ordered merge -- we don't store the columns ordered

Collaborator

@patrick-schultz patrick-schultz May 31, 2019

Oh of course. Never mind!
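The trade-off in this exchange can be illustrated in plain Python (this is not how Hail implements either operation). The first function does one binary search per column into a sorted key set, which the reviewer describes as the cost of the `lit.contains(...)` filter: O(n log m), and correct for columns in any order. The second is a linear ordered merge, O(n + m), which is only correct when both sides are sorted. Function names and sample IDs are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def filter_by_binary_search(cols: Sequence[str], included_sorted: Sequence[str]) -> List[str]:
    """O(n log m): binary-search each column key in the sorted set; column order is irrelevant."""
    out = []
    for c in cols:
        k = bisect_left(included_sorted, c)
        if k < len(included_sorted) and included_sorted[k] == c:
            out.append(c)
    return out


def filter_by_ordered_merge(cols_sorted: Sequence[str], included_sorted: Sequence[str]) -> List[str]:
    """O(n + m) linear merge; silently drops matches if cols_sorted is not actually sorted."""
    out = []
    i = j = 0
    while i < len(cols_sorted) and j < len(included_sorted):
        if cols_sorted[i] == included_sorted[j]:
            out.append(cols_sorted[i])
            i += 1  # keep j so duplicate column keys still match
        elif cols_sorted[i] < included_sorted[j]:
            i += 1
        else:
            j += 1
    return out


cols = ["s3", "s1", "s2"]  # columns in storage order, not sorted
included = ["s1", "s3"]    # sorted key set
```

On the unordered `cols`, the binary-search filter returns `["s3", "s1"]`, while the merge returns only `["s3"]` because it skips past `"s1"`. Sorting the columns first would make the merge correct but costs O(n log n) on its own, so with unordered column storage the merge does not beat the per-column binary search.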

@danking danking merged commit 9d103e1 into hail-is:master May 31, 2019
1 check passed

3 participants