[hail] Implement concordance in Python. #6224

Merged
4 commits merged into hail-is:master on May 31, 2019

3 participants
@tpoterba
Collaborator

commented May 30, 2019

Unfortunately, this is about 2x slower, partly because the
column and global concordance calculations are not fused,
and partly because AggArrayPerElement seems pretty slow
right now and is dragging down the per-sample concordance.
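
To illustrate what "fused" means here, the following is a minimal pure-Python sketch, not Hail's implementation. It assumes a toy layout where each dataset is a list of variant rows and each row holds one call per sample (the number of alternate alleles, or None for a missing call). The state indices follow the concordance convention of 0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var; state 0 is never produced in this toy because every variant is assumed present on both sides. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

N_STATES = 5  # 0 no data, 1 no call, 2 hom ref, 3 het, 4 hom var

Table = List[List[int]]
Calls = List[List[Optional[int]]]  # rows are variants, columns are samples


def state(call: Optional[int]) -> int:
    """Map a call (alt-allele count or None) to a concordance state index."""
    return 1 if call is None else call + 2


def empty_table() -> Table:
    return [[0] * N_STATES for _ in range(N_STATES)]


def concordance_unfused(left: Calls, right: Calls) -> Tuple[Table, List[Table]]:
    """Compute per-sample and global tables in two separate passes over the data."""
    n_samples = len(left[0]) if left else 0
    per_sample = [empty_table() for _ in range(n_samples)]
    for lrow, rrow in zip(left, right):  # pass 1: per-sample tables
        for j, (l, r) in enumerate(zip(lrow, rrow)):
            per_sample[j][state(l)][state(r)] += 1
    global_table = empty_table()
    for lrow, rrow in zip(left, right):  # pass 2: global table
        for l, r in zip(lrow, rrow):
            global_table[state(l)][state(r)] += 1
    return global_table, per_sample


def concordance_fused(left: Calls, right: Calls) -> Tuple[Table, List[Table]]:
    """Compute both results in a single pass, updating both tables per entry."""
    n_samples = len(left[0]) if left else 0
    per_sample = [empty_table() for _ in range(n_samples)]
    global_table = empty_table()
    for lrow, rrow in zip(left, right):
        for j, (l, r) in enumerate(zip(lrow, rrow)):
            i, k = state(l), state(r)
            per_sample[j][i][k] += 1
            global_table[i][k] += 1
    return global_table, per_sample
```

Both functions return the same tables; the unfused version simply reads every entry twice, which is the kind of extra work the description above attributes part of the slowdown to.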

@@ -285,9 +285,51 @@ def has_field_of_type(name, dtype):
    return mt.annotate_rows(**{name: result})


def concordance2(left, right, *, _localize_global_statistics=True):
    print('conc2')

@patrick-schultz

patrick-schultz May 31, 2019

Collaborator

The print is clearly debugging code. Is concordance2 itself also?

@tpoterba

tpoterba May 31, 2019

Author Collaborator

whoops! Yes, the whole function can be removed.

@patrick-schultz
Collaborator

left a comment

Just one thought, which you're free to ignore. Otherwise looks good.


lit = hl.literal(included, dtype=hl.tset(hl.tstr))
left = left.filter_cols(lit.contains(left.col_key[0]))
right = right.filter_cols(lit.contains(right.col_key[0]))

@patrick-schultz

patrick-schultz May 31, 2019

Collaborator

This might be faster using semi_join_cols. I assume the semi-join is an ordered merge, so linear, while this is n log n, doing a binary search for each column.

@tpoterba

tpoterba May 31, 2019

Author Collaborator

semi_join_cols can't do an ordered merge: we don't store the columns in sorted order

@patrick-schultz

patrick-schultz May 31, 2019

Collaborator

Oh of course. Never mind!
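
For readers following the exchange above, here is a small pure-Python sketch of the trade-off under discussion. It is not Hail code: the sorted list searched with bisect stands in for the set lookup that the reviewer assumes is a binary search, and both function names are hypothetical. It shows why a linear ordered merge only works when the column keys are themselves sorted.

```python
from bisect import bisect_left
from typing import List


def filter_by_lookup(cols: List[str], included_sorted: List[str]) -> List[str]:
    """Keep columns whose key is in included_sorted.

    One binary search per column, so O(n log m); works for any column order.
    """
    kept = []
    for key in cols:
        i = bisect_left(included_sorted, key)
        if i < len(included_sorted) and included_sorted[i] == key:
            kept.append(key)
    return kept


def filter_by_ordered_merge(cols_sorted: List[str], included_sorted: List[str]) -> List[str]:
    """Keep columns whose key is in included_sorted using a single linear merge.

    O(n + m), but only correct when BOTH inputs are sorted by key.
    """
    kept = []
    i = 0
    for key in cols_sorted:
        # Advance the cursor past every included key smaller than this column key.
        while i < len(included_sorted) and included_sorted[i] < key:
            i += 1
        if i < len(included_sorted) and included_sorted[i] == key:
            kept.append(key)
    return kept


cols = ["s3", "s1", "s4", "s2"]   # columns in arbitrary (unsorted) order
included = ["s1", "s2", "s3"]     # sorted keys to keep

by_lookup = filter_by_lookup(cols, included)
by_merge_sorted = filter_by_ordered_merge(sorted(cols), included)
by_merge_unsorted = filter_by_ordered_merge(cols, included)
```

On the unsorted column list the merge cursor moves past "s1" and "s2" while handling "s3" and never revisits them, so those columns are silently dropped. The per-column lookup has no such ordering requirement.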

@danking merged commit 9d103e1 into hail-is:master May 31, 2019

1 check passed

ci-test success