[experimental] Add full_outer_join_mt experimental function. #5628

Merged
merged 4 commits into from Mar 21, 2019

3 participants
@tpoterba
Collaborator

commented Mar 18, 2019

This was remarkably easy.

Parameters
----------
a : :class:`.ArrayExpression`
index_first: :obj:`bool`
    If ``True``, the index is the first value of the element tuples. If
    ``False``, the index is tthe second value.

@danking

danking Mar 18, 2019

Collaborator

tthe -> the

@tpoterba

tpoterba Mar 18, 2019

Author Collaborator

I hate the current generation of macbook keyboards.

@tpoterba

tpoterba Mar 18, 2019

Author Collaborator

done

import hail as hl


def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable:

@danking

danking Mar 18, 2019

Collaborator

tests!

@tpoterba

tpoterba Mar 18, 2019

Author Collaborator

.________.

@catoverdrive

catoverdrive Mar 19, 2019

Collaborator

tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.

.when(hl.is_defined(s.right_indices),
      s.right_indices.map(
          lambda elt: hl.struct(k=s.k, left_index=hl.null('int32'), right_index=elt)))
.or_error('assertion error')))

@danking

danking Mar 18, 2019

Collaborator

this seems a bit odd to me. if you have

   a  a
1 10 11
   a  a
1 20 21

What does key_indices contain? It looks to me like you'd have

[ {k=a, left_index=0, right_index=0}
, {k=a, left_index=0, right_index=1}
, {k=a, left_index=1, right_index=0}
, {k=a, left_index=1, right_index=1}
]

This seems more like a cross product than a join? I'd have expected you to end up with as many columns as there are distinct keys in the input, with left_entries being an array of the entries with that column key.

@tpoterba

tpoterba Mar 18, 2019

Author Collaborator

This does do a cross product (isn't that what our Table joins do?)

@catoverdrive

catoverdrive Mar 19, 2019

Collaborator

yep! All of our (non-distinct) table joins do an outer product over the set of rows corresponding to a given key.
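
To make the duplicate-key behavior discussed in this thread concrete, here is a minimal plain-Python sketch. It is not the Hail implementation: it models each side's column keys as a list and builds dicts shaped like the key_indices structs quoted above. The function name `key_indices_model`, the sorted key order, and the use of `None` for a missing index are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def key_indices_model(left_keys, right_keys):
    """Plain-Python model of a full outer join over column keys.

    Hypothetical helper for illustration only; not part of Hail.
    """
    # Group the column positions on each side by key.
    left_idx = defaultdict(list)
    right_idx = defaultdict(list)
    for i, k in enumerate(left_keys):
        left_idx[k].append(i)
    for i, k in enumerate(right_keys):
        right_idx[k].append(i)

    out = []
    # Assumption: keys are emitted in sorted order.
    for k in sorted(set(left_idx) | set(right_idx)):
        ls, rs = left_idx.get(k), right_idx.get(k)
        if ls and rs:
            # Key on both sides: every left column pairs with every right column.
            out.extend({'k': k, 'left_index': l, 'right_index': r}
                       for l, r in product(ls, rs))
        elif ls:
            # Key only on the left: right index is missing.
            out.extend({'k': k, 'left_index': l, 'right_index': None} for l in ls)
        else:
            # Key only on the right: left index is missing.
            out.extend({'k': k, 'left_index': None, 'right_index': r} for r in rs)
    return out


if __name__ == '__main__':
    for row in key_indices_model(['a', 'a'], ['a', 'a']):
        print(row)
```

With both sides carrying column keys a, a, the sketch produces the four (left_index, right_index) pairs listed in the comment above, which is the outer product over the columns sharing a key.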

@tpoterba tpoterba requested a review from catoverdrive Mar 20, 2019

@tpoterba tpoterba dismissed stale reviews from catoverdrive and danking Mar 20, 2019

added test

@@ -2933,16 +2934,24 @@ def zip_with_index(a):
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C']))
[(0, 'A'), (1, 'B'), (2, 'C')]
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C'], first=False))

@catoverdrive

catoverdrive Mar 20, 2019

Collaborator

index_first

(doctests are failing)
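
For reference, the semantics the docstring describes for `index_first` can be modeled in plain Python. This is only a sketch of the documented behavior, not Hail's implementation; `zip_with_index_model` is a hypothetical name.

```python
def zip_with_index_model(a, index_first=True):
    """Plain-Python model of the documented zip_with_index behavior.

    Hypothetical helper for illustration only; not part of Hail.
    """
    if index_first:
        # Index is the first value of each element tuple.
        return [(i, x) for i, x in enumerate(a)]
    # Index is the second value of each element tuple.
    return [(x, i) for i, x in enumerate(a)]


if __name__ == '__main__':
    print(zip_with_index_model(['A', 'B', 'C']))
    print(zip_with_index_model(['A', 'B', 'C'], index_first=False))
```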

@danking danking merged commit 65978fe into hail-is:master Mar 21, 2019

1 check passed

hail-ci-0-1 successful build

@tpoterba tpoterba deleted the tpoterba:mt-full-outer-join-experimental branch Mar 21, 2019
