Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[experimental] Add full_outer_join_mt experimental function. #5628

Merged
merged 4 commits into from Mar 21, 2019

Conversation

@tpoterba
Copy link
Collaborator

@tpoterba tpoterba commented Mar 18, 2019

This was remarkably easy.

Parameters
----------
a : :class:`.ArrayExpression`
index_first: :obj:`bool`
If ``True``, the index is the first value of the element tuples. If
``False``, the index is tthe second value.
Copy link
Collaborator

@danking danking Mar 18, 2019

tthe -> the

Copy link
Collaborator Author

@tpoterba tpoterba Mar 18, 2019

I hate the current generation of macbook keyboards.

Copy link
Collaborator Author

@tpoterba tpoterba Mar 18, 2019

done

import hail as hl


def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable:
Copy link
Collaborator

@danking danking Mar 18, 2019

tests!

Copy link
Collaborator Author

@tpoterba tpoterba Mar 18, 2019

.________.

Copy link
Contributor

@catoverdrive catoverdrive Mar 19, 2019

tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.

.when(hl.is_defined(s.right_indices),
s.right_indices.map(
lambda elt: hl.struct(k=s.k, left_index=hl.null('int32'), right_index=elt)))
.or_error('assertion error')))
Copy link
Collaborator

@danking danking Mar 18, 2019

this seems a bit odd to me. if you have

   a  a
1 10 11
   a  a
1 20 21

What does key_indices contain? It looks to me like you'd have

[ {k=a, left_index=0, right_index=0}
, {k=a, left_index=0, right_index=1}
, {k=a, left_index=1, right_index=0}
, {k=a, left_index=1, right_index=1}
]

This seems more like cross product than join? I'd have expected you end up with as many columns as you had distinct keys in the input and left_entries is an array of the entries with that column key.

Copy link
Collaborator Author

@tpoterba tpoterba Mar 18, 2019

This does do a cross product (isn't that what our Table joins do?)

Copy link
Contributor

@catoverdrive catoverdrive Mar 19, 2019

yep! All of our (non-distinct) table joins do an outer product over the set of rows corresponding to a given key.

import hail as hl


def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable:
Copy link
Contributor

@catoverdrive catoverdrive Mar 19, 2019

tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.

@tpoterba tpoterba requested a review from catoverdrive Mar 20, 2019
@tpoterba tpoterba dismissed stale reviews from catoverdrive and danking Mar 20, 2019

added test

@@ -2933,16 +2934,24 @@ def zip_with_index(a):
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C']))
[(0, 'A'), (1, 'B'), (2, 'C')]
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C'], first=False))
Copy link
Contributor

@catoverdrive catoverdrive Mar 20, 2019

index_first

(doctests are failing)

@danking danking merged commit 65978fe into hail-is:master Mar 21, 2019
1 check passed
@tpoterba tpoterba deleted the mt-full-outer-join-experimental branch Mar 21, 2019
@tpoterba tpoterba restored the mt-full-outer-join-experimental branch Nov 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants