Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[experimental] Add full_outer_join_mt experimental function. #5628

Merged
merged 4 commits into from Mar 21, 2019

Conversation

tpoterba
Copy link
Contributor

This was remarkably easy.

Parameters
----------
a : :class:`.ArrayExpression`
index_first: :obj:`bool`
If ``True``, the index is the first value of the element tuples. If
``False``, the index is tthe second value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tthe -> the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate the current generation of macbook keyboards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import hail as hl


def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.________.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.

.when(hl.is_defined(s.right_indices),
s.right_indices.map(
lambda elt: hl.struct(k=s.k, left_index=hl.null('int32'), right_index=elt)))
.or_error('assertion error')))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit odd to me. if you have

   a  a
1 10 11
   a  a
1 20 21

What does key_indices contain? It looks to me like you'd have

[ {k=a, left_index=0, right_index=0}
, {k=a, left_index=0, right_index=1}
, {k=a, left_index=1, right_index=0}
, {k=a, left_index=1, right_index=1}
]

This seems more like cross product than join? I'd have expected you end up with as many columns as you had distinct keys in the input and left_entries is an array of the entries with that column key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does do a cross product (isn't that what our Table joins do?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! All of our (non-distinct) table joins do an outer product over the set of rows corresponding to a given key.

import hail as hl


def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.

@tpoterba tpoterba dismissed stale reviews from catoverdrive and danking March 20, 2019 17:57

added test

@@ -2933,16 +2934,24 @@ def zip_with_index(a):
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C']))
[(0, 'A'), (1, 'B'), (2, 'C')]

>>> hl.eval(hl.zip_with_index(['A', 'B', 'C'], first=False))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index_first

(doctests are failing)

@danking danking merged commit 65978fe into hail-is:master Mar 21, 2019
@tpoterba tpoterba deleted the mt-full-outer-join-experimental branch March 21, 2019 16:54
@tpoterba tpoterba restored the mt-full-outer-join-experimental branch November 7, 2019 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants