New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[experimental] Add full_outer_join_mt experimental function. #5628
[experimental] Add full_outer_join_mt experimental function. #5628
Conversation
hail/python/hail/expr/functions.py
Outdated
Parameters | ||
---------- | ||
a : :class:`.ArrayExpression` | ||
index_first: :obj:`bool` | ||
If ``True``, the index is the first value of the element tuples. If | ||
``False``, the index is tthe second value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tthe -> the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate the current generation of macbook keyboards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
import hail as hl | ||
|
||
|
||
def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.________.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.
.when(hl.is_defined(s.right_indices), | ||
s.right_indices.map( | ||
lambda elt: hl.struct(k=s.k, left_index=hl.null('int32'), right_index=elt))) | ||
.or_error('assertion error'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit odd to me. if you have
a a
1 10 11
a a
1 20 21
What does key_indices
contain? It looks to me like you'd have
[ {k=a, left_index=0, right_index=0}
, {k=a, left_index=0, right_index=1}
, {k=a, left_index=1, right_index=0}
, {k=a, left_index=1, right_index=1}
]
This seems more like cross product than join? I'd have expected you end up with as many columns as you had distinct keys in the input and left_entries
is an array of the entries with that column key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does do a cross product (isn't that what our Table joins do?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! All of our (non-distinct) table joins do an outer product over the set of rows corresponding to a given key.
import hail as hl | ||
|
||
|
||
def full_outer_join_mt(left: hl.MatrixTable, right: hl.MatrixTable) -> hl.MatrixTable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests would be great. there's a lot of logic in this function. I think I'd be happy with a single test that had out-of-order col keys and some duplicate key values.
hail/python/hail/expr/functions.py
Outdated
@@ -2933,16 +2934,24 @@ def zip_with_index(a): | |||
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C'])) | |||
[(0, 'A'), (1, 'B'), (2, 'C')] | |||
|
|||
>>> hl.eval(hl.zip_with_index(['A', 'B', 'C'], first=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index_first
(doctests are failing)
This was remarkably easy.