[Hail][feature] add outer option to union_cols #7475
hail/python/hail/matrixtable.py
- @typecheck_method(other=matrix_table_type)
- def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
+ @typecheck_method(other=matrix_table_type, _outer=bool)
+ def union_cols(self, other: 'MatrixTable', _outer=False) -> 'MatrixTable':
Just need a better test and some docs and we can make it public.
Sure. I was just being cautious until we'd had more discussion about the interface. Do you like outer, or should it be more explicit like outer_join_rows?
I like row_join_type or something that takes a string, with a default of inner.
That way we can add left/right if people want that (reasonable)
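The string-valued parameter suggested above could be sketched as follows. This is a minimal pure-Python model, not Hail's implementation: the helper name `joined_row_keys` and the constant `SUPPORTED_ROW_JOIN_TYPES` are hypothetical, and row keys are assumed to be unique within each dataset.

```python
# Hypothetical model of how a `row_join_type` string could select row keys.
# Not Hail code; names here are illustrative only.
SUPPORTED_ROW_JOIN_TYPES = ('inner', 'outer')


def joined_row_keys(left_keys, right_keys, row_join_type='inner'):
    """Return the sorted row keys kept when two datasets are joined by row key."""
    if row_join_type not in SUPPORTED_ROW_JOIN_TYPES:
        # Rejecting unknown values up front leaves room to add
        # 'left' / 'right' later without changing the signature.
        raise ValueError(
            f"row_join_type must be one of {SUPPORTED_ROW_JOIN_TYPES}, "
            f"got {row_join_type!r}")
    left, right = set(left_keys), set(right_keys)
    if row_join_type == 'inner':
        return sorted(left & right)  # keys present in both inputs
    return sorted(left | right)      # keys present in either input
```

Taking a string rather than a boolean means new join types can be added by extending the tuple of accepted values, while the default of `'inner'` keeps existing callers unchanged.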
mt = mt.key_rows_by(x = mt.row_idx // 2)
assert mt.union_cols(mt).count_rows() == 5

def test_union_cols_outer(self):
Can you add a test for correct entry joining?
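What "correct entry joining" would need to check can be sketched with a small pure-Python model. This is an illustration under stated assumptions, not the test added in this PR: the function `union_cols_outer` is hypothetical, each dataset is modelled as a dict from a unique row key to a list of entries (one per column), and `None` stands in for a missing entry.

```python
# Hypothetical model of an outer row join when concatenating columns.
# Not Hail code; datasets are plain dicts of row_key -> list of entries.
def union_cols_outer(left, right, n_left_cols, n_right_cols):
    """Concatenate columns, keeping row keys found in either input."""
    result = {}
    for key in sorted(set(left) | set(right)):
        # A row absent from one side contributes missing entries
        # for every column that came from that side.
        left_entries = left.get(key, [None] * n_left_cols)
        right_entries = right.get(key, [None] * n_right_cols)
        result[key] = left_entries + right_entries
    return result


left = {0: ['a0', 'a1'], 1: ['b0', 'b1']}  # two columns
right = {1: ['b2'], 2: ['c2']}             # one column
joined = union_cols_outer(left, right, n_left_cols=2, n_right_cols=1)
```

A test along these lines would assert on entry values as well as on the row count, so that a join that keeps the right rows but misaligns or drops entries would still fail.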
tpoterba
left a comment
docs/typecheck fixes necessary.
Nice work!
hail/python/hail/matrixtable.py
datasets.
- With ``row_join_type=outer``, an outer join is performed on rows, so
that row keys which exist in only one input dataset are also included.
For those rows, the entry fields for the columns coming from the
hail/python/hail/matrixtable.py
- @typecheck_method(other=matrix_table_type)
- def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
+ @typecheck_method(other=matrix_table_type,
+                   row_join_type=enumeration('inner', 'outer', 'left', 'right'))
you don't support left/right
Eek, lazy copy/pasting
that's what review is for!
hail/python/hail/matrixtable.py
from both input datasets. The set of rows included in the result is
determined by the `row_join_type` parameter.

- With the default ``row_join_type=inner``, an inner join is performed
We generally don't format the arg bit as code. It should be something like:
with the default value of ``'inner'``
closes #7465