
[Hail][feature] add `outer` option to `union_cols` #7475

Merged (6 commits) on Nov 11, 2019


patrick-schultz (Collaborator) commented Nov 6, 2019:

closes #7465

patrick-schultz changed the title from "add `outer` option to `union_cols`" to "[Hail] add `outer` option to `union_cols`" on Nov 6, 2019
-    @typecheck_method(other=matrix_table_type)
-    def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
+    @typecheck_method(other=matrix_table_type, _outer=bool)
+    def union_cols(self, other: 'MatrixTable', _outer=False) -> 'MatrixTable':

tpoterba (Collaborator), Nov 6, 2019:

Why not make this public?

tpoterba (Collaborator), Nov 6, 2019:

Just need a better test and some docs and we can make it public.

patrick-schultz (Author, Collaborator), Nov 7, 2019:

Sure. I was just being cautious until we'd had more discussion about the interface. Do you like `outer`, or should it be more explicit, like `outer_join_rows`?

tpoterba (Collaborator), Nov 7, 2019:

I like `row_join_type`, or something else that takes a string, with a default of `'inner'`.

That way we can add `'left'`/`'right'` if people want them (which is reasonable).
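For illustration, here is a minimal pure-Python sketch of which row keys each proposed `row_join_type` value would keep. The helper `joined_row_keys` is hypothetical and not part of Hail; it treats row keys as plain sets (ignoring duplicate keys) and returns them sorted, and it includes `'left'` and `'right'` only to show how a string-valued parameter leaves room for them.

```python
from typing import Iterable, List


def joined_row_keys(left_keys: Iterable[int],
                    right_keys: Iterable[int],
                    row_join_type: str = 'inner') -> List[int]:
    """Return the sorted row keys kept when unioning two datasets' columns.

    Hypothetical helper: models only the row-key join, not the entries.
    """
    left, right = set(left_keys), set(right_keys)
    if row_join_type == 'inner':
        kept = left & right   # keys present in both datasets
    elif row_join_type == 'outer':
        kept = left | right   # keys present in either dataset
    elif row_join_type == 'left':
        kept = left           # every key of the left dataset
    elif row_join_type == 'right':
        kept = right          # every key of the right dataset
    else:
        raise ValueError(f"unknown row_join_type: {row_join_type!r}")
    return sorted(kept)


print(joined_row_keys([1, 2, 3], [2, 3, 4]))           # [2, 3]
print(joined_row_keys([1, 2, 3], [2, 3, 4], 'outer'))  # [1, 2, 3, 4]
```

A string parameter defaulting to `'inner'` keeps existing callers' behavior unchanged, and each new join type only adds a branch instead of another boolean flag.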

patrick-schultz (Author, Collaborator), Nov 8, 2019:

Seems good to me.

@@ -530,6 +530,11 @@ def test_union_cols_distinct(self):
         mt = mt.key_rows_by(x = mt.row_idx // 2)
         assert mt.union_cols(mt).count_rows() == 5
 
+    def test_union_cols_outer(self):

tpoterba (Collaborator), Nov 6, 2019:

Can you add a test for correct entry joining?
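Such an entry-joining test might look like the following sketch, written against a toy pure-Python model rather than Hail itself. Here a dataset is a dict mapping each row key to a dict of column keys to entry values, and `union_cols` is a hypothetical stand-in for the real method; it assumes the two inputs have disjoint column keys and uses `None` for missing entries.

```python
from typing import Dict, Optional

# Toy model (hypothetical, not Hail's representation): {row_key: {col_key: entry}}
Dataset = Dict[int, Dict[str, Optional[int]]]


def union_cols(left: Dataset, right: Dataset,
               row_join_type: str = 'inner') -> Dataset:
    """Append the columns of `right` to `left`, joining rows by key."""
    if row_join_type not in ('inner', 'outer'):
        raise ValueError(f"unsupported row_join_type: {row_join_type!r}")
    left_cols = {col for row in left.values() for col in row}
    right_cols = {col for row in right.values() for col in row}
    if row_join_type == 'inner':
        keys = left.keys() & right.keys()
    else:
        keys = left.keys() | right.keys()
    joined: Dataset = {}
    for key in sorted(keys):
        # A row missing from one side gets missing (None) entries for that side's columns.
        left_row = left.get(key, dict.fromkeys(left_cols))
        right_row = right.get(key, dict.fromkeys(right_cols))
        joined[key] = {**left_row, **right_row}
    return joined


def test_union_cols_outer_entries() -> None:
    left = {1: {'a': 10}, 2: {'a': 20}}
    right = {2: {'b': 200}, 3: {'b': 300}}
    # Inner join: only the shared row key 2 survives.
    assert union_cols(left, right) == {2: {'a': 20, 'b': 200}}
    # Outer join: every row key survives; the other side's entries are missing.
    assert union_cols(left, right, 'outer') == {
        1: {'a': 10, 'b': None},
        2: {'a': 20, 'b': 200},
        3: {'a': None, 'b': 300},
    }


test_union_cols_outer_entries()
```

Checking concrete entry values, not just `count_rows()`, is what catches a join that keeps the right rows but pairs them with the wrong entries.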

tpoterba (Collaborator) left a review comment:

Rename parameter.

@konradjk, thoughts?

tpoterba changed the title from "[Hail] add `outer` option to `union_cols`" to "[Hail][feature] add `outer` option to `union_cols`" on Nov 7, 2019
tpoterba (Collaborator) left a review comment:

Docs/typecheck fixes necessary.

Nice work!

datasets.
- With ``row_join_type=outer``, an outer join is performed on rows, so
that row keys which exist in only one input dataset are also included.
For those rows, the entrie fields for the columns coming from the

tpoterba (Collaborator), Nov 8, 2019:

Typo: entrie

-    @typecheck_method(other=matrix_table_type)
-    def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
+    @typecheck_method(other=matrix_table_type,
+                      row_join_type=enumeration('inner', 'outer', 'left', 'right'))

tpoterba (Collaborator), Nov 8, 2019:

You don't support left/right.

patrick-schultz (Author, Collaborator), Nov 8, 2019:

Eek, lazy copy/pasting.

tpoterba (Collaborator), Nov 8, 2019:

That's what review is for!
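The point of this exchange is that the typecheck should accept only the values the implementation actually handles, so an unsupported value fails at the call site rather than deeper inside the method. A minimal sketch, using a hypothetical `one_of` checker as a stand-in for the enumeration typecheck (not Hail's actual implementation):

```python
from typing import Callable


def one_of(*allowed: str) -> Callable[[str], str]:
    """Build a checker that accepts only the listed string values.

    Hypothetical stand-in for an enumeration typecheck.
    """
    def check(value: str) -> str:
        if value not in allowed:
            raise TypeError(f"expected one of {allowed}, found {value!r}")
        return value
    return check


# List only the join types that are implemented; 'left' and 'right'
# can be added here once the method supports them.
check_row_join_type = one_of('inner', 'outer')

check_row_join_type('outer')  # accepted, returns 'outer'
try:
    check_row_join_type('left')
except TypeError as err:
    print(err)  # expected one of ('inner', 'outer'), found 'left'
```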

from both input datasets. The set of rows included in the result is
determined by the `row_join_type` parameter.
- With the default ``row_join_type=inner``, an inner join is performed

tpoterba (Collaborator), Nov 8, 2019:

We generally don't format the argument assignment (``row_join_type=inner``) as code. It should be something like:

    with the default value of ``'inner'``
danking merged commit 48ec6bd into hail-is:master on Nov 11, 2019
1 check passed: ci-test (success)
3 participants