Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hail][feature] add outer option to union_cols #7475

Merged
merged 6 commits into from
Nov 11, 2019

Conversation

patrick-schultz
Copy link
Collaborator

closes #7465

@patrick-schultz patrick-schultz changed the title add outer option to union_cols [Hail] add outer option to union_cols Nov 6, 2019
@typecheck_method(other=matrix_table_type)
def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
@typecheck_method(other=matrix_table_type, _outer=bool)
def union_cols(self, other: 'MatrixTable', _outer=False) -> 'MatrixTable':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this public?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need a better test and some docs and we can make it public.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I was just being cautious until we'd had more discussion about the interface. Do you like outer, or should it be more explicit like outer_join_rows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like row_join_type or something that takes a string, with a default of inner.

That way we can add left/right if people want that (reasonable)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me

@@ -530,6 +530,11 @@ def test_union_cols_distinct(self):
mt = mt.key_rows_by(x = mt.row_idx // 2)
assert mt.union_cols(mt).count_rows() == 5

def test_union_cols_outer(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for correct entry joining?

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename parameter

@konradjk thoughts?

@tpoterba tpoterba changed the title [Hail] add outer option to union_cols [Hail][feature] add outer option to union_cols Nov 7, 2019
Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/typecheck fixes necessary.

Nice work!

datasets.
- With ``row_join_type=outer``, an outer join is perfomed on rows, so
that row keys which exist in only one input dataset are also included.
For those rows, the entrie fields for the columns coming from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: entrie

@typecheck_method(other=matrix_table_type)
def union_cols(self, other: 'MatrixTable') -> 'MatrixTable':
@typecheck_method(other=matrix_table_type,
row_join_type=enumeration('inner', 'outer', 'left', 'right'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't support left/right

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, lazy copy/pasting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what review is for!

from both input datasets. The set of rows included in the result is
determined by the `row_join_type` parameter.

- With the default ``row_join_type=inner``, an inner join is performed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally don't format the arg bit as code: should be something like:

with the default value of ``'inner'``

@danking danking merged commit 48ec6bd into hail-is:master Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to outer join on rows in union_cols
3 participants