Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add first and last methods to ArrayExpression #9474

Merged
merged 6 commits into from
Sep 18, 2020

Conversation

nawatts
Copy link
Contributor

@nawatts nawatts commented Sep 18, 2020

Add first method to ArrayExpression to replace head. Add last method as the counterpart to first.

>>> hl.eval(names.filter(lambda x: x.startswith('D')).tail())
None
"""
return hl.rbind(self, hl.len(self), lambda x, n: hl.or_missing(n > 0, x[n - 1]))
Copy link
Contributor Author

@nawatts nawatts Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blindly following the implementation of head with hl.rbind(self, .... Not sure why that is necessary.

# FIXME: this should generate short-circuiting IR when that is possible
return hl.rbind(self, lambda x: hl.case().when(x.length() > 0, x[0]).or_missing())

@johnc1231
Copy link
Contributor

I guess we already have head on arrays, but I think these would be better called first and last, given that Table.head takes an argument for how many elements to grab

@nawatts
Copy link
Contributor Author

nawatts commented Sep 18, 2020

Good point @johnc1231, it would be nice to have the same interface for head/tail on tables and arrays. Instead of adding tail, should we add first/last and deprecate head?

@johnc1231
Copy link
Contributor

We just had a chat about this in Dev: https://hail.zulipchat.com/#narrow/stream/123011-Hail-Dev/topic/head.2Ftail.20vs.20first.2Flast/near/210522900

To summarize, I think head shouldn't exist, neither should tail. Konrad likes the idea of having something that does what current head does, and I think a good name for that is first. last would correspond to that. I'm somewhat surprised people want this behavior often enough that hl.or_else(hl.len(x) > 0, x[0]) or hl.or_else(hl.len(x) > 0, x[-1]) is insufficient.

@johnc1231
Copy link
Contributor

We have a couple of deprecated things now, I think I'm going to go add a deprecation decorator so that people get warnings.

@nawatts nawatts changed the title Add tail method to ArrayExpression Add first and last methods to ArrayExpression Sep 18, 2020
@nawatts
Copy link
Contributor Author

nawatts commented Sep 18, 2020

Sounds good. Updated this PR to add first and last methods.

@nawatts
Copy link
Contributor Author

nawatts commented Sep 18, 2020

Also, if this addition is approved, could someone comment on whether the hl.rbind(self, ... used in head is necessary?

# FIXME: this should generate short-circuiting IR when that is possible
return hl.rbind(self, lambda x: hl.case().when(x.length() > 0, x[0]).or_missing())

@tpoterba
Copy link
Contributor

Also, if this addition is approved, could someone comment on whether the hl.rbind(self, ... used in head is necessary?

Should leave it for now, I think. Our common subexpression elimination pass would insert a binding, but it's not a bad idea to insert one explicitly in library code.

The comment in the head() implementation refers to the fact that in many cases, we don't need to bind the entire array (suppose it's range(1000000).map(lambda x: something_expensive(x)) if we're just going to use the first element.

@nawatts
Copy link
Contributor Author

nawatts commented Sep 18, 2020

Restored the binding. Not sure what to do about this:

The comment in the head() implementation refers to the fact that in many cases, we don't need to bind the entire array (suppose it's range(1000000).map(lambda x: something_expensive(x)) if we're just going to use the first element.

@tpoterba
Copy link
Contributor

yeah, we don't have a way to avoid constructing the full array right now

Copy link
Contributor

@johnc1231 johnc1231 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a test to ensure last works. You'll find a method called test_array_head in test_expr.py. I'd change that to test_array_first, call first instead. Then I'd add test_array_last and make sure there's a test for empty and nonempty arrays. Thanks

>>> hl.eval(names.filter(lambda x: x.startswith('D')).first())
None
"""
return hl.rbind(self, lambda x: hl.or_missing(hl.len(x) > 0, x[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd copy the FIXME comment down here, since it's still something we should address. I'd also then change head to just call first. That the logic only exists once, and any tests that may use head will now be tests for first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (fa35a4a) and done (15bcc78)

@nawatts
Copy link
Contributor Author

nawatts commented Sep 18, 2020

You need to add a test to ensure last works. You'll find a method called test_array_head in test_expr.py. I'd change that to test_array_first, call first instead. Then I'd add test_array_last and make sure there's a test for empty and nonempty arrays. Thanks

Tests were included in e2fcfb6. Are there more checks that should be added to those?

@johnc1231
Copy link
Contributor

You're good on tests, I was looking at the wrong thing.

@danking danking merged commit beebb0d into hail-is:main Sep 18, 2020
@nawatts nawatts deleted the array-tail-method branch May 6, 2021 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants