[hail][feature] Super array expression #5913
if "Cannot use 'collection.foo' or" not in str(err):
    raise
raise TypeError(
    f"Cannot use 'collection.foo' or 'collection['foo']' on "
Do you actually want `foo` in here? Could you use `item`, or does the recursion mess with this?
oh sure yeah item
done
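A minimal sketch of the suggestion above: interpolate the requested name into the message instead of a hard-coded `foo`. The helper name `field_error` and the tail of the message are hypothetical, since the PR's full message is truncated in the diff.

```python
def field_error(item: str) -> TypeError:
    """Build an error that names the field the caller actually asked for."""
    # Hypothetical wording: only the start of the real message is visible in the diff.
    return TypeError(
        f"Cannot use 'collection.{item}' or 'collection[{item!r}]' here"
    )


err = field_error("gene")
message = str(err)
```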
Should we support
@patrick-schultz I am one step ahead of you sir: https://github.com/hail-is/hail/pull/5913/files#diff-8a90eaa51bf133eaef5d7f12c0d4decfR2736 If you pass a Python string, I use the array projection approach; if you pass anything else, I try to convert it to an integer.
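The dispatch described above might look roughly like this in plain Python. `ToyCollection` is a hypothetical stand-in over a list of dicts, not Hail's expression class, and the integer coercion here is just `int(...)`.

```python
class ToyCollection:
    """Pure-Python stand-in for an array-of-structs expression (not Hail's class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, f):
        return ToyCollection(f(r) for r in self.rows)

    def __getitem__(self, item):
        if isinstance(item, str):
            # String key: project that field out of every element.
            return self.map(lambda r: r[item])
        # Anything else: coerce to an integer and index positionally.
        return self.rows[int(item)]


people = ToyCollection([{"name": "a", "age": 1}, {"name": "b", "age": 2}])
ages = people["age"].rows
second = people[1]
```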
Every time I see this PR I get giddy. This is so in line with the Hail I want.
etype = self.dtype.element_type
if isinstance(etype, hl.tstruct):
    return hl.struct(
        **{k: self.map(lambda x: x[k]) for k in etype}).__getitem__(item)
I don't understand this bit. Shouldn't this be `map(lambda x: x[item])`?
You also can't use `hl.struct` here -- that can change missingness patterns
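One way such a change can arise, shown in a toy model where `None` stands for a missing value. This is an illustration under that assumption, not a claim about the exact case the reviewer had in mind or about Hail's internals; `to_columns` and `to_rows` are hypothetical helpers.

```python
def to_columns(rows, fields):
    # A missing row (None) contributes a missing value to every column.
    return {k: [None if r is None else r[k] for r in rows] for k in fields}


def to_rows(columns, fields):
    # Rebuild one struct per position from the per-field columns.
    n = len(columns[fields[0]])
    return [{k: columns[k][i] for k in fields} for i in range(n)]


fields = ["x", "y"]
rows = [{"x": 1, "y": 2}, None]  # second element is a missing struct
round_trip = to_rows(to_columns(rows, fields), fields)
```

After the round trip, the second element is a defined struct whose fields are all missing, rather than a missing struct.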
Design comment - We use specialized expression classes for expression types that have unique behavior. Shouldn't we use that model here? A NestedStructCollectionExpression type of thing?
This converts from the row format to the column format, then it performs the get item. I suppose it can be directly written as:
    return self.map(lambda x: x[item])
Conceptually though, I think of myself as operating on the column formatted data. I'll change to `self.map`.
        **{k: self.map(lambda x: x[k]) for k in etype}).__getitem__(item)
try:
    if isinstance(etype, (hl.tarray, hl.tset)):
        return self.map(lambda x: x.__getitem__(item))
using `__getitem__` in this way is bad style -- use `x[item]`
few last comments
  if isinstance(type, tndarray) and is_numeric(type.element_type):
      return NDArrayNumericExpression(ir, type, indices, aggregations)
- elif type in scalars:
+ if type in scalars:
why this change? it was clearer before, no?
oh, for consistency. I'd prefer to change the others to `elif` instead
pylint doesn't like the `else`/`elif`s if they're not necessary. At first I hated it, but now I prefer it.
do you have a PEP reference?
It's not in PEP-8, which quietly demonstrates both styles in an unrelated recommendation. In pylint it's called `no-else-return`. We can add this to our standard pylint rules if you prefer this style. At first I hated it. Now I've come to prefer the uniformity of `if` and the lack of indentation on the else case.
I suppose if all of hail is written using the else-return style we should stay consistent.
For what it's worth, I think standard C++ style is also to never put an else clause on an if statement that returns.
I'm happy with either, but do have a preference for `elif` in this case. I'm fine with going with style guidelines, though.
I think I have somewhat mixed preferences, too. I'd prefer:

    def foo(*args):
        if take_other_path:
            return blah(*args)
        ...
        lots of code
        ...
        return other_thing

to

    def foo(*args):
        if take_other_path:
            return blah(*args)
        else:
            lots of code
            ...
            return other_thing

However, in code with several short conditionals, I prefer

    def foo(a, b, c, ..., z):
        if a < 0:
            return a
        elif b < 0:
            return b
        ...
        else:
            return z

to

    def foo(a, b, c, ..., z):
        if a < 0:
            return a
        if b < 0:
            return b
        ...
        return z

since if you forget one of the "return" keywords, you get a piece of code that's much harder to debug.
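A small sketch of the failure mode described above, with the same forgotten `return` in both styles. The function names and arguments are made up for illustration.

```python
def chain_elif(a, b, z):
    if a < 0:
        a  # bug: "return" forgotten
    elif b < 0:
        return b
    else:
        return z


def chain_if(a, b, z):
    if a < 0:
        a  # same bug: "return" forgotten
    if b < 0:
        return b
    return z


elif_result = chain_elif(-1, -2, 99)
if_result = chain_if(-1, -2, 99)
```

In the `elif` chain the buggy branch falls off the end of the function and yields `None`, which tends to fail quickly downstream. In the plain `if` chain, control falls through to the later checks and returns a plausible but wrong value (`-2` here).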
@@ -420,12 +420,13 @@ def __getitem__(self, item):
    """
    if isinstance(item, slice):
        return self._slice(self.dtype, item.start, item.stop, item.step)
    else:
can revert this change, right?
yeah but python style prefers no `else` or `elif` when each branch returns and it felt weird to move the else to my new if.

    """

    if isinstance(item, str):
        return self.map(lambda x: x[item])
nice, super clean.
this is super
request clarification on PEP style w.r.t. returns
a3e5540 to f8e9624
@tpoterba @patrick-schultz Ok, I switched to elif style and made fixes so that the surrounding code also followed the elif style. I'd like to merge this and push off any further discussions to another PR. IMO, the no-else-return style is almost always nicer when I want to:
if isinstance(etype, hl.tstruct):
    return ArrayStructExpression(ir, type, indices, aggregations)
else:
    raise NotImplementedError(type)
these not implemented errors are wrong. You want these to get to line 3440 below to go through typ_to_expr.
with this in mind, the no-elif model is way nicer since you don't need to duplicate that logic!
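A rough sketch of the point above: with early returns and no trailing `else: raise`, unmatched cases fall through to one shared lookup instead of duplicating it in each branch. The `GENERIC` table, the `construct` function, and the string labels are hypothetical stand-ins, not Hail's actual `typ_to_expr` logic.

```python
# Hypothetical registry standing in for the generic type-to-expression lookup.
GENERIC = {"int32": "Int32Expression", "array": "ArrayExpression"}


def construct(kind, element=None):
    if kind == "array":
        if element == "struct":
            return "ArrayStructExpression"
        if element == "float64":
            return "ArrayNumericExpression"
        # No else/raise here: other arrays fall through to the shared lookup.
    if kind in GENERIC:
        return GENERIC[kind]
    raise NotImplementedError(kind)


specialized = construct("array", "struct")
fallback = construct("array", "str")
scalar = construct("int32")
```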
e52474d to 5928e94
589b14b to d15c85a
First step towards treating `CollectionExpression`s exactly like non-distributed tables. Works for arrays and sets.

cc: @konradjk