-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support for clip and quantile ops on DoubleColumns #1090
Conversation
ibis/pandas/tests/test_operations.py
Outdated
| (methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)), | ||
| ] | ||
| ) | ||
| def test_arrayfunctions(t, df, ibis_func, pandas_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpcloud where do you testsfor exceptions that bubble up from the impl (e.g. an out-of-range for quantile) for instance.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just have a separate test that only contains failures (using with pytest.raises). Easier to debug that way.
ibis/expr/operations.py
Outdated
| @@ -349,6 +349,17 @@ def output_type(self): | |||
| return rules.shape_like(arg, 'double') | |||
|
|
|||
|
|
|||
| class Clip(ValueOp): | |||
| input_type = [number(name='lower', allow_boolean=False, optional=True), | |||
| number(name='upper', allow_boolean=False, optional=True)] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behavior when neither is passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just returns the original. not great. I could raise if not at least 1 is passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably should raise during execution since we don't have a rule to express "at least one argument" that I'm aware of.
ibis/expr/operations.py
Outdated
| input_type = [number(name='lower', allow_boolean=False, optional=True), | ||
| number(name='upper', allow_boolean=False, optional=True)] | ||
|
|
||
| def output_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output type here should be the same as the input type. You can use rules.type_of_arg(0) to get the output type of the first input argument. You should also be able to just assign that directly to output_type like this:
class FooNode(...):
output_type = rules.type_of_arg(0)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ibis/pandas/execution.py
Outdated
| @@ -170,6 +170,20 @@ def execute_series_natural_log(op, data, scope=None): | |||
| return np.log(data) | |||
|
|
|||
|
|
|||
| @execute_node.register( | |||
| ops.Clip, pd.Series, (float, int, type(None)), (float, int, type(None)) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For int use integer_types which includes numpy integers and six.integer_types. If that's not imported you can import it from ibis/pandas/core.py.
Also, does Series.clip allow Series as input? Do we want to allow other table columns as inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does, will add
ibis/pandas/execution.py
Outdated
| @execute_node.register( | ||
| ops.Quantile, pd.Series, float | ||
| ) | ||
| def execute_series_quantile(op, data, q, scopy=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misspelling here: s/scopy/scope/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate that 0 <= q <= 1 here?
ibis/pandas/tests/test_operations.py
Outdated
| (methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)), | ||
| ] | ||
| ) | ||
| def test_arrayfunctions(t, df, ibis_func, pandas_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have two tests here? I know there's a few examples where we dump a bunch into one, but these are distinct enough that it probably makes sense to separate them.
ibis/expr/operations.py
Outdated
| @@ -696,6 +707,12 @@ class Mean(Reduction): | |||
| output_type = rules.scalar_output(_mean_output_type) | |||
|
|
|||
|
|
|||
| class Quantile(Reduction): | |||
|
|
|||
| input_type = [number(name='q', allow_boolean=False)] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe give this a slightly longer name :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ibis/expr/api.py
Outdated
| ------- | ||
| clipped : type depending on input | ||
| Decimal values: yield decimal | ||
| Other numeric values: yield integer (int32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to enumerate any types here. Something like "The type of the returned column is the same as type of the input" or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
|
pushed, but still have to handle this input type is easy, and output type is a the type of inputscalar column, But we don't have an Index (e.g. to annotate the quantiles). How do you handle this? (return a Table)? |
| ] | ||
| ) | ||
| def test_arraylike_functions_transforms(t, df, ibis_func, pandas_func): | ||
| if isinstance(pandas_func, Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? It looks like every pandas_func is a lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed in folllowing push
ibis/expr/operations.py
Outdated
|
|
||
| input_type = [value, | ||
| number(name='quantile', allow_boolean=False), | ||
| string(name='interpolation', optional=True)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to use rules.string_options(list_of_possible_interpolation_values) so that we can limit these before executing.
ibis/expr/operations.py
Outdated
| input_type = [value, | ||
| number(name='quantile', allow_boolean=False), | ||
| string(name='interpolation', optional=True)] | ||
| output_type = rules.scalar_output(_array_reduced_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct for multiple quantiles, but not for a single quantile. In this case the output_type is actually the same as the second argument: T -> T or array<T> -> array<T>. rules.type_of_arg(1) should cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember how to spell the correct type for the input though, which is "any scalar or array of scalars".
| class Quantile(Reduction): | ||
|
|
||
| input_type = [value, | ||
| number(name='quantile', allow_boolean=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a nice way of describing "numeric scalar or array of numeric scalars". I'll put up a PR to do this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually pretty tricky because of the distinction between a generic list of values (whose elements could each be of a different type) and an array which is a list of values all with the same type.
I'll make an issue about this to refactor this part of the code.
In the meantime, you can work around this by creating an additional node type, maybe MultipleQuantile and then in the def quantile function try to construct one and if that fails validation construct the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
|
|
||
| @execute_node.register( | ||
| ops.Quantile, pd.Series, float, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repeated this w/o the optional arg as I couldn' get rules.string_options to be optional....?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try dispatching on six.string_types + (type(None),)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to cover this in one function by dispatching on six.string_types + (type(None),) for the interpolation parameter.
ibis/expr/operations.py
Outdated
| rules.string_options( | ||
| ['linear', 'lower', 'higher', | ||
| 'midpoint', 'nearest'], | ||
| name='interpolation', optional=True)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. here I think the optional is being ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, could be a bug. Looking into it.
|
@jreback let's handle the array of quantiles case in a follow up. So for now we only allow a scalar double as the quantile argument. There's some other things to fix before we can properly handle the array case. |
|
sgtm will fix up the doc string and add some tests |
|
pushed |
docs/source/release.rst
Outdated
| @@ -7,6 +7,17 @@ Release Notes | |||
| interesting. Point (minor, e.g. 0.5.1) releases will generally not be found | |||
| here and contain only bug fixes. | |||
|
|
|||
| 0.12.0 (???) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave out? change to 0.11.3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this should be 0.11.3
docs/source/release.rst
Outdated
| 0.12.0 (???) | ||
| ------------ | ||
|
|
||
| This release brings initial Pandas backend support along with a number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just leave the section blank for now.
|
|
||
| Parameters | ||
| ---------- | ||
| quantile : float, default 0.5 (50% quantile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like there's a default, so should remove this text.
| class Quantile(Reduction): | ||
|
|
||
| input_type = [value, | ||
| number(name='quantile', allow_boolean=False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be rules.double()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this should work on all numeric (and datetimelikes) actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting re datetime.
|
@jreback few more small things. other than that this looks good to go. |
|
@cpcloud if you'd have a look. I added |
|
pushed with updates |
|
I'll merge on green |
|
green now |
|
thanks @jreback ! |
No description provided.