ENH: support for clip and quantile ops on DoubleColumns #1090

Closed
wants to merge 1 commit into
base: master
from

2 participants
@jreback
Contributor

jreback commented Aug 5, 2017

No description provided.

@jreback jreback added the enhancement label Aug 5, 2017

        (methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)),
    ]
)
def test_arrayfunctions(t, df, ibis_func, pandas_func):

@jreback

jreback Aug 5, 2017

Contributor

@cpcloud where do you test for exceptions that bubble up from the impl (e.g. an out-of-range value for quantile)?

@cpcloud

cpcloud Aug 5, 2017

Member

I would just have a separate test that only contains failures (using with pytest.raises). Easier to debug that way.
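
A failure-only test along those lines might look like the sketch below. `checked_quantile` is a hypothetical stand-in for the operation under test, not code from this PR; only `pytest.raises` comes from the comment above.

```python
import pytest


def checked_quantile(values, q):
    """Hypothetical stand-in for the operation under test."""
    if not 0 <= q <= 1:
        raise ValueError('quantile must be in [0, 1], got {!r}'.format(q))
    ordered = sorted(values)
    # Nearest-rank lookup; enough to demonstrate the failure test.
    return ordered[int(round(q * (len(ordered) - 1)))]


def test_quantile_out_of_range():
    # Only failure cases live here, so a broken check is easy to spot.
    with pytest.raises(ValueError):
        checked_quantile([1, 2, 3], -0.1)
    with pytest.raises(ValueError):
        checked_quantile([1, 2, 3], 1.5)
```

Keeping the happy-path comparisons in the parametrized test and the error cases here means a failure report points directly at the kind of problem.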

@cpcloud cpcloud self-requested a review Aug 5, 2017

@cpcloud cpcloud added this to the 0.11.3 milestone Aug 5, 2017

@@ -349,6 +349,17 @@ def output_type(self):
        return rules.shape_like(arg, 'double')
class Clip(ValueOp):
    input_type = [number(name='lower', allow_boolean=False, optional=True),
                  number(name='upper', allow_boolean=False, optional=True)]

@cpcloud

cpcloud Aug 5, 2017

Member

What is the behavior when neither is passed?

@jreback

jreback Aug 5, 2017

Contributor

just returns the original. not great. I could raise if not at least 1 is passed.

@cpcloud

cpcloud Aug 5, 2017

Member

Yeah, probably should raise during execution since we don't have a rule to express "at least one argument" that I'm aware of.
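
A minimal sketch of raising at execution time when neither bound is given. The `clip` function here is a hypothetical pure-Python stand-in working on a list, not the pandas executor from this PR.

```python
def clip(values, lower=None, upper=None):
    """Hypothetical executor: clip each value to [lower, upper]."""
    # No rule expresses "at least one argument", so check when executing.
    if lower is None and upper is None:
        raise ValueError('clip requires at least one of lower or upper')
    result = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        result.append(v)
    return result
```

With this check, `clip([1, 5, 10])` fails loudly instead of silently returning the original values.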

    input_type = [number(name='lower', allow_boolean=False, optional=True),
                  number(name='upper', allow_boolean=False, optional=True)]
    def output_type(self):

@cpcloud

cpcloud Aug 5, 2017

Member

The output type here should be the same as the input type. You can use rules.type_of_arg(0) to get the output type of the first input argument. You should also be able to just assign that directly to output_type like this:

class FooNode(...):
    output_type = rules.type_of_arg(0)

@jreback

jreback Aug 5, 2017

Contributor

done

@@ -170,6 +170,20 @@ def execute_series_natural_log(op, data, scope=None):
    return np.log(data)
@execute_node.register(
    ops.Clip, pd.Series, (float, int, type(None)), (float, int, type(None))

@cpcloud

cpcloud Aug 5, 2017

Member

For int use integer_types which includes numpy integers and six.integer_types. If that's not imported you can import it from ibis/pandas/core.py.

Also, does Series.clip allow Series as input? Do we want to allow other table columns as inputs?

@jreback

jreback Aug 5, 2017

Contributor

it does, will add

@execute_node.register(
    ops.Quantile, pd.Series, float
)
def execute_series_quantile(op, data, q, scopy=None):

@cpcloud

cpcloud Aug 5, 2017

Member

Misspelling here: s/scopy/scope/

@jreback

jreback Aug 5, 2017

Contributor

done

@cpcloud

cpcloud Aug 5, 2017

Member

Should we validate that 0 <= q <= 1 here?
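
One way such a check could look, as a sketch. `validate_quantile` is a hypothetical helper name; rejecting booleans mirrors the `allow_boolean=False` setting in the diff.

```python
def validate_quantile(q):
    """Hypothetical helper: return q as a float if 0 <= q <= 1."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(q, bool) or not isinstance(q, (int, float)):
        raise TypeError('quantile must be a number, got {!r}'.format(q))
    # NaN fails this comparison too, so it is rejected as out of range.
    if not 0 <= q <= 1:
        raise ValueError('quantile must be in [0, 1], got {!r}'.format(q))
    return float(q)
```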

        (methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)),
    ]
)
def test_arrayfunctions(t, df, ibis_func, pandas_func):

@cpcloud

cpcloud Aug 5, 2017

Member

Maybe have two tests here? I know there's a few examples where we dump a bunch into one, but these are distinct enough that it probably makes sense to separate them.

@@ -696,6 +707,12 @@ class Mean(Reduction):
    output_type = rules.scalar_output(_mean_output_type)
class Quantile(Reduction):
    input_type = [number(name='q', allow_boolean=False)]

@cpcloud

cpcloud Aug 5, 2017

Member

Maybe give this a slightly longer name :)

@jreback

jreback Aug 5, 2017

Contributor

done

-------
clipped : type depending on input
Decimal values: yield decimal
Other numeric values: yield integer (int32)

@cpcloud

cpcloud Aug 5, 2017

Member

I don't think you need to enumerate any types here. Something like "The type of the returned column is the same as the type of the input" would do.

@jreback

jreback Aug 5, 2017

Contributor

sure

@jreback

Contributor

jreback commented Aug 5, 2017

pushed, but still have to handle this

In [3]: pd.Series([1, 2, 3]).quantile(0.5)
Out[3]: 2.0

In [4]: pd.Series([1, 2, 3]).quantile([0.5, 0.75])
Out[4]: 
0.50    2.0
0.75    2.5
dtype: float64

The input type is easy, and the output type is a column of the input's scalar type. But we don't have an Index (e.g. to annotate the quantiles). How do you handle this? (Return a Table?)
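
To make the scalar versus multiple-quantile difference concrete without pandas, here is a pure-Python sketch using linear interpolation. Returning `(quantile, value)` pairs for the list case is only one hypothetical option (roughly a two-column table) and is not what this PR implements.

```python
def linear_quantile(values, q):
    """Quantile of values using linear interpolation between neighbours."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)           # fractional position in the data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def quantile(values, q):
    # A list of quantiles needs labels, so pair each result with its q.
    if isinstance(q, (list, tuple)):
        return [(each, linear_quantile(values, each)) for each in q]
    return linear_quantile(values, q)


print(quantile([1, 2, 3], 0.5))          # 2.0
print(quantile([1, 2, 3], [0.5, 0.75]))  # [(0.5, 2.0), (0.75, 2.5)]
```

The values match the pandas session above; only the labelling of the multi-quantile result differs.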

    ]
)
def test_arraylike_functions_transforms(t, df, ibis_func, pandas_func):
    if isinstance(pandas_func, Exception):

@cpcloud

cpcloud Aug 5, 2017

Member

Why do you need this? It looks like every pandas_func is a lambda.

@jreback

jreback Aug 5, 2017

Contributor

changed in following push

    input_type = [value,
                  number(name='quantile', allow_boolean=False),
                  string(name='interpolation', optional=True)]

@cpcloud

cpcloud Aug 5, 2017

Member

I think we want to use rules.string_options(list_of_possible_interpolation_values) so that we can limit these before executing.
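
The idea of limiting the values before execution can be sketched as follows. `check_option` is a hypothetical helper, not the `rules.string_options` implementation; the option names are the ones that appear later in this PR's diff.

```python
# Interpolation names taken from the diff in this PR.
INTERPOLATIONS = ('linear', 'lower', 'higher', 'midpoint', 'nearest')


def check_option(value, options, name):
    """Hypothetical validator: reject values outside a fixed set."""
    if value not in options:
        raise ValueError(
            '{} must be one of {}, got {!r}'.format(name, list(options), value)
        )
    return value
```

Rejecting a misspelled option when the expression is built surfaces the error before any backend runs.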

    input_type = [value,
                  number(name='quantile', allow_boolean=False),
                  string(name='interpolation', optional=True)]
    output_type = rules.scalar_output(_array_reduced_type)

@cpcloud

cpcloud Aug 5, 2017

Member

This is correct for multiple quantiles, but not for a single quantile. In this case the output_type is actually the same as the second argument: T -> T or array<T> -> array<T>. rules.type_of_arg(1) should cover this case.

@cpcloud

cpcloud Aug 5, 2017

Member

I don't remember how to spell the correct type for the input though, which is "any scalar or array of scalars".

class Quantile(Reduction):
    input_type = [value,
                  number(name='quantile', allow_boolean=False),

@cpcloud

cpcloud Aug 5, 2017

Member

We don't have a nice way of describing "numeric scalar or array of numeric scalars". I'll put up a PR to do this now.

@cpcloud

cpcloud Aug 6, 2017

Member

This is actually pretty tricky because of the distinction between a generic list of values (whose elements could each be of a different type) and an array which is a list of values all with the same type.

I'll make an issue about this to refactor this part of the code.

In the meantime, you can work around this by creating an additional node type, maybe MultipleQuantile and then in the def quantile function try to construct one and if that fails validation construct the other.
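
The workaround could be sketched like this. The classes are simplified hypothetical stand-ins that validate in `__init__`; real ibis nodes validate through `input_type` rules instead.

```python
def _is_number(x):
    # bool is a subclass of int; exclude it, as allow_boolean=False does.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


class Quantile(object):
    """Stand-in node for a single quantile."""
    def __init__(self, quantile):
        if not _is_number(quantile):
            raise TypeError('expected a numeric scalar')
        self.quantile = quantile


class MultipleQuantile(object):
    """Stand-in node for a list of quantiles."""
    def __init__(self, quantiles):
        if not isinstance(quantiles, (list, tuple)):
            raise TypeError('expected a list of numeric scalars')
        if not all(_is_number(q) for q in quantiles):
            raise TypeError('expected a list of numeric scalars')
        self.quantiles = list(quantiles)


def quantile(q):
    # Try the list node first; fall back to the scalar node if it fails.
    try:
        return MultipleQuantile(q)
    except TypeError:
        return Quantile(q)
```

If both constructors reject the argument, the scalar node's error propagates to the caller.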

@execute_node.register(
    ops.Quantile, pd.Series, float,

@jreback

jreback Aug 5, 2017

Contributor

I repeated this w/o the optional arg as I couldn't get rules.string_options to be optional...?

@cpcloud

cpcloud Aug 5, 2017

Member

Did you try dispatching on six.string_types + (type(None),)?

@cpcloud

cpcloud Aug 7, 2017

Member

You should be able to cover this in one function by dispatching on six.string_types + (type(None),) for the interpolation parameter.
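
A rough illustration of one function registered for both string and `None`. It uses the standard library's `functools.singledispatch` as a stand-in for the multiple-dispatch `execute_node` in this PR, and `str` in place of `six.string_types` (which is `(str,)` on Python 3). The function names are hypothetical, and the `'linear'` fallback is an assumption based on the default discussed later in this thread.

```python
from functools import singledispatch


@singledispatch
def resolve_interpolation(interpolation):
    # Fallback for any type that has no registered handler.
    raise TypeError(
        'interpolation must be a string or None, got {!r}'.format(interpolation)
    )


@resolve_interpolation.register(str)
@resolve_interpolation.register(type(None))
def _resolve_str_or_none(interpolation):
    # One implementation covers both the given and the omitted case.
    return 'linear' if interpolation is None else interpolation
```

Registering `type(None)` alongside the string type is what removes the need for a second, argument-free copy of the function.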

rules.string_options(
['linear', 'lower', 'higher',
'midpoint', 'nearest'],
name='interpolation', optional=True)]

@jreback

jreback Aug 5, 2017

Contributor

e.g. here I think the optional is being ignored

@cpcloud

cpcloud Aug 5, 2017

Member

Hm, could be a bug. Looking into it.

@cpcloud

Member

cpcloud commented Aug 6, 2017

@jreback let's handle the array of quantiles case in a follow-up. So for now we only allow a scalar double as the quantile argument. There are some other things to fix before we can properly handle the array case.

@jreback

Contributor

jreback commented Aug 6, 2017

sgtm

will fix up the doc string and add some tests

@jreback jreback force-pushed the jreback:funcs branch from d9c9196 to 84e8801 Aug 6, 2017

@jreback

Contributor

jreback commented Aug 6, 2017

pushed

@@ -7,6 +7,17 @@ Release Notes
interesting. Point (minor, e.g. 0.5.1) releases will generally not be found
here and contain only bug fixes.
0.12.0 (???)

@jreback

jreback Aug 6, 2017

Contributor

leave out? change to 0.11.3?

@cpcloud

cpcloud Aug 7, 2017

Member

Yep, this should be 0.11.3

0.12.0 (???)
------------
This release brings initial Pandas backend support along with a number of

@cpcloud

cpcloud Aug 7, 2017

Member

Maybe just leave the section blank for now.

Parameters
----------
quantile : float, default 0.5 (50% quantile)

@cpcloud

cpcloud Aug 7, 2017

Member

Doesn't look like there's a default, so should remove this text.

class Quantile(Reduction):
    input_type = [value,
                  number(name='quantile', allow_boolean=False),

@cpcloud

cpcloud Aug 7, 2017

Member

This should be rules.double()

@jreback

jreback Aug 7, 2017

Contributor

technically this should work on all numeric (and datetimelikes) actually

@cpcloud

cpcloud Aug 7, 2017

Member

Oh interesting re datetime.

@cpcloud

Member

cpcloud commented Aug 7, 2017

@jreback few more small things. other than that this looks good to go.

@jreback jreback force-pushed the jreback:funcs branch from 84e8801 to 9544739 Aug 7, 2017

@jreback

Contributor

jreback commented Aug 7, 2017

@cpcloud if you'd have a look. I added default='linear' for Quantile, but I'm still looking for the original signature (and now that I have deleted it, the tests fail :>)

@jreback jreback force-pushed the jreback:funcs branch from 2111a42 to aba7544 Aug 7, 2017

@jreback

Contributor

jreback commented Aug 7, 2017

pushed with updates

@cpcloud

cpcloud approved these changes Aug 7, 2017

@cpcloud

Member

cpcloud commented Aug 7, 2017

I'll merge on green

@jreback

Contributor

jreback commented Aug 7, 2017

green now

@cpcloud cpcloud closed this in 6388baf Aug 7, 2017

@cpcloud

Member

cpcloud commented Aug 7, 2017

thanks @jreback !

jreback added a commit to jreback/ibis that referenced this pull request Aug 8, 2017

jreback added a commit to jreback/ibis that referenced this pull request Aug 8, 2017

cpcloud added a commit that referenced this pull request Aug 9, 2017

support for passing multiple quantiles in .quantile()
xref #1090

Author: Jeff Reback <jeff@reback.net>

Closes #1094 from jreback/multiple and squashes the following commits:

0e235cc [Jeff Reback] allow quantile to accept int input specify output type as double if integer input
2337321 [Jeff Reback] support for passing multiple quantiles in .quantile()

@jreback jreback referenced this pull request Sep 15, 2017

Closed

support for .quantile() #1152
