ENH: support for clip and quantile ops on DoubleColumns #1090

Closed
wants to merge 1 commit into from

Conversation

jreback
Contributor

@jreback jreback commented Aug 5, 2017

No description provided.

@jreback jreback added the feature Features or general enhancements label Aug 5, 2017
(methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)),
]
)
def test_arrayfunctions(t, df, ibis_func, pandas_func):
Contributor Author

@jreback jreback Aug 5, 2017

@cpcloud where do you test for exceptions that bubble up from the implementation (e.g. an out-of-range value for quantile)?

Member

I would just have a separate test that only contains failures (using with pytest.raises). Easier to debug that way.
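A minimal sketch of what such a failure-only test might look like. `checked_quantile` is a hypothetical stand-in, not the ibis or pandas implementation; only `pytest.raises` and `pytest.mark.parametrize` are real APIs here.

```python
import pytest


def checked_quantile(values, q):
    """Hypothetical stand-in for a backend quantile implementation."""
    if not 0 <= q <= 1:
        raise ValueError("quantile must be in [0, 1], got {!r}".format(q))
    ordered = sorted(values)
    # Simple 'lower' lookup; a real backend would interpolate.
    return ordered[int(q * (len(ordered) - 1))]


@pytest.mark.parametrize('bad_q', [-0.1, 1.5])
def test_quantile_failures(bad_q):
    # Only failure cases live here, so a failing run points
    # directly at the exception handling.
    with pytest.raises(ValueError):
        checked_quantile([1, 2, 3], bad_q)
```

Keeping the success cases in their own parametrized test means a broken expectation there does not hide a missing exception check, and vice versa.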

@cpcloud cpcloud self-requested a review August 5, 2017 21:04
@cpcloud cpcloud added this to the 0.11.3 milestone Aug 5, 2017
@@ -349,6 +349,17 @@ def output_type(self):
        return rules.shape_like(arg, 'double')


class Clip(ValueOp):
    input_type = [number(name='lower', allow_boolean=False, optional=True),
                  number(name='upper', allow_boolean=False, optional=True)]
Member

@cpcloud cpcloud Aug 5, 2017

What is the behavior when neither is passed?

Contributor Author

just returns the original. not great. I could raise if not at least 1 is passed.

Member

Yeah, probably should raise during execution since we don't have a rule to express "at least one argument" that I'm aware of.
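As a rough illustration of raising at execution time, here is a stdlib-only sketch. `clip_values` is a hypothetical helper operating on a plain list, not the pandas backend function from this PR.

```python
def clip_values(values, lower=None, upper=None):
    """Hypothetical execution-time clip with an 'at least one bound' check."""
    if lower is None and upper is None:
        # No rule expresses "at least one argument", so check here instead.
        raise ValueError("at least one of lower or upper must be provided")
    result = []
    for value in values:
        if lower is not None and value < lower:
            value = lower
        if upper is not None and value > upper:
            value = upper
        result.append(value)
    return result
```

For example, `clip_values([1, 5, 10], lower=2, upper=8)` gives `[2, 5, 8]`, while `clip_values([1, 5, 10])` raises `ValueError` instead of silently returning the input.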

    input_type = [number(name='lower', allow_boolean=False, optional=True),
                  number(name='upper', allow_boolean=False, optional=True)]

    def output_type(self):
Member

The output type here should be the same as the input type. You can use rules.type_of_arg(0) to get the output type of the first input argument. You should also be able to just assign that directly to output_type like this:

class FooNode(...):
    output_type = rules.type_of_arg(0)

Contributor Author

done

@@ -170,6 +170,20 @@ def execute_series_natural_log(op, data, scope=None):
    return np.log(data)


@execute_node.register(
    ops.Clip, pd.Series, (float, int, type(None)), (float, int, type(None))
Member

For int use integer_types which includes numpy integers and six.integer_types. If that's not imported you can import it from ibis/pandas/core.py.

Also, does Series.clip allow Series as input? Do we want to allow other table columns as inputs?

Contributor Author

it does, will add

@execute_node.register(
    ops.Quantile, pd.Series, float
)
def execute_series_quantile(op, data, q, scopy=None):
Member

Misspelling here: s/scopy/scope/

Contributor Author

done

Member

Should we validate that 0 <= q <= 1 here?

(methodcaller('quantile', 0.5), methodcaller('quantile', 0.5)),
]
)
def test_arrayfunctions(t, df, ibis_func, pandas_func):
Member

Maybe have two tests here? I know there's a few examples where we dump a bunch into one, but these are distinct enough that it probably makes sense to separate them.

@@ -696,6 +707,12 @@ class Mean(Reduction):
    output_type = rules.scalar_output(_mean_output_type)


class Quantile(Reduction):

    input_type = [number(name='q', allow_boolean=False)]
Member

Maybe give this a slightly longer name :)

Contributor Author

done

@cpcloud cpcloud added expressions Issues or PRs related to the expression API pandas The pandas backend labels Aug 5, 2017
ibis/expr/api.py Outdated
-------
clipped : type depending on input
Decimal values: yield decimal
Other numeric values: yield integer (int32)
Member

I don't think you need to enumerate any types here. Something like "The type of the returned column is the same as type of the input" or something similar.

Contributor Author

sure

@jreback
Contributor Author

jreback commented Aug 5, 2017

pushed, but still have to handle this

In [3]: pd.Series([1, 2, 3]).quantile(0.5)
Out[3]: 2.0

In [4]: pd.Series([1, 2, 3]).quantile([0.5, 0.75])
Out[4]: 
0.50    2.0
0.75    2.5
dtype: float64

The input type is easy, and the output type is a column of the input's scalar type, but we don't have an Index (e.g. to annotate the quantiles). How do you handle this (return a Table)?

]
)
def test_arraylike_functions_transforms(t, df, ibis_func, pandas_func):
    if isinstance(pandas_func, Exception):
Member

Why do you need this? It looks like every pandas_func is a lambda.

Contributor Author

changed in following push


    input_type = [value,
                  number(name='quantile', allow_boolean=False),
                  string(name='interpolation', optional=True)]
Member

I think we want to use rules.string_options(list_of_possible_interpolation_values) so that we can limit these before executing.
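To illustrate the idea of rejecting bad values before execution, here is a stdlib-only sketch. `check_string_option` is a hypothetical validator and does not reproduce how `rules.string_options` is implemented; the option names are the ones listed later in this review.

```python
INTERPOLATIONS = ('linear', 'lower', 'higher', 'midpoint', 'nearest')


def check_string_option(value, options, optional=False):
    """Hypothetical validator: accept a listed string, or None if optional."""
    if value is None:
        if optional:
            return None
        raise ValueError("a value is required")
    if not isinstance(value, str):
        raise TypeError("expected a string, got {}".format(type(value).__name__))
    if value not in options:
        raise ValueError(
            "{!r} is not one of {}".format(value, ', '.join(options))
        )
    return value
```

With this, `check_string_option('cubic', INTERPOLATIONS)` fails at expression-construction time rather than inside the backend, which is the point of limiting the options up front.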

    input_type = [value,
                  number(name='quantile', allow_boolean=False),
                  string(name='interpolation', optional=True)]
    output_type = rules.scalar_output(_array_reduced_type)
Member

This is correct for multiple quantiles, but not for a single quantile. In this case the output_type is actually the same as the second argument: T -> T or array<T> -> array<T>. rules.type_of_arg(1) should cover this case.

Member

I don't remember how to spell the correct type for the input though, which is "any scalar or array of scalars".

class Quantile(Reduction):

    input_type = [value,
                  number(name='quantile', allow_boolean=False),
Member

We don't have a nice way of describing "numeric scalar or array of numeric scalars". I'll put up a PR to do this now.

Member

@cpcloud cpcloud Aug 6, 2017

This is actually pretty tricky because of the distinction between a generic list of values (whose elements could each be of a different type) and an array which is a list of values all with the same type.

I'll make an issue about this to refactor this part of the code.

In the meantime, you can work around this by creating an additional node type, maybe MultipleQuantile and then in the def quantile function try to construct one and if that fails validation construct the other.
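The suggested workaround could be sketched roughly as follows. Both classes here are hypothetical plain-Python stand-ins for the ibis node types, and the use of `TypeError` as the validation failure is an assumption made for illustration.

```python
import numbers


def _is_number(obj):
    # bool is a numbers.Real subclass, so exclude it (cf. allow_boolean=False).
    return isinstance(obj, numbers.Real) and not isinstance(obj, bool)


class Quantile:
    """Hypothetical node accepting a single numeric quantile."""

    def __init__(self, quantile):
        if not _is_number(quantile):
            raise TypeError("expected a numeric scalar")
        self.quantile = float(quantile)


class MultipleQuantile:
    """Hypothetical node accepting a non-empty sequence of numeric quantiles."""

    def __init__(self, quantile):
        items = list(quantile)  # raises TypeError for a non-iterable scalar
        if not items or not all(_is_number(q) for q in items):
            raise TypeError("expected a sequence of numeric scalars")
        self.quantile = [float(q) for q in items]


def quantile(q):
    # Try the scalar node first; fall back to the multi-quantile node
    # if the scalar node rejects the argument.
    try:
        return Quantile(q)
    except TypeError:
        return MultipleQuantile(q)
```

Here `quantile(0.5)` builds a `Quantile`, `quantile([0.5, 0.75])` builds a `MultipleQuantile`, and an argument that fails both validations propagates the second error.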


@execute_node.register(
    ops.Quantile, pd.Series, float,
Contributor Author

I repeated this without the optional arg, as I couldn't get rules.string_options to be optional.

Member

@cpcloud cpcloud Aug 5, 2017

Did you try dispatching on six.string_types + (type(None),)?

Member

You should be able to cover this in one function by dispatching on six.string_types + (type(None),) for the interpolation parameter.
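A stdlib-only sketch of the "one function for both cases" idea. On Python 3, `six.string_types` is `(str,)`, so the tuple is spelled out directly; `execute_quantile` is a hypothetical function over a plain list, not the dispatched pandas implementation, and treating `None` as `'linear'` is an assumption that mirrors the pandas default.

```python
import math

# Equivalent to six.string_types + (type(None),) on Python 3.
STRING_OR_NONE = (str, type(None))


def execute_quantile(values, q, interpolation=None):
    """Hypothetical single implementation covering the str and None cases."""
    if not isinstance(interpolation, STRING_OR_NONE):
        raise TypeError("interpolation must be a string or None")
    method = 'linear' if interpolation is None else interpolation
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low, high = math.floor(position), math.ceil(position)
    if method == 'lower':
        return ordered[low]
    if method == 'higher':
        return ordered[high]
    if method == 'linear':
        # Interpolate between the two neighbouring order statistics.
        return ordered[low] + (ordered[high] - ordered[low]) * (position - low)
    raise ValueError("unsupported interpolation: {!r}".format(method))
```

Called as `execute_quantile([1, 2, 3], 0.75)`, this follows the same linear rule as the pandas session quoted earlier in the thread; only three of the five interpolation names are handled to keep the sketch short.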

                  rules.string_options(
                      ['linear', 'lower', 'higher',
                       'midpoint', 'nearest'],
                      name='interpolation', optional=True)]
Contributor Author

e.g. here I think the optional is being ignored

Member

Hm, could be a bug. Looking into it.

@cpcloud
Member

cpcloud commented Aug 6, 2017

@jreback let's handle the array of quantiles case in a follow up. So for now we only allow a scalar double as the quantile argument. There's some other things to fix before we can properly handle the array case.

@jreback
Contributor Author

jreback commented Aug 6, 2017

sgtm

will fix up the doc string and add some tests

@jreback
Contributor Author

jreback commented Aug 6, 2017

pushed

@@ -7,6 +7,17 @@ Release Notes
interesting. Point (minor, e.g. 0.5.1) releases will generally not be found
here and contain only bug fixes.

0.12.0 (???)
Contributor Author

leave out? change to 0.11.3?

Member

Yep, this should be 0.11.3

0.12.0 (???)
------------

This release brings initial Pandas backend support along with a number of
Member

Maybe just leave the section blank for now.


Parameters
----------
quantile : float, default 0.5 (50% quantile)
Member

Doesn't look like there's a default, so should remove this text.

class Quantile(Reduction):

    input_type = [value,
                  number(name='quantile', allow_boolean=False),
Member

This should be rules.double()

Contributor Author

technically this should work on all numeric (and datetimelikes) actually

Member

Oh interesting re datetime.

@cpcloud
Member

cpcloud commented Aug 7, 2017

@jreback few more small things. other than that this looks good to go.

@jreback
Contributor Author

jreback commented Aug 7, 2017

@cpcloud if you'd have a look. I added default='linear' for Quantile, but I'm still looking for the original signature (and now that I have deleted it, the tests fail :>)

@jreback
Contributor Author

jreback commented Aug 7, 2017

pushed with updates

@cpcloud
Member

cpcloud commented Aug 7, 2017

I'll merge on green

@jreback
Contributor Author

jreback commented Aug 7, 2017

green now

@cpcloud cpcloud closed this in 6388baf Aug 7, 2017
@cpcloud
Member

cpcloud commented Aug 7, 2017

thanks @jreback !

jreback added a commit to jreback/ibis that referenced this pull request Aug 8, 2017
jreback added a commit to jreback/ibis that referenced this pull request Aug 8, 2017
cpcloud pushed a commit that referenced this pull request Aug 9, 2017
xref #1090

Author: Jeff Reback <jeff@reback.net>

Closes #1094 from jreback/multiple and squashes the following commits:

0e235cc [Jeff Reback] allow quantile to accept int input specify output type as double if integer input
2337321 [Jeff Reback] support for passing multiple quantiles in .quantile()
@jreback jreback mentioned this pull request Sep 15, 2017