support for passing multiple quantiles in .quantile() #1094

Closed
wants to merge 2 commits into from

Contributor

@jreback jreback commented Aug 8, 2017

xref #1090

Contributor Author

jreback commented Aug 8, 2017

@cpcloud this is currently failing with:

(pandas) bash-3.2$ pytest ibis/pandas/tests/test_operations.py  --lf --pdb 
========================================================================================================== test session starts ===========================================================================================================
platform darwin -- Python 3.6.2, pytest-3.1.3, py-1.4.34, pluggy-0.4.0
run-last-failure: rerun last 3 failures
rootdir: /Users/jreback/ibis, inifile:
plugins: cov-2.3.1, xdist-1.17.1
collected 580 items 

ibis/pandas/tests/test_operations.py F
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

t = DatabaseTable[table]
  name: df
  schema:
    datetime_strings : string
    decimal : decimal(4, 3)
    dup_ints : int...uble
    plain_int64 : int64
    plain_strings : string
    strings_with_nulls : string
    strings_with_space : string
df =           datetime_strings decimal  dup_ints dup_strings float64_as_strings  \
0  2017-01-02 01:02:03.234     1.0     ...             a                     
1               None               abab  
ibis_func = <function <lambda> at 0x10e855d90>, pandas_func = <function <lambda> at 0x10e855e18>

    @pytest.mark.parametrize(
        ('ibis_func', 'pandas_func'),
        [
            (lambda x: x.clip(lower=0), lambda x: x.clip(lower=0)),
            (lambda x: x.clip(lower=0.0), lambda x: x.clip(lower=0.0)),
            (lambda x: x.clip(upper=0), lambda x: x.clip(upper=0)),
            (lambda x: x.clip(lower=x - 1, upper=x + 1),
             lambda x: x.clip(lower=x - 1, upper=x + 1)),
            (lambda x: x.clip(lower=0, upper=1),
             lambda x: x.clip(lower=0, upper=1)),
            (lambda x: x.clip(lower=0, upper=1.0),
             lambda x: x.clip(lower=0, upper=1.0)),
            (lambda x: x.quantile([0.25, 0.75]),
             lambda x: x.quantile([0.25, 0.75])),
        ]
    )
    def test_arraylike_functions_transforms(t, df, ibis_func, pandas_func):
>       result = ibis_func(t.float64_with_zeros).execute()

ibis/pandas/tests/test_operations.py:934: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/expr/types.py:204: in execute
    return execute(self, limit=limit, async=async)
ibis/client.py:318: in execute
    return backend.execute(expr, limit=limit, async=async)
ibis/pandas/client.py:86: in execute
    return execute(query)
../miniconda3/envs/pandas/lib/python3.6/site-packages/multipledispatch/dispatcher.py:164: in __call__
    return func(*args, **kwargs)
ibis/pandas/core.py:85: in execute_without_scope
    return execute(expr, scope)
../miniconda3/envs/pandas/lib/python3.6/site-packages/multipledispatch/dispatcher.py:164: in __call__
    return func(*args, **kwargs)
ibis/pandas/core.py:74: in execute_with_scope
    for arg in args if isinstance(arg, _VALID_INPUT_TYPES)
ibis/pandas/core.py:74: in <listcomp>
    for arg in args if isinstance(arg, _VALID_INPUT_TYPES)
../miniconda3/envs/pandas/lib/python3.6/site-packages/multipledispatch/dispatcher.py:164: in __call__
    return func(*args, **kwargs)
ibis/pandas/core.py:75: in execute_with_scope
    ] or [scope.get(arg, arg) for arg in args]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

.0 = <tuple_iterator object at 0x10eb49e48>

>   ] or [scope.get(arg, arg) for arg in args]
E   TypeError: unhashable type: 'list'

ibis/pandas/core.py:75: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/jreback/ibis/ibis/pandas/core.py(75)<listcomp>()
-> ] or [scope.get(arg, arg) for arg in args]

class MultiQuantile(Reduction):

    input_type = [value,
                  rules.collection(name='quantile'),
Member

You want rules.array(dt.double) here (dt.double for now, we'll extend as needed).

Member

collection here means "column or table". I think for now we want to be very strict about the inputs and we can expand support for different concrete inputs (like Series) as needed.

)
def execute_series_quantile_with_interpolation(
        op, data, quantile, scope=None):
    return data.quantile(q=quantile, interpolation=op.interpolation)


@execute_node.register(
    ops.MultiQuantile, pd.Series, ir.ValueList,
Member

ValueList will need to be list.

    tm.assert_series_equal(result, expected)
    result = ibis_func(t.float64_with_zeros).execute()
    expected = pandas_func(df.float64_with_zeros)
    tm.assert_series_equal(result, expected)
Member

We'll return a list here, since this is a reduction to a single array scalar, so a plain assert is probably good enough.

ibis/expr/api.py Outdated
-    op = _ops.Quantile(arg, quantile, interpolation)
+    try:
+        op = _ops.Quantile(arg, quantile, interpolation)
+    except _com.IbisTypeError:
Member

I know I originally suggested this, but I'm not actually sure whether IbisTypeError can be thrown for any other reason here than quantile being the wrong type.

Probably better to just check isinstance(quantile, collections.Sequence). I think it's more readable that way, and expresses the type checking intent more clearly.
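
A minimal sketch of the explicit check suggested above. The helper name `is_multi_quantile` is hypothetical and not part of ibis. It uses `collections.abc.Sequence`, because the `collections.Sequence` alias written in the comment was removed in Python 3.10. Excluding strings is an added assumption, since `str` also counts as a sequence.

```python
from collections.abc import Sequence


def is_multi_quantile(quantile):
    """Return True when `quantile` is a list-like of values, not a scalar."""
    # str and bytes are Sequences too, so rule them out explicitly
    # (an assumption of this sketch, not something stated in the thread).
    if isinstance(quantile, (str, bytes)):
        return False
    return isinstance(quantile, Sequence)


# Scalars take the single-quantile path, lists and tuples the multi path.
single = is_multi_quantile(0.5)
multi = is_multi_quantile([0.25, 0.75])
```

Compared with catching IbisTypeError, the branch taken here depends only on the shape of the argument, so an unrelated type error cannot silently select the multi-quantile path.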

ibis/expr/api.py Outdated
@@ -1043,7 +1043,7 @@ def quantile(arg, quantile, interpolation='linear'):

     Parameters
     ----------
-    quantile : float, default 0.5 (50% quantile)
+    quantile : float or array-like, default 0.5 (50% quantile)
Member

Should remove the 0.5 since we're not going to default to anything here.

@@ -180,13 +180,21 @@ def execute_series_clip(op, data, lower, upper, scope=None):


 @execute_node.register(
-    ops.Quantile, pd.Series, float,
+    ops.Quantile, pd.Series, float
Member

You should be able to just do (float, list) here so you don't have to duplicate anything.

Member

cpcloud commented Aug 8, 2017

@jreback regarding the failure, try adding list to this line https://github.com/ibis-project/ibis/blob/master/ibis/pandas/core.py#L61
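
For context on why that suggestion helps: the traceback ends in `scope.get(arg, arg)`, which is a dict lookup, and a list cannot be hashed as a dict key. The standalone sketch below reproduces the pattern. `resolve_args` and `VALID_INPUT_TYPES` are hypothetical stand-ins, not the actual code in ibis/pandas/core.py, and the real comprehension executes each matching argument rather than passing it through unchanged.

```python
# Hypothetical stand-in for the tuple of accepted input types in core.py.
VALID_INPUT_TYPES = (int, float, str)


def resolve_args(args, scope, valid_types):
    """Keep args of a valid type; otherwise fall back to a scope lookup."""
    resolved = [arg for arg in args if isinstance(arg, valid_types)]
    # An empty list is falsy, so unrecognised args reach the dict lookup,
    # which must hash each arg and therefore rejects lists.
    return resolved or [scope.get(arg, arg) for arg in args]


quantiles = [0.25, 0.75]

try:
    resolve_args((quantiles,), {}, VALID_INPUT_TYPES)
    error = None
except TypeError as exc:
    error = str(exc)

# With list accepted as a valid input type, the fallback is never reached.
fixed = resolve_args((quantiles,), {}, VALID_INPUT_TYPES + (list,))
```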

Contributor Author

jreback commented Aug 8, 2017

OK, after inferring the literal type, I'm getting this:

ibis/pandas/tests/test_operations.py:934: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/pandas/tests/test_operations.py:929: in <lambda>
    (lambda x: x.quantile([0.25, 0.75]),
ibis/expr/api.py:1069: in quantile
    op = _ops.MultiQuantile(arg, quantile, interpolation)
ibis/expr/types.py:419: in __init__
    super(ValueOp, self).__init__(args)
ibis/expr/types.py:292: in __init__
    self.args = self._validate_args(args)
ibis/expr/types.py:298: in _validate_args
    return self.input_type.validate(args)
ibis/expr/rules.py:330: in validate
    return self._validate(args, self.types)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <class 'ibis.expr.rules.TypeSignature'>
    arg 0: ValueTyped([<class 'ibis.expr.types.ValueExpr'>])
    arg 1: ValueTyped([Array(double)])
    arg 2: <ibis.expr.rules.StringOptions object at 0x1167bb400>
args = (ref_0
DatabaseTable[table]
  name: df
  schema:
    datetime_strings : string
    decimal : decimal(4, 3)
    dup_int..._space : string

float64_with_zeros = Column[double*] 'float64_with_zeros' from table
  ref_0, Array(double), 'linear')
types = [ValueTyped([<class 'ibis.expr.types.ValueExpr'>]), ValueTyped([Array(double)]), <ibis.expr.rules.StringOptions object at 0x1167bb400>]

    def _validate(self, args, types):
        clean_args = list(args)
        for i, validator in enumerate(types):
            try:
                clean_args[i] = validator.validate(clean_args, i)
            except IbisTypeError as e:
                exc = e.args[0]
                msg = ('Argument {0}: {1}'.format(i, exc) +
                       '\nArgument was: {0}'.format(ir._safe_repr(args[i])))
>               raise IbisTypeError(msg)
E               ibis.common.IbisTypeError: Argument 1: array<double>
E               Argument was: Array(double)

ibis/expr/rules.py:341: IbisTypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/jreback/ibis/ibis/expr/rules.py(341)_validate()

@jreback jreback self-assigned this Aug 8, 2017
Member

cpcloud commented Aug 8, 2017

@jreback rebase and try on top of master

ibis/expr/api.py Outdated
     """
-    op = _ops.Quantile(arg, quantile, interpolation)
+    if isinstance(quantile, collections.Sequence):
+        quantile = infer_literal_type(quantile)
Member

You just need to remove this line and your tests pass.

@jreback jreback added the feature Features or general enhancements label Aug 8, 2017
@jreback jreback added this to the 0.11.3 milestone Aug 8, 2017
Contributor Author

jreback commented Aug 8, 2017

Pushed, should be all green. Now returning a list for MultiQuantile.

Side issue: test_operations.py is getting big; maybe we should split it into multiple files.
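
As a rough picture of the list-returning behaviour mentioned above, here is a pure-Python sketch of a multi-quantile reduction with linear interpolation (the default `interpolation='linear'` in the `quantile` signature shown in this PR). The functions `linear_quantile` and `multi_quantile` are hypothetical and only approximate what the pandas backend delegates to `Series.quantile`; they assume non-empty input with no nulls.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def multi_quantile(values, quantiles):
    """Reduce a column to one value per requested quantile, as a list."""
    return [linear_quantile(values, q) for q in quantiles]


result = multi_quantile([1, 2, 3, 4], [0.25, 0.75])
```

Because the result is a plain list, it can be compared with a simple equality assert instead of a Series comparison helper.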

Contributor Author

jreback commented Aug 8, 2017

@cpcloud green.

@@ -720,6 +720,17 @@ class Quantile(Reduction):
    output_type = rules.type_of_arg(1)
Member

@cpcloud cpcloud Aug 9, 2017

So, I think this needs to be:

def output_type(self):
    first_arg = self.args[0]
    first_arg_type = first_arg.type()
    if isinstance(first_arg_type, dt.Integer):
        result_type = dt.double
    else:
        result_type = first_arg_type
    return result_type.scalar_type()

because the result will be a double if the input is an integer and it will take on the type of the first argument otherwise.
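
The rule in the snippet above can be checked in isolation with a small sketch that models types as plain strings. `quantile_output_type` and `INTEGER_TYPES` are hypothetical names for illustration, not ibis APIs. The median example shows why integer input needs a double result: interpolating between two integers generally gives a non-integer.

```python
import statistics

# Assumed set of integer type names, for illustration only.
INTEGER_TYPES = frozenset({"int8", "int16", "int32", "int64"})


def quantile_output_type(input_type):
    """Integer columns produce doubles; other types keep their own type."""
    return "double" if input_type in INTEGER_TYPES else input_type


# The 50% quantile of two integers falls between them.
midpoint = statistics.median([1, 2])
```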

Member

Also I think the number(name='quantile') should be double(name='quantile') since we only allow doubles there.

Contributor Author

Updated, though there is no test on the output type (IIRC we are not actually checking it via ibis anywhere, only as a test result).

Also I think the number(name='quantile') should be double(name='quantile')

We allow float and int input (e.g. 0 and 1 are valid).
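
A small sketch of a scalar check that accepts both ints and floats, in line with the point above. `is_valid_quantile` is a hypothetical helper, not ibis's `number(name='quantile')` rule. The range check reflects the usual requirement that a quantile lie in [0, 1], and rejecting bool is an assumption of this sketch.

```python
import numbers


def is_valid_quantile(q):
    """Accept real numbers in [0, 1], whether written as int or float."""
    # bool is a subclass of int, so exclude it explicitly (an assumption).
    if isinstance(q, bool) or not isinstance(q, numbers.Real):
        return False
    return 0 <= q <= 1


endpoints_ok = is_valid_quantile(0) and is_valid_quantile(1)
```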

Member

cpcloud commented Aug 9, 2017

@jreback Yeah, test_operations.py is getting a little big. I'll see about splitting it up.

specify output type as double if integer input
@cpcloud cpcloud added the pandas The pandas backend label Aug 9, 2017
@cpcloud cpcloud closed this in fe2a4d8 Aug 9, 2017
Member

cpcloud commented Aug 9, 2017

thanks @jreback

Labels
feature Features or general enhancements pandas The pandas backend

2 participants