Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Allow group_by and order_by column as window input in pandas backend #2252

Merged

Conversation

icexelloss
Copy link
Contributor

@icexelloss icexelloss commented Jun 16, 2020

What change is proposed?

This PR fixes a bug where group_by and order_by column cannot be used as window operation input columns. It used to throw an exception.

# This throws an exception before the patch

w = ibis.trailing_window(
    order_by="time",
    group_by="tid",
    preceding = 1000,
)
table['time'].count().over(w).execute()

Exception:

../generic.py:460: in execute_reduction_series_groupby
    return aggcontext.agg(data, type(op).__name__.lower())
../../aggcontext.py:489: in agg
    indexed_by_ordering = frame[columns].set_index(order_by)
/Users/icexelloss/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/frame.py:4351: in set_index
    index = ensure_index_from_sequences(arrays, names)
/Users/icexelloss/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/indexes/base.py:5288: in ensure_index_from_sequences
    return Index(sequences[0], name=names)
/Users/icexelloss/miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/indexes/base.py:390: in __new__
    return Int64Index(data, copy=copy, dtype=dtype, name=name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'pandas.core.indexes.numeric.Int64Index'>
data = array([[1, 1],
       [3, 3],
       [2, 2]]), dtype = None, copy = False
name = 'plain_int64'

    def __new__(cls, data=None, dtype=None, copy=False, name=None):
        cls._validate_dtype(dtype)

	# Coerce to ndarray if not already ndarray or Index
        if not isinstance(data, (np.ndarray, Index)):
            if is_scalar(data):
                raise cls._scalar_data_error(data)

            # other iterable of some kind
            if not isinstance(data, (ABCSeries, list, tuple)):
                data = list(data)

            data = np.asarray(data, dtype=dtype)

	if issubclass(data.dtype.type, str):
            cls._string_data_error(data)

	if copy or not is_dtype_equal(data.dtype, cls._default_dtype):
            subarr = np.array(data, dtype=cls._default_dtype, copy=copy)
            cls._assert_safe_casting(data, subarr)
        else:
            subarr = data

	if subarr.ndim > 1:
            # GH#13601, GH#20285, GH#27125
>           raise ValueError("Index data must be 1-dimension

This PR fixes this issue.

How is this tested

test_window_on_and_by_key_as_window_input is added

@icexelloss icexelloss changed the title Pandas window on and by column BUG: Allow group_by and order_by column as window input in pandas backend Jun 16, 2020
@icexelloss icexelloss added pandas The pandas backend bug Incorrect behavior inside of ibis labels Jun 16, 2020
@icexelloss icexelloss requested a review from jreback June 16, 2020 19:02
@icexelloss icexelloss force-pushed the pandas-window-on-and-by-column branch from a57c03d to 66ed6fe Compare June 16, 2020 19:07
@jreback jreback added this to the Next Bugfix Release milestone Jun 18, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ping on green.

@jreback jreback merged commit d3c6ed4 into ibis-project:master Jun 18, 2020
@jreback
Copy link
Contributor

jreback commented Jun 18, 2020

thanks @icexelloss

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis pandas The pandas backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants