Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix aggregate exploding the output of Reduction ops that return a list/ndarray #2702

Merged
merged 4 commits into from
Mar 23, 2021

Conversation

timothydijamco
Copy link
Contributor

@timothydijamco timothydijamco commented Mar 23, 2021

Problem

On the Pandas backend, when .aggregate() is passed an aggregation/reduction that results in a list/ndarray, aggregate will automatically explode that list/ndarray into many rows (one row per element in the array). This is not the desired behavior. A list/ndarray coming from a Reduction op should actually be treated as a scalar (i.e., should result in a single row, not many rows).

Example code

## Setup Pandas backend
df = pd.DataFrame({
    'key': pd.Series([1, 1, 2]),
    'foo': pd.Series([100, 200, 300]),
})

ibis_pandas = ibis.pandas.connect({'test': df})

t = ibis_pandas.table('test')


## Create two Reduction UDFs: one that returns a list, and one that returns an ndarray
@udf.reduction(input_type=[dt.int32], output_type=dt.Array(dt.int32))
def to_list_udf(v):
    return list(v)

@udf.reduction(input_type=[dt.int32], output_type=dt.Array(dt.int32))
def to_np_array_udf(v):
    return np.array(v)


## Use UDFs in `aggregate`
ibis_pandas.execute(
    t.aggregate(
        foo_list=to_list_udf(t['foo']),
        foo_np_array=to_np_array_udf(t['foo']),
    )
)

Current output

foo_list foo_np_array
0 100 100
1 200 200
2 300 300

Desired output

foo_list foo_np_array
0 [100, 200, 300] [100 200 300]

Solution

This PR fixes the coerce_to_output function used by .aggregate() so that it treats lists or ndarrays from Reduction ops as scalars, and doesn't explode them into multiple rows.

@jreback jreback added the expressions Issues or PRs related to the expression API label Mar 23, 2021
@jreback jreback added this to the Next release milestone Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

looks fine, can you add this pr number to the original whatsnew note. ping on green.

@timothydijamco
Copy link
Contributor Author

@jreback added a release note, and CI is green!

@jreback jreback merged commit a835082 into ibis-project:master Mar 23, 2021
@jreback
Copy link
Contributor

jreback commented Mar 23, 2021

thanks @timothydijamco

@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants