Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: groupby and agg on read-only array gives ValueError: buffer source array is read-only #36061

Merged
merged 2 commits into from
Sep 4, 2020

Conversation

jeet-parekh
Copy link
Contributor

I found multiple aggregate functions that were failing for a read-only array. I applied the change suggested in #36014 to all of them.

A couple of questions.

  1. I had to add the Py_ssize_t cdefs to suppress the error below for len(values) != len(labels). Let me know if there is another preferred way of doing it.
error: comparison of integer expressions of different signedness: ‘Py_ssize_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
  1. I see some functions don't have the len(values) != len(labels) check. Should it be there in all the functions?

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Groupby labels Sep 2, 2020
@jreback jreback added this to the 1.1.2 milestone Sep 2, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, can you add a note in 1.1.2 fixed regression section.

also you are referencing 2 issues. can you add tests that are in these issues.

pandas/_libs/groupby.pyx Outdated Show resolved Hide resolved
pandas/_libs/groupby.pyx Outdated Show resolved Hide resolved
pandas/tests/groupby/aggregate/test_cython.py Outdated Show resolved Hide resolved

if len(values) != len(labels):
if len_values != len_labels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think elsewhere we get around this be using values.shape[0] instead of len. IIRC this is something we wont need to do anymore in cython3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are indeed a few cases of

    # TODO(cython 3.0):
    # Instead of `labels.shape[0]` use `len(labels)`
    if not len(values) == labels.shape[0]:

in this file. But so if we need to update it anyways after cython 3.0, it doesn't maybe matter that much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think ok for now

@jbrockmendel
Copy link
Member

LGTM pending Jeff's test request

@jeet-parekh
Copy link
Contributor Author

jeet-parekh commented Sep 2, 2020

@jreback,

also you are referencing 2 issues. can you add tests that are in these issues.

Tests for #35436 and #34857? Those issues are not fixed by this PR. Even if I add tests, they'd fail. Unless you mean something else which I am missing.

I tried @jbrockmendel's suggestion. if values.shape[0] != labels.shape[0] works. What do we want to use? Py_ssize_t cdefs, or .shape[0]?

Also, should the len(values) != len(labels) check be there in all functions?

@jreback
Copy link
Contributor

jreback commented Sep 2, 2020

Tests for #35436 and #34857? Those issues are not fixed by this PR. Even if I add tests, they'd fail. Unless you mean something else which I am missing

hmm thought i saw this, ok no prob then.


if len(values) != len(labels):
if len_values != len_labels:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think ok for now

@jreback jreback merged commit b80dcbc into pandas-dev:master Sep 4, 2020
@jreback
Copy link
Contributor

jreback commented Sep 4, 2020

thanks @jeet-parekh

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 4, 2020
… gives ValueError: buffer source array is read-only
jreback pushed a commit that referenced this pull request Sep 4, 2020
…ueError: buffer source array is read-only (#36117)

Co-authored-by: Jeet Parekh <12874561+jeet-parekh@users.noreply.github.com>
@jeet-parekh jeet-parekh deleted the 36014 branch September 5, 2020 08:26
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: groupby and agg on read-only array gives ValueError: buffer source array is read-only
4 participants