ValueError: buffer source array is read-only #31710

Closed
kernc opened this issue Feb 5, 2020 · 9 comments · Fixed by #32758
Labels
Groupby, Regression (functionality that used to work in a prior pandas version), Resample (resample method)

@kernc
Contributor

kernc commented Feb 5, 2020

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2020', 'now', freq='1h')
>>> arr = np.zeros_like(index)
>>> arr.setflags(write=False)  # make the values buffer read-only
>>> pd.Series(arr, index=index).resample('1d').agg('last')

-------------------------------------------------------------------------
~/pandas/pandas/_libs/groupby.pyx in pandas._libs.groupby.group_last()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Problem description

Groupby fails on some read-only buffers (I couldn't quickly reproduce it with .groupby() itself, sorry).

The prime solution would be to add a const specifier to the input values here (and in related entries):

rank_t[:, :] values,

were it not for Cython's lack of support for const fused types (cython/cython#1772). That was resolved in cython/cython#3118, but despite being a minuscule change, it is only scheduled for release in Cython 3.0. I guess we wait until then.
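For illustration only, here is a rough Python analogue of why const matters. As far as I understand it, a typed memoryview argument without const asks the exporter for a writable buffer, and NumPy refuses that for a read-only array. The sketch uses Python's built-in memoryview as a stand-in (it is not the Cython code path itself) to show the read-only flag that the buffer protocol carries:

```python
import numpy as np

arr = np.zeros(3, dtype=np.int64)
arr.setflags(write=False)       # same kind of read-only array as in the report

# The built-in memoryview accepts a read-only exporter and records the flag.
view = memoryview(arr)
is_readonly = view.readonly

# Writing through the view is rejected because the exporter is read-only.
try:
    view[0] = 1
    write_failed = False
except TypeError:
    write_failed = True

print(is_readonly, write_failed)
```

Reading through such a view is fine, which is why a const (read-only) view would be enough for functions that never write to their input.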

Expected Output

Resampling/groupby works with read-only arrays.
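Until that is possible, a workaround on the caller's side is to hand pandas a writable copy. A minimal sketch, where `as_writeable` is a hypothetical helper name of my own and not part of NumPy or pandas:

```python
import numpy as np


def as_writeable(values):
    """Return `values` as an ndarray, copying only if it is read-only.

    Hypothetical helper for illustration; not a NumPy or pandas API.
    """
    arr = np.asarray(values)
    if arr.flags.writeable:
        return arr          # already writable, no copy needed
    return arr.copy()       # a fresh copy owns its data and is writable


frozen = np.arange(6, dtype=np.int64)
frozen.setflags(write=False)

thawed = as_writeable(frozen)
print(frozen.flags.writeable, thawed.flags.writeable)
```

With the example above, `pd.Series(as_writeable(arr), index=index).resample('1d').agg('last')` should then avoid the error, at the cost of one extra copy of the data.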

Output of pd.show_versions()

pandas 1.1.0.dev0+361.gf0b00f887
cython 0.29.14

@jorisvandenbossche added the Regression and Resample labels on Feb 5, 2020
@jorisvandenbossche added this to the 1.0.2 milestone on Feb 5, 2020
@PaddyAlton

PaddyAlton commented Feb 27, 2020

I think I've found a related issue with isin (at least, it throws the same sort of error!). May give some more insight though, as it was tricky to reproduce:

# python 3.7, on a Windows machine, Anaconda distribution, Cython = v0.28.5, then v0.29.15
import numpy as np  # v1.16.4, then v1.18.1
import pandas as pd  # v1.0.1

arr = np.array([1,2,3], dtype=np.int64) # works fine if I don't set the dtype!

arr.setflags(write=False) # make it read-only

df = pd.DataFrame({"col": [2,4,8]})

test = df.col.isin(arr)

... and this then raises an exception as above:

...
pandas\_libs\hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_int64()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview_cwrapper()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Interesting, huh?

FWIW I originally ran into this when using Google's BigQuery python client to retrieve data from a table with an array-type field. When retrieved via .to_dataframe() it came back as a column of np.array(dtype=int64)s with the OWNDATA and WRITEABLE flags set to False. The latter flag was the problem, although as stated above the data-type appears to matter (note that the stack trace refers to pandas._libs.hashtable.ismember_int64()).
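I don't know how the BigQuery client builds its arrays, so the following is only a stand-in: `np.frombuffer` over an immutable bytes object also yields an array with OWNDATA and WRITEABLE both False, which makes it a convenient way to inspect the flags and to see that a copy clears the problem.

```python
import numpy as np

raw = np.arange(3, dtype=np.int64).tobytes()    # immutable bytes object
borrowed = np.frombuffer(raw, dtype=np.int64)   # array over memory it does not own

flags_before = (borrowed.flags.owndata, borrowed.flags.writeable)

# The flag cannot simply be flipped back, because the underlying memory
# belongs to an immutable object.
try:
    borrowed.setflags(write=True)
    flipped = True
except ValueError:
    flipped = False

# A copy owns its data and is writable.
owned = borrowed.copy()
flags_after = (owned.flags.owndata, owned.flags.writeable)

print(flags_before, flipped, flags_after)
```

Checking `arr.flags` on one of the arrays in the returned column is a quick way to tell whether this is what you are hitting.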

@kernc changed the title from "Groupby fails on read-only buffers" to "ValueError: buffer source array is read-only" on Feb 27, 2020
@TomAugspurger
Contributor

@jbrockmendel you think this is essentially unfixable until we use cython>=0.30? If so, let's remove it from the 1.0.2 milestone.

@jbrockmendel
Member

you think this is essentially unfixable until we use cython>=0.30?

It's viable, just invasive and a hassle, whereas with cython 0.30 it should be a 1-liner. It depends on when we expect that to be available. cc @scoder?

@scoder

scoder commented Mar 9, 2020

when we expect that to be available

This year. :)
I'm planning to release an "official alpha" in a couple of weeks, as soon as I catch up with the currently pending PRs, and as soon as it's in a state that allows users to prepare for it.

@scoder

scoder commented Mar 9, 2020

(Remember that the good thing about Cython is that you can control when and how the C code is generated, so you're rather free to use whatever released or unreleased Cython package you want, as long as you test it sufficiently on your side. That said, it's obviously easier to use a proper PyPI release, that's why I'd like to get at least something out soon.)

@scoder

scoder commented Mar 9, 2020

(Also, the "minuscule change" for this feature is only small in Cython's master/3.0 branch. It's not quite that small in a backport.)

@jbrockmendel
Member

@scoder thanks for the rundown. Big releases aren't easy, and we appreciate all the time and effort you put in.

@TomAugspurger I'm inclined to push this off to 1.0.3 as long as we have lower-hanging-fruit regressions to handle

@TomAugspurger
Contributor

Agreed.

@TomAugspurger removed this from the 1.0.2 milestone on Mar 9, 2020
@jorisvandenbossche
Member

Do we know what caused this regression? Because if this was supported before with cython < 0.30, why is it hard to support it now with that cython version?

7 participants