ValueError: buffer source array is read-only #31710

kernc · 2020-02-05T16:47:31Z

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> index = pd.date_range('2020', 'now', freq='1h')
>>> arr = np.zeros_like(index)
>>> arr.setflags(write=False)
>>> pd.Series(arr, index=index).resample('1d').agg('last')

-------------------------------------------------------------------------
~/pandas/pandas/_libs/groupby.pyx in pandas._libs.groupby.group_last()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper()
~/pandas/pandas/_libs/groupby.cpython-37m-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Problem description

Groupby fails on some read-only buffers (I couldn't quickly reproduce it with .groupby() itself, sorry).

The obvious fix would be to add a const specifier to the input values here (and related entries):

pandas/pandas/_libs/groupby.pyx

Line 853 in 0b6debf

rank_t[:, :] values,

if it were not for Cython's lack of support for const fused types (cython/cython#1772). That was resolved in cython/cython#3118 but, despite being a minuscule change, it is only scheduled for release in Cython 3.0. I guess we wait until then.
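
In the meantime, a caller-side workaround is to hand pandas a writable array. The following is a minimal sketch, assuming that copying the data is acceptable; `ensure_writable` is a hypothetical helper name used for illustration, not a pandas or NumPy API, and the float data and fixed 48-hour index are stand-ins for the example above:

```python
import numpy as np
import pandas as pd


def ensure_writable(arr):
    """Return `arr` itself if it is writable, otherwise a writable copy.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    # np.require copies only when the 'W' (WRITEABLE) requirement is not met.
    return np.require(arr, requirements=["W"])


index = pd.date_range("2020-01-01", periods=48, freq="h")
arr = np.zeros(len(index))
arr.setflags(write=False)  # simulate a read-only buffer

safe = ensure_writable(arr)  # a copy here, because `arr` is read-only
result = pd.Series(safe, index=index).resample("D").agg("last")
```

The copy costs memory proportional to the array, so this is a stopgap rather than a substitute for const-qualified memoryviews.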

Expected Output

Resampling/groupby works with read-only arrays.

Output of `pd.show_versions()`

pandas 1.1.0.dev0+361.gf0b00f887
cython 0.29.14

PaddyAlton · 2020-02-27T18:17:39Z

I think I've found a related issue with isin (at least, it throws the same sort of error!). It may give some more insight, though, as it was tricky to reproduce:

# python 3.7, on a Windows machine, Anaconda distribution, Cython = v0.28.5, then v0.29.15
import numpy as np  # v1.16.4, then v1.18.1
import pandas as pd  # v1.0.1

arr = np.array([1,2,3], dtype=np.int64) # works fine if I don't set the dtype!

arr.setflags(write=False) # make it read-only

df = pd.DataFrame({"col": [2,4,8]})

test = df.col.isin(arr)

... and this then raises an exception as above:

...
pandas\_libs\hashtable_func_helper.pxi in pandas._libs.hashtable.ismember_int64()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview_cwrapper()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\_libs\hashtable.cp37-win_amd64.pyd in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only

Interesting, huh?

FWIW, I originally ran into this when using Google's BigQuery Python client to retrieve data from a table with an array-type field. When retrieved via .to_dataframe(), the field came back as a column of int64 NumPy arrays with the OWNDATA and WRITEABLE flags set to False. The latter flag was the problem, although, as stated above, the dtype also appears to matter (note that the stack trace refers to pandas._libs.hashtable.ismember_int64()).
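
For anyone wanting to inspect these flags, here is a small sketch of one way an array ends up with both OWNDATA and WRITEABLE set to False. It uses `np.frombuffer` on a bytes object as a stand-in; that is an assumption for illustration and not necessarily how the BigQuery client builds its arrays:

```python
import numpy as np

# Hypothetical stand-in for data handed over by a client library:
# an int64 array that views an immutable bytes object.
raw = np.array([1, 2, 3], dtype=np.int64).tobytes()
arr = np.frombuffer(raw, dtype=np.int64)

print(arr.flags.owndata)    # False: the memory belongs to `raw`
print(arr.flags.writeable)  # False: bytes objects are immutable

# A copy owns fresh memory and is writable again.
writable = arr.copy()
print(writable.flags.owndata, writable.flags.writeable)  # True True
```

Passing `writable` rather than `arr` to `isin` should sidestep the error on affected versions, at the cost of one copy.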

TomAugspurger · 2020-03-06T17:52:31Z

@jbrockmendel you think this is essentially unfixable until we use cython>=0.30? If so, let's remove it from the 1.0.2 milestone.

jbrockmendel · 2020-03-06T19:05:16Z

> you think this is essentially unfixable until we use cython>=0.30?

It's viable, just invasive and a hassle, whereas with cython 0.30 it should be a 1-liner. It depends on when we expect that to be available. cc @scoder?

scoder · 2020-03-09T10:33:31Z

> when we expect that to be available

This year. :)
I'm planning to release an "official alpha" in a couple of weeks, as soon as I catch up with the currently pending PRs, and as soon as it's in a state that allows users to prepare for it.

scoder · 2020-03-09T10:37:09Z

(Remember that the good thing about Cython is that you can control when and how the C code is generated, so you're rather free to use whatever released or unreleased Cython package you want, as long as you test it sufficiently on your side. That said, it's obviously easier to use a proper PyPI release, that's why I'd like to get at least something out soon.)

scoder · 2020-03-09T10:57:25Z

(Also, the "minuscule change" for this feature is only small in Cython's master/3.0 branch. It's not quite that small in a backport.)

jbrockmendel · 2020-03-09T17:37:47Z

@scoder thanks for the rundown. Big releases aren't easy, and we appreciate all the time and effort you put in.

@TomAugspurger I'm inclined to push this off to 1.0.3 as long as we have lower-hanging-fruit regressions to handle

TomAugspurger · 2020-03-09T18:19:45Z

Agreed.

jorisvandenbossche · 2020-03-09T20:29:21Z

Do we know what caused this regression? Because if this was supported before with cython < 0.30, why is it hard to support it now with that cython version?

jorisvandenbossche added the Regression and Resample labels Feb 5, 2020

jorisvandenbossche added this to the 1.0.2 milestone Feb 5, 2020

jorisvandenbossche added the Groupby label Feb 5, 2020

kernc changed the title ~~Groupby fails on read-only buffers~~ ValueError: buffer source array is read-only Feb 27, 2020

jbrockmendel mentioned this issue Mar 4, 2020

RLS: 1.0.2 #32415

Closed

TomAugspurger removed this from the 1.0.2 milestone Mar 9, 2020

jbrockmendel mentioned this issue Mar 16, 2020

BUG: resample.agg with read-only data #32758

Merged

jreback added this to the 1.0.3 milestone Mar 17, 2020

jreback closed this as completed in #32758 Mar 17, 2020

erik-hasse mentioned this issue Apr 8, 2020

BUG: ValueError: buffer source array is read-only during groupby #33410

Closed

clarkzinzow mentioned this issue Aug 15, 2020

[dask-on-ray] ValueError on read-only memory ray-project/ray#10124

Open

dmitra79 mentioned this issue Oct 16, 2020

BUG: ValueError: buffer source array is read-only #37174

Closed

dsaxton mentioned this issue Oct 17, 2020

BUG: Fix isin with read-only target #37181

Merged
