Msgpack - ValueError: buffer source array is read-only #11880

Closed · ikilledthecat opened this issue Dec 21, 2015 · 5 comments

ikilledthecat commented Dec 21, 2015

I get a ValueError when processing data with pandas. I followed these steps:

  1. convert to msgpack format with compress flag
  2. subsequently read file into a dataframe
  3. push to sql table with to_sql

On the third step I get ValueError: buffer source array is read-only.

This problem does not arise if I wrap the read_msgpack call inside a pandas.concat.

Example

import pandas as pd
import numpy as np

from sqlalchemy import create_engine

eng = create_engine("sqlite:///:memory:")

df1 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': 'foo'})

df1.to_msgpack('test.msgpack', compress='zlib')
df2 = pd.read_msgpack('test.msgpack')

df2.to_sql('test', eng, if_exists='append', chunksize=1000) # raises ValueError

df2 = pd.concat([pd.read_msgpack('test.msgpack')])

df2.to_sql('test', eng, if_exists='append', chunksize=1000) # works

This happens with both blosc and zlib compression. While I have found a workaround, this behaviour seems very odd, and for very large files the extra concat incurs a small performance hit.
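A possibly simpler workaround (an untested sketch, assuming the root cause is the read-only buffer named in the error) is to force a deep copy of the round-tripped frame so its blocks own writable memory:

import pandas as pd
from sqlalchemy import create_engine

eng = create_engine("sqlite:///:memory:")
df2 = pd.read_msgpack('test.msgpack')

# if the unpacked blocks came back as read-only views, a deep copy
# gives the DataFrame writable memory that it owns
if not all(blk.values.flags.writeable for blk in df2._data.blocks):
    df2 = df2.copy()

df2.to_sql('test', eng, if_exists='append', chunksize=1000)  # expected to work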

Edit: @TomAugspurger changed the SQL engine to sqlite.

jreback commented Dec 21, 2015

pls pd.show_versions()

TomAugspurger commented Dec 21, 2015

Replace eng = create_engine("mysql+mysqldb://user:pass@localhost/dbname") with eng = create_engine("sqlite:///:memory:") to make this easier to reproduce (it still raises).

ikilledthecat commented Dec 22, 2015

Output of pd.show_versions():

INSTALLED VERSIONS

commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.2
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: 1.2.8
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
Jinja2: None

jreback added this to the Next Major Release milestone Dec 26, 2015

jreback commented Dec 26, 2015

I think we need to tell numpy to take ownership of the data, maybe np.array(..., copy=False) around the np.frombuffer call. @shoyer how does one normally do this?

In [9]: df2._data.blocks[0].values.flags
Out[9]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : False
  UPDATEIFCOPY : False

In [10]: df1._data.blocks[0].values.flags
Out[10]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
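For illustration (a sketch of the mechanism, not the fix that eventually landed): np.frombuffer over an immutable bytes object, which is what a decompressed payload amounts to, returns a view that neither owns its data nor is writable, and only an actual copy clears both flags:

import numpy as np

buf = bytes(4 * 8)  # immutable bytes, standing in for a decompressed payload
arr = np.frombuffer(buf, dtype='int64')
print(arr.flags.owndata, arr.flags.writeable)      # False False

# np.array(..., copy=False) may hand back the same read-only view,
# so an explicit copy is what actually transfers ownership to numpy
owned = np.array(arr)  # copies by default
print(owned.flags.owndata, owned.flags.writeable)  # True True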

kawochen commented Jan 11, 2016

df2['E'].a exhibits the bug as well.

jreback modified the milestones: 0.18.0, Next Major Release Jan 11, 2016

jreback closed this in #12013 Jan 15, 2016
