Trying to use to_sql on dataframe with numpy.ndarray columns causes ValueError #29240
I found the solution from this link. Essentially, np.ndarray needs something to convert the array into something that sqlite can actually handle, since it has no array datatype, and apparently something goes wrong when trying to buffer the BLOB'd array directly. So, instead, convert the whole mess to text using numpy's save functionality. Taking the code from the above link and attaching it to my (now working) code from above:

```python
import sqlalchemy
import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.array to TEXT when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)
# Converts TEXT to np.array when selecting
sqlite3.register_converter("array", convert_array)

engine = sqlalchemy.create_engine('sqlite:///:memory:',
                                  connect_args={'detect_types': sqlite3.PARSE_DECLTYPES})
```

This feels like something that should maybe be default behavior for pandas' underlying communication with sqlalchemy (or maybe sqlalchemy's communication with sqlite3)?
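The adapter/converter pair above can be exercised directly against a raw sqlite3 connection; a minimal round-trip sketch (the table and column names here are just for illustration, not from the report):

```python
import io
import sqlite3
import numpy as np

def adapt_array(arr):
    # Serialize the ndarray with np.save into an in-memory buffer
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

def convert_array(blob):
    # Deserialize the bytes written by adapt_array
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

# detect_types=PARSE_DECLTYPES makes sqlite3 consult the declared column
# type ("array") and run the registered converter on SELECT.
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE t (channel array)")
con.execute("INSERT INTO t VALUES (?)", (np.arange(5.0),))
(restored,) = con.execute("SELECT channel FROM t").fetchone()
```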
Just to add on to my particular issue here: I am finding that reading back does not seem to work properly if everything is done as I put above. I can save to the database, but trying to read back I just get binary data. I confirmed that the converter does not fire when the column is read back. The reason for this is that it appears the table has to be created by hand, with the columns declared as the registered type name:

```python
engine = sqlalchemy.create_engine('sqlite:///:memory:',
                                  connect_args={'detect_types': sqlite3.PARSE_DECLTYPES},
                                  echo=True)
con = engine.connect()
# NOTE: 'index' column needs to be included for the Pandas functions for SQLite to work
con.execute("create table dftest (\"index\" BIGINT, name TEXT, start FLOAT, "
            "\"end\" FLOAT, channel1 array, channel2 array, channel3 array, channel4 array)")
```

After this the pandas functions can be used normally as long as the table is created this way. This may end up being an issue for the folks working on sqlalchemy rather than pandas, it appears, though.
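A small sketch of why the declared column type matters here: with PARSE_DECLTYPES, sqlite3 only runs a converter for columns whose declared type matches a registered converter name, so the same data declared as plain BLOB comes back as raw bytes (table and column names are illustrative):

```python
import io
import sqlite3
import numpy as np

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

def convert_array(blob):
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("array", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
# One column declared BLOB, one declared with the registered name "array"
con.execute("CREATE TABLE t (as_blob BLOB, as_array array)")
a = np.arange(3.0)
con.execute("INSERT INTO t VALUES (?, ?)", (a, a))
blob_col, array_col = con.execute("SELECT as_blob, as_array FROM t").fetchone()
# blob_col is raw bytes (converter did not fire);
# array_col is an ndarray (converter fired on the declared type)
```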
Okay, another update for anyone who stumbles across this in the future with a similar problem. It seems that this is not (easily/obviously) possible when using the sqlalchemy engine, since something seems to get lost on its way in/out of there. I can get a custom type (as outlined here) defined, but it does not seem to be used. On the other hand, I have gotten this working using some tweaks on what I've posted above. Basically, the code in my earlier replies seems to work as long as an sqlite3 connection is passed in instead of an sqlalchemy engine. I've left in the (now commented out) parts of what I tried using sqlalchemy:

```python
import pandas as pd
pd.__version__

import os
import sys
# import sqlalchemy
# import sqlalchemy.types as types
# from sqlalchemy.ext.compiler import compiles
import sqlite3
import numpy as np
import io

# class arraytodb(types.TypeDecorator):
#     impl = types.Text  # This is needed to indicate what "base type" we're using
#
#     def process_bind_param(self, value, dialect):
#         print("Binding")
#         return "a"
#         # out = io.BytesIO()
#         # np.save(out, value)
#         # out.seek(0)
#         # return sqlite3.Binary(out.read())
#
#     def process_result_value(self, value, dialect):
#         print("Process result")
#         print(value)
#         print(dialect)
#         return "b"
#         # return value
#
# @compiles(arraytodb)
# def arraycompiler(type_, compiler, **kw):
#     return "arraytodb"

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)

# Converts np.array to TEXT when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)
# Converts TEXT to np.array when selecting
sqlite3.register_converter("arraytodb", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
# engine = sqlalchemy.create_engine('sqlite:///:memory:',
#                                   connect_args={'detect_types': sqlite3.PARSE_DECLTYPES},
#                                   echo=True)

allPts = []
for i in range(0, 30):
    name = "R" + str(i).zfill(3)
    start = np.random.rand()
    end = np.random.rand()
    data = np.random.random((3000, 4))
    thisDict = {'name': str(name), 'start': start, 'end': end,
                'channel1': data[:, 0], 'channel2': data[:, 1],
                'channel3': data[:, 2], 'channel4': data[:, 3]}
    allPts.append(pd.Series(thisDict))

asDF = pd.DataFrame(allPts)
asDF.to_sql('dftest', con, if_exists='replace',
            dtype={'channel1': "arraytodb", 'channel2': "arraytodb",
                   'channel3': "arraytodb", 'channel4': "arraytodb"})
print(pd.read_sql('SELECT * from dftest', con))
```
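As a quick sanity check on the approach above, a trimmed-down sketch (smaller, illustrative data and names) verifying that an ndarray cell survives the to_sql/read_sql round trip over a plain sqlite3 connection:

```python
import io
import sqlite3
import numpy as np
import pandas as pd

def adapt_array(arr):
    out = io.BytesIO()
    np.save(out, arr)
    return sqlite3.Binary(out.getvalue())

def convert_array(blob):
    return np.load(io.BytesIO(blob))

sqlite3.register_adapter(np.ndarray, adapt_array)
sqlite3.register_converter("arraytodb", convert_array)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)

# A raw sqlite3 connection plus dtype={'col': 'arraytodb'} lets pandas
# declare the column with the registered type name on CREATE TABLE.
df = pd.DataFrame({'name': ['R000'], 'channel1': [np.random.random(10)]})
df.to_sql('dftest', con, if_exists='replace', dtype={'channel1': 'arraytodb'})
back = pd.read_sql('SELECT * from dftest', con)
```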
I am trying out using pandas+sqlalchemy (specifically sqlite) as a means to store my data. In my use case, I will have a number of rows where most will be obvious types such as string, int, float, etc - but I will want to store a few numpy ndarrays. In the example below, each 'channel' entry is a 1x3000 array of floats. For higher dimensional arrays (at most 2D) I plan to store the shape of the data alongside the flattened array, so this is all I need.
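The "store the shape alongside the flattened array" idea for 2D data can be sketched as follows (the helper names are hypothetical, not part of this report):

```python
import numpy as np

def pack(arr2d):
    # Keep the original shape so the flat vector can be restored later
    return arr2d.shape, arr2d.ravel()

def unpack(shape, flat):
    # Rebuild the 2D array from the stored shape and flat data
    return flat.reshape(shape)

data = np.random.random((3000, 4))
shape, flat = pack(data)
restored = unpack(shape, flat)
```

(Note that np.save already records the shape in its header, so this is only needed if the flat values are stored some other way.)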
Code Sample, a copy-pastable example if possible
This was run in Jupyter Lab.
This produces the traceback
Problem description
I notice that the bottom of the traceback specifies the error started in sqlalchemy - though for all my googling I can't find any record of this problem (or error message) anywhere either with sqlalchemy or pandas. I'm assuming this means I probably am doing something wrong, but I'm not seeing it when looking at docs etc. Storing a numpy array as a binary blob makes sense to me, so I feel like the type inference makes sense.
Expected Output
Writing to the database/no error. :)
Output of pd.show_versions()

INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 11, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.0
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None