dataframe to_sql is inserting raw text into oracle database instead of actual text when utf-8 text is present #27177

Open
jonanem opened this issue Jul 2, 2019 · 0 comments
Labels
Bug IO SQL to_sql, read_sql, read_sql_query

jonanem commented Jul 2, 2019

Problem description

Inserting data into an Oracle database through DataFrame.to_sql was taking a long time, so I changed the SQLAlchemy type of the object columns to VARCHAR using the code below:

from sqlalchemy import types

# Map each object column to a VARCHAR sized to its longest value
dtyp = {c: types.VARCHAR(df[c].str.len().max())
        for c in df.columns[df.dtypes == 'object'].tolist()}

df.to_sql(name='TEST_TABLE', con=engine, if_exists='append', index=False, chunksize=1000, dtype=dtyp)

One column of the DataFrame holds UTF-8 encoded bytes, e.g. b'This is oracle'.

The DataFrame holds b'This is oracle', but the value stored in the Oracle database is '415041432053656D69636F6E647563746F7' (example values), not the text 'This is oracle'. The b prefix (the value is a bytes object rather than a str) seems to cause the problem when the column is converted to VARCHAR. If I strip the prefix by decoding the values as UTF-8, the insert fails instead with UnicodeEncodeError: 'ascii' codec can't encode character.

It seems that to_sql has a problem when UTF-8 text is inserted as VARCHAR through the dtype argument.
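For anyone trying to reproduce this, the stored value looks like the hexadecimal rendering of a bytes object, which would be consistent with the bytes being bound as raw binary rather than as text. That is an inference from the symptom, not something confirmed here. The sketch below uses only the standard library to show the hex form and one possible workaround: decode bytes to str before sizing the VARCHAR columns. The helper names `decode_bytes` and `max_encoded_len` are hypothetical, and sizing by encoded byte length assumes the database counts VARCHAR length in bytes.

```python
def decode_bytes(value, encoding="utf-8"):
    """Return str for bytes input; pass every other value through unchanged."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def max_encoded_len(values, encoding="utf-8"):
    """Longest encoded length, in bytes, among the str values (0 if none)."""
    return max(
        (len(v.encode(encoding)) for v in values if isinstance(v, str)),
        default=0,
    )


# Sample values standing in for one object column of the DataFrame
raw = [b"This is oracle", b"caf\xc3\xa9", None]

# Hex rendering of the first value, similar in shape to what ended up in the table
hex_form = raw[0].hex().upper()

# Decode to str first, then size the column by encoded byte length
text = [decode_bytes(v) for v in raw]
width = max_encoded_len(text)

print(hex_form)
print(text, width)
```

On a DataFrame, the same helper could be applied per column with `df[c] = df[c].map(decode_bytes)` before building `dtyp`. This does not address the UnicodeEncodeError mentioned above, which may depend on the encoding settings of the database driver or client.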

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.1.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.2

@jbrockmendel jbrockmendel added the IO SQL to_sql, read_sql, read_sql_query label Jul 23, 2019
@mroeschke mroeschke added the Bug label May 16, 2020