Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame to_sql fails with percentage symbol % in column name with SQLAlchemy and PostgreSQL #11896

Open
LeeMendelowitz opened this issue Dec 24, 2015 · 4 comments
Labels
Bug IO SQL to_sql, read_sql, read_sql_query

Comments

@LeeMendelowitz
Copy link

The following call to to_sql fails due to the percentage (%) symbol in the column names in pandas 0.17.1.

import pandas as pd
from sqlalchemy import create_engine

connect_str = "mysql://{USER}:{PASSWORD}@{HOST}/{DBNAME}".format(
    USER = 'root',
    PASSWORD = '',
    HOST = '127.0.0.1',
    DBNAME = 'test'
)

engine = create_engine(connect_str, echo = False)


# Create a dataframe with '%' in column name
df = pd.DataFrame()
df['A%'] = [0.1, 0.2, 0.3]
df['B%'] = [0.1, 0.2, 0.3]

# Save to database
df.to_sql('test_table', engine, if_exists = 'replace', index = False)

# Throws error:
# OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column 'A%%' in 'field list'") [SQL: u'INSERT INTO test_table (`index`, `A%%`, `B%%`) VALUES (%s, %s, %s)'] [parameters: ((0, 0.1, 0.1), (1, 0.2, 0.2), (2, 0.3, 0.3))]

Backtrace:

Traceback (most recent call last):
  File "test_pandas_error.py", line 20, in <module>
    df.to_sql('test_table', engine, if_exists = 'replace', index = False)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1003, in to_sql
    dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/sql.py", line 569, in to_sql
    chunksize=chunksize, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/sql.py", line 1241, in to_sql
    table.insert(chunksize)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/sql.py", line 765, in insert
    self._execute_insert(conn, keys, chunk_iter)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/sql.py", line 740, in _execute_insert
    conn.execute(self.insert_statement(), data)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1116, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py", line 95, in do_executemany
    rowcount = cursor.executemany(statement, parameters)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 262, in executemany
    r = self._query('\n'.join([query[:p], ',\n'.join(q), query[e:]]))
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 354, in _query
    rowcount = self._do_query(q)
  File "/usr/local/lib/python2.7/site-packages/MySQLdb/cursors.py", line 318, in _do_query
    db.query(q)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column 'A%%' in 'field list'") [SQL: u'INSERT INTO test_table (`A%%`, `B
%%`) VALUES (%s, %s)'] [parameters: ((0.1, 0.1), (0.2, 0.2), (0.3, 0.3))]

Versions:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 18.0.1
Cython: None
numpy: 1.10.2
scipy: 0.16.0
statsmodels: None
IPython: 4.0.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None
Jinja2: None
@jreback jreback added the IO SQL to_sql, read_sql, read_sql_query label Dec 28, 2015
@Tanguyabel
Copy link

Hi, I encountered a similar issue using DataFrame.read_sql_query to read data from SQL.
Following the answer from this post on stackoverflow, I got it fixed using the sqlalchemy text() function when executing the query. This fix takes place in the execute method of SQLDataBase class. I've never contributed to pandas so I am not sure how I should provide a patch and test it. Please feel free to tell me about.
Hope it helps!

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.14.32-xxxx-grs-ipv6-64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 3.3
Cython: None
numpy: 1.10.4
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None

@jfunction
Copy link

For me this (rather old) issue was due to an outdated MySQLdb. For me, simply running the following fixed the issue:

sudo apt-get install python-dev libmysqlclient-dev;
pip install mysqlclient --upgrade --user

Your mileage may vary. I hope that saves someone an hour or two.

@ThibTrip
Copy link
Contributor

ThibTrip commented Apr 2, 2019

This issue seems to still be there with PostgreSQL. Here is a code sample to test this (you'll need to provide your engine and create a schema "python_tests" or set the schema to "public" if you don't mind adding a table there).

As a workaround I will obviously change "%" to "percent". It is actually probably a bad idea to have "%" in a column name anyways so I'd be fine if an error was thrown immediatly instead of pandas trying to save then failing and then only throwing the error.

The reason why I'm writing this is because in my case I was working on a table with tens of thousands of rows and jupyter crashed (the % sign was causing a great amount of spam as I saw by running my script with IPython afterwards).


import pandas as pd
import sqlalchemy

values = [[pd.Timestamp('2019-01-01 00:00:00'), 'New Year', 0, 'Tuesday']]
columns = ['Date', 'Holiday Name', '% worked', 'Weekday']
df_test = pd.DataFrame(data = values, columns = columns)

# YOUR ENGINE
engine = sqlalchemy.create_engine('...')

df_test.to_sql(name = 'pg_table_with_percent',
               con = engine, 
               schema = 'python_tests', # you need to create the schema if it does not exist
               if_exists = 'replace', 
               index = False, 
               method = 'multi')


---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1181                         parameters,
-> 1182                         context)
   1183         except BaseException as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\default.py in do_execute(self, cursor, statement, parameters, context)
    469     def do_execute(self, cursor, statement, parameters, context=None):
--> 470         cursor.execute(statement, parameters)
    471 

ProgrammingError: incomplete placeholder: '%(' without ')'

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-4-88fe30bc8405> in <module>
     16                if_exists = 'replace',
     17                index = False,
---> 18                method = 'multi')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
   2530         sql.to_sql(self, name, con, schema=schema, if_exists=if_exists,
   2531                    index=index, index_label=index_label, chunksize=chunksize,
-> 2532                    dtype=dtype, method=method)
   2533 
   2534     def to_pickle(self, path, compression='infer',

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
    458     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
    459                       index_label=index_label, schema=schema,
--> 460                       chunksize=chunksize, dtype=dtype, method=method)
    461 
    462 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py in to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method)
   1172                          schema=schema, dtype=dtype)
   1173         table.create()
-> 1174         table.insert(chunksize, method=method)
   1175         if (not name.isdigit() and not name.islower()):
   1176             # check for potentially case sensitivity issues (GH7815)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py in insert(self, chunksize, method)
    684 
    685                 chunk_iter = zip(*[arr[start_i:end_i] for arr in data_list])
--> 686                 exec_insert(conn, keys, chunk_iter)
    687 
    688     def _query_iterator(self, result, chunksize, columns, coerce_float=True,

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\sql.py in _execute_insert_multi(self, conn, keys, data_iter)
    607         """
    608         data = [dict(zip(keys, row)) for row in data_iter]
--> 609         conn.execute(self.table.insert(data))
    610 
    611     def insert_data(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in execute(self, object, *multiparams, **params)
    943             raise exc.ObjectNotExecutableError(object)
    944         else:
--> 945             return meth(self, multiparams, params)
    946 
    947     def _execute_function(self, func, multiparams, params):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\sql\elements.py in _execute_on_connection(self, connection, multiparams, params)
    261     def _execute_on_connection(self, connection, multiparams, params):
    262         if self.supports_execution:
--> 263             return connection._execute_clauseelement(self, multiparams, params)
    264         else:
    265             raise exc.ObjectNotExecutableError(self)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in _execute_clauseelement(self, elem, multiparams, params)
   1051             compiled_sql,
   1052             distilled_params,
-> 1053             compiled_sql, distilled_params
   1054         )
   1055         if self._has_events or self.engine._has_events:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1187                 parameters,
   1188                 cursor,
-> 1189                 context)
   1190 
   1191         if self._has_events or self.engine._has_events:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1400                 util.raise_from_cause(
   1401                     sqlalchemy_exception,
-> 1402                     exc_info
   1403                 )
   1404             else:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py in raise_from_cause(exception, exc_info)
    201     exc_type, exc_value, exc_tb = exc_info
    202     cause = exc_value if exc_value is not exception else None
--> 203     reraise(type(exception), exception, tb=exc_tb, cause=cause)
    204 
    205 if py3k:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\util\compat.py in reraise(tp, value, tb, cause)
    184             value.__cause__ = cause
    185         if value.__traceback__ is not tb:
--> 186             raise value.with_traceback(tb)
    187         raise value
    188 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1180                         statement,
   1181                         parameters,
-> 1182                         context)
   1183         except BaseException as e:
   1184             self._handle_dbapi_exception(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sqlalchemy\engine\default.py in do_execute(self, cursor, statement, parameters, context)
    468 
    469     def do_execute(self, cursor, statement, parameters, context=None):
--> 470         cursor.execute(statement, parameters)
    471 
    472     def do_execute_no_params(self, cursor, statement, context=None):

ProgrammingError: (psycopg2.ProgrammingError) incomplete placeholder: '%(' without ')' [SQL: 'INSERT INTO python_tests.pg_table_with_percent ("Date", "Holiday Name", "%% worked", "Weekday") VALUES (%(Date_m0)s, %(Holiday Name_m0)s, %(% worked_m0)s, %(Weekday_m0)s)'] [parameters: {'Date_m0': datetime.datetime(2019, 1, 1, 0, 0), 'Holiday Name_m0': 'New Year', '% worked_m0': 0, 'Weekday_m0': 'Tuesday'}]


@mroeschke mroeschke added the Bug label May 16, 2020
@mroeschke mroeschke changed the title DataFrame to_sql fails with percentage symbol % in column name with SQLAlchemy and MySQL DataFrame to_sql fails with percentage symbol % in column name with SQLAlchemy and PostgreSQL May 19, 2020
@XiaoMutt
Copy link

XiaoMutt commented Dec 9, 2020

I encountered the same issue using pandas 1.14.
It turns out that %() characters are used by SQLAlchemy as special command characters when constructing insert commands. Creating the table is fine, but to_sql fail to handle these characters while doing the insert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

7 participants