BUG/API: converting invalid column names in to_sql #6796

jorisvandenbossche · 2014-04-04T08:27:52Z

At the moment, with the new sql code, there is a bug in the conversion of invalid column names (trough the function _safe_col_name). Eg:

In [1]: import sqlalchemy
In [2]: engine = sqlalchemy.create_engine('sqlite:///:memory:')
In [4]: df = pd.DataFrame({'safe_col':[1,2], 'invalid col':[3,4]})
In [7]: pd.io.sql.to_sql(df, 'test', engine)
In [9]: pd.io.sql.read_table('test', engine)
Out[9]:
   pandas_index invalid_col  safe_col
0             0        None         1
1             1        None         2

I know the reason, and it is easily fixed (the table statement uses the adapted column, but the inserting still the old one, so the data are not inserted). But I was wondering if this is actually necessary?

Because, databases can handle column names with spaces (at least sqlite, mysql and postgresql that I know a little bit), the names are then just quoted and sqlalchemy ensures this is done properly for us.
Often, it is advised to not use spaces (or capital letters) in column names, as it is just tedious to always have to quote them and it is more robust to not have to do that. But it does work, so I would think it is up to the user to provide the column names he/she wants.

So proposal:

we don't change the column names and leave them as the user provides them
we could provide a kwarg to do this if the users wants to, but I don't think this is priority at the moment (and I think we should then also convert to lower case, as there is as much reason to do that as to changes spaces to underscores)

@mangecoeur @hayd

PS: the fact there are Nones in the example above and not NaNs is still another issue I think.

The text was updated successfully, but these errors were encountered:

jreback · 2014-04-04T12:14:05Z

@jorisvandenbossche I would go with an option, may translate_names=None (or accept a string to describe what is happening), to translate spaces/case. Default it not to do anything. If you have time for 0.14.0 or bump to later version ok too.

danielballan · 2014-04-11T15:21:54Z

I can pick this one up if @jorisvandenbossche has not already begun work on it.

jorisvandenbossche · 2014-04-11T15:47:43Z

@danielballan certainly do! (very welcome as I have still other work that I would like to do for the sql code)

I have not yet begun to work on it, the problems just lies in the fact that the insert statement uses the frame's column names, and not the converted ones (https://github.com/pydata/pandas/blob/62b4d6405c21cfd74b4e9e00c298648d6b2e2e82/pandas/io/sql.py#L442) as in the table statement (https://github.com/pydata/pandas/blob/62b4d6405c21cfd74b4e9e00c298648d6b2e2e82/pandas/io/sql.py#L502))

I would certainly default to do nothing, and maybe provide a keyword argument to do something else (remove space, to lower case). But the first is the priority I think.

But maybe it would be better that first #6735 gets merged. If you can take a look at it, please do.

mangecoeur · 2014-04-15T23:05:56Z

@jorisvandenbossche @danielballan I think the whole safe columns thing is unnecessary if we trust SQLAlchemy to escape names correctly (which I think we can). In that case we don't even need an option for translate names (we're accumulating too many options as it is, as they say options is where good design goes to die).
I would nuke safe columns, make sure Legacy mode does it's escaping properly, and if users want some custom renaming they're free to use column renaming just like everyone else.

danielballan · 2014-04-15T23:10:47Z

Good. I was coming to the same conclusion.

On Tuesday, April 15, 2014, mangecoeur notifications@github.com wrote:

@jorisvandenbossche https://github.com/jorisvandenbossche @danielballanhttps://github.com/danielballanI think the whole safe columns thing is unnecessary if we trust SQLAlchemy
to escape names correctly (which I think we can). In that case we don't
even need an option for translate names (we're accumulating too many
options as it is, as they say options is where good design goes to die).
I would nuke safe columns, make sure Legacy mode does it's escdaping
properly, and if users want some custom renaming they're free to use column
renaming just like everyone else.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/6796#issuecomment-40545324
.

jorisvandenbossche · 2014-04-15T23:21:00Z

@mangecoeur Fully agree, that was also in line with what I proposed. I would let it fully up to the user to provide the column names it wants.
And indeed, for legacy mode, try to keep the same behaviour with write_frame as before, and the new behaviour for to_sql as I did for the index=True/False

jorisvandenbossche · 2014-04-16T09:19:43Z

See here: #6883 (comment), I was completely mistaken about the to_sql being new. So if we want to be sure that the legacy mode keeps it behaviour, this should also be the case for to_sql, which makes it difficult as we have one docstring for both new and legacy to_sql, and they would have different defaults then.

danielballan · 2014-04-16T12:57:06Z

If we commit to one backward-incompatible change in to_sql, with regard to the index, we have an opportunity to make a second one "for free" in the same move, right? If we all agree that safe columns is unnecessary, should we remove it now and document it along with the change in #6883?

jorisvandenbossche · 2014-04-16T22:59:47Z

It's not completely "for free" as it are two different changes (with possible different side effects for existing code), but I agree if we want to change it, we do it better now and together.

Or, we could also trigger a warning for legacy mode now that it will change it next version (but maybe that is not needed?)

jorisvandenbossche · 2014-04-17T11:42:18Z

An idea: could we do a quick check (when in legacy mode) if there are columns names with spaces? (the names which would have been converted previously) And then raise a warning, to warn the user that the behaviour changed and that the column names will not be converted anymore (and maybe hint how to do it themselves). Or would this just be annoying?

I think this could help prevent surprises, but of course, once you've seen it and decide to keep using column names with spaces this can become annoying.

jreback · 2014-04-17T11:55:38Z

a warning sounds good

don't need to try too hard for backwards compat - it's an API change - users need to adjust

jorisvandenbossche added this to the 0.14.0 milestone Apr 4, 2014

jorisvandenbossche added the SQL label Apr 4, 2014

jorisvandenbossche mentioned this issue Apr 4, 2014

ENH: sql support with SQLAlchemy - follow-up #6292

Closed

17 tasks

danielballan mentioned this issue Apr 14, 2014

SQL: add multi-index support to legacy mode #6883

Merged

danielballan mentioned this issue Apr 17, 2014

API: Stop modifying SQL column and names, and warn when pertinent. #6902

Merged

jorisvandenbossche closed this as completed in #6902 Apr 22, 2014

mbkupfer mentioned this issue Mar 2, 2021

BUG: to_sql not properly escaping column names postgresql #40183

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG/API: converting invalid column names in to_sql #6796

BUG/API: converting invalid column names in to_sql #6796

jorisvandenbossche commented Apr 4, 2014

jreback commented Apr 4, 2014

danielballan commented Apr 11, 2014

jorisvandenbossche commented Apr 11, 2014

mangecoeur commented Apr 15, 2014

danielballan commented Apr 15, 2014

jorisvandenbossche commented Apr 15, 2014

jorisvandenbossche commented Apr 16, 2014

danielballan commented Apr 16, 2014

jorisvandenbossche commented Apr 16, 2014

jorisvandenbossche commented Apr 17, 2014

jreback commented Apr 17, 2014

BUG/API: converting invalid column names in to_sql #6796

BUG/API: converting invalid column names in to_sql #6796

Comments

jorisvandenbossche commented Apr 4, 2014

jreback commented Apr 4, 2014

danielballan commented Apr 11, 2014

jorisvandenbossche commented Apr 11, 2014

mangecoeur commented Apr 15, 2014

danielballan commented Apr 15, 2014

jorisvandenbossche commented Apr 15, 2014

jorisvandenbossche commented Apr 16, 2014

danielballan commented Apr 16, 2014

jorisvandenbossche commented Apr 16, 2014

jorisvandenbossche commented Apr 17, 2014

jreback commented Apr 17, 2014