to_sql if_exists argument with SQL server and other schema #7422

aergener · 2014-06-10T20:50:01Z

The if_exists argument of the to_sql function doesn't check all schema for the table while checking if it exists. Furthermore, it inserts to the default schema, causing somewhat contradictory behavior.

For example, while using SQL Server with my default schema set to test, to_sql inserts the table into test.table_name. However, trying this again, with if_exists='replace', to_sql finds no table of the name dbo.table_name, and then tries to create test.table_name, causing an error.

Details :

http://stackoverflow.com/questions/24126883/pandas-dataframe-to-sql-function-if-exists-parameter-not-working

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2014-06-10T20:53:08Z

@aergener How did you connect to SQL server (how do you set the default schema)?

It is a bit strange that when checking for the table it looks in dbo, but when writing it used the sepcified test schema.

aergener · 2014-06-10T20:56:18Z

@jorisvandenbossche I connected to SQL Server using sqlalchemy.create_engine. The default schema is a user setting within SQL Server itself (not specified while connecting in python).

jorisvandenbossche · 2014-06-10T21:02:55Z

Can you check engine.has_table('foobar')? (instead of the pd.io.sql.has_table('foobar', engine) which gave a false False).

aergener · 2014-06-10T21:10:25Z

When the table is in the test schema:

>>> engine.has_table('foobar')
Out[9]: False

When creating a table in SQL Server if a schema is not specifically designated, the table will be created in the user's default schema. Perhaps this issue could be fixed if pandas specified a schema for sqlalchemy to use that the user could potentially override?

jorisvandenbossche · 2014-06-11T19:45:22Z

I cannot reproduce this with postgresql (set the default schema to something else than the default 'public', but both the checking if the table exists or writing the data uses the same schema I set as default). So I am thinking it is possibly a bug in the sqlalchemy interface to SQL server (just guessing, I don't use SQL server). As engine.has_table('foobar') should look in the default schema I think.

jorisvandenbossche · 2014-06-12T07:30:10Z

@aergener I posted on the sqlalchemy mailing list: https://groups.google.com/forum/?fromgroups#!topic/sqlalchemy/Ktu06z9x97c Can you try what is said there?

So add echo='debug' to create_engine( ), then then look in the logs (those are printed) when you use the to_sql function. There should somewhere be a query to get the default schema name.

aergener · 2014-06-12T17:08:27Z

@jorisvandenbossche Cool, I'll test this out then get back to you.

As an aside, I was wondering if you have thought about adding better datatype support to pandas. Specifically, looking at pandas.io.sql.PandasSQLTable._sqlalchemy_type all strings in pandas end up as text fields in SQL. I'm not sure about other flavors, but in SQL Server working with text fields is a pain, so it would be nice to have something like string_repr option in to_sql. Do you have an email I can contact you at? I would like to help with this, as well as get involved with pandas development in general, but I'm not sure how I should go about it, and you seem like the SQL guy around here...

aergener · 2014-06-13T01:04:33Z

@jorisvandenbossche It looks like it's not picking up anything for the default_schema_name and defaulting to dbo. Notice how there's no row after 2014-06-12 20:56:15,461 DEBUG sqlalchemy.engine.base.Engine Col ('default_schema_name',)

2014-06-12 20:56:15,459 INFO sqlalchemy.engine.base.Engine SELECT user_name()
2014-06-12 20:56:15,459 INFO sqlalchemy.engine.base.Engine ()
2014-06-12 20:56:15,460 DEBUG sqlalchemy.engine.base.Engine Col ('',)
2014-06-12 20:56:15,460 DEBUG sqlalchemy.engine.base.Engine Row (u'aergener', )
2014-06-12 20:56:15,460 INFO sqlalchemy.engine.base.Engine 
            SELECT default_schema_name FROM
            sys.database_principals
            WHERE name = ?
            AND type = 'S'

2014-06-12 20:56:15,460 INFO sqlalchemy.engine.base.Engine (u'aergener',)
2014-06-12 20:56:15,461 DEBUG sqlalchemy.engine.base.Engine Col ('default_schema_name',)
2014-06-12 20:56:15,467 INFO sqlalchemy.engine.base.Engine SELECT [TABLES_1].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] AS [TABLES_1] 
WHERE [TABLES_1].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [TABLES_1].[TABLE_TYPE] = ? ORDER BY [TABLES_1].[TABLE_NAME]
2014-06-12 20:56:15,467 INFO sqlalchemy.engine.base.Engine (u'dbo', u'BASE TABLE')

jorisvandenbossche · 2014-06-13T08:18:09Z

@aergener Hmm I don't know SQL Server, so don't know how it works with users and schema's. Maybe you follow up further at the sqlalchemy mailing thread?

aldanor · 2014-06-13T08:30:42Z

Sounds related to #7441 -- @jorisvandenbossche, looks like we'll need to be able to pass schema/meta manually to read_sql / to_sql functions after all? :)

@aergener Try this?

engine = create_engine(..., echo='debug')
meta = MetaData()
meta.reflect(bind=engine, schema='test')

jorisvandenbossche · 2014-06-13T09:03:44Z

@aldanor It is certainly related, but not necesarily the same issue, as 1) this just works with postgresql (if you are a user with another default schema, it just reads and writes from/to that schema), and 2) engine.has_table() will still look in the wrong place.

I mean, having a schema argument might be a way to circumvent this issue here, but this case should actually work out of the box? (although I don't know SQL Server, so can't say for sure)

UPDATE: my second point can of course be solved with the engine.has_table( .. , schema=..) if we have a schema argument.

aldanor · 2014-06-13T09:09:03Z

@jorisvandenbossche Can't say much about this case, don't know much about SQL Server either. In Oracle, I had to set the schema equal to original table owner when calling MetaData#reflect for meta to be able to reflect anything.

Regarding the original question, found a discussion on schemas in sqlalchemy with SQL Server here: https://groups.google.com/forum/#!topic/sqlalchemy/VZQ_SLprNQA -- again, it boils down to passing schema manually when all else fails.

Looks like there's many ways "default schema" could fail, quoting the link above:

Default Schema Name can be empty if the user logs in via an AD Group.
(in fact a Group user cannot be assigned a default schema name before
SQL Server 2012...).

jorisvandenbossche · 2014-06-13T16:23:09Z

@aergener If you want to get more involved, very welcome! I am following up a bit the sql issues, but I am by no mean an sql expert (just using postgresql a bit myself). So the experience of other people with other database flavors, more advanced use cases, other database set-ups with users and privileges, etc is very useful.
Some tips to get started:

File an issue for the enhancement ideas you have (eg about the better datatype support), so we can start discussing it. Try to give some example case where this would be useful, a possible interface, ..
Try to run the sql test suite with SQL Server (for now it is only tested on travis with postgresql, mysql (with pymysql) and sqlite, SQL Server is not available there). But it would be very useful to extend this testing coverage. For this you should make a class similiar to TestPostgreSQLAlchemy in https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_sql.py#L1112. This could be a good starter.
Look through the issue with the SQL label and look if you want to try to tackle one (or non-sql issue also welcome of course!) If you hav any questions about it, just ask in the relevant issue.

See here for more general advice on getting started with pandas development: http://pandas.pydata.org/developers.html and https://github.com/pydata/pandas/wiki/Contributing. If you have any questions, just ask here or at the pydata mailing list (https://groups.google.com/forum/#!forum/pydata). Questions you have are probably also relevant to others.

JoergRittinger · 2014-07-25T08:18:43Z

I have also problems with the missing schema information in to_sql when ussing if_exists='append'.
Within sql.PandasSQLAlchemy the functions get_table and has_table are called with the same parameter name starting from the to_sql function.
In the dialect I use the tables in metadata look like this (which will be used in get_table):

>>> self.meta.tables.keys()
[ ..., 'schema_name.table_name', ...]

On the other hand the has_table function of my engine expects table and schema name as two arguments:

def has_table(self, connection, table_name, schema=None):
    ...

So when I call:

pandas_sql.to_sql(
        frame=dataframe,
        name='table_name',
        if_exists='append')

get_table will return None. If I call it with:

pandas_sql.to_sql(
        frame=dataframe,
        name='schema_name.table_name',
        if_exists='append')

get_table will return the table but has_table will return False.

Am I correct that this is kind of the same problem as you discussed?

maxgrenderjones · 2014-07-25T11:44:35Z

They're related, and down to the fact that has_table and get_table do different things.

has_table does a SQL query to determine if the table exists, using your own engine's way of doing that query. get_table returns the table out of a dictionary of tables that have been loaded by reflection from the database (so it doesn't trigger a SQL query). As a result has_table being True does not imply that you'll get anything back when you call get_table. This therefore results in strange effects if the methods are called in ways that they don't support (see #7826 and #7815).

imho the entry methods (e.g. to_sql) should take a schema name, but then pandas will need to somehow be aware that reflection returns schema_name.table_name under some circumstances but not others (can SQLAlchemy tell us this?). In other words, get_table may need to get much more intelligence than a simple dictionary lookup.

JoergRittinger · 2014-07-25T23:25:49Z

What do think about replacing the two functions in PandasSQLAlchemy:

def has_table(self, name):
    return self.engine.has_table(name)

def get_table(self, table_name):
    return self.meta.tables.get(table_name)

with:

def has_table(self, table_name, schema_name=None):
    return self.engine.has_table(table_name, schema_name)

def get_table(self, table_name, schema_name=None):
    if schema_name:
        for table in self.metadata.tables.itervalues():
            if table.name == table_name and table.schema == schema_name:
                return table
    else:
        return self.meta.tables.get(table_name)

of course at many other points schema_name would have to be added to function call

jorisvandenbossche · 2014-08-07T14:31:10Z

Hey @aergener, an implementation of schema support is here: #7952. Could you try this out to see if this fixes your problem (if you specify the schema)?

aergener · 2014-08-11T14:21:54Z

@jorisvandenbossche I will try this out and get back to you

jorisvandenbossche · 2014-08-11T14:25:28Z

@aergener Thanks! BTW, see #7957, an issue on the string representation (String/Text). You also mentioned something about that above, maybe you can give your experience theren in the issue.

jorisvandenbossche · 2014-09-05T13:19:53Z

This should be solved now #7952 is merged. @aergener if the problem still exists, you can always reopen the issue!

jorisvandenbossche added the SQL label Jun 10, 2014

jorisvandenbossche added this to the 0.14.1 milestone Jun 10, 2014

jorisvandenbossche changed the title ~~to_sql if_exists argument~~ to_sql if_exists argument with SQL server and other schema Jun 11, 2014

jorisvandenbossche modified the milestones: 0.15.0, 0.14.1 Jul 3, 2014

jreback added the API Design label Jul 25, 2014

jreback modified the milestones: 0.15.0, 0.15.1 Jul 25, 2014

maxgrenderjones mentioned this issue Jul 29, 2014

BUG/ENH: to_sql fails to detect tables via if_exists on mysql with capitalised table names #7815

Closed

jorisvandenbossche closed this as completed Sep 5, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

to_sql if_exists argument with SQL server and other schema #7422

to_sql if_exists argument with SQL server and other schema #7422

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 10, 2014

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 10, 2014

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 11, 2014

jorisvandenbossche commented Jun 12, 2014

aergener commented Jun 12, 2014

aergener commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

aldanor commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

aldanor commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

JoergRittinger commented Jul 25, 2014

maxgrenderjones commented Jul 25, 2014

JoergRittinger commented Jul 25, 2014

jorisvandenbossche commented Aug 7, 2014

aergener commented Aug 11, 2014

jorisvandenbossche commented Aug 11, 2014

jorisvandenbossche commented Sep 5, 2014

to_sql if_exists argument with SQL server and other schema #7422

to_sql if_exists argument with SQL server and other schema #7422

Comments

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 10, 2014

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 10, 2014

aergener commented Jun 10, 2014

jorisvandenbossche commented Jun 11, 2014

jorisvandenbossche commented Jun 12, 2014

aergener commented Jun 12, 2014

aergener commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

aldanor commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

aldanor commented Jun 13, 2014

jorisvandenbossche commented Jun 13, 2014

JoergRittinger commented Jul 25, 2014

maxgrenderjones commented Jul 25, 2014

JoergRittinger commented Jul 25, 2014

jorisvandenbossche commented Aug 7, 2014

aergener commented Aug 11, 2014

jorisvandenbossche commented Aug 11, 2014

jorisvandenbossche commented Sep 5, 2014