Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bulk insert w/ SQL Server and no-pk table in offline mode should not render SET IDENTITY INSERT #495

Open
sqlalchemy-bot opened this issue May 21, 2018 · 12 comments

Comments

@sqlalchemy-bot
Copy link

Migrated issue, originally created by Wil Tan

In offline mode, Alembic emits IDENTITY INSERT mode control statements. However, in online mode, it does not do so.

There are a few shortcomings that trips up my use cases:

  • in offline mode, sometimes I bulk_insert into a table with no auto-increment key. The SET IDENTITY INSERT <table> ON statement will result in an error in MSSQL.

  • in online mode, sometimes I do want to insert fixed primary key values for auto-increment fields, but because Alembic does not turn on IDENTITY INSERT mode, MSSQL throws an error.

Would it make sense to change the current behaviour in alembic/ddl/mssql.py such that it emits SET IDENTITY INSERT statements if the rows data contains the table._autoincrement_column, regardless of offline/online mode?

I'm happy to work on a pull request if you agree with the approach.

@sqlalchemy-bot
Copy link
Author

Changes by Wil Tan:

  • edited description

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

Not able to reproduce the second case:

migration:

def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###

    has_autoinc = sa.Table('has_autoinc', sa.MetaData(),
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('x', sa.Integer()),
        sa.Column('y', sa.Integer())
    )

    op.bulk_insert(has_autoinc,
        [
            {"id": 1, "x": 5, "y": 4},
            {"id": 2, "x": 7, "y": 3},
        ]
    )
    # ### end Alembic commands ###

output running in online mode:

INFO  [alembic.runtime.migration] Running upgrade 355e309c5732 -> 3c9004fc9946, rev2
INFO  [sqlalchemy.engine.base.Engine] SET IDENTITY_INSERT has_autoinc ON
INFO  [sqlalchemy.engine.base.Engine] ()
INFO  [sqlalchemy.engine.base.Engine] INSERT INTO has_autoinc (id, x, y) VALUES (?, ?, ?)
INFO  [sqlalchemy.engine.base.Engine] ((1, 5, 4), (2, 7, 3))
INFO  [sqlalchemy.engine.base.Engine] SET IDENTITY_INSERT has_autoinc OFF
INFO  [sqlalchemy.engine.base.Engine] ()
INFO  [sqlalchemy.engine.base.Engine] UPDATE alembic_version SET version_num='3c9004fc9946' WHERE alembic_version.version_num = '355e309c5732'
INFO  [sqlalchemy.engine.base.Engine] ()
INFO  [sqlalchemy.engine.base.Engine] COMMIT

in online mode, the SQLAlchemy dialect is doing the work here and it knows to turn on IDENTITY INSERT if the table has an autoincrement column and the given values contain that column.

OTOH, if you are doing it like the documentation implies:

has_autoinc = sa.table('has_autoinc',
    sa.column('id', sa.Integer),
    sa.column('x', sa.Integer()),
    sa.column('y', sa.Integer())
)

unfortunately that can't work for SQL Server because the lower-case "column()" / "table()" system has no primary key concept (or autoincrement column), thus the dialect doesn't know about a primary key and correctly doesn't turn on IDENTITY INSERT (the way you want the first case to work).

@sqlalchemy-bot
Copy link
Author

Wil Tan wrote:

Thanks for the tip! I was using lower case sa.table with upper case sa.Column; changing it to sa.Table makes all the difference.
Would the first case be something you'd like to fix?

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

sure, the first one sounds like it should be fixed as I noticed this is just a hardcoded rule in the Alembic impl in any case.

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

I'm also not totally happy that the lowercase table(.., column()) system which is much more convenient for quickies can't be used in this kind of case. options include, getting lowercase table/column to support some rudimentary level of autoinc concepts, or maybe some kind of hint to bulk_insert() that gets passed to the SQL Server dialect. Perhaps:

op.bulk_insert(..., execution_options={"mssql_identity_insert": True})

e.g. we add the ability to send execution options to bulk_insert and we get the SQL Server dialect to allow for this hint to be explicit.

@sqlalchemy-bot
Copy link
Author

Wil Tan wrote:

If this is the only use case for scrapping / changing the lowercase table / column system, it seems like overkill. After all, it's not too much to ask the developer to switch to upper case Table(.., Column()) in order to make this work, provided it is in the documentation.
Adding execution_options is useful, and I would have used it if it had been there. However, it was much better to discover the proper solution which is to use Table with metadata support.

One way that could be useful is to warn user about the mixed use of uppercase / lowercase table/column. For example, op.bulk_insert could raise an error if a TableClause (and not schema.Table) was given and it contained a schema.Column as opposed to a simple ColumnClause. That was the mistake I made.

sa.table('foo', sa.Column('abc', sa.Integer(), primary_key=True))

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

well we can always fix one thing at a time, I'm waiting on you for a PR right? there is absolutely no hurry.

@sqlalchemy-bot
Copy link
Author

Wil Tan wrote:

Sounds good, I'll submit a PR for the offline case. Do you prefer a PR in Github rather than Bitbucket?

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

github is better. I'd ike to get off bitbucket completely but I'm waiting on github to assist in that.

@sqlalchemy-bot
Copy link
Author

Michael Bayer (@zzzeek) wrote:

updated the title to the actual thing we want to fix here.

@sqlalchemy-bot
Copy link
Author

Changes by Michael Bayer (@zzzeek):

  • changed title from "Use IDENTITY INSERT for MS SQL Server for online m" to "bulk insert w/ SQL Server and no-pk table in offli"

@jlgrock
Copy link

jlgrock commented Jun 13, 2020

I'm not great at writing unit tests right now in python, so I haven't created a merge request, but I was able to manually test the following code replacement with and without a database connection and it properly identifies when the column has an identity column and set the SET IDENTITY_INSERT [TABLE_NAME] ON; sections. I do understand the _autoincrement_column is a private member, so there's probably a better way to go about that as well. Since I desperately needed this, my workaround was the following.

from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import mssql
from alembic.ddl.mssql import MSSQLImpl
from sqlalchemy.schema import Sequence

def bulk_insert(self, table, rows, **kw):
    id_column = table._autoincrement_column
    
    insert_has_identity = (id_column is not None) and (
        not isinstance(id_column.default, Sequence)
    )
    if self.as_sql and insert_has_identity:
        self._exec(
            "SET IDENTITY_INSERT %s ON"
            % self.dialect.identifier_preparer.format_table(table)
        )
        super(MSSQLImpl, self).bulk_insert(table, rows, **kw)
        self._exec(
            "SET IDENTITY_INSERT %s OFF"
            % self.dialect.identifier_preparer.format_table(table)
        )
    else:
        super(MSSQLImpl, self).bulk_insert(table, rows, **kw)

MSSQLImpl.bulk_insert = bulk_insert

I tested it with the following:
a) an Identity column

table_name = 'MY_EXAMPLE'
    my_example_table = op.create_table(
        table_name,
       sa.Column('my_id', sa.INTEGER, nullable=False),
        sa.Column('my_value', sa.VARCHAR(length=4), nullable=False),
        sa.PrimaryKeyConstraint('my_id')
    )
    op.bulk_insert(
        my_example_table,
        [
              {'my_value': '800A'},
        ]
    )

b) an integer that is not an identity

table_name = 'MY_EXAMPLE'
    my_example_table = op.create_table(
        table_name,
       sa.Column('my_id', sa.INTEGER, nullable=False, autoincrement=False),
        sa.Column('my_value', sa.VARCHAR(length=4), nullable=False),
        sa.PrimaryKeyConstraint('my_id')
    )
    op.bulk_insert(
        my_example_table,
        [
              {'my_id': 2, 'my_value': '800A'},
        ]
    )

c) something that is not an integer (and thus cannot be an identity)

table_name = 'MY_EXAMPLE'
    my_example_table = op.create_table(
        table_name,
       sa.Column('my_id', mssql.UNIQUEIDENTIFIER, server_default=sa.func.newid(), nullable=False),
        sa.Column('my_value', sa.VARCHAR(length=4), nullable=False),
        sa.PrimaryKeyConstraint('my_id')
    )
    op.bulk_insert(
        my_example_table,
        [
              {'my_value': '800A'},
        ]
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants