
INSERT statements get lost #262

Closed
stolpovsky opened this issue Aug 3, 2017 · 16 comments

@stolpovsky

stolpovsky commented Aug 3, 2017

Environment

  • Python: 3.5.1
  • pyodbc: 4.0.17
  • OS: Windows 7 SP1 64-bit
  • DB: MS SQL Server 2012
  • driver: ODBC driver 11 for sql server

Issue

I am executing around 1289 INSERT statements in a single batch, but only about half of them are reflected in the target table. No exception is raised; part of the input is silently ignored. The combined SQL string is 427118 characters long.

        # insert_statements is a list of individual INSERT statements
        with pyodbc.connect(connection_string) as con:
            cursor = con.cursor()
            cursor.execute('\n'.join(insert_statements))
            cursor.commit()

Interestingly, the number of INSERT statements that get executed changes from run to run (from around 600 to 1000). There are no SQL errors: if I execute the generated string in the GUI, all rows are inserted without error.

If I execute one INSERT at a time, all rows are inserted, but it runs noticeably slower (around 30 s vs 1 s).

        with pyodbc.connect(connection_string) as con:
            cursor = con.cursor()
            for s in insert_statements:
                cursor.execute(s)
            cursor.commit()
@v-chojas
Contributor

v-chojas commented Aug 4, 2017

Each INSERT statement generates one resultset (containing the row count), which you are not consuming. SQL Server will block processing of further statements in a batch if there are pending resultsets filling up its write buffer. Try this instead:

        with pyodbc.connect(connection_string) as con:
            cursor = con.cursor()
            cursor.execute('\n'.join(insert_statements))
            while cursor.nextset():
                pass
            cursor.commit()

Please see https://support.microsoft.com/en-us/help/827575/sql-server-does-not-finish-execution-of-a-large-batch-of-sql-statement for more information.

@stolpovsky
Author

stolpovsky commented Aug 4, 2017

Thanks a lot for the link to the issue discussion and for suggesting a workaround. My first reaction was to blame the API, but on reflection, pausing execution of further statements until the caller consumes the output seems reasonable.

Can this situation be handled in pyodbc? For example, if there are any pending executions when commit() is called, throw.

@v-chojas
Contributor

v-chojas commented Aug 4, 2017

Although I don't know the specifics on what SQL Server does, I can understand and explain why this behaviour is by design and quite natural: the protocol works on a request-response model, where the client sends the request to the server and then waits for a reply. Multiple SQL statements can be submitted in one batch in order to minimise the number of network round-trips, but it is important that the client still know the status/result of each statement in a batch. Thus, the server's main loop is: read a statement; execute it; write the result. If the server can't write, it (implicitly) waits until it can.

This is where it would be appropriate to throw on buffer overflow, or to throw on timeout (while waiting for client to consume from output buffer).

This is a network buffer whose fullness depends heavily on network conditions, which explains your observation that the number of inserts executed varies somewhat randomly, and also why throwing doesn't make much sense in this context: I suppose the server could time out and close the connection, but that would have a huge negative effect on reliability and make it essentially unusable over slower links, or for interactive applications and clients that take a long time to process resultsets.

Regardless, the server expects you to read (and error-check) the replies it sends. Hence:

Microsoft recommends that you always consume all result sets from SQL Server regardless of the size of the batch that you are executing. If you do not flush this data and there are successful result sets to be returned ahead of the error in the result set batch, the client might not discover the server errors.

Try to clear the pending results first, and if that does not fix this issue, we can investigate further.

@gordthompson
Collaborator

Perhaps this might help. If I execute the code

# crsr is an open pyodbc cursor and #tmp is an existing temp table
sql = """\
INSERT INTO #tmp (id, txt) VALUES (1, 'FOO');
INSERT INTO #tmp (id, txt) VALUES (2, 'BAR');
INSERT INTO #tmp (id, txt) VALUES (3, 'BAZ');
"""

crsr.execute(sql)
row_count = crsr.rowcount
status_msg = "result returned. {} row(s) affected."
print(status_msg.format(row_count))
while crsr.nextset():
    row_count = crsr.rowcount
    print(status_msg.format(row_count))

I get three (3) results returned, one for each of the individual INSERT statements

result returned. 1 row(s) affected.
result returned. 1 row(s) affected.
result returned. 1 row(s) affected.

However, if I add a SET NOCOUNT ON; statement at the beginning of the anonymous code block

sql = """\
SET NOCOUNT ON;
INSERT INTO #tmp (id, txt) VALUES (1, 'FOO');
INSERT INTO #tmp (id, txt) VALUES (2, 'BAR');
INSERT INTO #tmp (id, txt) VALUES (3, 'BAZ');
"""

then I only get one result returned for the entire batch

result returned. -1 row(s) affected.
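
Applied to the original report, these two observations suggest prefixing the batch with SET NOCOUNT ON and still draining whatever results remain before committing. Here is a minimal sketch (not code from this thread), assuming connection_string and insert_statements as in the first post:

import pyodbc

# Sketch only: SET NOCOUNT ON suppresses the per-INSERT row-count results,
# and the nextset() loop drains anything still pending so the server can
# finish the whole batch (and any error in it surfaces) before the commit.
with pyodbc.connect(connection_string) as con:
    cursor = con.cursor()
    cursor.execute('SET NOCOUNT ON;\n' + '\n'.join(insert_statements))
    while cursor.nextset():
        pass
    cursor.commit()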

@bjquast

bjquast commented Aug 7, 2017

If the insert statement is always the same, it should be much faster to use a multi-row insert:

sql = """\
-- SET NOCOUNT ON; not needed as only one result returned
INSERT INTO #tmp (id, txt) 
VALUES (1, 'FOO'), (2, 'BAR'), (3, 'BAZ');
"""

@gordthompson
Collaborator

@stolpovsky - If you saw fit to close the pypyodbc issue then perhaps you might want to close this one, too.

@stolpovsky
Author

pypyodbc is practically unsupported; there is nobody there to look into the issue. I believe the behavior is dangerous and unintuitive: a random number of statements does not get executed even though commit is called, no error is raised, and pyodbc allows it. I am not sure what the best way to solve it would be. Maybe read the cursor inside commit, if there is anything left to execute?

while cursor.nextset():
    pass

Maybe throw from commit if the cursor has unread results?

@gordthompson
Collaborator

You raise a very good point, if a subset of the statements really does fail silently. Are you sure that no error is returned? That is, have you looped through all of the results using nextset and verified that there isn't an error result somewhere in there?
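
For reference, one way to run that check is to walk every result in the batch; pyodbc will generally raise an exception from nextset() when it reaches an error result. A minimal sketch, assuming cursor has just executed the joined batch of INSERT statements:

import pyodbc

# Sketch only: report each result's row count and stop at the first error.
result_index = 1
try:
    while True:
        print("result {}: {} row(s) affected".format(result_index, cursor.rowcount))
        if not cursor.nextset():
            break
        result_index += 1
except pyodbc.Error as exc:
    print("error after result {}: {}".format(result_index, exc))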

@stolpovsky
Author

stolpovsky commented Aug 11, 2017 via email

@mkleehammer
Owner

This is very interesting. I'd seen the problem before, but did not know the underlying cause.

We all agree it seems like a very dangerous default, and adding a loop to commit should be harmless. I would not expect any significant performance problems - no added network trips - for the "normal" case when there is a single statement.

I would expect any errors to raise an exception in commit. Is everyone OK with that? That means code would look like this:

cursor.execute(sql)
stuff_to_do_when_successful()
cursor.commit()  # <- exception raised here

I think it is fine, because an error could happen here anyway, such as a network disconnect from the server.

@stolpovsky
Author

From the user's perspective, this makes sense: commit will first execute any remaining statements, ignoring their results and throwing on error. Not returning results is OK, since the user did not write their own result handling.

It seems like an easy fix for the pyodbc maintainers, too: just add a loop at the beginning of commit.
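
As a user-side stopgap until something like that lands in pyodbc, the loop can live in a small wrapper. A minimal sketch; the helper name is hypothetical and not part of pyodbc:

def drain_and_commit(cursor):
    # Consume any result sets left over from a multi-statement batch (an
    # error result in the batch surfaces here as a pyodbc exception from
    # nextset()), then commit the transaction.
    while cursor.nextset():
        pass
    cursor.commit()

# usage with the batch from the original report:
cursor.execute('\n'.join(insert_statements))
drain_and_commit(cursor)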

@MartinRiddar

Did this ever get implemented or is this something I have to think about in my code?

@v-chojas
Contributor

Microsoft recommends that you always consume all result sets from SQL Server regardless of the size of the batch that you are executing.

@mneira10

This seems to be affecting inserts done through sqlalchemy and pandas, using:

engine = sqlalchemy.create_engine(
            'mssql+pyodbc:///?odbc_connect={}'.format(dbParams))
dataFrame.to_sql(
            tableName,
            con=engine, ... )

Is there a workaround in this case?

@gordthompson
Collaborator

gordthompson commented Jan 30, 2021

@mneira10 - If you are talking about this Stack Overflow question then no, pandas to_sql() will almost certainly not be failing because of this issue. See my answer here for details.

@keitherskine
Collaborator

Closed due to inactivity. Feel free to re-open with current information if necessary.
