Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch not cleared on failure when using bulk update #1767

Closed
ianroberts opened this issue Mar 10, 2022 · 7 comments · Fixed by #1869
Closed

Batch not cleared on failure when using bulk update #1767

ianroberts opened this issue Mar 10, 2022 · 7 comments · Fixed by #1869
Assignees
Projects

Comments

@ianroberts
Copy link
Contributor

Driver version

All (tested with 9.2.1 but same code is present in latest main branch).

Problem description

The normal behaviour of executeBatch is to clear the pending batch of operations, regardless of whether the batch execution completes successfully, succeeds for some rows and fails for others, or fails entirely with an exception:

} finally {
batchParamValues = null;
}

(the code to reset batchParamValues to null is in a finally block and fires in all cases, even when an exception is thrown).

However, in the case where the bulk update API is used, batchParamValues is not cleared in the case of an exception:

bcOperation.writeToServer(batchRecord);
bcOperation.close();
updateCounts = new int[batchParamValues.size()];
for (int i = 0; i < batchParamValues.size(); ++i) {
updateCounts[i] = 1;
}
batchParamValues = null;
loggerExternal.exiting(getClassNameLogging(), "executeBatch", updateCounts);
return updateCounts;

(batchParamValues = null is not in a finally so does not run if any of the bcOperation methods were to throw an exception). This means that subsequent addBatch calls following a failure will append to the existing batch rather than starting a new one, and the next executeBatch will re-submit the previous batch of rows along with the new data.

Expected behavior

The logic should be the same in all cases - executeBatch should always clear the batch state so subsequent addBatch calls begin a new batch.

Actual behavior

As described above - in the bulk update case the batch is not cleared on failure.

Any other details that can be helpful

Since clearBatch is idempotent, a workaround would be for user code to do an explicit stmt.clearBatch() call (in a finally) after every call to executeBatch().

ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Mar 10, 2022
ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Mar 10, 2022
…atch via bulk update.

Closes microsoft#1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
@Jeffery-Wasty
Copy link
Member

Hi @ianroberts,

Thank you for the report and for the potential fix in the pull request. We'll take a look at both and get back to you as soon as we can.

@Jeffery-Wasty Jeffery-Wasty added the Under Investigation Used for issues under investigation label Mar 10, 2022
@Jeffery-Wasty Jeffery-Wasty added this to Under Investigation in MSSQL JDBC via automation Mar 10, 2022
@Jeffery-Wasty Jeffery-Wasty self-assigned this Mar 10, 2022
@Jeffery-Wasty
Copy link
Member

Hi @ianroberts,

Thank you again for your submissions. We've taken a look at both the issue presented, as well as the proposed solution in the pull request. We would like to spend some more time reviewing both, and will get back to you with a final decision, at a later date.

@Jeffery-Wasty Jeffery-Wasty moved this from Under Investigation to Backlog in MSSQL JDBC Mar 15, 2022
@Jeffery-Wasty Jeffery-Wasty added Backlog The topic in question has been recognized and added to development backlog and removed Under Investigation Used for issues under investigation labels Mar 15, 2022
@Jeffery-Wasty Jeffery-Wasty removed their assignment Mar 15, 2022
@lilgreenbird lilgreenbird moved this from Backlog to Under Investigation in MSSQL JDBC Jun 15, 2022
@lilgreenbird lilgreenbird removed the Backlog The topic in question has been recognized and added to development backlog label Jun 15, 2022
@Jeffery-Wasty Jeffery-Wasty self-assigned this Jun 16, 2022
ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Jul 14, 2022
…atch, whether via bulk update or via traditional batch insert.

Closes microsoft#1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Jul 14, 2022
…atch, whether via bulk update or via traditional batch insert.

Closes microsoft#1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
@Jeffery-Wasty
Copy link
Member

Jeffery-Wasty commented Aug 11, 2022

EDIT: Ignore what I posted, I see where you're coming from. I'll keep looking into the issue.

@Jeffery-Wasty Jeffery-Wasty added the Under Investigation Used for issues under investigation label Aug 11, 2022
@Jeffery-Wasty
Copy link
Member

Jeffery-Wasty commented Aug 16, 2022

Hi @ianroberts,

Another update. I agree with you there is a problem, when you look at the code path, and how batchParamValues is not cleared in cases, and in addition, this PR does give the best fix for that, however, we're still having issues actually replicating the original problem. How did you originally come across this? Is there some code you can share that will reproduce the original problem?

@ianroberts
Copy link
Contributor Author

I can't share my exact code that revealed this issue (it's buried deep inside a proprietary Grails web application) but the scenario is that I have a client that is uploading a stream of data items and on the server side I'm using a PreparedStatement to insert each item in turn into a database table. I use the batching mechanism to execute one batch for every thousand rows, roughly:

PreparedStatement ps = conn.prepareStatement(sql);
int idx = 0;
int numErrors = 0;
for(Item item : uploadedItems) {
  // set statement params from item data - in the real code this is more dynamic
  ps.setLong(1, item.getIdentifier());
  ps.setString(2, item.getSomethingElse());
  // add this row to the current batch
  ps.addBatch();
  if((++idx % 1000) == 0) {
    try {
      numErrors += countFails(ps.executeBatch());
    } catch(BatchUpdateException e) {
      numErrors += countFails(e.getUpdateCounts());
    } catch(Exception e) {
      // handle other types of exception
    }
  }
}

// and do a final executeBatch when we run out of items

(countFails is just a function counting the number of items in the int[] that equal Statement.EXECUTE_FAILED). What I found was as I describe in the original report - when useBulkCopyForBatchInsert=true is set, if executeBatch fails with an exception then the batch is not cleared, so every subsequent executeBatch on that PreparedStatement fails in the same way even if the rows added since the last batch was executed are not ones that cause an error.

The batch is cleared if we are not using bulk copy, or the bug can be worked around by adding finally { ps.clearBatch(); }

@ianroberts
Copy link
Contributor Author

I suppose a simple unit test could be

  • Create a table with a single integer PK column
  • Prepare a statement to insert values into this table
  • Make a batch that inserts two values 1, 2 - this should work
  • Make a second batch that inserts 2, 3 - this should fail with a duplicate PK
  • Make a third batch that inserts 4, 5 - this one should succeed, and will with bulk copy disabled, but with bulk copy inserts enabled it will currently fail with the same duplicate PK as it tries to do a four item batch 2, 3, 4, 5

ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Aug 17, 2022
…atch, whether via bulk update or via traditional batch insert.

Closes microsoft#1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
ianroberts added a commit to ianroberts/mssql-jdbc that referenced this issue Aug 17, 2022
Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
@ianroberts
Copy link
Contributor Author

@Jeffery-Wasty I've added exactly this test to com.microsoft.sqlserver.jdbc.preparedStatement.RegressionTest in the linked PR, and I can confirm this new test fails with the current main and passes with the PR changes.

https://github.com/microsoft/mssql-jdbc/pull/1869/files#diff-2fefeb1ca71703ceda500369b6f7eac68ff53a6e3b03236175e6006eb6e394e9

MSSQL JDBC automation moved this from Under Investigation to Closed Issues Aug 30, 2022
Jeffery-Wasty added a commit that referenced this issue Aug 30, 2022
… batch (#1869)

* Ensure that batchParamValues is cleared in all cases when running a batch, whether via bulk update or via traditional batch insert.

Closes #1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>

* Added regression test for #1767

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>

* Cleaned up tests slightly.

* Minor formatting cleanup

Signed-off-by: Ian Roberts <i.roberts@dcs.shef.ac.uk>
Co-authored-by: Jeff Wasty <v-jeffwasty@microsoft.com>
@Jeffery-Wasty Jeffery-Wasty removed the Under Investigation Used for issues under investigation label Aug 30, 2022
@Jeffery-Wasty Jeffery-Wasty modified the milestone: 11.3.0 Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment