Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return all generated keys for inserts #245

Open
Suraiya-Hameed opened this issue Apr 6, 2017 · 22 comments
Open

Return all generated keys for inserts #245

Suraiya-Hameed opened this issue Apr 6, 2017 · 22 comments
Labels
Backlog The topic in question has been recognized and added to development backlog Enhancement An enhancement to the driver. Lower priority than bugs. External Issue is due to an external source we do not control.
Projects

Comments

@Suraiya-Hameed
Copy link
Contributor

If multiple values are inserted with Statement.RETURN_GENERATED_KEYS set, driver returns only the last generated key/identity value on calling getGeneratedKeys. But according to JDBC spec,

The ResultSet object returned by getGeneratedKeys will contain a row for each value that a statement generated.

@Suraiya-Hameed Suraiya-Hameed added this to the Long Term Goals milestone Apr 6, 2017
@ChristianLutz
Copy link

ChristianLutz commented Apr 7, 2017

Hello,

Will this also work in combination with TVP, so all generated keys will returned in the correct order? E.g. I want to insert three rows TVP(ds1, ds2, ds3) and as a result I will get (pkOfds1, pkOfds2, pkOfds3).

thanks
Christian

@Suraiya-Hameed
Copy link
Contributor Author

@ChristianLutz right now the support generatedKey is a long term goal. We will consider it with TVP too while designing.
Thanks!

@irhiggs
Copy link

irhiggs commented Mar 1, 2018

Is this purely a JDBC isue or is there something required to change on the server?

@peterbae
Copy link
Contributor

peterbae commented Mar 28, 2018

Hi @irhiggs, as far as I can tell, this is a purely JDBC issue and no change is required on the server's end (if you meant in terms of how the user should configure the server environment). If the server is not sending a list of all the generated keys that need to be returned, then a change might be required on the server's end.

@almothafar
Copy link

Status of this after like 3 years?

@peterbae
Copy link
Contributor

peterbae commented Jul 7, 2020

This is a limitation from SQL Server's end and there's not much the driver can do. After speaking with the SQL Server team, changing the server response to return all generated keys instead of the last one (similar to OracleDB for example) would not be a trivial task, and the change likely won't be made as it will break a lot of existing design. However, we're keeping this issue open for future references.

@peterbae
Copy link
Contributor

peterbae commented Jul 7, 2020

To add a bit more background to this problem - currently the driver attempts to get the generated keys by selecting SCOPE_IDENTITY() after the user's query. However, this only returns the last generated value, and if we want to return all generated values from a statement, we would have to use an OUTPUT clause.

The problem with this approach is that we would be modifying the user's statement to put an OUTPUT clause in the middle of the statement (as per OUTPUT syntax), and we don't want our driver to modify the user's query. Even if the driver were to parse the user's statement to put an OUTPUT clause in it, it would not be a trivial task to parse the user's query since it needs to find where in the middle of the statement the clause would need to be inserted. Also, since only one OUTPUT clause is allowed, if the user's statement already contains an OUTPUT clause, we wouldn't be able to put another OUTPUT clause in there.

Note that this is a lot easier on OracleDB's end due to syntactical differences, where their equivalent of OUTPUT clause (RETURNING ROW ID INTO) comes at the end of a statement. I hope this explains why for now this issue is not being solved by the team for the time being.

@MelleD
Copy link

MelleD commented Mar 25, 2022

@peterbae Is there a workaround or possibility to get the next generated IDS in order to set them yourself, e.g.
Get 100 IDs
Insert the 100 IDS and the batch insert could be working just with 1 more select to fetching ids?

@MelleD
Copy link

MelleD commented Apr 5, 2022

@peterbae also what i not understand if i use the OUTPUT

-- Colleagues table with identity column
INSERT INTO colleagues (name)
OUTPUT INSERTED.id
VALUES 
    ('Foo'),
    ('Blubb'),
    ('Blaa')

The output is

id
1
2
3

But with the sql driver

 ps.executeBatch(); --> error
 
java.sql.BatchUpdateException: A result set for the update operation was generated
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2116)


Thats the plan to get a result set ;)

So if you use an OUTPUT manually, a workaround should be that you can somehow evaluate the result set. As far I see the result set contains the generated ids, correct?

Error is thrown here:

                    if (needsPrepare || numBatchesPrepared == numBatches) {
                        ensureExecuteResultsReader(batchCommand.startResponse(getIsResponseBufferingAdaptive()));

                        boolean retry = false;
                        while (numBatchesExecuted < numBatchesPrepared) {
                            // NOTE:
                            // When making changes to anything below, consider whether similar changes need
                            // to be made to Statement batch execution.

                            startResults();

                            try {
                                // Get the first result from the batch. If there is no result for this batch
                                // then bail, leaving EXECUTE_FAILED in the current and remaining slots of
                                // the update count array.
                                if (!getNextResult(true))
                                    return;

                                // If the result is a ResultSet (rather than an update count) then throw an
                                // exception for this result. The exception gets caught immediately below and
                                // translated into (or added to) a BatchUpdateException.
                                if (null != resultSet) {
                                    SQLServerException.makeFromDriverError(connection, this,
                                            SQLServerException.getErrString("R_resultsetGeneratedForUpdate"), null,
                                            false);
                                }

Isn't it simply possible not to throw an exception and to make the complete resultset (in my opinion also in the exception) accessible from the outside for a workaround. Better a workaround in the driver than no option at all.

@lilgreenbird lilgreenbird added the Enhancement An enhancement to the driver. Lower priority than bugs. label Apr 6, 2022
@lilgreenbird
Copy link
Member

hi @MelleD

executeBatch does not support getting generated keys. getGeneratedKeys is only supported for single, not batch updates.

As previously mentioned this issue is due to a SQL Server limitation, so until the server supports returning all the generated keys in the response there is not much the driver can do. As previously mentioned, implementing a hack in the driver to get the keys is not a trivial task as it involves parsing the user query which could be quite complex and will break existing design. This issue is kept open only for tracking purposes however as mentioned it is unlikely to make the cut any time soon as it does not look like this will be changed at the server side, and this enhancement request will be placed behind other higher priority issues for the driver.

@MelleD
Copy link

MelleD commented Apr 8, 2022

Hey @lilgreenbird

Thanks for the quick reply, but sorry to disagree with you. I think a workaround can be implemented, when the sql statement use the OUTPUT statement. And my suggestion was not to manipulate the user input. The user should actively use the output statement.

See OUTPUT INSERTED.id

I built the following test:


	@Test
	public void testBatch() throws SQLException {
		try (Connection con = getConnection(); Statement stmt = con.createStatement()) {

			stmt.executeUpdate("CREATE TABLE " + AbstractSQLGenerator.escapeIdentifier(tableName)
					+ " (id int IDENTITY(1,1) PRIMARY KEY," + "lastname varchar(255) NOT NULL)");

		}

		try (Connection con = getConnection();
				PreparedStatement stmt = con
						.prepareStatement("INSERT INTO" + AbstractSQLGenerator.escapeIdentifier(tableName)
								+ "(lastname) OUTPUT INSERTED.id VALUES(?)")) {

			stmt.setString(1, "Test1");
			stmt.addBatch();

			stmt.setString(1, "Test2");
			stmt.addBatch();
			stmt.executeBatch();

		} 
	}

Then i manipulate the SQLServerPreparedStatement line 3017

while (numBatchesExecuted < numBatchesPrepared) {
	// NOTE:
	// When making changes to anything below, consider whether similar changes need
	// to be made to Statement batch execution.

	startResults();
	try {

		while (getNextResult(true)) {
			if (resultSet == null) {
				continue;
			}
			while (resultSet.next()) {
				final int id = resultSet.getInt(1);
				System.out.println("Generated id" + id);

			}

		}

		SQLServerException.makeFromDriverError(connection, this,
				SQLServerException.getErrString("R_resultsetGeneratedForUpdate"), null, false);
} catch (final SQLServerException e) {

Output is
Generated id 1
//stacktrace
Generated id 2

You can also try (id int IDENTITY(10,5) then the output is
Generated id 10
//stacktrace
Generated id 15

So the client and driver has the right generated keys. Just a way/workaround how to return it to the user is needed (maybe into the exception)

Of course it's a workaround and you have to actively use the OUTPUT statement, but still better than no solution at all because then you can't use batch inserts and that's a disaster.

@MelleD
Copy link

MelleD commented Apr 26, 2022

Hey @lilgreenbird, @peterbae

some feedback here?

@David-Engel
Copy link
Contributor

According to the JDBC spec, executeBatch():

Submits a batch of commands to the database for execution and if all commands execute successfully, returns an array of update counts. The int elements of the array that is returned are ordered to correspond to the commands in the batch, which are ordered according to the order in which they were added to the batch. The elements in the array returned by the method executeBatch may be one of the following:

Unfortunately, SQL Server doesn't return a result set for every item in the batch. Only for batches that produce a result set. If some batches produce a result set and some batches don't, there is no way for the driver to correctly correlate a batch with its result set.

@MelleD
Copy link

MelleD commented May 11, 2022

@David-Engel how i said it's a workaround for people who are knowing what they do. Of course it's not spec conform, but the current state is also not spec conform.
The driver could allow a workaround for a statement which includes 'OUTPUT INSERTED.id', because then the SQL Server return a result set for batch inserts see examples above.
So just
if the sql statement contains 'OUTPUT INSERTED.id' --> return the result set for insert
if the sql statement not contains 'OUTPUT INSERTED.id' --> process like now

@David-Engel why should this not work?

@David-Engel
Copy link
Contributor

@MelleD

if the sql statement contains 'OUTPUT INSERTED.id' --> return the result set for insert

Just because a single statement includes 'OUTPUT INSERTED.id' doesn't mean it will produce one and only one result set. A batch can also consist of many statements. The driver has no way of telling the server to only return a single result set for a statement nor can it tell which result set(s) returned by the server should be associated with each batch statement.

If we implement something non-spec-conforming for "people who are knowing what they do", people who don't know that it doesn't conform to the spec will try to use it and raise issues when it doesn't work correctly. Or worse, they will use it and then rely on incorrect results without realizing they are incorrect.

@MelleD
Copy link

MelleD commented May 11, 2022

@David-Engel
As I said currently it is also not spec compliant. And it's a (performance) catastrophe not to get generated ids back from a batch (with same statements of inserts), even though you can do it with a simple SQL statement.
Yes maybe tickets will come, yes maybe something will happen, but it's still better than doing nothing. You just could take the options that the SQL server offers and that way is just 'OUTPUT INSERTED'.
I think the example is very clear and for a first draft a total normal usecase.

@lilgreenbird lilgreenbird added the External Issue is due to an external source we do not control. label Jun 15, 2022
@deusaquilus
Copy link

deusaquilus commented Aug 8, 2022

Quoting @MelleD, please see my response below...

Thanks for the quick reply, but sorry to disagree with you. I think a workaround can be implemented, when the sql statement use the OUTPUT statement. And my suggestion was not to manipulate the user input. The user should actively use the output statement.

See OUTPUT INSERTED.id
...

Output is Generated id 1 //stacktrace Generated id 2

You can also try (id int IDENTITY(10,5) then the output is Generated id 10 //stacktrace Generated id 15

So the client and driver has the right generated keys. Just a way/workaround how to return it to the user is needed (maybe into the exception)

Of course it's a workaround and you have to actively use the OUTPUT statement, but still better than no solution at all because then you can't use batch inserts and that's a disaster.

@MelleD According to what I understand about your approach (please correct me if I'm wrong), it essentially requires calling preparedStatement.executeQuery after each call to preparedStatement.addBatch. The problem with that is typically each addBatch call in JDBC is done for a single row-insert. So that means that you are essentially querying the DB once for every inserted row (this is the literal definition of the N+1 problem). The only way to avoid this problem is to do inserts with multiple VALUES clauses i.e.

INSERT ... VALUES (row1ColumnA, row1ColumnB, ...), (row2ColumnA, row1ColumnB, ....), ...(rowNColumnA, rowNColumnB, ...) 

...and to do this for some large N amount of rows (1000 for each addBatch seems to work well in practice). I have implemented such a feature in ProtoQuill and am therefore successfully using your workaround (here).

However, I do not think it is realistic to expect every ORM library to optimize value-batching in this way. The preparedStatement.getGeneratedKeys mechanism works after executeBatch for Postgres, MySQL, H2, and Oracle for single-row inserts after each addBatch call. I think it should work the same way for MSSQL.

@stolsvik
Copy link

After reading this thread, I believe the MS JDBC Driver team, or at least @lilgreenbird and @David-Engel, needs to carefully read @MelleD's comments again!

What is suggested is that if the user inserts the OUTPUT, the driver refuses to give access to the generated ids even though they are available for the driver code at that point.

IIUC: The driver thus actively hinders a way for the user to overcome the deficiency of the server/protocol!

(What is not suggested, is a way to make the JDBC driver compliant by clever hacks, modification of query, or somesuch. Just a way (hackish or not) to get to the actual result of the user's query, the result which is actually existing!)

@lilgreenbird
Copy link
Member

@stolsvik please re-read comment above that this is a SQL Server limitation. This issue is left open for tracking purposes only the driver team is not going to be implementing any hacks to workaround. This is filed in our backlog we will look into supporting when SQL Server addresses the limitation.

@lilgreenbird lilgreenbird added the Backlog The topic in question has been recognized and added to development backlog label Aug 12, 2022
@lilgreenbird lilgreenbird removed this from the Long Term Goals milestone Aug 12, 2022
@mikghgit
Copy link

I just checked with my C# developers, who claimed they could easily get all inserted ids returned in one transaction. Being sceptical having read this thread they demoed it to me.

They use Entity Framework as their ORM but nevertheless... Looking at the SQL output it turns out that Microsoft (your colleagues?) in fact allow OUTPUT INTO in the ODBC-drivers to return the inserted identity values.

I may of course miss some important context to understand this, but as a layman it seems inconsistent that what is not considered feasible for JDBC (and I do understand your reasoning) is a valid approach for the ODBC-drivers. Would you be interested in shedding some light on why this is?

@sergeykad
Copy link

It works for identity PK, but if you use sequence for example it will not work.
In general Microsoft's JDBC implementation isn't the best.

@lilgreenbird
Copy link
Member

fyi this issue is already filed in our backlog we will review the list of all bug fixes and/or feature requests when we do planning each semester. Please note that due to limited resources obviously not every bug or feature request will make the cut but we do take into consideration the number of user requests and upvotes so please feel free to fill out the survey on fixes/feature requests that you would like to see in the driver in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Backlog The topic in question has been recognized and added to development backlog Enhancement An enhancement to the driver. Lower priority than bugs. External Issue is due to an external source we do not control.
Projects
MSSQL JDBC
  
Backlog
Development

No branches or pull requests