Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support to Upsert/Insert Ignore on PDB #386

Merged
merged 9 commits into from
Jun 7, 2023
Merged

Conversation

victorcmg
Copy link
Contributor

@victorcmg victorcmg commented May 24, 2023

Add Support to Upsert/Insert Ignore on PDB

Summary:

  • Introduces the capability to perform batch insert operations with duplicated entries without throwing a unique constraint violation exception. This is achieved by adding a new prepared statement to a MappedEntity when it is loaded. This statement will upsert the duplicate entries and will be implemented to each engine accordingly. This way, this prepared statement (upsert) will be used in the batch processing associated with the entity when adding and flushing the entries, but using the upserting any duplicated entries. This commit introduces support for the H2 and PostgreSQL engines for this operation.

  • For the H2 engine, the prepared statement is the MERGE command which updates existing rows, and insert rows that don't exist. The documentation for this engine can be found here

MERGE INTO <TARGET_ENTITY> (cols...) VALUES(vals...)
  • For the PostgreSQL engine, the insert statement introduces a ON CONFLICT clause that specifies an alternative action to raising a unique violation or exclusion constraint violation error. For the implementation, it was added the DO UPDATE action which will updating the columsn to the new values. It uses the values of the special table EXCLUDED to update it to the new values. The documentation for this engine can be found here
INSERT INTO <TARGET_ENTITY> (cols...) VALUES (vals...)
ON CONFLICT
DO UPDATE
SET (cols...) = EXCLUDED.(cols...)

Test:

  • A new test was added to verify that a batch with duplicate entries successfully inserts only one entry and do not throw an exception.
  • The test suite was run for every supported engine (H2 and PostgreSQL).

@victorcmg victorcmg marked this pull request as draft May 24, 2023 16:29
@victorcmg victorcmg marked this pull request as ready for review May 29, 2023 13:46
**Summary:**

- Introduces the capability to perform batch insert operations with duplicated entries without throwing a unique constraint violation exception. This is achieved by adding a new prepared statement to a `MappedEntity` when it is loaded. This statement will upsert the duplicate entries and will be implemented to each engine accordingly. This way, this prepared statement (upsert) will be used in the batch processing associated with the entity when adding and flushing the entries, but using the upserting any duplicated entries. This commit introduces support for the H2 and PostgreSQL engines for this operation.

- For the H2 engine, the prepared statement is the `MERGE` command which updates existing rows, and insert rows that don't exist. The documentation for this engine can be found [here](http://www.h2database.com/html/commands.html#merge_into)

```sql
MERGE INTO <TARGET_ENTITY> (cols...) VALUES(vals...)
```

- For the PostgreSQL engine, the insert statement introduces a `ON CONFLICT` clause that specifies an alternative action to raising a unique violation or exclusion constraint violation error. For the implementation, it was added the `DO UPDATE` action which will updating the columsn to the new values. It uses the values of the special table `EXCLUDED` to update it to the new values. The documentation for this engine can be found [here](https://www.postgresql.org/docs/current/sql-insert.html)

```sql
INSERT INTO <TARGET_ENTITY> (cols...) VALUES (vals...)
ON CONFLICT
DO UPDATE
SET (cols...) = EXCLUDED.(cols...)
```

**Test:**

- A new test was added to verify that a batch with duplicate entries successfully inserts only one entry and do not throw an exception.
- The test suite was run for every supported engine (H2 and PostgreSQL).
- Reverts import order to avoid boilerplate diff.
- Minor changes in the H2Engine MERGE command.
- Refactors the INSERT ON CONFLICT in teh PostgreSqlEngine class, so it updates instead of ignoring duplicate entries.
- Refactors test.
- Fix tests: Adds `assumeTrue` so it ignores the engines for two tests where the upsert prepared statement is not implemented.
- Fix tests: Adds cockroach db to exception as it extends the Postgres implementation.
- Fix tests: Fixes the h2 assumption as it requires more than one h2 engine.
Comment on lines 344 to 346
public void flushUpsert() {
logger.trace("Flush ignoring not available for MultithreadedBatch. Skipping ...");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be that dead silent :)

If this is not implemented and if it is still being used, then something is not right -- and then, the callee logic should fail fast, so it can be patched as fast as possible. So, I would throw an UnsupportedOperationException.

(at very least, this needs to be logged and error)

Comment on lines +85 to +99
protected void processBatchUpsert(final DatabaseEngine de, final List<BatchEntry> batchEntries) throws DatabaseEngineException {
/*
Begin transaction before the addBatch calls, in order to force the retry of the connection if it was lost during
or since the last batch. Otherwise, the addBatch call that uses a prepared statement will fail.
*/
de.beginTransaction();

for (final BatchEntry entry : batchEntries) {
de.addBatchUpsert(entry.getTableName(), entry.getEntityEntry());
}

de.flushUpsert();
de.commit();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRY - this is duplicated code and should be improved.

(if there is time before the release, I would not make it a priority, just letting a note)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires some further analysis, right off the bat I can't change it as from what I saw it could take some refactoring.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack! - we can address this later.

Copy link

@diogo-anjos-fdz diogo-anjos-fdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +85 to +99
protected void processBatchUpsert(final DatabaseEngine de, final List<BatchEntry> batchEntries) throws DatabaseEngineException {
/*
Begin transaction before the addBatch calls, in order to force the retry of the connection if it was lost during
or since the last batch. Otherwise, the addBatch call that uses a prepared statement will fail.
*/
de.beginTransaction();

for (final BatchEntry entry : batchEntries) {
de.addBatchUpsert(entry.getTableName(), entry.getEntityEntry());
}

de.flushUpsert();
de.commit();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack! - we can address this later.

@lfac-pt lfac-pt merged commit 1ef860d into feedzai:master Jun 7, 2023
63 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants