-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support to Upsert/Insert Ignore on PDB #386
Conversation
**Summary:** - Introduces the capability to perform batch insert operations with duplicated entries without throwing a unique constraint violation exception. This is achieved by adding a new prepared statement to a `MappedEntity` when it is loaded. This statement will upsert the duplicate entries and will be implemented to each engine accordingly. This way, this prepared statement (upsert) will be used in the batch processing associated with the entity when adding and flushing the entries, but using the upserting any duplicated entries. This commit introduces support for the H2 and PostgreSQL engines for this operation. - For the H2 engine, the prepared statement is the `MERGE` command which updates existing rows, and insert rows that don't exist. The documentation for this engine can be found [here](http://www.h2database.com/html/commands.html#merge_into) ```sql MERGE INTO <TARGET_ENTITY> (cols...) VALUES(vals...) ``` - For the PostgreSQL engine, the insert statement introduces a `ON CONFLICT` clause that specifies an alternative action to raising a unique violation or exclusion constraint violation error. For the implementation, it was added the `DO UPDATE` action which will updating the columsn to the new values. It uses the values of the special table `EXCLUDED` to update it to the new values. The documentation for this engine can be found [here](https://www.postgresql.org/docs/current/sql-insert.html) ```sql INSERT INTO <TARGET_ENTITY> (cols...) VALUES (vals...) ON CONFLICT DO UPDATE SET (cols...) = EXCLUDED.(cols...) ``` **Test:** - A new test was added to verify that a batch with duplicate entries successfully inserts only one entry and do not throw an exception. - The test suite was run for every supported engine (H2 and PostgreSQL).
- Reverts import order to avoid boilerplate diff.
- Minor changes in the H2Engine MERGE command. - Refactors the INSERT ON CONFLICT in teh PostgreSqlEngine class, so it updates instead of ignoring duplicate entries. - Refactors test.
- Fix method name.
- Fix tests: Adds `assumeTrue` so it ignores the engines for two tests where the upsert prepared statement is not implemented.
- Fix tests: Adds cockroach db to exception as it extends the Postgres implementation.
- Fix tests: Fixes the h2 assumption as it requires more than one h2 engine.
src/main/java/com/feedzai/commons/sql/abstraction/batch/AbstractBatch.java
Outdated
Show resolved
Hide resolved
src/main/java/com/feedzai/commons/sql/abstraction/engine/impl/H2Engine.java
Outdated
Show resolved
Hide resolved
src/main/java/com/feedzai/commons/sql/abstraction/engine/impl/H2Engine.java
Show resolved
Hide resolved
public void flushUpsert() { | ||
logger.trace("Flush ignoring not available for MultithreadedBatch. Skipping ..."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot be that dead silent :)
If this is not implemented and if it is still being used, then something is not right -- and then, the callee logic should fail fast, so it can be patched as fast as possible. So, I would throw an UnsupportedOperationException
.
(at very least, this needs to be logged and error
)
src/main/java/com/feedzai/commons/sql/abstraction/engine/impl/PostgreSqlEngine.java
Outdated
Show resolved
Hide resolved
protected void processBatchUpsert(final DatabaseEngine de, final List<BatchEntry> batchEntries) throws DatabaseEngineException { | ||
/* | ||
Begin transaction before the addBatch calls, in order to force the retry of the connection if it was lost during | ||
or since the last batch. Otherwise, the addBatch call that uses a prepared statement will fail. | ||
*/ | ||
de.beginTransaction(); | ||
|
||
for (final BatchEntry entry : batchEntries) { | ||
de.addBatchUpsert(entry.getTableName(), entry.getEntityEntry()); | ||
} | ||
|
||
de.flushUpsert(); | ||
de.commit(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY - this is duplicated code and should be improved.
(if there is time before the release, I would not make it a priority, just letting a note)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires some further analysis, right off the bat I can't change it as from what I saw it could take some refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack! - we can address this later.
- Addresses comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
protected void processBatchUpsert(final DatabaseEngine de, final List<BatchEntry> batchEntries) throws DatabaseEngineException { | ||
/* | ||
Begin transaction before the addBatch calls, in order to force the retry of the connection if it was lost during | ||
or since the last batch. Otherwise, the addBatch call that uses a prepared statement will fail. | ||
*/ | ||
de.beginTransaction(); | ||
|
||
for (final BatchEntry entry : batchEntries) { | ||
de.addBatchUpsert(entry.getTableName(), entry.getEntityEntry()); | ||
} | ||
|
||
de.flushUpsert(); | ||
de.commit(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack! - we can address this later.
Add Support to Upsert/Insert Ignore on PDB
Summary:
Introduces the capability to perform batch insert operations with duplicated entries without throwing a unique constraint violation exception. This is achieved by adding a new prepared statement to a
MappedEntity
when it is loaded. This statement will upsert the duplicate entries and will be implemented to each engine accordingly. This way, this prepared statement (upsert) will be used in the batch processing associated with the entity when adding and flushing the entries, but using the upserting any duplicated entries. This commit introduces support for the H2 and PostgreSQL engines for this operation.For the H2 engine, the prepared statement is the
MERGE
command which updates existing rows, and insert rows that don't exist. The documentation for this engine can be found hereON CONFLICT
clause that specifies an alternative action to raising a unique violation or exclusion constraint violation error. For the implementation, it was added theDO UPDATE
action which will updating the columsn to the new values. It uses the values of the special tableEXCLUDED
to update it to the new values. The documentation for this engine can be found hereTest: