Adding bulk inserts or updates for PostgreSQL #226

PedroD · 2020-12-28T21:25:38Z

Adds support for PostgreSQL bulk insert-or-update, including the option to ignore conflicting rows in bulk.

    database.bulkInsert(Employees) {
        item {
            set(it.id, 1)
            set(it.name, "vince")
            set(it.job, "engineer")
            set(it.salary, 1000)
            set(it.hireDate, LocalDate.now())
            set(it.departmentId, 1)
        }
        item {
            set(it.id, 5)
            set(it.name, "vince")
            set(it.job, "engineer")
            set(it.salary, 1000)
            set(it.hireDate, LocalDate.now())
            set(it.departmentId, 1)
        }

        onDuplicateKey(Employees.id) {
            set(it.salary, it.salary + 900)
        }
    }

(This adds 900 to the salary of every employee that already exists.)

    database.bulkInsert(Employees) {
        item {
            set(it.id, 1)
            set(it.name, "vince")
            set(it.job, "engineer")
            set(it.salary, 1000)
            set(it.hireDate, LocalDate.now())
            set(it.departmentId, 1)
        }
        item {
            set(it.id, 5)
            set(it.name, "vince")
            set(it.job, "engineer")
            set(it.salary, 1000)
            set(it.hireDate, LocalDate.now())
            set(it.departmentId, 1)
        }

        onDuplicateKey(Employees.id) {
        }
    }

(This skips any employee that already exists and continues with the rest of the query.)

PedroD · 2020-12-29T12:04:20Z

ktorm-support-postgresql/src/main/kotlin/org/ktorm/support/postgresql/BulkInsertOrUpdate.kt

+ *      }
+ * ```
+ *
+ * @since 2.7


@vincentlauvlwj what version should go here?

Next version 3.3.0

vincentlauvlwj · 2020-12-29T17:12:20Z

Thank you for your contribution!!

One question: What's the difference between calling onDuplicateKey with an empty closure and not calling it? What if a user wants to throw an exception when any key conflict exists? Seems it is a more common case than just ignoring conflicts.

vincentlauvlwj · 2020-12-29T17:13:51Z

Also, please update the build.gradle file, add your GitHub ID to the developers info (line 90), that will let more people know your contributions.

PedroD · 2020-12-29T17:57:49Z

> Thank you for your contribution!!
>
> One question: What's the difference between calling onDuplicateKey with an empty closure and not calling it? What if a user wants to throw an exception when any key conflict exists? Seems it is a more common case than just ignoring conflicts.

Good question, I hadn't thought about that scenario...

But thinking about it now, I only see two possibilities.

Since this method is an insert-or-update, by design it will never throw an error when it finds duplicates on the given key columns. This behavior is probably enforced by PostgreSQL itself. So a developer using a query like:

INSERT INTO $table ($col1, $col2, ...)
VALUES (...), (...), (...), ...
ON CONFLICT ($col1, $col2, ...)
DO UPDATE SET ....

Must be aware that this will never return an error for the first-level "collision" of ($col1, $col2, ...).

So my line of thought was the following. When I use "upserts" I want one of these:

1. Insert a new record if none exists, or update the existing record matching the conflict columns I mention in the query
2. Insert a new record if none exists, and ignore the record if one already exists matching the conflict columns I mention in the query
3. (maybe) Insert a new record if none exists, and receive an error if it conflicts

So, to account for scenario 1 we can use the form:

// Generates a DO UPDATE SET
onDuplicateKey(Employees.id) {
      set(it.salary, it.salary + 900)
}

For scenario 2 we can use the form:

// Generates a DO NOTHING
onDuplicateKey(Employees.id) {
}

Scenario 3 cannot be done in this type of query, as far as I understand, by design of PostgreSQL. There is no way to throw an error when a record already exists matching the conflict columns mentioned in the query.
However, the DBMS will still throw an error if:

- it finds a collision on any constraint other than the conflict columns mentioned in the query
- one or more SETs in the DO UPDATE operation create a new collision

This is all native PostgreSQL behavior, given that only two actions are available in this type of query (NOTHING and UPDATE) according to its documentation.

So, for scenario 3, the user should probably use the default batchInsert which, by not performing any update/upsert, will return an error on collisions.
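To make scenarios 1 and 2 concrete, here is a minimal sketch of how the tail of such a statement could be rendered. `onConflictClause` is a hypothetical helper written for illustration, not Ktorm's actual SQL formatter, and the `t_employee.salary + 900` expression is assumed from the example above.

```kotlin
// Hypothetical helper (not part of Ktorm): renders the ON CONFLICT tail of an upsert.
// An empty assignment map models the empty onDuplicateKey { } closure.
fun onConflictClause(conflictColumns: List<String>, assignments: Map<String, String>): String {
    require(conflictColumns.isNotEmpty()) { "at least one conflict column is required" }
    val target = conflictColumns.joinToString(", ", prefix = "(", postfix = ")")
    val action = if (assignments.isEmpty()) {
        // Scenario 2: skip rows that collide on the conflict columns.
        "DO NOTHING"
    } else {
        // Scenario 1: update the existing row instead of inserting.
        assignments.entries.joinToString(", ", prefix = "DO UPDATE SET ") { (column, expr) ->
            "$column = $expr"
        }
    }
    return "ON CONFLICT $target $action"
}

fun main() {
    println(onConflictClause(listOf("id"), mapOf("salary" to "t_employee.salary + 900")))
    println(onConflictClause(listOf("id"), emptyMap()))
}
```

Scenario 3 has no counterpart here: it corresponds to emitting no ON CONFLICT clause at all, so the database reports the collision itself.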

Does this make sense to you?

PedroD · 2020-12-29T18:06:01Z

PS. I made some commits renaming these methods to make it clearer what they do and which operation they represent (insert or update, i.e. upsert).

Also, what is the difference between batchInsert and bulkInsert in the case of MySQL? Why not simply override the default batchInsert?
Should I name the method I just implemented in this PR bulkInsertOrUpdate or batchInsertOrUpdate?

vincentlauvlwj · 2020-12-31T15:48:48Z

Thank you for the explanation, I fully understand what you mean. But would it be possible to generate an INSERT statement without an ON CONFLICT clause when onDuplicateKey is not called? For example:

database.bulkInsert(Employees) {
    item {
        set(it.name, "jerry")
        set(it.job, "trainee")
        set(it.managerId, 1)
        set(it.hireDate, LocalDate.now())
        set(it.salary, 50)
        set(it.departmentId, 1)
    }
    item {
        set(it.name, "linda")
        set(it.job, "assistant")
        set(it.managerId, 3)
        set(it.hireDate, LocalDate.now())
        set(it.salary, 100)
        set(it.departmentId, 2)
    }
}

The generated SQL would then be:

INSERT INTO t_employee (name, job, manager_id, hire_date, salary, department_id) 
VALUES (?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?)

This is exactly the scenario 3 you mentioned: it inserts new records if they do not exist, and throws an error on collisions.

vincentlauvlwj · 2020-12-31T15:50:50Z

If it is possible to make onDuplicateKey optional, then we don't need to rename the function to bulkInsertOrUpdate. bulkInsert is good.

PedroD · 2021-01-01T14:12:24Z

> If it is possible to make onDuplicateKey optional, then we don't need to rename the function to bulkInsertOrUpdate. bulkInsert is good.

OK, I get the idea. I didn't go that way because it didn't seem to be how the code was originally designed, which means I will need to change this behavior to allow an empty "conflictTarget" (removing that ifEmpty part).

Is this behavior also intended for insertOrUpdate()? I am going to assume "yes", so please let me know if that is wrong.

I applied these changes, so that both insertOrUpdate and bulkInsert now have their "update" part as optional, so that we can cover scenario 3 as well with both methods.

I also added a TODO note on the tests, because I think some tests have overlapping ids, so maybe we should make sure each test generates unique (thread-safe) entry ids? Let me know what you think.
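One way to get unique, thread-safe ids in tests is a shared atomic counter. The sketch below is only an illustration: `TestIds` and its starting value are hypothetical and do not exist in the Ktorm test suite.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical test helper: hands out ids that never repeat, even across threads.
object TestIds {
    // Assumed starting point, chosen to sit above any ids used by fixture data.
    private val counter = AtomicInteger(1000)

    // incrementAndGet is atomic, so two concurrent callers never receive the same id.
    fun next(): Int = counter.incrementAndGet()
}

fun main() {
    val first = TestIds.next()
    val second = TestIds.next()
    println(first != second)
}
```

Each test would then call `TestIds.next()` instead of hard-coding ids such as 1 or 5.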

I am still wondering whether bulkInsert is the best name for this method, or whether it should be batchInsertOrUpdate, so that it is consistent with the pair formed by the (default) insert method and the (pgsql) insertOrUpdate. As a user, if my IDE shows me batchInsert and bulkInsert, I am left wondering what difference "bulk" and "batch" are meant to reflect, and only after reading the docs do I see that batchInsert has no update part while bulkInsert does. IMO "bulk" does not convey this difference between the two methods.

vincentlauvlwj · 2021-01-02T07:22:25Z

Merged. I will release v3.3.0 in the next few days.

vincentlauvlwj · 2021-01-02T14:31:30Z

batchInsert is implemented on top of JDBC's addBatch and executeBatch and generates multiple SQL statements, while bulkInsert generates only one SQL statement. That's the difference.
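The distinction can be sketched roughly as follows. Both functions are hypothetical illustrations rather than Ktorm's implementation; only the JDBC calls (`prepareStatement`, `setObject`, `addBatch`, `executeBatch`) are real API.

```kotlin
import java.sql.Connection

// Bulk style: one statement carrying every row, e.g. VALUES (?, ?), (?, ?).
fun bulkInsertSql(table: String, columns: List<String>, rowCount: Int): String {
    require(columns.isNotEmpty() && rowCount > 0) { "need at least one column and one row" }
    val columnList = columns.joinToString(", ")
    val placeholders = columns.joinToString(", ", prefix = "(", postfix = ")") { "?" }
    val rows = List(rowCount) { placeholders }.joinToString(", ")
    return "INSERT INTO $table ($columnList) VALUES $rows"
}

// Batch style: a single-row statement queued once per row and sent with executeBatch.
fun batchInsert(conn: Connection, table: String, columns: List<String>, rows: List<List<Any?>>): IntArray {
    val sql = bulkInsertSql(table, columns, rowCount = 1)
    return conn.prepareStatement(sql).use { stmt ->
        for (row in rows) {
            // JDBC parameter indexes start at 1.
            row.forEachIndexed { index, value -> stmt.setObject(index + 1, value) }
            stmt.addBatch()
        }
        stmt.executeBatch()
    }
}

fun main() {
    println(bulkInsertSql("t_employee", listOf("name", "job"), rowCount = 2))
}
```

Only the bulk form can carry a single ON CONFLICT clause that applies to all rows at once.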

vincentlauvlwj · 2021-01-02T14:39:06Z

You are right, the name bulkInsert is not good enough: a function with this name should not have the update part.

It is probably better to have a separate function, bulkInsertOrUpdate. I will do a refactoring based on your code.

PedroD · 2021-01-04T23:22:44Z

@vincentlauvlwj awesome!

Keep up the great work! 💪

FeatureSpitter added 3 commits December 28, 2020 20:32

Adding bulk insert or update to postgresql driver and test

e691ff8

Testing the on-conflict do-update, and do-nothing

5955b46

Updating doc

9f08d8f

PedroD mentioned this pull request Dec 28, 2020

How can I do batch upserts with Ktorm? #220

Closed

FeatureSpitter added 2 commits December 28, 2020 21:53

Removing unused imports and formating code.

0feb836

Minor doc improvement.

999963b

FeatureSpitter added 4 commits December 29, 2020 17:33

Renaming the method to be more explicit on the "update" part of the i…

d42f349

…nsert-or-update query

Adding Github and contacts to build.gradle

16c6c62

Renaming dialect method to make the operation of update more explicit.

2ab1a9e

Fixing test

c0ea0e1

Fixing max line lengths

e4ce348

FeatureSpitter added 2 commits January 1, 2021 14:38

Allowing bulkInserts to have their "update" operation optional. Renam…

aa01ec5

…ing the methods and updating the tests accordingly.

Updating version

f1154c7

vincentlauvlwj changed the base branch from master to v3.3.x January 2, 2021 06:25

vincentlauvlwj merged commit 72c24ed into kotlin-orm:v3.3.x Jan 2, 2021

FeatureSpitter mentioned this pull request Feb 16, 2021

[BUG] Ktorm no longer supports Postgresql DO NOTHING #248

Closed
