Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TryInsert functions within the packages module use INSERT ... ON CONFLICT #21063

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

zeripath
Copy link
Contributor

@zeripath zeripath commented Sep 4, 2022

The TryInsert* functions within the packages models make incorrect assumptions
about transactional isolation within most databases. It is perfectly possible for
a SELECT to return nothing but an INSERT fail with a duplicate in most DBs as it
is only INSERT that the locking occurs.

This PR introduces a common InsertOnConflictDoNothing function which will attempt
to INSERT provided bean but will return 0 as the number of rows affected if there is a
conflict.

Fix #19586

Signed-off-by: Andrew Thornton art27@cantab.net

The TryInsert* functions within the packages models make incorrect assumptions
about transactional isolation within most databases. It is perfectly possible for
a SELECT to return nothing but an INSERT fail with a duplicate in most DBs as it
is only INSERT that the locking occurs.

This PR changes the code to simply try to insert first and if there is an error
then attempt to SELECT from the table. If the SELECT works then the INSERT error
is assumed to have been related to the unique constraint failure.

This technique avoids us having to parse the error returned from the DBs as
these are varied and different.

If the SELECT fails then the INSERT error is returned to the user.

Fix go-gitea#19586

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

zeripath commented Sep 5, 2022

ugh looks like my worries about transactions were right - we're gonna need to use an upsert style...

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Sep 5, 2022
lunny
lunny previously approved these changes Sep 5, 2022
@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Sep 5, 2022
@zeripath zeripath added the pr/wip This PR is not ready for review label Sep 5, 2022
@zeripath zeripath changed the title Make TryInsert functions within the packages module try to insert first WIP: Make TryInsert functions within the packages module try to insert first Sep 5, 2022
@zeripath
Copy link
Contributor Author

zeripath commented Sep 5, 2022

I'm going to have to mark this WIP as we actually have to use the UPSERT forms otherwise the transactions will fail.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath force-pushed the fix-19586-insert-first-then-try-to-get branch from 0663049 to b29ab42 Compare September 6, 2022 20:25
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath removed the pr/wip This PR is not ready for review label Sep 7, 2022
@zeripath zeripath changed the title WIP: Make TryInsert functions within the packages module try to insert first Make TryInsert functions within the packages module try to insert first Sep 7, 2022
Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath changed the title Make TryInsert functions within the packages module try to insert first WIP: Make TryInsert functions within the packages module try to insert first Sep 7, 2022
@zeripath zeripath added the pr/wip This PR is not ready for review label Sep 7, 2022
return nil, err
}
if n != 0 {
return p, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The insert methods need to return the filled bean. For example it's expected that ID has a value in the return value which is not the case here.

Copy link
Contributor Author

@zeripath zeripath Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code auto fills the ID of the bean in the same way that xorm's Insert() does.

@lunny lunny modified the milestones: 1.18.0, 1.19.0 Oct 17, 2022
@lunny lunny modified the milestones: 1.19.0, 1.20.0 Jan 31, 2023
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
models/db/common.go Outdated Show resolved Hide resolved
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Feb 8, 2023

If I understand correctly, the test cases haven't cover the behavior of comment:

// This function will update the ID of the provided bean if there is an insertion

That's the only test code for asserting m.ID, but it only tests a succeeded insertion

Update: out-dated

		_, _ = e.Exec("DELETE FROM one_unique")

		has, err := db.InsertOnConflictDoNothing(ctx, &OneUnique{})
		assert.Error(t, err)
		assert.False(t, has)

		toInsert := &OneUnique{Data: "test"}

		has, err = db.InsertOnConflictDoNothing(ctx, toInsert)
		assert.NoError(t, err)
		assert.True(t, has)
		assert.NotEqual(t, 0, toInsert.ID)

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Feb 8, 2023

If I understand correctly, there are still some problems in code:

  1. the has is misleading.

    • If insertion succeeds, it returns true, nil, if insertion fails, it returns false, nil
    • So the meaning of bool in (bool, error) is: whether the insertion succeeds, not about whether the record existed before.
  2. assert.NotEqual(t, int(0), int64(0)) always succeeds, But the ID is a int64, so you must use NotEqualValues or NotEmpty, otherwise the test code like assert.NotEqual(t, 0, toInsert.ID) always succeeds.

  3. There is only one ON CONFLICT DO NOTHING for SQLite3, I do not understand why it would return the ID for the duplicated insertion.

  4. The MySQL is highly likely needed to use update id=last_insert_id(id), but not update id=id. The second one doesn't seem the proper way to update the LAST_INSERT_ID for MySQL.

@lunny lunny dismissed their stale review February 8, 2023 06:53

Code changed so that I need to review again

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Feb 8, 2023
@lunny
Copy link
Member

lunny commented Feb 8, 2023

Looks like things become too complex to difficult to review.

@zeripath
Copy link
Contributor Author

zeripath commented Feb 8, 2023

If I understand correctly, the test cases haven't cover the behavior of comment:

It feels a bit like you're being deliberately obstructive and I don't quite understand why you're being this way.

This might be a language issue - but there are better ways to say I think you should add a line that checks if the ID remains at 0.

To which I might reply why would we care and why are you being so bloody difficult about it? That the ID remains unchanged is irrelevant. Fine I'll add it but honestly it's just making the test less clear and less easy to read for zero real world benefit.

It feels like you're targeting me for some reason and I don't understand why. You left maintainers - was it because of me?

If I understand correctly, there are still some problems in code:

  1. the has is misleading.

    • If insertion succeeds, it returns true, nil, if insertion fails, it returns false, nil
    • So the meaning of bool in (bool, error) is: whether the insertion succeeds, not about whether the record existed before.

I was using has as in «has been inserted» but I agree inserted would be better. Frankly its minor and you should just say use inserted instead of has instead of making some major issue about it.

  1. assert.NotEqual(t, int(0), int64(0)) always succeeds, But the ID is a int64, so you must use NotEqualValues or NotEmpty, otherwise the test code like assert.NotEqual(t, 0, toInsert.ID) always succeeds.

Again just make a suggestion. It's a good catch that I forgot about it but your tone is frankly insulting.

  1. There is only one ON CONFLICT DO NOTHING for SQLite3, I do not understand why it would return the ID for the duplicated insertion.

I don't understand your question?

There are two cases:

  1. INSERT proceeds because there is no conflict. The SQLite driver will return the inserted id because SQLite always does not.
  2. INSERT fails because there is a conflict.

In the second case we do not get an ID. Nor do we attempt to use it because 0 rows are affected.

  1. The MySQL is highly likely needed to use update id=last_insert_id(id), but not update id=id. The second one doesn't seem the proper way to update the LAST_INSERT_ID for MySQL.

This code is not attempting to update the id in fact it is deliberately not attempting to cause an update to the id - without the ON DUPLICATE KEY UPDATE id = id there is no ON CONFLICT DO NOTHING.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@wxiaoguang
Copy link
Contributor

wxiaoguang commented Feb 8, 2023

It feels a bit like you're being deliberately obstructive and I don't quite understand why you're being this way.

I am not obstructive, but really want to help, I have tested a lot on my side.

It depends on the ID-behavior mentioned in #21063 (comment)

I have tested SQLite3 and MySQL. I do not think your code could work for some cases, and the tests were incorrect.

For SQLite, insert ... on conflict update set id=id returning id (do nothing doesn't work and won't return the id)

For MySQL, insert ... on duplicate key update id=last_insert_id(id)

And the test has to cover: insert(keyA), insert(keyB), insert(keyA) and assert getting A-id

That's all.

It feels like you're targeting me for some reason and I don't understand why. You left maintainers - was it because of me?

No, when reviewing, I only care about the code and solution are right or not. Maybe you are also too strict for some problems, so there are more likely divergences. There are also some strong objections when my PRs get reviewed.

but your tone is frankly insulting.

Sorry I didn't mean that. But just want make the problem clear. Could you suggest about how to express the problem more gently. ps: I have made some suggestions, but I didn't find that you had got the point, and I also got "Why don't you try it", maybe I also misunderstood something, but the problems are indeed there.

We can have a private chat with the SQL problems later.

This code is not attempting to update the id in fact it is deliberately not attempting to cause an update to the id - without the ON DUPLICATE KEY UPDATE id = id there is no ON CONFLICT DO NOTHING.

There is still no comment about why INSERT IGNORE doesn't work for MySQL in such case. (there was just a joking for the first question #21063 (comment) , I am not good enough to guess).


And, maybe I really misunderstood something -- about the design or about the logic (no comment nor test at the beginning), I do not know. Maybe the divergence starts here #21063 (comment): "The code auto fills the ID of the bean in the same way that xorm's Insert() does.": The bean should has the ID filled when there is no error.

So feel free to dismiss all my comments. Thank you for your work!

@zeripath
Copy link
Contributor Author

zeripath commented Feb 9, 2023

We don't use or get the ID when the INSERT fails because there is a conflict because OnConflictDoNothing.

I have tested SQLite3 and MySQL. I do not think your code could work for some cases, and the tests were incorrect.

For SQLite, insert ... on conflict update set id=id returning id (do nothing doesn't work and won't return the id)

Clearly I need to separate out the code paths here.

There is no RETURNING in the SQLite code. The RETURNING is only used in Postgres.

SQLite will always return the last inserted ID as part of its driver so we don't need to do that.

If there is a conflict we don't use it - we do nothing.

For MySQL, insert ... on duplicate key update id=last_insert_id(id)

No. As far as I can see that would cause every conflict to cause an update of the primary key (or fail.)

The code as written: ON DUPLICATE KEY UPDATE id = id makes the UPDATE a no-op so that no row is changed i.e. NOTHING happens when there is a conflict.

Again the code is not attempting to return the ID when there is a conflict.

And the test has to cover: insert(keyA), insert(keyB), insert(keyA) and assert getting A-id

See above, the code does not promise to get the ID when there is a conflict. It promises to do nothing.

There is still no comment about why INSERT IGNORE doesn't work for MySQL in such case. (there was just a joking for the first question #21063 (comment) , I am not good enough to guess).

Using INSERT IGNORE didn't work in my testing when there was a primary key with autoincrement - and if I didn't instead use the ON DUPLICATE KEY UPDATE no-op it would error out. This may have been warnings being returned rather than errors.

It seems however now that INSERT IGNORE will just work by itself but I do wonder if the warnings that IGNORE could cause failures or complaints by users.

And, maybe I really misunderstood something -- about the design or about the logic (no comment nor test at the beginning), I do not know. Maybe the divergence starts here #21063 (comment): "The code auto fills the ID of the bean in the same way that xorm's Insert() does.": The bean should has the ID filled when there is no error.

Ah. Perhaps that is the problem.

I'm not promising to get the ID when there is no insert.

The function has comment on it that says it will update the ID of the provided bean if it is inserted. It doesn't say it will do that when it's not inserted - and in fact it will NOT do that - instead on conflict it will do nothing as the function name suggests.

// InsertOnConflictDoNothing will attempt to insert the provided bean but if there is a conflict it will not error out
// This function will update the ID of the provided bean if there is an insertion
// This does not do all of the conversions that xorm would do automatically but it does quite a number of them
// once xorm has a working InsertOnConflictDoNothing this function could be removed.

There were always tests - just not specific ones. The package repositories code has good coverage of this function. As requested though I've added some specific tests and covered and fixed some more edge cases which should make the function usable in other contexts too.

@wxiaoguang
Copy link
Contributor

Using INSERT IGNORE didn't work in my testing when there was a primary key with autoincrement - and if I didn't instead use the ON DUPLICATE KEY UPDATE no-op it would error out. This may have been warnings being returned rather than errors.

It seems however now that INSERT IGNORE will just work by itself but I do wonder if the warnings that IGNORE could cause failures or complaints by users.

From my knowledge, I always use INSERT IGNORE for all cases and haven't got any problem. That's why the question first came .... because in my mind:

  • INSERT IGNORE should be fine for all cases, I never thought any user would be surprised or affected by the warning. If XORM is affected by the warning, that's another story. (if there was a comment for it, I could also get the point as early as possible).
  • Then after being told that the IGNORE doesn't work for autoincrement, I thought that since the insert on dup update id=id trick is used, it must be done by purpose to get the conflicted ID according to another comment, then the discussion happens.

Thank you for letting me know more details.

Anyway, it's clear for me now. I think this PR is good to have, maybe it could help to remove the workaround mutex for the package code after it gets merged:

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

zeripath commented Feb 9, 2023

Ah yes that mutex can be removed too.

Apologies for the confusion earlier - it just felt we were going round and round in circles for no good reason and I couldn't understand why.

@lunny lunny removed the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label Feb 12, 2023
Signed-off-by: Andrew Thornton <art27@cantab.net>
@lunny
Copy link
Member

lunny commented Mar 12, 2023

Thanks for your hard-working @zeripath . It really should be a part of XORM. :)

@wxiaoguang
Copy link
Contributor

wxiaoguang commented Mar 12, 2023

How about taking this PR now, and move to XORM in the future?

The "workaround mutex" really seems fragile, if there is a chance, we could use a DB insert-ignore instead of that in-process mutex for a stable release.

Signed-off-by: Andrew Thornton <art27@cantab.net>
func TestInsertOnConflictDoNothing(t *testing.T) {
defer tests.PrepareTestEnv(t)()

ctx := db.DefaultContext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a parallel test?

@delvh delvh removed this from the 1.20.0 milestone Jun 4, 2023
@lunny
Copy link
Member

lunny commented May 7, 2024

I think the key problem is the package property table have no unique keys. per #30335 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Container registry - layer drops
8 participants