Clarify and explicitly order enroll host transaction #2318
Previously this used an implicit transaction with `INSERT ... ON DUPLICATE KEY UPDATE`. Now we make the transaction explicit with the hopes of resolving some deadlock issues users encountered at very high scale.
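A minimal sketch of what an explicitly ordered enroll transaction could look like, assuming an update-then-insert ordering. This is not the PR's actual code: the `hosts` table, its column names, and the helpers `execFunc`, `txExec`, `enrollHostSteps`, and `enrollHost` are all hypothetical and chosen only for illustration.

```go
package main

import (
	"context"
	"database/sql"
)

// Hypothetical table and column names, for illustration only.
const (
	updateHostQuery = `UPDATE hosts SET node_key = ?, enroll_secret_name = ? WHERE osquery_host_id = ?`
	insertHostQuery = `INSERT INTO hosts (osquery_host_id, node_key, enroll_secret_name) VALUES (?, ?, ?)`
)

// execFunc runs one statement and reports how many rows it affected.
type execFunc func(query string, args ...interface{}) (int64, error)

// txExec adapts a *sql.Tx to execFunc.
func txExec(ctx context.Context, tx *sql.Tx) execFunc {
	return func(query string, args ...interface{}) (int64, error) {
		res, err := tx.ExecContext(ctx, query, args...)
		if err != nil {
			return 0, err
		}
		return res.RowsAffected()
	}
}

// enrollHostSteps issues the statements in a fixed order: UPDATE first,
// then INSERT only if the UPDATE affected no rows.
// Caveat: by default MySQL reports 0 affected rows when the UPDATE matches
// a row but changes nothing, so this sketch assumes the new values differ
// or that the connection is configured to report matched rows.
func enrollHostSteps(exec execFunc, uuid, nodeKey, secretName string) error {
	n, err := exec(updateHostQuery, nodeKey, secretName, uuid)
	if err != nil {
		return err
	}
	if n > 0 {
		return nil // existing host re-enrolled
	}
	_, err = exec(insertHostQuery, uuid, nodeKey, secretName)
	return err
}

// enrollHost wraps the ordered steps in an explicit transaction.
func enrollHost(ctx context.Context, db *sql.DB, uuid, nodeKey, secretName string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	if err := enrollHostSteps(txExec(ctx, tx), uuid, nodeKey, secretName); err != nil {
		_ = tx.Rollback() // best effort; the original error is returned
		return err
	}
	return tx.Commit()
}
```

Splitting the statement order out of the transaction plumbing keeps the ordering testable without a database, at the cost of one extra round trip compared with a single upsert.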
If you want to try this, I think it's fine. But I think it will make deadlocks (and performance) worse: it replaces a single upsert with 2 or 3 database calls.
I'm less sure about MySQL, but in Postgres this is certainly strictly worse for locking.
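For comparison, the single-upsert form mentioned above might look roughly like the sketch below. The `hosts` table, its columns, and the helpers `upsertHostStatement` and `upsertHost` are hypothetical; the point is only that the whole enroll is one statement and one database call.

```go
package main

import (
	"context"
	"database/sql"
)

// Hypothetical table and column names, for illustration only.
const upsertHostQuery = `
INSERT INTO hosts (osquery_host_id, node_key, enroll_secret_name)
VALUES (?, ?, ?)
ON DUPLICATE KEY UPDATE
	node_key = VALUES(node_key),
	enroll_secret_name = VALUES(enroll_secret_name)`

// upsertHostStatement returns the single statement and its arguments.
func upsertHostStatement(uuid, nodeKey, secretName string) (string, []interface{}) {
	return upsertHostQuery, []interface{}{uuid, nodeKey, secretName}
}

// upsertHost enrolls a host with one call; MySQL runs the statement in an
// implicit transaction when autocommit is enabled.
func upsertHost(ctx context.Context, db *sql.DB, uuid, nodeKey, secretName string) error {
	query, args := upsertHostStatement(uuid, nodeKey, secretName)
	_, err := db.ExecContext(ctx, query, args...)
	return err
}
```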
Looking at the MySQL docs, I see these descriptions of the locks taken:
UPDATE ... WHERE ... sets an exclusive next-key lock on every record the search encounters. However, only an index record lock is required for statements that lock rows using a unique index to search for a unique row.
INSERT sets an exclusive lock on the inserted row. This lock is an index-record lock, not a next-key lock (that is, there is no gap lock) and does not prevent other sessions from inserting into the gap before the inserted row.
REPLACE is done like an INSERT if there is no collision on a unique key. Otherwise, an exclusive next-key lock is placed on the row to be replaced.
INSERT ... ON DUPLICATE KEY UPDATE differs from a simple INSERT in that an exclusive lock rather than a shared lock is placed on the row to be updated when a duplicate-key error occurs. An exclusive index-record lock is taken for a duplicate primary key value. An exclusive next-key lock is taken for a duplicate unique key value.
From that, the locking around `ON DUPLICATE` sounds heavier. I also wonder about `REPLACE`, though some discussion suggests it is a `delete; insert` under the hood, which would be a loss.
Also happy to chat on Slack. Depending on what's happening, there may be other patterns.
assert.Equal(t, enrollSecretName, h.EnrollSecretName)

h, err = ds.EnrollHost(tt.uuid, tt.nodeKey+"new", enrollSecretName+"new")
require.Nil(t, err)
Suggested change: replace `require.Nil(t, err)` with `require.NoError(t, err)`.
(nit)
This may help MySQL perform the locking in the expected order to resolve deadlocks.
Let me explain the reasoning here. This is an attempt to solve the deadlock discussed in https://osquery.slack.com/archives/C1XCLA5DZ/p1601480664063800.
My hope is that by explicitly ordering the transaction and running the update or insert, we can get MySQL to acquire the locks in an order that avoids potential deadlocks. I have just updated the code to use the [...]
I agree with your analysis that this is likely less performant, but the motivation is to reduce the deadlocks, on the suspicion that they may be causing other errors.
All of that said, I think I will move this PR to draft, keep debugging the root cause through the additional logging added in #2313, and see if #2321 possibly helps.
I am not planning to merge this due to the enrollment issues we identified with the logging in #2313.
Previously this used an implicit transaction with
`INSERT ... ON DUPLICATE KEY UPDATE`. Now we make the transaction explicit
with the hopes of resolving some deadlock issues users encountered at very
high scale.
These issues are possibly root-caused by misconfiguration of osquery
resulting in copied host identifiers, making this change less relevant. I
still believe it is worth being more explicit about this transaction.