Fix mysql deadlock #11320

thingstad · 2021-04-08T12:04:22Z

hashicorp-cla · 2021-04-08T12:04:25Z

All committers have signed the CLA.

hsimon-hashicorp · 2022-02-09T20:31:21Z

Hi @thingstad! Can you add a small changelog entry for this? We'll get it reviewed (ping @imthaghost). Thanks!

If the node is the current leader, the HA leader lock is monitored. If the connection is lost during the sleep period, the monitor loop never exits. This happens because hasLock() does not give a binary answer to the question "do we still have the lock", and because QueryRow().Scan() does not return an error if the connection is lost. Nor does QueryRow().Scan() try to re-establish the lost connection.

ncabatoff · 2022-08-25T13:29:23Z

Hi @thingstad,

Do you suppose you could write a test to demonstrate the problem and the fix? Typically our physical backend implementations have integration tests. Sadly, the mysql physical tests rely on the user providing a MYSQL_ADDR that points to an existing DB they manage, and the tests are skipped otherwise. What they should do instead is use https://github.com/hashicorp/vault/blob/main/helper/testhelpers/mysql/mysqlhelper.go#L21-L21 to create a docker container for mysql. This will allow the tests to run in CI, which would already be a great improvement. It looks like this was already done for TestMySQLHABackend_LockFailPanic, but we failed to switch the other tests over to PrepareTestContainer at the time.

You could make that change as part of this PR, which would be nice but not required. What I would really like to see though is a test that shows that your change solves the problem, i.e. something that reproduces the steps from #11319 using docker containers. I realize that the bug may not manifest every time the db is restarted, but even if it takes many invocations of the tests to see the failure, that's still something. If it takes 20 attempts on average to make the failure happen before your change, and then we run the test 100 times with your change without seeing a failure, that's good enough evidence that it fixes the problem.
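The "20 attempts, then 100 clean runs" reasoning above can be checked with a short calculation. This is only a sketch under two assumptions that are not in the comment itself: that "20 attempts on average" means each run fails independently with probability 1/20, and that runs with the change applied are independent of each other. The function name `cleanRunProbability` is hypothetical.

```go
package main

import "math"

// cleanRunProbability returns the chance of seeing `runs` consecutive
// passing runs if the bug were still present, assuming each run fails
// independently with probability 1/meanAttempts.
func cleanRunProbability(meanAttempts float64, runs int) float64 {
	p := 1 / meanAttempts // per-run failure probability
	return math.Pow(1-p, float64(runs))
}
```

Under these assumptions, `cleanRunProbability(20, 100)` is 0.95 raised to the 100th power, which is roughly 0.006. In other words, 100 clean runs would be very unlikely if the bug were still there.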

hsimon-hashicorp · 2023-11-04T01:06:20Z

Due to the age of this PR, I will go ahead and close it at this point. Please feel free to re-base and continue if you're willing to do so. Thanks!

vercel bot temporarily deployed to Preview – vault April 8, 2021 12:04 Inactive

vercel bot temporarily deployed to Preview – vault-storybook April 8, 2021 12:04 Inactive

GrahamDahlsveen mentioned this pull request Jan 4, 2022

HA MySQL/MariaDB: Deadlock on leader, if HA DB connection is lost #11319

Open

thingstad force-pushed the fix-mysql-deadlock branch from 199af44 to 144e50d Compare February 3, 2022 08:02

thingstad requested review from acahn and taoism4504 as code owners February 3, 2022 08:02

vercel bot temporarily deployed to Preview – vault-storybook February 3, 2022 08:02 Inactive

thingstad added 2 commits August 25, 2022 13:43

Chore: Added changelog file with issue reference

7be2a47

thingstad force-pushed the fix-mysql-deadlock branch from 144e50d to 7be2a47 Compare August 25, 2022 11:44

Chore: Added changelog text

ab98043

VioletHynes added the waiting-for-response label Aug 10, 2023

hsimon-hashicorp closed this Nov 4, 2023

github-actions bot removed the waiting-for-response label Nov 4, 2023
