Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique lock key based on database and table name #8499

Closed
wants to merge 4 commits into from

Conversation

pandeybk
Copy link

@pandeybk pandeybk commented Mar 6, 2020

Signed-off-by: Balkrishna Pandey sachit.nep@gmail.com

@briankassouf
Copy link
Member

briankassouf commented Mar 7, 2020

@pandeybk Thanks for adding this, it's a good idea. However I'm a little concerned that this will cause a single cluster to have multiple active nodes if using our recommended upgrading steps: https://www.vaultproject.io/docs/upgrading/#ha-installations

Supposedly if the standby nodes are restarted first they will use the new lock location and one of them will become active. This split brains the cluster since the remaining node is active and using the other lock location.

Not sure the best path forward here, so please let me know if you have any thoughts.

@pandeybk
Copy link
Author

pandeybk commented Mar 9, 2020

Adding #8458 as reference. I was talking to @pkoritala in private to troubleshoot this issue.

Thank you @briankassouf for review, I am a little bit confused with lock implementation. In this use case, 2 separate installations of the vault HA cluster (2 separate clusters) are trying to use the same RDS endpoint but different database name and table name.

If you look at the following line,
https://github.com/hashicorp/vault/blob/master/physical/mysql/mysql.go#L627.

Here i.key variable looks like a constant resolving to the same variable. If you look at MySQL implementation function SELECT GET_LOCK(), this function seems global for the same RDS endpoint. let’s say cluster1 is acquiring this lock and not releasing it for a long period (based on math.MaxInt32 variable). In this case, Cluster2 is not able to use the same lock key and Vault is failing to insert a leader record in the lock table. I keep seeing the following error message for the second cluster for every vault operation.

Error checking leader status: Error making API request.

URL: GET https://127.0.0.1:8200/v1/sys/leader
Code: 500. Errors:

* sql: no rows in result set

So I was hoping if I change lock key to use, database + table name, which is also unique and result in the same value per vault installation, will resolve this problem.

Looks like based on your comment I am missing some implementation details here. My knowledge of Vault, GoLang, and MySQL itself is very limited. I will go through the HA documentation in detail.

If you have any other pointer and suggestions, will be super helpful.

@pandeybk
Copy link
Author

pandeybk commented Mar 9, 2020

@briankassouf, I went through HA documentation, it looks like PR is valid in this case, let me know my explanation above making sense.

@pkoritala
Copy link

@pandeybk @briankassouf , as per bug #8458, using this PR, I'm able to deploy vault on two clusters with same dbendpoint. Could someone review it?

@jzbruno
Copy link

jzbruno commented May 6, 2020

It's been a couple months. Any update on getting this merged?

@jzbruno
Copy link

jzbruno commented Aug 13, 2020

Following up with a little more info. This is in alignment with the recommended approach from the MySQL docs

"But be aware that it also enables a client that is not among the set of cooperating clients to lock a name, either inadvertently or deliberately, and thus prevent any of the cooperating clients from locking that name. One way to reduce the likelihood of this is to use lock names that are database-specific or application-specific. For example, use lock names of the form db_name.str or app_name.str. "

https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html

Also, one of my colleagues brought up another concern. In MySQL > 5.7, there is a length limitation on the lock name of 64 characters. Since the database name and table name are variable, it might be best to use an md5 or sha1 checksum instead to avoid future issues.

@pandeybk
Copy link
Author

pandeybk commented Aug 13, 2020

thank you @jzbruno, didn't realize characters limits on lock key, yes we can definitely generate sha1 or md5sum based on a combination of the database name + table name. It will guarantee we will always have a unique lock-key for given rds instance. I can adjust that in this PR.

But I am not seeing any activity going on from HashiCorp team, considering its a 4 months old PR. Looks like no one else is reporting this problem. Not sure spending time in this PR makes sense.

@jzbruno
Copy link

jzbruno commented Aug 13, 2020

Hi @pandeybk. We are talking with Hashicorp about getting this reviewed and merged.

@jzbruno
Copy link

jzbruno commented Aug 22, 2020

@pandeybk Would you mind if CJ contributed a few things to your branch before we get this re-reviewed?

@pandeybk
Copy link
Author

Sounds good @jzbruno, go for it.

@cjbaar
Copy link

cjbaar commented Aug 24, 2020

@pandeybk @jzbruno Not sure if this is this the best approach, but I have a new PR open from my branch:
#9807

@banks
Copy link
Member

banks commented Jul 14, 2023

Closing in favour of #9807 which we still have some things to figure out on. Thanks @pandeybk and everyone else for contributing here!

@banks banks closed this Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants