Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pkg/stash): close etcd sessions #1887

Merged
merged 1 commit into from
Sep 18, 2023
Merged

fix(pkg/stash): close etcd sessions #1887

merged 1 commit into from
Sep 18, 2023

Conversation

uhthomas
Copy link
Contributor

What is the problem I am trying to address?

Athens creates a new session for each 'Stash' request and then never closes it. The sessions (etcd leases) are consequently held forever and eventually leads to resource exhaustion on the etcd cluster.

How is the fix applied?

This has been fixed by closing the acquired session at the end of the request.

What GitHub issue(s) does this PR fix or close?

Fixes: #1886

@uhthomas uhthomas requested a review from a team as a code owner September 11, 2023 22:08
@uhthomas
Copy link
Contributor Author

I've also excluded (*go.etcd.io/etcd/client/v3/concurrency.Session).Close from errcheck as there's not really any action which can be taken if a session fails to close - the session and lease should automatically expire.

@uhthomas
Copy link
Contributor Author

uhthomas commented Sep 11, 2023

Question for reviewers: should we also remove the error check from (*go.etcd.io/etcd/clientv3/concurrency.Mutex).Unlock? There's not really anything which can be done about it. It may be a bit unfortunate to then return an error just because the mutex didn't unlock, despite successfully stashing the module. Especially because locks will expire when sessions are closed.

@uhthomas
Copy link
Contributor Author

I've made the changes inline with my previous comment, I hope this is okay 😄

}
defer func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer keeping it the way it was.
https://go.dev/play/p/cq0DGy9OJES

Knowing if unlock / session close fails is a useful thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I made this change is because it seems a bit strange to fail the request because the mutex failed to unlock. There's not really any action to be taken, and everything else has already succeeded. The best thing we can do here is log, not fail the entire request - as it doesn't matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; I think we should definitely log. Can we have some tests if possible please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manugupt1 I've replaced the etcd tests. They should now be a lot more realistic and actually connects to an etcd server. The tests cover the main code paths and ensure that a lease is acquired, and then freed again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for asking again, can we log if unlock or session close fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manugupt1 There doesn't really seem to be a standard for logging in this repository. What should I use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also am not entirely sure it's worth logging. It could be useful? Though neither of these conditions should have any long-lasting negative effects. If the connection to etcd is unhealthy then the lease will expire automatically with no hard done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manugupt1 What are your thoughts? I'm also keen to merge this one as it is quite a major issue for the etcd integration.

I am sort of surprised this issue wasn't raised before - I wonder how many people use Athens with etcd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer having at least a metric being emitted for things like these, I am not a user; so I will trust you!

@uhthomas
Copy link
Contributor Author

uhthomas commented Sep 13, 2023

We deployed this change yesterday and have seen significant improvement in both Athens and etcd.

etcd performance

Screenshot 2023-09-13 at 16 44 56 Screenshot 2023-09-13 at 16 44 19 Screenshot 2023-09-13 at 16 43 40

gRPC filtered by LeaseRevoke:

Screenshot 2023-09-13 at 16 45 50

Resource usage

Athens

Screenshot 2023-09-13 at 16 33 51 Screenshot 2023-09-13 at 17 20 43

etcd

Screenshot 2023-09-13 at 16 36 02 Screenshot 2023-09-13 at 17 22 07

pkg/stash/with_etcd_test.go Outdated Show resolved Hide resolved
Athens creates a new session for each 'Stash' request and then never closes it.
The sessions (etcd leases) are consequently held forever and eventually leads
to resource exhaustion on the etcd cluster.

This has been fixed by closing the acquired session at the end of the request.

Fixes: gomods#1886
@manugupt1 manugupt1 merged commit 9a14565 into gomods:main Sep 18, 2023
12 checks passed
kmuchmore pushed a commit to kmuchmore/athens that referenced this pull request Oct 26, 2023
@DrPsychick DrPsychick added this to the 0.13.0 milestone Dec 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Athens leaks etcd leases
3 participants