[supervisor] Make resource status request more resilient #12103
Conversation
Force-pushed from a5b8337 to 2bda963
Force-pushed from 2bda963 to 0f2d58f
@andreafalzetti How do you test the failure case?
Force-pushed from 0f2d58f to 954ad36
@akosyakov That's mainly what's left now to get this merged. If you have any suggestions or can point me to some similar tests, I would appreciate it. In the meantime I will try some ideas.
Run several `watch gp top` in parallel and then manipulate the sock file.
@andreafalzetti I actually cannot start a workspace; it gets stopped immediately.
@akosyakov Weird, I was able to reproduce it. Before today's changes it was working fine. Maybe it's the prev env; I will try a clean deployment.
/werft run with-clean-slate-deployment 👍 started the job as gitpod-build-afalz-12083-supervisor-extend-rate-limiting.9 |
You can check the workspace logs; maybe supervisor panics somehow after the last change, e.g. an uninitialised channel and then a nil pointer exception.
@akosyakov If I look for the file paths used to read memory/CPU in my workspace, I don't see them. What am I missing? e.g. the sock file
For cgroup v2 there are no such files, only via
Force-pushed from 954ad36 to 13603a6
@akosyakov The workspace starts now; it was re-closing an already-closed channel.
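Closing an already-closed channel panics in Go, which matches the startup crash described above. A minimal sketch of one common guard using `sync.Once` — the `closeOnce` helper name is illustrative, not the supervisor's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// closeOnce wraps a channel so that Close is idempotent.
// Calling close() twice on a raw channel panics with
// "close of closed channel"; sync.Once prevents that.
type closeOnce struct {
	ch   chan struct{}
	once sync.Once
}

func (c *closeOnce) Close() {
	c.once.Do(func() { close(c.ch) })
}

func main() {
	c := &closeOnce{ch: make(chan struct{})}
	c.Close()
	c.Close() // no-op instead of a panic
	fmt.Println("closed safely")
}
```

This pattern keeps shutdown paths safe when several goroutines may try to signal the same done channel.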
works as advertised
/hold
if you want to clean up something
Force-pushed from 13603a6 to 97f6c60
@@ -257,6 +257,9 @@ func Run(options ...RunOption) {
 		go analyseConfigChanges(ctx, cfg, analytics, gitpodConfigService, gitpodService)
 	}

+	topService := NewTopService(Top)
(nit) I think for usual clients it should be just NewTopService()
@akosyakov do you mean without parameters?
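One common way to satisfy this nit is a constructor that defaults to the production collector, with a functional option for tests. The sketch below is hypothetical — `Resources`, `productionTop`, and `WithTop` are illustrative names, not the supervisor's actual API:

```go
package main

import "fmt"

// Resources is a stand-in for the supervisor's resource snapshot type.
type Resources struct{ CPU, Memory int64 }

type topFunc func() (*Resources, error)

// TopService polls a collector function for resource usage.
type TopService struct{ top topFunc }

// productionTop is a stub for the real cgroup-backed collector.
func productionTop() (*Resources, error) {
	return &Resources{CPU: 100, Memory: 2048}, nil
}

// NewTopService defaults to the production collector, so usual
// clients can call it with no parameters.
func NewTopService(opts ...func(*TopService)) *TopService {
	s := &TopService{top: productionTop}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// WithTop injects a custom collector, e.g. a stub in unit tests.
func WithTop(f topFunc) func(*TopService) {
	return func(s *TopService) { s.top = f }
}

func main() {
	s := NewTopService() // usual clients: no parameters
	r, _ := s.top()
	fmt.Println(r.CPU, r.Memory)

	stub := NewTopService(WithTop(func() (*Resources, error) {
		return &Resources{CPU: 1, Memory: 2}, nil
	}))
	r2, _ := stub.top()
	fmt.Println(r2.CPU, r2.Memory)
}
```

With this shape, `NewTopService()` stays clean at call sites while tests can still swap in a deterministic collector.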
I left some nits; this can be merged without addressing them as well. Not sure about the test, as using timeouts can lead to flakiness.
Force-pushed from 97f6c60 to a47d1ea
/unhold
Description
Makes the resource status request more resilient by avoiding synchronous upstream calls to fetch resource status on every request. With more clients adding functionality such as `gp top` or workspace CPU/memory in the JetBrains IDE control center, getting rate-limited is all but guaranteed. With the approach used in this PR, we retrieve data from upstream every second and, when clients request it, return the latest available data. Additionally, an exponential backoff strategy is in place to be more resilient.
Related Issue(s)
Fixes #12083
How to test
Compare how this prev env behaves against gitpod.io by starting a workspace and running `watch -n 0.1 gp top`. On gitpod.io (prod) it will almost immediately fail with gRPC errors, because the ResourceStatus endpoint is polled too frequently and we hit the rate limit. In the prev env, you should be able to run multiple `watch -n 0.1 gp top` instances without any errors.

Release Notes
Documentation
Werft options: