[supervisor] Make resource status request more resilient #12103

andreafalzetti · 2022-08-12T15:58:18Z

Description

Makes the resource status request more resilient by no longer calling upstream synchronously to fetch the resource status on every request. With more clients adding features such as gp top or the workspace CPU/memory display in the JetBrains IDE control center, getting rate-limited is practically guaranteed.

With the approach used in this PR, we retrieve data from upstream every second and, when clients request it, return the latest available data. Additionally, an exponential backoff strategy is in place to make the polling more resilient.
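The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Status`, `Cache`, `Poll`, and `nextDelay` are hypothetical names, the fetch function is a fake upstream, and the intervals are shortened so the example finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Status is a hypothetical stand-in for the resource status payload.
type Status struct {
	CPUUsed    int64
	MemoryUsed int64
}

// Cache holds the most recent successfully fetched Status.
type Cache struct {
	mu     sync.RWMutex
	latest Status
	ok     bool
}

// Latest returns the cached value without touching upstream.
// The boolean is false until the first successful fetch.
func (c *Cache) Latest() (Status, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest, c.ok
}

func (c *Cache) set(s Status) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.latest, c.ok = s, true
}

// nextDelay resets to base after a success and doubles the delay,
// capped at limit, after a failure.
func nextDelay(current, base, limit time.Duration, failed bool) time.Duration {
	if !failed {
		return base
	}
	if current*2 > limit {
		return limit
	}
	return current * 2
}

// Poll calls fetch in a loop until stop is closed. Only this loop talks
// to upstream; clients read the cached value through Latest.
func (c *Cache) Poll(fetch func() (Status, error), base, limit time.Duration, stop <-chan struct{}) {
	delay := base
	for {
		s, err := fetch()
		if err == nil {
			c.set(s)
		}
		delay = nextDelay(delay, base, limit, err != nil)
		select {
		case <-stop:
			return
		case <-time.After(delay):
		}
	}
}

func main() {
	calls := 0
	// Hypothetical upstream that fails on every third call.
	fetch := func() (Status, error) {
		calls++
		if calls%3 == 0 {
			return Status{}, errors.New("rate limited")
		}
		return Status{CPUUsed: int64(calls)}, nil
	}

	cache := &Cache{}
	stop := make(chan struct{})
	go cache.Poll(fetch, 10*time.Millisecond, 80*time.Millisecond, stop)

	time.Sleep(50 * time.Millisecond)
	s, ok := cache.Latest()
	close(stop)
	fmt.Println(s, ok)
}
```

With this shape, the number of upstream calls depends only on the polling interval and not on how many clients ask for the status.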

Related Issue(s)

Fixes #12083

How to test

Compare how this preview environment behaves with gitpod.io by starting a workspace and running watch -n 0.1 gp top. On gitpod.io (prod) it fails almost immediately with gRPC errors because the ResourceStatus endpoint is polled too frequently and we hit the rate limit.

In the preview environment, you should be able to run multiple watch -n 0.1 gp top instances without any errors.

Release Notes

NONE

Documentation

Werft options:

/werft with-preview

akosyakov · 2022-08-15T08:51:50Z

@andreafalzetti How do you test the failure case?

andreafalzetti · 2022-08-15T13:18:16Z

How do you test the failure case?

@akosyakov That's what is mainly left now to get this merged. If you have any suggestions or can point me to some similar tests, I would appreciate it. In the meantime I will attempt some ideas.

akosyakov · 2022-08-15T13:23:40Z

Run several watch gp top instances in parallel and then manipulate the sock file.

akosyakov · 2022-08-15T13:25:15Z

@andreafalzetti I actually cannot start a workspace; it gets stopped immediately.

andreafalzetti · 2022-08-15T13:48:48Z

@andreafalzetti I actually cannot start a workspace; it gets stopped immediately.

@akosyakov Weird. I was able to reproduce it. Before today's changes it was working fine. Maybe it's the preview environment; I will try a clean deployment.

andreafalzetti · 2022-08-15T13:48:58Z

/werft run with-clean-slate-deployment

👍 started the job as gitpod-build-afalz-12083-supervisor-extend-rate-limiting.9
(with .werft/ from main)

akosyakov · 2022-08-15T13:49:16Z

You can check the workspace logs. Maybe supervisor panics somehow after the last change, e.g. an uninitialised channel followed by a nil pointer exception.

andreafalzetti · 2022-08-15T13:52:24Z

@akosyakov If I look for the file paths used to read memory/CPU in my workspace, I don't see them. What am I missing?

e.g. /sys/fs/cgroup/memory/memory.limit_in_bytes -> https://github.com/gitpod-io/gitpod/blob/main/components/supervisor/pkg/supervisor/top.go#L67

cat: /sys/fs/cgroup/memory/memory.limit_in_bytes: No such file or directory

The sock file /.supervisor/info.sock is there, but I am not sure how to mock/alter it yet.

akosyakov · 2022-08-15T13:55:30Z

If I look for the file paths used to read memory/CPU in my workspace, I don't see them. What am I missing?

For cgroup v2 there are no such files; the data is only available via /.supervisor/info.sock. This is expected: if the sock is missing, we try to compute the values with cgroup v1, and if that is not possible we fail completely. One cannot compute such info for v2 from within the workspace.
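The fallback order described here (sock first, then cgroup v1, otherwise fail) could be sketched like this. It is only an illustration under assumptions: `Usage`, `Source`, `firstAvailable`, `fromSocket`, and `fromCgroupV1` are hypothetical names, and both sources are fakes that return fixed results instead of reading the real sock or cgroup files.

```go
package main

import (
	"errors"
	"fmt"
)

// Usage is a hypothetical stand-in for the computed resource values.
type Usage struct {
	CPU    int64
	Memory int64
}

// Source is any function that can try to produce a Usage.
type Source func() (Usage, error)

// firstAvailable tries each source in order and returns the first
// success. If every source fails, it reports all collected errors.
func firstAvailable(sources ...Source) (Usage, error) {
	var errs []error
	for _, src := range sources {
		u, err := src()
		if err == nil {
			return u, nil
		}
		errs = append(errs, err)
	}
	return Usage{}, fmt.Errorf("no resource source available: %v", errs)
}

// fromSocket is a hypothetical stand-in for the sock-based source.
// Here it simulates a missing sock file.
func fromSocket() (Usage, error) {
	return Usage{}, errors.New("info.sock: no such file")
}

// fromCgroupV1 is a hypothetical stand-in for the cgroup v1 source.
// Here it returns fixed values instead of reading files.
func fromCgroupV1() (Usage, error) {
	return Usage{CPU: 250, Memory: 1024}, nil
}

func main() {
	// Sock missing, cgroup v1 available: the second source wins.
	u, err := firstAvailable(fromSocket, fromCgroupV1)
	fmt.Println(u, err)

	// Sock missing and no fallback: the call fails completely.
	_, err = firstAvailable(fromSocket)
	fmt.Println(err)
}
```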

andreafalzetti · 2022-08-15T14:42:47Z

@akosyakov The workspace starts now; it was re-closing an already closed channel.
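For context, closing an already closed channel panics in Go. One common way to guard against it is sync.Once, sketched below. This is not necessarily how the fix was done in this PR; `stopper` and its methods are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper wraps a done channel so that Stop can be called any number
// of times without panicking.
type stopper struct {
	once sync.Once
	done chan struct{}
}

func newStopper() *stopper {
	return &stopper{done: make(chan struct{})}
}

// Stop closes the done channel exactly once.
func (s *stopper) Stop() {
	s.once.Do(func() { close(s.done) })
}

// Stopped reports whether Stop has been called.
func (s *stopper) Stopped() bool {
	select {
	case <-s.done:
		return true
	default:
		return false
	}
}

func main() {
	s := newStopper()
	s.Stop()
	s.Stop() // a bare close(s.done) here would panic: close of closed channel
	fmt.Println(s.Stopped()) // prints "true"
}
```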

akosyakov

works as advertised

/hold
if you want to clean up something

akosyakov · 2022-08-16T10:28:01Z

components/supervisor/pkg/supervisor/supervisor.go

@@ -257,6 +257,9 @@ func Run(options ...RunOption) {
 		go analyseConfigChanges(ctx, cfg, analytics, gitpodConfigService, gitpodService)
 	}

+	topService := NewTopService(Top)


(nit) I think for usual clients it should be just NewTopService()

@akosyakov do you mean without parameters?

akosyakov · 2022-08-16T10:31:11Z

I left some nits; it can be merged without addressing them as well. I'm not sure about the test: using timeouts can lead to flakiness.

andreafalzetti · 2022-08-16T13:44:11Z

/unhold

andreafalzetti added component: supervisor team: IDE labels Aug 12, 2022

roboquat added do-not-merge/work-in-progress do-not-merge/release-note-label-needed size/M labels Aug 12, 2022

andreafalzetti changed the title ~~feat(supervisor): add top service~~ [supervisor] Make resource status request more resilient Aug 12, 2022

gitpod-io deleted a comment from roboquat Aug 12, 2022

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch 5 times, most recently from a5b8337 to 2bda963 on August 12, 2022 17:15

andreafalzetti marked this pull request as ready for review August 15, 2022 06:09

andreafalzetti requested a review from a team August 15, 2022 06:09

roboquat removed the do-not-merge/work-in-progress label Aug 15, 2022

andreafalzetti marked this pull request as draft August 15, 2022 06:10

roboquat added the do-not-merge/work-in-progress label Aug 15, 2022

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch from 2bda963 to 0f2d58f on August 15, 2022 06:37

andreafalzetti commented Aug 15, 2022

components/supervisor/pkg/supervisor/top.go (outdated)

andreafalzetti marked this pull request as ready for review August 15, 2022 06:39

roboquat removed the do-not-merge/work-in-progress label Aug 15, 2022

akosyakov reviewed Aug 15, 2022

components/supervisor/pkg/supervisor/top_test.go

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch from 0f2d58f to 954ad36 on August 15, 2022 13:14

roboquat added size/L and removed size/M labels Aug 15, 2022

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch from 954ad36 to 13603a6 on August 15, 2022 14:14

akosyakov reviewed Aug 15, 2022

components/supervisor/pkg/supervisor/top_test.go (outdated)

akosyakov approved these changes Aug 15, 2022

roboquat added the do-not-merge/hold label Aug 15, 2022

jeanp413 reviewed Aug 15, 2022

components/supervisor/pkg/supervisor/top.go (outdated)

akosyakov reviewed Aug 16, 2022

components/supervisor/pkg/supervisor/top.go (outdated)

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch from 13603a6 to 97f6c60 on August 16, 2022 10:23

akosyakov reviewed Aug 16, 2022

components/supervisor/pkg/supervisor/top.go (outdated)

akosyakov reviewed Aug 16, 2022

components/supervisor/pkg/supervisor/top.go (outdated)

akosyakov reviewed Aug 16, 2022

components/supervisor/pkg/supervisor/top.go

akosyakov reviewed Aug 16, 2022

components/supervisor/pkg/supervisor/top_test.go

feat(supervisor): add top service

a47d1ea

andreafalzetti force-pushed the afalz/12083-supervisor-extend-rate-limiting branch from 97f6c60 to a47d1ea on August 16, 2022 13:32

andreafalzetti self-assigned this Aug 16, 2022

roboquat added release-note-none and removed do-not-merge/hold do-not-merge/release-note-label-needed labels Aug 16, 2022

roboquat merged commit 47c64d4 into main Aug 16, 2022

roboquat deleted the afalz/12083-supervisor-extend-rate-limiting branch August 16, 2022 13:50

roboquat added deployed: IDE (IDE change is running in production) and deployed (Change is completely running in production) labels Aug 17, 2022
