
[supervisor] Trigger instance update if missing after port exposure #7058

Closed
wants to merge 1 commit

Conversation

corneliusludmann (Contributor)

Description

When the server stops sending instance updates for some reason, ports get stuck in “detecting”. This PR checks, 1 minute after port exposure, whether we got the port exposure information from the server; if not, it actively asks for an instance update.

@akosyakov @csweichel After implementing this, I'm not sure whether this is 100% what you had in mind. I'm very open to any suggestions for other implementations.
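A minimal sketch of the fallback described above; the helper names (awaitExposureUpdate, exposureSeen, requestUpdate) and the timeout parameter are placeholders for illustration, not the actual supervisor code in this PR:

package ports

import (
	"context"
	"log"
	"time"
)

// awaitExposureUpdate illustrates the idea of the PR: after a port has been
// exposed, wait (one minute in the PR) for the server to send an instance
// update carrying the exposure info; if none arrives, actively request an
// update instead of leaving the port stuck in "detecting".
func awaitExposureUpdate(ctx context.Context, timeout time.Duration, exposureSeen <-chan struct{}, requestUpdate func(context.Context)) {
	select {
	case <-ctx.Done():
		return
	case <-exposureSeen:
		// the regular instance update arrived; nothing to do
		return
	case <-time.After(timeout):
		log.Println("we haven't seen an instance update with port exposure info, requesting one")
		requestUpdate(ctx)
	}
}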

Related Issue(s)

Mitigate/Fixes #6778

See also: #7054

How to test

I tested this PR by adding this change:

diff --git a/components/supervisor/pkg/ports/exposed-ports.go b/components/supervisor/pkg/ports/exposed-ports.go
index 5408a5b3..001cbcbc 100644
--- a/components/supervisor/pkg/ports/exposed-ports.go
+++ b/components/supervisor/pkg/ports/exposed-ports.go
@@ -120,6 +120,13 @@ func (g *GitpodExposedPorts) Observe(ctx context.Context) (<-chan []ExposedPort,
                for {
                        select {
                        case u := <-updates:
+
+                               // TODO: ignoring all updates just for testing purposes
+                               // remove the following block before merging
+                               if u != nil {
+                                       break
+                               }
+
                                res := getExposedPorts(u)
                                if res == nil {
                                        return

This change ignores all instance updates from the server. With it we see the same behavior as in prod: the ports get stuck in “detecting”. After 1 minute, supervisor logs “we haven't seen an instance update with port exposure info after 1 minute” and “detecting” disappears.

Release Notes

Mitigate ports getting stuck in detecting

@roboquat commented Dec 3, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from corneliusludmann after the PR has been reviewed.

Associated issue: #6778

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat added the size/L label on Dec 3, 2021
@codecov bot commented Dec 3, 2021

Codecov Report

Merging #7058 (3530a57) into main (81a86c2) will increase coverage by 17.89%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #7058       +/-   ##
===========================================
+ Coverage   19.04%   36.94%   +17.89%     
===========================================
  Files           2       19       +17     
  Lines         168     4623     +4455     
===========================================
+ Hits           32     1708     +1676     
- Misses        134     2781     +2647     
- Partials        2      134      +132     
Flag                                     Coverage Δ
components-local-app-app-linux-amd64    ?
components-local-app-app-linux-arm64    ?
components-local-app-app-windows-386    ?
components-local-app-app-windows-amd64  ?
components-local-app-app-windows-arm64  ?
components-supervisor-app               36.94% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files                                         Coverage Δ
components/supervisor/pkg/ports/exposed-ports.go       0.00% <0.00%> (ø)
components/supervisor/pkg/ports/ports.go               60.60% <0.00%> (ø)
components/local-app/pkg/auth/pkce.go
components/local-app/pkg/auth/auth.go
components/supervisor/pkg/supervisor/tasks.go          58.56% <0.00%> (ø)
...mponents/supervisor/pkg/supervisor/notification.go  83.95% <0.00%> (ø)
components/supervisor/pkg/terminal/ring-buffer.go      45.65% <0.00%> (ø)
components/supervisor/pkg/dropwriter/dropwriter.go     73.46% <0.00%> (ø)
components/supervisor/pkg/terminal/terminal.go         64.52% <0.00%> (ø)
components/supervisor/pkg/ports/slirp4netns.go         0.00% <0.00%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81a86c2...3530a57. Read the comment docs.

@akosyakov (Member) left a review comment:

It would be nice to add a couple of tests for it by mocking the API service in ports_test.go.
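A rough sketch of what such a test could look like, reusing the hypothetical awaitExposureUpdate helper sketched in the description above (not the actual ports_test.go fixtures); the mocked server simply never delivers the exposure update:

package ports

import (
	"context"
	"testing"
	"time"
)

func TestRequestsUpdateWhenExposureNeverArrives(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Simulate a server that stops sending instance updates: the exposure
	// channel never fires, so the fallback has to kick in.
	exposureSeen := make(chan struct{})
	requested := make(chan struct{}, 1)

	go awaitExposureUpdate(ctx, 10*time.Millisecond, exposureSeen, func(context.Context) {
		requested <- struct{}{}
	})

	select {
	case <-requested:
		// success: supervisor actively asked for an instance update
	case <-time.After(time.Second):
		t.Fatal("expected an instance update to be requested after the timeout")
	}
}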

On these lines of the change:

    errchan = make(chan error, 1)
    )
    errchan := make(chan error, 1)
    g.exposedPorts = make(chan []ExposedPort)

@akosyakov: It would imply that now we can have only one listener.

On these lines:

    if err != nil {
        return err
    }
    res := getExposedPorts(wsInfo.LatestInstance)

@akosyakov (Dec 6, 2021): I think it is better if we do the sync at the level of APIoverJSONRPC; that way we don't need to worry about changing the semantics of its clients.

I.e. add APIoverJSONRPC.SyncInstance; you can look at how we apply similar logic in the frontend. There are 2 important things:

  • Avoid multiple concurrent syncs, i.e. the last one wins, for instance like here:
    private sync(): void {
        this.cancelSync();
        this.syncTokenSource = new CancellationTokenSource();
        const token = this.syncTokenSource.token;
        this.syncQueue = this.syncQueue.then(async () => {
            if (token.isCancellationRequested) {
                return;
            }
            try {
                const info = await this.service.server.getWorkspace(this._info.workspace.id);
                if (token.isCancellationRequested) {
                    return;
                }
                this._info = info;
                this.source = 'sync';
                this.onDidChangeEmitter.fire(undefined);
            } catch (e) {
                console.error('failed to sync workspace instance:', e)
            }
        })
    }
  • Resolve conflicts between instance updates and sync: if the sync has seen newer data, then we should ignore the new instance update. It can be done based on phases, like here:
    if (instance.id !== this.info.latestInstance?.id) {
        return false;
    }
    return phasesOrder[instance.status.phase] < phasesOrder[this.info.latestInstance.status.phase];
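A rough Go sketch of how such an APIoverJSONRPC.SyncInstance could apply the first point (last one wins) in supervisor; all names, fields, and callbacks here are assumptions for illustration, and the phase-based conflict resolution from the second point is left out:

package serverapi

import (
	"context"
	"log"
	"sync"
)

// WorkspaceInfo stands in for the instance data returned by the server.
type WorkspaceInfo struct {
	InstanceID string
	Phase      string
}

// instanceSyncer sketches a "last one wins" sync: starting a new sync cancels
// the one still in flight, so only the most recent result reaches onUpdate.
type instanceSyncer struct {
	mu         sync.Mutex
	cancelPrev context.CancelFunc
	getInfo    func(ctx context.Context) (*WorkspaceInfo, error) // e.g. a getWorkspace call
	onUpdate   func(info *WorkspaceInfo)
}

func (s *instanceSyncer) SyncInstance(ctx context.Context) {
	s.mu.Lock()
	if s.cancelPrev != nil {
		s.cancelPrev() // last one wins: abort the sync that is still running
	}
	ctx, cancel := context.WithCancel(ctx)
	s.cancelPrev = cancel
	s.mu.Unlock()

	go func() {
		info, err := s.getInfo(ctx)
		if err != nil {
			log.Printf("failed to sync workspace instance: %v", err)
			return
		}
		if ctx.Err() != nil {
			// a newer sync has started in the meantime; drop this result
			return
		}
		s.onUpdate(info)
	}()
}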

@corneliusludmann (Author)

Anton and I discussed that we want to postpone this change for now and start with adding more logging (#7083).

@akosyakov (Member)

@iQQBot Could you pick this up as a mitigation for #6778, while @geropl is trying to solve the root cause in server (#7054)?

@geropl (Member) commented Dec 10, 2021

@akosyakov I had a look into supervisor, and it seems we're not using a reconnect handler. That could help make sure we see all the events we're waiting for, even across reconnects.

@akosyakov (Member)

> @akosyakov I had a look into supervisor, and it seems we're not using a reconnect handler. That could help make sure we see all the events we're waiting for, even across reconnects.

Do you mean to force a getWorkspace on reconnect, similar to how we do it in the local companion?
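A minimal sketch of what such a reconnect hook could look like; onReconnect and the sync callback are assumed names, not the actual supervisor or local-companion API:

package serverapi

import (
	"context"
	"log"
)

// watchReconnects forces a workspace sync whenever the connection to the
// Gitpod server is re-established, so updates missed while disconnected
// (e.g. port exposure info) are picked up again.
func watchReconnects(ctx context.Context, onReconnect <-chan struct{}, sync func(ctx context.Context) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-onReconnect:
			if err := sync(ctx); err != nil {
				log.Printf("failed to sync workspace after reconnect: %v", err)
			}
		}
	}
}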

@geropl (Member) commented Dec 10, 2021

Still not 100% sure this caused the “port exposure” troubles, but it qualifies as the general error handling we talked about here.

@stale bot commented Dec 20, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the “meta: stale” label (this issue/PR is stale and will be closed soon) on Dec 20, 2021.
@akosyakov (Member)

I'm closing it for now. We would like to add some syncing, but on reconnection to the Gitpod server.
