
[workspace]: add force-stop check on stopping workspaces #5184

Merged
merged 1 commit into main from sje/force-stop-stopping-ws
Aug 13, 2021

Conversation

@mrsimonemms (Contributor) commented Aug 13, 2021

Further work on #5055

Since #4910 stopped counting "stopping" workspaces for billing purposes,
any workspace caught in a "stopping" phase would never be force-stopped.
This adds an optional "excludeStopping" boolean (defaulting to true) to the
DB implementation, and the meta-instance-controller simply includes that
phase in its search.

It was discovered that ~200 workspaces were caught in this phase (90%
prebuilds), so it is necessary to force-stop workspaces stuck in it.
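Roughly, in SQL terms, the effect of the flag looks like this (illustrative only; the real query is built with TypeORM in workspace-db-impl.ts, and the phase list here is a simplification):

```sql
-- excludeStopping = true (the default): "stopping" instances are filtered out,
-- as existing callers (e.g. billing-style queries after #4910) expect.
SELECT id FROM d_b_workspace_instance
WHERE phasePersisted NOT IN ('stopped', 'stopping');

-- excludeStopping = false (what the meta-instance-controller now uses), so
-- instances stuck in "stopping" are also returned and can be force-stopped.
SELECT id FROM d_b_workspace_instance
WHERE phasePersisted NOT IN ('stopped');
```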

@mrsimonemms (Contributor, Author)

@jankeromnes has manually deleted all instances stuck in stopping, except the following (all owned by Gitpodders). When this is merged into prod, check that these instances are successfully stopped:

  • bf6fe0d1-4213-4843-b003-c89a9e263965
  • b5e3519a-fd7d-4cf1-9691-6efa3cab0520
  • 99f3149d-fe76-4085-9807-0da13c725e73
  • 7bf0d14a-258e-47ac-9720-7a7d832dbf94
  • 3123c642-df61-4a60-b81a-6ecba467a129

@JanKoehnlein (Contributor)

Is there any way to test this in the preview env?

@jankeromnes (Contributor) commented Aug 13, 2021

Another super cool fix! Many thanks @mrsimonemms 🙏 🚀

> It was discovered that ~200 workspaces were caught in this phase (90% prebuilds)

... all created since yesterday around midnight UTC (2021-08-12T00:05:14.984Z), so this could be a side-effect of recent incidents (but it's still 100% a good idea to not leave instances stuck in stopping forever 💯)

> @jankeromnes has manually deleted all instances stuck in stopping

To clarify, I've manually forced them back to stopped phase -- I haven't actually deleted them 😅

The query:

mysql> update d_b_workspace_instance set status = JSON_SET(status, '$.phase', 'stopped'), phasePersisted = 'stopped' where phase = 'stopping' and stoppedTime = '' and STR_TO_DATE(stoppingTime, '%Y-%m-%dT%H:%i:%s.%fZ') < (NOW() - INTERVAL 2 HOUR) and id not in ('bf6fe0d1-4213-4843-b003-c89a9e263965','b5e3519a-fd7d-4cf1-9691-6efa3cab0520','99f3149d-fe76-4085-9807-0da13c725e73','7bf0d14a-258e-47ac-9720-7a7d832dbf94','3123c642-df61-4a60-b81a-6ecba467a129');
Query OK, 208 rows affected (0.06 sec)
Rows matched: 208  Changed: 208  Warnings: 0
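(If re-running something like this elsewhere, a SELECT with the same predicate is a cheap dry run to preview which rows the UPDATE would touch; the same id exclusion can be appended if needed:)

```sql
-- Dry run: same predicate as the UPDATE above, modifies nothing.
SELECT id, phasePersisted, stoppingTime
FROM d_b_workspace_instance
WHERE phase = 'stopping'
  AND stoppedTime = ''
  AND STR_TO_DATE(stoppingTime, '%Y-%m-%dT%H:%i:%s.%fZ') < (NOW() - INTERVAL 2 HOUR);
```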

> Is there any way to test this in the preview env?

I guess you can:

  1. Start and stop a workspace
  2. Open the PR in Gitpod, then connect to the DB (kubectl port-forward statefulset/mysql 3306 & mysql -h 127.0.0.1 -ptest gitpod)
  3. Update the d_b_workspace_instance entry to be stuck in stopping since > 2 hours ago, e.g. like so:
mysql> update d_b_workspace_instance set status = JSON_SET(status, '$.phase', 'stopping'), phasePersisted = 'stopping', stoppedTime = '', creationTime = '2021-08-13T05:00:00.000Z', stoppingTime = '2021-08-13T05:00:00.000Z';

(Warning: there is no WHERE clause, so this will update all workspace instances in this deployment. Should be okay, but FYI; a variant scoped to a single instance is sketched after this list.)

  4. Wait for the PR to do its clean-up job (the instance should eventually go back to stopped)
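A variant of the step-3 update scoped to a single instance ('<instance-id>' is a placeholder for the id of the workspace instance stopped in step 1):

```sql
-- Same update as step 3, but limited to one instance instead of the whole table.
UPDATE d_b_workspace_instance
SET status = JSON_SET(status, '$.phase', 'stopping'),
    phasePersisted = 'stopping',
    stoppedTime = '',
    creationTime = '2021-08-13T05:00:00.000Z',
    stoppingTime = '2021-08-13T05:00:00.000Z'
WHERE id = '<instance-id>';  -- placeholder, not a real instance id
```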

@jankeromnes (Contributor) left a comment


Code looks 99% good to me!

Added a few thoughts in-line.

components/gitpod-db/src/typeorm/workspace-db-impl.ts (outdated review thread, resolved)
@JanKoehnlein (Contributor)

Test passed.

Commit message:

Since #4910 stopped counting "stopping" workspaces for billing purposes,
any workspace caught in a "stopping" phase would never be force-stopped.
This adds a conditional "includeStopping" boolean (defaulting to `false`)
to the DB implementation, and the meta-instance-controller simply includes
that phase in its search.

It was discovered that ~200 workspaces were caught in this phase (90%
prebuilds), so it is necessary to force-stop workspaces stuck in it.
@JanKoehnlein (Contributor)

/lgtm

@roboquat (Contributor)

LGTM label has been added.

Git tree hash: f4fc46160a894b20bf3fd4dce3b3c841568e376d

@roboquat added the lgtm label Aug 13, 2021
@roboquat (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JanKoehnlein

Associated issue: #5055

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@roboquat merged commit f35e762 into main Aug 13, 2021
@roboquat deleted the sje/force-stop-stopping-ws branch August 13, 2021 11:12
Successfully merging this pull request may close these issues.

Please force-stop workspace instances that are "stuck" in a bad state