New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[server] track more startWorkspace failures #12378
Conversation
585dbb6
to
3678ffd
Compare
@@ -244,6 +248,12 @@ export async function getWorkspaceClassForInstance( | |||
} | |||
} | |||
|
|||
class StartInstanceError extends Error { | |||
constructor(public readonly reason: FailedInstanceStartReason, public readonly cause: Error) { | |||
super("Starting workspace instance failed: " + cause.message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the reason value assigned to this inherited type before it is used below?
if (err instanceof StartInstanceError) {
failedReason = err.reason;
}
I don't have that much experience with Errors in JS/TS but here it looks like we're just assigning it to a possibly variable string. My worry with this is that for the metric in prometheus, we should ideally have a fixed cardinality of the label which means we wouldn't expect the variance that could come from this constructor taking on arbitrary message
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FailedInstanceStartReason type is restricted to only three values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spoke on a call. Sven explained that public readonly reason: FailedInstanceStartReason
actually makes reason
a property of the object. I was not familiar with that syntax before.
This resolves my question.
Thanks for the changes!
@@ -414,6 +424,11 @@ export class WorkspaceStarter { | |||
forceRebuild, | |||
); | |||
} catch (e) { | |||
let failedReason: FailedInstanceStartReason = "other"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geropl this is the value for the alerts rate(gitpod_server_instance_starts_failed_total{}[$__rate_interval])
.
Not sure this makes sense to have other
if that's the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're 💯 right, it doesn't (with the current quality of errors we're receiving). PR to fix this: #12574
Counts additional reasons for startWorkspace failures.
Related Issue(s)
fixes #12332
Release Notes