Introduce SLI for preview environment start #10732

Merged (2 commits) on Jun 20, 2022

mads-hartmann (Contributor) commented Jun 17, 2022

NOTE: I have added the hold label because the build fails, as we seem to have a problem with VMs right now.

Description

This PR instruments our build job to include annotations on the root span that we can use as part of our "Preview environments should start successfully" SLI.
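
As a rough illustration of what such annotations could look like, here is a minimal sketch. The `SpanLike` interface, the `RecordingSpan` class, and the `annotatePreviewOutcome` helper are hypothetical names made up for this example and are not taken from the PR's code; only the two attribute names come from the derived column in this description. The sketch also assumes that the k3s attribute is simply left unset when the job never gets as far as creating the VM.

```typescript
type AttributeValue = string | number | boolean;

// Hypothetical minimal span interface; the real build job uses its own tracing setup.
interface SpanLike {
  setAttribute(key: string, value: AttributeValue): void;
}

// In-memory stand-in for a root span so the sketch runs without a tracing backend.
class RecordingSpan implements SpanLike {
  readonly attributes = new Map<string, AttributeValue>();

  setAttribute(key: string, value: AttributeValue): void {
    this.attributes.set(key, value);
  }
}

// Hypothetical helper: record the build outcome and, if known, the VM outcome.
// Assumption: k3sCreated stays undefined when VM creation was never attempted,
// so the attribute is absent and the event is excluded from the SLI.
function annotatePreviewOutcome(
  span: SpanLike,
  gitpodBuilt: boolean,
  k3sCreated?: boolean,
): void {
  span.setAttribute("preview.gitpod_built_successfully", gitpodBuilt);
  if (k3sCreated !== undefined) {
    span.setAttribute("preview.k3s_successfully_created", k3sCreated);
  }
}
```

Leaving the second attribute unset, rather than setting it to false, is what lets the SLI distinguish "never attempted" from "attempted and failed".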

These annotations are used to create a derived column in Honeycomb that represents our SLI. The structure of the expression is IF(condition, then-value, else-value) (see here). Additionally, for SLIs in Honeycomb, true means the event should count as a success, false means it should count as a failure, and null means the event isn't part of the SLI.

So in our case, we only count the event if preview.gitpod_built_successfully is true and the preview.k3s_successfully_created attribute exists. We consider it a success if the root span doesn't have an error set and preview.k3s_successfully_created is true. Otherwise it's a failure.

IF(
  AND(
    EQUALS($preview.gitpod_built_successfully, true),
    EXISTS($preview.k3s_successfully_created)
  ),
  AND(
    NOT($error),
    EQUALS($preview.k3s_successfully_created, true)
  ),
  null
)
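
To make the three-valued result easier to reason about, the same logic can be sketched in TypeScript. This is not how Honeycomb evaluates derived columns; it is only a local model of the expression above. The `PreviewEvent` type and the `previewStartSli` function are hypothetical names, and the sketch assumes that a missing `error` field is treated the same as no error.

```typescript
// Hypothetical event shape: only the fields the derived column reads.
interface PreviewEvent {
  error?: boolean;
  "preview.gitpod_built_successfully"?: boolean;
  "preview.k3s_successfully_created"?: boolean;
}

// Returns true (success), false (failure), or null (not part of the SLI).
function previewStartSli(event: PreviewEvent): boolean | null {
  const built = event["preview.gitpod_built_successfully"] === true;
  const k3sCreated = event["preview.k3s_successfully_created"];

  // The event only counts if Gitpod was built and the k3s attribute exists.
  if (!built || k3sCreated === undefined) {
    return null;
  }

  // Success requires no error on the root span and a successfully created VM.
  // Assumption: an absent error field means no error was set.
  return !event.error && k3sCreated === true;
}
```

For example, an event with `preview.gitpod_built_successfully` set to true and `preview.k3s_successfully_created` set to false yields false (a failure), while an event without the k3s attribute yields null and is ignored by the SLI.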

Related Issue(s)

Fixes https://github.com/gitpod-io/ops/issues/2728
Fixes https://github.com/gitpod-io/ops/issues/2729
Fixes https://github.com/gitpod-io/ops/issues/2731

How to test

I started a job off the branch so that it loaded my new TS code:

 werft job run github -f

The VM happened to fail as we're overloaded right now, so that's great for testing 😅 See the trace here and a screenshot of it below:
Screenshot 2022-06-17 at 15 37 12

Here is a simple query showing the count of events grouped by the SLI. There are a bunch of events that don't count as part of the SLI and one failure (no successes because Harvester is overloaded).

Screenshot 2022-06-17 at 15 48 27

Also created an SLO for the fun of it here.

Release Notes

NONE

Documentation

N/A

ArthurSens (Contributor) commented Jun 17, 2022

All code changes look good! Awesome to see SLOs already as well 😁

I tried rebasing your PR to see if the Preview Problem was solved already, but I think we'll have to wait until next week to get this one merged 😕

meysholdt (Member) left a comment

🚀

roboquat merged commit fcd38fa into main on Jun 20, 2022
roboquat deleted the mads/preview-tracing branch on June 20, 2022 07:58