Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[observability] Add gRPC client call and additional functional metrics #5726

Merged
merged 7 commits into from
Sep 17, 2021

Conversation

csweichel
Copy link
Contributor

Description

Adds client call metrics to gRPC calls form TS and Go, and makes the server -> image-builder connection use the default gRPC unary options. The lack of the latter probably was the cause of the recent image-builder related outage.

Also adds cluster registration metrics to ws-manager-bridge, and image-build counts to image-builder-mk3

Thanks @JanKoehnlein for figuring out the nitty gritty of the grpc-js interceptor business.

Related Issue(s)

Fixes #5706, #5707

How to test

  1. start a workspace to produce some data all the way 'round
  2. look at the metric endpoints of server, image-builder-mk3, ws-manager-bridge, ws-manager, ws-proxy, ws-scheduler. You should see grpc_client_calls_total metrics on all those services.

Release Notes

[server] Add gRPC client call metrics
[ws-manager-bridge] Add gRPC client call and cluster registration metrics
[image-builder-mk3] Add image-build metrics

Copy link
Contributor

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first walk through the code...

I really liked the decision of using go-grpc-middleware to expose client-side gRPC metrics! When you announced this in the all-hands I thought you were going to implement this in common-go alone 😅.

components/registry-facade/cmd/run.go Outdated Show resolved Hide resolved
components/registry-facade/cmd/run.go Outdated Show resolved Hide resolved
@csweichel
Copy link
Contributor Author

csweichel commented Sep 17, 2021

I've addressed the comments. @ArthurSens, @aledbf and @geropl please have another look.

@codecov
Copy link

codecov bot commented Sep 17, 2021

Codecov Report

Merging #5726 (34c0801) into main (f5eea47) will decrease coverage by 7.36%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5726      +/-   ##
==========================================
- Coverage   19.04%   11.67%   -7.37%     
==========================================
  Files           2        6       +4     
  Lines         168     1310    +1142     
==========================================
+ Hits           32      153     +121     
- Misses        134     1137    +1003     
- Partials        2       20      +18     
Flag Coverage Δ
components-local-app-app-linux ?
components-local-app-app-windows ?
components-registry-facade-app 11.67% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
components/local-app/pkg/auth/pkce.go
components/local-app/pkg/auth/auth.go
...omponents/registry-facade/pkg/registry/registry.go 0.00% <0.00%> (ø)
...omponents/registry-facade/pkg/registry/manifest.go 9.82% <0.00%> (ø)
components/registry-facade/pkg/registry/blob.go 0.00% <0.00%> (ø)
...omponents/registry-facade/pkg/registry/imagecfg.go 0.00% <0.00%> (ø)
components/registry-facade/pkg/registry/metrics.go 0.00% <0.00%> (ø)
...onents/registry-facade/pkg/registry/layersource.go 28.80% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5eea47...34c0801. Read the comment docs.

@JanKoehnlein
Copy link
Contributor

/lgtm

@roboquat
Copy link
Contributor

LGTM label has been added.

Git tree hash: 6ff787bc06800936c3677f0c554ed15998326d7b

@aledbf
Copy link
Member

aledbf commented Sep 17, 2021

Looking at our logs, there seems to be a severe problem with ws-manager in core-dev workspaces with this change

The error you see in the logs was fixed here #5696, i.e., is not related to this PR

@JanKoehnlein
Copy link
Contributor

Thanks for the clarification @aledbf. I just wanted to make sure we don't commit anything that sets our logging on fire

/approve

@aledbf
Copy link
Member

aledbf commented Sep 17, 2021

/approve

@roboquat
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: aledbf, JanKoehnlein

Associated issue: #5706

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logs: Log entries related content initialiser should be findable when searching for a workspace
5 participants