Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: expose metrics about the operator #214

Merged
merged 77 commits into from Sep 30, 2022
Merged

feat: expose metrics about the operator #214

merged 77 commits into from Sep 30, 2022

Conversation

CAJan93
Copy link
Contributor

@CAJan93 CAJan93 commented Sep 9, 2022

What's changed and what's your intention?

WORK IN PROGRESS! DO NOT MERGE!

Implementing feature request from Issue 123

  • Summarize your change (mandatory)
  • How does this PR work? Need a brief introduction for the changed logic (optional)
  • Describe clearly one logical change and avoid lazy messages (optional)
  • Describe any limitations of the current code (optional)

Checklist

  • I have written the necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@CAJan93 CAJan93 added the enhancement New feature or request label Sep 9, 2022
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 112 files.

Valid Invalid Ignored Fixed
111 1 0 0
Click to see the invalid file list
  • pkg/metrics/metrics.go

pkg/metrics/metrics.go Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Sep 9, 2022

Codecov Report

Merging #214 (1bfe4f3) into main (d42f944) will increase coverage by 0.42%.
The diff coverage is 87.14%.

@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
+ Coverage   81.43%   81.86%   +0.42%     
==========================================
  Files          34       36       +2     
  Lines        3022     3115      +93     
==========================================
+ Hits         2461     2550      +89     
- Misses        510      511       +1     
- Partials       51       54       +3     
Flag Coverage Δ
unittests 81.86% <87.14%> (+0.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/risingwave_controller.go 71.42% <61.76%> (-1.48%) ⬇️
...hook/risingwave_pod_template_validating_webhook.go 80.95% <75.00%> (+2.00%) ⬆️
pkg/webhook/mutating_webhook_metrics_recorder.go 91.30% <91.30%> (ø)
pkg/webhook/risingwave_validating_webhook.go 98.80% <96.77%> (+0.01%) ⬆️
pkg/ctrlkit/loop.go 100.00% <100.00%> (ø)
pkg/webhook/risingwave_mutating_webhook.go 100.00% <100.00%> (ø)
...ebhook/risingwave_pod_template_mutating_webhook.go 100.00% <100.00%> (ø)
pkg/webhook/validating_webhook_metrics_recorder.go 100.00% <100.00%> (ø)
pkg/command/create/create.go 88.28% <0.00%> (+8.59%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

CAJan93 and others added 5 commits September 9, 2022 14:41
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@CAJan93
Copy link
Contributor Author

CAJan93 commented Sep 9, 2022

I am currently able to expose the metrics on localhost:8080. I am unable to see the metrics in prometheus. This may be due to one of two things:

  • I did not deploy the latest version of the operator
  • The metrics are not being scraped correctly

@CAJan93 CAJan93 changed the title WIP: first dummy metric exported WORK IN PROGRESS: exposing metrics about the operator Sep 9, 2022
@arkbriar
Copy link
Collaborator

Generally, the metrics are what we wanted but I think we'd better:

  • Don't inject metrics update in the ctrlkit package, make it less coupling
  • Label metrics with distinguishable labels, such as webhook type and name for webhook metrics, controller name and action name for workflow actions

I'll detail in #123 later.

@arkbriar arkbriar self-requested a review September 14, 2022 02:15
@arkbriar arkbriar reopened this Sep 29, 2022
@arkbriar
Copy link
Collaborator

I don't know what happened, mergify just closed the PR? 🤔 Anyway, I reopened it.

@arkbriar
Copy link
Collaborator

Yes, I think so. IMO we should return an error to notify the controller-runtime framework that an error happens.
However, an error means an immediate requeueing so our best strategy would be to log the panic and return ctrl.Result{}, nil.

Sorry, I didn't make it consistent. Actually reconcile.Result is just an alias of ctrl.Result so I always mixed them up.

Good point. I think this totally makes sense. Sidetone: The function returns a reconcile.Result, not ctrl.Result, Error.
IMHO it would be the best to requeue here with some delay, to avoid immediately crashing again. What do you think @arkbriar ?

Sounds good, but how 🤔 ? I don't think we could achieve that by blocking the thread. The controller uses a worker pool and if we block too many, the others might not be able to run anymore.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 115 files.

Valid Invalid Ignored Fixed
114 1 0 0
Click to see the invalid file list
  • pkg/utils/types.go

pkg/utils/types.go Show resolved Hide resolved
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@CAJan93
Copy link
Contributor Author

CAJan93 commented Sep 29, 2022

In a deployment where the Prometheus operator is also installed. This should be an optional setting. We can add a doc about how to apply this. We can leave it to another PR.

I opened this PR. Let's discuss it there.

@arkbriar arkbriar changed the title Expose metrics about the operator feat: expose metrics about the operator Sep 30, 2022
Copy link
Collaborator

@arkbriar arkbriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM! Great work! Thanks for your great effort!

I left some comments and please fix the issues listed before merging.

pkg/controller/risingwave_controller.go Show resolved Hide resolved
pkg/controller/risingwave_controller.go Show resolved Hide resolved
pkg/controller/risingwave_controller.go Show resolved Hide resolved
request reconcile.Request,
gvk schema.GroupVersionKind,
ctx context.Context,
reconcileStartTS time.Time) reconcile.Result {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the return value here anymore since it is not used. Shall we remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. We have to change the res *reconcile.Request though. Let me fix that.

pkg/controller/risingwave_controller.go Outdated Show resolved Hide resolved
pkg/controller/risingwave_controller.go Outdated Show resolved Hide resolved
pkg/metrics/metrics.go Outdated Show resolved Hide resolved
pkg/metrics/metrics.go Outdated Show resolved Hide resolved
pkg/utils/types.go Outdated Show resolved Hide resolved
@CAJan93
Copy link
Contributor Author

CAJan93 commented Sep 30, 2022

Sounds good, but how 🤔 ? I don't think we could achieve that by blocking the thread. The controller uses a worker pool and if we block too many, the others might not be able to run anymore.

I also do not know :/

@CAJan93 CAJan93 merged commit 573e280 into main Sep 30, 2022
@CAJan93 CAJan93 deleted the issue_123 branch September 30, 2022 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants