Implementing Monitoring and Observability #114

Closed
8 tasks done
Tracked by #118
ruanxin opened this issue Aug 12, 2022 · 4 comments

@ruanxin
Contributor

ruanxin commented Aug 12, 2022

Description

To ensure operational readiness for the new reconciler product, a comprehensive monitoring and observability solution is essential. We want to support operational aspects such as observability and tracing by design and incorporate them early on.

AC

Developer Notes

  • Metrics, including but not limited to:
    • Successful processing duration, 95th percentile, for each module reconciliation
      • The time one module takes to change state from error or processing to ready.
      • Design this metric as a histogram.
    • Operator memory and CPU usage (in % and in peak %)
    • Operator worker queue related metrics (from kubebuilder)
    • Operator reconciliation error and success counts (from kubebuilder)
  • The dashboard data file should be persisted and configured in a Grafana dashboard ConfigMap (reference: the current control-plane repo).
  • Contact the Huskies to clarify who takes responsibility for the integration of Jaeger and Kiali, and how to do it.
    • Jaeger is not in a production-ready state
    • Jaeger will be replaced with OpenTelemetry
    • An operator listener/watcher is not a typical use case for tracing
    • Conclusion: Don't integrate a tracing tool
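
To illustrate the histogram-based duration metric from the notes above, here is a minimal sketch using only the Go standard library. It is not the implementation from the related PR: the `Histogram` and `Tracker` types, the bucket bounds, and the state names `Processing`, `Error`, and `Ready` are all assumptions made for illustration. The quantile estimate interpolates linearly inside a bucket, similar in spirit to how Prometheus estimates quantiles from histogram buckets. A real operator would more likely register a histogram through a Prometheus client library than hand-roll one.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Histogram is a simplified cumulative-bucket histogram (hypothetical type).
type Histogram struct {
	bounds []float64 // bucket upper bounds in seconds, ascending
	counts []uint64  // counts[i] = number of observations <= bounds[i]
	total  uint64    // all observations, including those above the last bound
}

func NewHistogram(bounds []float64) *Histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &Histogram{bounds: b, counts: make([]uint64, len(b))}
}

// Observe records one duration in every bucket whose upper bound covers it.
func (h *Histogram) Observe(seconds float64) {
	h.total++
	for i, ub := range h.bounds {
		if seconds <= ub {
			h.counts[i]++
		}
	}
}

// Quantile estimates the q-quantile (0 < q <= 1) by linear interpolation
// inside the first bucket whose cumulative count reaches the target rank.
func (h *Histogram) Quantile(q float64) float64 {
	if h.total == 0 || len(h.bounds) == 0 {
		return 0
	}
	rank := q * float64(h.total)
	lower, prev := 0.0, uint64(0)
	for i, ub := range h.bounds {
		if float64(h.counts[i]) >= rank {
			inBucket := float64(h.counts[i] - prev)
			if inBucket == 0 {
				return lower
			}
			return lower + (ub-lower)*(rank-float64(prev))/inBucket
		}
		lower, prev = ub, h.counts[i]
	}
	// The rank lies above the highest finite bound: report that bound.
	return h.bounds[len(h.bounds)-1]
}

// Tracker measures how long a module needs to get back to Ready.
// Assumption: the clock starts at the first Processing or Error state
// and keeps running across Error -> Processing transitions.
type Tracker struct {
	hist    *Histogram
	started map[string]time.Time // module name -> time it left Ready
}

func NewTracker(bounds []float64) *Tracker {
	return &Tracker{hist: NewHistogram(bounds), started: map[string]time.Time{}}
}

func (t *Tracker) OnStateChange(module, state string, now time.Time) {
	switch state {
	case "Processing", "Error":
		if _, ok := t.started[module]; !ok {
			t.started[module] = now
		}
	case "Ready":
		if start, ok := t.started[module]; ok {
			t.hist.Observe(now.Sub(start).Seconds())
			delete(t.started, module)
		}
	}
}

func main() {
	t := NewTracker([]float64{1, 5, 10, 30, 60})
	base := time.Unix(0, 0)
	t.OnStateChange("module-a", "Processing", base)
	t.OnStateChange("module-a", "Ready", base.Add(3*time.Second))
	t.OnStateChange("module-b", "Error", base)
	t.OnStateChange("module-b", "Ready", base.Add(20*time.Second))
	fmt.Printf("p95 = %.2fs\n", t.hist.Quantile(0.95))
}
```

The bucket bounds determine how precise the percentile can be: the estimate never resolves finer than the bucket that contains the target rank, so bounds should be chosen around the reconciliation durations that are actually expected.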

Related PR

kyma-project/module-manager#113

@ruanxin ruanxin added the area/monitoring Issues or PRs related to the monitoring module (deprecated) label Aug 12, 2022
@tobiscr tobiscr added area/reconciler and removed area/monitoring Issues or PRs related to the monitoring module (deprecated) labels Aug 18, 2022
@ruanxin ruanxin self-assigned this Aug 23, 2022
@tobiscr
Contributor

tobiscr commented Aug 25, 2022

For testing, we can use KCP DEV and deploy the reconciler ecosystem manually until the automated pipelines are ready.

@tobiscr
Contributor

tobiscr commented Aug 25, 2022

Further details on how to configure the observability stack can be provided by @a-thaler.

@tobiscr
Contributor

tobiscr commented Aug 25, 2022

Final implementation depends on #43.

@tobiscr
Contributor

tobiscr commented Sep 29, 2022

Waiting for SRE alignment on requirements. Moving to blocked now ;)
