New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2831: adding beta graduation criteria #3714
Conversation
@sallyom: GitHub didn't allow me to request PR reviews from the following users: vrutkovs. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
bfaa6dc
to
dd17862
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
(i'll leave lgtm to others involved, but this lgtm)
@sallyom wanna give this one another update? :) |
43cecb8
to
da911e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
da911e1
to
e5cd9df
Compare
In my queue - just waiting for PRR shadow reviewer to take a look first :) |
👋 I'm shadowing PRR review this time around and gave this a review. It doesn't look like this PR has updated the PRR questionare (which was pretty complete for the alpha stage). Since this is proposing to graduate to beta, have you given any additional thought to the metrics that might inform a rollback or monitoring requirements? With a bad TracingConfiguration, I expect users will things in the kubelet logs to help correct that configuration, but are there any other changes expected as part of this that might introduce better ways to monitor/troubleshoot this feature or understand if they are achieving the SLO (99% of spans delivered)? Not sure if this something that can be done (maybe ref: open-telemetry/opentelemetry-go#2547) , but I think it would be useful to add some clarity around this with an update. Was also wondering if you added unit tests for the feature gate when this was implemented in alpha? If so, could you link those in an update to the PRR questionnaire? |
- A test with kubelet-tracing & apiserver-tracing enabled to ensure no issues are introduced, regardless | ||
of whether a tracing backend is configured. | ||
|
||
### Graduation Requirements | ||
|
||
Alpha | ||
|
||
- [] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet | ||
- [] Integration testing of tracing | ||
- [X] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also take a look a @jeremyrickard comment here:
#3714
Filling in the remaining part of monitoring sections is the main callout there (I would have the same) and it will be blocking PRR approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added references to the opentelemetry-go issue and also added kubelet metrics that would signal issues
/assign @ehashman |
/assign |
e5cd9df
to
993ac97
Compare
@jeremyrickard @wojtek-t thanks for your review & apologies for not getting to it right away - I've updated based on your feedback, thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally LGTM.
/approve
for SIG Instrumentation, waiting on PRR confirm for final lgtm
approvers: | ||
- "@dashpole" | ||
- "@ehashman" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need someone from instrumentation to approve this. Since @dashpole is a coauthor, please leave me in as an instrumentation (rather than PRR) approver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added, done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sallyom Few more small comments from me.
- [] Integration testing of tracing | ||
- [X] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet | ||
- [X] Integration testing of tracing | ||
- _component-base tracing/api/v1 integration test_ https://github.com/kubernetes/kubernetes/blob/master/test/integration/apiserver/tracing/tracing_test.go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to the Testing section above (as part of integration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to testing section
@@ -328,7 +305,17 @@ _This section must be completed when targeting beta graduation to a release._ | |||
No impact to running workloads, logs will indicate the problem. | |||
|
|||
###### What specific metrics should inform a rollback? | |||
To be determined. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L296: Have those test been added?
FWIW - given that this isn't a persistent thing, rather in-memory switch, I would say that explicit enablement/disablement tests aren't really needed, as they won't provide visible value on top of the actual feature tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are no tests for enablement/disablement - i'll add a note there
``` | ||
|
||
* Pod Lifecycle and Kubelet [SLOs](https://github.com/kubernetes/community/tree/master/sig-scalability/slos) are the signals that should guide a rollback. In particular, the [`kubelet_pod_start_duration_seconds_count`, `kubelet_runtime_operations_errors_total`, and `kubelet_pleg_relist_interval_seconds_bucket`] would surface issues affecting kubelet performance. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L321: can you please ensure that those will happen (and you update the KEP to describe the scenarios tested and finding - please cc me on it) before the FG will be graduated to Beta ?
L346: I'm assuming this will be the OTLP metric that doesn't yet, exist, right? Or sth else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(was)L321: I added under Graduation Requirements
for Beta L245
(was)L346: yes, the OTLP metric tbd is what I am referring to & noted that L349
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Co-authored-by: Sascha Grunert <sgrunert@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
993ac97
to
a9c36f7
Compare
I'm fine with it from PRR perspective. You still need SIG-level approval though. /approve PRR |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dashpole, ehashman, sallyom, saschagrunert, wojtek-t The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Actually - I see Elana already approved it above. So lgtm-ing to ensure it will make the release. /lgtm |
updates to KEP-2831 to track connected traces in kubelet and beta graduation
Issue link: Kubelet OpenTelemetry Tracing #2831
/cc @dashpole
/cc @saschagrunert
/cc @vrutkovs