Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2831: adding beta graduation criteria #3714

Merged
merged 7 commits into from Feb 9, 2023

Conversation

sallyom
Copy link
Contributor

@sallyom sallyom commented Jan 9, 2023

/cc @dashpole
/cc @saschagrunert
/cc @vrutkovs

@k8s-ci-robot
Copy link
Contributor

@sallyom: GitHub didn't allow me to request PR reviews from the following users: vrutkovs.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @dashpole
/cc @saschagrunert
/cc @vrutkovs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 9, 2023
@sallyom sallyom mentioned this pull request Jan 9, 2023
12 tasks
@sallyom sallyom force-pushed the kep2831-update-1.27 branch 2 times, most recently from bfaa6dc to dd17862 Compare January 9, 2023 16:05
Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

(i'll leave lgtm to others involved, but this lgtm)

@saschagrunert
Copy link
Member

@sallyom wanna give this one another update? :)

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 24, 2023
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 26, 2023
@wojtek-t
Copy link
Member

In my queue - just waiting for PRR shadow reviewer to take a look first :)

@jeremyrickard
Copy link
Contributor

jeremyrickard commented Feb 2, 2023

👋 I'm shadowing PRR review this time around and gave this a review. It doesn't look like this PR has updated the PRR questionare (which was pretty complete for the alpha stage). Since this is proposing to graduate to beta, have you given any additional thought to the metrics that might inform a rollback or monitoring requirements? With a bad TracingConfiguration, I expect users will things in the kubelet logs to help correct that configuration, but are there any other changes expected as part of this that might introduce better ways to monitor/troubleshoot this feature or understand if they are achieving the SLO (99% of spans delivered)? Not sure if this something that can be done (maybe ref: open-telemetry/opentelemetry-go#2547) , but I think it would be useful to add some clarity around this with an update.

Was also wondering if you added unit tests for the feature gate when this was implemented in alpha? If so, could you link those in an update to the PRR questionnaire?

- A test with kubelet-tracing & apiserver-tracing enabled to ensure no issues are introduced, regardless
of whether a tracing backend is configured.

### Graduation Requirements

Alpha

- [] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet
- [] Integration testing of tracing
- [X] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also take a look a @jeremyrickard comment here:
#3714

Filling in the remaining part of monitoring sections is the main callout there (I would have the same) and it will be blocking PRR approval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added references to the opentelemetry-go issue and also added kubelet metrics that would signal issues

@logicalhan
Copy link
Member

/assign @ehashman

@logicalhan
Copy link
Member

/assign

@sallyom
Copy link
Contributor Author

sallyom commented Feb 3, 2023

@jeremyrickard @wojtek-t thanks for your review & apologies for not getting to it right away - I've updated based on your feedback, thank you.

Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally LGTM.

/approve
for SIG Instrumentation, waiting on PRR confirm for final lgtm

approvers:
- "@dashpole"
- "@ehashman"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need someone from instrumentation to approve this. Since @dashpole is a coauthor, please leave me in as an instrumentation (rather than PRR) approver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do, thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, done

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sallyom Few more small comments from me.

- [] Integration testing of tracing
- [X] Implement tracing of incoming and outgoing gRPC, HTTP requests in the kubelet
- [X] Integration testing of tracing
- _component-base tracing/api/v1 integration test_ https://github.com/kubernetes/kubernetes/blob/master/test/integration/apiserver/tracing/tracing_test.go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to the Testing section above (as part of integration).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to testing section

@@ -328,7 +305,17 @@ _This section must be completed when targeting beta graduation to a release._
No impact to running workloads, logs will indicate the problem.

###### What specific metrics should inform a rollback?
To be determined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L296: Have those test been added?

FWIW - given that this isn't a persistent thing, rather in-memory switch, I would say that explicit enablement/disablement tests aren't really needed, as they won't provide visible value on top of the actual feature tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no tests for enablement/disablement - i'll add a note there

```

* Pod Lifecycle and Kubelet [SLOs](https://github.com/kubernetes/community/tree/master/sig-scalability/slos) are the signals that should guide a rollback. In particular, the [`kubelet_pod_start_duration_seconds_count`, `kubelet_runtime_operations_errors_total`, and `kubelet_pleg_relist_interval_seconds_bucket`] would surface issues affecting kubelet performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L321: can you please ensure that those will happen (and you update the KEP to describe the scenarios tested and finding - please cc me on it) before the FG will be graduated to Beta ?

L346: I'm assuming this will be the OTLP metric that doesn't yet, exist, right? Or sth else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(was)L321: I added under Graduation Requirements for Beta L245
(was)L346: yes, the OTLP metric tbd is what I am referring to & noted that L349

@Atharva-Shinde Atharva-Shinde mentioned this pull request Feb 8, 2023
4 tasks
sallyom and others added 7 commits February 8, 2023 20:01
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Co-authored-by: Sascha Grunert <sgrunert@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
Signed-off-by: Sally O'Malley <somalley@redhat.com>
@sallyom
Copy link
Contributor Author

sallyom commented Feb 9, 2023

@wojtek-t @ehashman updated based on the latest review/feedback - thank you again

@wojtek-t
Copy link
Member

wojtek-t commented Feb 9, 2023

I'm fine with it from PRR perspective. You still need SIG-level approval though.

/approve PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, ehashman, sallyom, saschagrunert, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2023
@wojtek-t
Copy link
Member

wojtek-t commented Feb 9, 2023

Actually - I see Elana already approved it above. So lgtm-ing to ensure it will make the release.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit 33f7b95 into kubernetes:master Feb 9, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants