component-helpers: Support structured and contextual logging #120637

@bells17 bells17 commented Sep 13, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implement support for structured logging and contextual logging in component-helpers.
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md

I've unified the implementation to use contextual logging, as some parts of component-helpers were already using contextual logging.

Which issue(s) this PR fixes:

Fixes #120638

Special notes for your reviewer:

I've confirmed that the changes pass the logcheck checks:

$ GOOS=linux _output/local/bin/golangci-lint run --color=always --config=hack/golangci.yaml ./...

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. labels Sep 13, 2023
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 13, 2023
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 13, 2023
@bells17 bells17 marked this pull request as ready for review September 13, 2023 17:59
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 13, 2023
@bells17 bells17 force-pushed the component-helpers-structured-and-contextual-logging branch from def55dc to 196cf2a Compare September 13, 2023 18:34
@mengjiao-liu
Member

/wg structured-logging
/area logging

/assign @pohly

@k8s-ci-robot k8s-ci-robot added wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. area/logging labels Sep 14, 2023
@leilajal
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 14, 2023
@bart0sh bart0sh added this to Triage in SIG Node PR Triage Sep 15, 2023
@bart0sh
Contributor

bart0sh commented Sep 15, 2023

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 15, 2023
@@ -28,7 +28,7 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/client-go/informers"
 	"k8s.io/client-go/kubernetes/fake"
-	"k8s.io/klog/v2/ktesting"
+	"k8s.io/kubernetes/test/utils/ktesting"
Contributor

There's no need for this change, can you remove it?

Otherwise it would be better to use tCtx instead of ctx below to indicate that k/k ktesting (in contrast to klog ktesting) returns more than just a context.Context.

@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 21, 2024

@bells17: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-conformance-kind-ipv6-parallel | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-conformance-kind-ipv6-parallel` |
| pull-kubernetes-conformance-image-test | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-conformance-image-test` |
| pull-kubernetes-e2e-gce-network-proxy-http-connect | 66083696a6fb01d03726a3d77b995f26119de6ad | link | true | `/test pull-kubernetes-e2e-gce-network-proxy-http-connect` |
| pull-kubernetes-e2e-gce-cos-alpha-features | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gce-cos-alpha-features` |
| pull-kubernetes-e2e-capz-windows-master | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-capz-windows-master` |
| pull-kubernetes-e2e-gci-gce-ingress | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gci-gce-ingress` |
| pull-kubernetes-e2e-gce-storage-snapshot | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gce-storage-snapshot` |
| check-dependency-stats | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test check-dependency-stats` |
| pull-kubernetes-e2e-kind-kms | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-kind-kms` |
| pull-kubernetes-e2e-ubuntu-gce-network-policies | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-ubuntu-gce-network-policies` |
| pull-kubernetes-e2e-gce-csi-serial | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gce-csi-serial` |
| pull-kubernetes-e2e-gce-storage-slow | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gce-storage-slow` |
| pull-publishing-bot-validate | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-publishing-bot-validate` |
| pull-kubernetes-cross | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-cross` |
| pull-kubernetes-e2e-gci-gce-ipvs | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gci-gce-ipvs` |
| pull-kubernetes-e2e-storage-kind-disruptive | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-storage-kind-disruptive` |
| pull-kubernetes-e2e-gce-network-proxy-grpc | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-e2e-gce-network-proxy-grpc` |
| pull-kubernetes-kind-dra | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-kind-dra` |
| pull-kubernetes-node-e2e-crio-dra | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-node-e2e-crio-dra` |
| pull-kubernetes-node-e2e-containerd-1-7-dra | 66083696a6fb01d03726a3d77b995f26119de6ad | link | false | `/test pull-kubernetes-node-e2e-containerd-1-7-dra` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@bells17 bells17 requested a review from pohly April 21, 2024 11:50
@bells17
Contributor Author

bells17 commented Apr 21, 2024

@pohly I have made the necessary corrections. Could you please review them again?

Contributor

@pohly pohly left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: ec6481fde3a099229eb832ce2697285d60c76ac4

@pohly
Contributor

pohly commented Apr 22, 2024

/label tide/merge-method-squash

@bells17: should you need to update the PR again, then feel free to squash manually with a single, clean commit message.

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 22, 2024
@@ -750,7 +750,7 @@ func nodeAddressesChangeDetected(addressSet1, addressSet2 []v1.NodeAddress) bool
 	return false
 }

-func updateNodeAddressesFromNodeIP(node *v1.Node, nodeAddresses []v1.NodeAddress) ([]v1.NodeAddress, error) {
+func updateNodeAddressesFromNodeIP(ctx context.Context, node *v1.Node, nodeAddresses []v1.NodeAddress) ([]v1.NodeAddress, error) {
Member

where is this context used?

Contributor

Not used (anymore?).

@bells17: can you remove the parameter? Please squash (but don't rebase!) during your next push.

Contributor Author

> where is this context used?
>
> Not used (anymore?)

Thank you. This change was unnecessary, so I removed it.

Contributor Author

@pohly I have squashed the commits into one. I followed the steps below, but please let me know if there are any issues with the procedure.

$ git rebase -i HEAD~6
$ git push origin component-helpers-structured-and-contextual-logging --force

@bells17 bells17 force-pushed the component-helpers-structured-and-contextual-logging branch from 2576586 to 5a3fd49 Compare April 22, 2024 07:58
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2024
-// cloudProvider is unset or `"external"`.
-func ParseNodeIPArgument(nodeIP, cloudProvider string) ([]net.IP, error) {
+// ParseNodeIPArgument parses kubelet's --node-ip argument.
+// If nodeIP contains invalid values, they will be returned as strings.
Member

This API is now really weird. I assume the point is to avoid logging inside a library, logging that IIRC we have to do because of the sloppy IPs thing. It is only used in this file and in cmd/kubelet, so I think it is not that bad ... @danwinship WDYT?

Contributor

I'm to blame for the API, see #120637 (comment) for reference 😅

IMHO a parser function that logs to inform the user is better avoided. Whether and how the user should be informed depends on the context in which the parsing happens, and the function doesn't know that context.

Contributor

I added parseNodeIP originally but I feel like sig-node owns it now so it's more up to them.

The logging-in-the-middle-of-a-helper-function was just to preserve the original behavior where it used to be logging directly from cmd/kubelet/app/server.go, and probably I wouldn't have done it that way if I'd had to think about contexts/loggers at the time.

One possibility would be to not change the API of parseNodeIPs, and just have RunKubelet check if len(nodeIPs) == len(strings.Split(kubeServer.NodeIP, ",")) and warn that "some values" were invalid and were ignored if not.

Contributor

or return (nodeIPs []net.IP, warning, err error) or (nodeIPs []net.IP, fatal bool, err error) (I feel like "fatal" should come after "err" there except that "err" should always be last...)

I was just dealing with the same thing in another PR. Maybe we need a standard convention for functions that want to emit/return both warnings and errors?

Contributor

@pohly pohly Apr 22, 2024

Returning warning (presumably a string) and a non-fatal pre-formatted error in the second case both have the problem that they use unstructured formatting (i.e. some flavor of fmt.Sprintf) of a message. That might not work well when the caller uses structured logging. I prefer returning just the information about a problem and leaving the formatting of that to the caller.

len(nodeIPs) == len(strings.Split(kubeServer.NodeIP, ",")) feels like it defeats the purpose of abstracting the parsing in a helper function.

> I added parseNodeIP originally but I feel like sig-node owns it now so it's more up to them.

They'll probably just defer to you 😜

Do you want to pull someone else in or do you feel confident going ahead with it as-is and approving it?

Contributor

> Maybe we need a standard convention for functions that want to emit/return both warnings and errors?

Defining conventions is hard, ensuring that fellow developers know about them and follow them is even harder. I suspect unless Go itself defines something, we won't have a chance to establish this on our own for Kubernetes.

Contributor

@mrunalp: you reviewed this earlier for SIG Node. Any opinion ("I feel like sig-node owns it now so it's more up to them")? ^^^

Contributor

> Do you want to pull someone else in or do you feel confident going ahead with it as-is and approving it?

I don't have approver bits on those directories anyway...

> I prefer returning just the information about a problem and leaving the formatting of that to the caller.

I guess that since the semantics of the function are essentially "parse the valid IPs but ignore the invalid ones for backward-compatibility reasons", then returning a list of the invalid IPs does make sense.
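To make the agreed shape concrete, here is a minimal sketch of a parser that returns the invalid values and leaves reporting to the caller. `parseNodeIPs` is a hypothetical, simplified stand-in, not the actual `ParseNodeIPArgument`: it uses `net.ParseIP` rather than the sloppy parsing mentioned above, ignores `cloudProvider`, and omits the error return.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseNodeIPs is a hypothetical, simplified helper. It returns the valid
// IPs plus the raw strings that failed to parse, and does no logging itself.
func parseNodeIPs(nodeIP string) (ips []net.IP, invalid []string) {
	for _, field := range strings.Split(nodeIP, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		if ip := net.ParseIP(field); ip != nil {
			ips = append(ips, ip)
		} else {
			// Keep the raw value so the caller can report it as data,
			// not as a pre-formatted message.
			invalid = append(invalid, field)
		}
	}
	return ips, invalid
}

func main() {
	ips, invalid := parseNodeIPs("10.0.0.1, bogus, ::1")
	// The caller decides whether and how to report the problem. With a
	// structured logger this would be a key/value pair such as
	// "invalidIPs", invalid; fmt is used here only to stay dependency-free.
	if len(invalid) > 0 {
		fmt.Printf("ignoring invalid node IPs: %q\n", invalid)
	}
	fmt.Println("node IPs:", ips)
}
```

Because the helper returns a `[]string` instead of a warning message, the caller can pass the values to a structured logger unchanged, which is the concern raised earlier in this thread.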

Contributor

> I don't have approver bits on those directories anyway...

@mrunalp already approved earlier. What I need from you (or someone from SIG Networking) is approval for pkg/controller/nodeipam/ipam/OWNERS - I know, it's complicated 😢

@aojea
Member

aojea commented Apr 22, 2024

LGTM for the networking bits, with just one question on the ParseNodeIPArgument function that I'd prefer @danwinship to take a look at, as he has more context

@danwinship
Contributor

/approve
for nodeipam changes

@wojtek-t
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c18b444697d6bdc0834e3912978b42329d5ebbd5

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bells17, danwinship, mrunalp, pohly, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2024
@k8s-ci-robot k8s-ci-robot merged commit 1c917aa into kubernetes:master Apr 24, 2024
15 checks passed
SIG Node CI/Test Board automation moved this from Issues - In progress to Done Apr 24, 2024
SIG Node PR Triage automation moved this from not-sig-node to Done Apr 24, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Apr 24, 2024
@bells17 bells17 deleted the component-helpers-structured-and-contextual-logging branch April 24, 2024 11:01
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/code-generation area/conformance Issues or PRs related to kubernetes conformance tests area/dependency Issues or PRs related to dependency changes area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/ipvs area/kube-proxy area/kubectl area/kubelet area/logging area/network-policy Issues or PRs related to Network Policy subproject area/provider/gcp Issues or PRs related to gcp provider area/release-eng Issues or PRs related to the Release Engineering subproject area/stable-metrics Issues or PRs involving stable metrics area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. 
sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Development

Successfully merging this pull request may close these issues.

component-helpers: Support structured and contextual logging