Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Contextual logging in the apiserver POC #114198

Closed
wants to merge 3 commits into from

Conversation

tallclair
Copy link
Member

This PR is illustrating how the apiserver can make use of a contextual logger (KEP 3077).

In particular, attaching an auditID to log messages will ensure that apiserver logs in the request path always have sufficient context to tie them to a specific request. See previous discussion from SIG api-machinery - Nov. 16.

This PR contains 3 commits:

  1. [apiserver] Add a contextual logger with auditID - demonstrates how to attach key/value pairs to a context logger. This should be used very sparingly (auditID might be the only valid use case at the top level).
  2. Example: convert structured logs to contextual logging - demonstrates how to convert a structured log line to a contextual logger when a context is available. The format of this log is unchanged, except it will include the auditID from above.
  3. Example: convert unstructured logs to contextual logging - This commit illustrates 3 different options for how we may want to handle unstructured logs in the initial release.

Plumbing a context through to everywhere a log is recorded is out of scope.

/cc @lavalamp @deads2k @pohly @enj
/sig api-machinery
/area apiserver logging

@tallclair tallclair added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2022
@k8s-ci-robot
Copy link
Contributor

@tallclair: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. area/apiserver labels Nov 29, 2022
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.26 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.26.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Nov 29 21:40:19 UTC 2022.

@k8s-ci-robot k8s-ci-robot added area/logging cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 29, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tallclair
Once this PR has been reviewed and has the lgtm label, please assign lavalamp for approval by writing /assign @lavalamp in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Nov 29, 2022
@k8s-ci-robot
Copy link
Contributor

@tallclair: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-verify a04a74b link true /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -49,6 +49,10 @@ func withAuditInit(handler http.Handler, newAuditIDFunc func() string) http.Hand
// Note: we save the user specified value of the Audit-ID header as is, no truncation is performed.
audit.WithAuditID(ctx, types.UID(auditID))

logger := klog.FromContext(ctx).WithValues("auditID", auditID)
ctx = klog.NewContext(ctx, logger)
r = r.WithContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on making this step unnecessary in go-logr/logr#158

The exact solution is still unknown, but it should become faster than modifying the logger and adding it to the context.

Do you have benchmarks that could be run to measure the performance impact of these changes?

@fedebongio
Copy link
Contributor

/cc @serathius @logicalhan

@fedebongio
Copy link
Contributor

We reviewed this one in the SIG meeting yesterday, if you need help assigning to someone let me know.
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 1, 2022
logger.Error(nil, fmt.Sprintf("Dropping mismatched audit annotation %q", keysAndValues[len(keysAndValues)-1]))

// Option 3: Rewrite as structured logs
logger.Error(nil, "Dropping mismatched audit annotation", "annotation", keysAndValues[len(keysAndValues)-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my structured logging WG lead hat on I have to vote for option 3 😁

The resulting messages will be different, but still have the same information. If you stay consistent, then searching for key/value pairs becomes easier, even when using the text output format.

When using JSON, the overhead for writing log entries is lower because zap is more optimized than klog.

@tkashem
Copy link
Contributor

tkashem commented Dec 16, 2022

/cc

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tallclair tallclair marked this pull request as draft January 27, 2023 22:32
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2023
@enj
Copy link
Member

enj commented Mar 20, 2023

@tallclair @pohly @fedebongio what are the next steps here?

@pohly
Copy link
Contributor

pohly commented Mar 21, 2023

Go is currently about to add slog as structured and not-quite-yet contextual logging API. I am in discussion with @thockin how can leverage that work in Kubernetes - see https://kubernetes.slack.com/archives/C020CCMUEAX/p1679169826926789

Let's wait how that turns out before coming back to this PR.

@tallclair
Copy link
Member Author

Yeah, sorry for the lack of updates. I decided that there was enough in flux in the contextual logging space that it didn't make sense to try and get this into v1.27. Hopefully someone can pick it up in v1.28 (I'll be on 👶 leave), or I can follow up in v1.29

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2023
@pohly
Copy link
Contributor

pohly commented Jun 21, 2023

/remove-lifecycle stale

This needs to be picked up at some point, but still seems relevant, so let's keep it open.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 21, 2023
@dims
Copy link
Member

dims commented Oct 24, 2023

This work-in-progress PR needs a rebase. please rebase if this PR is still needed for the upcoming release.

@pohly
Copy link
Contributor

pohly commented Oct 25, 2023

The relationship with slog has been clarified. It doesn't support contextual logging, so nothing changes in Kubernetes: we continue to use logr as main API, with the ability to write to a slog.Handler if some application uses that.

I tried promoting contextual logging to beta but that raised concerns about potential performance impacts when enabled by default, so in the end a decision about beta got postponed.

It would be good to pick up this PR again for 1.30 and reach some conclusion whether you want contextual logging to be enabled by default. One proposal on Slack was to make it an opt-in feature, so users wouldn't get it unless they change command line flags or configs of each component where they want it.

@tallclair
Copy link
Member Author

It would be good to pick up this PR again for 1.30 and reach some conclusion whether you want contextual logging to be enabled by default. One proposal on Slack was to make it an opt-in feature, so users wouldn't get it unless they change command line flags or configs of each component where they want it.

@pohly LMK if there's a better place to comment on this, but I do not think making it optional is a good idea. The value of the contextual logger is making it easier to plumb through context to the logging sites. If contextual logging is optional, you either end up with logs missing key context, or you have to plumb that context through manually which defeats the purpose. Am I missing something?

@pohly
Copy link
Contributor

pohly commented Nov 13, 2023

@tallclair: I agree with you.

A better place to comment on promotion to beta (with or without a new config option) would be in kubernetes/enhancements#4219.

@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 4, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/logging cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

8 participants