Skip to content

KEP 2258: add node log query#96120

Merged
k8s-ci-robot merged 4 commits into
kubernetes:masterfrom
LorbusChris:kubelet-journal-logs
Mar 14, 2023
Merged

KEP 2258: add node log query#96120
k8s-ci-robot merged 4 commits into
kubernetes:masterfrom
LorbusChris:kubelet-journal-logs

Conversation

@LorbusChris

@LorbusChris LorbusChris commented Nov 2, 2020

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Provide an administrator a streaming view of journal logs on Linux and WinEvent logs on Windows
without them having to implement a client side reader. Only available to cluster admins.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Upstreaming from OpenShift:

Does this PR introduce a user-facing change?:

Adds feature gate `NodeLogQuery` which provides cluster administrators with a streaming view of logs using kubectl without them having to implement a client side reader or logging into the node.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 2, 2020
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Hi @LorbusChris. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 2, 2020
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 2, 2020
@LorbusChris

Copy link
Copy Markdown
Contributor Author

/cc @smarterclayton @sttts @aravindhp

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

@LorbusChris: GitHub didn't allow me to request PR reviews from the following users: aravindhp.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @smarterclayton @sttts @aravindhp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@LorbusChris LorbusChris changed the title Expose a system journal shim on the kubelet logs endpoint on Linux and Windows kubelet: Expose a system journal shim on the logs endpoint on Linux and Windows Nov 3, 2020
@LorbusChris

Copy link
Copy Markdown
Contributor Author

Somebody please give this an ok-to-test

@rphillips

Copy link
Copy Markdown
Member

/ok-to-test
/lgtm

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 5, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2020
@rphillips

Copy link
Copy Markdown
Member

/assign @sjenning

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 10, 2020
Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logDir could be renamed to something more specific like nodeLogDir.

Comment thread pkg/kubelet/kubelet.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the constant defined for /var/log?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the 100 limit come from? Probably worth adding a comment. The systemd limit for a unit name is 256 characters https://www.freedesktop.org/software/systemd/man/systemd.unit.html#:~:text=The%20total%20length%20of%20the,service%20%22%2C%20%22%20.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the limit to 256.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a comment for the limit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the previous comment about the length, I change this to check for number of services with a constant of 4.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant for 100000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use something like https://github.com/cyphar/filepath-securejoin to prevent filepath parent walk attacks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this 7?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I missed updating this when reducing the heuristics. Fixed.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: was for found -> was found

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@aravindhp aravindhp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp thank you for the review. I have addressed your comments. Please take a look.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated
Comment on lines 62 to 87

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread pkg/kubelet/kubelet.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the limit to 256.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the previous comment about the length, I change this to check for number of services with a constant of 4.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I missed updating this when reducing the heuristics. Fixed.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 -> 256? Maybe a constant for it :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

journalctl won't return an error if we try to fetch logs for a non-existent service, hence we search for it in the list of services known to journalctl

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@aravindhp aravindhp left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrunalp I have addressed your comments. PTAL.

Comment thread pkg/kubelet/kubelet_server_journal.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

aravindhp and others added 4 commits March 14, 2023 08:45
Enable the query endpoint only if this gate is enabled in addition to
the EnableNodeLogQuery kubelet config option.
Added EnableNodeLogQuery field to kubelet/apis/config/types.go and
staging/src/k8s.io/kubelet/config/v1beta1/types.go, then executed.
 `hack/update-codegen.sh`.

This new field will default to off and will need to be explicitly
enabled in addition to the NodeLogQuery gate to use the feature.
…dpoint

Provide an administrator a streaming view of journal logs on Linux
systems using journalctl, and event logs on Windows systems using the
Get-WinEvent PowerShell cmdlet without them having to implement a client
side reader.

Only available to cluster admins.

The implementation for journald on Linux was originally done by Clayton
Coleman.

Introduce a heuristics approach to query logs

The logs query for node objects will follow a heuristics approach
when asked to query for logs from a service. If asked to get the
logs from a service foobar, it will first check if foobar logs to the
native OS service log provider. If unable to get logs from these, it
will attempt to get logs from /var/foobar, /var/log/foobar.log or
/var/log/foobar/foobar.log in that order.
The logs sub-command can also directly serve a file if the query looks
like a file.

Co-authored-by: Clayton Coleman <ccoleman@redhat.com>
Co-authored-by: Christian Glombek <cglombek@redhat.com>
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

@LorbusChris: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2 c315b5bca759bbc554a383e16e83c2b8f15e1ebf link false /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
pull-kubernetes-e2e-capz-windows-containerd d12696c link false /test pull-kubernetes-e2e-capz-windows-containerd
pull-kubernetes-e2e-gce-cos-alpha-features d12696c link false /test pull-kubernetes-e2e-gce-cos-alpha-features

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mrunalp

mrunalp commented Mar 14, 2023

Copy link
Copy Markdown
Contributor

/lgtm
/approve
We discussed in sig-node today and are okay merging this.

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: 6358ad89a82d840117d4e133309f6681e000face

@liggitt

liggitt commented Mar 14, 2023

Copy link
Copy Markdown
Member

/approve
kubelet config field / validation / defaults lgtm
approval of kubelet implementation from @mrunalp

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, LorbusChris, marosset, mrunalp, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aravindhp

Copy link
Copy Markdown
Contributor

/retest-required

@marosset

Copy link
Copy Markdown
Contributor

/test pull-kubernetes-e2e-capz-windows-containerd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Status: API review completed, 1.27
Archived in project
Archived in project
Status: Done

Development

Successfully merging this pull request may close these issues.