Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate scheduler to structured logging #105841

Closed
pchan opened this issue Oct 22, 2021 · 31 comments
Closed

Migrate scheduler to structured logging #105841

pchan opened this issue Oct 22, 2021 · 31 comments
Assignees
Labels
area/logging kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Milestone

Comments

@pchan
Copy link
Contributor

pchan commented Oct 22, 2021

What would you like to be added?

Enhance scheduler logging and address current issues.

  1. Migrate the current unstructured logs to structured logging.
    • Klog APIs of unstructured logs are found in 2 directories (cmd/kube-scheduler, pkg/scheduler)
    • Primarily files in pkg/scheduler/framework and pkg/scheduler/internal needs to be migrated. I will update this issue with the API breakdown at the file level.
  2. One improvement that would be nice to have is logging a unique key for each pod through its scheduling cycle. This was raised in Do contextual logging for scheduler #91633. Need tracing for this, will need to work with SIG instrumentation and do more analysis.

Authors

What to include in your PR

Subject: "Migrate to structured logging"
SIGs: instrumentation, node
Priority: important-soon
Kind: cleanup
HOW TO: Read the structured logging migration guide.

Please reference this issue for scheduler migrations.

Before you submit your PR, check to see if someone has already migrated that component: https://github.com/kubernetes/kubernetes/pulls?q=is%3Apr+is%3Aopen+structured+log+label%3Asig%2Fnode

Reviewers:

Read through the migration guide above. Use the PR query above to find PRs to review.

Why is this needed?

To get the benefits of structured logging.

kubernetes/enhancements#1602

/wg structured-logging
/sig scheduling
/area logging
/priority important-longterm
/kind cleanup
/cc @kubernetes/wg-structured-logging-reviews
/cc @damemi

@pchan pchan added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 22, 2021
@k8s-ci-robot k8s-ci-robot added wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. area/logging priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 22, 2021
@pchan
Copy link
Contributor Author

pchan commented Oct 22, 2021

File level breakup

File Unstructured log API Calls Assigned
cmd/kube-scheduler/app/server.go 2
pkg/scheduler/framework/plugins/nodelabel/node_label.go 1
pkg/scheduler/internal/cache/debugger/comparer.go 2
pkg/scheduler/framework/plugins/nodevolumelimits/non_csi.go 1
pkg/scheduler/framework/plugins/volumebinding/binder_test.go 2
pkg/scheduler/internal/cache/node_tree.go 4
pkg/scheduler/framework/plugins/examples/stateful/stateful.go 1
pkg/scheduler/framework/plugins/noderesources/resource_allocation.go 2
pkg/scheduler/framework/plugins/serviceaffinity/service_affinity.go 1
pkg/scheduler/framework/preemption/preemption.go 1
pkg/scheduler/internal/cache/debugger/dumper.go 3
pkg/scheduler/framework/plugins/volumebinding/assume_cache.go 14
pkg/scheduler/framework/plugins/volumebinding/binder.go 28
pkg/scheduler/framework/plugins/interpodaffinity/filtering.go 2
pkg/scheduler/framework/plugins/legacy_registry.go 4
pkg/scheduler/framework/plugins/nodepreferavoidpods/node_prefer_avoid_pods.go 1
pkg/scheduler/framework/plugins/nodevolumelimits/csi.go 1
pkg/scheduler/framework/plugins/podtopologyspread/filtering.go 4
pkg/scheduler/framework/plugins/volumezone/volume_zone.go 1
pkg/scheduler/internal/cache/cache.go 17

@shiva1333
Copy link
Contributor

Thanks @pchan,
I think it'd be better if we can divide the above files into separate groups containing approx 20-30 lines.
It eases the migration process by helping contributors to pick different groups.

@shiva1333
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 22, 2021
@shiva1333
Copy link
Contributor

/sig instrumentation
/cc @serathius
/milestone v1.23
/priority important-soon

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 22, 2021
@k8s-ci-robot
Copy link
Contributor

@shivanshu1333: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility.

In response to this:

/sig instrumentation
/cc @serathius
/milestone v1.23
/priority important-soon

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@serathius
Copy link
Contributor

/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 22, 2021
@pohly
Copy link
Contributor

pohly commented Oct 22, 2021

One improvement that would be nice to have is logging a unique key for each pod through its scheduling cycle. [...] Need tracing for this [, will need to work with SIG instrumentation and do more analysis.

No, this is unrelated to tracing. What we need is contextual logging:

logger = logger.WithValue("pod", KObj(pod))

logger.Info("some operation")

=> "some operation" pod=...

This cannot be done without passing the logger instance with the additional value through the entire call chain where log messages are emitted as part of scheduling. In other words, it cannot be done with the current global logger in klog.

I am working on a KEP for this.

@shiva1333
Copy link
Contributor

Yes, @pohly is on point, we need to have contextual logging for this.
Thanks pohly, for initiating the work required to come up with a KEP for contextual logging. Sooner or later, we need to work on contextual logging, and it's good to start it now.

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 23, 2021

Maybe I can help to migrate the pkg/scheduler/framework/plugins/volumebinding/binder.go and pkg/scheduler/framework/plugins/volumebinding/assume_cache.go, can you assign it to me?

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 23, 2021

BTW, I think we can divide it into 4 part as @shivanshu1333 said. I have tried to divide it into 4 parts:

     
File Unstructured log API Calls Assigned PR Status
cmd/kube-scheduler/app/server.go 2  @shivanshu1333 #105855
pkg/scheduler/framework/plugins/nodelabel/node_label.go 1  @shivanshu1333 #105855
pkg/scheduler/framework/plugins/nodevolumelimits/non_csi.go 1  @shivanshu1333 #105855
pkg/scheduler/framework/plugins/nodevolumelimits/csi.go 1  @shivanshu1333 #105855
pkg/scheduler/framework/plugins/examples/stateful/stateful.go 1  @shivanshu1333 #105967
pkg/scheduler/framework/plugins/noderesources/resource_allocation.go 2  @shivanshu1333 #105967
pkg/scheduler/framework/preemption/preemption.go 1  @shivanshu1333 #105967
     
pkg/scheduler/internal/cache/debugger/dumper.go 3  @shivanshu1333 #105968
pkg/scheduler/internal/cache/debugger/comparer.go 2  @shivanshu1333 #105968
pkg/scheduler/internal/cache/cache.go 17  @shivanshu1333 #105969
pkg/scheduler/internal/cache/node_tree.go 4  @shivanshu1333 #105968
     
pkg/scheduler/framework/plugins/volumebinding/binder_test.go 2  @jyz0309 #105858
pkg/scheduler/framework/plugins/volumebinding/binder.go 28  @jyz0309 #105858
     
pkg/scheduler/framework/plugins/volumebinding/assume_cache.go 14  @mengjiao-liu #105904
pkg/scheduler/framework/plugins/interpodaffinity/filtering.go 2  @mengjiao-liu #105931
pkg/scheduler/framework/plugins/podtopologyspread/filtering.go 4  @mengjiao-liu #105931
pkg/scheduler/framework/plugins/volumezone/volume_zone.go 1  @mengjiao-liu #105931

@shiva1333
Copy link
Contributor

Yes, @jyz0309 the partition looks good, yes you can pick the part 3 ☺️

@shiva1333
Copy link
Contributor

@pchan as part of migration let's break the table into 4 parts as suggested above.
Kindly assign Part 3 to @jyz0309

@shiva1333
Copy link
Contributor

I'm happy to migrate Part 1 and Part 2, you can assign that to me ☺️
/assign

@shiva1333
Copy link
Contributor

A note to contributors:
While migrating the above files, you may encounter that some of the parts of the mentioned files were already migrated. Kindly look into them as well, as there are many instances where the logs were migrated incorrectly previously.
cc: @jyz0309

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 23, 2021

/assign

@pchan
Copy link
Contributor Author

pchan commented Oct 25, 2021

@shivanshu1333 and others Thank you. I initially thought it should be divided by component (framework, internal) but I guess even this division will work. Please let me know if it makes sense to add another column in the table, that tracks migration status (link to issue). I will make sure to review the changes.

@shiva1333
Copy link
Contributor

Yes, it's good to have a column tracking the PR status🙂

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 25, 2021

I will add a column to track the PR~

@mengjiao-liu
Copy link
Member

mengjiao-liu commented Oct 27, 2021

The following files have been removed and no longer need to be traced.

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 27, 2021

The following files have been removed and no longer need to be traced.

Have deleted it.

@shiva1333
Copy link
Contributor

pkg/scheduler/framework/plugins/serviceaffinity/service_affinity.go is deleted

@jyz0309
Copy link
Contributor

jyz0309 commented Oct 28, 2021

All of the PRs are out 🎉

@shiva1333
Copy link
Contributor

Complete migration of scheduler depends on resolution of ((106258 or 106276) and (106262))
We still need to migrate pkg/scheduler/internal/cache/debugger/dumper.go (2 lines (49 and 60)) and cmd/kube-scheduler/app/options/configfile.go (1 line (96)) after resolving the above issues, and update the hack/.structured_logging indicating complete migration of scheduler.

@jyotimahapatra
Copy link
Contributor

Hi 👋 Im the bug triage shadow. Since we're in code freeze for 1.23, i wanted to check if we can move this issue to milestone 1.24.

@serathius
Copy link
Contributor

@jyotimahapatra I think that this issue is just blocked on #106305

@pohly
Copy link
Contributor

pohly commented Nov 17, 2021

@serathius
Copy link
Contributor

Ok, only #106305 will go in 1.23 as we just need approval.
/milestone v1.24

@doshyt
Copy link

doshyt commented Jan 29, 2022

Hi all, Bug triage for 1.24 here. Wanted to check what's the expectation for this issue w.r.t 1.24? Is it still relevant and planned to be completed? Also, I see quite a few dependencies in its history - would be great to get an updated summary of blockers if any, and see whether 1.24 is still a relevant target. Thanks!

@pohly
Copy link
Contributor

pohly commented Mar 23, 2022

#108159 should finish the conversion.

@liggitt
Copy link
Member

liggitt commented Mar 25, 2022

fixed by #108159

@liggitt liggitt closed this as completed Mar 25, 2022
WG Structured Logging - 1.23 Migration automation moved this from Inbox to Done Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/logging kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Development

No branches or pull requests

10 participants