Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make all logging consistent across the controllers #1812

Merged
merged 1 commit into from
Dec 7, 2021

Conversation

devigned
Copy link
Contributor

@devigned devigned commented Oct 29, 2021

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This PR cleans up all of the logging inconsistencies and logs everything to the span / composite logger. This PR removes all of the Logger fields and Logger embedded interfaces in scopes, reducing the scope interface footprint.

All log events are added to the appropriate spans in the distributed traces.

Which issue(s) this PR fixes: Fixes #1806

Special notes for your reviewer:
This is a terribly large PR touching almost all of the code base. I'm opening this to illustrate the scope of the changes and would be happy to break it up in whatever way makes the most sense. Please suggest.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

make all logging consistent across the controller

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 29, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Oct 29, 2021
@nader-ziada
Copy link
Contributor

nice cleanup 👍

the tests are failing cause of this error:

controllers/azurecluster_controller.go:75:2: ineffectual assignment to ctx (ineffassign)
	ctx, log, done := tele.StartSpanWithLogger(ctx,
	^
controllers/azureidentity_controller.go:56:2: ineffectual assignment to ctx (ineffassign)
	ctx, log, done := tele.StartSpanWithLogger(ctx,

azure/scope/cluster.go Outdated Show resolved Hide resolved
@@ -214,7 +210,7 @@ func (s *ClusterScope) LBSpecs() []azure.LBSpec {

// RouteTableSpecs returns the node route table.
func (s *ClusterScope) RouteTableSpecs() []azure.RouteTableSpec {
routetables := []azure.RouteTableSpec{}
var routetables []azure.RouteTableSpec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any practical difference in how Go treats these two constructs, or is this just for style? (I prefer the var style to which you changed it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are subtly different: https://play.golang.org/p/Yd6jei06m9y. One is an empty slice, the other is nil.

azure/scope/cluster.go Outdated Show resolved Hide resolved
azure/scope/machinepool.go Outdated Show resolved Hide resolved
@@ -510,12 +507,12 @@ func (m *MachinePoolScope) PatchObject(ctx context.Context) error {

// Close the MachineScope by updating the machine spec, machine status.
func (m *MachinePoolScope) Close(ctx context.Context) error {
ctx, _, done := tele.StartSpanWithLogger(ctx, "scope.MachinePoolScope.Close")
ctx, log, done := tele.StartSpanWithLogger(ctx, "scope.MachinePoolScope.Close")
Copy link
Contributor

@mboersma mboersma Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want to shadow the ctx argument that was passed in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes'ish, because the ctx that is returned is a new ctx struct which may now contain a correlationID, span, and other values in the bag. We could change the name of the new context, but this ensure that the original is not used below. Only the new ctx would be used. Use of the original is in nearly all cases an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of the original is in nearly all cases an error.

Got it. Thanks for the clarification.

controllers/azureidentity_controller.go Outdated Show resolved Hide resolved
@@ -1,109 +0,0 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ceding any important coverage by deleting this (and other) logging-related unit tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The tests were not super effective, and as the original author, I'm no longer sold on the fact they verify anything of value.

util/tele/composite_logger.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 19, 2021
@devigned devigned changed the title make all logging consistent across the controller make all logging consistent across the controllers Nov 19, 2021
@devigned devigned force-pushed the mc-logs branch 4 times, most recently from 54ee02c to c05c760 Compare November 19, 2021 21:07
@devigned
Copy link
Contributor Author

devigned commented Nov 19, 2021

I believe I've addressed all the feedback.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2021
@devigned
Copy link
Contributor Author

I rebased these changes this morning and looks like all the tests are green. Let me know if you folks see anything that needs to change.

main.go Outdated Show resolved Hide resolved
azure/scope/cluster.go Outdated Show resolved Hide resolved
azure/services/scalesetvms/scalesetvms.go Outdated Show resolved Hide resolved
@@ -76,25 +72,28 @@ func NewAzureMachineReconciler(client client.Client, log logr.Logger, recorder r

// SetupWithManager initializes this controller with a manager.
func (amr *AzureMachineReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager, options Options) error {
ctx, _, done := tele.StartSpanWithLogger(ctx, "controllers.AzureMachineReconciler.SetupWithManager")
ctx, log, done := tele.StartSpanWithLogger(ctx,
"controllers.AzureMachineReconciler.SetupWithManager",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarification:
This logger previously had the name created asctrl.Log.WithName("controllers").WithName("AzureMachine"). Is this okay to change now? Same for other controllers as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume so, but it sounds like you are concerned about something specific. What concerns you about changing these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we loosing info on the namespace and the name of the object? I see logger.WithValues("namespace", req.Namespace, "azureMachine", req.Name) was removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we loosing info on the namespace and the name of the object? I see logger.WithValues("namespace", req.Namespace, "azureMachine", req.Name) was removed

Indeed, that was removed, but it was added back in the tele.KVP attributes that are passed into tele.StartSpanWithLogger, which adds them as values to the composite logger.

@@ -437,18 +429,21 @@ func (m *MachineScope) SetFailureReason(v capierrors.MachineStatusError) {
}

// SetBootstrapConditions sets the AzureMachine BootstrapSucceeded condition based on the extension provisioning states.
func (m *MachineScope) SetBootstrapConditions(provisioningState string, extensionName string) error {
func (m *MachineScope) SetBootstrapConditions(ctx context.Context, provisioningState string, extensionName string) error {
_, log, done := tele.StartSpanWithLogger(ctx, "scope.MachineScope.SetBootstrapConditions")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question on the approach: All logs are now written through the span logger which also creates spans for the functions they are called in. This means we have already added spans to functions which previously didn't have one, and in future, we'll be adding more if we want to add logs. Will this create a flurry of spans and make it harder to visualize the flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the spans and adding them to the traces in my opinion actually makes it easier to understand what is happening. If someone would like to filter spans, they are able to in the visualization or in the span data. Without producing the spans, the data would then be lost. For example, in Jaeger, I can just not expand items in the tree below a given level if I want. However, if I need to dig deeper I can. If we omit deeper span information, there is no option to dig deeper.

Additionally, exporters would likely be configured to do stochastic tracing only selecting a percentage of traces or a subset based on some criteria.

Overall, I'd bias toward over instrumentation and giving folks the ability to reduce based on export criteria.

wdyt?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 30, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 3, 2021
@devigned devigned force-pushed the mc-logs branch 2 times, most recently from a2a5a4b to cc11107 Compare December 3, 2021 20:54
@devigned
Copy link
Contributor Author

devigned commented Dec 3, 2021

/retest

@devigned
Copy link
Contributor Author

devigned commented Dec 6, 2021

/test pull-cluster-api-provider-azure-e2e

@CecileRobertMichon
Copy link
Contributor

Given the size of the PR and the number of lines that it touches, I'm going to let it merge since this week is pretty slow and it should allow us to optimize for minimum collisions with other in-flight PRs. Most comments have been addressed and since this is not adding any features or APIs we can adjust and correct if needed. @shysank feel free to complete your review when you get back so any outstanding comments/questions you have can be addressed.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 7, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2021
@CecileRobertMichon
Copy link
Contributor

/retest

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Dec 7, 2021

@devigned: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff 1e97558 link false /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@devigned
Copy link
Contributor Author

devigned commented Dec 7, 2021

/test pull-cluster-api-provider-azure-e2e

looks like e2e timed out 2h

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistent Logging
6 participants