make all logging consistent across the controllers #1812
Conversation
nice cleanup 👍 the tests are failing because of this error:
@@ -214,7 +210,7 @@ func (s *ClusterScope) LBSpecs() []azure.LBSpec {

// RouteTableSpecs returns the node route table.
func (s *ClusterScope) RouteTableSpecs() []azure.RouteTableSpec {
-	routetables := []azure.RouteTableSpec{}
+	var routetables []azure.RouteTableSpec
Is there any practical difference in how Go treats these two constructs, or is this just for style? (I prefer the `var` style to which you changed it.)
They are subtly different: https://play.golang.org/p/Yd6jei06m9y. One is an empty slice, the other is nil.
@@ -510,12 +507,12 @@ func (m *MachinePoolScope) PatchObject(ctx context.Context) error {

// Close the MachineScope by updating the machine spec, machine status.
func (m *MachinePoolScope) Close(ctx context.Context) error {
-	ctx, _, done := tele.StartSpanWithLogger(ctx, "scope.MachinePoolScope.Close")
+	ctx, log, done := tele.StartSpanWithLogger(ctx, "scope.MachinePoolScope.Close")
Do we always want to shadow the `ctx` argument that was passed in?
Yes-ish, because the `ctx` that is returned is a new context which may now contain a correlation ID, span, and other values in the bag. We could change the name of the new context, but shadowing ensures that the original is not used below; only the new `ctx` would be used. Use of the original is in nearly all cases an error.
> Use of the original is in nearly all cases an error.

Got it. Thanks for the clarification.
@@ -1,109 +0,0 @@
-/*
Are we ceding any important coverage by deleting this (and other) logging-related unit tests?
I don't think so. The tests were not super effective, and as the original author, I'm no longer sold that they verify anything of value.
(force-pushed from 54ee02c to c05c760)
I believe I've addressed all the feedback.

I rebased these changes this morning and it looks like all the tests are green. Let me know if you folks see anything that needs to change.
@@ -76,25 +72,28 @@ func NewAzureMachineReconciler(client client.Client, log logr.Logger, recorder r

// SetupWithManager initializes this controller with a manager.
func (amr *AzureMachineReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager, options Options) error {
-	ctx, _, done := tele.StartSpanWithLogger(ctx, "controllers.AzureMachineReconciler.SetupWithManager")
+	ctx, log, done := tele.StartSpanWithLogger(ctx,
+		"controllers.AzureMachineReconciler.SetupWithManager",
Clarification: this logger previously had its name created as `ctrl.Log.WithName("controllers").WithName("AzureMachine")`. Is this okay to change now? Same for other controllers as well.
I would assume so, but it sounds like you are concerned about something specific. What concerns you about changing these?
Are we losing info on the namespace and the name of the object? I see `logger.WithValues("namespace", req.Namespace, "azureMachine", req.Name)` was removed.
> Are we losing info on the namespace and the name of the object? I see `logger.WithValues("namespace", req.Namespace, "azureMachine", req.Name)` was removed.

Indeed, that was removed, but it was added back in the `tele.KVP` attributes that are passed into `tele.StartSpanWithLogger`, which adds them as values to the composite logger.
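A toy sketch of what "adds them as values to the composite logger" means — `kvLogger` is an illustrative stand-in (similar in spirit to logr's `WithValues`), not the real `tele` package API:

```go
package main

import "fmt"

// kvLogger accumulates key/value pairs so that every subsequent log line
// carries them, mimicking how a composite span logger attaches attributes.
type kvLogger struct {
	kvs []interface{}
}

// WithValues returns a new logger with the extra pairs appended; the
// original logger is left unchanged, as with logr.
func (l kvLogger) WithValues(kvs ...interface{}) kvLogger {
	merged := append(append([]interface{}{}, l.kvs...), kvs...)
	return kvLogger{kvs: merged}
}

// Info prints the message followed by all accumulated key/value pairs.
func (l kvLogger) Info(msg string) {
	fmt.Println(append([]interface{}{msg}, l.kvs...)...)
}

func main() {
	log := kvLogger{}.WithValues("namespace", "default", "azureMachine", "my-machine")
	log.Info("reconciling") // reconciling namespace default azureMachine my-machine
}
```

So even though the explicit `WithValues` call was deleted at the call site, the same pairs still reach every log line via the attributes handed to the span helper.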
@@ -437,18 +429,21 @@ func (m *MachineScope) SetFailureReason(v capierrors.MachineStatusError) {
}

// SetBootstrapConditions sets the AzureMachine BootstrapSucceeded condition based on the extension provisioning states.
-func (m *MachineScope) SetBootstrapConditions(provisioningState string, extensionName string) error {
+func (m *MachineScope) SetBootstrapConditions(ctx context.Context, provisioningState string, extensionName string) error {
+	_, log, done := tele.StartSpanWithLogger(ctx, "scope.MachineScope.SetBootstrapConditions")
General question on the approach: All logs are now written through the span logger which also creates spans for the functions they are called in. This means we have already added spans to functions which previously didn't have one, and in future, we'll be adding more if we want to add logs. Will this create a flurry of spans and make it harder to visualize the flow?
Creating the spans and adding them to the traces in my opinion actually makes it easier to understand what is happening. If someone would like to filter spans, they are able to in the visualization or in the span data. Without producing the spans, the data would then be lost. For example, in Jaeger, I can just not expand items in the tree below a given level if I want. However, if I need to dig deeper I can. If we omit deeper span information, there is no option to dig deeper.
Additionally, exporters would likely be configured to do stochastic tracing only selecting a percentage of traces or a subset based on some criteria.
Overall, I'd bias toward over-instrumentation and giving folks the ability to reduce based on export criteria.
wdyt?
(force-pushed from a2a5a4b to cc11107)
/retest

/test pull-cluster-api-provider-azure-e2e
Given the size of the PR and the number of lines it touches, I'm going to let it merge, since this week is pretty slow and it should allow us to optimize for minimum collisions with other in-flight PRs. Most comments have been addressed, and since this is not adding any features or APIs, we can adjust and correct if needed. @shysank feel free to complete your review when you get back so any outstanding comments/questions you have can be addressed.

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: CecileRobertMichon.
/retest
@devigned: The following test failed.
/test pull-cluster-api-provider-azure-e2e

Looks like e2e timed out at 2h.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR cleans up all of the logging inconsistencies and logs everything to the span / composite logger. This PR removes all of the Logger fields and Logger embedded interfaces in scopes, reducing the scope interface footprint.
All log events are added to the appropriate spans in the distributed traces.
Which issue(s) this PR fixes: Fixes #1806
Special notes for your reviewer:
This is a terribly large PR touching almost all of the code base. I'm opening this to illustrate the scope of the changes and would be happy to break it up in whatever way makes the most sense. Please suggest.
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
TODOs:
Release note: