Skip to content

DelegatingLogger reports incorrect source lines when zap.AddCaller() option is used #1737

@dobsonj

Description

@dobsonj

I noticed an issue with logging in controller-runtime when source line logging is enabled via the zap.AddCaller() option.

For context, I was debugging local-storage-operator, with the following options passed into SetLogger:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L79-L87

	opts := zap.Options{
		Development: true,
		ZapOpts:     []zaplog.Option{zaplog.AddCaller()},
	}

	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

During startup, it prints log messages like this, attributing them to controller/controller.go:221

2021-12-02T22:17:24.408Z        INFO    controller-runtime.manager.controller.localvolume       controller/controller.go:221    Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}
2021-12-02T22:17:24.408Z        INFO    controller-runtime.manager.controller.localvolume       controller/controller.go:221    Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}

The problem is that line 221 is not the line where the log message is:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L221

That is the parent function of the line where the actual log message is:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L165

So I would expect the log messages to report controller/controller.go:165 instead of controller/controller.go:221 when the AddCaller option is present. This is one example, but the problem applies to other log messages as well.

The logr.Logger object is setting callerSkip to 2... so I added debug.PrintStack to AddCallerSkip(), and these were the places where it was provided a non-zero argument:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234

goroutine 1 [running]:
runtime/debug.Stack(0x1ed08f0, 0xc0002fddd0, 0xc000590d00)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
        /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0x1ec45e0, 0xc000010020)
        /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
sigs.k8s.io/controller-runtime/pkg/log/zap.NewRaw(0xc00020bcf8, 0x1, 0x1, 0xc0004a8320)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:234 +0x106
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x3f
main.main()
        /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/github.com/go-logr/zapr/zapr.go#L182

goroutine 1 [running]:
runtime/debug.Stack(0x6, 0x8, 0xc000339920)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
        /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0xc00000ff50, 0x1ec5120)
        /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
github.com/go-logr/zapr.newLoggerWithExtraSkip(0xc000339920, 0x1, 0x1, 0xc000339920)
        /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:172 +0x2f
github.com/go-logr/zapr.NewLogger(...)
        /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:182
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x56
main.main()
        /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4

As a quick test, I tried reducing callerSkip to 1 by passing in zaplog.AddCallerSkip(-1):

	ZapOpts:     []zaplog.Option{zaplog.AddCaller(), zaplog.AddCallerSkip(-1)},

And that resulted in the correct line being logged:

2021-12-02T23:47:31.676Z        INFO    controller-runtime.manager.controller.localvolume  controller/controller.go:165     Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}
2021-12-02T23:47:31.676Z        INFO    controller-runtime.manager.controller.localvolume  controller/controller.go:165     Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}

As I understand it, callerSkip is being set to 2 on the assumption that we're using the wrapper functions here:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L127

If we use the DelegatingLogger functions, then callerSkip = 2 is correct.
But any time WithName() or WithValues() is called, it returns the underlying logr.Logger object:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L183

Which means we're not using the DelegatingLogger wrapper functions anymore, we're calling go-logr methods instead without wrappers, and in that case callerSkip = 2 is incorrect, and the parent function ends up getting logged instead of the line where the log statement is.

Metadata

Metadata

Assignees

No one assigned

    Labels

    lifecycle/staleDenotes an issue or PR has remained open with no activity and has become stale.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions