log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

vboulineau · 2020-05-14T13:05:23Z

Description of the change:
Delegate string parsing to zap, allowing any zap-supported level (for both --zap-level and --zap-stacktrace-level)

Motivation for the change:
The --zap-stacktrace-level option (#2319) is currently not working for any level > error as the parsing will not allow it.

I also think it's cleaner to delegate this level parsing to zap itself, it allows anything supported upstream.

camilamacedo86

It shows great. It is just missing to add a fragment. See: https://github.com/operator-framework/operator-sdk/blob/master/changelog/fragments/00-template.yaml
It is used for we create the CHANGELOG for the releases.

vboulineau · 2020-05-14T14:12:30Z

It shows great. It is just missing to add a fragment. See: https://github.com/operator-framework/operator-sdk/blob/master/changelog/fragments/00-template.yaml
It is used for we create the CHANGELOG for the releases.

I looked around for a new place but did not check the fragments folder. Thank you!

joelanford · 2020-05-14T15:46:46Z

At one point, we made a decision to explicitly only support the set of zap log levels that overlap with supported logr.Logger levels.

I think there are cases where using levels that logr.Logger does not support can result in panics, but I'm not sure if that situation has changed since we looked at this.

joelanford · 2020-05-14T15:46:51Z

/hold

joelanford · 2020-05-14T15:53:29Z

Here's the original thread with the decision-making: #873 (comment)

vboulineau · 2020-05-14T16:31:19Z

Here's the original thread with the decision-making: #873 (comment)

I more or less got the issue, seems to be a touchy issue, thanks for pointing out. I can use the new code only for zap-stacktrace-level, that should be safe.

joelanford · 2020-05-15T01:45:37Z

@vboulineau can you describe what you're trying to accomplish? I may be misunderstanding something, but since logr specifically forbids logging with negative numbers, there's no supported way to log at any level beyond error in the direction of panic with logr.Logger.

What would be gained by allowing --zap-stacktrace-level to handle those decreasing verbosity levels, other than entirely disabling stacktraces?

I do wonder if it is even possible to disable stacktraces with the current flag value constraints. If that's ultimately what you're going for, perhaps we should add a new supported disabled value for the --zap-stacktrace-level flag.

joelanford · 2020-05-15T01:50:17Z

Also FWIW, we've been working to upstream the logging flags into controller-runtime, and deprecate the SDK's zap flag support as part of our work to transition to Kubebuilder's project layouts.

In fact, it appears that controller-runtime may already have support for these levels.

vboulineau · 2020-05-15T08:46:25Z

The original idea from #2319 was to allow disabling stack traces entirely, but as you suggested, it was better to keep stack traces for panic/fatal levels.

However, now it seems unlikely that users generate these logs (though it's still possible using .V(-3) as it goes to zapr which will convert it to 3 == panic).

I've tested upstream implementation in our operator and it works fine (already available in 0.17), so if you believe that's they way forward I can close that PR.

joelanford · 2020-05-19T02:30:04Z

@vboulineau If you've got things working upstream, I think that's the best way forward. I was actually looking a little closer at that implementation though, and I think there's a bug in controller-runtime.

The -zap-log-level help text says "Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity" but the implementation aligns the flag values directly with the zap levels and numbering scheme.

There's a similar discrepancy with the -zap-stacktrace-level flag.

In SDK, we designed the level flags to align with the logr levels, not the zap levels, because it is more intuitive to use --zap-log-level=3 to line up with log.V(3), at least in our opinion.

It also looks like it isn't currently possible to enable log levels with verbosity at log.V(2) and higher.

If these bugs in controller-runtime were fixed such that the implementation aligned with the help text, would that break your use-case and operator again?

vboulineau · 2020-05-19T09:23:25Z

Indeed you're right, upstream is buggy, running this code:

	for i := -3; i <= 10; i++ {
		log.V(i).Info("### TEST LOGLEVEL V", "lvl", i)
	}

With --zap-log-level=debug:

2020-05-19T10:52:02.840+0200	DPANIC	cmd	### TEST LOGLEVEL V	{"lvl": -3}
2020-05-19T10:52:02.840+0200	ERROR	cmd	### TEST LOGLEVEL V	{"lvl": -2}
2020-05-19T10:52:02.840+0200	WARN	cmd	### TEST LOGLEVEL V	{"lvl": -1}
2020-05-19T10:52:02.840+0200	INFO	cmd	### TEST LOGLEVEL V	{"lvl": 0}
2020-05-19T10:52:02.840+0200	DEBUG	cmd	### TEST LOGLEVEL V	{"lvl": 1}

With --zap-log-level=3:

2020-05-19T10:52:21.213+0200	DPANIC	cmd	### TEST LOGLEVEL V	{"lvl": -3}
2020-05-19T10:52:21.213+0200	ERROR	cmd	### TEST LOGLEVEL V	{"lvl": -2}
2020-05-19T10:52:21.213+0200	WARN	cmd	### TEST LOGLEVEL V	{"lvl": -1}
2020-05-19T10:52:21.213+0200	INFO	cmd	### TEST LOGLEVEL V	{"lvl": 0}

Which is indeed problematic for logging with several debug levels.

I believe, fixing this would bring us back to the same state this PR aimed to solve, except if log-level and stacktrace-level have different allowed values.

Really the only feature we'd like to keep is remove stacktraces on "error" loglevel, we'd be happy with any solution that provides that.

I think two options are possible here:

I change this PR to only add a "panic" level to stacktrace, keep everything else as-is, should cover all use cases until controller-runtime is fixed and this package deprecated on your side.
Keep operator-sdk as-is until controller-runtime is fixed and this package deprecated on your side, meanwhile users need to wait for disable stacktrace or migrate to controller-runtime (loosing V(3+) for some time).

I believe first option is a bit better, but not a strong opinion, both makes sense. WDYT?

joelanford · 2020-05-19T15:13:12Z

@vboulineau Option 1 sounds good to me.

vboulineau · 2020-05-25T16:05:52Z

@vboulineau Option 1 sounds good to me.

Implemented!

changelog/fragments/3040-fix-loglevel.yaml

joelanford · 2020-06-01T14:49:46Z

pkg/log/zap/flags.go

+func intLogLevel(l string) (int, error) {
+	lower := strings.ToLower(l)
+	var lvl int
+	switch lower {
+	case logLevelDebug:
+		lvl = -1
+	case logLevelInfo:
+		lvl = 0
+	case logLevelError:
+		lvl = 2
+	default:
+		i, err := strconv.Atoi(lower)
+		if err != nil {
+			return lvl, fmt.Errorf("invalid log level \"%s\"", l)
+		}
+
+		if i > 0 {
+			lvl = -1 * i
+		} else {
+			return lvl, fmt.Errorf("invalid log level \"%s\"", l)
+		}
+	}
+	return lvl, nil
+}
+


Nit: move this function back to its original location (after Type()) to minimize changeset.

joelanford · 2020-06-01T14:52:54Z

pkg/log/zap/flags.go

+	lower := strings.ToLower(l)
+	var lvl int
+	switch lower {
+	case logLevelDebug:
+		lvl = -1
+	case logLevelInfo:
+		lvl = 0
+	case logLevelError:
+		lvl = 2
+	case logLevelPanic:
+		lvl = 4
+	default:
+		return fmt.Errorf("invalid stacktrace level \"%s\"", l)


With this change, we would no longer support integer-based stacktrace levels, which would be a regression of an existing feature. Can we still support integer stacktrace levels for values greater than 0?

I put back previous behaviour, so supporting int log level in stack trace.

pkg/log/zap/flags_test.go

vboulineau · 2020-06-02T08:54:45Z

Fixed nits and rebased but CI is still failing on doc part, it does not seem related to this PR?

joelanford · 2020-06-03T02:34:20Z

@vboulineau Can you rebase to the latest master? I think the issue you're running into has been fixed.

joelanford · 2020-06-04T13:03:16Z

/hold cancel

joelanford

LGTM whenever CI decides it wants to pass 😄. Thanks for this!

camilamacedo86 · 2020-06-04T13:20:37Z

pkg/log/zap/flags.go

@@ -26,6 +26,13 @@ import (
 	"k8s.io/klog"
 )

+const (


@joelanford may not be better we use here the "github.com/sirupsen/logrus" levels definition instead of creating our own const/values as it was done here for example?

I believe these are int constants (https://github.com/sirupsen/logrus/blob/master/logrus.go#L96), string constants are directly inlined line 25

We're also dealing with zap and logr levels here, not logrus. There don't appear to be string constants for the levels in zap or logr. I think what we're doing here in this PR is fine, and is in fact an improvement in that we now actually have constants rather than inlining them like we have been.

fair enough

joelanford · 2020-06-04T14:33:23Z

@vboulineau Sorry about this but I think we're now running into another separate CI error that will hopefully be fixed by #3158

Might have to rebase one more time after that merges.

vboulineau · 2020-06-04T14:34:47Z

@vboulineau Sorry about this but I think we're now running into another separate CI error that will hopefully be fixed by #3158

Might have to rebase one more time after that merges.

No worries, I'll check every now and then and rebase accordingly

camilamacedo86

/lgtm
/approve

camilamacedo86 · 2020-06-07T16:31:34Z

Hi @vboulineau,

The issue faced in the CI here was fixed in #3158. Could you please rebased it with the master for the pr pass in the CI and allow us to merge it?

Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>

openshift-ci-robot · 2020-06-08T07:58:33Z

New changes are detected. LGTM label has been removed.

vboulineau · 2020-06-08T08:59:25Z

Hi @vboulineau,

The issue faced in the CI here was fixed in #3158. Could you please rebased it with the master for the pr pass in the CI and allow us to merge it?

Thank you, CI is green now!

openshift-ci-robot requested review from jmccormick2001 and jmrodri May 14, 2020 13:05

camilamacedo86 reviewed May 14, 2020

View reviewed changes

vboulineau force-pushed the bugfix/log_level branch from 4d6c2a8 to 36bb94c Compare May 14, 2020 14:11

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2020

vboulineau closed this May 25, 2020

vboulineau force-pushed the bugfix/log_level branch from 36bb94c to b602297 Compare May 25, 2020 16:02

vboulineau reopened this May 25, 2020

vboulineau changed the title ~~log: fix log level parsing to allow all zap levels~~ log: fix "panic" loglevel parsing for --zap-stacktrace-level option May 25, 2020

vboulineau force-pushed the bugfix/log_level branch from c229ecb to b92a61c Compare May 27, 2020 10:16

joelanford reviewed Jun 1, 2020

View reviewed changes

vboulineau force-pushed the bugfix/log_level branch 2 times, most recently from 7f93d7f to 506bf0e Compare June 2, 2020 08:42

vboulineau force-pushed the bugfix/log_level branch 2 times, most recently from b1335c8 to 1314d54 Compare June 4, 2020 09:22

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2020

joelanford approved these changes Jun 4, 2020

View reviewed changes

camilamacedo86 reviewed Jun 4, 2020

View reviewed changes

camilamacedo86 approved these changes Jun 4, 2020

View reviewed changes

openshift-ci-robot assigned camilamacedo86 Jun 4, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2020

log: allow "panic" level for --zap-stacktrace-level

ead0212

Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>

vboulineau force-pushed the bugfix/log_level branch from 1314d54 to ead0212 Compare June 8, 2020 07:58

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2020

camilamacedo86 merged commit be71895 into operator-framework:master Jun 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

vboulineau commented May 14, 2020

camilamacedo86 left a comment

vboulineau commented May 14, 2020

joelanford commented May 14, 2020

joelanford commented May 14, 2020

joelanford commented May 14, 2020

vboulineau commented May 14, 2020

joelanford commented May 15, 2020

joelanford commented May 15, 2020

vboulineau commented May 15, 2020

joelanford commented May 19, 2020

vboulineau commented May 19, 2020

joelanford commented May 19, 2020

vboulineau commented May 25, 2020

joelanford Jun 1, 2020

joelanford Jun 1, 2020

vboulineau Jun 2, 2020

vboulineau commented Jun 2, 2020

joelanford commented Jun 3, 2020

joelanford commented Jun 4, 2020

joelanford left a comment

camilamacedo86 Jun 4, 2020

vboulineau Jun 4, 2020

joelanford Jun 4, 2020

camilamacedo86 Jun 4, 2020

joelanford commented Jun 4, 2020

vboulineau commented Jun 4, 2020

camilamacedo86 left a comment

camilamacedo86 commented Jun 7, 2020

openshift-ci-robot commented Jun 8, 2020

vboulineau commented Jun 8, 2020

log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

Conversation

vboulineau commented May 14, 2020

camilamacedo86 left a comment

Choose a reason for hiding this comment

vboulineau commented May 14, 2020

joelanford commented May 14, 2020

joelanford commented May 14, 2020

joelanford commented May 14, 2020

vboulineau commented May 14, 2020

joelanford commented May 15, 2020

joelanford commented May 15, 2020

vboulineau commented May 15, 2020

joelanford commented May 19, 2020

vboulineau commented May 19, 2020

joelanford commented May 19, 2020

vboulineau commented May 25, 2020

joelanford Jun 1, 2020

Choose a reason for hiding this comment

joelanford Jun 1, 2020

Choose a reason for hiding this comment

vboulineau Jun 2, 2020

Choose a reason for hiding this comment

vboulineau commented Jun 2, 2020

joelanford commented Jun 3, 2020

joelanford commented Jun 4, 2020

joelanford left a comment

Choose a reason for hiding this comment

camilamacedo86 Jun 4, 2020

Choose a reason for hiding this comment

vboulineau Jun 4, 2020

Choose a reason for hiding this comment

joelanford Jun 4, 2020

Choose a reason for hiding this comment

camilamacedo86 Jun 4, 2020

Choose a reason for hiding this comment

joelanford commented Jun 4, 2020

vboulineau commented Jun 4, 2020

camilamacedo86 left a comment

Choose a reason for hiding this comment

camilamacedo86 commented Jun 7, 2020

openshift-ci-robot commented Jun 8, 2020

vboulineau commented Jun 8, 2020