Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: fix "panic" loglevel parsing for --zap-stacktrace-level option #3040

Merged

Conversation

vboulineau
Copy link
Contributor

Description of the change:
Delegate string parsing to zap, allowing any zap-supported level (for both --zap-level and --zap-stacktrace-level)

Motivation for the change:
The --zap-stacktrace-level option (#2319) is currently not working for any level > error as the parsing will not allow it.

I also think it's cleaner to delegate this level parsing to zap itself, it allows anything supported upstream.

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shows great. It is just missing to add a fragment. See: https://github.com/operator-framework/operator-sdk/blob/master/changelog/fragments/00-template.yaml
It is used for we create the CHANGELOG for the releases.

@vboulineau
Copy link
Contributor Author

It shows great. It is just missing to add a fragment. See: https://github.com/operator-framework/operator-sdk/blob/master/changelog/fragments/00-template.yaml
It is used for we create the CHANGELOG for the releases.

I looked around for a new place but did not check the fragments folder. Thank you!

@joelanford
Copy link
Member

At one point, we made a decision to explicitly only support the set of zap log levels that overlap with supported logr.Logger levels.

I think there are cases where using levels that logr.Logger does not support can result in panics, but I'm not sure if that situation has changed since we looked at this.

@joelanford
Copy link
Member

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2020
@joelanford
Copy link
Member

Here's the original thread with the decision-making: #873 (comment)

@vboulineau
Copy link
Contributor Author

Here's the original thread with the decision-making: #873 (comment)

I more or less got the issue, seems to be a touchy issue, thanks for pointing out. I can use the new code only for zap-stacktrace-level, that should be safe.

@joelanford
Copy link
Member

@vboulineau can you describe what you're trying to accomplish? I may be misunderstanding something, but since logr specifically forbids logging with negative numbers, there's no supported way to log at any level beyond error in the direction of panic with logr.Logger.

What would be gained by allowing --zap-stacktrace-level to handle those decreasing verbosity levels, other than entirely disabling stacktraces?

I do wonder if it is even possible to disable stacktraces with the current flag value constraints. If that's ultimately what you're going for, perhaps we should add a new supported disabled value for the --zap-stacktrace-level flag.

@joelanford
Copy link
Member

Also FWIW, we've been working to upstream the logging flags into controller-runtime, and deprecate the SDK's zap flag support as part of our work to transition to Kubebuilder's project layouts.

In fact, it appears that controller-runtime may already have support for these levels.

@vboulineau
Copy link
Contributor Author

The original idea from #2319 was to allow disabling stack traces entirely, but as you suggested, it was better to keep stack traces for panic/fatal levels.

However, now it seems unlikely that users generate these logs (though it's still possible using .V(-3) as it goes to zapr which will convert it to 3 == panic).

I've tested upstream implementation in our operator and it works fine (already available in 0.17), so if you believe that's they way forward I can close that PR.

@joelanford
Copy link
Member

@vboulineau If you've got things working upstream, I think that's the best way forward. I was actually looking a little closer at that implementation though, and I think there's a bug in controller-runtime.

The -zap-log-level help text says "Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity" but the implementation aligns the flag values directly with the zap levels and numbering scheme.

There's a similar discrepancy with the -zap-stacktrace-level flag.

In SDK, we designed the level flags to align with the logr levels, not the zap levels, because it is more intuitive to use --zap-log-level=3 to line up with log.V(3), at least in our opinion.

It also looks like it isn't currently possible to enable log levels with verbosity at log.V(2) and higher.

If these bugs in controller-runtime were fixed such that the implementation aligned with the help text, would that break your use-case and operator again?

@vboulineau
Copy link
Contributor Author

Indeed you're right, upstream is buggy, running this code:

	for i := -3; i <= 10; i++ {
		log.V(i).Info("### TEST LOGLEVEL V", "lvl", i)
	}

With --zap-log-level=debug:

2020-05-19T10:52:02.840+0200	DPANIC	cmd	### TEST LOGLEVEL V	{"lvl": -3}
2020-05-19T10:52:02.840+0200	ERROR	cmd	### TEST LOGLEVEL V	{"lvl": -2}
2020-05-19T10:52:02.840+0200	WARN	cmd	### TEST LOGLEVEL V	{"lvl": -1}
2020-05-19T10:52:02.840+0200	INFO	cmd	### TEST LOGLEVEL V	{"lvl": 0}
2020-05-19T10:52:02.840+0200	DEBUG	cmd	### TEST LOGLEVEL V	{"lvl": 1}

With --zap-log-level=3:

2020-05-19T10:52:21.213+0200	DPANIC	cmd	### TEST LOGLEVEL V	{"lvl": -3}
2020-05-19T10:52:21.213+0200	ERROR	cmd	### TEST LOGLEVEL V	{"lvl": -2}
2020-05-19T10:52:21.213+0200	WARN	cmd	### TEST LOGLEVEL V	{"lvl": -1}
2020-05-19T10:52:21.213+0200	INFO	cmd	### TEST LOGLEVEL V	{"lvl": 0}

Which is indeed problematic for logging with several debug levels.

I believe, fixing this would bring us back to the same state this PR aimed to solve, except if log-level and stacktrace-level have different allowed values.

Really the only feature we'd like to keep is remove stacktraces on "error" loglevel, we'd be happy with any solution that provides that.

I think two options are possible here:

  • I change this PR to only add a "panic" level to stacktrace, keep everything else as-is, should cover all use cases until controller-runtime is fixed and this package deprecated on your side.

  • Keep operator-sdk as-is until controller-runtime is fixed and this package deprecated on your side, meanwhile users need to wait for disable stacktrace or migrate to controller-runtime (loosing V(3+) for some time).

I believe first option is a bit better, but not a strong opinion, both makes sense. WDYT?

@joelanford
Copy link
Member

@vboulineau Option 1 sounds good to me.

@vboulineau vboulineau closed this May 25, 2020
@vboulineau vboulineau reopened this May 25, 2020
@vboulineau vboulineau changed the title log: fix log level parsing to allow all zap levels log: fix "panic" loglevel parsing for --zap-stacktrace-level option May 25, 2020
@vboulineau
Copy link
Contributor Author

@vboulineau Option 1 sounds good to me.

Implemented!

changelog/fragments/3040-fix-loglevel.yaml Outdated Show resolved Hide resolved
Comment on lines 145 to 169
func intLogLevel(l string) (int, error) {
lower := strings.ToLower(l)
var lvl int
switch lower {
case logLevelDebug:
lvl = -1
case logLevelInfo:
lvl = 0
case logLevelError:
lvl = 2
default:
i, err := strconv.Atoi(lower)
if err != nil {
return lvl, fmt.Errorf("invalid log level \"%s\"", l)
}

if i > 0 {
lvl = -1 * i
} else {
return lvl, fmt.Errorf("invalid log level \"%s\"", l)
}
}
return lvl, nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move this function back to its original location (after Type()) to minimize changeset.

Comment on lines 177 to 189
lower := strings.ToLower(l)
var lvl int
switch lower {
case logLevelDebug:
lvl = -1
case logLevelInfo:
lvl = 0
case logLevelError:
lvl = 2
case logLevelPanic:
lvl = 4
default:
return fmt.Errorf("invalid stacktrace level \"%s\"", l)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we would no longer support integer-based stacktrace levels, which would be a regression of an existing feature. Can we still support integer stacktrace levels for values greater than 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put back previous behaviour, so supporting int log level in stack trace.

pkg/log/zap/flags_test.go Outdated Show resolved Hide resolved
@vboulineau vboulineau force-pushed the bugfix/log_level branch 2 times, most recently from 7f93d7f to 506bf0e Compare June 2, 2020 08:42
@vboulineau
Copy link
Contributor Author

Fixed nits and rebased but CI is still failing on doc part, it does not seem related to this PR?

@joelanford
Copy link
Member

@vboulineau Can you rebase to the latest master? I think the issue you're running into has been fixed.

@vboulineau vboulineau force-pushed the bugfix/log_level branch 2 times, most recently from b1335c8 to 1314d54 Compare June 4, 2020 09:22
@joelanford
Copy link
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2020
Copy link
Member

@joelanford joelanford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM whenever CI decides it wants to pass 😄. Thanks for this!

@@ -26,6 +26,13 @@ import (
"k8s.io/klog"
)

const (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford may not be better we use here the "github.com/sirupsen/logrus" levels definition instead of creating our own const/values as it was done here for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these are int constants (https://github.com/sirupsen/logrus/blob/master/logrus.go#L96), string constants are directly inlined line 25

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also dealing with zap and logr levels here, not logrus. There don't appear to be string constants for the levels in zap or logr. I think what we're doing here in this PR is fine, and is in fact an improvement in that we now actually have constants rather than inlining them like we have been.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

@joelanford
Copy link
Member

@vboulineau Sorry about this but I think we're now running into another separate CI error that will hopefully be fixed by #3158

Might have to rebase one more time after that merges.

@vboulineau
Copy link
Contributor Author

@vboulineau Sorry about this but I think we're now running into another separate CI error that will hopefully be fixed by #3158

Might have to rebase one more time after that merges.

No worries, I'll check every now and then and rebase accordingly

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2020
@camilamacedo86
Copy link
Contributor

Hi @vboulineau,

The issue faced in the CI here was fixed in #3158. Could you please rebased it with the master for the pr pass in the CI and allow us to merge it?

Signed-off-by: Vincent Boulineau <vincent.boulineau@datadoghq.com>
@openshift-ci-robot
Copy link

New changes are detected. LGTM label has been removed.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2020
@vboulineau
Copy link
Contributor Author

Hi @vboulineau,

The issue faced in the CI here was fixed in #3158. Could you please rebased it with the master for the pr pass in the CI and allow us to merge it?

Thank you, CI is green now!

@camilamacedo86 camilamacedo86 merged commit be71895 into operator-framework:master Jun 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants