Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors to be retried are polluting the log output #2224

Closed
ggallen opened this issue Jan 30, 2024 · 5 comments · Fixed by #2230
Closed

Errors to be retried are polluting the log output #2224

ggallen opened this issue Jan 30, 2024 · 5 comments · Fixed by #2230
Milestone

Comments

@ggallen
Copy link

ggallen commented Jan 30, 2024

Bug Report

What did you do?

When reconciliation happens, conflicts (i.e. unhandled exceptions) may occur. When these do occur, the SDK schedules a retry.

However, the SDK still emits an error to the log.

What did you expect to see?

What I expect to see depends upon the current retry state:

  • If a retry is to be attempted, no errors should be emitted to the log. Instead, a warning should be written that lists the error and indicates that a retry will be attempted. The number of the attempted retry might be interesting information as well.
  • If retries are exhausted, the error should be emitted as it is today (with the traceback, etc). It also might include that retries are exhausted and indicate how many retries were attempted.

What did you see instead? Under which circumstances?

Currently, every failure emits an error. This leads to pollution in the log and makes it impossible to tell what is a real error and what is an error that will be retried.

Environment

OSD/minikube.

$ Mention java-operator-sdk version from pom.xml file

<quarkus.platform.version>3.5.0</quarkus.platform.version>

$ java -version

openjdk 17.0.8 2023-07-18
OpenJDK Runtime Environment (Red_Hat-17.0.8.0.7-1.fc38) (build 17.0.8+7)
OpenJDK 64-Bit Server VM (Red_Hat-17.0.8.0.7-1.fc38) (build 17.0.8+7, mixed mode, sharing)

$ kubectl version

Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.1
WARNING: version difference between client (1.28) and server (1.26) exceeds the supported minor version skew of +/-1

Possible Solution

Additional context

@csviri csviri added this to the 5.0 milestone Jan 30, 2024
@csviri
Copy link
Collaborator

csviri commented Jan 30, 2024

I see this more like an improvement not a bug.

I agree that this make sense from the point of view that conflicts naturally happen in these systems. So we could log just a warning with a message for KubernetesClient exceptions that are for conflicts. And log the whole exception as error on debug level, for those who still want to see more information.
Conflicts in some cases still might to be result of issues with the reconciler implementation, but agree that is something to deal with using the debug level logs.

would be interesting to hear other opinions. cc @metacosm maybe @andreaTP

@andreaTP
Copy link
Collaborator

I see this more like an improvement not a bug.

Agree 👍

A minimal change, like changing the Logger name for retries vs. final failure, will make this behavior completely tunable by the end user providing an appropriate logging configuration. Or am I missing an invariant?

@ggallen
Copy link
Author

ggallen commented Jan 30, 2024

I see this more like an improvement not a bug.

I guess I don't disagree either. But the only choices I saw when creating an issue were to create a bug report or a security vulnerability. I picked bug.

I will, however, point out that the the current logging implementation makes it impossible to find what's a real error and what's going to be retried (and possibly succeeded on retry). As it looks now, there are errors in my log, and they should probably be investigated. But are they really errors? This is dangerous, and how real errors end up getting missed or ignored. I spent a couple of days debugging before I actually found out things might be working as expected. I assume they are, only because at the end my resource looks correct. But is there some other error in there? Who knows?

will make this behavior completely tunable by the end user providing an appropriate logging configuration.

I think this being tunable is OK, but I think the default should be as I described. If you get to tuning it, you probably know what you are doing. But out-of-the-box you shouldn't be presenting errors that aren't really errors.

@andreaTP
Copy link
Collaborator

you shouldn't be presenting errors that aren't really errors.

I agree that showing those errors should be done at a different level, like debug and with an explicit mention that the operation will be attempted again.

Would you be willing to submit a PR to improve this aspect?
Seems like a straight forward improvement to me but happy to hear @csviri opinion too!

@csviri
Copy link
Collaborator

csviri commented Jan 31, 2024

I basically agree with the approach, that by default we don't log the message maybe just a short warning note for conflicts. And have an opt-in option to allow debug all errors on debug level.

If any of you would like to do a PR that would be great, but I can do it too.

@csviri csviri modified the milestones: 5.0, 4.8 Jan 31, 2024
@csviri csviri linked a pull request Feb 2, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants