-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set the default log level to info instead of debug #75
Set the default log level to info instead of debug #75
Conversation
@openshift/sig-developer-experience ptal |
before we do this i'd love to know if there's a middle ground alternative because we have to debug registry push failures a lot (often not because of an actual issue in the registry that we could fix, but rather permission issues) and the debug info is super helpful when that happens. forcing people to redeploy their registry and recreate the failure so we can get logs is going to be painful. |
Can we bump permission issues up a level so they show by default?
We had 21Gb of logs from 10 days of runs - that’s high enough that I’d
expect us to cause disk pressure in a lot of environments, which raises the
possibility of other failures.
On Apr 15, 2018, at 11:26 PM, Ben Parees <notifications@github.com> wrote:
before we do this i'd love to know if there's a middle ground alternative
because we have to debug registry push failures a *lot* (often not because
of an actual issue in the registry that we could fix, but rather permission
issues) and the debug info is super helpful when that happens.
forcing people to redeploy their registry and recreate the failure so we
can get logs is going to be painful.
/cc @dmage <https://github.com/dmage> @legionus
<https://github.com/legionus> @smarterclayton
<https://github.com/smarterclayton>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#75 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AG7fCvQrmrNQIz6xjvm1Y2ZkQmsEASAeks5tpA92gaJpZM4TVz7S>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, but at first we need to fix how we log our errors. And unfortunately the upstream code needs to be fixed too. Otherwise we will not be able to explain the next "blob unknown to registry" error (did we tried to pull the blob from another registry? which one? and what about the local storage? etc.). The worst part: usually it's not the registry problem, but we still need to explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that is now in the log is necessary for investigations of various kinds of failures. Problems with the pullthrough, storage backend, authentication require the logging of different entities. In the current situation, all debug messages must be info, but all of them should remain in the log.
If there are problems with the disk space or expiration, then we should think about using the log server (syslog).
NAK
I'm guessing we don't have this level of control due to the docker distribution code, but it seems like a better solution would buffer the log messages and then only log the output if the overall request failed. Is such a thing possible for us here? |
We have a general log level policy, we do not ship components to customers that log at debug level excessively. Just because code has bugs doesn't mean that customer environments can tolerate logging everything in the world just on the off chance you catch a bug. Your error paths should certainly be logged by default. Important rejections or situational info can certainly be logged. But logging everything just because it's convenient for you causes problems for operators. |
Pulling some examples out so we can have more concrete discussion: One patch call:
Filesystem ones seem definitely not useful for the vast majority of debugging unless we have a fundamental error. The origin one might be useful, but only if it captured the result (which it doesn't look like it does?). Authorizing request doesn't seem useful for debugging. Head request:
None of the store level ones seem useful. Do we have better examples of normal path debug you wanted to see? If they are all under our control, just bump them to info, because they're not a sufficiently significant portion of the workload. If they're not, which ones have been most impactful? Just doing a completely unscientific pass over things that I doubt we'd ever care about, I was down to about 1/3 by excluding the top 10 or so boring messages. Does logrus allow us to omit any of the extra variables that we could just log once at startup? |
I can change of our log records from
No. This information is important because our users sometimes have problems with the storage. I have several times this year faced with the investigation of problems related to nfs and glusterfs.
And for me this log says that the
Due to bad design of docker/distribution cache, pullthrough and image prunning, errors can occur at each stage of the request processing. We can not give the docker client a real error. Therefore, we need detailed logging on the server side.
No. We control only the middlewares. The whole logic of parsing the request and preparing the answer is not under our control. Also we do not control the logic of the cache, BlobStore and everything that licks below the our middlewares. Basically we do not control everything inside
After that "throwing out" of records it is necessary to carry out the detailed analysis and to spend testing there will be no information for investigation of breakages. If your goal is simply to reduce the logs, then you set
No and the construction of the logger takes place outside our control area. P.S. Let me fork the |
I have another solution: let's set the loglevel to |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: coreydaley Assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The loglevel in this PR has now been set to error per @legionus comment above. |
why error and not info? I agree w/ the implied sentiment that any setting other than debug means we are signing up for a bad support experience, but is there a reason to completely cripple ourselves or was this just a sarcastic suggestion? |
Yes. We will highly degrade the quality of the support if we change the loglevel. I'm completely against it. I never put lgtm on this PR. You or Clayton can approve it and this will be your decision. You know, that our users can not give us registry logs even now. They either can not collect logs, or they send for an incorrect period of time. So as a result we ask all the logs that they have and we ourselves are looking for the right place. If we ask users to re-deploy the registry with debug loglevel and ask to reproduce the error, it means throwing away more than 80% of the error reports, because users simply can not do it correctly. What can I say, not all support engineers can do this correctly. And honestly, I do not understand what problem we are solving this PR.
Damn, it seems you know me too well. The reason for my sarcastic proposal is that since you guys want to take away the necessary logs, then for me there is no difference what loglevel will be specified. Let it be an |
@coreydaley: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@bparees should i just close this PR? |
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1567658