Lowering log levels and outlining a guideline for log levels in general #2434

Closed
celesteking opened this issue May 23, 2018 · 3 comments

Comments

@celesteking
Contributor

Hey. Let's have a discussion about log levels across the HK codebase.
My point is that we should establish a base "alert the sysadmin" log level, and every important message that requires sysadmin attention should be logged at that level or above.

For example, mind the following use cases:

  • A plugin (e.g., the clamav or SA plugin) connects to an external service, and the remote service (spamd, clamd, etc.) is unreachable, for instance because it is down when it should be up.
  • A sysadmin sends an internalcmd (flush the queue, delete the message, reconfigure, etc.) to the HK daemon, the command fails, and the reply is "check the logs".
  • A plugin misbehaves because of a bug in its code; it doesn't work as its author intended.

These should be logged at ERROR level. This level means something is wrong, but the server keeps operating. Some functionality may be lost, e.g., messages won't be scanned for spam, ES won't receive events, etc.
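To make the ERROR case concrete, here is a minimal sketch in Python using the standard `logging` module. It is only an illustration of the idea, not HK's logging API: `scan_or_skip`, the logger name, and the `scan` callable are all hypothetical names made up for this example.

```python
import logging

log = logging.getLogger("scanner_plugin")  # hypothetical plugin name


def scan_or_skip(scan, message):
    """Run a scanner callable; if the remote service is unreachable,
    log at ERROR and carry on without a scan result.

    `scan` is any callable that talks to the external service and raises
    OSError (e.g. ConnectionRefusedError, a socket timeout) when that
    service cannot be reached.
    """
    try:
        return scan(message)
    except OSError as exc:
        # Something is wrong and the sysadmin should look at it,
        # but the server keeps handling mail: ERROR, not CRIT.
        log.error("scanner unreachable, message not scanned: %s", exc)
        return None
```

The point of the design is that the failure is visible at the "alert the sysadmin" level while the caller still gets a value (`None`) and can continue processing the message unscanned.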

Next group:

  • Any problem with user-supplied data: incorrectly crafted rcpt or envelope addresses, message headers containing invalid data, incorrect encoding, lines not terminated with the correct line ending.
  • Anything that is derived from user-supplied data, e.g., a failed MX lookup for a misspelled recipient domain, or a bounce that occurs for ordinary reasons.
  • User restrictions: max message size exceeded, rate limit hit, access restrictions.

These should be logged with NOTICE(?) level.

Next group:

  • Something that affects basic functionality, possibly only temporarily, but is very important. E.g., running out of disk space, or a plugin that sources routing/authentication information from redis while the redis server is down.

Log these with CRIT level.

Next group:

  • "Can't start" or abnormal termination events: Can't listen on IP, exception in base code (either on start or while in operation). Same as with CRIT, but permanent (won't resolve by itself).

Log these with EMERG or ALERT level (possibly only once, as the server probably won't be able to continue running).
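The four groups above can be summed up as a small lookup table. The sketch below is in Python and assumes the standard syslog severity numbering (RFC 5424), where a lower number is more severe. The category names are hypothetical labels invented for this example, the last group is mapped to EMERG although the proposal allows EMERG or ALERT, and NOTICE is still the tentative choice marked with "(?)" above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Syslog severities (RFC 5424); a lower number is more severe."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


# Hypothetical category names, one per group in the proposal.
GUIDELINE = {
    "cannot_start": Severity.EMERG,       # can't listen on IP, exception in base code
    "core_function_lost": Severity.CRIT,  # out of disk space, routing/auth backend down
    "needs_sysadmin": Severity.ERROR,     # scanner unreachable, failed internalcmd, plugin bug
    "user_caused": Severity.NOTICE,       # bad addresses, failed MX lookup, limits hit
}

# The base "alert the sysadmin" level: ERROR and anything more severe.
ALERT_THRESHOLD = Severity.ERROR


def level_for(category: str) -> Severity:
    """Look up the guideline level for one of the categories above."""
    return GUIDELINE[category]


def needs_sysadmin_attention(level: Severity) -> bool:
    """True for ERROR, CRIT, ALERT and EMERG; False for anything milder."""
    return level <= ALERT_THRESHOLD
```

With this table, a monitoring rule only has to watch for messages at `ALERT_THRESHOLD` or more severe, and user-caused noise at NOTICE never pages anyone.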
Tell me what you think.

@msimerson
Member

+1. I think we're not a long ways away from this.

@msimerson
Member

I think your message above should be added to Logging.md under a Guidelines heading.

@msimerson
Member

Moved to wiki
