Improve error messages and log levels #1984

Firstyear · 2023-08-15T08:59:56Z

I think we need to do a bit of an audit of our logging messages. Currently we have a bit of a mix of things, and that leads to great messages like:

Aug 15 18:54:55 topaz.prd.blackhats.net.au kanidm_unixd[1468]: 00000000-0000-0000-0000-000000000000 ERROR    🚨 [error]: json error -> Error("expected value", line: 1, column: 1)

Which even as the author of most of the project, I have no idea what the error was since we don't list the file name.

I think we need a big effort to update messages to:

info -> messages that are useful for an admin to be aware of, but not critical. This could be auditing information too
warn -> messages that an admin should be aware of but may not require immediate action. These messages should describe why the warning occured and possible ways to resolve/investigate it.
error -> messages that show a critical issues that must be actioned. This must explain what the error is, why it occured, and actionable steps.
debug -> Messages that may improve the experience for an admin to resolve errors or issues in the server. Generally they should add context to surrounding messages. any admin problem that can't be solved at debug level is a bug.
trace -> Messages for developers. These are compiled out in release builds. These show excruciating levels of detail about server internals, and likely should provide file/line numbers of instances and rely on the dev having the ability to link context to the message.

The text was updated successfully, but these errors were encountered:

yaleman · 2023-08-15T09:41:02Z

I wholeheartedly agree with this plan 💟

I have no idea what the error was since we don't list the file name.

Yeaaaah, I've been progressively making these unique over time, but that one's a doozie 😄

Firstyear · 2023-08-15T23:29:47Z

I think we need a way to id updated error messages. We probably need to consider error type updates too in this.

Firstyear · 2023-08-15T23:40:34Z

To expand on that last thought. I was thinking instead of OperationError we should have unique error codes for every error site. This way we can embed more data via traits like https://github.com/dtolnay/thiserror

My thinking is we can then have categories of error too like:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum DataStoreError {
    #[error("serde error")]
    F_SERDE0001(#[from] io::Error),

    #[error("the entity for key `{0}` is not available")]
    I_REFINT0001(Uuid),
}

We can the make every error site a unique code, and then we can class errors. It would be interesting if there is a way to nest these lik:

enum Error {
    FAULT ( ... ),
    INPUT ( ... ),
}

That way we can have HTTP status codes set based on the Error class. The classes I was thinking are:

INPUT - a user input is invalid and can not proceed
FAULT - a server error has occured that requires attention

And probably more I'll think of later.

The reason I was thinking unique codes, is then we can also have the what/why/how steps in the error descriptor, so we can present this properly to clients and logs, and it means if someone says "Hey, I keep hitting E_ATTRUNIQUE0003" we can precisely locate exactly what error it is in the code too.

I think it'll make for more work to add errors, but it'll allow better error management and fault resolution steps. What do you think?

Firstyear · 2023-08-15T23:43:25Z

I also wonder if there are possibly ways we could have errors that are "resolveable" where we can then queue up a fault management task to auto-resolve the issues?

yaleman · 2023-08-16T00:05:29Z

That way we can have HTTP status codes set based on the Error class. The classes I was thinking are:

INPUT - a user input is invalid and can not proceed

FAULT - a server error has occured that requires attention

And probably more I'll think of later.

The reason I was thinking unique codes, is then we can also have the what/why/how steps in the error descriptor, so we can present this properly to clients and logs, and it means if someone says "Hey, I keep hitting E_ATTRUNIQUE0003" we can precisely locate exactly what error it is in the code too.

I think it'll make for more work to add errors, but it'll allow better error management and fault resolution steps. What do you think?

I definitely think the error codes idea is a good one - windows event codes or rust's compiler codes are very good examples of prior art.

Firstyear · 2023-08-16T00:14:57Z

Okay, lets do it then. I think we make a dedicated kanidm_error library, then everything else can draw on that. I think we can have a bridge for now for the new error type into OperationError and then we can work from "bottom up" replacing everying in the main server in small batches. How does that sound?

yaleman · 2023-08-16T00:22:40Z

Sounds like a plan to me 🥳

Firstyear mentioned this issue Aug 15, 2023

API consistency #1954

Open

yaleman mentioned this issue Aug 16, 2023

Improve error messages #969

Closed

yaleman mentioned this issue Sep 12, 2023

i18n in kanidm #2105

Open

yaleman mentioned this issue Oct 3, 2023

Link to help pages #2172

Open

This was referenced Oct 6, 2023

Fault Management Subsystem #2183

Open

Audit log messages #2185

Open

Firstyear added this to the 1.2.0 milestone Feb 11, 2024

Firstyear added enhancement New feature or request internals Deep wizardry kanidmd Server-side things labels Feb 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve error messages and log levels #1984

Improve error messages and log levels #1984

Firstyear commented Aug 15, 2023

yaleman commented Aug 15, 2023

Firstyear commented Aug 15, 2023

Firstyear commented Aug 15, 2023

Firstyear commented Aug 15, 2023

yaleman commented Aug 16, 2023

Firstyear commented Aug 16, 2023

yaleman commented Aug 16, 2023

Improve error messages and log levels #1984

Improve error messages and log levels #1984

Comments

Firstyear commented Aug 15, 2023

yaleman commented Aug 15, 2023

Firstyear commented Aug 15, 2023

Firstyear commented Aug 15, 2023

Firstyear commented Aug 15, 2023

yaleman commented Aug 16, 2023

Firstyear commented Aug 16, 2023

yaleman commented Aug 16, 2023