Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error messages and log levels #1984

Open
Firstyear opened this issue Aug 15, 2023 · 7 comments
Open

Improve error messages and log levels #1984

Firstyear opened this issue Aug 15, 2023 · 7 comments
Labels
enhancement New feature or request internals Deep wizardry kanidmd Server-side things
Milestone

Comments

@Firstyear
Copy link
Member

I think we need to do a bit of an audit of our logging messages. Currently we have a bit of a mix of things, and that leads to great messages like:

Aug 15 18:54:55 topaz.prd.blackhats.net.au kanidm_unixd[1468]: 00000000-0000-0000-0000-000000000000 ERROR    🚨 [error]: json error -> Error("expected value", line: 1, column: 1)

Which even as the author of most of the project, I have no idea what the error was since we don't list the file name.

I think we need a big effort to update messages to:

  • info -> messages that are useful for an admin to be aware of, but not critical. This could be auditing information too
  • warn -> messages that an admin should be aware of but may not require immediate action. These messages should describe why the warning occured and possible ways to resolve/investigate it.
  • error -> messages that show a critical issues that must be actioned. This must explain what the error is, why it occured, and actionable steps.
  • debug -> Messages that may improve the experience for an admin to resolve errors or issues in the server. Generally they should add context to surrounding messages. any admin problem that can't be solved at debug level is a bug.
  • trace -> Messages for developers. These are compiled out in release builds. These show excruciating levels of detail about server internals, and likely should provide file/line numbers of instances and rely on the dev having the ability to link context to the message.
@yaleman
Copy link
Member

yaleman commented Aug 15, 2023

I wholeheartedly agree with this plan 💟

I have no idea what the error was since we don't list the file name.

Yeaaaah, I've been progressively making these unique over time, but that one's a doozie 😄

Screenshot 2023-08-15 at 19 36 51

@Firstyear
Copy link
Member Author

I think we need a way to id updated error messages. We probably need to consider error type updates too in this.

@Firstyear
Copy link
Member Author

To expand on that last thought. I was thinking instead of OperationError we should have unique error codes for every error site. This way we can embed more data via traits like https://github.com/dtolnay/thiserror

My thinking is we can then have categories of error too like:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum DataStoreError {
    #[error("serde error")]
    F_SERDE0001(#[from] io::Error),

    #[error("the entity for key `{0}` is not available")]
    I_REFINT0001(Uuid),
}

We can the make every error site a unique code, and then we can class errors. It would be interesting if there is a way to nest these lik:

enum Error {
    FAULT ( ... ),
    INPUT ( ... ),
}

That way we can have HTTP status codes set based on the Error class. The classes I was thinking are:

  • INPUT - a user input is invalid and can not proceed
  • FAULT - a server error has occured that requires attention

And probably more I'll think of later.

The reason I was thinking unique codes, is then we can also have the what/why/how steps in the error descriptor, so we can present this properly to clients and logs, and it means if someone says "Hey, I keep hitting E_ATTRUNIQUE0003" we can precisely locate exactly what error it is in the code too.

I think it'll make for more work to add errors, but it'll allow better error management and fault resolution steps. What do you think?

@Firstyear
Copy link
Member Author

I also wonder if there are possibly ways we could have errors that are "resolveable" where we can then queue up a fault management task to auto-resolve the issues?

@yaleman
Copy link
Member

yaleman commented Aug 16, 2023

That way we can have HTTP status codes set based on the Error class. The classes I was thinking are:

  • INPUT - a user input is invalid and can not proceed
  • FAULT - a server error has occured that requires attention

And probably more I'll think of later.

The reason I was thinking unique codes, is then we can also have the what/why/how steps in the error descriptor, so we can present this properly to clients and logs, and it means if someone says "Hey, I keep hitting E_ATTRUNIQUE0003" we can precisely locate exactly what error it is in the code too.

I think it'll make for more work to add errors, but it'll allow better error management and fault resolution steps. What do you think?

I definitely think the error codes idea is a good one - windows event codes or rust's compiler codes are very good examples of prior art.

@Firstyear
Copy link
Member Author

Okay, lets do it then. I think we make a dedicated kanidm_error library, then everything else can draw on that. I think we can have a bridge for now for the new error type into OperationError and then we can work from "bottom up" replacing everying in the main server in small batches. How does that sound?

@yaleman
Copy link
Member

yaleman commented Aug 16, 2023

Sounds like a plan to me 🥳

This was referenced Oct 6, 2023
@Firstyear Firstyear added this to the 1.2.0 milestone Feb 11, 2024
@Firstyear Firstyear added enhancement New feature or request internals Deep wizardry kanidmd Server-side things labels Feb 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request internals Deep wizardry kanidmd Server-side things
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants