Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline error messages are bad for logs #683

Closed
vorner opened this issue May 28, 2020 · 6 comments
Closed

Multiline error messages are bad for logs #683

vorner opened this issue May 28, 2020 · 6 comments
Labels

Comments

@vorner
Copy link

vorner commented May 28, 2020

Hello

If there's an error in syntax of regular expression, the Display of the Error tries to be helpful and outputs the original pattern with an arrow pointing to the place where it happened.

use regex::Regex;

fn main() {
    let e = Regex::new("This is wrong )[").unwrap_err();
    println!("There's an error: {}", e);
}
There's an error: regex parse error:
    This is wrong )[
                  ^
error: unopened group

However, I feel like being multiline is not something one would expect from error types. From a more practical perspective, this makes these errors mess up log files, where it is a rule of thumb to never write anything multiline to allow easy grepping through it.

I don't have an idea what exactly to do about the situation ‒ I understand some might like the helpful multiline messages, so removing them completely probably is not the best option, but I'd also like to be able to output errors to logs ‒ and do that in generic way, without knowing what exact error it is, or even have it as one of the error cause.

@BurntSushi
Copy link
Member

I don't necessarily have a problem with doing something that allows one to print an error message on a single line, but I don't think I'm willing to prioritize the log case over the better UX in the common case.

Personally, if it were me, I'd probably just split the error message into lines, drop everything but the last line and log that. I'm not sure the regex crate itself needs to do anything for that. I'm possibly open to other ideas.

@vorner
Copy link
Author

vorner commented May 29, 2020

Well, the problem with that is that:

  • At the time I'm printing the error, I no longer know it is the error from regex, because of Box<dyn Error + Send + Sync> (or one of the other crates for that). And it might actually be buried somewhere in the chain of error causes. So that makes special casing for regex a bit difficult.
  • I could probably do that to all errors. But that feels a bit like a wasted effort and would probably require me to do that at all places that I log any errors.

So I was thinking more in the lines of regex::Error::single_line(self) -> Self method that would „configure“ the error. It doesn't sound exactly right either, though so I hoped someone could come up with a better option.

@kchibisov
Copy link

kchibisov commented Jun 9, 2020

Personally, if it were me, I'd probably just split the error message into lines, drop everything but the last line and log that. I'm not sure the regex crate itself needs to do anything for that. I'm possibly open to other ideas.

Well the main issues here is that strategy to propagate errors is not useful for downstream to provide proper error messages that makes sense for the users, especially, when you try to throw them in GUI applications. Default formatting is bad, but it's subjective, but that's not how you can solve/improve that.

The main issue is that the information that Error is carrying can't be used by downstream other than just reading the actual debug impl and trying to parse everything from that String. Ideally Syntax error should send 'location' (offset in regex), 'kind of syntax error' (Information of what syntax error we've reached) and 'original regex string'. With those information, you can actually generate accurate error message by yourself quite easily.

So it should look more like.

pub enum Error {
    /// A syntax error.
    Syntax(String, usize, String),
    /// The compiled program exceeded the set size limit.
    /// The argument is the size limit imposed.
    CompiledTooBig(usize),
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

Where the first element in Syntax error is a source we were using to build the regex, the second is an offset of a syntax error in the first element, and the third is kind of that error, which could be some enum, like SyntaxErrorKind, but String is likely enough. Well, the first element might be redundant, but I guess nothing stops from sending it to.

Note, even ripgrep could benefit from that, so you can print error: part in red, so the actual error message will be more noticeable.

~ ➜ rg "hello("
regex parse error:
    hello(
         ^
error: unclosed group

@kchibisov
Copy link

kchibisov commented Jun 9, 2020

Oh, in addition, if default formatting should be changed, I guess library should print a bit more boring debug formatting. Right now it looks like that.

Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    hello(
         ^
error: unclosed group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)

However something like this makes a bit more sense for a library.

Syntax(source: "hello", offset: 5, kind: "unclosed group")

@BurntSushi
Copy link
Member

The main issue is that the information that Error is carrying can't be used by downstream other than just reading the actual debug impl and trying to parse everything from that String. Ideally Syntax error should send 'location' (offset in regex), 'kind of syntax error' (Information of what syntax error we've reached) and 'original regex string'. With those information, you can actually generate accurate error message by yourself quite easily.

You can do that today using regex-syntax. Otherwise, this seems like a separate issue from the one described by the OP. I've already said that changing the default formatting is off the table. The entire point of the multi-line error message is that it is the default. If it wasn't the default, it would be almost pointless because folks aren't going to know to opt into it.

If you really want to make an earnest argument against the default formatting, then please open another issue. Otherwise, I don't see a reason to continue down that path.

So I was thinking more in the lines of regex::Error::single_line(self) -> Self method that would „configure“ the error. It doesn't sound exactly right either, though so I hoped someone could come up with a better option.

@vorner This sounds like something I might be open to, but to be honest, it seems pretty ham-fisted to me. If you have a requirement that log messages are all one line, then it seems to me like you need to enforce that requirement on your end.

@BurntSushi
Copy link
Member

BurntSushi commented Mar 6, 2023

I'm going to close this for now.

I remain open to the idea of exposing some way of emitting single-line error messages. A mutator on the error type itself kind of seems non-ideal. Another possibility is a global flag or perhaps even an option on RegexBuilder, which appeals more to me personally. If someone wants to make a more flushed out proposal, I'd be happy to revisit this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants