Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT]: User feedback should use warnings, not logs #23422

Open
mwaskom opened this issue Jul 12, 2022 · 41 comments · May be fixed by #27073
Open

[MNT]: User feedback should use warnings, not logs #23422

mwaskom opened this issue Jul 12, 2022 · 41 comments · May be fixed by #27073

Comments

@mwaskom
Copy link

mwaskom commented Jul 12, 2022

Summary

There are at least two places where matplotlib provides a message to the user about usage patterns that are potentially mistakes using logging functionality, rather than as a warning.

Two examples I've encountered recently:

I don't think logging is the right mechanism here:

  • The feedback provides very little information about the source of the problem, you just get some text in stderr.
  • Related, you can convert a warning to an error and then drop into a debugger to figure out what's happening; I don't think that's the case with logging?
  • Warnings can be intercepted by third party libraries that are doing something on purpose (i.e., passing numbers-as-strings to use a categorical axis), whereas logs are harder to muffle without intervening.

Proposed fix

It's possible I am wrong here, but warnings.warn feels like the right mechanism for both of these pieces of feedback.

@story645
Copy link
Member

So agree w/ you and also the discussion in/around #13026 was that the categorical warning shouldn't be a warning since it's the expected/intended behavior and that's why it became info and it maybe shouldn't even exist. meeting notes about it

@mwaskom
Copy link
Author

mwaskom commented Jul 13, 2022

Ah thanks missed that original discussion.

It seems like @jklymak made the opposite argument from point 3 here, that logging is preferable because it's easier for knowledgable users to disable it. I'm not sure I understand that argument? Personally I find Python's logging interface confounding so I'm not sure I share the intuition. But my concern here is primarily about 3rd party libraries, who really shouldn't be messing with matplotlib's logging settings IMO.

It also feels odd to me to be logging to communicate with a user in the moment ... logging feels more about communicating with a user in the future. But that's maybe just a personal inclination.

(I'd also prefer if the categorical warning were simply removed, although I appreciate that it's a source of confusion and gather that the ship has sailed).

@jklymak
Copy link
Member

jklymak commented Jul 13, 2022

But my concern here is primarily about 3rd party libraries, who really shouldn't be messing with matplotlib's logging settings IMO.

I am by no means an expert in deciding on the best way to pass info between libraries. But I'm not clear how messing with Matplotlib's logging is any worse than messing with Matplotlib's warnings? https://docs.python.org/3/howto/logging.html is the basic guidance.

For the categorical info message, that really seems to not be something the user should see unless they are debugging. I think its a little more urgent than "debug", which barfs out a lot of output, but not as urgent as a "warning". For the default logging config, users should not see this message at all, only if they are in the habit of bumping their log level to see what is breaking.

For the RGB message, that seems something a user is more likely to get wrong than a library, in which case the log message seems the best place to put this according to my parsing of the guidelines?

But I don't feel strongly about any of this, and I am not speaking from a wealth of experience dealing with maintaining a downstream library...

@mwaskom
Copy link
Author

mwaskom commented Jul 13, 2022

Well the warnings module has a context manager for temporarily changing warning filters. I’m not aware of anything similar for logging?

@mwaskom
Copy link
Author

mwaskom commented Jul 13, 2022

For the RGB message, that seems something a user is more likely to get wrong than a library, in which case the log message seems the best place to put this according to my parsing of the guidelines?

Maybe, although the proximate cause for this issue was someone opening a seaborn issue about this log that I was (eventually) able to trace down to a problem in pandas. Which would have been a lot easier if I could have just flipped a switch to make the warning an error and get a stack trace. I agree that in general that c=rgb_tuple is not something a library should be doing on purpose.

@mwaskom
Copy link
Author

mwaskom commented Jul 13, 2022

Another case where it is useful to turn warnings into errors is in test suites.

@story645
Copy link
Member

For the default logging config, users should not see this message at all, only if they are in the habit of bumping their log level to see what is breaking

I'm now confused then on why the categorical message is a log and not a warning, given it was specifically aimed at novice users, who presumably aren't messing w/ log levels.

@jklymak
Copy link
Member

jklymak commented Jul 13, 2022

For the categorical issue, warnings show by default and you don't want warnings to show up if the user is doing something they indeed mean to be doing.

@story645
Copy link
Member

For the categorical issue, warnings show by default and you don't want warnings to show up if the user is doing something they indeed mean to be doing.

this was always the problem w/ this warning though, right? that it's the totally expected behavior of the library and also something many users may not have intended. And the users who may not have intended it and therefore may benefit most from seeing the warning, are maybe least likely to see the warning since it's an info log. Just raising that I think there's maybe a UX issue here, though also wondering if at this point it can be removed given categoricals went in in 2.1

@mwaskom
Copy link
Author

mwaskom commented Jul 13, 2022

I think you can use warnings and satisfy your desire for a plt.helpful mode by defining a custom warning class (MatplotlibSharpEdgeWarning 😉 ) and setting filters on it.

I realized why I am seeing the categorical log even when not asking for it; some transitive dependency does logging.basicConfig(“INFO”) which activates a lot more verbosity I didn’t want :(

@timhoffm
Copy link
Member

timhoffm commented Jul 21, 2022

From https://docs.python.org/3/howto/logging.html#when-to-use-logging

The best tool for the task:

  • warnings.warn() in library code if the issue is avoidable and the client application should be modified to eliminate the warning
  • logging.warning() if there is nothing the client application can do about the situation, but the event should still be noted

Applied to the cases:

  • Passing string-formatted numbers as data
    It is a valid use case, but most of the time not what the user intended. One can argue both ways:

    • We don't want false positives -> do not show the warning by default -> logging
    • We want to show the warning by default to warn most users -> warnings.warn, and we accept some false positives, which these users could filter out.

    I'm now inclined towards the second way.

  • Passing an RGB tuple to the c parameter of scatter
    There are unambiguous ways for the user to specify the color (see the message). -> User should take action -> We should use warnings.warn()

@jklymak
Copy link
Member

jklymak commented Jul 21, 2022

The rgb case I agree should just be a warning

Still not convinced about categorical. The point was if someone does something and it behaves unexpectedly we can ask them to turn logging on. If someone does something and it works, and is legitimate, I don't think they should have to filter a warning.

@timhoffm
Copy link
Member

timhoffm commented Jul 22, 2022

Usually I'd argee with you that legitimate use cases should not issue warnings.

But weighting the likelihood, we may want to deviate from this:
I assume, the misuse (accidentally passing strings but intending numbers) is far more likely than the legitimate use (wants string labels that happen to be numbers).

If we strictly follow the above rule, we make life much harder for the majority of users. You say "The point was if someone does something and it behaves unexpectedly we can ask them to turn logging on." That's very far down the line from the problem: The user sees a strange output. Then search for a solution. Maybe they find our FAQ or they ask on discourse or open an issue. It would be much more efficient for them if we would issue a warning right away.

OTOH I'd argue that number labels is a special enough case that we can burden users to explicitly consent to this. - They may just live with the warning or, suppress it with a filter (which we should describe in the warning message).

If we want to try being more distinctive, one could make this dependent on the plot type: I'd say that numeric labels are mostly for categorical data, and these are usually visualized in descrete plots like bar(), boxplot(), violinplot(). So one could leave out the warning for them. But I'm not sure this is worth the additional effort.

@mwaskom
Copy link
Author

mwaskom commented Jul 22, 2022

The point was if someone does something and it behaves unexpectedly we can ask them to turn logging on.

I still think logging is the wrong mechanism here, but this would suggest DEBUG and not INFO is the right level.

one could make this dependent on the plot type

This has some appeal and logic seems relatively sound, but I would expect that this might get complicated and difficult to maintain. One other distinction that you could draw is between calling a plotting method vs. calling Axis.update_units directly. If the user is doing the latter, I think it's safe to say they know what they're doing and really do want numbers-as-strings.

@jklymak
Copy link
Member

jklymak commented Jul 22, 2022

I still think logging is the wrong mechanism here, but this would suggest DEBUG and not INFO is the right level.

Well the nice thing about INFO is that it is not super cluttered, whereas DEBUG is very verbose. Again, the default level is WARNING, so most users shouldn't see this log message. Asking folks who have a mysterious problem to turn on logging to INFO should give them something useful. DEBUG is largely for the devs.

Usually I'd argee with you that legitimate use cases should not issue warnings.

The original problem is numbers (or dates) being stored in a list or object array as strings. Unfortunately, many csv file decoders will return these as text if you don't specify the dtype, and a lot of people use csv files.

If we really think that 99.9% of the users who pass plot(['1', '2', '3']) mean for that to be a float array, such that we are willing to issue a warning to the 0.1% who mean for those to be categories, maybe we should go back to the original idea of just converting the array to float (or datetime64) and providing a mechanism to force the units to be categorical.

timhoffm added a commit to timhoffm/matplotlib that referenced this issue Nov 11, 2022
 c)

When to use what:

- warnings.warn() in library code if the issue is avoidable and the
  client application should be modified to eliminate the warning
- logging.warning() if there is nothing the client application can
  do about the situation, but the event should still be noted

There are unambiguous ways for the user to specify the color (see the
message). Here, the user should take action and switch from using *c*
to using *color*. Thus warnings.warn() is the way to go.

Closes half of matplotlib#23422.
timhoffm added a commit to timhoffm/matplotlib that referenced this issue Nov 12, 2022
 c)

When to use what:

- warnings.warn() in library code if the issue is avoidable and the
  client application should be modified to eliminate the warning
- logging.warning() if there is nothing the client application can
  do about the situation, but the event should still be noted

There are unambiguous ways for the user to specify the color (see the
message). Here, the user should take action and switch from using *c*
to using *color*. Thus warnings.warn() is the way to go.

Closes half of matplotlib#23422.
timhoffm added a commit to timhoffm/matplotlib that referenced this issue Nov 12, 2022
 c)

When to use what:

- warnings.warn() in library code if the issue is avoidable and the
  client application should be modified to eliminate the warning
- logging.warning() if there is nothing the client application can
  do about the situation, but the event should still be noted

There are unambiguous ways for the user to specify the color (see the
message). Here, the user should take action and switch from using *c*
to using *color*. Thus warnings.warn() is the way to go.

Closes half of matplotlib#23422.
timhoffm added a commit to timhoffm/matplotlib that referenced this issue Nov 13, 2022
 c)

When to use what:

- warnings.warn() in library code if the issue is avoidable and the
  client application should be modified to eliminate the warning
- logging.warning() if there is nothing the client application can
  do about the situation, but the event should still be noted

There are unambiguous ways for the user to specify the color (see the
message). Here, the user should take action and switch from using *c*
to using *color*. Thus warnings.warn() is the way to go.

Closes half of matplotlib#23422.
melissawm pushed a commit to melissawm/matplotlib that referenced this issue Dec 19, 2022
 c)

When to use what:

- warnings.warn() in library code if the issue is avoidable and the
  client application should be modified to eliminate the warning
- logging.warning() if there is nothing the client application can
  do about the situation, but the event should still be noted

There are unambiguous ways for the user to specify the color (see the
message). Here, the user should take action and switch from using *c*
to using *color*. Thus warnings.warn() is the way to go.

Closes half of matplotlib#23422.
@pedrompecanha
Copy link
Contributor

Hello everyone! I see that the second example was marked was done. What about the first one? Could it still be changed to a warning? Thanks!

@timhoffm
Copy link
Member

If we really think that 99.9% of the users who pass plot(['1', '2', '3']) mean for that to be a float array, such that we are willing to issue a warning to the 0.1% who mean for those to be categories, maybe we should go back to the original idea of just converting the array to float (or datetime64) and providing a mechanism to force the units to be categorical.

No. We don't want to support sloppy programming. If the user wants numbers, they should provide numbers. The warning has the purpose to make the user aware that they should likely convert their data to numbers.

There are exactly two possible cases
a) The user wanted numbers but accidentally passed strings - in this case, we should warnings.warn
b) The user actually wanted numeric categorical labels - in this case we should do nothing

Unfortunately, we cannot infer the user intention reliably. This leaves us with three options

  1. Change to warnings.warn under the assumtion that a) is much more likely, and accepting a small false-positive rate of warning for b)
  2. keep logging.warning with the argument that the case b) users do not feel obliged to act on it (logging is only for information, whereas warnings imply a request for change of user code)
  3. Do not warn at all because we fundamentally don't want a false-positive warnings for b)

Personally, I'm +0.75 for 1)

@pedrompecanha
Copy link
Contributor

pedrompecanha commented Oct 12, 2023

Hey! So, as it has been more than half a month and no additional input was given, I'm going to follow your suggestion number 1. I'll create the pull request implementing it, and, if any new input about it is given, I can change it right away.

edit: time interval was wrong

@jklymak
Copy link
Member

jklymak commented Oct 12, 2023

I don't think we should warn on potentially valid input.

@anntzer
Copy link
Contributor

anntzer commented Oct 12, 2023

I emphatically agree that potentially valid input should never warn.

@mwaskom
Copy link
Author

mwaskom commented Oct 13, 2023

What is that logging call supposed to achieve that is different from “warning” then?

@jklymak
Copy link
Member

jklymak commented Oct 13, 2023

Logging is opt-in and most people leave it off. If someone has a mysterious bug they can turn it on and maybe get some info. A warning would need to be opt-out.

@mwaskom
Copy link
Author

mwaskom commented Oct 13, 2023

To circle back to the beginning of the thread, Python logging is “opt in” in a very crude sense and a line of code somewhere in your dependency tree can opt you into matplotlib logging without you wanting that (how I ran into this).

This still just feels like the wrong mechanism for this kind of information but if you are against it being a warning (I’m fine with that) and still think it adds value (I agree this is a common gotcha, but I am skeptical anyone has thought to debug it by saying “I know, I will turn on matplotlib logging”) we can go ahead and close the thread.

@timhoffm
Copy link
Member

timhoffm commented Oct 14, 2023

I think we all agree that

  • if there was only the unintended strings case, an immediate warning (I.e. warnings.warn) is more helpful than an opt-in (logging).
  • opt-in warnings will only be seen by a very small amount of users.
  • valid input should never warn.

The decision here is a trade-off. The intended all string numbers case should be quite rare. It's a judgement call: How likely is the valid use case vs. the mis-use? 1%, 0.1%? And is there a limit where we say it is acceptable to show a false positive warning in rare cases to help with a common pitfall in the large majority of the cases.

@mwaskom
Copy link
Author

mwaskom commented Oct 14, 2023

To be clear, in the seaborn usecase, it is 100% intentional to be sending in string-typed numbers.

@timhoffm
Copy link
Member

@mwaskom Do I understand you correctly? You intentionally send string-tryped numbers. But you still want us to warnings.warn on them? Do you plan to filter the warning?

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

Right, it would be easier for me overall if there were just no feedback here, though I am sympathetic about this being a common gotcha. But whereas with warnings.catch_warnings/warnings.simplefilter there is an established pattern for ignoring "expected" warnings, I am not aware of an analogous pattern for ignoring logs. That's not to say it's impossible, but something feels wrong about one library manipulating the warning settings of another library.

Maybe I am just not used to normal patterns because it's pretty unusual for libraries in the scientific Python ecosystem to use logging; logging feels like something that is usually done at the application layer, not the library layer. But that goes back to why it feels like matplotlib is doing something odd here.

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

As a maybe orthogonal conversation, maybe axis.update_units could have a "verbosity" parameter that sets some state to prevent messages (logging or warning) of this kind. Would need some additional thinking though.

@story645
Copy link
Member

story645 commented Oct 17, 2023

I want to circle back to #23422 (comment) cause I think a CustomWarning could work well here. It'd provide an easy hook for filtering and let us name it in a way where it's clear that this is a "PossibleWarning"

And also categoricals have been in the library since 2016 😓 so half/half on warning.

ETA: also this is now explicitly documented, so I think there's probably less need to warn:

@jklymak
Copy link
Member

jklymak commented Oct 17, 2023

but something feels wrong about one library manipulating the warning settings of another library.

Matplotlib is not manipulating any other library's settings. I'm not sure what you mean by this? In normal usage of Matplotlib there are no logging messages. If you turn logging on, then yes, you will see our messages, but it's hard to see why that is something we are doing that is incorrect.

@ksunden
Copy link
Member

ksunden commented Oct 17, 2023

@mwaskom was speaking from the perspective of seaborn modifying matplotlibs log settings for the case where numeric strings are added intentionally to avoid warnings from being printed. (though such warnings only appear with non-default settings, so unclear to me if that is necessary)

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

was speaking from the perspective of seaborn modifying matplotlibs log settings for the case where numeric strings are added intentionally to avoid warnings from being printed.

that's right, thanks

(though such warnings only appear with non-default settings, so unclear to me if that is necessary)

i am raising this both as the maintainer of seaborn and as a user who sometimes works in a codebase where a transitive dependency does logging.basicConfig(level=INFO), which is why i end up seeing these messages (i do think it's telling that you call them "warnings" 🙂). so this is an example of a case where i did not personally opt-into seeing the logs, and i am also not (directly) doing the thing that is triggering them, resulting in a bunch of confusing and ultimately meaningless warnings cluttering my notebook. and even though i'm a knowledgable user of matplotlib and also wrote the relevant seaborn code, tracking down the source of them was quite difficult because (unlike with a warning) the log was not associated with any specific call into matplotlib. which, all together, feels like bad ux.

@jklymak
Copy link
Member

jklymak commented Oct 17, 2023

who sometimes works in a codebase where a transitive dependency does logging.basicConfig(level=INFO)

Libraries should never do that - I think your major complaint is with that library, not Matplotlib. Users who set the logging level should expect to see a bunch of messages. However, libraries should never set the logging level for users.

If, as a library maintainer, you really want to suppress our logging message, you can temporarily reset the logging level when you call us:

mlog = logging.get_logger('matplotlib')
level = mlog.getEffectiveLevel()
mlog.setLevel(logging.CRITICAL)
ax.plot_matplotlibthing()
mlog.setLevel(level)

If you are doing this a lot, you could easily write a context manager. Its a bit of a pain for you as a maintainer, however, in my opinion it is much more burdensome for end-users to have to filter warnings for valid code than for a maintainer to suppress log messages they don't want.

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

It feels like there are a whole additional set of arguments here about why the logging approach is unusual and provides bad ux that you’re not engaging with.

@story645
Copy link
Member

why the logging approach is unusual and provides bad ux

Since this is getting completely lost, gonna again bring up your I think very good suggestion to make a CustomWarning. #23422 (comment)

I'm thinking here of warnings as analogous to sphinx admonitions- the purpose of either is to bring attention to important information. Just like how we use different types of admonitions all over the place to flag things, we could use a "FlagThings" warning to bring attention to this.

@jklymak
Copy link
Member

jklymak commented Oct 17, 2023

It feels like there are a whole additional set of arguments here about why the logging approach is unusual and provides bad ux that you’re not engaging with.

I think that's true, because the basic argument is that we are spamming your console. But we are not, your dependency is.

Matplotlib has always had logging information being spat out if you set a level of verbosity. We turned those messages into standard logging calls quite a few years ago because it is a standard library with standard ways of controlling the level of output. If the argument is general, in that we should never use logging to provide extra debugging info to users, then we would have to go back to conditional print statements, which seems a huge step backwards, or not provide extra debugging info.

If the argument is specific to this situation, as @timhoffm says it's a judgement call if we warn for every time a user does this, or if we just log. I personally don't think we should warn users on valid inputs. I guess we could argue about the level of the message, but INFO is more verbose than WARNING.

I think very good suggestion to make a CustomWarning

This is still a warning that users will have to explicitly filter even if they are using the library properly.

The logging tutorial suggests that warnings.warn should be used "in library code if the issue is avoidable and the client application should be modified to eliminate the warning". That doesn't apply to this case if the user means to do plot(["1", "3", "2"], [1, 2, 3]) - there is no code the user could change to eliminate the warning, no matter what it is called.

@story645
Copy link
Member

This is still a warning that users will have to explicitly filter even if they are using the library properly.

Yes, but at least it's a warning we can name "ValidButRareUseCase" so that users who intend this use case like @mwaskom can easily filter it but it's visible to its intended audience.

Granted I also fully support just getting rid of it b/c we document this pretty well now & someone seeing this warning will have to Google the same behavior as someone who doesn't.

Which going back to the start of this thread and @mwaskom's point - it's questionable whether logging helps anyone. This is a rough guess based on my behavior, but I think most folks are more likely to Google strange behavior than to think to check logs they may not even know about.

And practically speaking, if a user comes to us w/ this issue we're unlikely to ask them to check their logs b/c we recognize what's triggering this odd behavior and will directly point them at the docs for this (w/ maybe a 'can you tell us the dtype?" first).

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

I think that's true, because the basic argument is that we are spamming your console. But we are not, your dependency is.

To be pedantic, you're spamming my notebook. The distinction actually matters because notebooks with matplotlib plots are often (in a data science context) intended as communicative documents so there is extra cost to pointless content in stderr.

Beyond that, I've made a number of other points about why logging is less effective for debugging (from my OP):

  • The feedback provides very little information about the source of the problem, you just get some text in stderr.
  • Related, you can convert a warning to an error and then drop into a debugger to figure out what's happening; I don't think that's the case with logging?
  • Warnings can be intercepted by third party libraries that are doing something on purpose (i.e., passing numbers-as-strings to use a categorical axis), whereas logs are harder to muffle without intervening.

Plus (in most cases) you need to know to switch on INFO mode logging, which is an unusual pattern in scientific Python — logging isn't to my knowledge used in the other core libraries.

Its a bit of a pain for you as a maintainer, however, in my opinion it is much more burdensome for end-users to have to filter warnings for valid code than for a maintainer to suppress log messages they don't want.

And then circling back here, I don't really think this distinction is relevant. Set aside my poorly-behaved dependency opting me into matplotlib logging. If I, as a user, wanted useful information from matplotlib and activated its logging, I'd still need the third party libraries like seaborn to do the hacky manipulation of matplotlib's log level, otherwise I would be getting uninformative "warnings" for valid code.

@rcomer
Copy link
Member

rcomer commented Oct 17, 2023

it's questionable whether logging helps anyone. This is a rough guess based on my behavior, but I think most folks are more likely to Google strange behavior than to think to check logs they may not even know about.

I didn’t know we did logging until I reviewed #26802

@story645
Copy link
Member

story645 commented Oct 17, 2023

Adding on to the argument that I think at least me and @mwaskom are making that warning is:

  • more likely to reach the intended audience then logs
  • easier for users and library maintainers to filter out if they intended that behavior

and my argument that

  • CustomWarning lets us better communicate why we're bringing this to the users attention.

I think this concern:

there is no code the user could change to eliminate the warning, no matter what it is called.

Can be addressed by adding "to not see this message, cast to a numerical type or add import warning; warning.simplefilter("CustomWarning: you're doing a valid thing") in the WarningMessage, just like we add 'use alt' to deprecation messages.

ETA:

I’m not aware of anything similar for logging?

If you can catch the handler for the matplotlib logger, you may be able to add a filter https://stackoverflow.com/a/17276457/1267531 but that will I think suppress it for folks who are using seaborn and may run into this in their non seaborn code.

@mwaskom
Copy link
Author

mwaskom commented Oct 17, 2023

I'm not exactly sure how it would work, but it also seems possible to have matplotlib's "maybe warnings" suppressed by default and able to be toggled on with a plt.verbose_warnings() (or whatever name) switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants