Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sklearn logger default StreamHandler #16451

Merged
merged 5 commits into from May 4, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/whats_new/v0.23.rst
Expand Up @@ -187,6 +187,10 @@ Changelog
`ValueError` for arguments `n_classes < 1` OR `length < 1`.
:pr:`16006` by :user:`Rushabh Vasani <rushabh-v>`.

- |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to
receive logging messages from :mod:`sklearn.datasets`, a handler should
be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console. If the user wants to change logging, they can add handlers to the sklear.logger, but e.g. also the root logger.

Maybe not try to explain how logging handlers work via the what's new entry, and to just explain why the handler was removed?

Suggested change
- |API| The `StreamHandler` was removed from `sklearn.logger`. To continue to
receive logging messages from :mod:`sklearn.datasets`, a handler should
be added to `sklearn.logger`. :pr:`16451` by :user:`Christoph Deil <cdeil>`.
- |API| The `StreamHandler` was removed from `sklearn.logger` to avoid
double logging of messages in common cases where a hander is attached
to the root logger, and to follow the Python logging documentation
recommendation for libraries to leave the log message handling to
users and application code. :pr:`16451` by :user:`Christoph Deil <cdeil>`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the statement as-is in the changelog isn't 100% correct. In many cases the last resort handler is in place and will log messages by default to the console.

The last resort handler will log warnings to stderr. The datasets module logs with info and debug levels, and nothing is shown on the console.

Although, I am okay with updating the whats new to your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

from sklearn.datasets import fetch_kddcup99
fetch_kddcup99(data_home="data")

will print Downloading https://ndownloader.figshare.com/files/5976042 on master, and nothing with this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So here the sklearn.datasets._kddcup99 logger is created, it has the default level of warning and thus the info message isn't shown.

You could add a logger.setLevel(logging.INFO) there if you want it to appear by default for users.

My understanding is that messing with handlers in libraries is bad (causes confusion and double log messages for users), but messing with log levels is OK in libraries, to get the default behaviour you want.

Another option could be to use the sklearn.logger there to emit the message, but then I think you have to be very careful to avoid circular imports, probably is tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lastresort handler will still filter everything out.

The only want to get the logging messages to appear is to change the logger to logger.warning.

Anyways, I am okay with removing the messages so we can be a better citizen of using logging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm that seems fine as info and debug seem to be the correct logging levels. We could set the level to 'warning' if we think that downloading the data is unexpect, but 'info' seems right and so it shouldn't be printed by default.


:mod:`sklearn.decomposition`
............................

Expand Down
1 change: 0 additions & 1 deletion sklearn/__init__.py
Expand Up @@ -19,7 +19,6 @@
from ._config import get_config, set_config, config_context

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely we shouldn't be doing this either?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused what effect this has on the user now. Does setting the level on a logger that has no handler do anything useful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like consensus :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we then remove this one @thomasjpfan ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, we should remove the setLevel as well, updating PR.



Expand Down