-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logging - Inconsistent behaviour when handling unicode #81292
Comments
Python is inconsistent in how it handles errors that have some unicode characters. It works to screen but fails to log This works:
The following breaks:
This raises a unicode error: Python 3.6.3 Given that the file created by the logger is utf-8, it's unclear why it doesn't work. I found a workaround by using a Handler, but surely the loggers should all work the same way so that people don't get unpleasant surprises that even more painful to debug when things only break in certain logging modes? |
I don't think this is true. Default file handler created using filename argument is opened with encoding None (meaning that it relies on system-default encoding, which is on Windows is not utf-8). And it also explains why your work-around works, provided you've opened file handler with explicit utf-8 encoding. |
It definitely claims to be "utf-8" in NotePad++. I've attached it if you want to double-check. (Windows 7) |
Loggers don't create files - handlers do. The file that you attached seems to be just a text file containing ASCII text. Any ASCII is also utf-8, so Notepad++'s assertion doesn't mean anything in this situation.
People have been using logging, on Windows, without problems, for years, often using utf-8 to encode their log files. Perhaps you need to read the documentation and look at examples more carefully - I'm not intending to be rude, but say that because using a handler isn't some kind of workaround, and you thinking that it was indicates that you're not sufficiently familiar with the documentation. I'd suggest posting questions on Stack Overflow or the python-list mailing list, to establish whether there really seems to be a Python bug, before actually logging a Python issue. I appreciate that you're trying to improve Python. |
Thank you for your comments but this wasn't a question and I maintain this is a bug, or at least undesirable behaviour. I'd consider it a bug in my own software. Reasoning:
Per this page: https://docs.python.org/3/howto/unicode.html
Doing something as basic as logging unicode shouldn't require knowledge of "handlers" - that's failing "simple is better than complex". |
I reiterate my statement that it appears that you aren't sufficiently familiar with how logging is designed to work. Logging is more complex than print() or open() because it has to work flexibly in lots of different scenarios. Handlers are part of this, as are Formatters and Filters. See my 2009 blog post https://plumberjack.blogspot.com/2009/09/python-logging-101.html for more information for why logging is designed the way it is. If you want to configure a stream with utf-8 encoding, without explicitly using handlers, this is easily doable, as follows: C:\Users\Vinay> \python37\python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.basicConfig(stream=open(r'c:\temp\my_log.log', 'w', encoding='utf-8'))
>>> logging.error('จุด1')
>>> ^Z and, lo and behold, the file c:\temp\my_log.log contains ERROR:root:จุด1 The use of the stream keyword parameter is clearly documented at https://docs.python.org/3/library/logging.html#logging.basicConfig Did you look at the basicConfig documentation before raising this issue? |
This seems like an attempt at victim blaming. But yes, I did. In fact, this is now the third time I've looked at that page - once before raising this issue, once before my previous reply, and now. I note that your example and nothing like your example is anywhere on that page. The word "encoding" doesn't appear anywhere on the page either. Sure "stream" is on there, but then you need to know about streams and make the association with logging which I apparently don't. You have to remember not everyone has your level of proficiency in the language. In fact, most Python users don't. Lets put this another way - is there a reason NOT to have Unicode logging as the default? Clearly Unicode was important enough for Guido-et-al to decide to throw Python 2 under the bus. |
Er, no, it was a straight question about whether you'd read the documentation.
Er, no, it was stating that lots of people have successfully used logging on Windows with utf-8, as a way of justifying my opinion that there's no deficiency here that needs addressing.
Sure, and that's fine. Experienced developers learn stuff every day too. But responding to issues takes time (notice how many people will have been notified about this issue from the "nosy list" above; each would have to spend at least some time looking at it), so raising issues should be done only after exhausting other resources to make sure that the problem isn't in one's own level of knowledge or understanding, but an actual problem with the software. This is why I earlier suggested Stack Overflow and python-list as useful resources for improving one's knowledge and understanding.
I have no idea what you mean by this. As my example illustrated, it's quite easy to log Unicode in log files. A simple Internet search for "basicConfig encoding" yields for me as the second result this Stack Overflow question https://stackoverflow.com/questions/21402530 for which the accepted answer is a slight variation on what I suggested earlier. |
I don't see how I can be clearer. What are the reasons for NOT implementing logging to file be unicode as a default? Logging to screen is unicode as a default. What are the reasons for not wanting consistency?
Indeed, and it was from that question I got my solution in fact. The problem was the 30-60 minutes I wasted before that trying to figure out why my program was crashing and why it was only crashing *sometimes*. I'd written the logging part of the program a year ago and not really touched it since, so the logging module being a possible culprit was not even in my mind when the program crashed.
Yes, things are easy when you know it's necessary. It's the process of discovery that's an unnecessary waste of people's time. That's why I raised this and that's why I would consider this a bug in my own software. It's inconsistent, it invites problems, and it wastes peoples time. |
Learning is not a waste of time. You're entitled to your opinion, but this is not a bug in logging. We'll have to agree to disagree. |
I agree and value learning a great deal. However learning should happen on your own time, NOT when a program crashes randomly and tries taking you down the rabbit hole. I like learning but not about unrelated things when I'm trying to do useful work. Fine, if you don't consider this a bug, consider it a feature request. "User would like Python logging of Unicode characters to be consistent" is not an unreasonable request. |
Different people have different ideas of what a default should be. You can pretty much guarantee that whatever a default is, someone will think it should be something else. The basicConfig functionality has been around in its present form since 2003, and people in general haven't had a problem with the current defaults. There is no case for changing a default unless there is a consensus among lots of people that a default needs changing. |
It is consistent. The encoding of logging.basicConfig(filename='c:\\my_log.log') is consistent with the encoding of open('c:\\my_log.log') Both use the system default encoding, which is not UTF-8. There is some discussion going on right now, as to whether it *should* be, but it isn't, and I wouldn't consider changing the behaviour of loging *without* changing the behaviour of open() to be consistent. Logging to the console is consistent with the standard IO streams, and it was PEP-528 (https://www.python.org/dev/peps/pep-0528/) that made that change - before that, the standard IO streams, and logging to the console, used the console codepage. So, while I agree that the behaviour takes a bit of work to understand, it's not (IMO) inconsistent, nor is it (IMO) a bug. |
I agree with Paul. Consistency with open() is the most important aspect here, and when we investigated changing that to UTF-8 it was decided to be too disruptive. That said, I'm not opposed to adding encoding/errors to basicConfig if someone wants to do that (and the logging specialists are okay with it). |
Hmm,, about encoding, I agree that default encoding of open() should be used. On the other hand, couldn't we use different default error handler? |
I've left it as is using the same rationale as I guess open() has at the moment - "Errors should never pass silently. Unless explicitly silenced". |
But open() is general purpose, and logging is for logging. Note that stderr uses 'backslashreplace', even when PYTHONIOENCODING=ascii:strict is used. And if we don't log something due to encoding error, doesn't it mean "errors may ignored completely silently by default"? |
Hm, it seems logging does best job, show enough information to stderr. But sometime, Python is embedded in web servers and there is no stderr. pythonw.exe may not have stderr too.... I still feel backslashreplace is better default error handler. But it's not strong opinion and it shouldn't block this issue. Go ahead. $ cat x.py
import logging logging.basicConfig(filename="mylog.txt")
msg = "hello, はろー"
logging.error(msg) $ LC_ALL=en_US.US-ASCII python3 x.py
--- Logging error Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 1037, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)
Call stack:
File "x.py", line 5, in <module>
logging.error(msg)
Message: 'hello, \u306f\u308d\u30fc'
Arguments: () |
I think Vinay gets to make the call, but I also agree that some sort of error handling by default makes sense for logging. Especially now that it's (about to be) easier to override the default. FWIW, I'd leave the default arguments as None and just coerce it as late as possible, before TextIOWrapper does it. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: