New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scrapy logging show UnicodeDecodeError #1270
Comments
Hi, thanks for reporting this issue! Seems like the new logging system isn't handling well some encoding, though I'm not sure how could this issue be reproduced without breaking the old log system too. Could you show us the code from your spider (and maybe your project settings if you have redefined some of them) so we can try to reproduce it? |
yes, I upload my project to Dropbox,you can download it , |
I was able to emulate a similar issue touching some settings, I couldn't reproduce it otherwise in the attached project. Problem here seems unrelated to the system locale. Both log calls, the one in utils/log.py and the one in core/engine.py, try to log an unicode message. By doing that all attributes in the log record are decoded to unicode using the ascii encoding, and that breaks with the reported traceback if there are any non-ascii characters. The values in the broken line are roughly: >>> _fmt = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
>>> record = {
'asctime': '2015-06-02 12:17:01+0000',
'created': 1433258457.911135,
'exc_info': None,
'exc_text': None,
'filename': 'log.py',
'funcName': 'log_scrapy_info',
'levelname': 'INFO',
'levelno': 20,
'lineno': 108,
'module': log,
'msecs': 911.1349582672119,
'message': u'Scrapy 1.1.0rc1 started (bot: imzx)',
'name': 'scrapy',
'pathname': 'scrapy/utils/log.py',
'process': 19931,
'processName': 'MainProcess',
'relativeCreated': 66988981.72998428,
'thread': 139932245554944,
'threadName': 'MainThread'}
}
>>> s = _fmt % record Since Problem is, I don't know how any of those could hold non-ascii characters. The default Something that I tested to trigger these errors was to install scrapy in a path with non-ascii characters, and then change the default format with the LOG_FORMAT setting to @binbibi, can I ask you to run the following commands (notice the leading >>> import logging
>>> logging.basicConfig(format='%(asctime)s [%(name)s] %(levelname)s: %(message)s')
>>> logging.warning(u'First log message') >>> import logging
>>> from scrapy.utils.log import configure_logging
>>> configure_logging({})
>>> logger = logging.getLogger('scrapy')
>>> logger.debug(u'Second message') |
second command error again,I run the command in ipython,you can see the result: In [1]: import logging
In [2]: logging.basicConfig(format='%(asctime)s [%(name)s] %(levelname)s: %(message)s')
In [3]: logging.warning(u'First log message')
2015-06-03 12:40:35,532 [root] WARNING: First log message
In [4]: import logging
In [5]: from scrapy.utils.log import configure_logging
In [6]: configure_logging({})
In [7]: logger = logging.getLogger('scrapy')
In [8]: logger.debug(u'Second message')
2015-06-03 12:41:34,206 [scrapy] DEBUG: Second message
Traceback (most recent call last):
File "C:\Python27\lib\logging\__init__.py", line 859, in emit
msg = self.format(record)
File "C:\Python27\lib\logging\__init__.py", line 732, in format
return fmt.format(record)
File "C:\Python27\lib\logging\__init__.py", line 474, in format
s = self._fmt % record.__dict__
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 19: ordinal not in range(128)
Logged from file <ipython-input-8-86121427498f>, line 1 |
I really appreciate your help! Guess it truly is an issue concerning Scrapy alone, I'd really like to inspect that >>> from scrapy.settings.default_settings import LOG_FORMAT
>>> LOG_FORMAT.decode('ascii') |
I debug the code , __author__ = 'Binbibi'
import logging
from scrapy.utils.log import configure_logging
configure_logging({})
logger = logging.getLogger('scrapy')
logger.debug(u'Second message') you are write ,the LOG_DATEFORMAT cause error,the timeformat "%Y-%m-%d %H:%M:%S%z" The bug is happen when you add "%z",python logging use formatTime formatTime to handle time,but def formatTime(self, record, datefmt=None):
ct = self.converter(record.created)
if datefmt:
s = time.strftime(datefmt, ct)
else:
t = time.strftime("%Y-%m-%d %H:%M:%S", ct)
s = "%s,%03d" % (t, record.msecs)
return s when scrapy use "%Y-%m-%d %H:%M:%S%z" timeformat,the %Z is Time zone name,I come from s = time.strftime(datefmt, ct).decode('gbk') Then I find everything is Ok,you can see the output: 2015-06-04 01:26:02中国标准时间 [scrapy] DEBUG: Second message and my project imzx also work fine without logging trackback,so I think ,we can simply delete the "%z" in LOG_DATEFORMAT so scrapy use the default logging.py timeformat "%Y-%m-%d %H:%M:%S",otherwise ,hack the logging source code,I'm not sure I'm right,so you can try test again,just chinese development environment,I think you can reproduce it ; |
Ohh, the timezone was the catch! Thanks a lot for debugging it, I couldn't reproduce it in the linux I'm using since the timezone name is stored in english regardless system language (I got 'CST' there even with chinese locale), seems it's something platform specific. I still have a doubt, why is '%z' being formatted as '%Z', or was that an example? The first one should return the offset from UTC (something like '+0800') and the second one the timezone name (something like '中国标准时间' or 'CST'). I would have hoped that '%z' contained only ascii characters. Running In any case, I think we should remove '%z' if it brings this kind of issues, seems like if was deprecated too: https://docs.python.org/2/library/time.html#id2. I'll create a pull request so we can review that change. |
thanks for the details about "%z" and "%Z",In my VM Ubuntu,It work the way as you point,But in windows,"%z" Is equivalent to "%Z",you can see the result: In [28]: import time
In [29]: print time.strftime('%z').decode('gbk')
中国标准时间
In [30]: print time.strftime('%Z').decode('gbk')
中国标准时间 I don't know why |
Windows seems to implement it differently, probably it just fallbacks to Thanks again @binbibi for such dedicated report, you'll see the fix for it in the next release candidate :) |
hello,all, I use scrapy 1.00rc1 from pip ,my pc is win7x64,python is 2.7.9(x64),
when I start my project
Then I find utils/log.py ,line 108 ,I commented-out the code as you can see
And I restart my project,it's OK ,but something about logging error still:
It's to much in my screen,though too many logging Traceback , my code work fine ,I get all data I need, But I confused,my code work fine in scrapy 0.24 without any above logging Traceback,so I think it's maybe scrapy bug
The text was updated successfully, but these errors were encountered: