[MRG+1] Python logging #1060

Merged: 21 commits, merged Apr 29, 2015

Commits
7a958f9
Replace scrapy.log calls with their equivalents in the logging std module
curita Feb 28, 2015
c2d7168
Use LogCapture in testfixtures package for tests
curita Feb 28, 2015
6f9b423
Restructure LogFormatter to comply with std logging calls
curita Feb 28, 2015
8baad55
New scrapy/utils/log.py file with basic log helpers
curita Feb 28, 2015
b75556e
Add a logging filter to mimic Twisted's log.err formatting for Failures
curita Feb 28, 2015
ac40ef6
Custom handler to count log level occurrences in a crawler
curita Feb 28, 2015
1c8708e
Create a logger for every Spider and adapt Spider.log to log through it
curita Feb 28, 2015
ffd97f2
Set root handlers based on settings in configure_logging
curita Feb 28, 2015
6acb384
Stdout redirect in configure_logging
curita Feb 28, 2015
c174d78
Deprecate scrapy/log.py
curita Feb 28, 2015
21b9f37
Deprecate more frequently used functions from scrapy/log.py
curita Mar 4, 2015
ccdd8bf
Parametrize log formatting strings
curita Mar 5, 2015
d47a7ed
Update Logging topic on docs
curita Mar 10, 2015
4811d16
Update `logger` attr and `log` method in the Spiders topic on docs
curita Mar 10, 2015
bd0b639
Fix logging usage across docs
curita Mar 10, 2015
69a3d58
Basic example on manually configuring log handlers
curita Mar 19, 2015
4f54ca3
Change 'scrapy' logger for '__name__' on every module
curita Mar 19, 2015
6d12050
Add a filter to replace '__name__' loggers with 'scrapy'
curita Mar 19, 2015
7a92dae
Change Scrapy log output through docs
curita Mar 19, 2015
4858af4
Fix backward compatible functions in scrapy.log
curita Apr 22, 2015
1d8f822
Add backward compatibility to LogFormatter
curita Apr 22, 2015
30 changes: 0 additions & 30 deletions conftest.py
@@ -1,6 +1,5 @@
import six
import pytest
from twisted.python import log

collect_ignore = ["scrapy/stats.py", "scrapy/project.py"]

@@ -10,35 +9,6 @@
if len(file_path) > 0 and file_path[0] != '#':
collect_ignore.append(file_path)

class LogObservers:
"""Class for keeping track of log observers across test modules"""

def __init__(self):
self.observers = []

def add(self, logfile='test.log'):
fileobj = open(logfile, 'wb')
observer = log.FileLogObserver(fileobj)
log.startLoggingWithObserver(observer.emit, 0)
self.observers.append((fileobj, observer))

def remove(self):
fileobj, observer = self.observers.pop()
log.removeObserver(observer.emit)
fileobj.close()


@pytest.fixture(scope='module')
def log_observers():
return LogObservers()


@pytest.fixture()
def setlog(request, log_observers):
"""Attach test.log file observer to twisted log, for trial compatibility"""
log_observers.add()
request.addfinalizer(log_observers.remove)


@pytest.fixture()
def chdir(tmpdir):
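The Twisted file-observer fixtures removed above are superseded by
``testfixtures.LogCapture`` in the test suite (c2d7168). A short sketch of the kind of
assertion that enables; the logger name and message here are made up::

    import logging

    from testfixtures import LogCapture

    def test_warning_is_logged():
        with LogCapture() as log:
            logging.getLogger('scrapy.core.engine').warning('something happened')
        log.check(('scrapy.core.engine', 'WARNING', 'something happened'))
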
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -119,7 +119,7 @@ Built-in services
topics/webservice

:doc:`topics/logging`
Understand the simple logging facility provided by Scrapy.
Learn how to use Python's builtin logging on Scrapy.

:doc:`topics/stats`
Collect statistics about your scraping crawler.
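The rewritten logging topic is built around the helpers this PR adds in
``scrapy/utils/log.py`` (8baad55, ffd97f2, 6acb384). A rough sketch of configuring
logging by hand from a script, assuming the ``configure_logging`` helper exported
there; the file name and level are illustrative::

    import logging

    from scrapy.utils.log import configure_logging

    configure_logging()  # sets root handlers based on settings (ffd97f2)

    # Attach an extra handler manually (see 69a3d58 for the docs example).
    handler = logging.FileHandler('crawl.log')
    handler.setLevel(logging.INFO)
    logging.getLogger().addHandler(handler)
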
17 changes: 8 additions & 9 deletions docs/intro/tutorial.rst
@@ -154,15 +154,14 @@ similar to this::
2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2014-01-23 18:13:07-0400 [dmoz] INFO: Spider opened
2014-01-23 18:13:08-0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2014-01-23 18:13:09-0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2014-01-23 18:13:09-0400 [dmoz] INFO: Closing spider (finished)
2014-01-23 18:13:07-0400 [scrapy] INFO: Spider opened
2014-01-23 18:13:08-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2014-01-23 18:13:09-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2014-01-23 18:13:09-0400 [scrapy] INFO: Closing spider (finished)


.. note::
The lines containing ``[dmoz]`` correspond to the log messages for
our spider. You can see a log line for each URL defined in ``start_urls``.
At the end you can see a log line for each URL defined in ``start_urls``.
Because these URLs are the starting ones, they have no referrers, which is
shown at the end of the log line, where it says ``(referer: None)``.

@@ -268,7 +267,7 @@ This is what the shell looks like::

[ ... Scrapy log here ... ]

2014-01-23 17:11:42-0400 [default] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2014-01-23 17:11:42-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x3636b50>
[s] item {}
@@ -424,11 +423,11 @@ Spider would be like this::

Now crawling dmoz.org yields ``DmozItem`` objects::

[dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'desc': [u' - By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.\n],
'link': [u'http://gnosis.cx/TPiP/'],
'title': [u'Text Processing in Python']}
[dmoz] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
{'desc': [u' - By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XML applications fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR]\n'],
'link': [u'http://www.informit.com/store/product.aspx?isbn=0130211192'],
'title': [u'XML Processing with Python']}
30 changes: 15 additions & 15 deletions docs/topics/benchmarking.rst
@@ -19,20 +19,20 @@ To run it use::
You should see an output like this::

2013-05-16 13:08:46-0300 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
2013-05-16 13:08:47-0300 [follow] INFO: Spider opened
2013-05-16 13:08:47-0300 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:48-0300 [follow] INFO: Crawled 74 pages (at 4440 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:49-0300 [follow] INFO: Crawled 143 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:50-0300 [follow] INFO: Crawled 210 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:51-0300 [follow] INFO: Crawled 274 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:52-0300 [follow] INFO: Crawled 343 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:53-0300 [follow] INFO: Crawled 410 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:54-0300 [follow] INFO: Crawled 474 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:55-0300 [follow] INFO: Crawled 538 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:56-0300 [follow] INFO: Crawled 602 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:57-0300 [follow] INFO: Closing spider (closespider_timeout)
2013-05-16 13:08:57-0300 [follow] INFO: Crawled 666 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:57-0300 [follow] INFO: Dumping Scrapy stats:
2013-05-16 13:08:47-0300 [scrapy] INFO: Spider opened
2013-05-16 13:08:47-0300 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:48-0300 [scrapy] INFO: Crawled 74 pages (at 4440 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:49-0300 [scrapy] INFO: Crawled 143 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:50-0300 [scrapy] INFO: Crawled 210 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:51-0300 [scrapy] INFO: Crawled 274 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:52-0300 [scrapy] INFO: Crawled 343 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:53-0300 [scrapy] INFO: Crawled 410 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:54-0300 [scrapy] INFO: Crawled 474 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:55-0300 [scrapy] INFO: Crawled 538 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:56-0300 [scrapy] INFO: Crawled 602 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:57-0300 [scrapy] INFO: Closing spider (closespider_timeout)
2013-05-16 13:08:57-0300 [scrapy] INFO: Crawled 666 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
2013-05-16 13:08:57-0300 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 231508,
'downloader/request_count': 682,
'downloader/request_method_count/GET': 682,
@@ -49,7 +49,7 @@ You should see an output like this::
'scheduler/enqueued': 12767,
'scheduler/enqueued/memory': 12767,
'start_time': datetime.datetime(2013, 5, 16, 16, 8, 47, 676539)}
2013-05-16 13:08:57-0300 [follow] INFO: Spider closed (closespider_timeout)
2013-05-16 13:08:57-0300 [scrapy] INFO: Spider closed (closespider_timeout)

That tells you that Scrapy is able to crawl about 3900 pages per minute in the
hardware where you run it. Note that this is a very simple spider intended to
5 changes: 1 addition & 4 deletions docs/topics/debug.rst
@@ -132,16 +132,13 @@ Logging is another useful option for getting information about your spider run.
Although not as convenient, it comes with the advantage that the logs will be
available in all future runs should they be necessary again::

from scrapy import log

def parse_details(self, response):
item = response.meta.get('item', None)
if item:
# populate more `item` fields
return item
else:
self.log('No item received for %s' % response.url,
level=log.WARNING)
self.logger.warning('No item received for %s', response.url)

For more information, check the :ref:`topics-logging` section.

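The ``self.logger`` call used above comes from 1c8708e, which gives every spider its
own logger and routes the old ``Spider.log`` through it. A sketch of how that might
look on the ``Spider`` base class, not necessarily the exact implementation::

    import logging

    class Spider(object):

        name = None

        @property
        def logger(self):
            # A logger named after the spider, so its records can be told apart.
            return logging.getLogger(self.name)

        def log(self, message, level=logging.DEBUG, **kw):
            # Kept for backward compatibility; delegates to the per-spider logger.
            self.logger.log(level, message, **kw)
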
8 changes: 4 additions & 4 deletions docs/topics/downloader-middleware.rst
@@ -232,14 +232,14 @@ header) and all cookies received in responses (ie. ``Set-Cookie`` header).

Here's an example of a log with :setting:`COOKIES_DEBUG` enabled::

2011-04-06 14:35:10-0300 [diningcity] INFO: Spider opened
2011-04-06 14:35:10-0300 [diningcity] DEBUG: Sending cookies to: <GET http://www.diningcity.com/netherlands/index.html>
2011-04-06 14:35:10-0300 [scrapy] INFO: Spider opened
2011-04-06 14:35:10-0300 [scrapy] DEBUG: Sending cookies to: <GET http://www.diningcity.com/netherlands/index.html>
Cookie: clientlanguage_nl=en_EN
2011-04-06 14:35:14-0300 [diningcity] DEBUG: Received cookies from: <200 http://www.diningcity.com/netherlands/index.html>
2011-04-06 14:35:14-0300 [scrapy] DEBUG: Received cookies from: <200 http://www.diningcity.com/netherlands/index.html>
Set-Cookie: JSESSIONID=B~FA4DC0C496C8762AE4F1A620EAB34F38; Path=/
Set-Cookie: ip_isocode=US
Set-Cookie: clientlanguage_nl=en_EN; Expires=Thu, 07-Apr-2011 21:21:34 GMT; Path=/
2011-04-06 14:49:50-0300 [diningcity] DEBUG: Crawled (200) <GET http://www.diningcity.com/netherlands/index.html> (referer: None)
2011-04-06 14:49:50-0300 [scrapy] DEBUG: Crawled (200) <GET http://www.diningcity.com/netherlands/index.html> (referer: None)
[...]


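:setting:`COOKIES_DEBUG` is off by default, so the output above only appears once it
is enabled. A minimal settings snippet, assuming a project ``settings.py``::

    COOKIES_DEBUG = True
    LOG_LEVEL = 'DEBUG'  # the cookie messages are emitted at DEBUG level
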
9 changes: 6 additions & 3 deletions docs/topics/extensions.rst
@@ -102,9 +102,12 @@ number of items will be specified through the ``MYEXT_ITEMCOUNT`` setting.

Here is the code of such extension::

import logging
from scrapy import signals
from scrapy.exceptions import NotConfigured

logger = logging.getLogger(__name__)

class SpiderOpenCloseLogging(object):

def __init__(self, item_count):
@@ -133,15 +136,15 @@ Here is the code of such extension::
return ext

def spider_opened(self, spider):
spider.log("opened spider %s" % spider.name)
logger.info("opened spider %s", spider.name)

def spider_closed(self, spider):
spider.log("closed spider %s" % spider.name)
logger.info("closed spider %s", spider.name)

def item_scraped(self, item, spider):
self.items_scraped += 1
if self.items_scraped % self.item_count == 0:
spider.log("scraped %d items" % self.items_scraped)
logger.info("scraped %d items", self.items_scraped)


.. _topics-extensions-ref:
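With the extension now logging through a module-level logger instead of
``spider.log``, enabling it is unchanged. A sketch of the relevant settings, with an
illustrative project path for the extension class::

    EXTENSIONS = {
        'myproject.extensions.SpiderOpenCloseLogging': 500,
    }

    MYEXT_ITEMCOUNT = 1000  # how often "scraped N items" is logged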