Counts, and displays on closing: deltafetch/skipped deltafetch/stored
@@ -36,7 +36,7 @@ class DeltaFetch(object):

     """

-    def __init__(self, dir, reset=False):
+    def __init__(self, dir, stats=None, reset=False):
Not sure it could happen in practice, but for those subclassing DeltaFetch and using positional arguments, a non-None `stats` could trigger a `reset=True` (although passing 3 arguments could break their init before that).
For backward compatibility, can you pass `stats` after `reset=False`?
Wow, sorry for reviewing so late...
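To make the positional-argument concern concrete, here is a minimal sketch. The two classes are hypothetical stand-ins that only mirror the two candidate signatures; they are not the scrapylib code. The sketch assumes a caller written against the old `(dir, reset)` signature that passes `reset` positionally.

```python
class StatsBeforeReset(object):
    """Hypothetical stand-in: `stats` inserted before `reset`, as in the diff above."""

    def __init__(self, dir, stats=None, reset=False):
        self.dir = dir
        self.stats = stats
        self.reset = reset


class StatsAfterReset(object):
    """Hypothetical stand-in: `stats` appended after `reset`, as the review requests."""

    def __init__(self, dir, reset=False, stats=None):
        self.dir = dir
        self.reset = reset
        self.stats = stats


# A caller written against the old signature, passing reset positionally.
old_style_args = ("/tmp/deltafetch", True)

shifted = StatsBeforeReset(*old_style_args)
# The second positional value now binds to `stats`, so `reset` keeps its default.
print(shifted.stats, shifted.reset)

compatible = StatsAfterReset(*old_style_args)
# The second positional value still binds to `reset`, as the old caller intended.
print(compatible.stats, compatible.reset)
```

Appending the new parameter at the end keeps every existing positional call bound to the same names as before, which is why that ordering is the backward-compatible one.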
Hey @redapple! Thanks for the comment and good catch! This last commit should be backwards compatible.
Looks good to me.
lgtm. what if we move this extension under scrapy-plugins/scrapy-deltafetch ? scrapylib is kind of dead now.
I don't mind merging but it won't be available any time soon in SH servers if that is expected.
ok @dangra . But that won't change the fate of scrapy-plugins/scrapy-deltafetch though, or?
No, it won't. scrapy-plugins/scrapy-deltafetch is the way to go
thanks @pallih !
@@ -93,10 +94,12 @@ def process_spider_output(self, response, result, spider):
                 key = self._get_key(r)
                 if self.db.has_key(key):
                     spider.log("Ignoring already visited: %s" % r, level=log.INFO)
+                    self.stats.inc_value('deltafetch/skipped', spider=spider)
If someone overrides the `from_crawler` method, there's a chance that the object is still initialized as:

    o = cls(dir, reset)

which leaves `self.stats = None`, and this (and the next changed) line will fail with `AttributeError`. The code should either fail fast with a meaningful error message if `stats` is not passed, or check the `stats` value before incrementing:

    if self.stats:
        self.stats.inc_value('deltafetch/skipped', spider=spider)
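
The two options from this comment can be sketched side by side. Everything below is a hypothetical, stripped-down illustration: `FakeStats` stands in for a Scrapy stats collector, and `GuardedDeltaFetch`, `StrictDeltaFetch`, and `record_skipped` are made-up names, not part of scrapylib. The guard is written as `is not None` rather than the plain truthiness check above, which is an assumption about the intended behavior.

```python
class FakeStats(object):
    """Hypothetical stand-in for a stats collector with an inc_value method."""

    def __init__(self):
        self.values = {}

    def inc_value(self, key, count=1, start=0, spider=None):
        # Increment a named counter, starting from `start` if it is unseen.
        self.values[key] = self.values.get(key, start) + count


class GuardedDeltaFetch(object):
    """Option 1 (hypothetical): tolerate a missing stats object."""

    def __init__(self, dir, reset=False, stats=None):
        self.dir = dir
        self.reset = reset
        self.stats = stats

    def record_skipped(self, spider=None):
        # Only touch stats when a collector was actually supplied.
        if self.stats is not None:
            self.stats.inc_value('deltafetch/skipped', spider=spider)


class StrictDeltaFetch(GuardedDeltaFetch):
    """Option 2 (hypothetical): fail fast with a meaningful error."""

    def __init__(self, dir, reset=False, stats=None):
        if stats is None:
            raise ValueError("DeltaFetch requires a stats collector; "
                             "pass stats=crawler.stats from from_crawler")
        super(StrictDeltaFetch, self).__init__(dir, reset, stats)


# An overridden from_crawler that still calls cls(dir, reset) leaves stats unset.
legacy = GuardedDeltaFetch("/tmp/deltafetch", False)
legacy.record_skipped()  # no AttributeError, the counter is simply not updated

stats = FakeStats()
current = GuardedDeltaFetch("/tmp/deltafetch", False, stats)
current.record_skipped()
current.record_skipped()
print(stats.values)
```

The guard keeps old subclasses working silently, while the strict variant surfaces the misconfiguration at construction time instead of at the first skipped request.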
correct! thanks @chekunkov
@dangra , @pallih , @chekunkov , Version 1.0.0 is already on PyPI
Counts, and displays on closing:
deltafetch/skipped
deltafetch/stored
Example:
2016-03-24 01:28:56 [scrapy] INFO: Dumping Scrapy stats:
{'deltafetch/skipped': 41,
'deltafetch/stored': 150,
'downloader/request_bytes': 50476,
'downloader/request_count': 153,
etc,