This repository has been archived by the owner on Jul 19, 2018. It is now read-only.

Count stats #72

Merged
merged 2 commits into scrapinghub:master on Jun 14, 2016

Conversation

@pallih (Contributor) commented Mar 24, 2016

Counts, and displays on closing:

deltafetch/skipped
deltafetch/stored

Example:

2016-03-24 01:28:56 [scrapy] INFO: Dumping Scrapy stats:
{'deltafetch/skipped': 41,
'deltafetch/stored': 150,
'downloader/request_bytes': 50476,
'downloader/request_count': 153,
etc,
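
For reference, these counters go through Scrapy's standard stats collector, which is why they show up in the "Dumping Scrapy stats" block above. A small, self-contained sketch of that API (in the middleware the collector comes from crawler.stats rather than being constructed by hand; this is illustrative only):

from scrapy.statscollectors import MemoryStatsCollector
from scrapy.utils.test import get_crawler

# Illustrative only: exercising the stats API the middleware relies on.
crawler = get_crawler()
stats = MemoryStatsCollector(crawler)

# What the middleware does each time it skips or stores a request:
stats.inc_value('deltafetch/skipped')
stats.inc_value('deltafetch/skipped')
stats.inc_value('deltafetch/stored')

print(stats.get_stats())
# {'deltafetch/skipped': 2, 'deltafetch/stored': 1}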

@@ -36,7 +36,7 @@ class DeltaFetch(object):
     """
-    def __init__(self, dir, reset=False):
+    def __init__(self, dir, stats=None, reset=False):
@redapple (Contributor) commented on the diff:

Not sure it could happen in practice, but for those subclassing DeltaFetch and using positional arguments, non-None stats could trigger a reset=True (although passing 3 arguments could break their init before that)

For backward compatibility, can you pass stats after reset=False?
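
In sketch form, the suggestion is to append stats after the existing reset argument so positional callers keep their old meaning (the body is simplified, not the exact scrapylib code):

class DeltaFetch(object):
    # Before this PR:  def __init__(self, dir, reset=False)
    # Keeping reset in its old position and adding stats after it means a
    # legacy positional call such as DeltaFetch(path, True) still sets
    # reset=True instead of silently becoming the stats object.
    def __init__(self, dir, reset=False, stats=None):
        self.dir = dir
        self.reset = reset
        self.stats = stats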

@redapple (Contributor) commented:

Wow, sorry for reviewing so late...
I really like scrapy stats and DeltaFetch could really use them indeed.

@pallih (Contributor, Author) commented Jun 14, 2016

Hey @redapple!

Thanks for the comment and good catch! This last commit should be backwards compatible.

P

@redapple (Contributor) commented:

Looks good to me.
@dangra, @chekunkov?

@dangra (Contributor) commented Jun 14, 2016

lgtm. What if we move this extension under scrapy-plugins/scrapy-deltafetch? scrapylib is kind of dead now.

@redapple (Contributor) commented:

@dangra, good idea. Let's do that after the merge maybe? (so as not to complicate @pallih's contribution)
I'll open an issue.

@dangra (Contributor) commented Jun 14, 2016

I don't mind merging, but it won't be available any time soon on SH servers, if that is expected.

@redapple (Contributor) commented:

OK @dangra. But that won't change the fate of scrapy-plugins/scrapy-deltafetch though, will it?

@dangra (Contributor) commented Jun 14, 2016

No, it won't. scrapy-plugins/scrapy-deltafetch is the way to go.

@redapple merged commit f2129dc into scrapinghub:master on Jun 14, 2016
@redapple (Contributor) commented:

thanks @pallih!

@@ -93,10 +94,12 @@ def process_spider_output(self, response, result, spider):
                 key = self._get_key(r)
                 if self.db.has_key(key):
                     spider.log("Ignoring already visited: %s" % r, level=log.INFO)
+                    self.stats.inc_value('deltafetch/skipped', spider=spider)
@chekunkov (Contributor) commented on the diff:

Hey @redapple, @pallih

If someone overrides the from_crawler method, there's a chance that the object is still initialized as:

o = cls(dir, reset)

which leaves self.stats = None, and this (and the next changed) line will fail with AttributeError. The code should either fail fast with a meaningful error message if stats are not passed, or there should be a check for the stats value before incrementing:

if self.stats:
    self.stats.inc_value('deltafetch/skipped', spider=spider)

A contributor replied:

correct! thanks @chekunkov
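
Following up on the review comment above, a minimal sketch of a from_crawler that always passes the crawler's stats collector, so self.stats is never None; the DELTAFETCH_* setting names and defaults here are assumptions for illustration, not necessarily scrapylib's exact ones:

class DeltaFetch(object):
    def __init__(self, dir, reset=False, stats=None):
        self.dir = dir
        self.reset = reset
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        # Setting names and defaults below are illustrative placeholders.
        dir = settings.get('DELTAFETCH_DIR', '.scrapy/deltafetch')
        reset = settings.getbool('DELTAFETCH_RESET', False)
        # Passing crawler.stats keeps self.stats a real StatsCollector,
        # so the inc_value() calls never raise AttributeError on None.
        return cls(dir, reset=reset, stats=crawler.stats)

A subclass that overrides from_crawler can do the same, or the inc_value() calls can be guarded with the `if self.stats:` check suggested above.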

@redapple (Contributor) commented:

@dangra, @pallih, @chekunkov,
This middleware now lives at https://github.com/scrapy-plugins/scrapy-deltafetch

Version 1.0.0 is already on PyPI
