Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OffsiteMiddleware: add 2 stats counters #566

Merged
merged 2 commits into from Jan 31, 2014
Merged

Conversation

@redapple
Copy link
Contributor

@redapple redapple commented Jan 28, 2014

Similarly to #553, I'm proposing adding 2 counters for OffsiteMiddleware:

  • number of filtered requests (stats key offsite/filtered)
  • number of filtered domains (stats key offsite/domains)
spider.crawler.stats
self.stats_enabled = True
# happens when running tests: spider not bound to crawler
except:

This comment has been minimized.

@kmike

kmike Jan 28, 2014
Member

Can we use a more specific exception here? At least Exception - using bare except: is rarely a good idea because it catches KeyboardInterrupt and SystemExit.

This comment has been minimized.

@kmike

kmike Jan 28, 2014
Member

I haven't looked at this in details, but maybe move stats enabling to from_crawler method?

This comment has been minimized.

@redapple

redapple Jan 28, 2014
Author Contributor

Running the tests without it raises an AssertionError

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 323, in run
    self.setUp()
  File "/home/paul/dev/scrapy-dev/scrapy/tests/test_spidermiddleware_offsite.py", line 13, in setUp
    self.mw.spider_opened(self.spider)
  File "/home/paul/dev/scrapy-dev/scrapy/contrib/spidermiddleware/offsite.py", line 67, in spider_opened
    spider.crawler.stats
  File "/home/paul/dev/scrapy-dev/scrapy/spider.py", line 41, in crawler
    assert hasattr(self, '_crawler'), "Spider not bounded to any crawler"
exceptions.AssertionError: Spider not bounded to any crawler

This comment has been minimized.

@kmike

kmike Jan 28, 2014
Member

But it seems we can avoid this by enabling stats only when spider is created with a crawler.

This comment has been minimized.

@redapple

redapple Jan 28, 2014
Author Contributor

good point

@kmike
Copy link
Member

@kmike kmike commented Jan 28, 2014

Looks good to me.

@@ -13,9 +13,12 @@

class OffsiteMiddleware(object):

def __init__(self, stats=False):
self.stats_enabled = stats

This comment has been minimized.

@dangra

dangra Jan 28, 2014
Member

Let collect stats by default, and no need for this flag.

This comment has been minimized.

@redapple

redapple Jan 28, 2014
Author Contributor

stats on by default make this test fail:

===============================================================================
[FAIL]
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 332, in run
    testMethod()
  File "/home/paul/dev/scrapy-dev/scrapy/tests/test_spidermiddleware_offsite.py", line 33, in test_process_spider_output
    out = list(self.mw.process_spider_output(res, reqs, self.spider))
  File "/home/paul/dev/scrapy-dev/scrapy/contrib/spidermiddleware/offsite.py", line 38, in process_spider_output
    spider.crawler.stats.inc_value('offsite/domains',
  File "/home/paul/dev/scrapy-dev/scrapy/spider.py", line 41, in crawler
    assert hasattr(self, '_crawler'), "Spider not bounded to any crawler"
exceptions.AssertionError: Spider not bounded to any crawler

scrapy.tests.test_spidermiddleware_offsite.TestOffsiteMiddleware.test_process_spider_output
-------------------------------------------------------------------------------

@dangra
Copy link
Member

@dangra dangra commented Jan 29, 2014

it's caused by the lack of formality initializing components in our test cases, using spider.crawler in the middleware is legit, but there is also another option to instantiate the middleware with .from_crawler().

in either case, the test case has to be updated to initialize a crawler.

@redapple
Copy link
Contributor Author

@redapple redapple commented Jan 29, 2014

a lot of failed tests due to reordering of expected results.
@dangra , @kmike , any idea why? are the test too strict?
or did I miss something big?

Let's see after merging master in...

@dangra
Copy link
Member

@dangra dangra commented Jan 31, 2014

waiting on travis to merge this.

dangra added a commit that referenced this pull request Jan 31, 2014
OffsiteMiddleware: add 2 stats counters
@dangra dangra merged commit e1c6d3f into scrapy:master Jan 31, 2014
1 check passed
1 check passed
default The Travis CI build passed
Details
@redapple redapple deleted the redapple:offsite-stats branch Jul 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants