
Scrapy hangs if an exception raises in start_requests #83

Closed
dangra opened this issue Jan 27, 2012 · 2 comments
dangra commented Jan 27, 2012

When the start_requests iterator raises an exception, engine._next_request fails with an Unhandled Error, which prevents Scrapy from stopping the engine correctly and leaves it hanging forever.

Ctrl-C is required to stop it.
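A dependency-free reproduction sketch of the failure mode (names hypothetical, modernized to Python 3 syntax): a start_requests generator that raises before yielding anything, like the spider in the traceback below. Calling next() on it propagates the AssertionError straight into engine._next_request, which in Scrapy 0.15.1 surfaced as an Unhandled Error instead of closing the spider.

```python
# Hypothetical stand-in for a spider's start_requests that fails up front,
# mirroring the AssertionError in the traceback below.
def start_requests():
    supports_url_mapping = False
    # Raises before the first yield, so the generator dies on first next().
    assert supports_url_mapping, 'spidername.com does not support url mapping'
    yield None  # never reached

it = start_requests()
try:
    next(it)  # this is what engine._next_request effectively does
except AssertionError as exc:
    print('engine sees:', exc)
```

The engine had no handler for this case, so the exception bubbled up to the Twisted reactor as an Unhandled Error while the engine kept waiting for requests that would never arrive.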


2012-01-27 17:10:09-0200 [scrapy] INFO: Scrapy 0.15.1 started (bot: testbot)
2012-01-27 17:10:09-0200 [spidername.com] INFO: Spider opened
2012-01-27 17:10:09-0200 [spidername.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-01-27 17:10:09-0200 [-] Unhandled Error
    Traceback (most recent call last):
      File "/home/daniel/src/scrapy/scrapy/commands/crawl.py", line 45, in run
        self.crawler.start()
      File "/home/daniel/src/scrapy/scrapy/crawler.py", line 76, in start
        reactor.run(installSignalHandlers=False) # blocking call
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
        self.mainLoop()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
    --- <exception caught here> ---
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/home/daniel/src/scrapy/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
      File "/home/daniel/src/scrapy/scrapy/core/engine.py", line 108, in _next_request
        request = slot.start_requests.next()
      File "/home/daniel/src/testbot/testbot/spiders_dev/myspider.py", line 32, in start_requests
        'spidername.com does not support url mapping'
    exceptions.AssertionError: spidername.com does not support url mapping

^C2012-01-27 17:10:11-0200 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
2012-01-27 17:10:11-0200 [spidername.com] INFO: Closing spider (shutdown)
2012-01-27 17:10:11-0200 [spidername.com] INFO: Dumping spider stats:
    {'finish_reason': 'shutdown',
     'finish_time': datetime.datetime(2012, 1, 27, 19, 10, 11, 757102),
     'start_time': datetime.datetime(2012, 1, 27, 19, 10, 9, 487178)}
2012-01-27 17:10:11-0200 [spidername.com] INFO: Spider closed (shutdown)
2012-01-27 17:10:11-0200 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 111865856, 'memusage/startup': 111865856}
dangra added a commit to dangra/scrapy that referenced this issue Jan 27, 2012
dangra commented Jan 27, 2012

Now it closes cleanly:

2012-01-27 17:19:11-0200 [scrapy] INFO: Scrapy 0.15.1 started (bot: testbot)
2012-01-27 17:19:11-0200 [spidername.com] INFO: Spider opened
2012-01-27 17:19:11-0200 [spidername.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-01-27 17:19:11-0200 [spidername.com] ERROR: Obtaining request from start requests
    Traceback (most recent call last):
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
        self.mainLoop()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/home/daniel/src/scrapy/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
    --- <exception caught here> ---
      File "/home/daniel/src/scrapy/scrapy/core/engine.py", line 108, in _next_request
        request = slot.start_requests.next()
      File "/home/daniel/src/testbot/testbot/spiders_dev/myspider.py", line 32, in start_requests
        'spidername.com does not support url mapping'
    exceptions.AssertionError: spidername.com does not support url mapping

2012-01-27 17:19:11-0200 [spidername.com] INFO: Closing spider (finished)
2012-01-27 17:19:11-0200 [spidername.com] INFO: Dumping spider stats:
    {'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 1, 27, 19, 19, 11, 981009),
     'start_time': datetime.datetime(2012, 1, 27, 19, 19, 11, 973632)}
2012-01-27 17:19:11-0200 [spidername.com] INFO: Spider closed (finished)
2012-01-27 17:19:11-0200 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 111972352, 'memusage/startup': 111972352}
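The behavior behind the "Obtaining request from start requests" ERROR line above can be sketched roughly as follows (a simplified, Python 3 sketch with an assumed slot object, not the actual engine code): catch any exception from the iterator, log it, and discard the iterator so the spider can finish with the normal "finished" reason instead of hanging.

```python
import logging

def next_start_request(slot):
    """Pull the next request from slot.start_requests, tolerating failures.

    Sketch of the fix: a broken start_requests iterator is logged and
    dropped instead of surfacing as an Unhandled Error in the reactor.
    """
    if slot.start_requests is None:
        return None
    try:
        return next(slot.start_requests)
    except StopIteration:
        slot.start_requests = None  # iterator exhausted: normal finish
    except Exception:
        slot.start_requests = None  # broken iterator: drop it and log
        logging.exception("Obtaining request from start requests")
    return None
```

With the iterator set to None, the engine's idle check sees no pending start requests and closes the spider with finish_reason 'finished', as the log above shows.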

dangra commented Oct 10, 2013

Tests added in 5eb4299.

lucywang000 pushed a commit to lucywang000/scrapy that referenced this issue Feb 24, 2019
Making canonical url solver operate as middleware.