
Scrapy hangs if an exception raises in start_requests #83

Closed
dangra opened this issue Jan 27, 2012 · 2 comments
dangra commented Jan 27, 2012

When the start_requests iterator raises an exception, engine._next_request fails with an Unhandled Error, which prevents Scrapy from stopping the engine correctly and leaves it hanging forever.

Ctrl-C is required to stop it.
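A dependency-free reproduction sketch of the failure mode (names hypothetical, modernized to Python 3 syntax): a start_requests generator that raises before yielding anything, like the spider in the traceback below. Calling next() on it propagates the AssertionError straight into engine._next_request, which in Scrapy 0.15.1 surfaced as an Unhandled Error instead of closing the spider.

```python
# Hypothetical stand-in for a spider's start_requests that fails up front,
# mirroring the AssertionError in the traceback below.
def start_requests():
    supports_url_mapping = False
    # Raises before the first yield, so the generator dies on first next().
    assert supports_url_mapping, 'spidername.com does not support url mapping'
    yield None  # never reached

it = start_requests()
try:
    next(it)  # this is what engine._next_request effectively does
except AssertionError as exc:
    print('engine sees:', exc)
```

The engine had no handler for this case, so the exception bubbled up to the Twisted reactor as an Unhandled Error while the engine kept waiting for requests that would never arrive.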


2012-01-27 17:10:09-0200 [scrapy] INFO: Scrapy 0.15.1 started (bot: testbot)
2012-01-27 17:10:09-0200 [spidername.com] INFO: Spider opened
2012-01-27 17:10:09-0200 [spidername.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-01-27 17:10:09-0200 [-] Unhandled Error
    Traceback (most recent call last):
      File "/home/daniel/src/scrapy/scrapy/commands/crawl.py", line 45, in run
        self.crawler.start()
      File "/home/daniel/src/scrapy/scrapy/crawler.py", line 76, in start
        reactor.run(installSignalHandlers=False) # blocking call
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
        self.mainLoop()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
    --- <exception caught here> ---
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/home/daniel/src/scrapy/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
      File "/home/daniel/src/scrapy/scrapy/core/engine.py", line 108, in _next_request
        request = slot.start_requests.next()
      File "/home/daniel/src/testbot/testbot/spiders_dev/myspider.py", line 32, in start_requests
        'spidername.com does not support url mapping'
    exceptions.AssertionError: spidername.com does not support url mapping

^C2012-01-27 17:10:11-0200 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
2012-01-27 17:10:11-0200 [spidername.com] INFO: Closing spider (shutdown)
2012-01-27 17:10:11-0200 [spidername.com] INFO: Dumping spider stats:
    {'finish_reason': 'shutdown',
     'finish_time': datetime.datetime(2012, 1, 27, 19, 10, 11, 757102),
     'start_time': datetime.datetime(2012, 1, 27, 19, 10, 9, 487178)}
2012-01-27 17:10:11-0200 [spidername.com] INFO: Spider closed (shutdown)
2012-01-27 17:10:11-0200 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 111865856, 'memusage/startup': 111865856}
dangra added a commit to dangra/scrapy that referenced this issue Jan 27, 2012
dangra commented Jan 27, 2012

Now it closes cleanly:

2012-01-27 17:19:11-0200 [scrapy] INFO: Scrapy 0.15.1 started (bot: testbot)
2012-01-27 17:19:11-0200 [spidername.com] INFO: Spider opened
2012-01-27 17:19:11-0200 [spidername.com] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-01-27 17:19:11-0200 [spidername.com] ERROR: Obtaining request from start requests
    Traceback (most recent call last):
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1169, in run
        self.mainLoop()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 1178, in mainLoop
        self.runUntilCurrent()
      File "/home/daniel/envs/mytestenv/local/lib/python2.7/site-packages/twisted/internet/base.py", line 800, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/home/daniel/src/scrapy/scrapy/utils/reactor.py", line 41, in __call__
        return self._func(*self._a, **self._kw)
    --- <exception caught here> ---
      File "/home/daniel/src/scrapy/scrapy/core/engine.py", line 108, in _next_request
        request = slot.start_requests.next()
      File "/home/daniel/src/testbot/testbot/spiders_dev/myspider.py", line 32, in start_requests
        'spidername.com does not support url mapping'
    exceptions.AssertionError: spidername.com does not support url mapping

2012-01-27 17:19:11-0200 [spidername.com] INFO: Closing spider (finished)
2012-01-27 17:19:11-0200 [spidername.com] INFO: Dumping spider stats:
    {'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 1, 27, 19, 19, 11, 981009),
     'start_time': datetime.datetime(2012, 1, 27, 19, 19, 11, 973632)}
2012-01-27 17:19:11-0200 [spidername.com] INFO: Spider closed (finished)
2012-01-27 17:19:11-0200 [scrapy] INFO: Dumping global stats:
    {'memusage/max': 111972352, 'memusage/startup': 111972352}
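The behavior behind the "Obtaining request from start requests" ERROR line above can be sketched roughly as follows (a simplified, Python 3 sketch with an assumed slot object, not the actual engine code): catch any exception from the iterator, log it, and discard the iterator so the spider can finish with the normal "finished" reason instead of hanging.

```python
import logging

def next_start_request(slot):
    """Pull the next request from slot.start_requests, tolerating failures.

    Sketch of the fix: a broken start_requests iterator is logged and
    dropped instead of surfacing as an Unhandled Error in the reactor.
    """
    if slot.start_requests is None:
        return None
    try:
        return next(slot.start_requests)
    except StopIteration:
        slot.start_requests = None  # iterator exhausted: normal finish
    except Exception:
        slot.start_requests = None  # broken iterator: drop it and log
        logging.exception("Obtaining request from start requests")
    return None
```

With the iterator set to None, the engine's idle check sees no pending start requests and closes the spider with finish_reason 'finished', as the log above shows.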

dangra commented Oct 10, 2013

Tests added in 5eb4299.

lucywang000 pushed a commit to lucywang000/scrapy that referenced this issue Feb 24, 2019
Making canonical url solver operate as middleware.