[MRG+1] Add ExecutionEngine.close() method #1423
Conversation
Some other proposed solutions can be seen here: #1272 (comment), but this one seems the most practical.
crawler = get_crawler(SimpleSpider)
class TestError(Exception):
    pass
with mock.patch('scrapy.crawler.ExecutionEngine.open_spider') as mock_os:
kmike
Aug 27, 2015
Member
Could you please add a comment about how can this error happen? Is it an error raised in some user component? I have troubles reading tests which use mocks, sorry :)
jdemaeyer
Aug 28, 2015
Author
Contributor
It simulates an exception being thrown somewhere inside the try block of Crawler.crawl(), after the engine has been created. It could be in self.spider.start_requests(), or (as mocked here) in self.engine.open_spider(). E.g. in the scheduler class's from_crawler(), in a spider middleware's process_start_requests(), in a pipeline's open_spider(), etc.
(I have trouble reading some mock tests too, but I liked this tutorial :))
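The situation being simulated can be sketched in isolation. This is a hypothetical, self-contained miniature (the real classes live in scrapy.crawler; Engine, Crawler, and TestError here are toy stand-ins), showing how patching open_spider to raise stands in for a failure in any user component invoked while the spider is opened:

```python
from unittest import mock

class TestError(Exception):
    pass

class Engine:
    def open_spider(self, spider):
        # In Scrapy this invokes user components: the scheduler's
        # from_crawler(), spider middlewares, pipelines' open_spider(), ...
        pass

class Crawler:
    def __init__(self):
        self.engine = None

    def crawl(self):
        try:
            self.engine = Engine()
            self.engine.open_spider(spider=None)  # may raise from a user component
        except Exception:
            # cleanup (e.g. closing the engine) would go here
            raise

# Patching open_spider to raise simulates an error in any of the user
# components called during spider opening, without needing a real one.
with mock.patch.object(Engine, "open_spider", side_effect=TestError):
    crawler = Crawler()
    try:
        crawler.crawl()
        raised = False
    except TestError:
        raised = True

print(raised)  # True: the simulated component error propagates out of crawl()
```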
kmike
Aug 28, 2015
Member
Thanks, it is clearer now.
The test reads as follows: if ExecutionEngine.open_spider raises an error then crawler.crawl() should raise the same error and stop crawling. But it is not clear why ExecutionEngine.open_spider can raise an error - is it some protection against errors in Scrapy itself, or a protection against errors in user components? This is what made the test harder to read. Testing the engine on its own makes sense, but I think it is worth adding your comment to the source code.
A test like "Here is a faulty spider middleware, we enable it and crawl, and as a result an exception is raised and the crawler is stopped" can be easier to read, but it won't check all options. Maybe we should have both. Or maybe we already have such a test, sorry for the noise :)
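The "faulty component" style of test described above can be sketched without Scrapy at all. This is a hypothetical illustration (FaultyMiddleware and MiniCrawler are invented names, not Scrapy APIs) of the shape such a test would take:

```python
class FaultyMiddleware:
    """A deliberately broken user component, as in kmike's example."""
    def process_start_requests(self, start_requests, spider):
        raise RuntimeError("broken middleware")

class MiniCrawler:
    """Toy stand-in for Crawler: runs middlewares, stops on failure."""
    def __init__(self, middlewares):
        self.middlewares = middlewares
        self.crawling = False

    def crawl(self):
        self.crawling = True
        try:
            for mw in self.middlewares:
                mw.process_start_requests([], spider=None)
        except Exception:
            self.crawling = False  # stop crawling when any component fails
            raise

crawler = MiniCrawler([FaultyMiddleware()])
try:
    crawler.crawl()
except RuntimeError:
    pass

print(crawler.crawling)  # False: the error stopped the crawl
```

As the comment notes, this reads more naturally than a mock but only exercises one of the possible failure points.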
raise
if self.engine is not None:
    yield self.engine.close()
raise e
kmike
Aug 27, 2015
Member
By changing raise to raise e you're losing the original traceback - is it intentional?
jdemaeyer
Aug 28, 2015
Author
Contributor
Hm, didn't catch that raise e doesn't preserve the traceback :/ The problem is that lines 77/78 somehow make Python forget the last active exception, and therefore a bare raise gives a TypeError because I'm not allowed to raise None (and it wouldn't be what we want anyway). I guess the forgetting happens somewhere inside Twisted's yield magic, but no idea.
raise e does preserve the traceback on Python 3, and there's an ugly workaround for Python 2 (4th code block here). Maybe that will do inside an if six.PY2 block?
curita
Aug 28, 2015
Member
I think six.reraise() could handle that, provided with an exc_info stored before yield self.engine.close().
jdemaeyer
Aug 28, 2015
Author
Contributor
Aahh there's the six function I was looking for, ty :)
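The pattern being agreed on here can be sketched as follows. This is a minimal Python 3 illustration (risky, cleanup, and crawl are invented stand-ins) of capturing sys.exc_info() before the cleanup call so the original traceback survives; on Python 2 the captured triple would instead be passed to six.reraise(*exc_info):

```python
import sys

def risky():
    raise ValueError("boom")  # stands in for the error raised while crawling

def cleanup():
    # Stands in for `yield self.engine.close()`, which can clear the
    # interpreter's notion of the "current" exception.
    pass

def crawl():
    try:
        risky()
    except Exception:
        # Capture (type, value, traceback) *before* cleanup runs,
        # as suggested above, so the original traceback is not lost.
        exc_info = sys.exc_info()
        cleanup()
        raise exc_info[1].with_traceback(exc_info[2])

try:
    crawl()
except ValueError as e:
    frames = []
    tb = e.__traceback__
    while tb is not None:
        frames.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next

print("risky" in frames)  # True: the frame that raised is still in the traceback
```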
Thanks for the feedback everybody, and apologies for the huge delay :( I've implemented @nramirezuy's solution to keep the traceback and got rid of the mock in the test (thanks @kmike)
Hm I don't see why codecov marks the
@jdemaeyer it may be a codecov issue similar to https://github.com/codecov/support/issues/100
After chatting back and forth with Steve from Codecov, and doing some more tests to make sure that the I've reported it to them and Ned is currently looking into it (after mentioning that "BTW: I used scrapy yesterday, good stuff :)"). So I think the coverage should be fine.
    return self.stop()
elif self.open_spiders:
    # Will close downloader
    return self._close_all_spiders()
kmike
Oct 28, 2015
Member
Our method names are weird: based on the comments here, the difference between self._close_all_spiders() and self.stop() is that self._close_all_spiders() doesn't close spiders.
jdemaeyer
Oct 28, 2015
Author
Contributor
Hehe I should reformulate the comments. They should read "Will also close spiders and downloader" etc.
jdemaeyer
Oct 29, 2015
Author
Contributor
done.
Thanks @jdemaeyer, @curita and @nramirezuy.
Depending on what state the engine is in, there are different measures that need to be taken to shut it down gracefully (i.e. leaving the reactor clean): stop the engine, close the spider, or close the downloader.
This PR adds a new method as a single entry point for shutting down the engine and integrates it into Crawler.crawl() for graceful error handling during the crawling process.
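The state-dependent dispatch described here can be sketched as follows. This is a simplified, hypothetical model (the actions list and _close_downloader are invented for illustration; the real method lives in scrapy.core.engine.ExecutionEngine), not the actual implementation:

```python
class Engine:
    """Toy model of an engine that can be shut down from any state."""
    def __init__(self):
        self.running = False
        self.open_spiders = []
        self.actions = []  # records which shutdown path was taken

    def stop(self):
        # Full stop: will also close spiders and downloader.
        self.actions.append("stop")

    def _close_all_spiders(self):
        # Will also close the downloader.
        self.actions.append("close_spiders")

    def _close_downloader(self):
        self.actions.append("close_downloader")

    def close(self):
        """Single entry point: pick the right shutdown for the current state."""
        if self.running:
            return self.stop()
        elif self.open_spiders:
            return self._close_all_spiders()
        else:
            return self._close_downloader()

# Example: an engine that opened spiders but never started running
engine = Engine()
engine.open_spiders = ["spider"]
engine.close()
print(engine.actions)  # ['close_spiders']
```

This keeps the error-handling branch in Crawler.crawl() simple: whatever state the failure left the engine in, one close() call suffices.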