
[MRG+1] Add ExecutionEngine.close() method #1423

Merged
kmike merged 1 commit into scrapy:master from jdemaeyer:enhancement/executionengine-close on Oct 29, 2015

Conversation

@jdemaeyer (Contributor) commented Aug 11, 2015

Depending on the state the engine is in, different measures need to be taken to shut it down gracefully (i.e. leaving the reactor clean): stop the engine, close the spider, or close the downloader.

This PR adds a new method as a single entry point for shutting down the engine and integrates it into Crawler.crawl() for graceful error handling during the crawling process.
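
As a rough illustration, here is a minimal sketch of the dispatch such a close() method performs, reconstructed from the review snippets further down; the final downloader-only branch is an assumption, not quoted from the diff (defer is twisted.internet.defer):

def close(self):
    """Gracefully close the execution engine, whatever state it is in."""
    if self.running:
        # Will also close spiders and downloader
        return self.stop()
    elif self.open_spiders:
        # Will also close downloader
        return self._close_all_spiders()
    else:
        # Engine was never started: only the downloader needs closing
        return defer.succeed(self.downloader.close())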

jdemaeyer mentioned this pull request Aug 19, 2015 (37 of 37 tasks complete)
@curita (Member) commented Aug 24, 2015

👍

Some other proposed solutions can be seen here: #1272 (comment), but this one seems the most practical.

curita changed the title from "Add ExecutionEngine.close() method" to "[MRG+1] Add ExecutionEngine.close() method" on Aug 24, 2015
crawler = get_crawler(SimpleSpider)
class TestError(Exception):
    pass
with mock.patch('scrapy.crawler.ExecutionEngine.open_spider') as mock_os:

@kmike (Member) commented Aug 27, 2015

Could you please add a comment about how this error can happen? Is it an error raised in some user component? I have trouble reading tests that use mocks, sorry :)

@jdemaeyer (Author) commented Aug 28, 2015

It simulates an exception being thrown somewhere inside the try block of Crawler.crawl(), after the engine has been created. It could come from self.spider.start_requests() or (as mocked here) from self.engine.open_spider(), e.g. from the scheduler class's from_crawler(), from a spider middleware's process_start_requests(), from a pipeline's open_spider(), etc.

(I have trouble reading some mock tests too, but I liked this tutorial :))
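
For orientation, a hedged reconstruction of the whole test around the snippet quoted above; the side_effect line and the assertions are assumptions based on this description, not the PR's exact code. Assumed context: a twisted.trial TestCase with mock, defer (twisted.internet), and Scrapy's test helpers get_crawler/SimpleSpider already imported.

@defer.inlineCallbacks
def test_crawl_propagates_engine_error(self):
    crawler = get_crawler(SimpleSpider)

    class TestError(Exception):
        pass

    with mock.patch('scrapy.crawler.ExecutionEngine.open_spider') as mock_os:
        # Simulate a user component blowing up after the engine exists
        mock_os.side_effect = TestError
        yield self.assertFailure(crawler.crawl(), TestError)
    self.assertFalse(crawler.crawling)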

@kmike (Member) commented Aug 28, 2015

Thanks, it's clearer now.

The test reads as follows: if ExecutionEngine.open_spider raises an error, then crawler.crawl() should raise the same error and stop crawling. But it is not clear why ExecutionEngine.open_spider can raise an error - is it some protection against errors in Scrapy itself, or against errors in user components? This is what made the test harder to read. Testing the engine on its own makes sense, but I think it's worth adding your comment to the source code.

A test like "here is a faulty spider middleware; we enable it and crawl, and as a result an exception is raised and the crawler is stopped" can be easier to read, but it won't check all the options. Maybe we should have both. Or maybe we already have such a test, sorry for the noise :)
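
A hedged sketch of the mock-free variant described above; every name here (FaultyMiddleware, the settings path, SimpleSpider) is illustrative rather than taken from the PR, and the same trial TestCase context as before is assumed:

class FaultyMiddleware(object):
    """Spider middleware that fails while start requests are processed."""
    def process_start_requests(self, start_requests, spider):
        raise ValueError('middleware boom')

@defer.inlineCallbacks
def test_crawl_with_faulty_middleware(self):
    settings = {'SPIDER_MIDDLEWARES': {
        'tests.test_crawler.FaultyMiddleware': 100}}
    crawler = get_crawler(SimpleSpider, settings)
    # The middleware error must surface from crawl() and stop the crawler
    yield self.assertFailure(crawler.crawl(), ValueError)
    self.assertFalse(crawler.crawling)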

-    raise
+    if self.engine is not None:
+        yield self.engine.close()
+    raise e

@kmike (Member) commented Aug 27, 2015

By changing raise to raise e you're losing the original traceback - is that intentional?

@jdemaeyer (Author) commented Aug 28, 2015

Hm, I didn't catch that raise e doesn't preserve the traceback :/ The problem is that lines 77/78 somehow make Python forget the last active exception, so a bare raise gives a TypeError because I'm not allowed to raise None (and that wouldn't be what we want anyway). I guess the forgetting happens somewhere inside Twisted's yield magic, but I have no idea where.

raise e does preserve the traceback on Python 3, and there's an ugly workaround for Python 2 (4th code block here). Maybe that will do inside an if six.PY2 block?

@curita (Member) commented Aug 28, 2015

I think six.reraise() could handle that, provided it's given an exc_info stored before the yield self.engine.close().
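
A self-contained sketch of that pattern (generic helper names, not the PR's actual code), showing why the exc_info must be captured before the yield:

import sys

import six
from twisted.internet import defer

@defer.inlineCallbacks
def run_with_cleanup(work, cleanup):
    """Run `work`; on failure, run `cleanup`, then re-raise."""
    try:
        yield work()
    except Exception:
        # Capture exc_info *before* the next yield: the inlineCallbacks
        # trampoline clears the active exception, so a bare `raise`
        # afterwards would fail with a TypeError.
        exc_info = sys.exc_info()
        yield cleanup()
        six.reraise(*exc_info)  # re-raises with the original traceback

# e.g.: run_with_cleanup(lambda: defer.fail(RuntimeError('boom')),
#                        lambda: defer.succeed(None))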

@jdemaeyer (Author) commented Aug 28, 2015

Aahh there's the six function I was looking for, ty :)

jdemaeyer force-pushed the jdemaeyer:enhancement/executionengine-close branch 2 times, most recently from 4687e4a to d9d12b3 on Oct 26, 2015
@codecov-io commented Oct 26, 2015

Current coverage is 82.69%

Merging #1423 into master will decrease coverage by 0.10% as of 0c45c7e

Powered by Codecov. Updated on successful CI builds.

@jdemaeyer (Contributor) commented Oct 26, 2015

Thanks for the feedback everybody, and apologies for the huge delay :(

I've implemented @nramirezuy's solution to keep the traceback and got rid of the mock in the test (thanks @kmike).

@jdemaeyer (Contributor) commented Oct 26, 2015

Hm, I don't see why Codecov marks the yield exc line as 'not hit'. The test fails when I remove it and passes when it's there.

@kmike (Member) commented Oct 26, 2015

@jdemaeyer it may be a codecov issue similar to https://github.com/codecov/support/issues/100

jdemaeyer force-pushed the jdemaeyer:enhancement/executionengine-close branch from 76440e7 to d9d12b3 on Oct 27, 2015
@jdemaeyer (Contributor) commented Oct 27, 2015

After chatting back and forth with Steve from Codecov, and doing some more tests to make sure that the yield line is indeed hit, we came to the conclusion that it is an issue with coverage.py.

I've reported it to them and Ned is currently looking into it (after mentioning that "BTW: I used scrapy yesterday, good stuff :)"). So I think the coverage should be fine.

        return self.stop()
    elif self.open_spiders:
        # Will close downloader
        return self._close_all_spiders()

@kmike (Member) commented Oct 28, 2015

Our method names are weird: based on the comments here, the difference between self._close_all_spiders() and self.stop() is that self._close_all_spiders() doesn't close spiders.

@jdemaeyer (Author) commented Oct 28, 2015

Hehe, I should reformulate the comments. They should read "Will also close spiders and downloader" etc.

@jdemaeyer (Author) commented Oct 29, 2015

done.

jdemaeyer force-pushed the jdemaeyer:enhancement/executionengine-close branch from d9d12b3 to db45095 on Oct 29, 2015
jdemaeyer force-pushed the jdemaeyer:enhancement/executionengine-close branch from db45095 to 8307c12 on Oct 29, 2015
@kmike (Member) commented Oct 29, 2015

kmike added a commit that referenced this pull request Oct 29, 2015: [MRG+1] Add ExecutionEngine.close() method
kmike merged commit caf2080 into scrapy:master Oct 29, 2015
1 of 2 checks passed:
codecov/patch - 80.00% of diff hit (target 100.00%)
continuous-integration/travis-ci/pr - The Travis CI build passed
kmike mentioned this pull request Feb 5, 2016
5 participants