Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Return non-zero exit code from commands in case of errors in spiders constructor #3226

Merged
merged 18 commits into from
Jun 14, 2018

Conversation

whalebot-helmsman
Copy link
Contributor

Return non-zero exit code from commands in case if we have error in spiders constructor

@codecov
Copy link

codecov bot commented Apr 20, 2018

Codecov Report

Merging #3226 into master will increase coverage by <.01%.
The diff coverage is 80%.

@@            Coverage Diff             @@
##           master    #3226      +/-   ##
==========================================
+ Coverage   82.14%   82.15%   +<.01%     
==========================================
  Files         228      228              
  Lines        9599     9609      +10     
  Branches     1385     1387       +2     
==========================================
+ Hits         7885     7894       +9     
- Misses       1456     1458       +2     
+ Partials      258      257       -1
Impacted Files Coverage Δ
scrapy/commands/crawl.py 27.9% <0%> (-1.37%) ⬇️
scrapy/crawler.py 92.26% <100%> (+0.9%) ⬆️
scrapy/commands/runspider.py 74.28% <100%> (+0.75%) ⬆️

def is_crawlers_has_spider(self):
return reduce(lambda x,y: x and y,
self.crawlers_has_spiders,
True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any(self.crawlers_has_spiders)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @VMRuiz noted, it is actually all(...), not any(...), which is an additional argument to change it, as the code is not obvious :P

@whalebot-helmsman
Copy link
Contributor Author

whalebot-helmsman commented May 8, 2018

Test failed due to new twisted release, fix in twisted trunk twisted/twisted#1008 but not released to pypi. All PRs are broken.

@@ -1,3 +1,5 @@
import pdb
import inspect
Copy link
Member

@kmike kmike Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdb import is unneeded

from scrapy.extensions.throttle import AutoThrottle
from twisted.internet import defer
import twisted.trial.unittest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per pep8, twisted imports should go in a separate group, between stdlib imports and scrapy imports

@@ -220,6 +222,16 @@ def test_runspider(self):
self.assertIn("INFO: Closing spider (finished)", log)
self.assertIn("INFO: Spider closed (finished)", log)

def test_run_fail_spider(self):
proc = self.runspider("import scrapy\n" + inspect.getsource(ExceptionSpider))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice trick!

@kmike
Copy link
Member

kmike commented Jun 6, 2018

Looks good to me, thanks @whalebot-helmsman! I'd remove _is from method names, but it is just a style preference. +1 to merge.

@kmike kmike changed the title Addressing issue #3211 [MRG+1] Addressing issue #3211 Jun 6, 2018
@@ -221,6 +226,9 @@ def join(self):
while self._active:
yield defer.DeferredList(self._active)

def _is_spider_created_for_every_crawler(self):
return all(self.crawlers_has_spiders)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose the removal of all this methods and use a single boolean flag that summarizes the outcome of bootstraping the spider.

The public attribute name would be bootstrap_failed and we set it at _crawl / _done() like:

    self.bootstrap_failed |= not getattr(crawler, 'spider', None)

this public attribute replaces _is_spider_created_for_every_crawler() method and crawlers_has_spiders atttribute, but also remove the need for private _is_spider_created() method.

then the check in runspider.py and crawl.py looks like:

if not self.crawler_process.bootstrap_failed:
    self.exitcode = 1

@kmike kmike merged commit 72d0899 into scrapy:master Jun 14, 2018
@kmike
Copy link
Member

kmike commented Jun 14, 2018

Thanks @whalebot-helmsman and @dangra!

@dangra
Copy link
Member

dangra commented Jun 14, 2018

thanks @whalebot-helmsman and @kmike

@dangra
Copy link
Member

dangra commented Jun 14, 2018

I restarted the failed build to be sure it was a temporal failure.

@kmike kmike added this to the v1.6 milestone Jul 4, 2018
@kmike kmike changed the title [MRG+1] Addressing issue #3211 [MRG+1] Return non-zero exit code from commands in case of errors in spiders constructor Dec 26, 2018
@whalebot-helmsman whalebot-helmsman deleted the i=3211 branch July 18, 2019 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants