
[MRG+1] Use dont_filter=True for contracts requests #3381

Merged

merged 8 commits into scrapy:master on Sep 6, 2018

Conversation

StasDeep
Contributor

Fixes #3380.
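For context: contract requests are generated by ContractsManager, and without dont_filter=True the scheduler's duplicate filter drops the second request when two callbacks declare the same @url. A minimal sketch of the idea (the real change lives in scrapy/contracts/__init__.py; the surrounding names here are illustrative):

# Sketch: when ContractsManager builds the request for a contract check,
# pass dont_filter=True so the dupefilter does not drop a second request
# to the same URL.
request = Request(
    url=url,            # taken from the @url contract in the docstring
    callback=cb,        # the callback under test
    dont_filter=True,   # the fix: bypass request deduplication
)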

@codecov

codecov bot commented Aug 11, 2018

Codecov Report

Merging #3381 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3381      +/-   ##
==========================================
+ Coverage   84.48%   84.49%   +<.01%     
==========================================
  Files         167      167              
  Lines        9371     9376       +5     
  Branches     1392     1392              
==========================================
+ Hits         7917     7922       +5     
  Misses       1199     1199              
  Partials      255      255
Impacted Files Coverage Δ
scrapy/contracts/__init__.py 81.3% <100%> (+0.79%) ⬆️

@kmike
Member

kmike commented Aug 15, 2018

Looks good! Would you mind adding a test case?

@StasDeep
Contributor Author

@kmike I can't start CrawlerProcess in the test. Maybe you know what's wrong with this?

import unittest
from unittest import TextTestResult

from scrapy import Spider
from scrapy.contracts import ContractsManager
from scrapy.contracts.default import ReturnsContract, ScrapesContract, UrlContract
from scrapy.crawler import CrawlerProcess


class TestSameUrlSpider(Spider):

    def parse_first(self, response):
        """first callback
        @url http://scrapy.org
        """
        pass

    def parse_second(self, response):
        """second callback
        @url http://scrapy.org
        """
        pass


class ContractsManagerTest(unittest.TestCase):
    contracts = [UrlContract, ReturnsContract, ScrapesContract]

    def setUp(self):
        self.conman = ContractsManager(self.contracts)
        self.results = TextTestResult(stream=None, descriptions=False, verbosity=0)

    def test_same_url(self):
        crawler_process = CrawlerProcess()

        TestSameUrlSpider.start_requests = lambda s: self.conman.from_spider(s, self.results)
        crawler_process.crawl(TestSameUrlSpider)
        crawler_process.start()

        self.assertEqual(self.results.testsRun, 2)

@kmike
Member

kmike commented Aug 15, 2018

@StasDeep these tests need to be run in an event loop; the code is async. See

class CrawlerRunnerHasSpider(unittest.TestCase):

for an example - you need to use twisted.trial.unittest and @defer.inlineCallbacks if you want to run CrawlerProcess (or better, CrawlerRunner).


@defer.inlineCallbacks
def test_same_url(self):
    TestSameUrlSpider.start_requests = lambda s: self.conman.from_spider(s, self.results)
Member

if TestSameUrlSpider can be used only in this test (as its start_requests is monkey-patched), what do you think about just defining the spider inside the method?
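i.e. something along these lines (a hedged sketch, not the merged diff):

@defer.inlineCallbacks
def test_same_url(self):

    class TestSameUrlSpider(Spider):
        # defined locally, so the monkey-patched start_requests
        # cannot leak into other tests
        name = 'test_same_url'

    TestSameUrlSpider.start_requests = lambda s: self.conman.from_spider(s, self.results)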

Contributor Author

Makes sense to me!


def parse_first(self, response):
    """first callback
    @url http://scrapy.org
Member

Is it going to be a real HTTP request to scrapy.org? If so, this would make tests flaky. We have a mockserver available in tests, so it is better to use it -

class CrawlTestCase(TestCase):

is a usage example.

    self.visited += 1
    return TestItem()

TestSameUrlSpider.start_requests = lambda s: self.conman.from_spider(s, self.results)
Member

you can define a method, no need to monkey-patch it
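i.e. roughly this, assuming the spider class is defined inside the test method so it can close over the test case (a sketch, not the merged code):

class TestSameUrlSpider(Spider):
    name = 'test_same_url'

    def start_requests(s):
        # 's' is the spider instance; 'self' is the enclosing test case,
        # reachable here because the class is defined inside the test method.
        return self.conman.from_spider(s, self.results)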


crawler = CrawlerRunner().create_crawler(TestSameUrlSpider)
with MockServer() as mockserver:
    yield crawler.crawl(mockserver=mockserver)
Member

Unfortunately, mockserver doesn't magically make Scrapy send local requests :) MockServer just starts a process which listens for HTTP and has a few endpoints you can use for testing. To actually use it, you need to instruct Scrapy to make requests to it.

In Scrapy contracts, the URL to fetch is defined in a docstring (@url ...), so if you put http://scrapy.org there, this test will fetch the scrapy.org website. A sanity check is to try running this test with the Internet disabled (e.g. WiFi turned off) - the Scrapy test suite shouldn't need external resources to run. To make it work, use a URL of some mockserver endpoint instead of scrapy.org.
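Concretely, the @url contract needs to point at the mock server instead of scrapy.org, e.g. something like this (a sketch; the localhost port and the /status endpoint reflect what Scrapy's tests/mockserver.py has historically exposed, so double-check against the current test suite):

with MockServer() as mockserver:
    # assumed endpoint: /status?n=200 returns a plain 200 response
    contract_doc = '@url http://localhost:8998/status?n=200'

    # rewrite the contract docstrings so the checks hit the mock server
    TestSameUrlSpider.parse_first.__doc__ = contract_doc
    TestSameUrlSpider.parse_second.__doc__ = contract_doc

    crawler = CrawlerRunner().create_crawler(TestSameUrlSpider)
    yield crawler.crawl()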

@kmike
Member

kmike commented Aug 17, 2018

crawler.crawl arguments are passed to Spider.from_crawler, which eventually sets them as attributes. So this line does the following: it starts crawling using TestSameUrlSpider, with the spider's mockserver attribute set to the mockserver instance - which is not useful, as you're not using it in the spider.
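In other words (a tiny illustration of the kwargs forwarding; hypothetical usage, not code from this PR):

# crawler.crawl(mockserver=mockserver) forwards its kwargs to
# Spider.from_crawler, which ends up setting them as instance attributes:
spider = TestSameUrlSpider.from_crawler(crawler, mockserver=mockserver)
assert spider.mockserver is mockserver  # set, but never read by the spider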

@StasDeep
Contributor Author

StasDeep commented Aug 18, 2018

@kmike thanks for your comments. Now it works without internet connection.

@StasDeep
Contributor Author

StasDeep commented Sep 3, 2018

@kmike could you please give it a pass? Hopefully the final one :)

@dangra dangra added this to the v1.6 milestone Sep 5, 2018
@kmike kmike changed the title Use dont_filter=True for contracts requests [MRG+1] Use dont_filter=True for contracts requests Sep 5, 2018
@kmike
Member

kmike commented Sep 5, 2018

@StasDeep yeah, looks good, thanks for the fix!

@dangra dangra merged commit 7223978 into scrapy:master Sep 6, 2018