
[MRG+1] Automatic port selection for services in unit tests #3210

Merged: 53 commits merged into scrapy:master on May 23, 2018

Conversation

@whalebot-helmsman (Contributor) commented Apr 10, 2018

On a busy host the usual ports may already be occupied by other services, and the tests simply fail if a base server cannot listen on its port.
This pull request adds the ability to search for a free port for the mock server and the proxy server.
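For illustration, the effect on a test looks roughly like this (a sketch using the MockServer context manager and the url() helper discussed later in this PR; exact names may differ from the merged code):

from tests.mockserver import MockServer

# Instead of a hard-coded address such as "https://localhost:8999/...",
# let the mock server pick a free port and build URLs from it.
with MockServer() as mockserver:
    url = mockserver.url("/status?n=200")
    # ... run the crawl against this url ...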

@codecov (bot) commented Apr 10, 2018

Codecov Report

Merging #3210 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3210      +/-   ##
==========================================
+ Coverage   82.12%   82.14%   +0.02%     
==========================================
  Files         228      228              
  Lines        9599     9599              
  Branches     1385     1385              
==========================================
+ Hits         7883     7885       +2     
+ Misses       1457     1456       -1     
+ Partials      259      258       -1
Impacted Files Coverage Δ
scrapy/http/cookies.py 91.07% <0%> (+1.78%) ⬆️
@whalebot-helmsman changed the title from "Automatic port selection" to "Automatic port selection for services in unit tests" on Apr 11, 2018
@kmike (Member) commented Apr 12, 2018

Hey! We've been discussing this issue in #1117

@whalebot-helmsman (Contributor, Author) commented Apr 12, 2018

The problem with an ephemeral port is that there is no reliable way to return it from the mock server to the process running the unit tests; I didn't count piping as a reliable way.
As for the Splash tests, they don't always work. In my case the port was obtained for the external IP of the host, and for security reasons almost all ports are closed on the external IP, so I got no responses from Splash in the Docker container, only timeouts.

@@ -17,6 +18,12 @@
from scrapy.utils.python import to_bytes, to_unicode


def get_ephemeral_port():
    s = socket.socket()
    s.bind(("", 0))
    return s.getsockname()[1]

@kmike (Member) commented Apr 13, 2018

Hm, so you said there was an issue because the Splash tests bind a port on 0.0.0.0 (the same as ""?), and it was problematic? Should we try finding a port by binding to localhost only?


        self._oldenv = os.environ.copy()
        os.environ[self.HTTP] = 'http://localhost:%s' % (http_port, )
        os.environ[self.HTTPS] = 'https://localhost:%s' % (https_port, )

@kmike (Member) commented Apr 13, 2018

I think one of the goals should be to allow running Scrapy tests in parallel, including parallel in the same process. For this to work ports shouldn't be global - they should be attributes of the MockServer instance. I know it can be a larger change :(

@kmike (Member) commented Apr 13, 2018

Hm, but using a single MockServer can be fine for Scrapy purposes, so maybe having host/port global is fine.

@whalebot-helmsman (Contributor, Author) commented Apr 13, 2018

To make the port a variable of the object you need to pass it to the spider classes, and there is no clean way to do this: scrapy.crawler.CrawlerRunner.create_crawler does not accept arguments, and there are a lot of such places.

    HTTPS = 'mock_https'

    @classmethod
    def from_mock(cls, path, is_secure=False):

@kmike (Member) commented Apr 13, 2018

What do you think about renaming it to get_url, or just url?

@whalebot-helmsman (Contributor, Author) commented Apr 13, 2018

Not a problem for me
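
For illustration, one plausible shape of the renamed helper, assuming it keeps building URLs from the mock_http / mock_https environment variables set in the diff above (a sketch, shown on MockServer for concreteness; the merged code may differ):

import os

class MockServer:
    HTTP = 'mock_http'
    HTTPS = 'mock_https'

    @classmethod
    def url(cls, path, is_secure=False):
        # Pick whichever base address the running mock server exported
        # through the environment and append the requested path.
        base = os.environ[cls.HTTPS if is_secure else cls.HTTP]
        return base + path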

        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.proc.kill()
        self.proc.wait()
        time.sleep(0.2)
        os.environ = self._oldenv

@kmike (Member) commented Apr 13, 2018

Does it work, i.e. does it replace the environment of the process, or does it just replace an attribute of the os module? If it's the latter, then why do we need env variables instead of global variables (class attributes)?

@whalebot-helmsman (Contributor, Author) commented Apr 13, 2018

I copied this solution from ProxyConnectTestCase.tearDown.

@whalebot-helmsman (Contributor, Author) commented Apr 13, 2018

It is a variable, not a module: https://stackoverflow.com/a/43487762/8640733
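
A minimal sketch of the distinction: os.environ is an os._Environ mapping whose writes also call putenv(), so rebinding the name only swaps the module attribute, while mutating it in place keeps the real process environment in sync (values here are hypothetical):

import os

saved = os.environ.copy()
os.environ['mock_http'] = 'http://localhost:12345'  # hypothetical value

# Rebinding the name, as the diff above does, swaps the module attribute for a
# plain dict copy; lookups still work, but it is no longer the os._Environ
# object that propagates writes to the C-level environment:
#     os.environ = saved

# Mutating the existing mapping in place restores the values and keeps the
# propagation behaviour:
os.environ.clear()
os.environ.update(saved)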

@whalebot-helmsman (Contributor, Author) commented Apr 13, 2018

Considering that all PyPy tests fail (https://travis-ci.org/scrapy/scrapy/jobs/366201130): in the PyPy case get_ephemeral_port doesn't release the socket at the end of the function.
In the CPython case it just binds a port and releases it at the end of the function, and this port number goes to the subprocess, but there is no guarantee that some other process has not bound it in the meantime.

This is not how ephemeral ports are intended to work.
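
To make the failure mode concrete, a sketch of the pattern described above (the child-process invocation is hypothetical, not the actual test code):

import socket
import subprocess
import sys

def get_ephemeral_port():
    s = socket.socket()
    s.bind(("", 0))
    port = s.getsockname()[1]
    # On CPython the socket is freed as soon as it is garbage-collected at
    # function exit; on PyPy collection is delayed, so without an explicit
    # close() the port can stay bound.
    s.close()
    return port

port = get_ephemeral_port()
# Race window: between freeing the port above and the child binding it below,
# any other process on the host may grab the same port number.
subprocess.Popen([sys.executable, "mockserver_script.py", str(port)])  # hypothetical script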

@whalebot-helmsman (Contributor, Author) commented May 17, 2018

@kmike after merging #3263 the tests are good

@kmike changed the title from "Automatic port selection for services in unit tests" to "[MRG+1] Automatic port selection for services in unit tests" on May 22, 2018
@kmike (Member) approved these changes on May 22, 2018 and left a comment:

Looks perfect, thanks @whalebot-helmsman!

+1 to squash and merge.

        # Wait for the proxy to start.
        time.sleep(1.0)
        os.environ['http_proxy'] = 'http://scrapy:scrapy@localhost:8888'
        os.environ['https_proxy'] = 'http://scrapy:scrapy@localhost:8888'
        os.environ['https_proxy'] = self._proxy.http_address()

@lopuhin (Member) commented May 23, 2018

should it be self._proxy.https_address() instead?

@whalebot-helmsman (Contributor, Author) commented May 23, 2018

We have two types of protocol here: the protocol we speak with the crawled server and the protocol we speak with the proxy. For both crawled-server protocols (http and https) we use the same proxy protocol (http).
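
In other words (a sketch with a hypothetical port number), both target-site protocols go through the same HTTP-speaking proxy endpoint, and HTTPS requests are tunnelled through it with CONNECT:

import os

# Hypothetical address: whatever port the test proxy ended up listening on.
proxy_address = 'http://scrapy:scrapy@localhost:35871'

# The scheme of the crawled URL (http vs https) selects which variable is
# consulted, but the proxy itself is always spoken to over plain HTTP.
os.environ['http_proxy'] = proxy_address
os.environ['https_proxy'] = proxy_address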

@lopuhin (Member) left a comment:

Looks good, left some minor comments regarding http/https

@@ -52,17 +64,17 @@ def tearDown(self):
    def test_https_connect_tunnel(self):
        crawler = get_crawler(SimpleSpider)
        with LogCapture() as l:
            yield crawler.crawl("https://localhost:8999/status?n=200")
            yield crawler.crawl(self.mockserver.url("/status?n=200"))

@lopuhin (Member) commented May 23, 2018

missing is_secure=True?

        crawler = get_crawler(SimpleSpider)
        with LogCapture() as l:
            yield crawler.crawl("https://localhost:8999/status?n=200")
            yield crawler.crawl(self.mockserver.url("/status?n=200"))

@lopuhin (Member) commented May 23, 2018

missing is_secure=True?


    @defer.inlineCallbacks
    def test_https_tunnel_without_leak_proxy_authorization_header(self):
        request = Request("https://localhost:8999/echo")
        request = Request(self.mockserver.url("/echo"))

@lopuhin (Member) commented May 23, 2018

missing is_secure=True?

@lopuhin merged commit b364d27 into scrapy:master on May 23, 2018

2 checks passed:
codecov/patch: Coverage not affected when comparing bac1e2d...e33056d
continuous-integration/travis-ci/pr: The Travis CI build passed
@lopuhin (Member) commented May 23, 2018

@whalebot-helmsman many thanks for fixing this and thanks @kmike for review! 👍

@lopuhin (Member) commented May 24, 2018

For reference, the master build failed on pypy3 for a seemingly unrelated reason; I restarted it and it worked.
I think the errors were at teardown of FeedExportTest.test_export_indentation and FeedExportTest.test_export_feed_export_fields: twisted.internet.error.AlreadyCancelled: Tried to cancel an already-cancelled event. Overall stats: == 2 failed, 1660 passed, 11 skipped, 13 xfailed, 2 error in 1343.60 seconds === (I downloaded the original log too). I don't see it mentioned in the issues; if this repeats I'll create a separate issue.

@whalebot-helmsman (Contributor, Author) commented May 24, 2018

@lopuhin there is an incorrect interaction between Scrapy, Twisted (or maybe Twisted's test machinery) and how we run the mock server. It manifests itself from time to time, and there is a reliable way to trigger it in #3238 on all Pythons.

There is a workaround: move the mock server handling for these cases into setUp/tearDown methods, as in the sketch below. Tests will be green, but the problem persists.

Maybe the solution is to run the mock server in the same process.

It is hard to propose a solution as we do not know the root of the problem.
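
A sketch of that workaround, assuming the MockServer context manager from tests/mockserver.py (the test class name is hypothetical):

from twisted.trial import unittest

from tests.mockserver import MockServer


class SomeCrawlTest(unittest.TestCase):

    def setUp(self):
        # Start a dedicated mock server for this test only.
        self.mockserver = MockServer()
        self.mockserver.__enter__()

    def tearDown(self):
        # Shut it down again so the next test gets a clean instance.
        self.mockserver.__exit__(None, None, None)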

@kmike added this to the v1.6 milestone on Jul 4, 2018
@whalebot-helmsman deleted the whalebot-helmsman:automatic_port_selection branch on Jul 18, 2019