
[MRG+1] Automatic port selection for servicies in unit tests #3210

Merged
merged 53 commits into from
May 23, 2018

Conversation

whalebot-helmsman
Contributor

@whalebot-helmsman whalebot-helmsman commented Apr 10, 2018

On a busy host, the usual ports are often already occupied by other services, and tests simply fail if a base server cannot listen on its port.
This pull request adds the ability to search for a free port for the mockserver and the proxyserver.

@codecov

codecov bot commented Apr 10, 2018

Codecov Report

Merging #3210 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3210      +/-   ##
==========================================
+ Coverage   82.12%   82.14%   +0.02%     
==========================================
  Files         228      228              
  Lines        9599     9599              
  Branches     1385     1385              
==========================================
+ Hits         7883     7885       +2     
+ Misses       1457     1456       -1     
+ Partials      259      258       -1
Impacted Files Coverage Δ
scrapy/http/cookies.py 91.07% <0%> (+1.78%) ⬆️

@whalebot-helmsman whalebot-helmsman changed the title from "Automatic port selection" to "Automatic port selection for servicies in unit tests" Apr 11, 2018
@kmike
Member

kmike commented Apr 12, 2018

Hey! We've been discussing this issue in #1117

@whalebot-helmsman
Contributor Author

whalebot-helmsman commented Apr 12, 2018

The problem with an ephemeral port is that there is no reliable way to return it from the mockserver to the process running the unit tests; I don't count piping as reliable.
As for the Splash tests, they do not always work. In my case the port was obtained for the host's external IP. For security reasons almost all ports are closed on the external IP, so I got no responses from Splash in the Docker container, only timeouts.

@@ -17,6 +18,12 @@
from scrapy.utils.python import to_bytes, to_unicode


def get_ephemeral_port():
    s = socket.socket()
    s.bind(("", 0))
    return s.getsockname()[1]
Member


Hm, so you said there was an issue because the Splash tests bind a port on 0.0.0.0 (the same as ""?), and it was problematic? Should we try finding a port by binding to localhost only?
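A localhost-only variant of the probe might look like this (a sketch for discussion, not the PR's code; the explicit close and the host parameter are mine):

```python
import socket

def get_ephemeral_port(host="127.0.0.1"):
    # Binding to 127.0.0.1 instead of "" (all interfaces) keeps the
    # probe socket off external IPs, which may be firewalled.
    s = socket.socket()
    try:
        s.bind((host, 0))
        return s.getsockname()[1]
    finally:
        # Close explicitly so the behavior does not depend on
        # refcounting (relevant for the pypy failures discussed below).
        s.close()
```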


self._oldenv = os.environ.copy()
os.environ[self.HTTP] = 'http://localhost:%s' % (http_port, )
os.environ[self.HTTPS] = 'https://localhost:%s' % (https_port, )
Member


I think one of the goals should be to allow running Scrapy tests in parallel, including parallel in the same process. For this to work ports shouldn't be global - they should be attributes of the MockServer instance. I know it can be a larger change :(

Member


Hm, but using a single MockServer can be fine for Scrapy purposes, so maybe having host/port global is fine.

Contributor Author


To make the port a variable of the object you would need to pass it to the spider classes, and there is no clean way to do that: scrapy.crawler.CrawlerRunner.create_crawler does not accept arguments, and there are a lot of such places.

HTTPS = 'mock_https'

@classmethod
def from_mock(cls, path, is_secure=False):
Member

@kmike kmike Apr 13, 2018


What do you think about renaming it to get_url, or just url?

Contributor Author


Not a problem for me

return self

def __exit__(self, exc_type, exc_value, traceback):
self.proc.kill()
self.proc.wait()
time.sleep(0.2)
os.environ = self._oldenv
Member

@kmike kmike Apr 13, 2018


does it work, i.e. does it replace environment of a process, or does it just replace attribute of the os module? If that's the latter, then why do we need to use env variables, instead of using global variables (class attributes?)

Contributor Author


I copied this solution from ProxyConnectTestCase.tearDown.

Contributor Author


It is a variable, not a module: https://stackoverflow.com/a/43487762/8640733
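A small illustration of the distinction being discussed (CPython behavior): os.environ is an os._Environ mapping whose in-place mutations go through putenv()/unsetenv(), while os.environ.copy() returns a plain dict, so assigning the copy back only rebinds the module attribute:

```python
import os

# Mutating os.environ calls putenv() under the hood, so the real
# process environment changes too.
os.environ["MOCK_HTTP"] = "http://localhost:8998"

old = os.environ         # keep a reference to the real mapping
os.environ = old.copy()  # os.environ.copy() returns a plain dict

# The module attribute is now a plain dict: further writes to it no
# longer reach the C-level environment, so child processes started
# with the default environment would not see them.
assert not isinstance(os.environ, type(old))

os.environ = old         # restore the real os._Environ mapping
del os.environ["MOCK_HTTP"]
```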

@whalebot-helmsman
Contributor Author

whalebot-helmsman commented Apr 13, 2018

Considering this: all the pypy tests fail (https://travis-ci.org/scrapy/scrapy/jobs/366201130).
In the pypy case, get_ephemeral_port does not release the socket at the end of the function.
In the CPython case it binds a port, releases it at the end of the function, and the port number goes to the subprocess, but there is no guarantee that some other process has not bound it in the meantime.

This is not how ephemeral ports are intended to work.
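The race described above can be demonstrated directly: the pattern is bind, read the port, close, then hand the number to a subprocess, and any other process can bind the port in between. A minimal sketch, with an in-process "thief" socket standing in for another process:

```python
import socket

def probe_port():
    # Bind-then-close: the returned port is only a hint, not a
    # reservation -- it is free again the moment close() returns.
    s = socket.socket()
    s.bind(("", 0))
    port = s.getsockname()[1]
    s.close()
    return port

port = probe_port()

# Simulate another process grabbing the port before the mockserver
# subprocess has a chance to bind it.
thief = socket.socket()
thief.bind(("", port))

# The subprocess's bind now fails with EADDRINUSE.
server = socket.socket()
try:
    server.bind(("", port))
    raced = False
except OSError:
    raced = True
finally:
    server.close()
    thief.close()
```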

os.environ['https_proxy'] = 'http://wrong:wronger@localhost:8888?noconnect'
proxy = os.environ['https_proxy']
bad_auth_proxy = list(urlsplit(proxy))
bad_auth_proxy[1] = bad_auth_proxy[1].replace('scrapy:scrapy@', 'wrong:wronger@')
Member


I think it makes sense to create a small utility function, to avoid copy-paste
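A hypothetical helper along those lines (the name and signature are illustrative, not the PR's code), using urlsplit as the test snippet already does:

```python
from urllib.parse import urlsplit, urlunsplit

def with_proxy_auth(proxy_url, user, password):
    # Swap the user:password@ part of a proxy URL, so tests don't
    # copy-paste string replacements like
    # netloc.replace('scrapy:scrapy@', 'wrong:wronger@').
    parts = urlsplit(proxy_url)
    host = parts.netloc.rsplit("@", 1)[-1]  # strip any existing userinfo
    netloc = "%s:%s@%s" % (user, password, host)
    return urlunsplit(parts._replace(netloc=netloc))
```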

@whalebot-helmsman
Contributor Author

@kmike after merging #3263 the tests are good

@kmike kmike changed the title from "Automatic port selection for servicies in unit tests" to "[MRG+1] Automatic port selection for servicies in unit tests" May 22, 2018
Member

@kmike kmike left a comment


Looks perfect, thanks @whalebot-helmsman!

+1 to squash and merge.

# Wait for the proxy to start.
time.sleep(1.0)
os.environ['http_proxy'] = 'http://scrapy:scrapy@localhost:8888'
os.environ['https_proxy'] = 'http://scrapy:scrapy@localhost:8888'
os.environ['https_proxy'] = self._proxy.http_address()
Member


should it be self._proxy.https_address() instead?

Contributor Author


We have two kinds of protocol here: the protocol we use to talk to the crawled server, and the protocol we use to talk to the proxy. For both protocols to the crawled server (http and https) we use the same proxy protocol (http).

Member

@lopuhin lopuhin left a comment


Looks good, left some minor comments regarding http/https

@@ -52,17 +64,17 @@ def tearDown(self):
def test_https_connect_tunnel(self):
crawler = get_crawler(SimpleSpider)
with LogCapture() as l:
- yield crawler.crawl("https://localhost:8999/status?n=200")
+ yield crawler.crawl(self.mockserver.url("/status?n=200"))
Member


missing is_secure=True?

crawler = get_crawler(SimpleSpider)
with LogCapture() as l:
- yield crawler.crawl("https://localhost:8999/status?n=200")
+ yield crawler.crawl(self.mockserver.url("/status?n=200"))
Member


missing is_secure=True?


@defer.inlineCallbacks
def test_https_tunnel_without_leak_proxy_authorization_header(self):
- request = Request("https://localhost:8999/echo")
+ request = Request(self.mockserver.url("/echo"))
Member


missing is_secure=True?
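For context, the reviewer's point is that the url helper presumably picks the https port when is_secure=True. A rough sketch of the shape implied by the diff (class layout, ports, and names assumed here, not the PR's exact code):

```python
class MockServer(object):
    def __init__(self, http_port=8998, https_port=8999):
        self.http_port = http_port
        self.https_port = https_port

    def url(self, path, is_secure=False):
        # Select scheme and port from is_secure, so callers write
        # mockserver.url("/status?n=200", is_secure=True) instead of
        # hardcoding https://localhost:8999 in every test.
        if is_secure:
            return "https://localhost:%d%s" % (self.https_port, path)
        return "http://localhost:%d%s" % (self.http_port, path)
```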

@lopuhin lopuhin merged commit b364d27 into scrapy:master May 23, 2018
@lopuhin
Member

lopuhin commented May 23, 2018

@whalebot-helmsman many thanks for fixing this and thanks @kmike for review! 👍

@lopuhin
Member

lopuhin commented May 24, 2018

For reference, the master build failed on pypy3 for a seemingly unrelated reason; I restarted it and it worked.
I think the errors were at teardown of FeedExportTest.test_export_indentation and FeedExportTest.test_export_feed_export_fields: twisted.internet.error.AlreadyCancelled: Tried to cancel an already-cancelled event. Overall stats: == 2 failed, 1660 passed, 11 skipped, 13 xfailed, 2 error in 1343.60 seconds == (I downloaded the original log too). I don't see it mentioned in the issues; if this repeats, I'll create a separate issue.

@whalebot-helmsman
Contributor Author

whalebot-helmsman commented May 24, 2018

@lopuhin there is an incorrect interaction between Scrapy, Twisted (or maybe twisted-test) and the way we run the mockserver. It manifests itself from time to time, and there is a reliable way to trigger it (#3238) on all Pythons.

There is a workaround: move the mockserver handling for these cases into setUp/tearDown methods. The tests would be green, but the problem would persist.

Maybe the solution is to run the mockserver in the same process.

It is hard to propose a solution as we do not know the root of the problem.

@kmike kmike added this to the v1.6 milestone Jul 4, 2018
@whalebot-helmsman whalebot-helmsman deleted the automatic_port_selection branch July 18, 2019 05:23