Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jenkins failure #648

Closed
gforcada opened this issue Dec 20, 2018 · 20 comments
Closed

Jenkins failure #648

gforcada opened this issue Dec 20, 2018 · 20 comments

Comments

@gforcada
Copy link
Sponsor Contributor

Any idea what could go wrong here? It is failing since yesterday 馃槙

Error in test test_documentation_search_fullobjects (plone.restapi.tests.test_documentation.TestDocumentation)
Traceback (most recent call last):
  File "/srv/python2.7/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/home/jenkins/workspace/plone-5.2-python-2.7/src/plone.restapi/src/plone/restapi/tests/test_documentation.py", line 499, in test_documentation_search_fullobjects
    response = self.api_session.get('/@search', params=query)
  File "/home/jenkins/.buildout/eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/home/jenkins/workspace/plone-5.2-python-2.7/src/plone.restapi/src/plone/restapi/testing.py", line 359, in request
    return super(RelativeSession, self).request(method, url, **kwargs)
  File "/home/jenkins/.buildout/eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/jenkins/.buildout/eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/jenkins/.buildout/eggs/cp27m/requests-2.21.0-py2.7.egg/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(104, 'Connection reset by peer'))
@tisto
Copy link
Sponsor Member

tisto commented Dec 22, 2018

@gforcada connection timeout when the test tries to do a request against the functional test fixture. Maybe a port allocation problem on Jenkins?

@mauritsvanrees
Copy link
Sponsor Member

In the traceback above, the error is in test_documentation_search_fullobjects. On latest Jenkins it is in test_documentation_search.

@mauritsvanrees
Copy link
Sponsor Member

On Python3 locally I get hundreds of these warnings, each with a different port number:

.../zope.testrunner-4.9.2-py3.7.egg/zope/testrunner/runner.py:386: 
ResourceWarning: unclosed <socket.socket fd=9, family=AddressFamily.AF_INET,
type=SocketKind.SOCK_STREAM, proto=6, 
laddr=('127.0.0.1', 50686), raddr=('127.0.0.1', 49936)>

That can easily cause problems on Jenkins, especially when multiple jobs are run in parallel. As far as I see, no open ports/sockets are lingering after all the tests have run, so at the end it works out. On Python 2 I do not see these warnings.

Strangely, after I run the tests (bin/test --all -s plone.restapi on coredev 5.2, both on Python 2 and 3), there are dozens of changed files in the plone.restapi checkout. 82 changes on Python 2, 91 on Python 3, but maybe this differs from run to run. A sample change gives a hint of the underlying problem:

--- a/src/plone/restapi/tests/http-examples/404_not_found.resp
+++ b/src/plone/restapi/tests/http-examples/404_not_found.resp
@@ -2,6 +2,6 @@ HTTP/1.1 404 Not Found
 Content-Type: application/json
 
 {
-  "message": "Resource not found: http://localhost:55001/plone/non-existing-resource", 
+  "message": "Resource not found: http://localhost:50796/plone/non-existing-resource",
   "type": "NotFound"
 }

So the source code contains output from a local test run. I guess this is for debugging purposes. It has port 55001, but on coredev (or latest plone.testing?) the ports are randomized

@mauritsvanrees
Copy link
Sponsor Member

Hey, I didn't mean to close this issue. I was still editing my previous comment...
Anyway, plone.testing 7.0.0 has this change by @Rotonen: "Default to picking a dynamical port for ZServer layers instead of a static default port."

I guess what happens in plone.restapi on Python 3 is that too many ports are left open during parallel runs, and then either there are no open ports left, or maybe (worse) a test gets access to a port that is opened by a test from a different test run.

@Rotonen, would you know if there is a spot in plone.testing where on Python 3 the port/socket would need to be closed explicitly?

@mauritsvanrees
Copy link
Sponsor Member

Ah wait, we don't get a different randomized port for each test, but for each process, or maybe each layer. When checking with lsof locally (Mac), when doing four parallel test runs, I get between four and eight open ports:

$ lsof -i -n -P | grep Python
Python    96982 maurits    6u  IPv4 0xbcb26b87281b3a7b      0t0  TCP 127.0.0.1:51770 (LISTEN)
Python    96983 maurits    6u  IPv4 0xbcb26b872560aa7b      0t0  TCP 127.0.0.1:51775 (LISTEN)
Python    96983 maurits    9u  IPv4 0xbcb26b87313520fb      0t0  TCP 127.0.0.1:51987->127.0.0.1:51775 (ESTABLISHED)
Python    96983 maurits   10u  IPv4 0xbcb26b873135177b      0t0  TCP 127.0.0.1:51775->127.0.0.1:51987 (ESTABLISHED)
Python    97450 maurits    6u  IPv4 0xbcb26b872debd47b      0t0  TCP 127.0.0.1:51832 (LISTEN)
Python    97918 maurits    6u  IPv4 0xbcb26b8730d6e0fb      0t0  TCP 127.0.0.1:51917 (LISTEN)

And with those four parallel runs, on Python 3 at least, I get an error in three of them, like this:

Error in test test_documentation_users_authorized_get (plone.restapi.tests.test_documentation.TestDocumentation)
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/case.py", line 615, in run
    testMethod()
  File "/Users/maurits/community/plone-coredev/py3/src/plone.restapi/src/plone/restapi/tests/test_documentation.py", line 766, in test_documentation_users_authorized_get
    response = logged_out_api_session.get('@users/noam')
  File "/Users/maurits/shared-eggs/cp37m/requests-2.21.0-py3.7.egg/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/Users/maurits/community/plone-coredev/py3/src/plone.restapi/src/plone/restapi/testing.py", line 359, in request
    return super(RelativeSession, self).request(method, url, **kwargs)
  File "/Users/maurits/shared-eggs/cp37m/requests-2.21.0-py3.7.egg/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/maurits/shared-eggs/cp37m/requests-2.21.0-py3.7.egg/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/Users/maurits/shared-eggs/cp37m/requests-2.21.0-py3.7.egg/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

@mauritsvanrees
Copy link
Sponsor Member

I did another four parallel test runs on Python 3. Now three of the four runs had two errors, and the fourth one had one error.

@mauritsvanrees
Copy link
Sponsor Member

With four parellel Python 2 runs, I get a few errors like this:

Error in test test_documentation_registry_update (plone.restapi.tests.test_documentation.TestDocumentation)
Traceback (most recent call last):
  File "/anaconda2/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/Users/maurits/community/plone-coredev/5.2/src/plone.restapi/src/plone/restapi/tests/test_documentation.py", line 524, in test_documentation_registry_update
    json={'plone.app.querystring.field.path.title': 'Value'})
  File "/Users/maurits/shared-eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 605, in patch
    return self.request('PATCH', url, data=data, **kwargs)
  File "/Users/maurits/community/plone-coredev/5.2/src/plone.restapi/src/plone/restapi/testing.py", line 359, in request
    return super(RelativeSession, self).request(method, url, **kwargs)
  File "/Users/maurits/shared-eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/maurits/shared-eggs/cp27m/requests-2.21.0-py2.7.egg/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/Users/maurits/shared-eggs/cp27m/requests-2.21.0-py2.7.egg/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))

Halfway through, two of the four runs have this error once. It stays that way at the end of the four runs.

@Rotonen
Copy link
Contributor

Rotonen commented Dec 27, 2018

A shot in the dark: something is not closing the file descriptors of the sockets being opened via os.close() and a ulimit is hit?

@Rotonen
Copy link
Contributor

Rotonen commented Dec 27, 2018

Also plone.testing still listens to the environment variables and those take precedence, so falling back to those is possible without any code changes.

@mauritsvanrees
Copy link
Sponsor Member

On Python 2 I sometimes get the same error also in a single run.

@tisto
Copy link
Sponsor Member

tisto commented Dec 27, 2018

@mauritsvanrees we need to set a fixed port in order to being able to test the full output path. Since we test against files, we can't parameterize this easily. I stopped doing port allocations in tests on CI long time ago because it was a constant source of troubles. Running each test in its own container/vm is a much more scalable approach IMHO.

@Rotonen
Copy link
Contributor

Rotonen commented Dec 27, 2018

As localhost can be up to a /8, we could also just use something like dnsmasq to conveniently resolve localhost and vary the IP? But then the server setup complexity grows again.

One serious option to consider would be to roll AWS AMI images for the dependencies of a test run and run each build on a fresh server now that they support per second billing (one minute minimum charge). That'd allow for pretty quick parallel builds too - and no queuing.

@Rotonen
Copy link
Contributor

Rotonen commented Dec 27, 2018

@mauritsvanrees are these happening on the ZServer layer or on the WSGI layer? So far the WSGI layer has been rather troublesome in regards to isolation.

Ref. plone/plone.testing#58

@mauritsvanrees
Copy link
Sponsor Member

@Rotonen These are in tests that use the ZServer layer:

PLONE_RESTAPI_DX_FUNCTIONAL_TESTING_FREEZETIME = FunctionalTesting(
bases=(FREEZE_TIME_FIXTURE,
PLONE_RESTAPI_DX_FIXTURE,
z2.ZSERVER_FIXTURE),
name="PloneRestApiDXLayerFreeze:Functional"
)

@mauritsvanrees
Copy link
Sponsor Member

@tisto wrote:

@mauritsvanrees we need to set a fixed port in order to being able to test the full output path.

Those tests don't actually fail. I think you are referring to

if [ "$PLONE_VERSION" = "4.3.x" ] || [ "$PLONE_VERSION" = "5.0.x" ] || [ "$PLONE_VERSION" = "5.2.x" ]; then
# request/response dumps have known differences for Plone 5
# => skip, we can't have the Plone 5 build fail because of those
echo "Skipping checks for undocumented changes for Plone 4 and 5.0.x"
exit 0
fi
which is not run on Jenkins, only on Travis. So the uncommitted changes after a test run are not the problem.

@mauritsvanrees
Copy link
Sponsor Member

For the unclosed sockets, there is actually already issue #636.

mauritsvanrees added a commit that referenced this issue Dec 27, 2018
This prevents lots of ResourceWarnings about unclosed sockets.
Fixes #636 and #648.
@mauritsvanrees
Copy link
Sponsor Member

I have created a PR: #651

mister-roboto pushed a commit to plone/buildout.coredev that referenced this issue Dec 27, 2018
Branch: refs/heads/master
Date: 2018-12-27T16:58:44+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.restapi@7e57735

Close the api_session in tests.

This prevents lots of ResourceWarnings about unclosed sockets.
Fixes plone/plone.restapi#636 and plone/plone.restapi#648.

Files changed:
M CHANGES.rst
M src/plone/restapi/tests/test_documentation.py
Repository: plone.restapi

Branch: refs/heads/master
Date: 2018-12-28T00:01:34+01:00
Author: Gil Forcada Codinachs (gforcada) <gil.gnome@gmail.com>
Commit: plone/plone.restapi@fbd357b

Merge pull request #651 from plone/close-api-session

Close the api_session in tests.

Files changed:
M CHANGES.rst
M src/plone/restapi/tests/test_documentation.py
@davisagli
Copy link
Sponsor Member

I've been assuming that the leaked sockets are from some bug in the shutdown of webtest.http.StopableWSGIServer which is used by the WSGI_SERVER fixture (which is confusingly what you now get when you import plone.testing.z2.ZSERVER_FIXTURE). But I haven't looked into it enough to have actual evidence. Possibly something similar to zopefoundation/zope.server#14, since StopableWSGIServer extends waitress.TcpWSGIServer I believe waitress grew out of zope.server. But again, I haven't taken a close look.

@mauritsvanrees
Copy link
Sponsor Member

It should be less of a problem now that PR #651 and #652 have been merged.
But we still sometimes have a ConnectionError, for example here on Python 3.6.

mauritsvanrees added a commit that referenced this issue Dec 31, 2018
On Jenkins we often get one ConnectionError in a seemingly random test.
Happens on both Python 2 and 3.
Retrying after a short pause helps.
Fixes #648
@mauritsvanrees
Copy link
Sponsor Member

I added a pdb, and tried a few parallel test runs. When getting a ConnectionError, I could see that the port was still open, and I could view the Plone site in a browser just fine.
Retrying the same request after a short pause, helps. Perhaps not the nicest solution, but if it gets Jenkins green, I am happy. See PR #654.

mauritsvanrees added a commit that referenced this issue Dec 31, 2018
On Jenkins we often get one ConnectionError in a seemingly random test.
Happens on both Python 2 and 3.
Retrying after a short pause helps.
Fixes #648
mister-roboto pushed a commit to plone/buildout.coredev that referenced this issue Dec 31, 2018
Branch: refs/heads/master
Date: 2018-12-31T13:08:28+01:00
Author: Maurits van Rees (mauritsvanrees) <maurits@vanrees.org>
Commit: plone/plone.restapi@e6b601c

Tests: retry request on ConnectionError.

On Jenkins we often get one ConnectionError in a seemingly random test.
Happens on both Python 2 and 3.
Retrying after a short pause helps.
Fixes plone/plone.restapi#648

Files changed:
M CHANGES.rst
M src/plone/restapi/testing.py
Repository: plone.restapi

Branch: refs/heads/master
Date: 2018-12-31T19:03:25+01:00
Author: Gil Forcada Codinachs (gforcada) <gil.gnome@gmail.com>
Commit: plone/plone.restapi@e8a9f4f

Merge pull request #654 from plone/issue-648-retry-on-connectionerror

Tests: retry request on ConnectionError.

Files changed:
M CHANGES.rst
M src/plone/restapi/testing.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants