rpc_mclient tests fail occasionally #646

Open
kmaehashi opened this Issue · 7 comments

@kmaehashi
Owner

In the CI environment, about 1% of test runs consistently fail while testing ./configure option variations. The failure is almost always in the rpc_mclient unit test:

http://ci.jubat.us/job/develop_configure/723/testReport/%28root%29/configure/__enable_debug___enable_zookeeper___enable_re2___enable_mecab/#footer

(Graph: red bars indicate the number of test failures)

@suma
Owner

Maybe we should move rpc_client_test from the waf unit tests to jubatest (like client_test).

@kmaehashi kmaehashi self-assigned this
@kmaehashi
Owner

I'm not sure whether this is a bug; I'll heat-run this test in my local environment and see what happens.

@kmaehashi kmaehashi modified the milestone: Near Future, 0.5.3
@kmaehashi
Owner

I tested this on my local environment:

build/jubatus/server/common/mprpc/rpc_client_test --gtest_filter=rpc_mclient.small --gtest_repeat=-1

and got this:

Repeating all tests (iteration 22) . . .

Note: Google Test filter = rpc_mclient.small
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from rpc_mclient
[ RUN      ] rpc_mclient.small
thread terminated with throwing an instance of 'mp::pthread_error'
  what():  failed to lock pthread mutex: Unknown error -22
terminate called after throwing an instance of 'mp::pthread_error'
  what():  failed to lock pthread mutex: Unknown error -22
zsh: abort (core dumped)  build/jubatus/server/common/mprpc/rpc_client_test  --gtest_repeat=-1

Still investigating.

@kmaehashi
Owner

Other variations I saw:

terminate called after throwing an instance of 'msgpack::rpc::system_error'
  what():  Connection reset by peer
terminate called after throwing an instance of 'std::runtime_error'
  what():  failed to resolve host name

(Using "127.0.0.1" instead of "localhost" in the test code seems to avoid the latter case.)

@kmaehashi
Owner

I found that 3 file descriptors leak (2 epoll and 1 eventfd) when the close() method of jubatus::server::common::mprpc::rpc_server is called while a client is connected to the server. I also confirmed that the destructor of the mpio kernel does not run in this case. This seems to be a bug in mpio, but more investigation is needed.

For the Jubatus use case, this issue is NOT fatal, as we only call the close() method when shutting down the process.

However, I'm still not clear on why the test failures happen in unit test programs; unit test processes are reinvoked every time, so an fd leak should not accumulate across runs.
To see what is actually happening on the CI server, I proposed recording stderr in waf tests (#726).
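
For anyone who wants to check for a descriptor leak like the one described above, here is a minimal sketch in Python. It is not how the leak was found here; it only illustrates the idea of comparing the open fd count before and after an operation. It assumes Linux (it reads /proc/self/fd), and the helper names `open_fd_count` and `leaked_fds` are made up for illustration.

```python
import os


def open_fd_count(fd_dir="/proc/self/fd"):
    """Count entries in this process's fd directory (Linux-specific).

    The listing itself briefly uses one descriptor, so the absolute
    number is off by one; only differences between calls are meaningful.
    """
    return len(os.listdir(fd_dir))


def leaked_fds(action, fd_dir="/proc/self/fd"):
    """Run action() and return how many more fds are open afterwards."""
    before = open_fd_count(fd_dir)
    action()
    return open_fd_count(fd_dir) - before


if __name__ == "__main__":
    held = []
    # Deliberately keep both ends of a pipe open to simulate a leak.
    print(leaked_fds(lambda: held.extend(os.pipe())))
    for fd in held:
        os.close(fd)
```

The same before/after comparison could be wrapped around starting and closing a server in a test, though that would need bindings to the C++ code, which this sketch does not attempt.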

@kmaehashi
Owner

Discussion from the meeting on 2014-03-25:

  • It seems that the fd leak problem is not related to the test failure.
    • Apply #726 and see the actual error message in the CI environment.
    • The CI environment is not configured with sysctl -w net.ipv4.tcp_tw_reuse=1. This may be the root cause.
  • Raise a separate issue on jubatus-mpio for the leak problem.

I've changed the milestone of this issue to Pending (until it reproduces in the CI environment).
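
To compare the CI server with a local machine, the current value of this setting can be read back from procfs. Below is a small Python sketch, assuming Linux, where the sysctl net.ipv4.tcp_tw_reuse is exposed as /proc/sys/net/ipv4/tcp_tw_reuse. The function name `read_tcp_tw_reuse` is hypothetical.

```python
def read_tcp_tw_reuse(path="/proc/sys/net/ipv4/tcp_tw_reuse"):
    """Return the current tcp_tw_reuse value as an int.

    Returns None if the file is missing or unreadable (for example on
    a non-Linux system) or does not contain an integer.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("net.ipv4.tcp_tw_reuse =", read_tcp_tw_reuse())
```

Note that a value set with sysctl -w does not persist across reboots unless it is also written to the system's sysctl configuration.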

@kmaehashi kmaehashi modified the milestone: Pending, 0.5.3
@kmaehashi
Owner

It reproduced in the CI environment: http://ci.jubat.us/job/develop_configure/860/testReport/junit/%28root%29/configure/__enable_debug___disable_eigen___enable_ux/

The stderr was as follows:

terminate called after throwing an instance of 'mp::system_error'
  what():  bind failed: Address already in use

I think this indicates that not setting sysctl -w net.ipv4.tcp_tw_reuse=1 on the CI server is the cause of these test failures.

I set sysctl -w net.ipv4.tcp_tw_reuse=1 on 2014-04-22 at 17:38. I'll close this issue once we confirm that these test failures have disappeared from the CI environment (monitoring for about 1 week should be enough).
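
For reference, the "Address already in use" error (EADDRINUSE) can be triggered on demand with a short Python sketch. This shows only the simplest case, where a second socket binds to a port that another socket is still listening on. It is not claimed to be the exact path the CI failure takes, where the port is presumably still held from an earlier run. The function name `bind_conflict_errno` is made up for illustration.

```python
import errno
import socket


def bind_conflict_errno(host="127.0.0.1"):
    """Bind two TCP sockets to the same port; return the second bind's errno.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = bind_conflict_errno()
    print(errno.errorcode.get(code, code))
```

Binding to port 0 and reading the assigned port back, as the first socket does here, is also one way for tests to avoid fixed port numbers altogether.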
