Error when accepting a massive number of connections with a gRPC C++ server #7985

Closed
umegaya opened this issue Sep 5, 2016 · 2 comments

@umegaya

umegaya commented Sep 5, 2016

Hi all,
I'm trying to use a gRPC C++ server as a front end, which means it may have to accept a huge number of connections in a short time.
So I'm testing how well it handles incoming connections. The problem is that as the number of connections increases, the server seems to drop connections (there don't seem to be any accept failures).

Environment details:
gRPC: v1.0.0 branch
Host: OS X El Capitan (10.11.6), 16 GB RAM, Intel Core i7 2.6 GHz (8 cores)
Server: sync-mode gRPC C++ server running in an Ubuntu 16.10-based container (only the server binary copied in) with Docker 1.12.0
Client: gRPC Node.js v0.12.7 running on the host machine; all clients run in the same Node.js process using node-fibers

Each client has its own service object, created with grpc.load(proto_file).package_name.service_name(), and sends an echo request to the server 1000 times with SSL enabled.

With fewer than 100 connections there seems to be no problem, but above 200, clients start to show errors like the following:

E0905 13:26:29.887599000 123145305219072 handshake.c:128] Security handshake failed: {"created":"@1473049589.887582000","description":"Handshake read failed","file":"../src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"@1473049589.887580000","description":"FD shutdown","file":"../src/core/lib/iomgr/ev_poll_posix.c","file_line":427}]}

At the same time, the server reports:

E0905 04:26:29.998263585     491 handshake.c:128]            Security handshake failed: {"created":"@1473049589.998250914","description":"Handshake read failed","file":"src/core/lib/security/transport/handshake.c","file_line":237,"referenced_errors":[{"created":"@1473049589.988649722","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]}

When I increase the number of clients to around 500, almost half of them terminate with errors like the ones above.
I tried setting net.core.somaxconn to 1024 or 2048 (inside the container), since it seems to change s_max_accept_queue_size as well, but it didn't help.
I also tried running the server directly on the host, but the result was the same.
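For context on the somaxconn experiment: on Linux, the backlog passed to listen() is silently truncated to net.core.somaxconn, so raising the sysctl only helps if the server also asks for a large backlog. The C++ sketch below assumes a server that picks its backlog by reading /proc/sys/net/core/somaxconn and falls back to the compile-time SOMAXCONN when that file is missing or unparsable; both function names are hypothetical illustrations, not gRPC API.

```cpp
#include <sys/socket.h>  // listen(), SOMAXCONN
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: read the kernel's accept-queue cap from a sysctl file.
// Falls back to SOMAXCONN if the file is missing (e.g. no /proc on macOS)
// or does not start with a positive integer.
int ReadMaxAcceptQueueSize(
    const std::string& path = "/proc/sys/net/core/somaxconn") {
  std::ifstream in(path);
  int value = 0;
  if (in >> value && value > 0) return value;
  return SOMAXCONN;
}

// Hypothetical helper: listen on an already-bound socket with the largest
// backlog the kernel will honour. Requesting more is pointless on Linux,
// which truncates the backlog to net.core.somaxconn anyway.
int ListenWithKernelCap(int fd) {
  assert(fd >= 0);
  return ::listen(fd, ReadMaxAcceptQueueSize());
}
```

If the server requests a small fixed backlog, changing the sysctl alone has no effect, and vice versa.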

It may be a problem with my host's system settings, but I currently have no idea. Does anyone have any hints for this problem?
Regards,

@umegaya
Author

umegaya commented Sep 5, 2016

EDIT: After further investigation, I found the following:

  1. Without SSL, connections also sometimes drop, with error code 14 (GRPC_STATUS_UNAVAILABLE).
  2. GRPC_STATUS_UNAVAILABLE is retryable both with and without SSL: eventually the client connects to the server and sends the RPC correctly.

So in an actual production environment, the errors above should be harmless with some retrying.
Is this the expected behavior, then?
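A minimal sketch of the retry-on-UNAVAILABLE pattern described above. It is written in C++ for consistency with the server side, although the client in this report is Node.js; the pattern itself is language-agnostic. The only assumption about gRPC is that UNAVAILABLE is numeric code 14 and OK is 0, which matches gRPC's status codes. `Code` and `CallWithRetry` are hypothetical stand-ins, not gRPC API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Numeric values match gRPC status codes; only the two used here are listed.
enum class Code : int { kOk = 0, kUnavailable = 14 };

// Hypothetical helper: invoke `rpc` until it returns something other than
// UNAVAILABLE or `max_attempts` calls have been made, doubling the sleep
// between attempts (exponential backoff). Returns the last status seen.
Code CallWithRetry(const std::function<Code()>& rpc, int max_attempts,
                   std::chrono::milliseconds backoff) {
  assert(max_attempts > 0);
  Code last = Code::kUnavailable;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    last = rpc();
    if (last != Code::kUnavailable) break;  // success or non-retryable error
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);
      backoff *= 2;
    }
  }
  return last;
}
```

A real client would also bound the total time with a deadline and add random jitter to the backoff, so that hundreds of clients rejected at the same moment do not all retry in lockstep.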

@vjpai
Member

vjpai commented Sep 6, 2016

By default, the ulimit -n (max FDs) value on the Mac is quite low. What is yours set to? Can you raise it as high as the system will accept (I think 2048) and then see what happens? Thanks!
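Every open connection consumes a file descriptor. Because all clients here share one Node.js process, that process needs at least one descriptor per connection. If its soft limit is as low as 256, as is commonly the default on OS X, errors appearing somewhere between 100 and 200 clients would be plausible. A minimal POSIX sketch of what raising `ulimit -n` means at the process level follows. An unprivileged process may raise its soft RLIMIT_NOFILE up to the hard limit. `RaiseFdSoftLimit` is a hypothetical helper, and macOS may additionally reject values above its own per-process cap.

```cpp
#include <sys/resource.h>  // getrlimit(), setrlimit(), RLIMIT_NOFILE
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical helper: raise the soft open-file limit (what `ulimit -n`
// shows) to `desired`, clamped to the hard limit. It never lowers the limit.
// Returns the resulting soft limit, or std::nullopt if a syscall fails.
std::optional<rlim_t> RaiseFdSoftLimit(rlim_t desired) {
  assert(desired > 0);  // a limit of 0 would make every open() fail
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return std::nullopt;
  if (rl.rlim_cur >= desired) return rl.rlim_cur;  // already high enough
  rl.rlim_cur = std::min(desired, rl.rlim_max);
  if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return std::nullopt;
  return rl.rlim_cur;
}
```

The shell equivalent is running `ulimit -n 2048` in the shells that launch the server and the Node.js client before starting them, since the limit is inherited by child processes.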
