
could not restore from backend restart #615

Closed
Thinkfly opened this issue Jun 15, 2016 · 9 comments

@Thinkfly

Thinkfly commented Jun 15, 2016

I use nghttpx 1.11.1 as a gRPC proxy. Sometimes when I restart a backend, nghttpx does not recover automatically; I have to restart nghttpx and then it works again. How should I configure nghttpx to solve this? Thanks.

My configuration is as follows:
backend=127.0.0.1,23100;/test.TestService/;proto=h2;no-tls;fall=1;rise=1
backend=192.168.0.18,23100;/test.TestService/;proto=h2;no-tls;fall=1;rise=1

The log is below:

15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_client_handler.cc:853 [CLIENT_HANDLER:0x7f1ac1045000] Downstream address group_idx: 5
15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_client_handler.cc:882 [CLIENT_HANDLER:0x7f1ac1045000] No working downstream address found
15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_downstream.cc:542 [DOWNSTREAM:0x7f1ac10b0300] dconn_ is NULL
15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_http2_upstream.cc:62 [UPSTREAM:0x7f1ac1049140] Stream stream_id=2741 is being closed
15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_downstream.cc:160 [DOWNSTREAM:0x7f1ac10b0300] Deleting
15/Jun/2016:13:01:40 +0800 PID21693 [INFO] shrpx_downstream.cc:190 [DOWNSTREAM:0x7f1ac10b0300] Deleted

@tatsuhiro-t
Member

Thank you for reporting this issue.
Fix committed via cddb411

@Thinkfly
Author

Thinkfly commented Jun 17, 2016

@tatsuhiro-t, thank you for the help. But I tried commit cddb411, and I have 2 backends; when I restart all of the backends one by one, I still get 503. Is "fall=1;rise=1" a best practice? And how can I configure nghttpx for backend HA?

@tatsuhiro-t
Member

It works for me. I tested with 2 nghttpd instances as backends. Could you tell us the exact reproduction steps?

fall/rise is a recent addition, and I'm not sure what the best practice is. haproxy has a similar feature, so perhaps we can follow their BCP?

Specifying multiple --backend options is the answer for HA. I'm wondering why your case does not work.
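
For reference, a minimal sketch of such a configuration, reusing the backend lines quoted above; the notes on fall/rise reflect the documented semantics of those parameters and are not taken from this thread:

# fall=1: mark the backend offline after 1 consecutive failed connection attempt
# rise=1: mark it online again after 1 successful health check
backend=127.0.0.1,23100;/test.TestService/;proto=h2;no-tls;fall=1;rise=1
backend=192.168.0.18,23100;/test.TestService/;proto=h2;no-tls;fall=1;rise=1

With two (or more) such lines in the same group, nghttpx distributes requests across the backends and routes around any that are currently marked offline.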

@Thinkfly
Author

I restart the backends one by one and repeat this; on the second round of restarts, it happens. But now, when I delay a few seconds between restarts, it does not happen again. Maybe I restarted them too quickly before. Thanks.

@tatsuhiro-t
Member

If there is a moment when all backend servers are down, and a request comes in at that particular moment, nghttpx may return 503 since there are no working servers. Note that nghttpx takes some time to detect that a backend server has come back online.

@Thinkfly
Author

How long does it take to detect that the backend is back online?

@tatsuhiro-t
Member

nghttpx uses exponential backoff, and if it has reached the maximum (failed to connect to a backend server 10 times in a row), the health check interval is ~130 seconds. I'm fine with adding a new configuration option to cap the maximum health check interval.
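
As a rough illustration of that figure: assuming a doubling backoff whose base interval is inferred by working backwards from the ~130 seconds above (roughly 125 ms; this is an inference, not a value read from the nghttpx source), the interval grows like this:

# Hypothetical illustration only; the exact constants inside nghttpx may differ.
BASE = 0.125  # assumed base interval in seconds, chosen so 10 failures give ~130 s

for failures in range(1, 11):
    interval = BASE * 2 ** failures
    print(f"after {failures:2d} consecutive failures: retry in ~{interval:.1f} s")

The last line printed is about 128 s, which matches the ~130 seconds mentioned above.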

@tatsuhiro-t
Member

Added --backend-max-backoff option:

  --backend-max-backoff=<DURATION>
              Specify  maximum backoff  interval.  This  is used  when
              doing health  check against offline backend  (see "fall"
              parameter  in --backend  option).   It is  also used  to
              limit  the  maximum   interval  to  temporarily  disable
              backend  when nghttpx  failed to  connect to  it.  These
              intervals are calculated  using exponential backoff, and
              consecutive failed attempts increase the interval.  This
              option caps its maximum value.
              Default: 2m
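
For example, to cap the backoff at 10 seconds (an illustrative value, not a recommendation), the option can be set in the configuration file in the usual long-option-without-the-leading-dashes form:

backend-max-backoff=10s

or passed on the command line as --backend-max-backoff=10s.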

@tatsuhiro-t
Member

Closing since the originator reported that the issue was fixed.
We now also offer an option to cap the maximum health check interval, which makes this issue less likely to happen.
