Permalink
Browse files

ClientConnect retries should compare with actual elapsed time

Summary:
"retries" is supposed to roughly represend the number of seconds hh_client will
reattempt connections to a broken/busy/missing server.

In many scenarios, it doesn't do this at all.

For example, consider SMUtils.Monitor_socket_not_ready, which will most
commonly be triggered with
"hh_server -d my_repo && hh_client my_repo"

Since daemonizing finishes very quickly (and many hundreds of milliseconds
before the actual daemonized server starts listening on the socket), the
hh_client will get Monitor_socket_not_ready as long as the Monitor is
running but hasn't yet started listening in on the socket.

In this case, ClientConnect will retry "connect" immediately and rapidly
consume all "retries" within milliseconds and fail instead of retrying
for up to #"retries" seconds.

This fixes that.

We switch to using real elapsed time so "retries" really resembles
number of seconds we will retry for.

Add a short "sleepf 0.1" between some retries so we don't peg the
CPU at 100%. (note: a number of the recursive retries already have a time
delay built in, such as with "Unix.select" so don't need a sleep).

Reviewed By: ljw1004

Differential Revision: D6983696
  • Loading branch information...
alexchow authored and fredemmott committed Feb 14, 2018
1 parent 1ec4511 commit 65a0e9da47f7e0e7248c26af5dfbe98eb96c2d9f
Showing with 10 additions and 6 deletions.
  1. +10 −6 hphp/hack/src/client/clientConnect.ml
@@ -139,8 +139,9 @@ let rec wait_for_server_hello
~progress_callback
~start_time
~tail_env =
let elapsed_t = int_of_float (Unix.time () -. start_time) in
match retries with
| Some n when n < 0 ->
| Some n when elapsed_t > n ->
(if Option.is_some tail_env then
Printf.eprintf "\nError: Ran out of retries, giving up!\n");
raise Exit_status.(Exit_with Out_of_retries)
@@ -153,7 +154,7 @@ let rec wait_for_server_hello
(fun t -> print_wait_msg progress_callback start_time t);
wait_for_server_hello
~ic
~retries:(Option.map retries (fun x -> x - 1))
~retries
~progress_callback
~start_time
~tail_env
@@ -167,7 +168,7 @@ let rec wait_for_server_hello
| _ ->
Option.iter tail_env
(fun t -> print_wait_msg progress_callback start_time t);
wait_for_server_hello ic (Option.map retries (fun x -> x - 1))
wait_for_server_hello ic retries
progress_callback start_time tail_env
)
with
@@ -176,8 +177,9 @@ let rec wait_for_server_hello
raise Server_hung_up
let rec connect ?(first_attempt=false) env retries start_time tail_env =
let elapsed_t = int_of_float (Unix.time () -. start_time) in
match retries with
| Some n when n < 0 ->
| Some n when elapsed_t > n ->
Printf.eprintf "\nError: Ran out of retries, giving up!\n";
raise Exit_status.(Exit_with Out_of_retries)
| Some _
@@ -232,7 +234,8 @@ let rec connect ?(first_attempt=false) env retries start_time tail_env =
match e with
| SMUtils.Server_died
| SMUtils.Monitor_connection_failure ->
connect env (Option.map retries (fun x -> x - 1)) start_time tail_env
Unix.sleepf 0.1;
connect env retries start_time tail_env
| SMUtils.Server_missing ->
if env.autostart then begin
ClientStart.start_server { ClientStart.
@@ -263,7 +266,8 @@ let rec connect ?(first_attempt=false) env retries start_time tail_env =
let _, tail_msg = open_and_get_tail_msg start_time tail_env in
env.progress_callback (Some tail_msg);
HackEventLogger.client_connect_once_busy start_time;
connect env (Option.map retries pred) start_time tail_env
Unix.sleepf 0.1;
connect env retries start_time tail_env
| SMUtils.Monitor_establish_connection_timeout ->
(** This should only happen if the Monitor is being DDOSed or has
* wedged itself. To ameliorate inadvertent self DDOSing by hh_clients,

0 comments on commit 65a0e9d

Please sign in to comment.