Permalink
Browse files

Avoid multiple "Monitor_connection_failure" dialogs

Summary:
If we're not connected but the lockfile is present, then we'll try once a tick to reconnect. Normally this works in pretty short order. But sometimes (for reasons I don't know) the monitor gets stuck in a state where it just won't accept connections. The comments in serverMonitor suggest this is due to "DDOS" but I bet there are other circumstances.

Anyway, let's take that behavior as given.

There were two terrible user-consequences we observed:

(1) It pops up a dialog reporting the error message "hh_server: Monitor_connection_failed" as desired. A second later, it attempts to connect again, and dismisses the dialog, and then pops up another dialog with the same information. And so on for ever.

(2) Actually, due to a race condition, it didn't always dismiss the dialog. It only dismissed every other dialog. And so the dialogs kept piling up.

This diff fixes both diffs.

Bug (2) was due to the following race condition:

1. Imagine the previous error dialog was up; its handler's action (for when the user closes the dialog) is to clear the dialog flag in the Lost_server state
2. We call "do_lost_server". This calls "dismiss_ui" which sends a $/cancelRequest also clears the "dialog" field of Lost_env. Next it puts up an all new dialog in the "dialog" field, with its all new handler.
3. Nuclide handles $/cancelRequest and sends a response back to the server. This invokes the old handler. But the old handler's behavior was to clear out the "dialog" field, and even though that flag now belonged to the new dialog not the old one.

This diff fixes the bug by making a handler only clear out the "dialog" field if it's the right one.

The fix is actually implementation-specific, resting on the behavior of physical inequality (!=) of non-mutable things. It would be nice to write a proper fix. But I can't be bothered because it works, and because the fix for bug (1) renders this path unreachable.

Bug (1) is addressed as explained by the following comment:

   (* If we already display a progress indicator, and call do_lost_server     *)
   (* to display a progress indicator, then we won't do anything. Likewise,   *)
   (* if we already display an error dialog with ANY TEXT and Restart button, *)
   (* and call do_lost_server to display any other text in the error dialog,  *)
   (* it makes for a nicer UI to simply leave the old text up.                *)

Reviewed By: arxanas

Differential Revision: D6804293

fbshipit-source-id: 662294f06f9b784349560b02d15903268ab10c38
  • Loading branch information...
ljw1004 authored and hhvm-bot committed Jan 26, 2018
1 parent 76ad997 commit be1887a17e111992c679dcb2189c6e097cccfcd5
Showing with 24 additions and 2 deletions.
  1. +24 −2 hphp/hack/src/client/clientLsp.ml
@@ -1263,6 +1263,9 @@ let rec connect (state: state) : state =
(* running, or there's some other reason why the connection was refused *)
(* or timed-out and no lockfile is present; (2) the server was dormant *)
(* and had already received too many pending connection requests. *)
(* Exit_with Monitor_connection_failure: raised when the lockfile is *)
(* present but connection-attempt to the monitor times out - maybe it's *)
(* under DDOS, or maybe it's declining to answer new connections. *)
let stack = Printexc.get_backtrace () in
let (code, message, _data) = Lsp_fmt.get_error_info e in
let longMessage = Printf.sprintf "connect failed: %s [%i]\n%s" message code stack in
@@ -1326,8 +1329,16 @@ and do_lost_server (state: state) ?(allow_immediate_reconnect = true) (p: Lost_e
let open Lost_env in
let no_op = match p.explanation, state with
| Wait_required _, Lost_server { p = { explanation = Wait_required _; _ }; _ } -> true
| Wait_required _, Lost_server { progress; _ }
when progress <> Progress.None -> true
| Action_required _, Lost_server { actionRequired; _ }
when actionRequired <> ActionRequired.None -> true
| _ -> false in
(* If we already display a progress indicator, and call do_lost_server *)
(* to display a progress indicator, then we won't do anything. Likewise, *)
(* if we already display an error dialog with ANY TEXT and Restart button, *)
(* and call do_lost_server to display any other text in the error dialog, *)
(* it makes for a nicer UI to simply leave the old text up. *)
if no_op then state else
let state = dismiss_ui state in
@@ -1342,9 +1353,19 @@ and do_lost_server (state: state) ?(allow_immediate_reconnect = true) (p: Lost_e
in
(* These helper functions are for the dialog *)
let dialog_ref : ShowMessageRequest.t ref = ref ShowMessageRequest.None in
let clear_dialog_flag (state: state) : state =
match state with
| Lost_server lenv -> Lost_server { lenv with dialog = ShowMessageRequest.None; }
| Lost_server lenv ->
(* TODO(ljw): The following != test is "implementation-specific". *)
(* Goal is so that if we had one dialog up, then dismiss_ui it which *)
(* sends $/cancelRequest, then put up another dialog, then the editor *)
(* sends back a RequestCancelled error in response to the first dialog,*)
(* we don't want to clear the flag that's now there for the second *)
(* dialog. This != test achieves that. But ocaml only guarantees *)
(* behavior of != on mutable things, which ours is not... *)
if lenv.dialog != !dialog_ref then Lost_server lenv
else Lost_server { lenv with dialog = ShowMessageRequest.None; }
| _ -> state
in
let handle_error ~code:_ ~message:_ ~data:_ state =
@@ -1390,6 +1411,7 @@ and do_lost_server (state: state) ?(allow_immediate_reconnect = true) (p: Lost_e
MessageType.ErrorMessage msg ["Restart"] in
Progress.None, actionRequired, dialog
in
dialog_ref := dialog;
Lost_server { Lost_env.
p;
uris_with_unsaved_changes;

0 comments on commit be1887a

Please sign in to comment.