Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starting a standalone Elixir runtime fails on Windows 10 #194

Closed
juhalehtonen opened this issue Apr 15, 2021 · 23 comments · Fixed by #201
Closed

Starting a standalone Elixir runtime fails on Windows 10 #194

juhalehtonen opened this issue Apr 15, 2021 · 23 comments · Fixed by #201
Labels
bug Something isn't working

Comments

@juhalehtonen
Copy link
Contributor

juhalehtonen commented Apr 15, 2021

As stated in the title, starting a standalone Elixir runtime / node seems to fail on Windows 10. When attempting to click on the "Connect" button in the Liveview UI, the following error comes up in the terminal:

iex(livebook_j35nq3mw@2080-OMG)1> [error] GenServer #PID<0.461.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}
    (livebook 0.1.0) lib/livebook_web/live/session_live/elixir_standalone_live.ex:42: LivebookWeb.SessionLive.ElixirStandaloneLive.handle_event/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:338: anonymous fn/3 in Phoenix.LiveView.Channel.view_handle_event/3
    (telemetry 0.4.2) c:/Users/juha/projects/elixir/livebook/deps/telemetry/src/telemetry.erl:262: :telemetry.span/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:203: Phoenix.LiveView.Channel.handle_info/2
    (stdlib 3.8) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.8) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.8) proc_lib.erl:259: :proc_lib.wake_up/3
Last message: %Phoenix.Socket.Message{event: "event", join_ref: "32", payload: %{"event" => "init", "type" => "click", "value" => %{"value" => ""}}, ref: "34", topic: "lv:elixir_standalone_runtime"}
State: %{components: {%{}, %{}, 1}, join_ref: "32", serializer: Phoenix.Socket.V2.JSONSerializer, socket: #Phoenix.LiveView.Socket<assigns: %{current_runtime: nil, flash: %{}, live_action: nil, output: nil, session_id: "iamnw6urrqdqeugvmnbby44m3rvl2rr6"}, changed: %{}, endpoint: LivebookWeb.Endpoint, id: "elixir_standalone_runtime", parent_pid: #PID<0.456.0>, root_pid: #PID<0.456.0>, router: LivebookWeb.Router, view: LivebookWeb.SessionLive.ElixirStandaloneLive, ...>, topic: "lv:elixir_standalone_runtime", transport_pid: #PID<0.449.0>, upload_names: %{}, upload_pids: %{}}

The same behaviour can be encountered when calling Livebook.Runtime.ElixirStandalone.init() directly via iex. The process managing the startup of the node seems to immediately return with :DOWN, so something seems to be going wrong in the startup there.

What does work for me on my Windows 10 system is manually starting an "attached node" via iex --sname test.

For the sake of further debugging, I've called IO.inspect([elixir_path: elixir_path, node_name: node_name, eval: eval]) within the Livebook.Runtime.ElixirStandalone.start_elixir_node function when clicking the connect button. The results, as seen in the Windows 10 command prompt, are as follows:

[
  elixir_path: "c:/Program Files (x86)/Elixir/bin/elixir.bat",
  node_name: :"livebook_runtime_3ouxkwak@2OMG",
  eval: "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
]

Perhaps the issue is caused by a Windows security policy that prevents certain things from being run here? I don't normally use Windows as my development environment, so I might be way off with my reasoning here.

@josevalim
Copy link
Contributor

@juhalehtonen thanks for reporting! What happens if you do this inside IEx on Windows?

iex> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])

@juhalehtonen
Copy link
Contributor Author

What happens if you do this inside IEx on Windows?

iex> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])
C:\Users\juha>iex
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> System.cmd(System.find_executable("elixir"), ["-e", "IO.inspect :HELLO"])
{":HELLO\n", 0}

@juhalehtonen
Copy link
Contributor Author

Just for sake of sanity, I tested the same scenario under WSL2. Starting the standalone runtime worked perfectly there.

@josevalim
Copy link
Contributor

josevalim commented Apr 15, 2021

@juhalehtonen what about this?

iex> eval = "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
iex> System.cmd(System.find_executable("elixir"), ["-e", eval])

If that also works, can you please put an IO.inspect here? https://github.com/elixir-nx/livebook/blob/main/lib/livebook/runtime/elixir_standalone.ex#L56 - then copy the args and pass that to System.cmd?

@juhalehtonen
Copy link
Contributor Author

@juhalehtonen what about this?

This seems to return just:

iex(livebook_zghjge7f@2080-OMG)3> eval = "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
"(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_ek5c2dts, :\"livebook_j35nq3mw@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"
iex(livebook_zghjge7f@2080-OMG)4> System.cmd(System.find_executable("elixir"), ["-e", eval])
{"", 0}

can you please put an IO.inspect here

The existing Port.open does seem to return a Port when piped through IO.inspect:

iex(livebook_zghjge7f@2080-OMG)10> #Port<0.24>
iex(livebook_zghjge7f@2080-OMG)10> [error] GenServer #PID<0.701.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}

then copy the args and pass that to System.cmd

What I did here was:

System.cmd(System.find_executable("elixir"), elixir_flags(node_name) ++ ["-e", eval]) |> IO.inspect()

Assuming I got the call right, the System.cmd variant seems to be returning the following:

iex(livebook_zghjge7f@2080-OMG)11> {"", 255}
iex(livebook_zghjge7f@2080-OMG)11> [error] GenServer #PID<0.761.0> terminating
** (ArgumentError) argument error
    :erlang.monitor(:port, {"", 255})
    (livebook 0.1.0) lib/livebook/runtime/standalone_init.ex:94: Livebook.Runtime.StandaloneInit.parent_init_sequence/3

@josevalim
Copy link
Contributor

Thanks a lot!

So if it is not the eval, one of the elixir_flags(node_name) is causing it to fail. Can you inspect the flags in particular? Maybe this:

System.cmd(System.find_executable("elixir"), IO.inspect(elixir_flags(node_name) ++ ["-e", eval]))

And then I would try calling the executable multiple times, removing the flags, until we find the problematic one. :)

@juhalehtonen
Copy link
Contributor Author

Thanks José! I tried fiddling with the different flags for a while, but no change in the flags passed seemed to make a difference. I'll get back to this more properly once I find the time, as Windows isn't quite my go-to :)

I started to wonder if this has something to do with the Windows-related issues brought up in https://elixirforum.com/t/port-open-eacces-on-windows/5442 , although they don't quite seem to match either.

@josevalim
Copy link
Contributor

@juhalehtonen thank you for the help so far!

I tried fiddling with the different flags for a while, but no change in the flags passed seemed to make a difference

Do you mean they all succeed? FWIW, System.cmd simply calls spawn_executable too, so I don't think the issue is with ports... but maybe we are missing some flags.

@josevalim
Copy link
Contributor

@juhalehtonen can you try removing the nouse_stdio flag and see if you have better luck?

@josevalim josevalim added the bug Something isn't working label Apr 15, 2021
@juhalehtonen
Copy link
Contributor Author

can you try removing the nouse_stdio flag and see if you have better luck?

I tried removing the nouse_stdio flag, but that didn't make a difference.

I also went through Livebook.Runtime.StandaloneInit.elixir_flags/1 and changed/removed flags one at a time while recompiling in between, but no luck - the original issue persists.

I also tried to use the cmd.exe /c path syntax as shown on a couple of Elixirforum threads (such as the one I linked), but didn't find a winning combination through that either.

@josevalim
Copy link
Contributor

Thanks for the report! This is crazy. :D Just to be clear, can you please try this:

    Port.open({:spawn_executable, elixir_path}, [
      :binary,
      :nouse_stdio,
      :hide,
      args: elixir_flags(node_name) ++ ["--eval", eval]
    ] |> IO.inspect(limit: :infinity))

That will inspect all options. Then go to iex and call the same as above, does it work?

And then if it works, please call Process.monitor(port) and then flush() on IEx. Thank you!

If you want to do a pairing session, let me know and I am definitely game. :)

@juhalehtonen
Copy link
Contributor Author

I replaced Livebook.Runtime.ElixirStandalone.start_elixir_node/3 with yours:

  defp start_elixir_node(elixir_path, node_name, eval) do
    # Here we create a port to start the system process in a non-blocking way.
    Port.open({:spawn_executable, elixir_path}, [
      :binary,
      :nouse_stdio,
      :hide,
      args: elixir_flags(node_name) ++ ["--eval", eval]
    ] |> IO.inspect(limit: :infinity))
  end

and then tried starting the runtime again. Note that here I had reverted any previous changes, using the latest from master sans the function change above:

iex(livebook_lbxd4ah4@2080-OMG)5> Livebook.Runtime.ElixirStandalone.init()
[
  :binary,
  :nouse_stdio,
  :hide,
  {:args,
   ["--sname", "livebook_runtime_mz6gwvii@2080-OMG", "--erl",
    "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
    "--hidden", "--cookie", "OXMOCVLXFIYDLMVPVMYE", "--eval",
    "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_fruxc6ha, :\"livebook_lbxd4ah4@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"]}
]
{:error, "Elixir process terminated unexpectedly"}

Removing the :nouse_stdio flag, for example, results in the same outcome (this time called via the UI):

iex(livebook_lbxd4ah4@2080-OMG)4> [
  :binary,
  :hide,
  {:args,
   ["--sname", "livebook_runtime_zhbwy2t5@2080-OMG", "--erl",
    "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
    "--hidden", "--cookie", "OXMOCVLXFIYDLMVPVMYE", "--eval",
    "(\n  init_ref = make_ref()\n  parent_process = {:livebook_parent_process_name_j2qly75k, :\"livebook_lbxd4ah4@2080-OMG\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n      manager_ref = Process.monitor(Livebook.Runtime.ErlDist.Manager)\n      receive do\n        {:DOWN, ^manager_ref, :process, _object, _reason} ->\n          :ok\n      end\n  after\n    10000 ->\n      :timeout\n  end\n)"]}
]
iex(livebook_lbxd4ah4@2080-OMG)4> 09:38:04.304 [error] GenServer #PID<0.518.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}

There has to be something very Windows-specific that's being missed here. Probably something very obvious-in-hindsight too :).

If you want to do a pairing session, let me know and I am definitely game. :)

I'm open to having a look at this together as well.

@josevalim
Copy link
Contributor

I am sending you an email. :D

@josevalim
Copy link
Contributor

Here is a minimal example that isolates that invocation and we can run on IEx without Livebook:

$ iex --sname foo

then

Process.register(self(), :hello)

port = Port.open(
  {:spawn_executable, System.find_executable("elixir")},
  [
    :binary,
    :nouse_stdio,
    :hide,
    {:args,
     ["--sname", "prefix-#{node()}", "--erl",
      "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
      "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n  init_ref = make_ref()\n  parent_process = {:hello, :\"#{node()}\"}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)"]}
  ]
)

Port.monitor(port)
flush()

josevalim added a commit that referenced this issue Apr 16, 2021
1. Do not use nouse_stdio as it causes slowdowns when IEx
   is also running

2. Reduce the amount of generated random atoms by using the
   child_node as the name of the parent process

3. Do not pass quoted strings to Windows to eval, use argv
   instead

Closes #194.
@josevalim
Copy link
Contributor

@juhalehtonen, thank you so much for the pairing session! ❤️ 🙌 I have sent #201 that should address the issue, can you please try it out?

We found out the issue was due to the differences on how Unix and Windows handle escaped strings. The solution I found was to pass the injected parameters as argv instead of "interpolating" them.

@juhalehtonen
Copy link
Contributor Author

Thanks José! Unfortunately the underlying issue still seems to persist:

# For sake of sanity, pointing out correct branch

C:\Users\juha\projects\elixir\livebook (jv-windows -> origin)
λ iex -S mix phx.server
Compiling 4 files (.ex)
[Livebook] Application running at http://localhost:8080/?token=ipfloeqraju4wapyji2m6rzz7bszp3o6
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)

# Triggered via UI

iex(livebook_kalmhskk@2080-OMG)1> 11:11:36.895 [error] GenServer #PID<0.521.0> terminating
** (MatchError) no match of right hand side value: {:error, "Elixir process terminated unexpectedly"}
    (livebook 0.1.0) lib/livebook_web/live/session_live/elixir_standalone_live.ex:42: LivebookWeb.SessionLive.ElixirStandaloneLive.handle_event/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:338: anonymous fn/3 in Phoenix.LiveView.Channel.view_handle_event/3
    (telemetry 0.4.2) c:/Users/juha/projects/elixir/livebook/deps/telemetry/src/telemetry.erl:262: :telemetry.span/3
    (phoenix_live_view 0.15.4) lib/phoenix_live_view/channel.ex:203: Phoenix.LiveView.Channel.handle_info/2
    (stdlib 3.8) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.8) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.8) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: %Phoenix.Socket.Message{event: "event", join_ref: "18", payload: %{"event" => "init", "type" => "click", "value" => %{"value" => ""}}, ref: "19", topic: "lv:elixir_standalone_runtime"}

# Triggered via iex

iex(livebook_kalmhskk@2080-OMG)1> Livebook.Runtime.ElixirStandalone.init()
{:error, "Elixir process terminated unexpectedly"}

@josevalim
Copy link
Contributor

Oh noes, can you do another pairing session later on?

@josevalim
Copy link
Contributor

In any case, can you try inspecting System.argv() here?

https://github.com/elixir-nx/livebook/pull/201/files#diff-17dab55a55952f1aff5b2e07aee22b5da046af9c365ecaaf88cbbdf7be943f31R128

Then call Livebook.Runtime.ElixirStandalone.init() and flush(). :)

@josevalim
Copy link
Contributor

josevalim commented Apr 16, 2021

And here is an updated version that isolates it:

Process.register(self(), :"hello-#{node()}")

port = Port.open(
  {:spawn_executable, System.find_executable("elixir")},
  [
    :binary,
    :hide,
    {:args,
     ["--sname", "hello-#{node()}", "--erl",
      "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
      "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n  [parent_node] = System.argv()\n  init_ref = make_ref()\n  parent_process = {node(), String.to_atom(parent_node)}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)",
      "--",
      Atom.to_string(node())]}
  ]
)

Port.monitor(port)
flush()

@juhalehtonen
Copy link
Contributor Author

And here is an updated version that isolates it:

Running the isolated code gives us :DOWN with a state of :noproc.

C:\Users\juha
λ iex --sname foobar
Interactive Elixir (1.11.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(foobar@2080-OMG)1> Process.register(self(), :"hello-#{node()}")
true
iex(foobar@2080-OMG)2> port = Port.open(
  {:spawn_exec...(foobar@2080-OMG)2> utable, System.find_executable("elixir")},
  [
...(foobar@2080-OMG)2>  ...(foobar@2080-OMG)2>    :binary,
    ...(foobar@2080-OMG)2> :hide,
    {:arg...(foobar@2080-OMG)2> s,
     ["--snam...(foobar@2080-OMG)2> e", "hello-#{node()}", "--erl",
...(foobar@2080-OMG)2>       "+sbwt none +sbwtdcpu none +sbwtdio none -elixir ansi_enabled true",
     ...(foobar@2080-OMG)2>  "--hidden", "--cookie", Atom.to_string(Node.get_cookie()), "--eval",
      "(\n...(foobar@2080-OMG)2>   [parent_node] = System.argv()\n  init_ref = make_ref()\n  parent_process = {node(), String.to_atom(parent_node)}\n  send(parent_process, {:node_started, init_ref, node(), self()})\n
  receive do\n    {:node_initialized, ^init_ref} ->\n    :ok\n  after\n    10000 -> :timeout\n  end\n)",
...(foobar@2080-OMG)2>       "--",
...(foobar@2080-OMG)2>       Atom.to_string(node())]}
  ]
...(foobar@2080-OMG)2> )...(foobar@2080-OMG)2>
#Port<0.7>
iex(foobar@2080-OMG)3> Port.monitor(port)
#Reference<0.3842356013.1485570050.111182>
iex(foobar@2080-OMG)4> flush()
{:DOWN, #Reference<0.3842356013.1485570050.111182>, :port, #Port<0.7>, :noproc}
:ok
iex(foobar@2080-OMG)5> flush()
:ok

can you do another pairing session later on?

I have some free time during the next two hours or so, if that works for you :)

@josevalim
Copy link
Contributor

@juhalehtonen sounds good, i will be waiting on the same place!

josevalim added a commit that referenced this issue Apr 16, 2021
1. Do not use nouse_stdio as it causes slowdowns when IEx
   is also running

2. Reduce the amount of generated random atoms by using the
   child_node as the name of the parent process

3. Do not pass quoted strings nor newlines to Windows to eval,
    use argv instead

Closes #194.
@josevalim
Copy link
Contributor

The second issue was around newlines but it is addressed now. Thank you @juhalehtonen <3 and Windows should be good to go @samaaron!

@howard0su
Copy link
Contributor

This issue seems back again:
F:\livebook>elixir --cookie "COOKIEFORTESTS" -S mix test
.................................................

  1. test Runtime.disconnect/1 makes the node terminate (Livebook.Runtime.ElixirStandaloneTest)
    test/livebook/runtime/elixir_standalone_test.exs:32
    match (=) failed
    code: assert {:ok, %{node: node} = runtime} = Runtime.ElixirStandalone.init()
    left: {:ok, %{node: node} = runtime}
    right: {:error, "Elixir process terminated unexpectedly"}
    stacktrace:
    test/livebook/runtime/elixir_standalone_test.exs:33: (test)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants