Could not establish connection to compute node (Could not resolve hostname error) #20

Closed
alzaia opened this issue Jul 27, 2022 · 3 comments


alzaia commented Jul 27, 2022

Hello, I am using the mila code command to establish a remote SSH connection with VSCode. It had worked without any problems for the past 3 weeks, but yesterday I started getting this error when running it.

The command I run from my terminal is:
mila code <path_to_my_cluster_code> --alloc --cpus-per-task 4 --gres gpu:1 --time=0-3:00:00 --partition unkillable

The issue likely comes from mila code rather than from the cluster or ssh, because connecting works fine when I ssh directly without using mila code. I also tried setting up the SSH host manually in VSCode: I first got an salloc on the cluster, then opened VSCode and manually added the compute node I was given as a new host (e.g. ssh -J mila <username>@<nodename>). This works perfectly.
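
For anyone who wants to reproduce the manual workaround, here is a minimal Python sketch that renders an OpenSSH config entry roughly equivalent to ssh -J mila <username>@<nodename>. The helper name and the example username are hypothetical, and it assumes a "mila" host alias is already defined in ~/.ssh/config, as the command above implies. It is not part of mila code.

```python
def render_ssh_host_entry(node: str, username: str, jump_host: str = "mila") -> str:
    """Render an OpenSSH config stanza that reaches `node` through `jump_host`.

    Hypothetical helper for illustration only; not part of mila code.
    """
    lines = [
        f"Host {node}",
        f"    HostName {node}",
        f"    User {username}",
        # ProxyJump is the config-file equivalent of the `ssh -J` option.
        f"    ProxyJump {jump_host}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "cn-c017" is the node seen in the log; "someuser" is a placeholder.
    print(render_ssh_host_entry("cn-c017", "someuser"))
```

The printed stanza can be appended to ~/.ssh/config so that VSCode lists the node as a host. It has to be regenerated for each new allocation, since the node name changes.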

The connection fails with the following log on VSCode:

[17:05:45.046] Log Level: 2
[17:05:45.046] remote-ssh@0.84.0
[17:05:45.047] darwin arm64
[17:05:45.055] SSH Resolver called for "ssh-remote+cn-c017.server.mila.quebec", attempt 1
[17:05:45.055] "remote.SSH.useLocalServer": true
[17:05:45.055] "remote.SSH.path": undefined
[17:05:45.055] "remote.SSH.configFile": undefined
[17:05:45.056] "remote.SSH.useFlock": true
[17:05:45.056] "remote.SSH.lockfilesInTmp": false
[17:05:45.056] "remote.SSH.localServerDownload": auto
[17:05:45.056] "remote.SSH.remoteServerListenOnSocket": false
[17:05:45.056] "remote.SSH.showLoginTerminal": false
[17:05:45.056] "remote.SSH.defaultExtensions": []
[17:05:45.056] "remote.SSH.loglevel": 2
[17:05:45.056] "remote.SSH.enableDynamicForwarding": true
[17:05:45.056] "remote.SSH.enableRemoteCommand": false
[17:05:45.056] "remote.SSH.serverPickPortsFromRange": {}
[17:05:45.056] "remote.SSH.serverInstallPath": {}
[17:05:45.058] SSH Resolver called for host: cn-c017.server.mila.quebec
[17:05:45.058] Setting up SSH remote "cn-c017.server.mila.quebec"
[17:05:45.060] Acquiring local install lock: /var/folders/ml/kpt1qf1s2ngfsbzl5ycgclz00000gp/T/vscode-remote-ssh-feed4297-install.lock
[17:05:45.060] Looking for existing server data file at /Users/aldozaimi/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-feed4297-3b889b090b5ad5793f524b5d1d39fda662b96a2a-0.84.0/data.json
[17:05:45.060] Using commit id "3b889b090b5ad5793f524b5d1d39fda662b96a2a" and quality "stable" for server
[17:05:45.062] Install and start server if needed
[17:05:45.063] PATH: /opt/homebrew/Caskroom/miniforge/base/envs/simulation_venv/bin:/opt/homebrew/Caskroom/miniforge/base/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/Library/Frameworks/Python.framework/Versions/3.10/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/aldozaimi/.cargo/bin
[17:05:45.063] Checking ssh with "ssh -V"
[17:05:45.066] > OpenSSH_8.6p1, LibreSSL 3.3.5

[17:05:45.067] askpass server listening on /var/folders/ml/kpt1qf1s2ngfsbzl5ycgclz00000gp/T/vscode-ssh-askpass-135acdbd1f53ce3ecf04f0db01904fbfecec5929.sock
[17:05:45.068] Spawning local server with {"serverId":1,"ipcHandlePath":"/var/folders/ml/kpt1qf1s2ngfsbzl5ycgclz00000gp/T/vscode-ssh-askpass-d3660326b270bf9a6b665c435ec56bc6679ee114.sock","sshCommand":"ssh","sshArgs":["-v","-T","-D","51100","-o","ConnectTimeout=15","cn-c017.server.mila.quebec"],"serverDataFolderName":".vscode-server","dataFilePath":"/Users/aldozaimi/Library/Application Support/Code/User/globalStorage/ms-vscode-remote.remote-ssh/vscode-ssh-host-feed4297-3b889b090b5ad5793f524b5d1d39fda662b96a2a-0.84.0/data.json"}
[17:05:45.068] Local server env: {"SSH_AUTH_SOCK":"/private/tmp/com.apple.launchd.4M5703FS2F/Listeners","SHELL":"/bin/zsh","DISPLAY":"1","ELECTRON_RUN_AS_NODE":"1","SSH_ASKPASS":"/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/local-server/askpass.sh","VSCODE_SSH_ASKPASS_NODE":"/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper","VSCODE_SSH_ASKPASS_EXTRA_ARGS":"--ms-enable-electron-run-as-node","VSCODE_SSH_ASKPASS_MAIN":"/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/askpass-main.js","VSCODE_SSH_ASKPASS_HANDLE":"/var/folders/ml/kpt1qf1s2ngfsbzl5ycgclz00000gp/T/vscode-ssh-askpass-135acdbd1f53ce3ecf04f0db01904fbfecec5929.sock"}
[17:05:45.068] Spawned 65717
[17:05:45.134] > local-server-1> Spawned ssh, pid=65724
[17:05:45.136] stderr> OpenSSH_8.6p1, LibreSSL 3.3.5
[17:05:45.141] stderr> OpenSSH_8.6p1, LibreSSL 3.3.5
[17:05:45.154] stderr> kex_exchange_identification: Connection closed by remote host
[17:05:45.154] stderr> Connection closed by 172.16.2.25 port 2222
[17:05:45.154] stderr> kex_exchange_identification: Connection closed by remote host
[17:05:45.154] stderr> Connection closed by UNKNOWN port 65535
[17:05:45.155] > local-server-1> ssh child died, shutting down
[17:05:45.156] Local server exit: 0
[17:05:45.156] Received install output: local-server-1> Spawned ssh, pid=65724
OpenSSH_8.6p1, LibreSSL 3.3.5
OpenSSH_8.6p1, LibreSSL 3.3.5
kex_exchange_identification: Connection closed by remote host
Connection closed by 172.16.2.25 port 2222
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
local-server-1> ssh child died, shutting down

[17:05:45.157] Failed to parse remote port from server output
[17:05:45.157] Resolver error: Error: 
	at Function.Create (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:585222)
	at Object.t.handleInstallOutput (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:583874)
	at Object.e [as tryInstallWithLocalServer] (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:624373)
	at processTicksAndRejections (node:internal/process/task_queues:96:5)
	at async /Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:643506
	at async Object.t.withShowDetailsEvent (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:647224)
	at async /Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:622845
	at async T (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:619351)
	at async Object.t.resolveWithLocalServer (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:622460)
	at async Object.t.resolve (/Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:644834)
	at async /Users/aldozaimi/.vscode/extensions/ms-vscode-remote.remote-ssh-0.84.0/out/extension.js:1:727082
[17:05:45.160] ------

Any ideas on how to resolve this?

breuleux (Member) commented

I tried to run that same command and I am not getting any error.

Is it failing systematically? Does it only happen on a particular node? If you get the allocation manually and then try to connect with mila code PATH --node NODE, does it work? What about code -nw --remote ssh-remote+NODE.server.mila.quebec PATH, which is the command mila code runs after the allocation?
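
To make that last check easy to repeat, here is a small Python sketch that assembles the code command quoted above from a node name and a path. The function name is hypothetical and this is not how mila code is implemented internally; it only reproduces the command line as written in this comment.

```python
import shlex


def build_code_command(node: str, path: str) -> list[str]:
    """Build argv for: code -nw --remote ssh-remote+NODE.server.mila.quebec PATH

    Hypothetical helper; it only mirrors the command quoted in this thread.
    """
    if not node:
        # An empty node would yield a host like ".server.mila.quebec".
        raise ValueError("node name must not be empty")
    remote = f"ssh-remote+{node}.server.mila.quebec"
    return ["code", "-nw", "--remote", remote, path]


if __name__ == "__main__":
    # Example values: the node comes from the log above, the path is a placeholder.
    argv = build_code_command("cn-c017", "/path/to/project")
    print(shlex.join(argv))
```

Printing the command before running it makes it easy to see whether the node name that ends up in the remote host is the one the allocation actually returned.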

I looked up the error in the log and found this issue: microsoft/vscode-remote-release#5111. There are a few suggestions there. One is to change the login shell, which seems doubtful to me. Another is to remove $HOME/.vscode-server on the remote, which could make sense if, for example, it contains inconsistent state due to some issue or race condition.


semihcanturk commented Aug 22, 2022

Hi, I'm having the same issue, both on 0.0.9 and on the latest master. It is not specific to any node. Basically, it can't resolve the following remote, which doesn't seem correct:
Unable to resolve resource vscode-remote://ssh-remote%2Bcn.server.mila.quebec/home/mila/s/semih.canturk/scratch

Yes, mila code PATH --node NODE works after a manual salloc, since this seems to provide the correct remote hostname.
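
As a quick way to spot this symptom, here is a Python sketch that decodes the host from a vscode-remote URI and checks whether it looks like a full node name. The node-name pattern is an assumption inferred only from "cn-c017" in the log earlier in this thread, and the function names are hypothetical.

```python
import re
from urllib.parse import unquote, urlsplit

# Assumption: node names look like "cn-c017", the only example in this thread.
NODE_HOST = re.compile(r"^cn-[a-z]\d+\.server\.mila\.quebec$")
PREFIX = "ssh-remote+"


def remote_host(uri: str) -> str:
    """Return the SSH host encoded in a vscode-remote:// URI."""
    # The authority is percent-encoded, e.g. "ssh-remote%2B<host>".
    authority = unquote(urlsplit(uri).netloc)
    if not authority.startswith(PREFIX):
        raise ValueError(f"not an ssh-remote URI: {uri!r}")
    return authority[len(PREFIX):]


def has_full_node_name(uri: str) -> bool:
    """True if the host matches the assumed cn-<letter><digits> pattern."""
    return NODE_HOST.match(remote_host(uri)) is not None


if __name__ == "__main__":
    failing = (
        "vscode-remote://ssh-remote%2Bcn.server.mila.quebec"
        "/home/mila/s/semih.canturk/scratch"
    )
    print(remote_host(failing), has_full_node_name(failing))
```

With the URI from the error above, the decoded host is cn.server.mila.quebec, which has no node suffix after "cn". That matches the observation that the remote does not look correct.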

breuleux (Member) commented Sep 1, 2022

This should be fixed as of 0.0.10.

breuleux closed this as completed Sep 1, 2022