Appears ERLIDE can get stuck in a loop when the Linux system's network is not properly configured #78
Comments
Thanks a lot for this report! You are correct that network misconfiguration can cause erlide to basically hang Eclipse. I've made improvements in recent weeks, but it's difficult to cover all possible ways to misconfigure the network for all OSs and situations. Besides, "fixing" it for one setting creates problems in another, and fixing that one disturbs yet another, and so on... I think the current solution only needs a way to communicate the situation to the user and prompt them to fix it, instead of continuing to retry the connection. In the near future we will deploy a version that doesn't use an Erlang backend for everything, so this kind of problem will disappear; this is also why I don't want to spend too much time trying to fix the current situation. But of course hangs can't be tolerated, so I will look at it again. |
I have restricted the looping retries when we can't connect to the internal backend and added a message about what might be wrong.
|
The latest patch throws an error screen, even though I can create two nodes with erl -name node1 and erl -name node2 and they can ping each other. On the network screen, the values Java sees for the long and short names are identical. Should they be different? I ask because at the top of the screen the short name is blah-desktop while the long name is blah-desktop.domain.com |
Aaah, I think I will go crazy... It's so frustrating when I can't reproduce these things myself... What is the name seen by java? Is blah-desktop.domain.com in your hosts file and does it point to the correct IP address? |
I fixed the issue. Basically, blah-desktop.domain.com was the FQDN that Erlide calculated, but neither the DNS server nor my /etc/hosts knew about it. /etc/hostname and /etc/hosts only have blah-desktop. I am running Erlide in a VM on a corporate network, and when I tried to ping blah-desktop.domain.com from the shell, "unknown host" was the result. Once I added "127.0.0.1 blah-desktop.domain.com" to /etc/hosts, I could start Erlide with no errors and run configurations with a long name. Before I added blah-desktop.domain.com to /etc/hosts, I was still able to start two nodes with -name and have them ping each other. So it seems that maybe the Java code needs the FQDN to be reachable. Or the readme needs to say that the FQDN needs to be reachable. |
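For anyone who wants to run the same check without pinging by hand, here is a minimal Python sketch. It assumes that the name Python reports via `socket.getfqdn()` is close to what Erlide computes, which may not hold on every system; `check_fqdn` and its `resolve` parameter are hypothetical names made up for this illustration, not anything from Erlide.

```python
import socket
from typing import Callable, Optional, Tuple


def check_fqdn(fqdn: Optional[str] = None,
               resolve: Callable[[str], str] = socket.gethostbyname
               ) -> Tuple[str, Optional[str]]:
    """Return (name, ip); ip is None when the name cannot be resolved."""
    # Assumption: fall back to the FQDN as Python sees it, which may differ
    # from the value Java or Erlang would report.
    name = fqdn or socket.getfqdn()
    try:
        return name, resolve(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return name, None


if __name__ == "__main__":
    name, ip = check_fqdn()
    if ip is None:
        print(f"{name} does not resolve; add it to /etc/hosts or DNS")
    else:
        print(f"{name} resolves to {ip}")
```

Passing the name explicitly, for example `check_fqdn("blah-desktop.domain.com")`, tests the exact value shown on the Network preference page.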
In the latest releases, did you switch erlide from using -sname to -name by any chance? |
I'm glad you sorted it out! Yes, we try with long names first and if that fails, we try with short names. This is because it looks (from my empirical experience = the bug reports I got) like long names have a better chance of working. But by "fails" above, I mean that it's not possible to start Erlang nodes with long names at all. I should add something even for the case when no connection can be made... It's quite a complicated state machine! It is actually Erlang that uses blah-desktop.domain.com; we now start an Erlang node and read the value it uses for the hostname. I will try to add some more helpful text to the docs. Thanks! |
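The "long names first, then short names" order described above can be sketched roughly as follows. This is only an illustration of the fallback order, not Erlide's actual state machine; `pick_name_mode` and the `can_start` callback are hypothetical, and the callback stands in for whatever really tries to start an Erlang node.

```python
from typing import Callable, Optional, Tuple


def pick_name_mode(long_host: str, short_host: str,
                   can_start: Callable[[str, str], bool]
                   ) -> Optional[Tuple[str, str]]:
    """Try long names (-name) first, then short names (-sname).

    Returns the first (flag, host) pair for which can_start returns True,
    or None when neither mode works.
    """
    for flag, host in (("-name", long_host), ("-sname", short_host)):
        if can_start(flag, host):
            return flag, host
    return None
```

A `None` result corresponds to the case where no node can be started at all, which is where a message to the user is needed rather than another retry.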
Could you please look at https://github.com/erlide/erlide/wiki/Troubleshooting and tell me if it is helpful enough? I think I got the solution right this time, but of course it is difficult to test because I don't know how to misconfigure my machine the right way 😄 |
The instructions are pretty clear and they fix the issue. Thank you. I've been having the same problem. |
I hate to say it but the instructions are not clear to me for my case. Also, there is no "Troubleshooting" section in Erlang preferences ... I am not sure if that is a typo. Mapping the hostname I see after |
Yes, the page is called "Network", I corrected the wiki page. A nightly build of 0.17.4 is needed. Please send me a log, there should be some line at the start saying "Test foo@host... Not working" or similar. |
Solved as per direct message. |
Just in case someone else using OSX Lion finds this useful, oinksoft's solution (see above) stopped the erlide-caused hangs for me too: If your machine is called (say) dev then add the short and long forms to your /etc/hosts file: 127.0.0.1 localhost dev dev.local |
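A small Python sketch for checking whether a hosts file already contains both forms of the machine name. It only parses text in the usual "address name alias..." layout and does not consult DNS; `parse_hosts` and `missing_names` are hypothetical helper names chosen for this example.

```python
from typing import Dict, Iterable, List


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname or alias in hosts-file text to its first listed address."""
    names: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *aliases = line.split()
        for alias in aliases:
            names.setdefault(alias.lower(), address)
    return names


def missing_names(text: str, wanted: Iterable[str]) -> List[str]:
    """Return the wanted names that have no entry in the hosts-file text."""
    known = parse_hosts(text)
    return [name for name in wanted if name.lower() not in known]


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        print(missing_names(handle.read(), ["dev", "dev.local"]))
```

An empty list means both the short and the long form are present.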
Earlier today I decided to remove and reinstall Eclipse (4.2.1) and reinstall ERLIDE. Before starting Eclipse I deleted the existing workspace/ and .eclipse/ folders. After bringing Eclipse up I installed ERLIDE from http://erlide.org/update_beta and then restarted Eclipse. I then verified that the Erlang runtime (Erlang R15B02) was defined.
When I tried to create my first I discovered that Eclipse kind of went off into "la-la land". After digging around for a bit I started Eclipse from a terminal session (eclipse --clean) and discovered that it was having great difficulty pinging the backend. Now possibly an important factor of that is that I am presently sitting in a McDonald's restaurant attached to their wireless network. In the erlide.log I saw:
15:59:45,202 F: (ErlRuntime.java:108) : # ping...c6d65_jpcustin_4f58c7_erlide@infosec-lab.mcd11686.dca.wayport.net main
15:59:54,553 S: (Backend.java:407) : error starting code server for erlang: timeout in erlang:whereis/1
15:59:54,554 S: (Backend.java:249) : Could not connect to backend! Please check runtime settings.
[root@infosec-lab ~]# more /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=infosec-lab.local
NTPSERVERARGS=iburst
When I looked at hosts I realized "the REAL problem":
[root@infosec-lab ~]# more /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
Since I'm only using IPv4 I changed localdomain to local
and for my sanity's sake, simply logged out and back in. When I started Eclipse back up it seemed much happier:
16:10:14,556 F: (ErlangLaunchDelegate.java:159) : START node :> [/usr/lib64/erlang/bin/erl, -name, R15_c6d65_jpcustin_590a88@infosec-lab.local, -setcookie, erlide] ***
/home/jpcustin
16:10:14,560 F: (ErlangLaunchDelegate.java:178) : process is running
16:10:14,575 F: (ErtsProcess.java:27) : # create ErtsProcess: R15_c6d65_jpcustin_590a88
16:10:14,577 F: (ErlangLaunchDelegate.java:138) : Started erts: R15_c6d65_jpcustin_590a88 >> R15_c6d65_jpcustin_590a88
16:10:14,590 F: (ErlRuntime.java:292) : using cookie 'erlide...'6 (info: 'erlide')
16:10:14,594 F: (Backend.java:241) : R15_c6d65_jpcustin_590a88@infosec-lab.local: waiting connection to peer...
16:10:14,700 F: (ErlRuntime.java:108) : # ping...R15_c6d65_jpcustin_590a88@infosec-lab.local main
16:10:14,734 F: (ErlRuntime.java:122) : Node R15_c6d65_jpcustin_590a88@infosec-lab.local is up
16:10:14,734 F: (ErlRuntime.java:108) : # ping...R15_c6d65_jpcustin_590a88@infosec-lab.local main
16:10:14,966 F: (Backend.java:404) : code server started
16:10:14,967 F: (Backend.java:247) : connected!
I wanted to report this ONLY because in order to even determine this I had to use kill to stop Eclipse. I got my first clue to a possible explanation when I was digging around in the Window > Preferences > Erlang settings, and ultimately when I clicked on Network I saw:
Host names used by the internal Erlang backends
Short name (-sname) and Long name (-name)
and down at the bottom I saw "For reference, Java sees the following values:" and in there saw "infosec-lab.localdomain", but I knew that based on how I built this system, it SHOULD have had ".local" and not ".localdomain".
Now call me impatient, but after waiting maybe 30-45 seconds I finally gave up and grabbed my "big stick" and beat Eclipse into submission. I don't know what would have happened if I had left it running, but it certainly SEEMED to be stuck looping. I would think that it would help to have some finite number of attempts and if unsuccessful then display some message in a dialog box and then either exit or allow the user to then interact with Eclipse.
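The "finite number of attempts" idea could look something like this minimal Python sketch. It is not Erlide code; `connect_with_limit`, `ping`, and the default of 5 attempts with a 1 second delay are all assumptions made for illustration.

```python
import time
from typing import Callable


def connect_with_limit(ping: Callable[[], bool],
                       attempts: int = 5,
                       delay_seconds: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call ping() up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        if ping():
            return True
        if attempt < attempts:  # no point sleeping after the last failure
            sleep(delay_seconds)
    return False
```

When this returns `False`, the caller would show a dialog describing the likely network problem and hand control back to the user instead of trying again.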