Appears ERLIDE can get stuck in a loop when the Linux system's network is not properly configured #78

Closed
SenseiC opened this issue Oct 17, 2012 · 14 comments

Comments

@SenseiC

SenseiC commented Oct 17, 2012

Earlier today I decided to remove and reinstall Eclipse (4.2.1) and reinstall ERLIDE. Before starting Eclipse I deleted the existing workspace/ and .eclipse/ folders. After bringing Eclipse up I installed ERLIDE from http://erlide.org/update_beta and then restarted Eclipse. I then verified that the Erlang runtime (Erlang R15B02) was defined.

When I tried to create my first project, I discovered that Eclipse more or less went off into "la-la land". After digging around for a bit, I started Eclipse from a terminal session (eclipse --clean) and discovered that it was having great difficulty pinging the backend. A possibly important factor is that I am presently sitting in a McDonald's restaurant, attached to their wireless network. In the erlide.log I saw:

15:59:45,202 F: (ErlRuntime.java:108) : # ping...c6d65_jpcustin_4f58c7_erlide@infosec-lab.mcd11686.dca.wayport.net main
15:59:54,553 S: (Backend.java:407) : error starting code server for erlang: timeout in erlang:whereis/1
15:59:54,554 S: (Backend.java:249) : Could not connect to backend! Please check runtime settings.

[root@infosec-lab ~]# more /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=infosec-lab.local
NTPSERVERARGS=iburst

When I looked at hosts I realized "the REAL problem":

[root@infosec-lab ~]# more /etc/hosts
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

Since I'm only using IPv4 I changed localdomain to local

and for my sanity's sake, simply logged out and back in. When I started Eclipse back up it seemed much happier:

16:10:14,556 F: (ErlangLaunchDelegate.java:159) : START node :> [/usr/lib64/erlang/bin/erl, -name, R15_c6d65_jpcustin_590a88@infosec-lab.local, -setcookie, erlide] ***
/home/jpcustin
16:10:14,560 F: (ErlangLaunchDelegate.java:178) : process is running
16:10:14,575 F: (ErtsProcess.java:27) : # create ErtsProcess: R15_c6d65_jpcustin_590a88
16:10:14,577 F: (ErlangLaunchDelegate.java:138) : Started erts: R15_c6d65_jpcustin_590a88 >> R15_c6d65_jpcustin_590a88
16:10:14,590 F: (ErlRuntime.java:292) : using cookie 'erlide...'6 (info: 'erlide')
16:10:14,594 F: (Backend.java:241) : R15_c6d65_jpcustin_590a88@infosec-lab.local: waiting connection to peer...
16:10:14,700 F: (ErlRuntime.java:108) : # ping...R15_c6d65_jpcustin_590a88@infosec-lab.local main
16:10:14,734 F: (ErlRuntime.java:122) : Node R15_c6d65_jpcustin_590a88@infosec-lab.local is up
16:10:14,734 F: (ErlRuntime.java:108) : # ping...R15_c6d65_jpcustin_590a88@infosec-lab.local main
16:10:14,966 F: (Backend.java:404) : code server started
16:10:14,967 F: (Backend.java:247) : connected!

I wanted to report this ONLY because, in order to even determine this, I had to use kill to stop Eclipse. I got my first clue to a possible explanation when I was digging around in the Window > Preferences > Erlang settings, and ultimately when I clicked on Network I saw:

Host names used by the internal Erlang backends

Short name (-sname) and Long name (-name)

and down at the bottom I saw "For reference, Java sees the following values:", which listed "infosec-lab.localdomain". Based on how I built this system, it SHOULD have been ".local" and not ".localdomain".

Now call me impatient, but after waiting maybe 30-45 seconds I finally gave up and grabbed my "big stick" and beat Eclipse into submission. I don't know what would have happened if I had left it running, but it certainly SEEMED to be stuck looping. I would think that it would help to have some finite number of attempts and if unsuccessful then display some message in a dialog box and then either exit or allow the user to then interact with Eclipse.
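
The suggestion above (a finite number of attempts, then a message) could look roughly like the sketch below. It is not ERLIDE's code: `connect_with_retries`, `report_failure`, and the `ping` callable are hypothetical names used only for illustration, and the default attempt count is an arbitrary assumption.

```python
import time


def connect_with_retries(ping, max_attempts=5, delay_seconds=0.0):
    """Call ping() at most max_attempts times; return True on the first success."""
    for attempt in range(1, max_attempts + 1):
        if ping():
            return True
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before the next attempt
    return False  # give up instead of looping forever


def report_failure(node_name):
    """Build the text a dialog could show once all attempts have failed."""
    return (
        "Could not connect to backend node %s. "
        "Check that its host name resolves (for example in /etc/hosts)." % node_name
    )
```

The point of the design is only that the loop has a hard upper bound, so control returns to the user even when the node name never becomes reachable.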

@vladdu
Collaborator

vladdu commented Oct 17, 2012

Thanks a lot for this report! You are correct that network misconfiguration can cause erlide to basically hang eclipse. I've made improvements in the last weeks, but it's difficult to cover all possible ways to misconfigure the network for all OSs and situations. Besides, "fixing" it for one setting creates problems in another, and fixing that one disturbs yet another, and so on...

I think that the current solution only needs a way to communicate the situation to the user and prompt for a fix, rather than continuing to retry the connection.

In the near future we will deploy a version that doesn't use an Erlang backend for everything, so this kind of problem will disappear; this is also why I don't want to spend too much time trying to fix the current situation. But of course hangs can't be tolerated, so I will look at it again.

vladdu added a commit to vladdu/erlide_eclipse that referenced this issue Oct 18, 2012
@vladdu
Collaborator

vladdu commented Oct 18, 2012

I have restricted the looping retries when we can't connect to the internal backend and added a message about what might be wrong.

It is likely that your network is misconfigured or uses 'strange' host names. Please check the Window > Preferences > Erlang > Network page for hints about that.

Also, check if you can create and connect two erlang nodes on your machine using "erl -name foo1" and "erl -name foo2"

@AllYouCanAlex

The latest patch throws an error screen, even though I can create two nodes with erl -name node1 and erl -name node2 and they can ping each other.

On the Network page, the long and short names that Java sees are identical. Should they be different? At the top of the page, the short name is blah-desktop while the long name is blah-desktop.domain.com.

@vladdu
Collaborator

vladdu commented Oct 22, 2012

Aaah, I think I will go crazy... It's so frustrating when I can't reproduce these things myself...

What is the name seen by java?

Is blah-desktop.domain.com in your hosts file and does it point to the correct IP address?

@AllYouCanAlex

I fixed the issue. Basically, blah-desktop.domain.com was the FQDN that Erlide calculated, but neither the DNS server nor my /etc/hosts knew about it. /etc/hostname and /etc/hosts only have blah-desktop.

I am running Erlide in a VM on a corporate network, and when I tried to ping blah-desktop.domain.com from the shell, the result was "unknown host".

Once I added "127.0.0.1 blah-desktop.domain.com" to /etc/hosts, I could start Erlide with no errors and run configurations with a long name.

Before I added blah-desktop.domain.com to /etc/hosts, I was still able to start two nodes with -name and have them ping each other. So it seems that maybe the Java code needs the FQDN to be reachable.

Or the readme needs to say that the FQDN needs to be reachable.
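
A quick way to check whether a name resolves, before starting Eclipse, is sketched below. This is only an illustration: `resolves` is a hypothetical helper, and the name Python reports through `socket.getfqdn()` is not guaranteed to match the one Erlang or Java computes, so it is worth also testing the exact name shown on the Network preferences page.

```python
import socket


def resolves(name, resolver=socket.gethostbyname):
    """Return True if name can be resolved to an address, False otherwise."""
    try:
        resolver(name)
        return True
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False


if __name__ == "__main__":
    fqdn = socket.getfqdn()  # the fully qualified name as Python sees it
    print(fqdn, "resolves" if resolves(fqdn) else "does NOT resolve")
```

If the name does not resolve, adding it to /etc/hosts as described above is the fix that worked in this case.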

@AllYouCanAlex

In the latest releases, did you switch erlide from using -sname to -name by any chance?

@vladdu
Collaborator

vladdu commented Oct 22, 2012

I'm glad you sorted it out!

Yes, we try long names first and, if that fails, we fall back to short names. This is because, in my empirical experience (that is, the bug reports I got), long names have a better chance of working. But by "fails" above, I mean that it's not possible to start Erlang nodes with long names at all. I should add something for the case when no connection can be made, too... It's quite a complicated state machine!

It is actually Erlang that uses blah-desktop.domain.com: we now start an Erlang node and read the value it uses for the hostname. I will try to add some more helpful text to the docs.

Thanks!
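
The "long names first, then short names" fallback described above can be sketched as follows. This is a simplified illustration and not erlide's actual state machine: `pick_name_mode` and the `can_start_node` callable are hypothetical, and the sketch covers only the case where a node cannot be started at all, not the case where it starts but no connection can be made.

```python
def pick_name_mode(can_start_node, modes=("long", "short")):
    """Return the first mode for which can_start_node(mode) is true, else None."""
    for mode in modes:
        if can_start_node(mode):
            return mode
    return None  # neither long nor short names work; the caller should report it


if __name__ == "__main__":
    # Pretend that only short names work on this machine.
    print(pick_name_mode(lambda mode: mode == "short"))
```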

@vladdu
Collaborator

vladdu commented Oct 23, 2012

Could you please look at https://github.com/erlide/erlide/wiki/Troubleshooting and tell me if it is helpful enough?

I think I got the solution right this time, but of course it is difficult to test because I don't know how to misconfigure my machine the right way 😄

@rambocoder

The instructions are pretty clear and they fix the issue. Thank you. I've been having the same problem.

@oinksoft

I hate to say it but the instructions are not clear to me for my case. Also, there is no "Troubleshooting" section in Erlang preferences ... I am not sure if that is a typo.

Mapping the hostname I see after erl -name foo to 127.0.0.1 in /etc/hosts does not correct this issue. The same goes for mapping this hostname to my real IP.

@vladdu
Collaborator

vladdu commented Nov 15, 2012

Yes, the page is called "Network", I corrected the wiki page. A nightly build of 0.17.4 is needed.

Please send me a log, there should be some line at the start saying "Test foo@host... Not working" or similar.

@vladdu
Collaborator

vladdu commented Nov 15, 2012

Solved as per direct message.

@vladdu vladdu closed this as completed Nov 15, 2012
@jacobythwaites

Just in case someone else using OSX Lion finds this useful, oinksoft's solution (see above) stopped the erlide-caused hangs for me too:

If your machine is called (say) dev then add the short and long forms to your /etc/hosts file:

127.0.0.1 localhost dev dev.local
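
To check whether both forms are already present, hosts-file text can be parsed with a few lines of code. The sketch below is illustrative only: `parse_hosts` is a hypothetical helper, the sample text reuses the `dev` example from above, and keeping the first address seen for a name is an assumption about how lookups behave.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to the address on its line."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # keep the first address seen
    return mapping


if __name__ == "__main__":
    sample = "127.0.0.1 localhost dev dev.local\n"
    hosts = parse_hosts(sample)
    missing = [name for name in ("dev", "dev.local") if name not in hosts]
    print("missing entries:", missing)
```

To check a real machine, the sample string can be replaced with the contents of /etc/hosts.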
