Losing internet connectivity after restarting the server #342

Closed
holta opened this issue Sep 29, 2017 · 19 comments

@holta
Member

holta commented Sep 29, 2017

@kananigit writes "actually it fails to raise the network interfaces so it's pretty bad, will have more info after testing on a different network"

@jvonau helped a TON during a live call yesterday: anything else I/we need to add to the ticket?

@holta holta added the bug label Sep 29, 2017
@holta holta added this to the 6.4-Sep milestone Sep 29, 2017
@holta holta added the question label Sep 29, 2017
@holta
Member Author

holta commented Sep 29, 2017

@jvonau writes: Other than enabling WordPress from the GUI and ICO [Install Configured Options], then the issues started... not really... Not sure if he changed the network role while in the GUI, because the GUI run does not provide a log file, and I forgot to ask for the /etc/iiab/config_vars.yml file. As a control test, could you try selecting a different network mode and then ICO in the GUI to see if that can break the current network routine?

@holta writes: Should we ask him to avoid the GUI (Admin Console you mean?) for now?

@jvonau writes: [Maybe] but functional testing should be performed on changing the network role from the GUI. If that change breaks the ansible run there is a usability issue that is difficult to recover from.

Try changing the network role. There might be a bigger issue lurking underneath that [@kananigit] has bubbled to the top by a complete fluke.

I'd try 'Appliance' to keep openvpn available if the run does succeed. But "Lan Controller" needs to be tested also.

I had him run ["cd /opt/iiab/iiab; ./iiab-network"] twice and grabbed the logfile, and the networking blew up the second pass.

@jvonau
Contributor

jvonau commented Sep 29, 2017

@holta
Member Author

holta commented Sep 29, 2017

@kananigit writes:

I did a fresh install on the Pi 3 at my place, then brought it to my cousin's house. I hardwired it to the router and booted it up but was not getting the internet. I just ran ./iiab-network; will see if that fixes anything.

Also decided to run a clean install on Ubuntu with [local_vars.yml's "iiab_home_url: /home" to "iiab_home_url: /wordpress"] homepage changed to WordPress. I am running that here at my [cousin's] place, crossing my fingers and hoping my home network [was the issue] and not my machines.

@kananigit

So the ./iiab-network run on my Pi just made me lose access to my server altogether. This is the point where it says failed to start raise network interfaces, just like it did yesterday. So now I have no access to the internet or to my IIAB server.

@holta
Member Author

holta commented Sep 30, 2017

@kananigit asks over email if #182 ("unable to get static ip on debian 9") and/or hard-coding DHCP IP addresses into his home router (based on his RPi3 or Ubuntu machines' MAC addresses) are possibly relevant/necessary...to get him back on track with working Internet??

@jvonau
Contributor

jvonau commented Sep 30, 2017

Brought over from xsce-devel, as for some reason I can't reply there any longer.

Very Preliminary Conclusion:
Joshua Kanani's RPi3 networking appears to have self-destructed (gateway to Internet) on the very 1st reboot, immediately after the stock 1-line installer http://download.iiab.io/6.4/rpi/load-vpn.txt completed (that script alone's been used ~100 times now, by ~10 people, so something just doesn't add up!)
I've never seen anything like it. TeamViewer & OpenVPN have nothing to do with one another -- and yet both are unavailable for me to connect in remotely after reboot -- which tells me with 95% certainty the DHCP handshake in his home never happened after RPi3 reboot.
This is the same pattern Joshua faced in August; hence hard-coding 192.168.0.99 into his router so the RPi3-connected-by-Ethernet-cable comes up as 192.168.0.99 every time...but does this really work with raw Raspbian Lite and TeamViewer alone? Let's try that later today, without a single thing installed on Raspbian Lite (or just TeamViewerHost).
After Joshua wakes up in the coming hours I'd like to know much more about the HW in his home. Worst case maybe there is something to our month-old theories that the router in his home's DHCP is slightly non-standard and incompatible in some way? We just don't have the critical/basic facts yet it seems, after more than a month -- way too voodoo for sure. So as crazy as it sounds (nobody wants this hanging over Joshua for additional months, whether or not others will ever face this!) the next step is likely to start swapping ALL parts until we isolate the issue:
how exactly is Joshua Kanani preparing+customizing the SD card with Raspbian Lite?
swap in another router (different brand entirely)
swap in another RPi3
swap in another microSD card (different brand entirely)
swap in his cousin's home (top-to-bottom install of http://download.iiab.io/6.4/rpi/load-vpn.txt -- not pre-built in Josh

@jvonau
Contributor

jvonau commented Sep 30, 2017

The last option above should prove or eliminate the home network as the source of the issue. From the description of the network that I uncovered during a voice call, perhaps a second router downstairs, in between the main router and the problem RPi, would be enough to see if the main router is affecting the RPi negatively. With the second router inline with the main one, a different DHCP server would be serving the RPi, so the results should differ, as the RPi would not be talking directly with the main router upstairs.

@holta
Member Author

holta commented Sep 30, 2017

Josh's cousin has the same ISP (CenturyLink) with the same Actiontec C1000A (DSL) router provided by this ISP/CenturyLink — so these are not fully independent tests FWIW :/

http://internethelp.centurylink.com/internethelp/pdf/modems/datasheet-c1000a.pdf

@jvonau
Contributor

jvonau commented Sep 30, 2017 via email

@jvonau
Contributor

jvonau commented Sep 30, 2017

My comment in chat:

maybe it might be the hostname change from raspberrypi -> box that is messing with the dhcp lease assignment?

@holta
Member Author

holta commented Oct 1, 2017

Observations If Not Quite Conclusions:

  1. We've definitely isolated the failure to "./runansible" which prevents IIAB (RPi3 or classical laptop) from connecting to @kananigit's or his cousin's CenturyLink Actiontec C1000A (DSL) routers, after IIAB is rebooted — whereupon IIAB's eth0 (Ethernet cable to router) can suddenly no longer get an IP address.
  2. A freshly installed OS (like Raspbian Lite) does not have this problem. Likewise none of the steps prior to ./runansible (within load script http://download.iiab.io/6.4/rpi/load-vpn.txt) cause this failure (e.g. none of the pre-Ansible steps like apt, TeamViewer, emacs, git clone and installing Ansible 2.4.0 etc trigger the problem).
  3. The failure occurs even when Ansible's http://wiki.laptop.org/go/IIAB/local_vars.yml variables are all set to False within local_vars.yml (i.e. not a single IIAB service was installed or enabled).
  4. We did not run any of the steps in http://download.iiab.io/6.4/rpi/load-vpn.txt that follow ./runansible — as the failure to receive an IP address from DHCP was already obvious.
  5. Changing the IIAB/RPi3's hostname from "box" (within /etc/hostname and /etc/hosts and/or using "raspi-config") regrettably did not help :(
  6. Swapping SD cards (old/new, SanDisk/Samsung, fast/slow) did not help.
  7. Using A Different Router DOES Provide An eth0 IP address to IIAB/RPi3 essentially instantly, no reboot required. So long as the router has DHCP turned on.
  8. @kananigit will try to push further Sunday after he returns from work...but after 1-2 months here he may be running out of patience...preferring a working system over a complete diagnosis of why IIAB and CenturyLink Actiontec C1000A DSL routers refuse to dance together (but we might be able to help some of CenturyLink's 6 million broadband customers IF we/all/someone do figure this out...don't give up if other good ideas !?)
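
One way to confirm observation 1 after a reboot is to look at which IPv4 address eth0 actually holds. Below is a minimal Python sketch, assuming the `ip` tool from iproute2 is installed and the wired interface is named eth0. The helper names (`ipv4_addresses`, `classify`, `check_interface`) are hypothetical and not part of IIAB. A link-local 169.254.x.x address may indicate that the DHCP client gave up and fell back to self-assignment, but that interpretation is an assumption, not something verified in this thread.

```python
import ipaddress
import re
import subprocess


def ipv4_addresses(ip_output):
    """Extract IPv4 addresses from `ip -4 -o addr show` style output."""
    found = re.findall(r"\binet (\d+\.\d+\.\d+\.\d+)", ip_output)
    return [ipaddress.ip_address(addr) for addr in found]


def classify(addresses):
    """Describe the state of an interface from its list of IPv4 addresses."""
    if not addresses:
        return "no IPv4 address"
    if all(addr.is_link_local for addr in addresses):
        return "link-local only (169.254.x.x), DHCP may have failed"
    return "has a non-link-local IPv4 address"


def check_interface(name="eth0"):
    """Run `ip` for one interface (assumed to exist) and classify the result."""
    result = subprocess.run(
        ["ip", "-4", "-o", "addr", "show", "dev", name],
        capture_output=True, text=True, check=False,
    )
    return classify(ipv4_addresses(result.stdout))
```

Calling `check_interface("eth0")` on the IIAB machine right after boot, once on the CenturyLink router and once on a different router, would give a quick side-by-side comparison.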

@holta
Member Author

holta commented Oct 1, 2017

Sounds like CenturyLink (ISP) and/or Actiontec (manufacturer of C1000A router) configure their DHCP more strictly than anything else we've seen.

It would be lovely to know why — but in the interim, kudos to @kananigit who came up with this workaround (also for his cousin's home, and others in the same situation) getting folks back on their feet:

"[I inserted a] second [intermediary] router [to serve DHCP to IIAB/RPi3] and wolaaa!!! I have internet back on the [IIAB/RPi3].

I had to make sure the second [intermediary] router is on a different subnet and set its internet IP to one available IP on the primary [CenturyLink/ISP] router and also put the primary as the default gateway.

That's the only way I was able to have DHCP active on both the routers. Just plugging in the secondary to primary did not work and I honestly don't know how that is possible [but configuring settings within the second/intermediary router certainly works!]

My [earlier/failed attempt] involved turning off DHCP in the secondary router and assigning it an available IP from the primary, so all the handling of the DHCP was done by the primary, hence just an extension of the main network [hence IIAB devices were still blocked in this earlier configuration, due to CenturyLink/ActionTec's very strict DHCP]."
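
The constraints in the workaround above (a separate LAN subnet on the second router, a WAN address taken from the primary router's range, and the primary as default gateway) can be sanity-checked on paper before touching either router. Below is a minimal Python sketch. All addresses in the usage note are hypothetical examples, since the thread only mentions 192.168.0.99 on the primary network, and `check_cascade` is an illustrative helper, not an IIAB tool.

```python
import ipaddress


def check_cascade(primary_lan, primary_gateway, secondary_wan_ip, secondary_lan):
    """Return a list of problems with a two-router setup (empty if none found)."""
    primary = ipaddress.ip_network(primary_lan)
    secondary = ipaddress.ip_network(secondary_lan)
    gateway = ipaddress.ip_address(primary_gateway)
    wan = ipaddress.ip_address(secondary_wan_ip)

    problems = []
    # The second router must hand out addresses from a different subnet.
    if primary.overlaps(secondary):
        problems.append("secondary LAN overlaps the primary LAN")
    # Its WAN-side address must belong to the primary router's network.
    if wan not in primary:
        problems.append("secondary WAN IP is outside the primary LAN")
    # It must not reuse the primary router's own address.
    if wan == gateway:
        problems.append("secondary WAN IP collides with the primary gateway")
    return problems
```

For example, `check_cascade("192.168.0.0/24", "192.168.0.1", "192.168.0.50", "192.168.1.0/24")` returns an empty list. The sketch cannot tell whether the chosen WAN address is actually free on the primary router; that still has to be checked in the router's own client list.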

@jvonau
Contributor

jvonau commented Oct 1, 2017

My next test would have been to comment out (#) the 'require dhcp_server_identifier' option in the /etc/dhcpcd.conf file to see if that would eliminate the need for the second router.

@holta
Member Author

holta commented Oct 1, 2017

My next test would have been to # [comment out] the 'require dhcp_server_identifier' option in the /etc/dhcpcd.conf file to see if that would eliminate the need for the second router.

@kananigit can you please try the idea above?!
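
As a rough sketch of what that edit amounts to, the function below comments out a `require dhcp_server_identifier` line in the text of a dhcpcd.conf file. It assumes the option sits on its own line; `comment_out_option` is a hypothetical helper, not an existing tool. It only transforms text, so writing the result back to /etc/dhcpcd.conf (as root) and restarting networking or rebooting are left as manual steps.

```python
def comment_out_option(config_text, option="require dhcp_server_identifier"):
    """Prefix each active line equal to `option` with '#'; leave other lines as-is."""
    out = []
    for line in config_text.splitlines():
        if line.strip() == option:
            # Active occurrence of the option: disable it.
            out.append("#" + line)
        else:
            # Unrelated lines and already-commented lines pass through unchanged.
            out.append(line)
    trailing = "\n" if config_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Keeping a backup copy of the original file before overwriting it makes the test easy to undo.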

@holta holta modified the milestones: 6.4-Sep, 6.5-Oct Oct 3, 2017
@holta
Member Author

holta commented Oct 13, 2017

@kananigit: can you possibly give @jvonau's /etc/dhcpcd.conf idea above a quick shot in the coming week?

@kananigit

kananigit commented Oct 13, 2017 via email

@kananigit

I commented out the 'require dhcp_server_identifier' option in the /etc/dhcpcd.conf file and used my main router, but unfortunately it did not fix the internet issue.

@jvonau
Contributor

jvonau commented Oct 19, 2017 via email

@holta
Member Author

holta commented Oct 20, 2017

Resolving for now given we found a workaround.

Please re-open as necessary!

@holta holta closed this as completed Oct 20, 2017
holta added a commit that referenced this issue Jan 17, 2020
sync from georgejhunt:log-consume
3 participants