dnsmasq fails often in captive-portal (4-server-options) on multiple OS's #1469

Closed
holta opened this issue Feb 9, 2019 · 13 comments

@holta
Member

holta commented Feb 9, 2019

Ubuntu 16.04 (intermittent) and Debian 10 (every time for @deldesir) are not the only OS's affected.

On some distros it happens every time, whereas on others it's intermittent ("./iiab-install --reinstall" can sometimes work to overcome this failure in 4-server-options).

In all cases, iiab-install fails here in Stage 4 (4-server-options) when dnsmasq fails to start:
https://github.com/iiab/iiab/blob/master/roles/captive-portal/tasks/main.yml#L141-L145

Sometimes ./iiab-network ("./runrole network") fails in apparently the same way, with dnsmasq strangely unable to start. Confoundingly, "systemctl start dnsmasq" seems to run fine [certainly on Ubuntu 16.04] when run manually at the command line!

Thank you @deldesir @jvonau @georgejhunt for continuing to monitor this pattern towards identifying the root cause — so we can put an end to this in coming weeks if not days!

Refs: #1182 #1364 #1387 and others

Possibly Unrelated: @jvonau asks if 127.0.0.53 in Ubuntu 18.04's /etc/resolv.conf has anything to do with this? https://github.com/iiab/iiab/blob/master/roles/sugarizer/tasks/main.yml#L181-L182

@holta holta added the bug label Feb 9, 2019
@holta holta added this to the 6.7 milestone Feb 9, 2019
@holta holta changed the title from "dnsmasq fails often in captive-portal / 4-server-options on multiple OS's" to "dnsmasq fails often in captive-portal (4-server-options) on multiple OS's" Feb 9, 2019
@holta
Member Author

holta commented Feb 9, 2019

If we do not identify the root cause in coming days, we should consider a workaround so that IIAB 6.7 isn't stillborn/sabotaged across multiple OS's/distros (-:

@deldesir
Contributor

deldesir commented Feb 9, 2019

For now, installation proceeds without it when captive_portal_install: False and captive_portal_enabled: False are set in /etc/iiab/local_vars.yml.

"systemctl start dnsmasq" still fails even when run manually [on Debian 10, unlike Ubuntu 16.04].

I'll wait for the installation to finish without captive portal, and afterwards I'll experiment with the tasks of the captive-portal role.

@holta
Member Author

holta commented Feb 9, 2019

Thanks @deldesir! Please let us know if roles/network succeeds during the last stage of iiab-install, and when run manually as ./iiab-network (or "./runrole network") — in all cases making sure you're in /opt/iiab/iiab

(Or possibly dnsmasq has a serious glitch on Debian 10 for the moment?)

@deldesir
Contributor

deldesir commented Feb 9, 2019

Not dnsmasq's fault, nor captive-portal's.
netstat -lnp | grep ":53 " showed that connmand had taken the port. dnsmasq now starts after I removed connman.

Now I am rerunning iiab-install --reinstall with captive portal set to true for both install and enabled.
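
For readers who want to run the same kind of check before starting dnsmasq, here is a minimal Python sketch. It is not part of IIAB. It simply tries to bind the port and reports whether another process already holds it. The function name `port_is_free` and its defaults are hypothetical, chosen for illustration only.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1", kind=socket.SOCK_DGRAM):
    """Return True if host:port can be bound, False if it is already in use.

    Hypothetical helper for illustration; not an IIAB or dnsmasq API.
    """
    with socket.socket(socket.AF_INET, kind) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            # Anything else (e.g. EACCES when binding a port below 1024
            # without root) is not evidence of a conflict, so re-raise.
            raise
        return True
```

DNS uses both UDP and TCP, so a fuller check would call this once with `socket.SOCK_DGRAM` and once with `socket.SOCK_STREAM`. Binding port 53 needs root. The check only says that the port is taken, not which daemon took it.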

@holta
Member Author

holta commented Feb 9, 2019

ConnMan is an internet connection manager for embedded devices

Appropriately named ;)

How did you remove connman?

Why is this installed as part of your Debian 10, do you know?

@holta
Member Author

holta commented Feb 9, 2019

Why is this installed as part of your Debian 10, do you know?

Clarification: @deldesir installed this (the connman package). No worries, now we know!

Larger question about IIAB fragility in general — as we try to make dnsmasq resilient across distros, what error reporting (e.g. checking what daemon might already be using port 53?) might best communicate different ways that dnsmasq fails across different distros?
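
As one possible shape for such error reporting, the sketch below parses the text produced by "netstat -lnp" and lists which program owns each socket on port 53. It is only an illustration and not existing IIAB code. The function name `listeners_on_port` is hypothetical, and `SAMPLE` is a hand-written example in netstat's column layout (made-up PIDs), not a captured log.

```python
# Hand-written illustration of "netstat -lnp" output; PIDs are invented.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      612/connmand
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      701/sshd
udp        0      0 127.0.0.1:53            0.0.0.0:*                           612/connmand
"""


def listeners_on_port(netstat_output, port=53):
    """Return sorted (proto, local_address, owner) tuples for sockets on `port`."""
    suffix = f":{port}"
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol name (tcp, tcp6, udp, udp6);
        # this skips the header row and any blank lines.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        local_address = fields[3]
        if not local_address.endswith(suffix):
            continue
        # With -p (run as root) the last column is "PID/Program name";
        # without privileges netstat prints "-" there instead.
        owner = fields[-1] if "/" in fields[-1] else "unknown"
        found.add((fields[0], local_address, owner))
    return sorted(found)


for proto, address, owner in listeners_on_port(SAMPLE):
    print(f"port 53 already held by {owner} ({proto} {address})")
```

An installer could print lines like these when dnsmasq fails to start, so the conflicting daemon is named in the error message. Program names containing spaces would be reported as "unknown" by this simple parser.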

@georgejhunt
Contributor

georgejhunt commented Feb 9, 2019 via email

@holta
Member Author

holta commented Feb 10, 2019

I think we should change the default to off for captive-portal on all hardware that is not rpi.

@georgejhunt how/where do you suggest this happen?

A couple related points:

  1. Right now captive-portal is True/True in MEDIUM-sized and BIG-sized local_vars.yml and we should keep stuff like that, to expand Captive Portal testing wherever possible!
  2. Folks in India (that @m-anish works with) very much want Captive Portal to work on Ubuntu 18.04 — is extending what we have on Raspbian to other distro(s) likely very hard?

@holta
Member Author

holta commented Feb 11, 2019

@georgejhunt asks a thoughtful question: if we do suppress dnsmasq glitches within 4-server-options, does this really help if they'll (presumably) just reappear later during the roles/network stage?

@georgejhunt
Contributor

georgejhunt commented Feb 11, 2019 via email

@holta
Member Author

holta commented Feb 11, 2019

While similar in some ways, is this unrelated to #1452 in the end?

@holta
Member Author

holta commented Mar 19, 2019

@jvonau this error no longer occurs on Debian 10 Buster according to @floydianslips at #1387

I also no longer see it on Ubuntu 16.04 (if we really still care about that, but FYI!)

Should we declare victory and close this?

@holta
Member Author

holta commented May 4, 2019

This still happens on Ubuntu 16.04, but that is now more than 3 years old and it really is time for IIAB implementers to use more modern OS's.

@holta holta closed this as completed May 4, 2019