This repository has been archived by the owner on Aug 22, 2021. It is now read-only.

AiLight entering config mode when MQTT unreachable #56

Closed
hamishfagg opened this issue Jul 11, 2019 · 21 comments
Labels: bug, bugfix, release

@hamishfagg

Hi there,

Occasionally my mqtt server will be unreachable for a period of time due to server restarts/upgrades etc. When this happens AiLight enters config mode (the wifi AP comes up), and it stays like this forever until I restart the AiLight manually for it to reconnect.

Is there a way to stop this behavior? I'd like it to just wait until the server is up.

@hamishfagg
Author

Come to think of it, I think there are two issues here.

  1. Wifi goes down. AiLight reaches WIFI_RECONNECT_TIMEOUT and goes into AP mode. I can fix this using this setting.

  2. MQTT goes unreachable with good wifi and AiLight gets 'stuck' not connected (I have to reboot it). I'm still looking into why this is.

@stelgenhof
Owner

Hi,

Indeed, the WIFI_RECONNECT_TIMEOUT directive can be set to a value that suits your situation best. I agree that the handling of these situations is not the best, but at the time this was kind of a good compromise :)

As far as MQTT goes, I haven't had any issues with that (being unavailable), so I would need to do some more testing.

If you have some more information (i.e. logs, error messages, etc.) that would be great.

Cheers! Sacha

@hamishfagg
Author

Yes I need to test more for the MQTT availability problem. It normally happens when my server crashes which is not often.

P.S. Can I set WIFI_RECONNECT_TIMEOUT to 0 for infinite? I tried to see how it's used exactly but couldn't figure it out.

@donkawechico
Contributor

I've been having a very similar-sounding issue ever since updating my firmware (from some 2017 version to 0.61-dev).

When I restart my router, the bulbs show up on my wifi list, and then never reconnect to my router until I do a power cycle.

After the power cycle, they show up with their stock hostnames (e.g. "ESP_2EA955") even though they have custom hostnames in their settings. When I go to "http://ESP_2EA955/" and restart the bulb, they then appear correctly with their custom hostnames.

I'm going to try upping the WIFI_RECONNECT_TIMEOUT setting later tonight and will report back here. If I experience any lingering MQTT availability issues (as @IVData is experiencing) I'll be sure to report that as well.

@stelgenhof
Owner

@donkawechico Thanks for the update. I haven't experienced it myself. It would be good to get any feedback from your tests, so I can see what may go wrong or how to prevent this.

@donkawechico
Contributor

donkawechico commented Aug 18, 2019

@stelgenhof I resolved the WiFi issue by setting the WIFI_RECONNECT_TIMEOUT to 180 seconds.

However, the bulbs do indeed appear to have lost their subscription to the MQTT topics, just like @IVData reports.

If I'm reading the light.ino code correctly, the bulb sees a MQTT_EVENT_DISCONNECT event and immediately unsubscribes from the command topic. Then, only if there's wifi at the moment of the disconnection, it will start a timed callback that reconnects after MQTT_RECONNECT_TIME seconds. But in this scenario, there isn't any wifi, so the reconnect callback never gets registered.
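To make that reading concrete, here is a minimal sketch of the flow as described, written as a plain C++ simulation rather than the actual light.ino code. The `Bulb` struct, its fields, and `subscribedAfterRouterRestart` are hypothetical names chosen for illustration, and the model assumes the broker is reachable again by the time the reconnect timer fires.

```cpp
#include <cassert>

// Hypothetical model of the flow described above; the names are
// illustrative and are not taken from the AiLight source.
struct Bulb {
    bool wifiUp = true;
    bool subscribed = true;
    bool reconnectScheduled = false;

    // Mirrors the described handler: always unsubscribe, but only arm the
    // reconnect timer if WiFi is up at the moment of the disconnect.
    void onMqttDisconnect() {
        subscribed = false;
        if (wifiUp) {
            reconnectScheduled = true;
        }
    }

    // Called when the reconnect timer fires (assumes the broker is back).
    void onReconnectTimer() {
        if (reconnectScheduled && wifiUp) {
            subscribed = true;
            reconnectScheduled = false;
        }
    }

    // WiFi comes back; in this model nothing here touches MQTT.
    void onWifiConnected() { wifiUp = true; }
};

// Router restart: WiFi drops first, then the MQTT disconnect is noticed.
inline bool subscribedAfterRouterRestart() {
    Bulb b;
    b.wifiUp = false;
    b.onMqttDisconnect();
    assert(!b.reconnectScheduled);  // the timer was never armed
    b.onWifiConnected();
    b.onReconnectTimer();           // nothing scheduled, so nothing happens
    return b.subscribed;
}
```

In this model `subscribedAfterRouterRestart()` returns `false`: the bulb regains WiFi but stays unsubscribed, which matches the symptom. If only the broker goes away while WiFi stays up, the timer is armed and the subscription comes back.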

At first, I assumed there must be a path in the wifi connection routine that triggers an MQTT subscription, but I didn't see one. But, I'm also wildly unfamiliar with this code so I'm probably missing something.

My gut says this is a bug, and that the solution is to put something into setupWifi that triggers an mqtt subscription to the command topic.

I don't have any logs to show you, as I don't really know how to get them from the bulbs during the critical time when they're disconnected from wifi.

@donkawechico
Contributor

Or perhaps the better solution is for light.ino (or _mqtt.ino?) to listen for the Wifi connected event and resubscribe to its topics?
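A rough sketch of that idea, again as a host-side C++ simulation and not as AiLight code. `WifiEvents`, `MqttSession`, `resubscribeOnWifi`, and the topic string used with them are all hypothetical stand-ins; a real handler would also have to re-establish the broker connection before subscribing.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the firmware's WiFi and MQTT layers; none of
// these names come from the AiLight source.
struct WifiEvents {
    std::vector<std::function<void()>> connectedHandlers;

    void onConnected(std::function<void()> handler) {
        connectedHandlers.push_back(std::move(handler));
    }

    // Simulates the "WiFi connected" event being raised.
    void fireConnected() {
        for (auto &handler : connectedHandlers) handler();
    }
};

struct MqttSession {
    std::vector<std::string> subscriptions;

    // Idempotent, so repeated WiFi reconnects do not pile up duplicates.
    void subscribe(const std::string &topic) {
        assert(!topic.empty());
        if (std::find(subscriptions.begin(), subscriptions.end(), topic) ==
            subscriptions.end()) {
            subscriptions.push_back(topic);
        }
    }

    // Simulates the subscriptions being lost during an outage.
    void dropAll() { subscriptions.clear(); }
};

// Registers a handler that restores the command topic whenever WiFi returns.
// The caller must keep `mqtt` alive for as long as `wifi` can fire events.
inline void resubscribeOnWifi(WifiEvents &wifi, MqttSession &mqtt,
                              std::string commandTopic) {
    wifi.onConnected([&mqtt, commandTopic]() { mqtt.subscribe(commandTopic); });
}
```

The point of the design is that the resubscribe no longer depends on WiFi having been up at the moment of the MQTT disconnect; it is tied to the WiFi event itself.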

@stelgenhof
Owner

@donkawechico Thanks for the feedback! Indeed the implementation is a bit crude and apparently doesn't cover the situation when both WiFi and MQTT are not connected. Not so much of a bug but rather a lazy/limited implementation :)

I need to do a bit more testing/investigation on my side. My first thought is that I will need to do some small refactoring; I will likely add some kind of retry process similar to what Espurna is doing.

Cheers! Sacha

@donkawechico
Contributor

donkawechico commented Sep 18, 2019

By the way, I think the issue here isn't necessarily (or at least, isn't totally) caused by the AiLight firmware. I think it may have something to do with the MQTT server's keepalive timeout expiring before Wifi comes back fully.

That said, I do think AiLight firmware can help the issue by fixing/tweaking the reconnect logic to force back the subscription.

One heavy-handed solution here (if one is hosting their mosquitto service on a rPi or other dedicated system) might be to convert the rPi into a hotspot and have the bulbs connect directly to that rather than to the wifi router. That way the internet can go down entirely and the bulbs would still happily stay connected to mosquitto.

@stelgenhof
Owner

Hi all,

I managed to do some preliminary changes that will check at regular intervals if the connection to the MQTT server is still present.
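The general shape of such a periodic check can be sketched as follows. This is not the code from the branch; `MqttWatchdog`, its method, and the interval value used with it are assumptions made for illustration. The caller passes in the current time (for example from `millis()` on the ESP8266) and the current connection state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical periodic checker; the class name and interval are
// assumptions, not taken from the AiLight branch.
class MqttWatchdog {
public:
    explicit MqttWatchdog(uint32_t intervalMs) : intervalMs_(intervalMs) {
        assert(intervalMs > 0);
    }

    // Call from the main loop. Returns true when a reconnect attempt
    // should be made now: the interval has elapsed, WiFi is up, and the
    // MQTT connection is down.
    bool shouldReconnect(uint32_t nowMs, bool wifiUp, bool mqttConnected) {
        // Unsigned subtraction stays correct when the millisecond
        // counter wraps around.
        if (nowMs - lastCheckMs_ < intervalMs_) {
            return false;
        }
        lastCheckMs_ = nowMs;
        return wifiUp && !mqttConnected;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastCheckMs_ = 0;
};
```

Because the check runs on a timer instead of only in the disconnect handler, a reconnect is attempted as soon as WiFi is back, regardless of the order in which WiFi and MQTT went down.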

If you like to try these out, please check the branch https://github.com/stelgenhof/AiLight/tree/feature/mqtt_reconnect and compile/upload this version.
Although I tested it and can confirm it works, be aware that issues may appear.

Cheers! Sacha

@stelgenhof stelgenhof added this to the 0.7 milestone Dec 23, 2019
@stelgenhof
Owner

@IVData @donkawechico The 'develop' branch contains new changes that address this issue with MQTT. The code is not released yet but will be in the upcoming v0.7 version.

If you have time, would appreciate it if you could give this branch a try.

Cheers! Sacha

@hamishfagg
Author

hamishfagg commented Jan 1, 2020

@stelgenhof unfortunately I have the same problem as the guy in #66. MQTT commands don't work at all after the change to the dev branch. The webUI does work to turn the light on/change colors though.

EDIT: Just checked in a mqtt client and changing the light through the webUI does not result in any messages being sent to the mqtt server.

@stelgenhof stelgenhof added bug and removed enhancement labels Jan 2, 2020
@stelgenhof
Owner

@IVData Just made some changes in the 'develop' branch. I could replicate the issue that MQTT wasn't working. Basically, a connection to the MQTT broker was never made in the code due to a change I made previously to the order in which the connections (WiFi and MQTT) are made. I reverted that and it should work now. I tested it on two different bulbs and an ESP8266 module.

@donkawechico
Contributor

@stelgenhof Thank you for continuing to work on this issue! I just returned from the holidays, and will try out the dev branch as soon as I have some spare time (which, sadly, is likely about 2 weeks from today).

BTW, adding a "restart" operation to the HTTP api would be super helpful for situations like this. Currently, I have to manually log in to each bulb and click "Restart" when they lose connection to broker.

@hamishfagg
Author

@stelgenhof just installed the new version on both of my AiLights and they are working. Btw, src/html.gz.h is now missing, and it wouldn't compile without adding it back in from the older version.

Working for now, I will have to wait for another server issue to test the new features.

@stelgenhof
Owner

@IVData Glad to hear it is working now!

As for the html.gz.h it is not in the develop branch of the repository as generally this file needs to be built during the compilation. If this file is missing, it means you haven't installed the NodeJS (NPM) dependencies yet. Just run the following command: npm install (or yarn install) in your repository folder.

Note that this html.gz.h will for sure be included in a normal release of the firmware, but for the develop branch I sometimes don't add it :)

I've also updated the Wiki to add a step about installing the NPM dependencies; it was missing.

Cheers! Sacha

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@hamishfagg
Author

I've been using this new version ever since and I've had a few server hiccups since then. Not a single issue with the lights since the new version.

@stelgenhof
Owner

@IVData Happy to hear that! There are still a few kinks in the new version though :)

@stelgenhof stelgenhof added the bugfix Fix to resolve an internal or dependency bug label Mar 4, 2020
@github-actions

github-actions bot commented Apr 4, 2020

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@stelgenhof stelgenhof added release Issues that are set to be fixed in a release and removed no-issue-activity labels Apr 4, 2020
@stelgenhof
Owner

Closing as considered resolved in v1.0.0


3 participants