loses connection after some time without noticing #395

Closed
Flo-Kra opened this issue Feb 14, 2018 · 5 comments
Flo-Kra commented Feb 14, 2018

Hi there!
I am currently working on an ESP8266 (Wemos D1) based WiFi-enabled thermostat that uses PubSubClient for the MQTT part. I have already used it in some smaller projects with no major issues (apart from some random disconnects, but there the connection is of course re-established every time). I started with the mqtt_reconnect_nonblocking.ino example, but since it does block the sketch when the MQTT broker is down, I added some logic to increase the reconnect interval once the connection has failed 3 times.
Now I have the issue that after some time, about 2 hours, the connection stops working but the reconnect never takes place. It seems that the client.connected() method always returns true. The client no longer receives or publishes values, but the rest of the sketch works as expected. I can even reconnect MQTT via an HTTP call I implemented, and then it works again for about 2 hours.
Is there a bug that causes the actual connection state not to be returned in some cases? Any ideas how I could improve detection of a stalled connection? I already thought of publishing a "heartbeat" message every few minutes to one of the subscribed topics and resetting the connection if it does not arrive for some time, but I don't really like that as long as I don't know what actually causes this issue.
I have read literally all issues on GitHub regarding connection problems and tried some of the hints described there, but they were outdated, as the most commonly recommended solution is now already in the examples as mqtt_reconnect_nonblocking.ino.
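
The heartbeat idea mentioned above could be sketched roughly as follows. This is only an illustration, not part of PubSubClient: the class name `HeartbeatWatchdog` and its methods are hypothetical, and timestamps are passed in explicitly (on the ESP they would presumably come from `millis()`) so the logic can be tested off-device. Unsigned subtraction keeps the comparisons valid across the 32-bit timer rollover.

```cpp
#include <cstdint>

// Hypothetical helper (not part of PubSubClient): tracks a heartbeat that is
// published to a topic the client is also subscribed to.
class HeartbeatWatchdog {
public:
  HeartbeatWatchdog(uint32_t sendIntervalMs, uint32_t timeoutMs)
      : sendIntervalMs_(sendIntervalMs), timeoutMs_(timeoutMs) {}

  // Call once after every successful (re)connect.
  void reset(uint32_t nowMs) {
    lastSentMs_ = nowMs;
    lastSeenMs_ = nowMs;
  }

  // Returns true when the next heartbeat should be published,
  // and remembers that moment as the last send time.
  bool shouldSend(uint32_t nowMs) {
    if (nowMs - lastSentMs_ >= sendIntervalMs_) {
      lastSentMs_ = nowMs;
      return true;
    }
    return false;
  }

  // Call from the MQTT callback when the heartbeat message arrives.
  void onHeartbeatReceived(uint32_t nowMs) { lastSeenMs_ = nowMs; }

  // Returns true when no heartbeat has arrived within the timeout,
  // i.e. the connection looks stalled and should be reset.
  bool isStalled(uint32_t nowMs) const {
    return nowMs - lastSeenMs_ > timeoutMs_;
  }

private:
  uint32_t sendIntervalMs_;
  uint32_t timeoutMs_;
  uint32_t lastSentMs_ = 0;
  uint32_t lastSeenMs_ = 0;
};
```

In a sketch, `shouldSend()` would gate a publish to the heartbeat topic, `onHeartbeatReceived()` would be called from the message callback, and `isStalled()` would trigger a forced reconnect. The timeout should be a few multiples of the send interval so that a single lost message does not reset the connection.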

This is my current code for MQTT stuff where mqttClientInit() is called in setup() and mqttHandleConnection() is called in loop().

Maybe someone has an idea what I am doing wrong or what is causing this? Does anyone else experience the same issue?


bool mqttReconnect() {
  mqttclient.disconnect();
  delay(5);
  String mqttClientId = "ESP8266Client-";   // Create a random MQTT client ID
  mqttClientId += String(random(0xffff), HEX);
  Serial.print("connecting to MQTT broker");

  bool connRes;
  connRes = mqttclient.connect(mqttClientId.c_str());
  Serial.print("... attempt: ");
  Serial.println(mqttReconnectAttempts);

  if (connRes) {
    Serial.print("MQTT connected. Reconnects: ");
    Serial.println(mqttReconnects);
    mqttReconnects++;

    Serial.print("Subscribed to: ");
    Serial.println(mqtt_topic_in);   // print the topic the message above announces
    mqttclient.subscribe(mqtt_topic_in);
  }
  else {
    Serial.print("MQTT connect FAILED, rc=");
    Serial.println(mqttclient.state());
  }
  return mqttclient.connected();
}

void mqttClientInit() {
  mqttclient.setServer(mqtt_server, mqtt_port);
  mqttclient.setCallback(mqttCallback);
  mqttLastReconnectAttempt = 0;
  mqttReconnectAttempts = 0;
}

void mqttHandleConnection() {
  if ( WiFi.status() == WL_CONNECTED ) {
    if (!mqttclient.connected()) {
      unsigned long mqttReconnectAttemptDelay;   // same type as millis()
      if (mqttReconnectAttempts < 3) mqttReconnectAttemptDelay = 15000UL;
      else mqttReconnectAttemptDelay = 60000UL;   // back off after 3 failed attempts

      if ((millis() - mqttLastReconnectAttempt) > mqttReconnectAttemptDelay) {
        mqttLastReconnectAttempt = millis();
        mqttReconnectAttempts++;
        if (mqttReconnect()) {
          mqttReconnectAttempts = 0;
        }
      }
    }
    else {
      mqttclient.loop();
    }
  }
}
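
The back-off rule used in mqttHandleConnection() (15 s between attempts, 60 s once three attempts have failed) can also be isolated into a small, testable unit. The following is a minimal sketch under that reading of the code above; the struct `ReconnectBackoff` and its member names are hypothetical and do not exist in PubSubClient or in the original sketch.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the delay logic of mqttHandleConnection().
struct ReconnectBackoff {
  uint32_t shortDelayMs = 15000;  // delay while fewer than fastAttempts have failed
  uint32_t longDelayMs = 60000;   // delay after that
  uint16_t fastAttempts = 3;
  uint32_t lastAttemptMs = 0;
  uint16_t failedAttempts = 0;

  // Delay that currently applies between two connection attempts.
  uint32_t currentDelay() const {
    return failedAttempts < fastAttempts ? shortDelayMs : longDelayMs;
  }

  // True when enough time has passed since the last attempt.
  // Unsigned subtraction keeps this correct across timer rollover.
  bool shouldAttempt(uint32_t nowMs) const {
    return (nowMs - lastAttemptMs) > currentDelay();
  }

  // Record the outcome of an attempt made at nowMs.
  void recordAttempt(uint32_t nowMs, bool connected) {
    lastAttemptMs = nowMs;
    if (connected) {
      failedAttempts = 0;
    } else {
      ++failedAttempts;
    }
  }
};
```

With such a helper, the loop would call `shouldAttempt(millis())` while disconnected and pass the result of the connect call to `recordAttempt()`. Keeping the timing separate from the network calls makes it easier to verify the intervals without waiting on real hardware.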

thank you!
Florian

Flo-Kra commented Feb 15, 2018

Meanwhile I found out that it's actually not an issue with PubSubClient but with the WiFi connection.
It seems to drop after some time and does not reconnect properly.
Whenever that happens, the HTTP server on the ESP is still accessible, but WiFi.status() returns WL_DISCONNECTED instead of WL_CONNECTED from then on, so the PubSubClient reconnect routine never gets executed.
It is possibly a bug in the WiFi manager library I am using in this sketch, but I am still investigating. For now I'll try without the WiFi manager, as the ESP8266 core library should be able to take care of that itself. I only used the library for initial configuration via AP mode, which is not that important at the moment.
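
Because the MQTT handling in the sketch above only runs while WiFi.status() reports WL_CONNECTED, a status that stays stuck on "disconnected" silently disables the reconnect routine. One possible way to make that visible, sketched here purely as an assumption and not as a fix confirmed in this thread, is to track how long the reported status has been down and ask for a WiFi restart after a grace period. The names `LinkAction` and `WifiStatusGate` are hypothetical; the caller would pass in something like `WiFi.status() == WL_CONNECTED` and the current `millis()` value.

```cpp
#include <cstdint>

// What the main loop should do on this iteration.
enum class LinkAction { HandleMqtt, Wait, ForceWifiReconnect };

// Hypothetical helper: decides what to do based on the *reported* WiFi state.
class WifiStatusGate {
public:
  explicit WifiStatusGate(uint32_t graceMs) : graceMs_(graceMs) {}

  LinkAction update(bool wifiReportsConnected, uint32_t nowMs) {
    if (wifiReportsConnected) {
      down_ = false;
      return LinkAction::HandleMqtt;
    }
    if (!down_) {
      // First iteration that sees "disconnected": start the grace period.
      down_ = true;
      downSinceMs_ = nowMs;
      return LinkAction::Wait;
    }
    if (nowMs - downSinceMs_ >= graceMs_) {
      // Status has been down for too long: request a restart of the WiFi
      // connection and begin a new grace period.
      downSinceMs_ = nowMs;
      return LinkAction::ForceWifiReconnect;
    }
    return LinkAction::Wait;
  }

private:
  uint32_t graceMs_;
  uint32_t downSinceMs_ = 0;
  bool down_ = false;
};
```

How the sketch reacts to `ForceWifiReconnect` (restarting the station connection, rebooting, or just logging) is left open here, since the appropriate call depends on the core version and WiFi manager in use.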

Flo-Kra commented Feb 15, 2018

Just figured it out: it's a bug in the new ESP8266 Arduino core libraries v2.4.0.
I recently updated from 2.4.0-rc2 to 2.4.0 and only had these issues after that. I have now downgraded again and recompiled, and everything seems to work.

More info here:
esp8266/Arduino#4166
esp8266/Arduino#4161

Flo-Kra closed this as completed Feb 15, 2018

nedimhx commented Nov 28, 2018

@Flo-Kra is there a solution to fix this, or is downgrading the core the only option?

Flo-Kra commented Nov 29, 2018

> @Flo-Kra is there a solution to fix this, or is downgrading the core the only option?

I think it was fixed as of core version 2.4.1 (the current version is now 2.4.2), but I haven't tested it yet; I'm still using 2.4.0-rc2 for my own projects.

Also take a look here - I use this firmware on many devices and it runs very stable:
https://github.com/arendst/Sonoff-Tasmota/wiki/Theo's-Tasmota-Tips

The developer now recommends core 2.4.2, but some variants of the binaries he releases are still compiled with 2.3.0 because there seem to be new bugs in 2.4.2 (mainly affecting sleep mode; WiFi seems to work again now). According to the changelog ( https://github.com/esp8266/Arduino/releases ), 2.4.1 has a memory leak in WiFiClient that was fixed in 2.4.2, so 2.4.1 should be avoided.

nedimhx commented Nov 30, 2018

@Flo-Kra I'm using 2.4.2, but from what I noticed the problem is not with WiFi. I tested it as follows, and what happened is strange.

The NodeMCU is connected to MQTT and publishing data. I have set it up to publish each value to a separate topic: topicname/DHTtemp, topicname/DHThum, topicname/DStemp...

Everything works until I unplug the LAN cable from the WiFi router. The NodeMCU keeps trying to publish for about 4 seconds after data stops flowing over the WiFi connection, and then everything stops at one of the publish lines in the code. The other functions and the many if/else branches no longer run. It seems as if everything stalls in the PubSubClient client.publish call when there is no internet connection.
