Spurious disconnect when Homie connects over TLS/SSL #46
The following sketch is basically a copy of your
#include <Homie.h>

const int PIN_RELAY = 16;

HomieNode lightNode("light", "switch");
HomieNode pingNode("ping", "pong");

// Switches the relay according to the received payload ("true" or "false").
bool lightOnHandler(String message) {
  if (message == "true") {
    digitalWrite(PIN_RELAY, HIGH);
    Homie.setNodeProperty(lightNode, "on", "true"); // Update the state of the light
    Serial.println("Light is on");
  } else if (message == "false") {
    digitalWrite(PIN_RELAY, LOW);
    Homie.setNodeProperty(lightNode, "on", "false");
    Serial.println("Light is off");
  } else {
    return false; // Unknown payload: report it as not handled
  }
  return true;
}

// Answers any message on the ping node with a fixed "pong" value.
bool pingHandler(String message) {
  Serial.println("ping here");
  Homie.setNodeProperty(pingNode, "pong", "Bonjour!");
  return true;
}

void setup() {
  pinMode(PIN_RELAY, OUTPUT);
  digitalWrite(PIN_RELAY, LOW);
  Homie.setFirmware("sub2-relay", "1.0.0");
  lightNode.subscribe("on", lightOnHandler);
  pingNode.subscribe("on", pingHandler);
  Homie.registerNode(lightNode);
  Homie.registerNode(pingNode);
  Homie.setup();
}

void loop() {
  Homie.loop();
}
When run over a plain MQTT connection it works. When running over a TLS connection the device disconnects and reconnects randomly and seemingly doesn't react at all (neither to light nor to ping). I'm hoping this can help you diagnose the issue. |
The issue is obviously caused by heap usage. I'll try to remove most of the dynamic allocations and allocate on the stack instead. The good thing is that I am learning a lot while resolving this issue! |
Could you please try again? It seems stable on my side after 2be5da0 |
I'm sorry to disappoint you, but no change here. In plain mode all works fine as far as I can tell. In TLS mode, the moment I try to talk to the device (e.g. relay on), it disconnects. |
So, testing with the above sketch on TLS again, this time a bit more carefully. The screenshot shows what happens initially: In fact, I don't even have to try and switch "light on"; the device disconnects by itself quite periodically, whereupon the broker logs
The first couple minutes of the socket errors seen on the broker in human readable times show this happens about every 5 seconds:
|
If I comment out the two lines in above sketch: // pingNode.subscribe("on", pingHandler);
// Homie.registerNode(pingNode); the device is much more stable: after the initial disconnect it stays up and I can talk to it. |
There are still some dynamic allocations done:
A solution would be to completely avoid these dynamic allocations, but it would mean setting hard limits in the code:
... etc. If this does not solve the issue, then the problem is that the framework is oversized for the ESP8266 |
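To illustrate the kind of hard limit discussed above, here is a minimal sketch of a fixed-size buffer that replaces a heap-allocated string. The names `FriendlyName` and `MAX_FRIENDLY_NAME_LENGTH`, and the limit of 32 characters, are hypothetical and are not taken from the Homie code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical hard limit, chosen only for illustration.
constexpr std::size_t MAX_FRIENDLY_NAME_LENGTH = 32;

// Holds a device name in a fixed-size buffer, so no heap allocation is needed.
struct FriendlyName {
  char value[MAX_FRIENDLY_NAME_LENGTH + 1] = {0};

  // Copies `name` into the buffer. Returns false and leaves the buffer
  // untouched if the name exceeds the hard limit.
  bool set(const char* name) {
    assert(name != nullptr);
    const std::size_t length = std::strlen(name);
    if (length > MAX_FRIENDLY_NAME_LENGTH) return false;
    std::memcpy(value, name, length + 1);  // +1 copies the terminating '\0'
    return true;
  }
};
```

The trade-off is the one mentioned in the comment: memory use becomes predictable, but any value longer than the limit has to be rejected or truncated.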
I think it's a case of memory being clobbered somewhere ... When you said "Limit the max friendly $name of a device", I thought "Hey!", and changed
followed by this: though not readily reproducible. |
I've been going backwards in largish steps to commit f6ee49a to see if this has anything to do with me showing up on the scene here. It doesn't. 😄 All the time we've spoken so far, I was testing with a Wemos-D1 mini. I didn't really expect it would make a difference, but the NodeMCU (Amica) behaves identically: the above sketch just won't behave properly. |
To be honest, when I implemented SSL I never tested it. SSL encryption takes up more than a quarter of the available heap, so that does not surprise me. |
Does the following information help at all? Of course I realize I'm comparing apples and potatoes; I'm just trying to be helpful... Screenshot of memory requirements for Screenshot of memory requirements for a simple MQTT over TLS using Imroy/pubsubclient: |
And you are helpful! But this piece of information shows stack usage, not heap, which is our problem here. I am working on moving everything to the stack in a clean way. The problem is this will be a breaking change. |
Well, in my possibly not very humble opinion, you should go ahead and break it so that it's fixed! 😄 |
You're right! |
It's not a breaking change, so good news, and the problematic sketch seems to work. Could you please test? |
By the way, maybe fingerprint is fixed too? |
The good news is, it is much better, the bad news is, it is not good. I have the following observations:
Looking slightly closer, I notice that you've changed the way subscriptions work. Unlike before, when individual subscribes were issued, I now see a single wildcard subscription:
I think Homie is literally killing itself! My previous experience with PubSubClient on Arduino shows that if I send out a few messages "en bloc" it just dies. In this particular case, Homie is sending out several publishes, e.g. during setupHandler, and they're being bounced back to the device; I think it then just dies. I will try, time permitting, to observe this a little closer, but I believe that ought to be the reason... Be that as it may, congrats so far! |
What would probably help a bit to mitigate the issue is to narrow down the wildcard to something like this:
This would prevent most publishes from a Homie device being sent back to the same Homie device. I'm 3/4 or 7/8 convinced the device is hanging itself up due, not to your code, but to PubSubClient. |
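The effect of narrowing a wildcard can be checked with a small MQTT topic-filter matcher. This is only a sketch of the standard `+` and `#` wildcard rules, not code from Homie or PubSubClient; the function name `topicMatchesFilter` and the topics used in the example afterwards are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `topic` matches the MQTT subscription `filter`.
// '+' matches exactly one topic level; '#' matches all remaining levels.
// Assumes a well-formed filter, i.e. '#' only appears as the last character.
bool topicMatchesFilter(const std::string& filter, const std::string& topic) {
  std::size_t f = 0;
  std::size_t t = 0;
  while (f < filter.size()) {
    if (filter[f] == '#') return true;
    if (filter[f] == '+') {
      // Skip one whole level of the topic.
      while (t < topic.size() && topic[t] != '/') ++t;
      ++f;
      continue;
    }
    if (t >= topic.size()) {
      // Topic is exhausted: only a trailing "/#" may remain ("a/#" matches "a").
      return filter.compare(f, std::string::npos, "/#") == 0;
    }
    if (filter[f] != topic[t]) return false;
    ++f;
    ++t;
  }
  return t == topic.size();
}
```

With a hypothetical broad filter such as `devices/abc123/#`, a state publish like `devices/abc123/light/on` matches and is echoed back to the device. With a narrower hypothetical filter such as `devices/abc123/+/+/set`, only the command topic `devices/abc123/light/on/set` matches.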
You're right, the wildcard subscription change is due to an issue with PubSubClient when subscribing to 5 topics or more. Subscribing to only |
I've had a Homie device running since my last comment (approx 24h) on TLS, and it's still publishing three messages every 10s:
I do note, however, that it periodically disconnects and then reconnects. The broker logs
|
I can't optimize things more than this. If it's still an issue, it has to be solved in PubSubClient or in the Arduino for ESP8266 SSL implementation. 😕 |
I have an idea. By default, the ESP runs at 80MHz. I have seen projects running at 160MHz when using SSL. With twice the computing power, maybe the
As long as the chip then doesn't burn a hole through my desk, I'd say "let's try it"! ;-) |
Could you please try your "heavy" sketch? |
One word: SHIP IT! (Ok, those were two words, but you get the drift.)
You've done a tremendous job, thank you. I would like to recommend adding something like this to the Wiki:
Fingerprint
Adding a TLS fingerprint effectively pins the device to a particular certificate. Furthermore, as currently implemented by WifiSecureClient(), both
enabling fingerprint in Homie will work only if |
Great! We made it. Thanks a lot for your tests, they helped so much. I also added a note to the docs. Note that the Wiki is no longer used: it always matched the latest git revision, so people using an older version could not find instructions for their version. The docs are now embedded in the repo. 😉 |
You said you were monitoring for 24h, is it still alright? |
I didn't want to bother you with it ... ;-) Yes, it's basically alright. I've had the following reconnects during the past 24h, but much better than it was:
So, I'd say it's fine. |
Awesome! Maybe the 2.2.0 release of the Arduino for ESP8266 will fix things even more, but I can't do more in Homie. Can we close? |
Fermé (closed)! ;-) |
As first reported in #44 I'm seeing a disconnect when Homie connects via TLS (this does not occur in plain connections). Screenshot:
Server-side, the MQTT broker identifies an SSL handshake error, even after it sees the initial publishes from the device:
It would appear to me there's something corrupt going over the wire.
After the second connection succeeds, the Homie device carries on happily publishing for hours on end. This behaviour is reproducible.