Random Disconnect (IDFGH-1968) #102
Comments
I'm running into similar issues where From some basic debugging I did I've found that this is seemingly caused by concurrent calls to It seems that we may both be facing the issue described in #90 and maybe others, and while not identical, I do think it basically boils down to the fact that the MQTT client is not thread-safe yet, yet does some thread operations itself. It seems that this recent commit might fix that issue though as, if I understand it correctly, it introduces locking and prevents two threads/tasks from writing to it at the same time: 752953d. Update: I've done some preliminary testing and it seems that this commit fixes the issue I was seeing; I am no longer able to break the connection by rapidly sending requests. |
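For readers following along, the kind of fix described in the comment above can be sketched as a mutex that guards every write to the shared connection. This is only an illustration using POSIX threads, not the FreeRTOS primitives the library would use on the ESP32; `client_t`, `client_init`, `client_publish` and the `bytes_written` counter are hypothetical names, not part of the esp-mqtt API.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical client: the shared transport is modeled as a byte counter. */
typedef struct {
    pthread_mutex_t lock;    /* serializes all writers */
    size_t bytes_written;    /* stand-in for data pushed to the transport */
} client_t;

int client_init(client_t *c) {
    c->bytes_written = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Callable from several threads: only one writer touches the transport at a time. */
int client_publish(client_t *c, const char *payload, size_t len) {
    (void)payload;                       /* a real client would serialize it */
    if (pthread_mutex_lock(&c->lock) != 0) {
        return -1;
    }
    c->bytes_written += len;             /* the actual socket write would go here */
    pthread_mutex_unlock(&c->lock);
    return (int)len;
}
```

Without the lock, two tasks publishing at the same moment could interleave their bytes on the socket, which a broker would likely treat as a malformed packet and answer by closing the connection.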
More info on the issue: @bverhoeven we could be on something... I had two (similar, but different) issues. First a little information:
The 1st problem was related to the async nature of the MQTT lib.
The 2nd is worse, and not yet resolved. The connection drops if I receive a QoS 2 message (again) and, during the 4-way handshake, I decide to publish some data (again QoS 2).
This event should only be triggered after completing the handshake of the selected QoS. I will look at the commit you mention, and if it solves this second issue I will backport it to the ESP-MQTT_FOR_IDF_3.1 tag. But I still think that a different approach is needed... |
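To make the handshake ordering concrete, here is a rough receiver-side sketch of the QoS 2 exchange (PUBLISH, PUBREC, PUBREL, PUBCOMP) in which the application is notified only once PUBREL has arrived, which is the behavior the comment above argues for. All type and function names are hypothetical, and this is not how esp-mqtt itself is structured.

```c
#include <stdbool.h>
#include <stddef.h>

/* Receiver-side states for one inbound QoS 2 message (illustrative only). */
typedef enum { QOS2_IDLE, QOS2_WAIT_PUBREL, QOS2_DONE } qos2_state_t;

/* Packets the receiver can get from the sender during the handshake. */
typedef enum { PKT_PUBLISH, PKT_PUBREL } packet_t;

typedef struct {
    qos2_state_t state;
    bool data_event_fired;   /* true once the application has been notified */
} qos2_rx_t;

/* Advance the handshake. Returns the name of the packet to answer with,
 * or NULL when the input is not expected in the current state of this sketch. */
const char *qos2_rx_step(qos2_rx_t *rx, packet_t in) {
    if (rx->state == QOS2_IDLE && in == PKT_PUBLISH) {
        rx->state = QOS2_WAIT_PUBREL;    /* payload is held, not yet delivered */
        return "PUBREC";
    }
    if (rx->state == QOS2_WAIT_PUBREL && in == PKT_PUBREL) {
        rx->state = QOS2_DONE;
        rx->data_event_fired = true;     /* notify the application only now */
        return "PUBCOMP";
    }
    return NULL;
}
```

With this ordering, application code that publishes from inside the data event cannot interleave a new QoS 2 publish with the unfinished inbound handshake.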
Hi @homeit-hq It seems the issues you are experiencing are caused by the library not being thread safe in the referenced version. The version you are using with 3.1 has many issues which have been resolved for IDF 3.2 and it's not so easy to backport due to updates in tcp_transport which became a component in 3.2. However the commit 752953d does seem to be cherry-picked. Have you tried to test it with this change? |
FYI, let me attach a patch here applying this "locking" commit to 0001-added-mqtt-api-locks-so-methods-can-be-executed-from.patch.txt |
Hi @homeit-hq Any update about this issue? Have you had a chance to test the patch? Or better than that, use IDF 3.2? |
Just checking in to say that I am experiencing very similar random disconnects in ESP-IDF v3.3. I'm running a standard mosquitto broker on the Raspberry Pi, have tried connecting via MQTT and WebSockets, and on both connection types I get this error after a random number of published or received messages:
I do not send a lot of messages (it's somewhere in the 1Hz range), but it does appear fairly frequently and is quite a nuisance since reconnecting takes a while with default settings, and there is no way to change the reconnect delay aside from tampering with mqtt_config.h. I also sometimes notice that a lot of ESP32s disconnect simultaneously; however, this does NOT happen with any other MQTT client connected to the same broker. |
I have the same problem, disconnects every few hours. Log from my ESP:
Notice the Read error or end of stream. At the same time on my Mosquitto log from my Raspberry Pi:
Notice the Socket error on client ESP32_c10560, disconnecting. Using:
Moved away from knolleary/pubsubclient because that would lock up my ESP32. This library has its own problems though... |
Let's try test.mosquitto.org then. After 2 hours running fine, suddenly the watchdog:
What's going on here? |
@wijnsema, since this isn't an actual disconnect from MQTT it's probably worth another Issue. Yours could be some kind of weird deadlock situation between the two threads, but since RSSI is involved it might not be MQTT's fault? |
@wijnsema @Xasin Thanks for sharing these disconnection problems. Will soon update submodules for all IDF releases as said in #135 |
Thanks for picking this up @david-cermak! Just started testing with master of esp-mqtt, within esp-idf 3.3. Test running for 10 minutes now, so far so good... |
@david-cermak, I fear my ESP still has problems with the network. Here's the log: I made sure to re-flash the ESP with the latest esp-mqtt version pulled from master. The problem always seems to occur when sending a packet and can be exacerbated by either sending or receiving more packets in a short amount of time. The packets I send/receive on a regular basis are mostly QOS 0, however, a few QOS 1 packets are present. |
@Xasin Can you please check logs from the broker to see the reason for disconnection? |
I'm checking the mosquitto log right now, just restarted it with
Idea: Although I am just noticing in the timestamps of the log ... |
This should not be an issue, as control packets seem frequent enough that PINGREQs are not needed (see keep-alive spec). So you are actually saying the messages are published (no errors) according to the ESP log, but some of them won't appear in the broker's log -- so will not be delivered, correct? |
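The keep-alive reasoning above can be sketched roughly as follows: under MQTT 3.1.1, any control packet sent by the client resets the keep-alive timer, and the broker closes the connection if it hears nothing for one and a half times the keep-alive interval. The function names and the millisecond bookkeeping are assumptions made for this illustration, not esp-mqtt or mosquitto internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Client side: a PINGREQ is only needed when no control packet
 * has been sent for a full keep-alive period. */
bool pingreq_needed(uint32_t now_ms, uint32_t last_tx_ms, uint32_t keepalive_s) {
    if (keepalive_s == 0) {
        return false;                    /* keep-alive disabled */
    }
    return (now_ms - last_tx_ms) >= keepalive_s * 1000u;
}

/* Broker side: the client is dropped after more than 1.5x the
 * keep-alive interval without any control packet. */
bool broker_would_disconnect(uint32_t silent_ms, uint32_t keepalive_s) {
    if (keepalive_s == 0) {
        return false;
    }
    return silent_ms > keepalive_s * 1500u;
}
```

With an assumed keep-alive of 120 s, a client publishing about once per second never reaches the point where a PINGREQ is required, and a silent gap has to exceed 180 s before the broker disconnects for keep-alive reasons.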
Just an interim score from my testlab: no disconnects after 4 hours, this version definitely seems to be an improvement! I will keep this running for now. |
It seems so, yes.
The ESP MQTT layer definitely stops sending out messages for 15 seconds, including any Keep-Alive signals. There does not seem to be a packet delivery failure though, which makes this even weirder. |
Nope, my own code seems to be working fine. Edit |
@Xasin This library does not silently drop packets; if no error message appears in the log, it means the published data was written OK to the socket. There is one exception, though: when the client is not connected and QoS>0, messages are not sent out to the network, just stored in the outbox (to be resent later) -- but that is not your case, as the client IS connected. Could you please insert a debug message into mqtt_write_data() to check that the data is successfully written to the network? |
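As a rough model of the rule described above, the decision for each publish can be written as a small function. The names are hypothetical and do not mirror esp-mqtt internals; in particular, rejecting QoS 0 messages while offline is an assumption of this sketch, since the comment only states what happens for QoS>0.

```c
#include <stdbool.h>

/* What happens to a message handed to the client (illustrative model). */
typedef enum {
    ACTION_SEND_NOW,          /* written to the socket immediately */
    ACTION_QUEUE_IN_OUTBOX,   /* stored and resent after reconnecting */
    ACTION_REJECT             /* assumption: nothing is stored for QoS 0 */
} publish_action_t;

publish_action_t decide_publish(bool connected, int qos) {
    if (connected) {
        return ACTION_SEND_NOW;
    }
    if (qos > 0) {
        return ACTION_QUEUE_IN_OUTBOX;   /* the one case where nothing hits the network */
    }
    return ACTION_REJECT;                /* assumed behavior, not confirmed in this thread */
}
```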
After running for more than 22 hours without disconnects I dare to say this is stable now. Hopefully the updates and fixes of this library quickly make it to the mainstream versions of ESP-IDF because frankly, without them this library is unusable IMHO. Thanks for helping @david-cermak, and @Xasin, good luck with fixing your disconnects. |
A first assessment of this shows that the MQTT library might not be at fault. I'll try running the MQTT connection to a public test broker to rule out the server as the source of the problem - but keep in mind that no other MQTT client suffers these kinds of disconnects on the exact same broker. |
Alright, I think I gotta move to a different culprit when it comes to this problem. This could be a weird artefact of the WiFi network that I am using, during which the ESP does not actually lose WiFi connection, but does lose connection with the MQTT server itself. For clarification, the WiFi AP I am using is the network hosted by the Raspberry Pi Zero itself, which is running in Dual AP/STA mode and is simultaneously connected to the local WiFi (giving it internet access). Sorry for the inconvenience, and thanks a lot for the good help here. |
Thank you @Xasin and @wijnsema for your updates, good to hear that random disconnects got resolved! |
Closing as this was resolved and the referring commit was backported to all IDF releases |
TAG: ESP-MQTT_FOR_IDF_3.1
Hi, I'm using this lib to connect to an MQTT broker using SSL.
The problem I'm facing is that sometimes I get random disconnects.
Sometimes I get
TRANS_SSL: mbedtls_ssl_write error, errno=Success
MQTT_CLIENT: Error write data or timeout, written len = -27648
other times I get
TRANS_SSL: mbedtls_ssl_write error, errno=Bad file number
MQTT_CLIENT: Error write data or timeout, written len = -27648
This is the log (full verbose mode) for the "Success" error.
I've checked and I'm only calling/using the lib through the same task.
Can you help me?
thanks
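A note on the "written len = -27648" value in both logs: Mbed TLS reports failures as negative codes that its headers list in hexadecimal, and -27648 is -0x6C00. In the Mbed TLS 2.x `ssl.h` that value appears to correspond to `MBEDTLS_ERR_SSL_INTERNAL_ERROR`, but the mapping should be checked against the headers shipped with the IDF release in use. The helper below is a hypothetical convenience for converting such a return value into the searchable hex form; it is not part of Mbed TLS or esp-mqtt.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format a negative Mbed TLS-style return code as "-0xNNNN" so it can be
 * looked up in the library headers. Returns the snprintf() result. */
int format_tls_error(int ret, char *buf, size_t buflen) {
    return snprintf(buf, buflen, "%s0x%04X",
                    ret < 0 ? "-" : "", (unsigned)abs(ret));
}
```

Logging the code in this form next to the decimal value makes it easier to tell a TLS-layer failure apart from a plain socket error such as "Bad file number".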