ESP-MQTT publish / disconnect race condition (IDFGH-573) #2975
Comments
Hi @mikaelkanstrup, the current version of the mqtt library does not yet support publishing from a different thread. This should be fixed in the next few weeks.
@david-cermak Oh ok. Thanks a lot for clarifying! Do you want me to close this issue as this "works as designed"? I did a naive patch protecting just the abort_connection + publish case with a semaphore. I think that'll do for me until official support is there.
Please keep this open to help track the issue. It will be closed once a fix is merged. Yes, adding a mutex to protect the API should be enough to work around the issue for now.
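As a rough illustration of the mutex workaround discussed above, here is a minimal sketch in portable C using pthreads. It is a model only, not the actual patch and not the library's code: model_client_t, model_client_connect, model_abort_connection and model_publish are hypothetical names, and the transport is reduced to a plain pointer. On ESP-IDF the same shape would presumably use a FreeRTOS mutex (xSemaphoreCreateMutex, xSemaphoreTake, xSemaphoreGive) shared by the abort path and the publish path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the client state; NOT the real esp_mqtt_client struct. */
typedef struct {
    pthread_mutex_t lock;  /* guards every field below */
    bool connected;        /* true while the connection is considered usable */
    int *transport;        /* placeholder for the transport handle */
    int sent;              /* number of messages handed to the transport */
} model_client_t;

static int model_transport_storage;  /* pretend transport object */

/* Puts the model into the "connected" state. */
void model_client_connect(model_client_t *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->lock, NULL);
    c->connected = true;
    c->transport = &model_transport_storage;
    c->sent = 0;
}

/* Models the internal task tearing the connection down after a receive error. */
void model_abort_connection(model_client_t *c)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    c->transport = NULL;   /* transport is gone from here on */
    c->connected = false;
    pthread_mutex_unlock(&c->lock);
}

/* Models a publish from the application task.
 * Returns 0 on success, -1 if the connection is no longer usable. */
int model_publish(model_client_t *c)
{
    int rc = -1;
    assert(c != NULL);
    pthread_mutex_lock(&c->lock);
    /* The state check and the transport access happen under the same lock,
     * so an abort cannot slip in between them. */
    if (c->connected && c->transport != NULL) {
        c->sent++;
        *c->transport = c->sent;  /* stands in for writing to the socket */
        rc = 0;
    }
    pthread_mutex_unlock(&c->lock);
    return rc;
}
```

The point of the sketch is that the lock has to cover both sides: wrapping only the application's publish call does not help unless the teardown path takes the same lock.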
closes #67, closes #90, closes espressif/esp-idf#2975
…ng/receiving different data and references esp-mqtt commits to pass these tests
testing conditions:
  transports (tcp, ssl, ws..)
  qos (0, 1, 2)
  short repeated messages (packed packets)
  oversized messages (fragmented packets)
  publish from a different thread
Closes espressif/esp-idf#2870 by means of including commit 815623d from esp-mqtt
Closes espressif/esp-idf#2975 by means of including commit 752953d from esp-mqtt
Closes espressif/esp-idf#2850 by means of including commits df455d2 17fd713 from esp-mqtt
Environment
IDF version (run git describe --tags to find it): v3.3-beta1-212-g1ad8b96
Compiler version (run xtensa-esp32-elf-gcc --version to find it): 1.22.0-80-g6c4433a
Problem Description
I'm experiencing crashes on wifi disconnect in an application that uses ESP-MQTT. Digging into the implementation of ESP-MQTT in mqtt_client.c, I traced these crashes to a race condition between the internal ESP-MQTT task, mqtt_task, and my application task.
In a simple flow the application runs:
Looking at the sources of mqtt_client.c, when esp_mqtt_client_start is called an internal task called mqtt_task is started. The mqtt_task handles initialization, mqtt connect, reconnect, state transitions, ping, etc.
When the mqtt connection is established the mqtt_task is mostly waiting in a mqtt_process_receive call.
I'm struggling to understand how the API is meant to be used without running into race conditions between the application task and the internal mqtt_task.
Simplified, I have an application that looks like this:
In my scenario the wifi disconnect flow is problematic: my application task and the internal mqtt_task race as follows:
mqtt_task:
mqtt_process_receive returns an error due to wifi disconnect and calls esp_mqtt_abort_connection which in turn updates the internal state, closes down the transport etc.
my app task:
Checks that mqtt is connected (based on mqtt callback handler status) and calls esp_mqtt_client_publish
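The interleaving described above is a check-then-act race. The following sketch replays it deterministically in a single thread so the ordering is explicit. It is a simplified model under stated assumptions, not the library's code: race_model_t and all function names here are hypothetical, and where the real code would touch a closed transport the model just returns an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared state; not taken from mqtt_client.c. */
typedef struct {
    bool connected;       /* status the application tracks via the event callback */
    bool transport_open;  /* whether the transport can still be written to */
} race_model_t;

/* Application task, step 1: check the connection status. */
bool app_check_connected(const race_model_t *m)
{
    assert(m != NULL);
    return m->connected;
}

/* Internal task: receive failed, so the connection is torn down. */
void task_abort_connection(race_model_t *m)
{
    assert(m != NULL);
    m->transport_open = false;
    m->connected = false;
}

/* Application task, step 2: publish based on the earlier check.
 * Returns 0 if the write would succeed, 1 if the publish was skipped,
 * and -1 if it would have used a transport that is already closed. */
int app_publish_after_check(const race_model_t *m, bool check_result)
{
    assert(m != NULL);
    if (!check_result) {
        return 1;
    }
    return m->transport_open ? 0 : -1;
}

/* Replays the problematic ordering: check, abort, then publish. */
int replay_race(void)
{
    race_model_t m = { .connected = true, .transport_open = true };
    bool ok = app_check_connected(&m);       /* app task: check passes */
    task_abort_connection(&m);               /* mqtt_task: runs in between */
    return app_publish_after_check(&m, ok);  /* app task: acts on a stale check */
}
```

In this model replay_race() returns -1: the check passed, but by the time the publish runs the transport is gone. With two real tasks the window is small, which would be consistent with the crashes being hard to reproduce on demand.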
Having my application deployed on several devices with coredump function enabled I find lots of crashes with a callstack like:
From what I can tell there's no thread protection at all in mqtt_client.c. Before digging deeply into esp_transport, mbedtls, etc. to find the cause of the crashes, I'd like to know how publishing through the ESP-MQTT API is meant to be done safely.
All ESP-IDF example code performs the esp_mqtt_client_publish inside mqtt_event_handler, that is, in the context of the internal mqtt_task, so there is no race there. But that's not really feasible for any application publishing more than one message per connection.
Expected Behavior
Publishing from my application task (the one calling esp_mqtt_client_start) is safe.
Actual Behavior
Publishing from my application task (the one calling esp_mqtt_client_start) races with the internal mqtt_task and sometimes causes crashes on wifi disconnect.
Steps to reproduce
I've not been able to create a simple example application that triggers the crash, but looking at the source code, either I have misunderstood the API completely or there's a design flaw making it unsafe/unusable.