Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT broker disconnection during large messages leads to endless data events (IDFGH-6859) #8482

Closed
Xtensa2C opened this issue Feb 28, 2022 · 7 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Resolved Issue is done internally

Comments

@Xtensa2C
Copy link

Environment

  • Development Kit: none
  • Module or chip used: ESP32-WROOM-32
  • IDF version: v4.3.1-dev
  • Build System: ESP IDF VSC Extension v1.3.0
  • Operating System: Windows 10
  • Using an IDE?: Visual Studio Code
  • Power Supply: USB

Problem Description:

While receiving large messages via MQTT with size > MQTT input buffer the data will be split in multiple data events.
However if the connection to the broker is interrupted (e.g. by teminating the broker) after the transfer started no mqtt error will occur but an infinite number of data events is triggered with data size = -1 and the last data received.

Expected Behavior

In case of a connection error an MQTT error event should be triggered and no further data events should occur.

Actual Behavior

The controller is trapped in an infinite loop of data events.

Steps to reproduce

  1. Flash the MQTT WS example (or the code below)
  2. Send a large MQTT message ~1MB and terminate the broker after starting the transfer

Additional remarks

It has been observed that this error does not occur if the connection is terminated close to the end of the transmission.

Example Code:

/* MQTT over Websockets Example

   This example code is in the Public Domain (or CC0 licensed, at your option.)

   Unless required by applicable law or agreed to in writing, this
   software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
   CONDITIONS OF ANY KIND, either express or implied.
*/
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include "esp_wifi.h"
#include "esp_system.h"
#include "nvs_flash.h"
#include "esp_event.h"
#include "esp_netif.h"
#include "protocol_examples_common.h"

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/semphr.h"
#include "freertos/queue.h"

#include "lwip/sockets.h"
#include "lwip/dns.h"
#include "lwip/netdb.h"

#include "esp_log.h"
#include "mqtt_client.h"

static const char *TAG = "MQTTWS_EXAMPLE";

char mydata[20] = "";

static esp_err_t mqtt_event_handler_cb(esp_mqtt_event_handle_t event)
{
    esp_mqtt_client_handle_t client = event->client;
    int msg_id;
    // your_context_t *context = event->context;
    switch (event->event_id) {
        case MQTT_EVENT_CONNECTED:
            ESP_LOGI(TAG, "MQTT_EVENT_CONNECTED");
            msg_id = esp_mqtt_client_subscribe(client, "/test", 0);
            ESP_LOGI(TAG, "sent subscribe successful, msg_id=%d", msg_id);

            break;
        case MQTT_EVENT_DISCONNECTED:
            ESP_LOGI(TAG, "MQTT_EVENT_DISCONNECTED");
            break;
        case MQTT_EVENT_SUBSCRIBED:
            ESP_LOGI(TAG, "MQTT_EVENT_SUBSCRIBED, msg_id=%d", event->msg_id);
            break;
        case MQTT_EVENT_UNSUBSCRIBED:
            ESP_LOGI(TAG, "MQTT_EVENT_UNSUBSCRIBED, msg_id=%d", event->msg_id);
            break;
        case MQTT_EVENT_PUBLISHED:
            ESP_LOGI(TAG, "MQTT_EVENT_PUBLISHED, msg_id=%d", event->msg_id);
            break;
        case MQTT_EVENT_DATA:
            ESP_LOGI(TAG, "MQTT_EVENT_DATA");
            printf("TOPIC=%.*s\r\n", event->topic_len, event->topic);
            printf("DATA Size =%d\r\n", event->data_len);
            printf("DATA=%.*s\r\n", event->data_len, event->data);
            break;
        case MQTT_EVENT_ERROR:
            ESP_LOGI(TAG, "MQTT_EVENT_ERROR");
            break;
        default:
            ESP_LOGI(TAG, "Other event id:%d", event->event_id);
            break;
    }
    return ESP_OK;
}

static void mqtt_event_handler(void *handler_args, esp_event_base_t base, int32_t event_id, void *event_data) {
    ESP_LOGD(TAG, "Event dispatched from event loop base=%s, event_id=%d", base, event_id);
    mqtt_event_handler_cb(event_data);
}

static void mqtt_app_start(void)
{
    const esp_mqtt_client_config_t mqtt_cfg = {
        .host = "192.168.0.5",
        .port = 9001,
        .transport = MQTT_TRANSPORT_OVER_WS,
        .client_id = "ESP",
        .keepalive = 30,          
        .buffer_size = 2*1024,
        .out_buffer_size = 1024,
    };

    esp_mqtt_client_handle_t client = esp_mqtt_client_init(&mqtt_cfg);
    esp_mqtt_client_start(client);
    esp_mqtt_client_register_event(client, ESP_EVENT_ANY_ID, mqtt_event_handler, client);

}

void app_main(void)
{
    ESP_LOGI(TAG, "[APP] Startup..");
    ESP_LOGI(TAG, "[APP] Free memory: %d bytes", esp_get_free_heap_size());
    ESP_LOGI(TAG, "[APP] IDF version: %s", esp_get_idf_version());

    esp_log_level_set("*", ESP_LOG_INFO);
    esp_log_level_set("MQTT_CLIENT", ESP_LOG_VERBOSE);
    esp_log_level_set("MQTT_EXAMPLE", ESP_LOG_VERBOSE);
    esp_log_level_set("TRANSPORT_TCP", ESP_LOG_VERBOSE);
    esp_log_level_set("TRANSPORT_SSL", ESP_LOG_VERBOSE);
    esp_log_level_set("TRANSPORT_WS", ESP_LOG_VERBOSE);
    esp_log_level_set("TRANSPORT", ESP_LOG_VERBOSE);
    esp_log_level_set("OUTBOX", ESP_LOG_VERBOSE);

    ESP_ERROR_CHECK(nvs_flash_init());
    ESP_ERROR_CHECK(esp_netif_init());
    ESP_ERROR_CHECK(esp_event_loop_create_default());

    /* This helper function configures Wi-Fi or Ethernet, as selected in menuconfig.
     * Read "Establishing Wi-Fi or Ethernet Connection" section in
     * examples/protocols/README.md for more information about this function.
     */
    ESP_ERROR_CHECK(example_connect());

    mqtt_app_start();
}

Error log after terminating the transfer at the beginning:

.
.
.
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290115) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290125) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290135) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290135) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290145) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290155) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290165) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290175) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290185) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290185) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =-1
DATA=�%�V�* �{V  �
���
E (290195) TRANS_TCP: tcp_poll_read select error 0, errno = Success, fd = 54
I (290205) MQTTWS_EXAMPLE: MQTT_EVENT_DATA

Error log after terminating the transfer close to the end:

.
.
.
DATA=)�
I (23835) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =1648
DATA=�����Z˨���e��&$���% ����j��
I (23845) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =1440
DATA=�
I (23855) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =2048
DATA=����� �
I (23865) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =2048
DATA=믈����
I (23875) MQTTWS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=
DATA Size =1729
DATA=�
E (25325) TRANS_TCP: tcp_poll_read select error 104, errno = Connection reset by peer, fd = 54
E (25325) MQTT_CLIENT: Poll read error: 119, aborting connection
I (25325) MQTTWS_EXAMPLE: MQTT_EVENT_DISCONNECTED
I (35335) MQTTWS_EXAMPLE: Other event id:7
E (35355) TRANS_TCP: [sock=54] delayed connect error: Connection reset by peer
E (35355) TRANSPORT_WS: Error connecting to host 192.168.0.5:9001
E (35365) MQTT_CLIENT: Error transport connect
@espressif-bot espressif-bot added the Status: Opened Issue is new label Feb 28, 2022
@github-actions github-actions bot changed the title MQTT broker disconnection during large messages leads to endless data events MQTT broker disconnection during large messages leads to endless data events (IDFGH-6859) Feb 28, 2022
@Elledici
Copy link

I stumbled over this problem for the same reson as Xtensa2C, while testing for robustness of OTA over MQTT.

The problem is in file mqtt_client.c, function deliver_publish(), lines 1004:

  • msg_data_len = esp_transport_read(client-> transport, (char *)client->mqtt_state.in_buffer,

and 1007:

  • if (msg_data_len <= 0) {

where msg_data_len is size_t (unsigned type) and esp_transport_read() returns an int which is negative in case of error.

A quick and dirty solution is forcing a conversion in line 1007:

  • if ((int)msg_data_len <= 0) {

but this quite hides the problem even more.

A better solution is defining msg_data_len and msg_topic_len as int and changing accordingly function signatures of mqtt_get_publish_topic() and mqtt_get_publish_data() (which, by the way, internally use only ints through other slippery implicit conversions).
Moreover msg_topic_len and msg_data_len are eventually copied to client->event.topic_len and client->event.data_len which are both ints.

@Elledici
Copy link

Sorry, forgot to specify that I'm referring to IDF v4.4 and first found the problem with IDF v4.3-beta3-131.

@gabsuren
Copy link
Collaborator

@Elledici thanks for reporting. The issues is in process of resolvent and will be fixed soon.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Mar 24, 2022
david-cermak pushed a commit to espressif/esp-mqtt that referenced this issue Apr 12, 2022
@AxelLin
Copy link
Contributor

AxelLin commented Apr 16, 2022

@david-cermak
I think this needs a urgent fix for stable branches, otherwise it's just unstable when esp_transport_read() returns negative value.

@AxelLin
Copy link
Contributor

AxelLin commented May 12, 2022

@Elledici thanks for reporting. The issues is in process of resolvent and will be fixed soon.

The fix is still not available in any esp-idf branches.

@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Resolved Issue is done internally and removed Status: Reviewing Issue is being reviewed labels May 27, 2022
david-cermak pushed a commit to espressif/esp-mqtt that referenced this issue Jun 7, 2022
@AxelLin
Copy link
Contributor

AxelLin commented Jun 19, 2022

v4.3.3 does not include this fix.
No idea why the backport fix is so slow. (This was fixed in master 2 months ago).

espressif-bot pushed a commit that referenced this issue Jun 22, 2022
* Update submodule: git log --oneline 985078affa8a2d2b56b87c8e6455252850f895c6..27eb4726067465c5c67d4ecdca5ddccd26f02580

Detailed description of the changes:
* MQTT: Fix signature matching for integer values (Backport to idf_v4.x)
  - See merge request espressif/esp-mqtt!133
  - Closes #8482
  - MQTT: Fix signature matching for integer values (espressif/esp-mqtt@f162002)
* ci: Deploy idf_v4.x branch to GitHub (espressif/esp-mqtt@ee5ecad)
@AxelLin
Copy link
Contributor

AxelLin commented Jul 13, 2022

@david-cermak v4.3 branch still needs fix.

espressif-bot pushed a commit that referenced this issue Jul 23, 2022
* Update submodule: git log --oneline 985078affa8a2d2b56b87c8e6455252850f895c6..27eb4726067465c5c67d4ecdca5ddccd26f02580

Detailed description of the changes:
* MQTT: Fix signature matching for integer values (Backport to idf_v4.x)
  - See merge request espressif/esp-mqtt!133
  - Closes #8482
  - MQTT: Fix signature matching for integer values (espressif/esp-mqtt@f162002)
* ci: Deploy idf_v4.x branch to GitHub (espressif/esp-mqtt@ee5ecad)
david-cermak added a commit to espressif/esp-mqtt that referenced this issue Dec 16, 2022
* Update submodule: git log --oneline 4874bab..64f88b4

Detailed description of the changes:
* Fix documentation of config struct
  - See merge request espressif/esp-mqtt!129
  - See commit e31834c
* Changes the moment we update keepalive_tick.
  - See merge request espressif/esp-mqtt!127
  - See commit 2c2e6f3
* MQTT: Fix signature matching for some integer values
  - See merge request espressif/esp-mqtt!128
  - Closes espressif/esp-idf#8482
  - MQTT: Fix signature matching for integer values (6b794e4)
* Make the mqtt submodule logging tags lower case
  - See merge request espressif/esp-mqtt!122
  - See commit fb3184c

Closes espressif/esp-idf#8482
Closes espressif/esp-idf#8550
egnor pushed a commit to egnor/esp-mqtt that referenced this issue Dec 23, 2022
* Update submodule: git log --oneline 4874bab..64f88b4

Detailed description of the changes:
* Fix documentation of config struct
  - See merge request espressif/esp-mqtt!129
  - See commit espressif@e31834c
* Changes the moment we update keepalive_tick.
  - See merge request espressif/esp-mqtt!127
  - See commit espressif@2c2e6f3
* MQTT: Fix signature matching for some integer values
  - See merge request espressif/esp-mqtt!128
  - Closes espressif/esp-idf#8482
  - MQTT: Fix signature matching for integer values (espressif/esp-mqtt@6b794e4)
* Make the mqtt submodule logging tags lower case
  - See merge request espressif/esp-mqtt!122
  - See commit espressif@fb3184c

Closes espressif/esp-idf#8482
Closes espressif/esp-idf#8550
egnor pushed a commit to egnor/esp-mqtt that referenced this issue Dec 23, 2022
* Update submodule: git log --oneline 4874bab..64f88b4

Detailed description of the changes:
* Fix documentation of config struct
  - See merge request espressif/esp-mqtt!129
  - See commit espressif@e31834c
* Changes the moment we update keepalive_tick.
  - See merge request espressif/esp-mqtt!127
  - See commit espressif@2c2e6f3
* MQTT: Fix signature matching for some integer values
  - See merge request espressif/esp-mqtt!128
  - Closes espressif/esp-idf#8482
  - MQTT: Fix signature matching for integer values (espressif/esp-mqtt@6b794e4)
* Make the mqtt submodule logging tags lower case
  - See merge request espressif/esp-mqtt!122
  - See commit espressif@fb3184c

Closes espressif/esp-idf#8482
Closes espressif/esp-idf#8550
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Resolved Issue is done internally
Projects
None yet
Development

No branches or pull requests

6 participants