Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: MQTT over TLS runs for a minute then stops publishing #3496

Open
bmidgley opened this issue Mar 26, 2024 · 4 comments
Open

[Bug]: MQTT over TLS runs for a minute then stops publishing #3496

bmidgley opened this issue Mar 26, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@bmidgley
Copy link

Category

Other

Hardware

T-Beam

Firmware Version

2.2.24

Description

Publishing on mqtt using TLS runs briefly (under a minute), then stops publishing. It appears to be a memory issue.

Relevant log output

INFO  | 13:59:58 366 [mqtt] Using TLS-encrypted session
INFO  | 13:59:58 366 [mqtt] Attempting to connect directly to MQTT server xx, port: 8883, username: xx, password: xx
[366957][D][ssl_client.cpp:176] start_ssl_client(): WARNING: Skipping SSL Verification. INSECURE!
[366958][E][ssl_client.cpp:37] _handle_error(): [start_ssl_client():264]: (-32512) SSL - Memory allocation failed
[366965][E][WiFiClientSecure.cpp:144] connect(): start_ssl_client: -32512
ERROR | 13:59:58 366 [mqtt] Failed to contact MQTT server directly (3/5)...
@bmidgley bmidgley added the bug Something isn't working label Mar 26, 2024
@bmidgley
Copy link
Author

bmidgley commented Mar 30, 2024

The t-deck (2.3.2) isn't having this problem with tls, so the extra ram in the s3 might be what fixes it.

I tried both the t-deck and t-beam in 2.3.2 and the t-deck works but the t-beam works off and on. I'll keep testing.

@oherrala
Copy link

oherrala commented Apr 20, 2024

I have LILYGO T3-S3 for 868 MHz and with 2.3.2, 2.3.3, 2.3.4 and 2.3.61 the connection attempt to MQTT server immediately reboots the node. MQTT server is using TLS.

This is what I see from USB serial port:

DEBUG | ??:??:?? 10 WiFi-Event 4: Connected to access point
DEBUG | ??:??:?? 11 WiFi-Event 7: Obtained IP address: <IP REDACTED>
INFO  | ??:??:?? 11 Starting network services
INFO  | ??:??:?? 11 mDNS responder started
INFO  | ??:??:?? 11 mDNS Host: Meshtastic.local
INFO  | ??:??:?? 11 Starting NTP time client
DEBUG | ??:??:?? 11 Initializing Web Server ...
INFO  | ??:??:?? 11 Starting Secure Web Server...
INFO  | ??:??:?? 11 Starting Insecure Web Server...
INFO  | ??:??:?? 11 Web Servers Ready! :-) 
INFO  | ??:??:?? 11 API server listening on TCP port 4403
INFO  | ??:??:?? 11 [WebServerThread] Using TLS-encrypted session
INFO  | ??:??:?? 11 [WebServerThread] Attempting to connect directly to MQTT server <IP REDACTED>, port: 8883, username: <USER REDACTED>, password: <PW REDACTED>

After the last line the serial connection is lost and the device reboots itself to same state and reboots again and again.

Without MQTT over TLS the client at least attempts to do connection:

DEBUG | ??:??:?? 10 WiFi-Event 4: Connected to access point
DEBUG | ??:??:?? 10 WiFi-Event 7: Obtained IP address: <IP REDACTED>
INFO  | ??:??:?? 10 Starting network services
INFO  | ??:??:?? 10 mDNS responder started
INFO  | ??:??:?? 10 mDNS Host: Meshtastic.local
INFO  | ??:??:?? 10 Starting NTP time client
DEBUG | ??:??:?? 10 Initializing Web Server ...
INFO  | ??:??:?? 10 Starting Secure Web Server...
INFO  | ??:??:?? 10 Starting Insecure Web Server...
INFO  | ??:??:?? 10 Web Servers Ready! :-) 
INFO  | ??:??:?? 10 API server listening on TCP port 4403
INFO  | ??:??:?? 10 [WebServerThread] Using non-TLS-encrypted session
INFO  | ??:??:?? 10 [WebServerThread] Attempting to connect directly to MQTT server <IP REDACTED>, port: 1883, username: <USER REDACTED>, password: <PW REDACTED>
DEBUG | ??:??:?? 10 [WifiConnect] Updating NTP time from 0.pool.ntp.org
DEBUG | ??:??:?? 11 [WifiConnect] NTP Request Success - Setting RTCQualityNTP if needed
DEBUG | ??:??:?? 11 [WifiConnect] Upgrading time to quality 3
DEBUG | 14:51:49 11 [WifiConnect] Read RTC time as 1713624709
ERROR | 14:51:50 13 Failed to contact MQTT server directly (1/5)...
INFO  | 14:51:51 14 [mqtt] Using non-TLS-encrypted session
INFO  | 14:51:51 14 [mqtt] Attempting to connect directly to MQTT server <IP REDACTED>, port: 1883, username: <USER REDACTED>, password: <PW REDACTED>
ERROR | 14:51:54 17 [mqtt] Failed to contact MQTT server directly (2/5)...

But since the cleartext MQTT is not supported by the server the connection fails.

Footnotes

  1. Metadata: { "firmwareVersion": "2.3.6.7a3570a", "deviceStateVersion": 22, "canShutdown": true, "hasWifi": true, "hasBluetooth": true, "role": "ROUTER_CLIENT", "positionFlags": 811, "hwModel": "TLORA_T3_S3", "hasEthernet": false, "hasRemoteHardware": false }

@oherrala
Copy link

Greetings from @Tarlab! With Heltec V3 and 2.3.4 firmware we got the following log with backtrace:

INFO  | ??:??:?? 8 Using TLS-encrypted session
INFO  | ??:??:?? 8 Attempting to connect directly to MQTT server <IP REDACTED>, port: 8883, username: <USER REDACTED>, password: <PW REDACTED>
[  8261][D][ssl_client.cpp:176] start_ssl_client(): WARNING: Skipping SSL Verification. INSECURE!
Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). 
Debug exception reason: Stack canary watchpoint triggered (arduino_events) 
Core  1 register dump:
PC      : 0x4037881b  PS      : 0x00060336  A0      : 0x80378c4c  A1      : 0x3fcb7410  
A2      : 0x000000e0  A3      : 0x00000804  A4      : 0x00000000  A5      : 0x00000000  
A6      : 0x00005004  A7      : 0x3fcb00dc  A8      : 0x3fcb7740  A9      : 0x3fcb7550  
A10     : 0x3fcb00dc  A11     : 0x3fcb78ec  A12     : 0x3fce9710  A13     : 0x3fcb7a7c  
A14     : 0x00000000  A15     : 0x3fcb7a54  SAR     : 0x00000018  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000  


Backtrace: 0x40378818:0x3fcb7410 0x40378c49:0x3fcb7440 0x4037c9a4:0x3fcb7460 0x420c83e9:0x3fcb7480 0x420c5bbf:0x3fcb74a0 0x420c58cf:0x3fcb74c0 0x420c0ff3:0x3fcb74e0 0x420c1bde:0x3fcb7550 0x420c26a3:0x3fcb7740 0x420c2a0d:0x3fcb78a0 0x420c2
a37:0x3fcb78d0 0x420c0bd8:0x3fcb7930 0x420c0f2a:0x3fcb79d0 0x420c0f52:0x3fcb7a10 0x420c6c1d:0x3fcb7a30 0x420c6c4f:0x3fcb7a50 0x420c6841:0x3fcb7b20 0x420ba077:0x3fcb7b50 0x420b7ecd:0x3fcb7bf0 0x420b7f07:0x3fcb7c10 0x42060751:0x3fcb7c30 0x4
205f7ee:0x3fcb7ec0 0x4205f8a3:0x3fcb7f10 0x4205f972:0x3fcb7f50 0x4203b439:0x3fcb7f70 0x4203b5ad:0x3fcb7fb0 0x4202d737:0x3fcb7fe0 0x42020659:0x3fcb8050 0x4205d7dc:0x3fcb80a0 0x4205dc69:0x3fcb8270


ELF file SHA256: c96d9d6e075bb841

When we opened the backtrace we got following:

0x40378818: heap_caps_malloc_base at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/heap_caps.c:138
0x40378818: heap_caps_malloc_base at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/heap_caps.c:99
0x40378c49: heap_caps_calloc_base at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/heap_caps.c:466
0x40378c49: heap_caps_calloc at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/heap_caps.c:475
0x4037c9a4: esp_mbedtls_mem_calloc at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mbedtls/port/esp_mem.c:25

The last line is really close at pointing at this:

https://github.com/espressif/esp-idf/blob/3d813afa010b62fce162d02d00fa978c0c87255c/components/mbedtls/port/esp_mem.c#L14C7-L34

@oherrala
Copy link

Maybe updating dependencies is the first step before continuing any debugging further. Meshtastic is currently running platformio/espressif32 version 6.3.2 and latest is 6.6.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants