Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIFI Disconnect with Reason 34 ?! (IDFGH-4410) #6241

Closed
alorbach opened this issue Dec 10, 2020 · 17 comments
Closed

WIFI Disconnect with Reason 34 ?! (IDFGH-4410) #6241

alorbach opened this issue Dec 10, 2020 · 17 comments
Assignees
Labels
Awaiting Response awaiting a response from the author Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@alorbach
Copy link

alorbach commented Dec 10, 2020

I am on ESP-IDF release 3.3 with arduino-esp32 master branch

I have a sketch that receives about 60 UDP packages per seconds, with a data size of up to 1024 bytes per packet.
At the same time its streaming DTLS data (UDP) with about 25 packets per second with about 70 bytes erach packet.

I have more than enough memory free:

Free: 141KB
Heap: 182KB

At a random point WIFI becomes unstable, it disconnects with reason 34 which I can not find any documentation on!

%debug: 1:4803491: wpsWiFiEvent:Disconnected: Disconnected from station reason: 34 (other reason), reconnect handling...

The Wifi error codes go from 0 to 24 and from 200 to 205 as described here:
https://github.com/espressif/esp-idf/blob/0b71a0a46d67cce681ec55973b020d950d8596bd/docs/en/api-guides/wifi.rst#wi-fi-reason-code

So what does this error code mean?

This is my reconnect handling after the disconnect

WiFi.disconnect(true);
WiFi.mode(WIFI_OFF);
WiFi.mode(WIFI_MODE_STA);
WiFi.begin();

After the problem has happened, this repeats every few seconds until I reset:

%debug: 0:4803874: esp32debug: I (4805664) wifi:
%debug: 0:4803874: esp32debug: new:<13,2>, old:<13,0>, ap:<255,255>, sta:<13,2>, prof:1
%debug: 0:4803878: esp32debug: I (4805668) wifi:
%debug: 0:4803882: esp32debug: state: init -> auth (b0)
%debug: 0:4803895: esp32debug: I (4805684) wifi:
%debug: 0:4803895: esp32debug: state: auth -> assoc (0)
%debug: 0:4803907: esp32debug: I (4805697) wifi:
%debug: 0:4803907: esp32debug: state: assoc -> run (10)
%debug: 0:4803952: esp32debug: I (4805742) wifi:
%debug: 0:4803953: esp32debug: connected with wifi, aid = 5, channel 13, 40D, bssid = dc:39:6f:51:74:e4
%debug: 0:4803958: esp32debug: I (4805748) wifi:
%debug: 0:4803963: esp32debug: security type: 3, phy: bgn, rssi: -66
%debug: 0:4804038: esp32debug: I (4805828) wifi:
%debug: 0:4804039: esp32debug: pm start, type: 0
%debug: 0:4804039: esp32debug: I (4805829) wifi:
%debug: 0:4804043: esp32debug: AP's beacon interval = 102400 us, DTIM period = 1
%debug: 1:4804055: wpsWiFiEvent: STA Connected to: wifi

@github-actions github-actions bot changed the title WIFI Disconnect with Reason 34 ?! WIFI Disconnect with Reason 34 ?! (IDFGH-4410) Dec 10, 2020
@alorbach
Copy link
Author

alorbach commented Dec 14, 2020

Another step further, debugging lwip shows me where the mem alloc fails

lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
lwip_getsockopt(57, SOL_SOCKET, SO_ERROR) = 12
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, retry after vTaskDelay!
lwip_send(57, data=0x3ffec5f4, size=71, flags=0x0)
lwip_sendto(57, data=0x3ffec5f4, short_size=71, flags=0x0 to=0.0.0.0 port=0
ssl_client_udp error: send_ssl_data failed with res 0XFFFFFFB2, errno 12, tear down session!
lwip_close(57)
lwip_close: is_tcp=0

@alorbach
Copy link
Author

Even with LWIP Stats enable an pbub debug enabled, nothing which explains the ERR_MEM:

lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_send(52, data=0x3ffee904, size=71, flags=0x0)
lwip_sendto(52, data=0x3ffee904, short_size=71, flags=0x0 to=0.0.0.0 port=0
pbuf_alloc(length=71)
pbuf_alloc(length=71) == 0x3ffd6e0c
netconn_send: sending 71 bytes
pbuf_header: old 0x3ffd6e70 new 0x3ffd6e68 (8)
pbuf_header: old 0x3ffd6e68 new 0x3ffd6e54 (20)
pbuf_header: old 0x3ffd6e54 new 0x3ffd6e46 (14)
pbuf_free(0x3ffd6e0c)
pbuf_free: deallocating 0x3ffd6e0c
lwip_getsockopt(52, SOL_SOCKET, SO_ERROR) = 12

LINK
xmit: 3
recv: 3
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ETHARP
xmit: 24
recv: 279
fw: 0
drop: 429
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 429
opterr: 0
err: 0
cachehit: 15310

IP_FRAG
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IPv6 FRAG
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IP
xmit: 15507
recv: 20934
fw: 0
drop: 1009
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ND
xmit: 3
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IPv6
xmit: 3
recv: 524
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

IGMP
xmit: 0
recv: 7
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
proterr: 0
rx_v1: 0
rx_group: 0
rx_general: 7
rx_report: 0
tx_join: 0
tx_leave: 0
tx_report: 0

MLDv1
xmit: 0
recv: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
proterr: 0
rx_v1: 0
rx_group: 0
rx_general: 0
rx_report: 0
tx_join: 0
tx_leave: 0
tx_report: 0

ICMP
xmit: 53
recv: 42
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

ICMPv6
xmit: 0
recv: 0
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0

UDP
xmit: 15316
recv: 19747
fw: 0
drop: 11
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 11
opterr: 0
err: 0
cachehit: 19

TCP
xmit: 87
recv: 130
fw: 0
drop: 1
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 1
opterr: 0
err: 0
cachehit: 112

SYS
sem.used: 0
sem.max: 0
pbuf_alloc(length=102)

@alorbach
Copy link
Author

It seems to fail at a simple malloc call (traced back to mem_malloc),
could this be heap fragmentation?
https://github.com/espressif/esp-lwip/blob/61d840ff4778f4946c8743f7e412345abcd537f1/src/core/mem.c#L251

@YouDONG-ESP
Copy link
Contributor

@alorbach It's possible, but it is unlikely to happen if your packet size remains the same.

I didn't find reason code 34 in our code, is that a hexadecimal number? If so, 22 is refer to INVALID_RSN_IE_CAP, it may happens with very few ap, dose it happen with other ap?

@alorbach
Copy link
Author

@YouDONG-ESP Yes it happens with other AP as well, I have three different AP's available for testing.
However I think that the root cause is, that the mem_malloc calls inside LWIP fail at some random point and causing all kinds of network issues afterwards.

@alorbach
Copy link
Author

I had added heap_caps_print_heap_info before and after the failure,:

BEFORE:

Heap summary for capabilities 0x00000804:
At 0x3ffbdb60 len 84 free 0 allocated 36 min_free 0
largest_free_block 0 alloc_blocks 1 free_blocks 0 total_blocks 1
At 0x3ffc1bb4 len 2408 free 0 allocated 2024 min_free 0
largest_free_block 0 alloc_blocks 22 free_blocks 0 total_blocks 22
At 0x3ffbdb28 len 52 free 0 allocated 4 min_free 0
largest_free_block 0 alloc_blocks 1 free_blocks 0 total_blocks 1
At 0x3ffb8000 len 6688 free 0 allocated 6480 min_free 0
largest_free_block 0 alloc_blocks 11 free_blocks 0 total_blocks 11
At 0x3ffb0000 len 25480 free 0 allocated 24944 min_free 0
largest_free_block 0 alloc_blocks 31 free_blocks 1 total_blocks 32
At 0x3ffae6e0 len 6192 free 0 allocated 5920 min_free 0
largest_free_block 0 alloc_blocks 14 free_blocks 1 total_blocks 15
At 0x3ffaff10 len 240 free 0 allocated 120 min_free 0
largest_free_block 0 alloc_blocks 5 free_blocks 1 total_blocks 6
At 0x3ffb6388 len 7288 free 0 allocated 6920 min_free 0
largest_free_block 0 alloc_blocks 21 free_blocks 0 total_blocks 21
At 0x3ffb9a20 len 16648 free 24 allocated 15476 min_free 8
largest_free_block 0 alloc_blocks 68 free_blocks 4 total_blocks 72
At 0x3ffca158 len 89768 free 11592 allocated 70876 min_free 208
largest_free_block 5184 alloc_blocks 439 free_blocks 58 total_blocks 497
At 0x3ffe0440 len 15072 free 13568 allocated 1436 min_free 16
largest_free_block 10840 alloc_blocks 1 free_blocks 2 total_blocks 3
At 0x3ffe4350 len 113840 free 76204 allocated 37536 min_free 76204
largest_free_block 75560 alloc_blocks 3 free_blocks 2 total_blocks 5
Totals:
free 101388 allocated 171772 min_free 76436 largest_free_block 75560

AFTER FAIL:

heap summary for capabilities 0x00000804:
At 0x3ffbdb60 len 84 free 0 allocated 36 min_free 0
largest_free_block 0 alloc_blocks 1 free_blocks 0 total_blocks 1
At 0x3ffc1bb4 len 2408 free 0 allocated 2024 min_free 0
largest_free_block 0 alloc_blocks 22 free_blocks 0 total_blocks 22
At 0x3ffbdb28 len 52 free 0 allocated 4 min_free 0
largest_free_block 0 alloc_blocks 1 free_blocks 0 total_blocks 1
At 0x3ffb8000 len 6688 free 0 allocated 6480 min_free 0
largest_free_block 0 alloc_blocks 11 free_blocks 0 total_blocks 11
At 0x3ffb0000 len 25480 free 0 allocated 24944 min_free 0
largest_free_block 0 alloc_blocks 31 free_blocks 1 total_blocks 32
At 0x3ffae6e0 len 6192 free 0 allocated 5920 min_free 0
largest_free_block 0 alloc_blocks 14 free_blocks 1 total_blocks 15
At 0x3ffaff10 len 240 free 0 allocated 120 min_free 0
largest_free_block 0 alloc_blocks 5 free_blocks 1 total_blocks 6
At 0x3ffb6388 len 7288 free 0 allocated 6920 min_free 0
largest_free_block 0 alloc_blocks 21 free_blocks 0 total_blocks 21
At 0x3ffb9a20 len 16648 free 24 allocated 15476 min_free 8
largest_free_block 0 alloc_blocks 68 free_blocks 4 total_blocks 72
At 0x3ffca158 len 89768 free 12016 allocated 70468 min_free 208
largest_free_block 4332 alloc_blocks 438 free_blocks 58 total_blocks 496
At 0x3ffe0440 len 15072 free 11280 allocated 3708 min_free 16
largest_free_block 10840 alloc_blocks 2 free_blocks 2 total_blocks 4
At 0x3ffe4350 len 113840 free 76204 allocated 37536 min_free 76204
largest_free_block 75560 alloc_blocks 3 free_blocks 2 total_blocks 5
Totals:
free 99524 allocated 173636 min_free 76436 largest_free_block 75560

@alorbach
Copy link
Author

@alorbach It's possible, but it is unlikely to happen if your packet size remains the same.

I didn't find reason code 34 in our code, is that a hexadecimal number? If so, 22 is refer to INVALID_RSN_IE_CAP, it may happens with very few ap, dose it happen with other ap?

The main UDP buffers used are the same, about 500 bytes for incoming (RAW Data) and about 70 bytes for outgoing (DTLS)

@YouDONG-ESP
Copy link
Contributor

Could you provide a code to reproduce this issue?

@alorbach
Copy link
Author

Well you will need a DTLS receiving Server.
I am currently testing a few things and I plan to build against esp-idf 4.2 today to check if the problem happens there as well.

If I am not able to reproduce the issue, I will try to greate a minimal sketch that reproduces the issue somehow.

@alorbach
Copy link
Author

@YouDONG-ESP:

Interesting development! The problem has not totally gone but somehow it is better now - meaning Wifi (LWIP) is not becoming unstable anymore after the first ERR_MEM occurred.

So I have a Class that does thwe DTLS UDP Streaming which uses HTTPClient from arduino-esp32, and I was creating / deleting HTTPClient objects dynamically after the ERR_MEM occurred.

I remembered that I had trouble (strange exceptions) when deleting dynamic HTTPClient objects before in my OTA Updatecode. So as a workaround, I made the HTTPClient Object static in my OTA update code which was a sufficient at this time for me.

Now I did the same to my Class that does the DTLS Streaming, I made the HTTPClient object static at the top of the Class code like this:
HTTPClient mHttp;

And it appears to have fixed the problem mostly - I am running DTLS Streaming now since about 30 minutes without any ERR_MEM till yet.

So I think HTTPClient is messing something badly up in the destructor.

@me-no-dev
Copy link
Member

Please give a minimal example that shows the problem.

@Alvin1Zhang Alvin1Zhang added the Awaiting Response awaiting a response from the author label Sep 16, 2021
@0xFEEDC0DE64
Copy link
Contributor

we have a customer with an esp32 based product of ours and the esp32 is not able to connect to his network, but works literally everywhere else. I get the error 34 (not in hex, but decimal), maybe this is the same issue here? #7593

I cant reproduce it in our office and the customer says he has a devolo "mesh network setup", where only the access points talk mesh to each other and send out a access point (then usable by the customer). But I dont get the whole mesh thematic here :)

@alorbach
Copy link
Author

@0xFEEDC0DE64 I have got the problem more or less under control, it happens very very seldom.
What I can tell you is, if you have any invalid memory access somewhere like even writting 1 byte to much into a char array, this error will happen sooner than earlier.

I believe this is a following error of issues like this, and the hardest error's to find btw.

@SJFriedl
Copy link

FWIW, I can repro WIFI disconnect with reason 34 with a lot of pings to the IP address, and it appears that reason 34 is documented in the 802.11 standard as: "Disassociated because excessive number of frames need to be acknowledged, but are not acknowledged due to AP transmissions and/or poor channel conditions".

Searching for "WLAN_STATUS_DENIED_POOR_CHANNEL_CONDITIONS" will bring up plenty of references.

Asking for a reconnect (assuming the poor channel conditions have recovered) works fine for me.

@mhdong
Copy link
Collaborator

mhdong commented Mar 15, 2022

hi @alorbach this issue may have fixed at new version, please checked with the newest esp-idf master.

@AxelLin
Copy link
Contributor

AxelLin commented Mar 15, 2022

hi @alorbach this issue may have fixed at new version, please checked with the newest esp-idf master.

#7593 (comment)

@espressif-bot espressif-bot added the Status: In Progress Work is in progress label May 24, 2022
@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, feel free to reopen.

@espressif-bot espressif-bot added Resolution: Done Issue is done internally Status: Done Issue is done internally and removed Status: In Progress Work is in progress labels Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Response awaiting a response from the author Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

10 participants