frequent Interrupt WDT timeout on core 0. BLE+BT (IDFGH-401) #2542

Closed
2 tasks
Lolektahor opened this issue Oct 10, 2018 · 37 comments

Comments

@Lolektahor

Lolektahor commented Oct 10, 2018

sdkconfig.zip

Environment

  • Development Kit: [none]
  • Kit version (for WroverKit/PicoKit/DevKitC):
  • Core (if using chip or module): ESP-WROOM32
  • IDF version (git rev-parse --short HEAD to get the commit id.): 221eced
  • Development Env: Make|Eclipse
  • Operating System: Windows
  • Power Supply: external 3.3V

Problem Description

Frequent Interrupt WDT timeout on core 0.

I'm using BT + Wi-Fi together. My device is constantly scanning for advertisers and can connect to one peripheral while scanning. I also use an MQTT library.
I constantly get an Interrupt WDT timeout on CPU0, usually after 1 to 20 minutes of running. I tried tracing the code, but couldn't find the file named in the assert: lld_pdu.c.
My code is based on the multilink example. I couldn't see any line of my code running before the error occurs, nor any BLE, Wi-Fi or MQTT event. I tried debugging HCI, GATTC, MQTT and Wi-Fi and couldn't see any repetition of events when the error occurs.
I would be happy to get a direction for debugging this problem.

Expected Behavior

run for days

Actual Behavior

run for a few minutes

Steps to reproduce

I can't supply a reproduction sequence, as the error is random.

Code to reproduce this issue

I can't point to the exact part of the code that produces the error.

Debug Logs

ASSERT_PARAM(0 10), in lld_pdu.c at line 519
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x401953d6 PS : 0x00060334 A0 : 0x800d3f99 A1 : 0x3ffbbbc0
0x401953d6: esp_pm_impl_waiti at C:/msys32/home/lolekt/esp/esp-idf/components/esp32/pm_esp32.c:474

A2 : 0x00000000 A3 : 0x00000001 A4 : 0x8009202e A5 : 0x3ffcd8e0
A6 : 0x00000003 A7 : 0x00060023 A8 : 0x800d311e A9 : 0x3ffbbb90
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x8009202e A13 : 0x3ffd5d10
A14 : 0x00000003 A15 : 0x00060023 SAR : 0x00000000 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000

Backtrace: 0x401953d6:0x3ffbbbc0 0x400d3f96:0x3ffbbbe0 0x40091479:0x3ffbbc00 0x40091e25:0x3ffbbc20
0x401953d6: esp_pm_impl_waiti at C:/msys32/home/lolekt/esp/esp-idf/components/esp32/pm_esp32.c:474

0x400d3f96: esp_vApplicationIdleHook at C:/msys32/home/lolekt/esp/esp-idf/components/esp32/freertos_hooks.c:86

0x40091479: prvIdleTask at C:/msys32/home/lolekt/esp/esp-idf/components/freertos/tasks.c:3564

0x40091e25: vPortTaskWrapper at C:/msys32/home/lolekt/esp/esp-idf/components/freertos/port.c:403

Core 1 register dump:
PC : 0x400848ad PS : 0x00060034 A0 : 0x80087b26 A1 : 0x3ffbe750
0x400848ad: r_assert_param at ??:?

A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x60008054
A6 : 0x3ffbdbf8 A7 : 0x60008050 A8 : 0x800848ad A9 : 0x3ffbe730
A10 : 0x00000004 A11 : 0x00000000 A12 : 0x6000804c A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffc SAR : 0x00000004 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x400847e5 LEND : 0x400847ec LCOUNT : 0x00000000
0x400847e5: r_assert_param at ??:?

0x400847ec: r_assert_param at ??:?

Backtrace: 0x400848ad:0x3ffbe750 0x40087b23:0x3ffbe770 0x400865cd:0x3ffbe7a0 0x400871ba:0x3ffbe7c0 0x400880f6:0x3ffbe7e0 0x40089227:0x3ffbe800 0x40082645:0x3ffbe820 0x401953d3:0x00000000
0x400848ad: r_assert_param at ??:?

0x40087b23: r_lld_pdu_rx_handler at ??:?

0x400865cd: r_lld_evt_end at ??:?

0x400871ba: r_lld_evt_end_isr at ??:?

0x400880f6: r_rwble_isr at ??:?

0x40089227: r_rwbtdm_isr_wrapper at intc.c:?

0x40082645: _xt_lowint1 at C:/msys32/home/lolekt/esp/esp-idf/components/freertos/xtensa_vectors.S:1105

0x401953d3: esp_pm_impl_waiti at C:/msys32/home/lolekt/esp/esp-idf/components/esp32/pm_esp32.c:474
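
A note for readers who want to decode a dump like this themselves: the `Backtrace:` line holds `PC:SP` pairs, and only the PC half is needed for symbol lookup. The Python sketch below extracts the PCs and tolerates a terminal line wrap inside an address, which can happen when console output is copied. The function name `backtrace_pcs` is made up for illustration. The resulting addresses would typically be passed to `xtensa-esp32-elf-addr2line -pfiaC -e <your app>.elf`, assuming the ELF file that matches the crashed firmware is available.

```python
import re

PAIR_RE = re.compile(r"(0x[0-9a-fA-F]{8}):0x[0-9a-fA-F]{8}")
DECODED_RE = re.compile(r"0x[0-9a-fA-F]+: ")  # e.g. "0x400848ad: r_assert_param at ??:?"


def backtrace_pcs(log_text):
    """Return the PC addresses from the first 'Backtrace:' entry in log_text."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("Backtrace:"):
            continue
        joined = line[len("Backtrace:"):].strip()
        # Glue on continuation lines until a blank line or a decoded
        # "0xADDR: symbol at file" line is reached.
        for nxt in lines[i + 1:]:
            if not nxt.strip() or DECODED_RE.match(nxt):
                break
            joined += nxt.strip()
        # Each entry is PC:SP; keep only the PC part.
        return [m.group(1) for m in PAIR_RE.finditer(joined)]
    return []
```

The ROM symbols of the Bluetooth controller (`r_lld_pdu_rx_handler` and friends) resolve to `??:?` either way, since no source information is shipped for them.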

Other items if possible

  • [v] sdkconfig file (attach the sdkconfig file from your project folder)
  • elf file in the build folder (note this may contain all the code details and symbols of your project.)
  • coredump (This provides stacks of tasks.)

@Alvin1Zhang changed the title from "frequent Interrupt WDT timeout on core 0. BLE+BT" to "[TW#26690] frequent Interrupt WDT timeout on core 0. BLE+BT" on Oct 11, 2018

@TianHao-Yoursmake

@Lolektahor, the error does not seem to happen in the official esp-idf.
Could you provide the BT library version and Wi-Fi library version from the log? Both versions are printed in the log when Bluetooth and Wi-Fi are initialised.

@Lolektahor
Author

wifi firmware version: 75aab56
BT controller compile version [a348a1e]

I tried fetching the SDK again: same versions, same error.

@Lolektahor
Author

I think that MQTT causes the error.
I disabled all MQTT functions and it no longer crashes...

@Lolektahor
Author

Why did you close it? There was no answer yet...

@Alvin1Zhang
Collaborator

Sorry, mishandled this issue.

@Alvin1Zhang reopened this on Oct 24, 2018

@TianHao-Yoursmake

@Lolektahor, we have found something related to this problem and are testing it. I will notify you of any update. Thanks.

@projectgus changed the title from "[TW#26690] frequent Interrupt WDT timeout on core 0. BLE+BT" to "frequent Interrupt WDT timeout on core 0. BLE+BT (IDFGH-401)" on Mar 12, 2019

@Alvin1Zhang
Collaborator

@TianHao-Espressif Any updates? Thanks.

@enricop

enricop commented Oct 17, 2019

Still present on release/v4.0 when working with BLE GATT while Wi-Fi is on.
Please also push the master updates to this branch.
Thank you.

@TianHao-Yoursmake

@enricop, according to our tests several months ago, it should be fixed in c960bcb.
Could you paste your log and describe your reproduction scenario? Thanks.

@enricop

enricop commented Oct 21, 2019

hi @TianHao-Espressif. These are all the crash logs we experienced:

It always happens at lld_pdu.c at line 527

log 1:

[image attachment: picturemessage_h2ndb4fj zjo]

log 2:

I (331642) BLE_GATT_service: ESP_GATTS_CONF_EVT, status = 0, attr_handle 54
I (384772) BLE_GATT_service: GATT_WRITE_EVT, handle = 57, value len = 135
I (384772) ble_rxtx: received message with CID 0x01, PUSH_FILE (phone)
in: cfg_data_new_file {
cid: 0x01, PUSH_FILE (phone)
f_k: setup_params.json
plen: 131
payload: [0x78 0x9c .. 0x3a 0x56]
}
I (384782) ble_rxtx: starting a stream for file setup_params.json of compressed length 131
I (384792) ble_rxtx: sending message with CID 0x01, NEW_FILE_RES (p406)
out: cfg_file_response {
cid: 0x01, NEW_FILE_RES (p406)
res: 0x0, SUCCESS
f_k: setup_params.json
}
I (384812) ble_rxtx: answering with a payload of length 3
I (384822) BLE_GATT_service: ESP_GATTS_CONF_EVT, status = 0, attr_handle 54
ASSERT_PARAM(0 10), in lld_pdu.c at line 527
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x40088386 PS : 0x00060734 A0 : 0x8008baec A1 : 0x3ffbf420
0x40088386: r_assert_param at ??:?

A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x60008054
A6 : 0x3ffbe0a0 A7 : 0x60008050 A8 : 0x80088381 A9 : 0x3ffbf400
A10 : 0x00000004 A11 : 0x00000000 A12 : 0x6000804c A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffc SAR : 0x00000004 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x400882b9 LEND : 0x400882c0 LCOUNT : 0x00000000
0x400882b9: r_assert_param at ??:?

0x400882c0: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1 : 0x40009203 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x40088386
0x40088386: r_assert_param at ??:?

ELF file SHA256: 7269f3af4f5010bc13015895af0700da6d8a6111d8e591803ef6c4b6fba269d5

Backtrace: 0x40088383:0x3ffbf420 0x4008bae9:0x3ffbf440 0x4008b299:0x3ffbf480 0x4008ac76:0x3ffbf4a0 0x4008c006:0x3ffbf4c0 0x4008ce7b:0x3ffbf4e0 0x4008655e:0x3ffbf500 0x402a62af:0x3ffbc420 0x400d2872:0x3ffbc440 0x40098479:0x3ffbc460 0x40097305:0x3ffbc480
0x40088383: r_assert_param at ??:?
0x4008bae9: r_lld_pdu_rx_handler at ??:?
0x4008b299: r_lld_evt_end at ??:?
0x4008ac76: r_lld_evt_end_isr at ??:?
0x4008c006: r_rwble_isr at ??:?
0x4008ce7b: r_rwbtdm_isr_wrapper at intc.c:?
0x4008655e: _xt_lowint1 at /home/esp/esp-homekit-sdk/esp-idf/components/freertos/xtensa_vectors.S:1153
0x400d2872: esp_register_freertos_tick_hook_for_cpu at /home/esp/esp-homekit-sdk/esp-idf/components/esp_common/src/freertos_hooks.c:93
0x40098479: vTaskStartScheduler at /home/esp/esp-homekit-sdk/esp-idf/components/freertos/tasks.c:2086
0x40097305: xQueueGiveFromISR at /home/esp/esp-homekit-sdk/esp-idf/components/freertos/queue.c:1344

log 3:

I (34063) BLE_GATT_service: 54 f5 e6 64 a0 40
ASSERT_PARAM(0 9), in lld_pdu.c at line 527
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x4008838a PS : 0x00060734 A0 : 0x8008baf0 A1 : 0x3ffbf420
0x4008838a: r_assert_param at ??:?

A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x60008054
A6 : 0x3ffbe0a0 A7 : 0x60008050 A8 : 0x80088385 A9 : 0x3ffbf400
A10 : 0x00000004 A11 : 0x00000000 A12 : 0x6000804c A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffc SAR : 0x00000004 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x400882bd LEND : 0x400882c4 LCOUNT : 0x00000000
0x400882bd: r_assert_param at ??:?

0x400882c4: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1 : 0x40009203 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x4008838a
0x4008838a: r_assert_param at ??:?

ELF file SHA256: 72013a25db751e6ff8dd2fbae71e183dfd53e0cedd40f88ca37145ab06faf725

Backtrace: 0x40088387:0x3ffbf420 0x4008baed:0x3ffbf440 0x400485a5:0x3ffbf480 |<-CORRUPTED
0x40088387: r_assert_param at ??:?

0x4008baed: r_lld_pdu_rx_handler at ??:?

We are using BLE_Mesh + Wi-Fi on the release/v4.0 branch. It happens especially while we receive data over BLE GATT.

I attach sdkconfig.txt

After setting CONFIG_ESP_INT_WDT_TIMEOUT_MS to 5000 the issue still occurs.
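
The reports in this thread differ in the `ASSERT_PARAM` arguments and in the line number (519, 527 and 533 all appear), so it can help to tally the signatures across many captured logs. A minimal Python sketch, assuming the logs are available as plain text; `assert_signatures` is a hypothetical helper name, and since the meaning of the two numbers inside `ASSERT_PARAM(...)` is not explained anywhere in this thread, they are kept as opaque values.

```python
import re
from collections import Counter

# Matches e.g. "ASSERT_PARAM(0 10), in lld_pdu.c at line 527"
ASSERT_RE = re.compile(r"ASSERT_PARAM\((\d+) (\d+)\), in (\S+) at line (\d+)")


def assert_signatures(log_text):
    """Count each (param0, param1, file, line) assert signature in log_text."""
    return Counter(
        (int(p0), int(p1), fname, int(line))
        for p0, p1, fname, line in ASSERT_RE.findall(log_text)
    )
```

Calling `assert_signatures(text).most_common()` on a concatenation of several crash logs lists the most frequent signature first.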

@enricop

enricop commented Oct 22, 2019

another one:

I (1083526) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 0
I (1083526) BLE_GATT_service: 63 d7 ce 15 04 1f
I (1084716) BLE_GATT_service: ESP_GATTS_MTU_EVT, MTU 517
ASSERT_PARAM(0 9), in lld_pdu.c at line 527
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x40088385 PS : 0x00060634 A0 : 0x8008baf0 A1 : 0x3ffbf420
0x40088385: r_assert_param at ??:?

A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x60008054
A6 : 0x3ffbe0a0 A7 : 0x60008050 A8 : 0x80088385 A9 : 0x3ffbf400
A10 : 0x00000004 A11 : 0x00000000 A12 : 0x6000804c A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffc SAR : 0x00000004 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x400882bd LEND : 0x400882c4 LCOUNT : 0x00000000
0x400882bd: r_assert_param at ??:?

0x400882c4: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1 : 0x40009203 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x40088385
0x40088385: r_assert_param at ??:?

ELF file SHA256: 1010beb47d9aad4a798f538a8402bb0965369c1a2c4b64148307b57303f90500

Backtrace: 0x40088382:0x3ffbf420 0x4008baed:0x3ffbf440 0x400485a5:0x3ffbf480 |<-CORRUPTED
0x40088382: r_assert_param at ??:?

0x4008baed: r_lld_pdu_rx_handler at ??:?

Rebooting...
ets Jun 8 2016 00:22:57

@TianHao-Yoursmake

@enricop, could you set both CONFIG_BTDM_CONTROLLER_PINNED_TO_CORE=1 and CONFIG_BLUEDROID_PINNED_TO_CORE=1 so that the Bluetooth tasks are pinned to CPU core 1? This may work as a temporary workaround. Could you then report the test result? Thanks.
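
For anyone trying this suggestion, one way to confirm what a build actually uses is to read the generated `sdkconfig`. The Python sketch below parses `KEY=value` lines and reports whether the two options named in this comment are set to 1. It also treats `CONFIG_FREERTOS_UNICORE=y` as a conflict, on the assumption that a single-core build has no core 1 to pin to. The helper names are hypothetical, and the option spellings are taken from this comment rather than checked against a particular ESP-IDF release.

```python
PIN_OPTIONS = (
    "CONFIG_BTDM_CONTROLLER_PINNED_TO_CORE",
    "CONFIG_BLUEDROID_PINNED_TO_CORE",
)


def parse_sdkconfig(text):
    """Parse KEY=value lines; comment lines (including '... is not set') are skipped."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key] = value.strip('"')
    return opts


def bt_pinned_to_core1(opts):
    """True if both pin options are '1' and the build is not single-core."""
    if opts.get("CONFIG_FREERTOS_UNICORE") == "y":
        return False
    return all(opts.get(name) == "1" for name in PIN_OPTIONS)
```

Usage would be along the lines of `bt_pinned_to_core1(parse_sdkconfig(open("sdkconfig").read()))`.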

@enricop

enricop commented Oct 22, 2019

hi @TianHao-Espressif , so should we disable FREERTOS_UNICORE ? (we have not tested our code in multicore mode)

@enricop

enricop commented Oct 28, 2019

hi @TianHao-Espressif . After running extensive tests we are NOT able to run in dualcore mode, and we have the same issue.

The error below happens frequently while working over BLE and Wifi is on. (UNICORE mode)

it was reproduced with latest release/v4.0 updates

I (52715) BLE_GATT_service: ESP_GATTS_MTU_EVT, MTU 517
ASSERT_PARAM(0 9), in lld_pdu.c at line 533
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x400883d6 PS : 0x00060f34 A0 : 0x8008bc78 A1 : 0x3ffbf4b0
0x400883d6: r_assert_param at ??:?

A2 : 0x00000001 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x60008054
A6 : 0x3ffbe118 A7 : 0x60008050 A8 : 0x800883d1 A9 : 0x3ffbf490
A10 : 0x00000004 A11 : 0x00000000 A12 : 0x6000804c A13 : 0xffffffff
A14 : 0x00000000 A15 : 0xfffffffc SAR : 0x00000004 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x40088309 LEND : 0x40088310 LCOUNT : 0x00000000
0x40088309: r_assert_param at ??:?

0x40088310: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1 : 0x40009203 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x400883d6
0x400883d6: r_assert_param at ??:?

ELF file SHA256: 81f29b3a585e92b15d5e55ab7f4bb5f0f44f4b35325f3e69e96c5e90bf4f538a

Backtrace: 0x400883d3:0x3ffbf4b0 0x4008bc75:0x3ffbf4d0 0x400485a5:0x3ffbf510 |<-CORRUPTED
0x400883d3: r_assert_param at ??:?

0x4008bc75: r_lld_pdu_rx_handler at ??:?

Rebooting...
ets Jun 8 2016 00:22:57

Any suggestion? (we can't pin BLE to core 1 because dualcore schedule requires too much memory for our scenario)

Thanks for your support

@TianHao-Yoursmake

TianHao-Yoursmake commented Nov 1, 2019

@enricop, this needs time to debug. Could you please use dual-core mode as a temporary solution? And could you provide test code that reproduces it? Thanks.

@enricop

enricop commented Nov 1, 2019

@TianHao-Espressif thanks for your support. We have very little free memory in our project and enabling dual-core takes more memory, so if we do this we run into another problem.
We have set the int_wdt timeout to 5000 but the error still occurs.

For now we cannot provide test code to reproduce the problem; we are using a custom protocol over BLE GATT to transfer/receive data. Note that we have the BLE_MESH and Wi-Fi stacks running at the same time.
The crash happens frequently when we connect with a phone to the BLE GATT server of the esp32 node.

@enricop

enricop commented Nov 4, 2019

hi @TianHao-Espressif we got around the low memory problem, tried to switch to dual-core mode, but had some wifi sta association issues.

Notice that on October 28th we were running in dual-core mode and the interrupt watchdog timeout expired again. Unfortunately we didn't save the logs.

@mcilloni

mcilloni commented Nov 4, 2019

Hi @TianHao-Espressif,
We noticed that trying to connect from two separate devices at the same time (in our case Android devices running nRF Connect) reliably manages to reproduce this issue:

I (12944) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 0
I (12944) BLE_GATT_service: 79 a3 99 81 f4 c7 
I (12944) ble_rxtx: new BLE_RXTX connection started
I (13664) BLE_GATT_service: ESP_GATTS_MTU_EVT, MTU 517
I (26834) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 1
I (26834) BLE_GATT_service: 5b 63 18 48 e4 7d 
I (26834) ble_rxtx: new BLE_RXTX connection started
ASSERT_PARAM(0 10), in lld_pdu.c at line 533
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC      : 0x40088406  PS      : 0x00060734  A0      : 0x8008bca8  A1      : 0x3ffbf6d0  
0x40088406: r_assert_param at ??:?

A2      : 0x00000001  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x60008054  
A6      : 0x3ffbe11c  A7      : 0x60008050  A8      : 0x80088401  A9      : 0x3ffbf6b0  
A10     : 0x00000004  A11     : 0x00000000  A12     : 0x6000804c  A13     : 0xffffffff  
A14     : 0x00000000  A15     : 0xfffffffc  SAR     : 0x00000004  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x40088339  LEND    : 0x40088340  LCOUNT  : 0x00000000  
0x40088339: r_assert_param at ??:?

0x40088340: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1    : 0x40009203  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40088406
0x40088406: r_assert_param at ??:?


ELF file SHA256: 9d10b0c06be042964201d40cc9a2401faef1ff40d04a05df06f600e25d5267eb

Backtrace: 0x40088403:0x3ffbf6d0 0x4008bca5:0x3ffbf6f0 0x4008b455:0x3ffbf730 0x4008b18d:0x3ffbf750 0x4008c1c2:0x3ffbf770 0x4008d037:0x3ffbf790 0x40086492:0x3ffbf7b0 0x402b743b:0x3ffbc6c0 0x400d29f2:0x3ffbc6e0 0x40092a85:0x3ffbc700 0x40091911:0x3ffbc720
0x40088403: r_assert_param at ??:?

0x4008bca5: r_lld_pdu_rx_handler at ??:?

0x4008b455: r_lld_evt_end at ??:?

0x4008b18d: r_lld_evt_end_isr at ??:?

0x4008c1c2: r_rwble_isr at ??:?

0x4008d037: r_rwbtdm_isr_wrapper at intc.c:?

0x40086492: _xt_lowint1 at /home/marco/esp-homekit-sdk/esp-idf/components/freertos/xtensa_vectors.S:1153

0x402b743b: esp_pm_impl_waiti at /home/marco/esp-homekit-sdk/esp-idf/components/esp32/pm_esp32.c:484

0x400d29f2: esp_vApplicationIdleHook at /home/marco/esp-homekit-sdk/esp-idf/components/esp_common/src/freertos_hooks.c:63

0x40092a85: prvIdleTask at /home/marco/esp-homekit-sdk/esp-idf/components/freertos/tasks.c:3382 (discriminator 1)

0x40091911: vPortTaskWrapper at /home/marco/esp-homekit-sdk/esp-idf/components/freertos/port.c:143


Rebooting...

In this case, the Wi-Fi adapter is also up (running in soft-AP mode with no clients connected).

@TianHao-Yoursmake

@enricop @mcilloni, we're working on reproducing the problem. If possible, please provide the test code, as it would save time. Thanks.

@AbnerFederer

Hi @mcilloni, could you please describe the issue in more detail? Thank you.

@enricop

enricop commented Nov 12, 2019

one more log:

ASSERT_PARAM(1 10), in lld_pdu.c at line 533
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC      : 0x40088405  PS      : 0x00060034  A0      : 0x8008bcac  A1      : 0x3ffbf4b0  
0x40088405: r_assert_param at ??:?

A2      : 0x00000001  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x60008054  
A6      : 0x3ffbe118  A7      : 0x60008050  A8      : 0x80088405  A9      : 0x3ffbf490  
A10     : 0x00000004  A11     : 0x00000000  A12     : 0x6000804c  A13     : 0xffffffff  
A14     : 0x00000000  A15     : 0xfffffffc  SAR     : 0x00000004  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x4008833d  LEND    : 0x40088344  LCOUNT  : 0x00000000  
0x4008833d: r_assert_param at ??:?
0x40088344: r_assert_param at ??:?

Core 0 was running in ISR context:
EPC1    : 0x40009203  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40088405
0x40088405: r_assert_param at ??:?

ELF file SHA256: f76fd9b02995b35027f5c4f3a1761616d1b487acffce942ea327545a8727237f

Backtrace: 0x40088402:0x3ffbf4b0 0x4008bca9:0x3ffbf4d0 0x4008b459:0x3ffbf510 0x4008b191:0x3ffbf530 0x4008c1c6:0x3ffbf550 0x4008d03b:0x3ffbf570 0x40086496:0x3ffbf590 0x402b971f:0x3ffbc6c0 0x400d29e6:0x3ffbc6e0 0x40098b3d:0x3ffbc700 0x400979c9:0x3ffbc720
0x40088402: r_assert_param at ??:?
0x4008bca9: r_lld_pdu_rx_handler at ??:?
0x4008b459: r_lld_evt_end at ??:?
0x4008b191: r_lld_evt_end_isr at ??:?
0x4008c1c6: r_rwble_isr at ??:?
0x4008d03b: r_rwbtdm_isr_wrapper at intc.c:?
0x40086496: _xt_lowint1 at /home/esp32/esp-homekit-sdk/esp-idf/components/freertos/xtensa_vectors.S:1153
0x402b971f: esp_pm_impl_waiti at /home/esp32/esp-homekit-sdk/esp-idf/components/esp32/pm_esp32.c:484
0x400d29e6: esp_vApplicationIdleHook at /home/esp32/esp-homekit-sdk/esp-idf/components/esp_common/src/freertos_hooks.c:63
0x40098b3d: prvIdleTask at /home/esp32/esp-homekit-sdk/esp-idf/components/freertos/tasks.c:3382 (discriminator 1)
0x400979c9: vPortTaskWrapper at /home/esp32/esp-homekit-sdk/esp-idf/components/freertos/port.c:143

@enricop

enricop commented Nov 12, 2019

The crash is caused by an attempt from the mobile phone to connect twice to the GATT server:

I (5661) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 0
I (5661) BLE_GATT_service: 7f de aa 4d ec 8c 
I (5661) ble_rxtx: new BLE_RXTX connection started
I (5871) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 1
I (5871) BLE_GATT_service: 57 f5 55 b2 df af 
I (5871) ble_rxtx: new BLE_RXTX connection started
I (6661) BLE_GATT_service: ESP_GATTS_MTU_EVT, MTU 517
I (7201) BLE_GATT_service: ESP_GATTS_MTU_EVT, MTU 517
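
The log above shows two `ESP_GATTS_CONNECT_EVT` entries 210 ms apart (timestamps 5661 and 5871). To look for the same pattern in longer captures, a small Python sketch can list the gaps between consecutive connect events. It assumes the `I (<number>) tag: message` line format seen in these logs and that the number in parentheses is a millisecond timestamp; the function name `connect_gaps_ms` is invented for illustration.

```python
import re

# Matches e.g. "I (5661) BLE_GATT_service: ESP_GATTS_CONNECT_EVT, conn_id = 0"
CONNECT_RE = re.compile(
    r"^\w \((\d+)\) [^:]+: ESP_GATTS_CONNECT_EVT, conn_id = (\d+)", re.M
)


def connect_gaps_ms(log_text):
    """Return [(conn_id, gap_ms)] for every connect event after the first."""
    events = [(int(ts), int(cid)) for ts, cid in CONNECT_RE.findall(log_text)]
    return [
        (cid, ts - prev_ts)
        for (prev_ts, _), (ts, cid) in zip(events, events[1:])
    ]
```

Filtering the result for small gaps, for example under one second, picks out near-simultaneous connection attempts.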

@enricop

enricop commented Nov 12, 2019

now we've seen that it happens sporadically at any time while no existing connection is present

@TianHao-Yoursmake

@enricop, do you run BLE scanning and advertising simultaneously? If so, I suggest you disable "BTDM_CONTROLLER_FULL_SCAN_SUPPORTED" in menuconfig and then check whether the problem still happens. Thanks.
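
The option is spelled in more than one way in this thread (`BTDM_CONTROLLER_FULL_SCAN_SUPPORTED` here, `CONFIG_BTDM_CTRL_FULL_SCAN_SUPPORTED` further down), presumably because of a rename between ESP-IDF versions. A hedged Python sketch for checking a generated `sdkconfig` under either spelling follows. It relies on the Kconfig convention that a disabled boolean is written as `# CONFIG_X is not set`; the `CONFIG_` prefix on the first spelling is an assumption, and the function name `full_scan_state` is hypothetical.

```python
import re

# Spellings as they appear in this thread; CONFIG_ prefix on the first is assumed.
CANDIDATES = (
    "CONFIG_BTDM_CONTROLLER_FULL_SCAN_SUPPORTED",
    "CONFIG_BTDM_CTRL_FULL_SCAN_SUPPORTED",
)


def full_scan_state(sdkconfig_text):
    """Return 'enabled', 'disabled', or 'absent' for the full-scan option."""
    for name in CANDIDATES:
        if re.search(rf"^{name}=y\s*$", sdkconfig_text, re.M):
            return "enabled"
        if re.search(rf"^# {name} is not set\s*$", sdkconfig_text, re.M):
            return "disabled"
    return "absent"
```

A result of `'absent'` would suggest the option has yet another name in the ESP-IDF version in use, in which case searching menuconfig for "full scan" is the safer route.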

@enricop

enricop commented Nov 13, 2019

hi @TianHao-Espressif
our device uses the following features:

  • it is a ble_mesh node with many client models (one for each server model on the network). Basically all the other nodes on the network publish messages to our node.
  • it is provisioned with this method: (esp_ble_mesh_node_prov_enable(ESP_BLE_MESH_PROV_GATT))
  • so it has provisioning support using GATT (PB-GATT) and it has ble_mesh Proxy protocol support
  • we use also the BLE GATT Server for running a custom application protocol

I suppose it never does active scanning, only passive? And it uses advertising for the GATT Server.

Our sdkconfig.txt

Which options should we use? In particular:

CONFIG_BTDM_BLE_SCAN_DUPL
CONFIG_BTDM_CTRL_FULL_SCAN_SUPPORTED
CONFIG_BTDM_COEX_BLE_ADV_HIGH_PRIORITY

CONFIG_BT_BLE_HOST_QUEUE_CONG_CHECK
CONFIG_BT_BLE_ACT_SCAN_REP_ADV_SCAN

Thank you

@enricop

enricop commented Nov 13, 2019

#2350 #2672

@AbnerFederer

AbnerFederer commented Nov 15, 2019

Hi @enricop,

I'm using an ESP32-DevKitC (with an embedded ESP32-WROVER-B) to reproduce the issue, but I failed.
I'm using ESP32-DevKitC to start a Wi-Fi SoftAP and connect with two separate phones (iphone and Xiaomi).
In the meanwhile, ESP32-DevKitC is running GATTC to scan for GATTS on Xiaomi with GATTS writing char to GATTC.

In this setup, it works without an assert.

So I suggest you follow the advice from @TianHao-Espressif and disable BTDM_CONTROLLER_FULL_SCAN_SUPPORTED in menuconfig; it would really help us diagnose this issue.

Thank you.

@enricop

enricop commented Nov 15, 2019

hi @AbnerFederer ,
thx. We are testing, since yesterday, with BTDM_CONTROLLER_FULL_SCAN_SUPPORTED disabled.

@enricop

enricop commented Nov 15, 2019

I'm using ESP32-DevKitC to start a Wi-Fi SoftAP and connect with two separate phones (iphone and Xiaomi).
In the meanwhile, ESP32-DevKitC is running GATTC to scan for GATTS on Xiaomi with GATTS writing char to GATTC.

We are using the wrover-b as : Wifi Station + BLE GATT Server.

@AbnerFederer

AbnerFederer commented Nov 18, 2019

I'm using ESP32-DevKitC to start a Wi-Fi SoftAP and connect with two separate phones (iphone and Xiaomi).
In the meanwhile, ESP32-DevKitC is running GATTC to scan for GATTS on Xiaomi with GATTS writing char to GATTC.

We are using the wrover-b as : Wifi Station + BLE GATT Server.

Hi @enricop,

I tested your use case today and it works correctly (wrover-b as Wi-Fi + BLE GATT Server).

The Xiaomi phone runs a GATT Client that reads and writes characteristics while the ESP32 stays connected to the iPhone, which acts as the AP.

Maybe my test demo is too simple to reproduce the issue. If you can send us your demo for stress testing, that would be better.

Thank you.

@enricop
Copy link

enricop commented Nov 18, 2019

with BTDM_CONTROLLER_FULL_SCAN_SUPPORTED disabled the issue didn't occur in the last 4 days

@AbnerFederer

Hi @enricop

OK, that's great.

We will continue debugging this and inform you of any progress.

If the issue happens again, just let us know.

Thank you.

@enricop

enricop commented Jan 23, 2020

@AbnerFederer

As of today we have re-enabled BTDM_CONTROLLER_FULL_SCAN_SUPPORTED to improve ble_mesh scan performance. We will let you know if the timeout still occurs.

@enricop

enricop commented Jan 27, 2020

Hi all,

after just a quick test with the latest release/v4.0 and BTDM_CTRL_FULL_SCAN_SUPPORTED enabled, the interrupt watchdog timeout occurs very often:

[image attachment: Screenshot_20200127_123918]

full log here: wtd_timeout.txt

@enricop

enricop commented Feb 25, 2020

hi all,
does this commit fix anything regarding this bug?
thanks for your support

@GYC-Espressif
Contributor

Hello, this issue has been resolved in commit b04e643 on the latest release/v4.0. Please try updating to solve this problem. Let me know if you have any questions. Thank you.

@Alvin1Zhang
Collaborator

Thanks for reporting and updates. Will close the ticket now, feel free to reopen if the issue still happens. Thanks.


7 participants