Crash when WIFI enabled but not connected and DMX messages coming in #13

Closed
chaosloth opened this issue Mar 9, 2022 · 2 comments

@chaosloth
Contributor

Issue

The crash happens at random times but reproduces consistently when DMX messages are processed while WIFI is not connected.

When there are no incoming DMX messages, no crash occurs and the ESP continues to attempt to connect in the background.

Similarly, if WIFI is connected, no crash occurs whether or not DMX messages arrive.

However, if WIFI is connected while DMX messages are being received and WIFI then disconnects, the crash does occur.

I've tried several ESP boards along with different incoming DMX frame rates, and I can get this to occur at as low as 2 fps. It occurs more frequently at higher frame rates, though; I suspect this is a function of how often the ISR is called.

Steps to reproduce:

  1. Use esp_dmx v1.1.3
  2. Configure WIFI to connect to an AP (i.e. the ESP has credentials stored)
  3. Make sure the AP is offline or out of range
  4. Stream incoming DMX messages (UART 1 in my case)
  5. The CPU crashes after somewhere between 0 and 40 seconds

Example code

As of writing, the current master is the same as this tagged release, which replicates the issue: https://github.com/chaosloth/Connotron_DMX_Gateway/releases/tag/v0.0.2-beta

Notes

I've seen discussion on the interwebs regarding "Cache disabled but cached memory region accessed" exceptions, where the proposed fix is to mark ISR functions with IRAM_ATTR. I see that the ISR functions are indeed annotated, and that the function called out in the exception below is an inline function, which should also inherit this.
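
For context, here is a minimal sketch of the placement rule in question. IRAM_ATTR only applies to the function it is written on, and `static inline` is just a hint: if the compiler emits an out-of-line copy of a helper, that copy lands in the default text section unless the helper is itself annotated or force-inlined. The names `fake_uart_t`, `fake_isr_ctx_t`, `fake_hal_get_rxfifo_len`, and `fake_dmx_isr` are hypothetical stand-ins and not the esp_dmx API, and the empty `IRAM_ATTR` fallback is an assumption that lets the sketch compile off-target.

```c
#include <stdint.h>

/* On ESP32 builds IRAM_ATTR comes from esp_attr.h. This empty fallback is
 * an assumption so the sketch also compiles on a host machine. */
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif

/* Force inlining so that no out-of-line copy of the helper can exist. */
#define HAL_FORCE_INLINE static inline __attribute__((always_inline))

/* Hypothetical register block standing in for the UART peripheral. */
typedef struct {
    volatile uint32_t rxfifo_cnt;
} fake_uart_t;

/* Hypothetical ISR context handed to the handler as its argument. */
typedef struct {
    fake_uart_t *dev;
    volatile uint32_t last_len;
} fake_isr_ctx_t;

/* Hypothetical HAL helper. Because it is always inlined, its code lives
 * wherever its caller lives. */
HAL_FORCE_INLINE uint32_t fake_hal_get_rxfifo_len(fake_uart_t *dev) {
    return dev->rxfifo_cnt & 0xFFu;
}

/* Hypothetical ISR. It is marked IRAM_ATTR and only calls the
 * force-inlined helper above, so nothing it executes is fetched from
 * flash. */
static void IRAM_ATTR fake_dmx_isr(void *arg) {
    fake_isr_ctx_t *ctx = (fake_isr_ctx_t *)arg;
    ctx->last_len = fake_hal_get_rxfifo_len(ctx->dev);
}
```

This only illustrates the rule; it is not a statement of how the library's HAL is written or how it should be fixed.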

Exception - Example 1

Decoded

The ESP exception decoder points to the dmx_hal_write_txfifo function; however, I have also seen the exception point to dmx_hal_get_rxfifo_len. The dump below decodes as follows:

PC: 0x400d9836: dmx_hal_write_txfifo at /Users/cc/Documents/Arduino/libraries/esp_dmx-1.1.3/src/dmx/hal.h line 369
EXCVADDR: 0x00000000

Decoding stack results
0x400d9833: dmx_hal_write_txfifo at /Users/cc/Documents/Arduino/libraries/esp_dmx-1.1.3/src/dmx/hal.h line 367
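
One way to sanity-check a decoded PC is to see which address range it falls in. The sketch below assumes the usual ESP32 layout under the default ESP-IDF linker script (internal IRAM at 0x40080000 up to 0x400A0000, flash-mapped code from 0x400D0000 up to 0x40400000). Those bounds are assumptions of this example and should be checked against the linker script of the build in use. `classify_pc` and the enum names are hypothetical.

```c
#include <stdint.h>

/* Assumed address ranges (end values are exclusive). */
#define SKETCH_IRAM_START       0x40080000u
#define SKETCH_IRAM_END         0x400A0000u
#define SKETCH_FLASH_TEXT_START 0x400D0000u
#define SKETCH_FLASH_TEXT_END   0x40400000u

typedef enum {
    PC_REGION_IRAM,        /* runs even while the flash cache is disabled */
    PC_REGION_FLASH_TEXT,  /* needs the flash cache to be enabled */
    PC_REGION_OTHER        /* data, ROM, peripherals, or unmapped */
} pc_region_t;

/* Classify a program counter value by the assumed ranges above. */
static pc_region_t classify_pc(uint32_t pc) {
    if (pc >= SKETCH_IRAM_START && pc < SKETCH_IRAM_END) {
        return PC_REGION_IRAM;
    }
    if (pc >= SKETCH_FLASH_TEXT_START && pc < SKETCH_FLASH_TEXT_END) {
        return PC_REGION_FLASH_TEXT;
    }
    return PC_REGION_OTHER;
}
```

Under those assumptions, the PC above (0x400d9836) classifies as flash-mapped text, while the boot log's entry address 0x400805f0 classifies as IRAM. A PC in flash-mapped text inside an ISR would be consistent with a "Cache disabled but cached memory region accessed" panic.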

Dump

23:40:39.489 -> 
23:40:39.489 -> Core  1 register dump:
23:40:39.489 -> PC      : 0x400d9836  PS      : 0x00060035  A0      : 0x80081322  A1      : 0x3ffbf16c  
23:40:39.575 -> A2      : 0x00000000  A3      : 0x3ffb2938  A4      : 0x3ff50000  A5      : 0x3ffbf1b4  
23:40:39.575 -> A6      : 0x3ffc4e68  A7      : 0x84000254  A8      : 0x0000001c  A9      : 0x00000078  
23:40:39.575 -> A10     : 0x6002e000  A11     : 0x00000000  A12     : 0x8008ed14  A13     : 0x3ffbaad0  
23:40:39.575 -> A14     : 0x00000003  A15     : 0x00060023  SAR     : 0x00000000  EXCCAUSE: 0x00000007  
23:40:39.575 -> EXCVADDR: 0x00000000  LBEG    : 0x40084749  LEND    : 0x40084751  LCOUNT  : 0x00000026  
23:40:39.575 -> 
23:40:39.575 -> 
23:40:39.575 -> Backtrace:0x400d9833:0x3ffbf16c |<-CORRUPTED
23:40:39.575 -> 
23:40:39.575 -> 
23:40:39.575 -> 
23:40:39.575 -> 
23:40:39.575 -> ELF file SHA256: 0000000000000000
23:40:39.575 -> 
23:40:39.575 -> Rebooting...
23:40:39.575 -> ets Jun  8 2016 00:22:57
23:40:39.575 -> 
23:40:39.575 -> rst:0x3 (SW_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
23:40:39.575 -> configsip: 0, SPIWP:0xee
23:40:39.575 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
23:40:39.575 -> mode:DIO, clock div:1
23:40:39.575 -> load:0x3fff0030,len:1324
23:40:39.575 -> ho 0 tail 12 room 4
23:40:39.575 -> load:0x40078000,len:13508
23:40:39.575 -> load:0x40080400,len:3604
23:40:39.575 -> entry 0x400805f0
23:40:39.822 -> [     3][E][WiFiGeneric.cpp:586] wifiLow⸮f⸮⸮⸮%⸮⸮ѡ): esp_wifi_init 4353
23:40:39.861 -> [     5][D][esp32-hal-cpu.c:211] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz

Exception - Example 2

Decoded

This exception occurred after reboot. It points to a slightly different part of the code, but is similar in that it occurs in the HAL code.

PC: 0x4014ffe4: dmx_hal_get_rxfifo_len at /Users/cc/Documents/Arduino/libraries/esp_dmx-1.1.3/src/dmx/hal.h line 64
EXCVADDR: 0x00000000

Decoding stack results
0x4014ffe1: WiFiUDP::remotePort() at /Users/cc/Library/Arduino15/packages/esp32/hardware/esp32/2.0.2/libraries/WiFi/src/WiFiUdp.cpp line 280

Dump

00:22:24.988 -> rst:0x3 (SW_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
00:22:24.988 -> configsip: 0, SPIWP:0xee
00:22:24.988 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
00:22:24.988 -> mode:DIO, clock div:1
00:22:24.988 -> load:0x3fff0030,len:1324
00:22:24.988 -> ho 0 tail 12 room 4
00:22:24.988 -> load:0x40078000,len:13508
00:22:24.988 -> load:0x40080400,len:3604
00:22:24.988 -> entry 0x400805f0
00:22:25.204 -> [     3][E][WiFiGeneric.cpp:586] wifiLow⸮ff⸮⸮%⸮⸮ѡ): esp_wifi_init 4353
00:22:25.242 -> [     5][D][esp32-hal-cpu.c:211] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz
00:22:25.427 -> 
00:22:25.427 -> Starting ConnoDMX Gateway on ESP32_DEV WIFI Manager: ESPAsync_WiFiManager v1.12.0
00:22:25.469 -> Normal mode. Entering WIFI_STA mode
00:22:25.469 -> [   247][D][WiFiGeneric.cpp:831] _eventCallback(): Arduino Event: 0 - WIFI_READY
00:22:25.537 -> [   338][D][WiFiGeneric.cpp:831] _eventCallback(): Arduino Event: 2 - STA_START
00:22:25.540 -> Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed). 
00:22:25.586 -> 
00:22:25.586 -> Core  1 register dump:
00:22:25.586 -> PC      : 0x4014ffe4  PS      : 0x00060035  A0      : 0x40084c5c  A1      : 0x3ffbf18c  
00:22:25.586 -> A2      : 0x3ffb90bc  A3      : 0x3ffbdcc8  A4      : 0x00000000  A5      : 0x3ffbdcc4  
00:22:25.586 -> A6      : 0x00000001  A7      : 0x3ffbdcc4  A8      : 0x800813c8  A9      : 0x3ffbf16c  
00:22:25.586 -> A10     : 0x3ff50000  A11     : 0x0001819d  A12     : 0x800840d0  A13     : 0x3ffbaae0  
00:22:25.586 -> A14     : 0x3ffc4e68  A15     : 0x84000254  SAR     : 0x00000000  EXCCAUSE: 0x00000007  
00:22:25.659 -> EXCVADDR: 0x00000000  LBEG    : 0x40084749  LEND    : 0x40084751  LCOUNT  : 0x00000027  
00:22:25.659 -> 
00:22:25.659 -> 
00:22:25.659 -> Backtrace:0x4014ffe1:0x3ffbf18c |<-CORRUPTED

Exception - Example 3

Decoded

PC: 0x4015004c: dmx_hal_get_rxfifo_len at /Users/cc/Documents/Arduino/libraries/esp_dmx-1.1.3/src/dmx/hal.h line 74
EXCVADDR: 0x00000000

Decoding stack results
0x40150049: dmx_hal_get_rxfifo_len at /Users/cc/Documents/Arduino/libraries/esp_dmx-1.1.3/src/dmx/hal.h line 74
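
The decoded stack address is the PC minus 3 (0x4015004c - 3 = 0x40150049). The same kind of adjustment can be applied to the A0 register to get a candidate address for the caller. The sketch below assumes the usual Xtensa windowed-call convention, where the top two bits of a saved return address are overwritten by the call and have to be restored to the 0x4... code region before stepping back 3 bytes into the call instruction. The function name `stack_pc_to_code_addr` is hypothetical.

```c
#include <stdint.h>

/* Convert a saved PC or return address (e.g. A0 from a register dump)
 * into an address suitable for symbol lookup.
 * Assumption: code addresses live in the 0x40000000 region, so a value
 * with the top bit set has had its upper two bits replaced by the
 * windowed call and needs them restored. */
static uint32_t stack_pc_to_code_addr(uint32_t pc) {
    if (pc & 0x80000000u) {
        pc = (pc & 0x3FFFFFFFu) | 0x40000000u;
    }
    /* Step back so the address lands inside the call instruction. */
    return pc - 3u;
}
```

Applied to the A0 value in the dump below (0x800813f5), this gives 0x400813f2, which could be fed to the exception decoder to look for the calling function.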

Dump

00:23:05.840 -> Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed). 
00:23:05.840 -> 
00:23:05.840 -> Core  1 register dump:
00:23:05.840 -> PC      : 0x4015004c  PS      : 0x00060035  A0      : 0x800813f5  A1      : 0x3ffbf15c  
00:23:05.840 -> A2      : 0x000002bb  A3      : 0x00000078  A4      : 0x00000078  A5      : 0x00000201  
00:23:05.840 -> A6      : 0x3ffb91a0  A7      : 0x84000254  A8      : 0x00000243  A9      : 0x00000080  
00:23:05.878 -> A10     : 0x01002b8f  A11     : 0x00000000  A12     : 0x8008ed14  A13     : 0x3ffbaad0  
00:23:05.878 -> A14     : 0x00000003  A15     : 0x00060223  SAR     : 0x00000000  EXCCAUSE: 0x00000007  
00:23:05.878 -> EXCVADDR: 0x00000000  LBEG    : 0x40084749  LEND    : 0x40084751  LCOUNT  : 0x00000027  
00:23:05.878 -> 
00:23:05.878 -> 
00:23:05.878 -> Backtrace:0x40150049:0x3ffbf15c |<-CORRUPTED
00:23:05.878 -> 
00:23:05.878 -> 
00:23:05.878 -> 
00:23:05.878 -> 
00:23:05.878 -> ELF file SHA256: 0000000000000000
00:23:05.912 -> 
00:23:05.912 -> Rebooting...
00:23:05.912 -> ets Jun  8 2016 00:22:57
00:23:05.912 -> 
00:23:05.912 -> rst:0x3 (SW_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
00:23:05.912 -> configsip: 0, SPIWP:0xee
00:23:05.912 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
00:23:05.912 -> mode:DIO, clock div:1
00:23:05.912 -> load:0x3fff0030,len:1324
00:23:05.912 -> ho 0 tail 12 room 4
00:23:05.912 -> load:0x40078000,len:13508
00:23:05.912 -> load:0x40080400,len:3604
00:23:05.912 -> entry 0x400805f0
@chaosloth
Contributor Author

Found the issue and have opened pull request #14.

@someweisguy
Owner

Great find and great writeup! It's been a busy week for me, so I really appreciate you taking a look into this. I'll go ahead and merge.
