UART0 Slow Baud Reset Issue on Arduino ESP32 | Update : Suggests reset issue while writing Core register dump for CPU 1 at UART0 Baud < 6207 #7092

Open
1 task done
gracengeer opened this issue Aug 6, 2022 · 14 comments
Assignees
Labels
Area: ESP-IDF related ESP-IDF related issues Status: Needs investigation We need to do some research before taking next steps on this issue Type: Bug 🐛 All bugs

Comments

@gracengeer

gracengeer commented Aug 6, 2022

Board

ESP32 Dev Module

Device Description

None

Hardware Configuration

None

Version

v2.0.4

IDE Name

Arduino IDE, esp-idf on Visual Studio Code

Operating System

Windows 11

Flash frequency

80MHz

PSRAM enabled

no

Upload speed

921600

Description

The ESP32 freezes when UART0 runs at a low baud rate and cannot recover through a WDT reset after an exception.

I am working on an ESP32 application that uses all three UARTs at low baud rates. UART0 must run at exactly 4800 baud because of the master IC specification.

To summarize, the behavior is as follows:

Arduino version for ESP32 : v1.0.4 with underlying esp-idf : v3.2.3 for Case 1 and Case 2
esp-idf : v3.2 for Case 3

Chip: ESP32-WROOM-32D
Flash : 16M (3M App + 9M FATFs)
Flash baud : 921600
CPU Freq : 240 MHz
Flash Freq : 80 MHz
Flash Mode : DIO
Debug Level : Info/Debug

The same behavior has been observed with the latest Arduino version for ESP32 (esp-idf v4.4.2), so it looks like it has not been reported or fixed yet.

Case 1 : Arduino code on Arduino IDE : 4800 baud on UART0, ESP32 freezes and cannot soft reset; the Core 0 dump debug output is only partial
Case 2 : idf code on Arduino IDE : 4800 baud on UART0, ESP32 freezes and cannot soft reset; the Core 0 dump debug output is only partial
Case 3 : idf code on esp-idf : 4800 baud on UART0, ESP32 soft resets normally and the Core 0 dump debug output is complete

On further debugging, I found that the cut-off baud rate on UART0 is 6207. Below that, the ESP32 freezes on any exception and cannot reset, so the problem looks very specific to the configured baud rate.
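
The issue does not say how the 6207 cut-off was located. One way to narrow down such a threshold is a bisection over the baud rate, sketched below. `find_cutoff_baud` and the `recovers` callback are hypothetical names; on real hardware the callback would stand for "flash a build at this baud rate, trigger the exception, and report whether the board rebooted". The sketch assumes the outcome is monotonic in the baud rate (freezes below some value, recovers at and above it).

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns the lowest baud rate in [lo, hi] for which
// recovers(baud) is true. Assumes recovers(lo) is false, recovers(hi) is
// true, and the outcome flips exactly once in between.
int find_cutoff_baud(int lo, int hi, const std::function<bool(int)>& recovers) {
    assert(lo < hi);
    assert(!recovers(lo) && recovers(hi));
    while (hi - lo > 1) {
        const int mid = lo + (hi - lo) / 2;
        if (recovers(mid)) {
            hi = mid;  // board rebooted: the cut-off is at or below mid
        } else {
            lo = mid;  // board froze: the cut-off is above mid
        }
    }
    return hi;
}
```

Starting from 4800 (freezes) and 9600 (recovers), this needs roughly a dozen test runs to pin the threshold down to a single baud value.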

Sketch

############################
idf code running in Arduino IDE and idf platform
############################

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_log.h"
#include "driver/uart.h"
#include "string.h"
#include "driver/gpio.h"


static const int RX_BUF_SIZE = 1024;

void setup() 
{
    const uart_config_t uart_config0 = {
        .baud_rate = 4800,
        .data_bits = UART_DATA_8_BITS,
        .parity = UART_PARITY_DISABLE,
        .stop_bits = UART_STOP_BITS_1,
        .flow_ctrl = UART_HW_FLOWCTRL_DISABLE,
        //.source_clk = UART_SCLK_APB, 
    };


    uart_driver_install(UART_NUM_0, RX_BUF_SIZE * 2, 0, 0, NULL, 0);
    uart_param_config(UART_NUM_0, &uart_config0);
    uart_set_pin(UART_NUM_0, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE);

}

void loop() 
{
    const float value= 1000/0;
    ESP_LOGI("None", "Read value : %f ", value);
}

############################
Arduino code in Arduino IDE:
############################
void setup() 
{
     Serial.begin(4800);     
}

void loop() 
{
    float value= 1000/0;
    Serial.println(value);
    delay(1);
}
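
As a sanity check on the baud rate itself, the divider needed for 4800 baud can be worked out on a host machine. This is only a sketch under two assumptions that are not stated in this thread: that the UART is clocked from an 80 MHz APB clock (the `UART_SCLK_APB` choice commented out in the sketch above), and that the divider is stored as an integer part plus four fractional bits. `UartDivider`, `uart_divider` and `actual_baud` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct UartDivider {
    uint32_t integer;   // integer part of the clock divider
    uint32_t fraction;  // fractional part, in sixteenths
};

// Assumption: divider = clock / baud, kept as fixed point with 4 fractional
// bits and rounded down.
UartDivider uart_divider(uint32_t clock_hz, uint32_t baud) {
    assert(baud > 0);
    const uint64_t scaled = (static_cast<uint64_t>(clock_hz) << 4) / baud;
    return {static_cast<uint32_t>(scaled >> 4), static_cast<uint32_t>(scaled & 0xF)};
}

// Baud rate that the rounded-down divider actually produces.
double actual_baud(uint32_t clock_hz, const UartDivider& d) {
    const uint64_t scaled = (static_cast<uint64_t>(d.integer) << 4) | d.fraction;
    assert(scaled > 0);
    return static_cast<double>(clock_hz) * 16.0 / static_cast<double>(scaled);
}
```

Under these assumptions, `uart_divider(80000000, 4800)` gives an integer part of 16666 and a fraction of 10/16, and the resulting rate is within a small fraction of a baud of 4800. That would suggest the baud generator itself is not being pushed to a limit at this rate, although it does not rule out other baud-related effects.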

Debug Message

==================================================
UART0 @ 9600 baud Arduino code in Arduino - Core Debug Level : Info
==================================================
---------------------------
Serial out @ 9600 baud
---------------------------
Ԋ�Θ⸮�)1Q1⸮�⸮⸮⸮)⸮!⸮1A⸮Guru Meditation Error: Core  1 panic'ed (IntegerDivideByZero). Exception was unhandled.
Core 1 register dump:
PC      : 0x400d0c18  PS      : 0x00060f30  A0      : 0x800d1f0c  A1      : 0x3ffb1f90  
A2      : 0x00000000  A3      : 0x3ffb0060  A4      : 0x00000020  A5      : 0x80000020  
A6      : 0x00000008  A7      : 0x00000001  A8      : 0x000003e8  A9      : 0x3ffb1f40  
A10     : 0x00000000  A11     : 0x00002580  A12     : 0x0800001c  A13     : 0x00000003  
A14     : 0x00000001  A15     : 0x00000000  SAR     : 0x0000001f  EXCCAUSE: 0x00000006  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace: 0x400d0c18:0x3ffb1f90 0x400d1f09:0x3ffb1fb0 0x40088215:0x3ffb1fd0

Rebooting...

---------------------------
Serial out @ 115200 baud
---------------------------
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5816
entry 0x400806ac

rst:0xc (SW_CPU_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5816
entry 0x400806ac

rst:0xc (SW_CPU_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5816
entry 0x400806ac


==================================================
UART0 @ 4800 baud Arduino code in Arduino - Core Debug Level : Info
==================================================
---------------------------
Serial out @ 4800 baud
---------------------------
*T4⸮⸮٥Guru Meditation Error: Core  1 panic'ed (IntegerDivideByZero). Exception was unhandled.
Core 1 register dump:
PC      : 0x400d0c18  PS      : 0x00060d30  A0      : 0x800d1f0c  A1      : 0x3ffb1f90  
A2      : 0x00000000  A3      : 0x3ffb0060  A4      : 0x00000020  A5      : 0x80000020  
A6      : 0x00000008  A7      : 0x00000001  A8      : 0x000003e8  A9      : 0x3ffb1f40  
A10     : 0x00000000  A11     : 0x000012c0  A12     : 0x0800001c  A13     : 0x00000003  
A14 
(Frozen with partial serial output)
---------------------------
Serial out @ 115200 baud
---------------------------
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5816
entry 0x400806ac
(Frozen here)
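
The 115200 baud captures above show which kind of reset the ROM bootloader saw: at 9600 baud an `RTCWDT_RTC_RESET` is followed by repeated `SW_CPU_RESET` lines, while at 4800 baud only the `RTCWDT_RTC_RESET` appears before the freeze. When comparing many captures, a small host-side helper can pull the reset code out of each `rst:` line. The sketch below is one way to do that; the function names are hypothetical and the table only covers reset codes that appear in the logs of this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Extracts the numeric reset code from a ROM boot line such as
// "rst:0x10 (RTCWDT_RTC_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)".
// Returns -1 if the line does not start with "rst:0x".
int parse_reset_code(const std::string& line) {
    const std::string prefix = "rst:0x";
    if (line.compare(0, prefix.size(), prefix) != 0) {
        return -1;
    }
    // strtol stops at the first character that is not a hex digit.
    return static_cast<int>(std::strtol(line.c_str() + prefix.size(), nullptr, 16));
}

// Names exactly as printed in the logs of this thread; other codes exist
// but are not listed here.
const char* reset_name(int code) {
    switch (code) {
        case 0x01: return "POWERON_RESET";
        case 0x07: return "TG0WDT_SYS_RESET";
        case 0x0c: return "SW_CPU_RESET";
        case 0x10: return "RTCWDT_RTC_RESET";
        default:   return "UNKNOWN";
    }
}

// True when the reset came from one of the watchdogs seen in these logs.
bool is_watchdog_reset(int code) {
    return code == 0x07 || code == 0x10;
}
```

Feeding each line of a capture through `parse_reset_code` and ignoring the `-1` results gives the sequence of resets, which makes it easy to spot a capture that ends on a watchdog reset with no software reset after it.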

Other Steps to Reproduce

None

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@gracengeer gracengeer added the Status: Awaiting triage Issue is waiting for triage label Aug 6, 2022
@SuGlider SuGlider self-assigned this Aug 7, 2022
@SuGlider
Collaborator

SuGlider commented Aug 7, 2022

@gracengeer - It all sounds a bit confusing...

I see this error:
Guru Meditation Error: Core 1 panic'ed (IntegerDivideByZero). Exception was unhandled.

And also this example code:
float value= 1000/0;

Both seem linked to each other... Which has nothing to do with the Serial port or its baudrate.

@SuGlider SuGlider added Type: Question Only question and removed Status: Awaiting triage Issue is waiting for triage labels Aug 7, 2022
@gracengeer
Author

gracengeer commented Aug 7, 2022

@SuGlider - Thanks for assigning. I triggered the Divide by Zero error on purpose in the sample code to demonstrate the issue.

For reference, please compare the debug output I added for UART0 at 4800 baud with the output for UART0 at 9600 baud. At 4800 baud the Core 0 debug output, which is linked to UART0, gets stuck partway through and never prints the complete backtrace. At that moment the ESP32 is completely frozen. The problem persists for every UART0 baud rate below 6207; at 6207 and above the ESP32 resets by itself after the exception.

In addition, even a WDT reset does not recover from errors/exceptions at runtime for the lower baud rates on UART0.

@SuGlider SuGlider added Type: Bug 🐛 All bugs Area: ESP-IDF related ESP-IDF related issues and removed Type: Question Only question labels Aug 7, 2022
@SuGlider
Collaborator

SuGlider commented Aug 7, 2022

@gracengeer - Thanks, issue confirmed.

Indeed, it seems that ESP32 can't recover in Soft Reset after an Exception when its UART 0 has been set to any baudrate lower than or equal to 4800.

It seems to be an issue from IDF, or something else (ROM SW?), and not with Arduino.
@VojtechBartoska - Could you please check this with IDF team?

@SuGlider
Collaborator

SuGlider commented Aug 7, 2022

@VojtechBartoska - I tested it with 2400, 4800 and 9600.
I also used esp_restart() to check if the issue was in the boot, but it works fine.
It seems related to the Guru Meditation dumping process... something may time out and leave the ESP32 in a bad state.

using baudrate 2400 -- system halts after incomplete Guru Meditation Error message

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). 
Debug exception reason: BREAK instr 
Core  1 register dump:
PC      : 0x400d11a0  PS      : 0x00060836  A0      : 0x800d1bed  A1      : 0x3ffb27f0  
A2      : 0x00000

using baudrate 4800 -- system halts after incomplete Guru Meditation Error message

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). 
Debug exception reason: BREAK instr 
Core  1 register dump:
PC      : 0x400d11a0  PS      : 0x00060836  A0      : 0x800d1bed  A1      : 0x3ffb27f0  
A2      : 0x00000000  A3      : 0x3ffc1170  A4      : 0x00004e20  A5      : 0x00000004  
A6      : 0x3ffb8874  A7      : 0x80000001  A8      : 0x800d11a0  A9      : 0x3ffb27b0  
A10     : 0x3ffc1170  A11     : 0x00000000  A12     : 0x0800001c  A13     : 0

using baudrate 9600 -- system resets and restarts after complete Guru Meditation Error message

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). 
Debug exception reason: BREAK instr 
Core  1 register dump:
PC      : 0x400d11a0  PS      : 0x00060a36  A0      : 0x800d1bed  A1      : 0x3ffb27f0  
A2      : 0x00000000  A3      : 0x3ffc1170  A4      : 0x00004e20  A5      : 0x00000004  
A6      : 0x3ffb8874  A7      : 0x80000001  A8      : 0x800d11a0  A9      : 0x3ffb27b0  
A10     : 0x3ffc1170  A11     : 0x00000000  A12     : 0x0800001c  A13     : 0xffffffff  
A14     : 0xffffffff  A15     : 0x00000000  SAR     : 0x0000001d  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x4008710a  LEND    : 0x40087115  LCOUNT  : 0x00000000  


Backtrace:0x400d119d:0x3ffb27f00x400d1bea:0x3ffb2820 




ELF file SHA256: 0000000000000000

Rebooting...
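
The timeout idea can be checked roughly against the two truncated dumps above by estimating how long the UART had been transmitting when the output stopped. The sketch below assumes 8N1 framing (10 bits on the wire per character). The character counts are hand counts of the 2400 and 4800 baud dumps, with CR LF line endings assumed, so they are estimates rather than measurements. `tx_seconds` and the two constants are hypothetical names.

```cpp
#include <cassert>

// Seconds needed to shift `bytes` characters out of a UART at `baud`,
// assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per character.
double tx_seconds(int bytes, int baud) {
    assert(bytes >= 0 && baud > 0);
    return static_cast<double>(bytes) * 10.0 / static_cast<double>(baud);
}

// Hand-counted estimates of how many characters were printed before the
// halt in the two truncated dumps (not measured values).
constexpr int kBytesAt2400 = 240;
constexpr int kBytesAt4800 = 480;
```

With these estimates, `tx_seconds(kBytesAt2400, 2400)` and `tx_seconds(kBytesAt4800, 4800)` both come out at about one second. That is at least consistent with something timing out roughly one second into the dump, though it does not identify what that something is.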

@gracengeer
Author

gracengeer commented Aug 8, 2022

@SuGlider @VojtechBartoska - I tried to reproduce this with plain ESP-IDF, on v4.4.1 and earlier on v3.2, to compare with Arduino. There the chip resets without any issue, so ESP-IDF itself does not look to be at fault. I wonder whether it comes from some configuration setting in the sdkconfig file used for Arduino?

Test Case: UART0 set to 4800 baud

##############################################################################
Serial debug out @ 4800 baud - For Core 0 dump (suggests bad data on printing)
##############################################################################

Core 0 register dump:
PC : 0x400d4f10 PS : 0x00060830 A0 : 0x800e77da A1 : 0x3ffb5930
A2 : 0x00000000 A3 : 0x00000001 A4 : 0x00000001 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x00060f23 A8 : 0x000003e8 A9 : 0x3ffb58f0
A10 : 0x00000000 A11 : 0x3ffb2404 A12 : 0x00004e20 A13 : 0x383f8000
A14 @����@��� ;��)��$^��8��Ty���p)�Fa�U$i%���4�e(����*�y�]%,M���֟���)��|M��Q�$$ilF�7���̇��x�

##############################################################################
Serial debug out @ 115200baud
##############################################################################

rst:0x7 (TG0WDT_SYS_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6660
load:0x40078000,len:14848
ho 0 tail 12 room 4
load:0x40080400,len:3792
entry 0x40080694
I (171) boot: ESP-IDF v4.4.2 2nd stage bootloader
I (171) boot: compile time 12:14:07
I (172) boot: chip revision: 1
I (175) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (182) boot.esp32: SPI Speed : 40MHz
I (187) boot.esp32: SPI Mode : DIO
I (192) boot.esp32: SPI Flash Size : 16MB
W (196) boot.esp32: PRO CPU has been reset by WDT.
W (202) boot.esp32: WDT reset info: PRO CPU PC=0x400d1ada
W (208) boot.esp32: WDT reset info: APP CPU PC=0x400e7122 (waiti mode)
I (215) boot: Enabling RNG early entropy source...
I (221) boot: Partition Table:
I (224) boot: ## Label Usage Type ST Offset Length
I (232) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (239) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (247) boot: 2 factory factory app 00 00 00010000 00100000
I (254) boot: End of partition table
I (259) boot_comm: chip revision: 1, min. application chip revision: 0
I (266) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=094fch ( 38140) map
I (288) esp_image: segment 1: paddr=00019524 vaddr=3ffb0000 size=02368h ( 9064) load
I (292) esp_image: segment 2: paddr=0001b894 vaddr=40080000 size=04784h ( 18308) load
I (302) esp_image: segment 3: paddr=00020020 vaddr=400d0020 size=1799ch ( 96668) map
I (337) esp_image: segment 4: paddr=000379c4 vaddr=40084784 size=07aa4h ( 31396) load
I (351) esp_image: segment 5: paddr=0003f470 vaddr=50000000 size=00010h ( 16) load
I (357) boot: Loaded app from partition at offset 0x10000
I (357) boot: Disabling RNG early entropy source...
I (371) cpu_start: Pro cpu up.
I (371) cpu_start: Starting app cpu, entry point is 0x400810f0
I (0) cpu_start: App cpu up.
I (385) cpu_start: Pro cpu start user code
I (385) cpu_start: cpu freq: 240000000
I (385) cpu_start: Application information:
I (390) cpu_start: Project name: uart_async_rxtxtasks
I (396) cpu_start: App version: 1
I (400) cpu_start: Compile time: Aug 8 2022 12:13:07
I (406) cpu_start: ELF file SHA256: b9ceff7de43675f1...
I (412) cpu_start: ESP-IDF: v4.4.2
I (417) heap_init: Initializing. RAM available for dynamic allocation:
I (424) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (430) heap_init: At 3FFB2C60 len 0002D3A0 (180 KiB): DRAM
I (437) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (443) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (449) heap_init: At 4008C228 len 00013DD8 (79 KiB): IRAM
I (457) spi_flash: detected chip: gd
I (460) spi_flash: flash io: dio
��ets Jun 8 2016 00:22:57

##############################################################################
Test code
##############################################################################
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_system.h"
#include "esp_log.h"
#include "driver/uart.h"
#include "string.h"
#include "driver/gpio.h"

static const int RX_BUF_SIZE = 1024;

void init()
{
    const uart_config_t uart_config0 = {
        .baud_rate = 4800,
        .data_bits = UART_DATA_8_BITS,
        .parity = UART_PARITY_DISABLE,
        .stop_bits = UART_STOP_BITS_1,
        .flow_ctrl = UART_HW_FLOWCTRL_DISABLE,
        .source_clk = UART_SCLK_APB,
    };

    uart_driver_install(UART_NUM_0, RX_BUF_SIZE * 2, 0, 0, NULL, 0);
    uart_param_config(UART_NUM_0, &uart_config0);
    uart_set_pin(UART_NUM_0, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE);

}

void app_main(void)
{
    init();
    const float value= 1000/0;
    ESP_LOGI("None", "Read value : %f ", value);

}

@gracengeer
Author

gracengeer commented Aug 11, 2022

I have looked further into this issue. In brief, we currently have thousands of production devices at unmanned remote locations that may be affected by this critical bug, and an affected device cannot recover unless someone resets it manually, so I am kindly asking for a quick response.

I mentioned earlier that the device is unable to perform WDT resets, but that does not actually seem to be the case.

Code Logic Case : UART0 @ 4800, with infinite while loop inside void loop() with task WDT timer enabled

Output :
E (6107) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (6107) task_wdt: - IDLE1 (CPU 1)
E (6107) task_wdt: - loopTask (CPU 1)
E (6107) task_wdt: Tasks currently running:
E (6107) task_wdt: CPU 0: IDLE0
E (6107) taskGuru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x4000921a PS : 0x00060034 A0 : 0x80007d16 A1 : 0x3ffbe0c0
A2 : 0x00800000 A3 : 0x60000000 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x3ffb9a20 A7 : 0x3ffbc0dc A8 : 0x3ff40000 A9 : 0x0000005f
A10 : 0x00800000 A11 : 0x3ff4001c A12 : 0x8000⸮⸮⸮X⸮'⸮Xհ⸮⸮E (6107) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (6107) task_wdt: - IDLE1 (CPU 1)
E (6107) task_wdt: - loopTask (CPU 1)
E (6107) task_wdt: Tasks currently running:
E (6107) task_wdt: CPU 0: IDLE0
E (6107) taskGuru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x4000921a PS : 0x00060034 A0 : 0x80007d16 A1 : 0x3ffbe0c0
A2 : 0x00800000 A3 : 0x60000000 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x3ffb9a20 A7 : 0x3ffbc0dc A8 : 0x3ff40000 A9 : 0x0000005f
A10 : 0x00800000 A11 : 0x3ff4001c A12 : 0x800�⸮���.�԰�

In one of our production devices, I noticed a fatal error with the message Instruction Fetch prohibited, but the ESP32 seemed able to reset by itself. It was writing the core register dump for CPU Core 0.

Code Logic Case: UART0 @ 4800, recurring function call of void loop()
Output :
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x4008476c PS : 0x00060130 A0 : 0x8008496d A1 : 0x3ffbbf70
A2 : 0x3ffba954 A3 : 0x00060123 A4 : 0x00060120 A5 : 0x3ffb1ff0
A6 : 0x00060920 A7 : 0x00000000 A8 : 0xa5a5a5a5 A9 : 0x00000000
A10 : 0x00000003 A11 : 0x00060123 A12 : 0x00060120 A13 : 0x00000000
A14 : 0x0⸮�휁)⸮=��⸮⸮Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x4008476c PS : 0x00060130 A0 : 0x8008496d A1 : 0x3ffbbf70
A2 : 0x3ffba954 A3 : 0x00060123 A4 : 0x00060120 A5 : 0x3ffb1ff0
A6 : 0x00060920 A7 : 0x00000000 A8 : 0xa5a5a5a5 A9 : 0x00000000
A10 : 0x00000003 A11 : 0x00060123 A12 : 0x00060120 A13 : 0x00000000
A14 : 0x0⸮@⸮⸮⸮⸮H⸮J⸮⸮⸮Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:

In the above case too, the device is able to reset after the fatal error.

So the next step was to trigger a Divide by Zero error in a function running as a task on Core 0 and on Core 1 and compare the debug outputs.

Case : Divide by Zero Error code as Task on Core 0

Output:

Core 0 register dump:
PC : 0x400d0c04 PS : 0x00060030 A0 : 0x80088218 A1 : 0x3ffb46d0
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00060023 A5 : 0x3ffb8058
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x000003e8 A9 : 0x3ffbbed0
A10 : 0x00000000 A11 : 0x3ffb2080 A12 : 0x00000020 A13 : 0x80000020
A14 ⸮⸮⸮⸮
⸮�⸮��⸮⸮Guru Meditation Error: Core 0 panic'ed (IntegerDivideByZero). Exception was unhandled.
Core 0 register dump:
PC : 0x400d0c04 PS : 0x00060030 A0 : 0x80088218 A1 : 0x3ffb46d0
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00060023 A5 : 0x3ffb8058
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x000003e8 A9 : 0x3ffbbed0
A10 : 0x00000000 A11 : 0x3ffb2080 A12 : 0x00000020 A13 : 0x80000020
A14 ⸮⸮7⸮⸮��⸮ Aּ⸮Guru Meditation Error: Core 0 panic'ed (IntegerDivideByZero). Exception was unhandled.
Core 0 register dump:

Case : Divide by Zero Error as Task on Core 1

Output:
V⸮�⸮٥⸮⸮ ⸮i⸮ɽ⸮. Exception was unhandled.
Core 1 register dump:
PC : 0x400d0c04 PS : 0x00060030 A0 : 0x80088218 A1 : 0x3ffb46d0
A2 : 0x00000000 A3 : 0x00000000 A4 : 0x00060023 A5 : 0x3ffb8474
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x000003e8 A9 : 0x3ffb1e30
A10 : 0x00000000 A11 : 0x3ffb2080 A12 : 0x00000020 A13 : 0x80000020
A14

@VojtechBartoska @SuGlider : I wasn't able to simulate the code break instruction. However, the debug output from @SuGlider and my divide by zero output both show the dump stalling while writing the Core 1 register dump at a low UART0 baud rate. The Divide by Zero tests I ran on Core 0 and Core 1 confirm this: any exception or fault generated on Core 1 while the UART0 baud rate is below 6207 seems to freeze the ESP32, whereas the same fault on Core 0 lets it reset. The Arduino sketch runs on Core 1, and many conditions can lead to such fatal errors, so it is not easy to predict them at code level or to catch this runtime behavior. For me this is a mission-critical application and a very critical bug, so I am kindly asking for quick attention.

@gracengeer gracengeer changed the title UART0 Slow Baud Reset Issue on Arduino ESP32 UART0 Slow Baud Reset Issue on Arduino ESP32 | Update : Suggests reset issue while writing Core register dump for CPU 1 at UART0 Baud < 6207 Aug 11, 2022
@SuGlider
Collaborator

@gracengeer - It is possible to choose the Core where Arduino Task will run using Arduino IDE.

Would it solve the issue?

image

@SuGlider
Collaborator

@gracengeer - I just tested your sketch and you are right!
If I set Arduino to run on Core 1, I can see it freezing after the Guru Meditation Error.

But if I set Arduino to run on Core 0, it goes through the reset and starts over again correctly.

It seems that this is a way to work around the issue...

@gracengeer
Author

@SuGlider : Unfortunately the suggested workaround is not straightforward for me. My current application is fairly stable on an older version, Arduino for ESP32 1.0.4/IDF v3.2.3, which doesn't have the option to select the CPU core that runs the Arduino sketch. If I update to the newest Arduino for ESP32, which I tried earlier, most of my current application breaks and some functions fail to compile at all. The same happens if I force the code to run on Core 0 using a Task on my current older Arduino version.

The workaround would also demand a lot of time and investigation effort, which is not feasible at the moment, and it would not guarantee code stability before the production run. The issues I have posted were actually identified in production and went unreported over several Arduino revisions. Also, until the root cause is fixed, a workaround doesn't sound like a good solution for a complex piece of code, so if possible I would be glad to see it solved at the root level. I am willing to contribute in whatever manner possible.

@SuGlider
Collaborator

There is a way to change the Arduino Running Core within ESP32 1.0.4/IDF v3.2.3.
You need to manually change it in the sdkconfig.h file.

This file is in the Arduino15 folder (Arduino installation folder), whose location depends on the operating system being used.
https://support.arduino.cc/hc/en-us/articles/360018448279-Open-the-Arduino15-folder

The file is at:
Arduino15 folder/packages/esp32/hardware/esp32/1.0.4/tools/sdk/include/config/sdkconfig.h

Open <sdkconfig.h> and search for CONFIG_ARDUINO_RUNNING_CORE, the macro that app_main() passes when it creates the loop task (shown below). Note that CONFIG_ARDUINO_EVENT_RUNNING_CORE is a different macro.
There should be a line like this:
#define CONFIG_ARDUINO_RUNNING_CORE 1

Change it to use Core 0:
#define CONFIG_ARDUINO_RUNNING_CORE 0

Recompile your sketch and now Arduino will run on Core 0 instead of Core 1.

For your information, this is used in main.cpp when creating the Arduino Task, in app_main():

extern "C" void app_main()
{
    loopTaskWDTEnabled = false;
    initArduino();
    xTaskCreateUniversal(loopTask, "loopTask", 8192, NULL, 1, &loopTaskHandle, CONFIG_ARDUINO_RUNNING_CORE);
}

@SuGlider
Collaborator

Also, until the root cause is fixed, workaround doesn't sound a good solution for complex piece of code, so if possible I would be glad if it is solved at root level. I am willing to contribute in whatever possible manner.

I'll investigate it and try to find more information about it.
But I think it has to do with the Panic Interrupt and its treatment.
Another potential solution is to use Silent Panic, which means that no message would be displayed, just a reset.
For that it is necessary to recompile the Arduino Core IDF Libraries with the right setting in sdkconfig.

@SuGlider SuGlider added the Status: Needs investigation We need to do some research before taking next steps on this issue label Aug 12, 2022
@gracengeer
Author

Noted, thanks for the information. I was able to find sdkconfig.h, but I wondered whether it is a pre-compiled static file generated from the menuconfig settings, because it says not to edit anything. For now, my application uses all three UARTs; luckily one of them runs at 9600 baud while the other two run at 4800 and 600. As a workaround I reconfigured UART0 to be the one running at 9600 baud. I will try changing the core as you suggest and see whether that workaround is also an option for my application. Looking forward to your investigation.

@gracengeer
Author

Following up on the above issue: as a quick fix we moved UART0 to 9600 baud so that the log could be written without freezing. However, 9600 baud on UART0 does not seem to be a fix either. We found this after implementing a timer0-based WDT reset because of unidentified issues in our production code. It can be simulated with the minimal sketch below. When timer0 is not cleared while testing this minimal code, the freeze happens either immediately or after a few successful resets.

//IDF version: v3.2.3-14-gd3e562907
//Arduino ESP32 version: v1.0.4
//ESP-32 @ 240MHz, 16M Flash

#include <WiFi.h>

uint32_t wdt_interrupt_reset= 7 * 1000 * 1000; //7sec
hw_timer_t *timer = NULL;

void IRAM_ATTR resetModule()
{
Serial.println("T0 Reset");
ESP.restart();
}

void setup()
{
Serial.begin(9600);
WiFi.begin("DummySSID","DummyPass"); //If this code of WiFi.begin() is removed it could reset but with SW_CPU_RESET

timer = timerBegin(0, 80, true);
timerAttachInterrupt(timer, &resetModule, true);
timerAlarmWrite(timer, wdt_interrupt_reset, false);
timerAlarmEnable(timer);
}

void loop()
{
//timerWrite(timer, 0); //reset timer (feed watchdog)
delay(1);
return;
}
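
For reference, the alarm value that matches the `//7sec` comment can be derived from the divider passed to `timerBegin(0, 80, true)`. The sketch below assumes the timer is fed from an 80 MHz APB clock, so a divider of 80 yields one tick per microsecond; that clock figure is an assumption, not something stated in this thread. `alarm_ticks` is a hypothetical helper that runs on a host machine.

```cpp
#include <cassert>
#include <cstdint>

// Number of timer ticks in `seconds`, for a timer fed by `clock_hz`
// through an integer prescaler `divider`.
uint64_t alarm_ticks(uint32_t seconds, uint32_t clock_hz, uint32_t divider) {
    assert(divider > 0 && clock_hz % divider == 0);
    const uint64_t ticks_per_second = clock_hz / divider;
    return ticks_per_second * seconds;
}
```

Under that assumption, `alarm_ticks(7, 80000000, 80)` is 7000000, so a 7 second alarm corresponds to 7,000,000 ticks.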

Output:

Log @ 115200 baud

ets Jun 8 2016 00:22:57
rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:8896
load:0x40080400,len:5816
entry 0x400806ac <-"frozen"


Log @ 9600 baud

T0 Reset
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x40089c35 PS : 0x00060434 A0 : 0x8008ae01 A1 : 0x3ffb5580
A2 : 0x3ffbf340 A3 : 0x0000cdcd A4 : 0xb33fffff A5 : 0x00000001
A6 : 0x00060223 A7 : 0x0000abab A8 : 0x0000abab A9 : 0x3ffb5690
A10 : 0x3ffafcbc A11 : 0x400eb77c A12 : 0x400817bc A13 : 0x00000000
A14 : 0x000000c4 A15 : 0x3ffbf504 SAR : 0x0000001e EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff

Backtrace: 0x40089c35:0x3ffb5580 0x4008adfe:0x3ffb55b0 0x4008899b:0x3ffb55d0 0x40088aa5:0x3ffb5610 0x400817a9:0x3ffb5630 0x400edfa7:0x3ffb5650 0x400eb79a:0x3ffb5670 0x400e8a5e:0x3ffb5690 0x4008f63b:0x3ffb56b0 0x40088c35:0x3ffb56f0

Core 1 register dump:
PC : 0x4008b5c6 PS : 0x00060734 A0 : 0x8008a7a3 ⸮⸮���⸮T0 Reset
Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout on CPU1)
Core 1 register dump:
PC : 0x4008b5c6 PS : 0x00060334 A0 : 0x8008a7a3 A1 : 0x3ffbe660
A2 : 0x3ffbdac4 A3 : 0x3ffbc688 A4 : 0x00000001 A5 : 0x00000001
A6 : 0x00060323 A7 : 0x00000000 A8 : 0x3ffbc688 A9 : 0x3ffbc688
A10 : 0x00000019 A11 : 0x00000019 A12 : 0x00000001 A13 : 0x00000001
A14 : 0x00060321 A15 : 0x00000000 SAR : 0x0000001a EXCCAUSE: 0x00000006
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Core 1 was running in ISR context:
EPC1 : 0x400e12a4 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x4008b5c6

Backtrace: 0x4008b5c6:0x3ffbe660 0x4008a7a0:0x3ffbe680 0x40088a3b:0x3ffbe6a0 0x400d4b95:0x3ffbe6e0 0x400ec92a:0x3ffbe700 0x400e172d:0x3ffbe720 0x4008167b:0x3ffbe750 0x400d1bf7:0x3ffbe770 0x40080f69: <-"frozen"

As the above logs show, when the interrupt executes on Core 0 the device can recover, but on Core 1 it fails.

Although there is an option to switch cores, as mentioned earlier in this discussion, we would like to know whether this has been investigated, since there has been no update on this report yet. We assume this critical bug affects many of our production devices, where we cannot determine the cause of other exceptions. Also, we currently have no option to update to a newer Arduino (esp-idf) version because of multiple code breaks.

@SuGlider
Collaborator

SuGlider commented Apr 29, 2023

@gracengeer - I have tested the sketch from the comment above.
I used the Arduino IDE with Arduino Core 2.0.8 (IDF 4.4.4).
I have set Menu->Tools->Board Session-> Core Debug Level: to "Info" ==> This is why there are some messages from WiFi.
I got no Panic or freezing.

Output (I started the Serial Monitor at 115200 and then changed it to 9600 in order to see the sketch messages):

ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1344
load:0x40078000,len:13924
ho 0 tail 12 room 4
load:0x40080400,len:3600
entry 0x400805f0
[  7392][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 201 - NO_AP_FOUND
[  9808][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 201 - NO_AP_FOUND
[ 12224][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 201 - NO_AP_FOUND
[ 14640][W][WiFiGeneric.cpp:1057] _eventCallback(): Reason: 201 - NO_AP_FOUND


3 participants