Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#27664] esp_restart fails to work properly if hardware timers are used #2756

Closed
X-Ryl669 opened this issue Nov 28, 2018 · 17 comments
Closed

Comments

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Nov 28, 2018

Using lastest master as of writing, whenever I've started a timer with an installed interruption (in fact, I'm using 3 HW timers as I need precise ~90µs wake up) and call esp_restart, the system crash upon restart.

The callstack contains my timer's ISR routine, and it crashes when it's trying to send an event to a FreeRTOS queue within ISR.
It seems the timer fired after the restart but before FreeRTOS was started correctly.

I've tried to disable all the timer's interrupt (via timer_disable_intr) and this does not make any change (and reading esp_restart_noos code, that's what you're doing anyway).
I wonder if, when the system is starting, if timer interrupt are re-enabled before the RTOS is correctly initialized, this would explain the behavior ?

If I pause all the timer before restarting, it restarts correctly for few seconds, but crash later on in double exception vector, (located in FreeRTOS's assembly, without a backtrace in that case).

Restarting via the serial link always work correctly, so I wonder if there is a way to restart the complete CPU correctly (both core at the same time, with clean state), instead of having to deal with both core independently, and with left over peripherals messing things up ?

BTW, my code is too big now to isolate a working example.

@Alvin1Zhang Alvin1Zhang changed the title esp_restart fails to work properly if hardware timers are used [TW#27664] esp_restart fails to work properly if hardware timers are used Nov 28, 2018
@igrr
Copy link
Member

igrr commented Nov 28, 2018

@X-Ryl669 Could you please post the log output, including restart messages, panic handler output and the subsequent backtraces, ideally decoded through idf_monitor? Also, can you say whether you are calling timer_isr_register on CPU0 or CPU1, and whether you are calling esp_restart on CPU0 or CPU1? Thanks.

@X-Ryl669
Copy link
Contributor Author

With timer_pause

Rebooting!
I (23214) wifi: state: run -> init (0)
I (23214) wifi: pm stop, total sleep time: lu us / lu us

I (23214) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (23234) wifi: flush txq
I (23234) wifi: stop sw txq
I (23234) wifi: lmac stop hw txq
E (23234) wifi: esp_wifi_connect 947 wifi not start
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:6480
ho 0 tail 12 room 4
load:0x40078000,len:9904
load:0x40080400,len:6404
entry 0x40080738
I (30) boot: ESP-IDF v3.2-dev-1602-g4d021d7 2nd stage bootloader
I (31) boot: compile time 23:42:56
I (31) boot: Enabling RNG early entropy source...
I (37) boot: SPI Speed      : 40MHz
I (41) boot: SPI Mode       : DIO
I (45) boot: SPI Flash Size : 4MB
I (49) boot: Partition Table:
I (53) boot: ## Label            Usage          Type ST Offset   Length
I (60) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (67) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (75) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (82) boot:  3 factory          factory app      00 00 00010000 000f0000
I (90) boot:  4 ota_0            OTA app          00 10 00100000 00100000
I (97) boot:  5 ota_1            OTA app          00 11 00200000 00100000
I (105) boot:  6 storage          Unknown data     01 82 00300000 00100000
I (112) boot: End of partition table
I (117) esp_image: segment 0: paddr=0x00100020 vaddr=0x3f400020 size=0x21184 (135556) map
I (172) esp_image: segment 1: paddr=0x001211ac vaddr=0x3ff80000 size=0x00000 (     0) load
I (173) esp_image: segment 2: paddr=0x001211b4 vaddr=0x3ff80000 size=0x00000 (     0) load
I (179) esp_image: segment 3: paddr=0x001211bc vaddr=0x3ffbdb60 size=0x03574 ( 13684) load
I (193) esp_image: segment 4: paddr=0x00124738 vaddr=0x3ffc10d4 size=0x00000 (     0) load
I (197) esp_image: segment 5: paddr=0x00124740 vaddr=0x40080000 size=0x00400 (  1024) load
0x40080000: _WindowOverflow4 at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:1685

I (206) esp_image: segment 6: paddr=0x00124b48 vaddr=0x40080400 size=0x0b4c8 ( 46280) load
I (233) esp_image: segment 7: paddr=0x00130018 vaddr=0x400d0018 size=0x713fc (463868) map
0x400d0018: _stext at ??:?

I (394) esp_image: segment 8: paddr=0x001a141c vaddr=0x4008b8c8 size=0x065e8 ( 26088) load
0x4008b8c8: ram_set_txcap_reg at ??:?

I (405) esp_image: segment 9: paddr=0x001a7a0c vaddr=0x400c0000 size=0x00000 (     0) load
I (405) esp_image: segment 10: paddr=0x001a7a14 vaddr=0x50000200 size=0x00000 (     0) load
I (412) esp_image: segment 11: paddr=0x001a7a1c vaddr=0x50000200 size=0x00000 (     0) load
I (432) boot: Loaded app from partition at offset 0x100000
I (432) boot: Disabling RNG early entropy source...
I (433) cpu_start: Pro cpu up.
I (436) cpu_start: Starting app cpu, entry point is 0x400811ac
0x400811ac: call_start_cpu1 at /Users/cyril/esp/esp-idf/components/esp32/cpu_start.c:249

I (428) cpu_start: App cpu up.
I (447) heap_init: Initializing. RAM available for dynamic allocation:
I (454) heap_init: At 3FFAE6E0 len 0000F480 (61 KiB): DRAM
I (460) heap_init: At 3FFD8930 len 000076D0 (29 KiB): DRAM
I (466) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (473) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (479) heap_init: At 40091EB0 len 0000E150 (56 KiB): IRAM
I (485) cpu_start: Pro cpu start user code
I (57) esp_core_dump: Init core dump to UART
E (58) esp_core_dump: No core dump partition found!
I (58) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
Opened NVS correctly 0x1
Loaded calibration cppb(a:0.999155, b:6.72443)
Starting up!
I (145) wifi: wifi driver task: 3ffb7c28, prio:23, stack:3072, core=0
I (145) wifi: wifi firmware version: fbe0c49
I (145) wifi: config NVS flash: enabled
I (145) wifi: config nano formating: enabled
I (155) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (165) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (185) wifi: Init dynamic tx buffer num: 32
I (185) wifi: Init data frame dynamic rx buffer num: 32
I (185) wifi: Init management frame dynamic rx buffer num: 32
I (185) wifi: Init static rx buffer size: 1600
I (195) wifi: Init static rx buffer num: 10
I (195) wifi: Init dynamic rx buffer num: 32
WIFI started!
I (205) gpio: GPIO[22]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
A Station config is present with ssid XXXX
I (215) wifi: Set ps type: 0

E (215) wifi: esp_wifi_disconnect 955 wifi not start
I (295) phy: phy_version: 4000, b6198fa, Sep  3 2018, 15:11:06, 0, 0
I (295) wifi: mode : sta (30:ae:a4:8d:8f:4c)
Started station for connection to XXXX
I (1015) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
I (1575) wifi: state: init -> auth (b0)
I (1585) wifi: state: auth -> assoc (0)
I (1585) wifi: state: assoc -> run (10)
I (2615) wifi: connected with XXXX, channel 6
I (2615) wifi: pm start, type: 0

I (2625) event: sta ip: 192.168.0.97, mask: 255.255.255.0, gw: 192.168.0.254
Station is connected
I (10295) httpd-freertos: address 0.0.0.0, port 80, maxConnections 16, mode non-ssl
I (10295) httpd-freertos: init
I (10295) httpd-freertos: starting server on port port 80, maxConnections 16, mode non-ssl
I (10295) httpd-freertos: esphttpd: active and listening to connections on 0.0.0.0
I (10305) httpd-freertos: listening for new connections on '0.0.0.0'
SNTP client started
Initial boot time: 2018/11/28 22:11:27
mDNS started
Time updated
Current time: 2018/11/28 22:11:27
Guru Meditation Error: Core  0 panic'ed (Double exception)
Core 0 register dump:
PC      : 0x400803c0  PS      : 0x00060436  A0      : 0x00000016  A1      : 0x3ffb2150
0x400803c0: _DoubleExceptionVector at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:512

A2      : 0x3ffb21c0  A3      : 0x00000000  A4      : 0x3ffc21e8  A5      : 0x3ffc21e0
A6      : 0x00002aaa  A7      : 0x3ffb21f0  A8      : 0x80001853  A9      : 0x00000000
A10     : 0x3ffb23cc  A11     : 0x3ffb2210  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00000aaa  A15     : 0x3ffb21d0  SAR     : 0x0000001a  EXCCAUSE: 0x00000002
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000

Backtrace: 0x400803c0:0x3ffb2150 0x00000013:0x3ffb2210 0x400803bd:0x3ffb20f0 0x4008e871:0x00000000
0x400803c0: _DoubleExceptionVector at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:512

0x400803bd: _UserExceptionVector at ??:?

0x4008e871: pvTaskIncrementMutexHeldCount at /Users/cyril/esp/esp-idf/components/freertos/tasks.c:4601


Entering gdb stub now.
$T0b#e6GNU gdb (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-build_apple-darwin16.3.0 --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /Users/cyril/esp/nvse_app/build/nvse.elf...done.
Remote debugging using /dev/cu.SLAB_USBtoUART
_DoubleExceptionVector () at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:512
512	    break   1, 4                            /* unhandled double exception */
(gdb) bt
#0  _DoubleExceptionVector () at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:512
#1  0x40000016 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

With no timer pausing

Rebooting!
I (23634) wifi: state: run -> init (0)
I (23634) wifi: pm stop, total sleep time: lu us / lu us

I (23634) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (23654) wifi: flush txq
I (23654) wifi: stop sw txq
I (23654) wifi: lmac stop hw txq
E (23654) wifi: esp_wifi_connect 947 wifi not start
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:6480
ho 0 tail 12 room 4
load:0x40078000,len:9904
load:0x40080400,len:6404
entry 0x40080738
I (30) boot: ESP-IDF v3.2-dev-1602-g4d021d7 2nd stage bootloader
I (31) boot: compile time 23:42:56
I (31) boot: Enabling RNG early entropy source...
I (37) boot: SPI Speed      : 40MHz
I (41) boot: SPI Mode       : DIO
I (45) boot: SPI Flash Size : 4MB
I (49) boot: Partition Table:
I (53) boot: ## Label            Usage          Type ST Offset   Length
I (60) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (67) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (75) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (82) boot:  3 factory          factory app      00 00 00010000 000f0000
I (90) boot:  4 ota_0            OTA app          00 10 00100000 00100000
I (97) boot:  5 ota_1            OTA app          00 11 00200000 00100000
I (105) boot:  6 storage          Unknown data     01 82 00300000 00100000
I (112) boot: End of partition table
I (117) esp_image: segment 0: paddr=0x00200020 vaddr=0x3f400020 size=0x21184 (135556) map
I (172) esp_image: segment 1: paddr=0x002211ac vaddr=0x3ff80000 size=0x00000 (     0) load
I (173) esp_image: segment 2: paddr=0x002211b4 vaddr=0x3ff80000 size=0x00000 (     0) load
I (179) esp_image: segment 3: paddr=0x002211bc vaddr=0x3ffbdb60 size=0x03574 ( 13684) load
I (193) esp_image: segment 4: paddr=0x00224738 vaddr=0x3ffc10d4 size=0x00000 (     0) load
I (197) esp_image: segment 5: paddr=0x00224740 vaddr=0x40080000 size=0x00400 (  1024) load
0x40080000: _WindowOverflow4 at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:1685

I (206) esp_image: segment 6: paddr=0x00224b48 vaddr=0x40080400 size=0x0b4c8 ( 46280) load
I (233) esp_image: segment 7: paddr=0x00230018 vaddr=0x400d0018 size=0x713cc (463820) map
0x400d0018: _stext at ??:?

I (394) esp_image: segment 8: paddr=0x002a13ec vaddr=0x4008b8c8 size=0x065e8 ( 26088) load
0x4008b8c8: ram_set_txcap_reg at ??:?

I (405) esp_image: segment 9: paddr=0x002a79dc vaddr=0x400c0000 size=0x00000 (     0) load
I (405) esp_image: segment 10: paddr=0x002a79e4 vaddr=0x50000200 size=0x00000 (     0) load
I (412) esp_image: segment 11: paddr=0x002a79ec vaddr=0x50000200 size=0x00000 (     0) load
I (432) boot: Loaded app from partition at offset 0x200000
I (432) boot: Disabling RNG early entropy source...
I (433) cpu_start: Pro cpu up.
I (436) cpu_start: Starting app cpu, entry point is 0x400811ac
0x400811ac: call_start_cpu1 at /Users/cyril/esp/esp-idf/components/esp32/cpu_start.c:249

I (428) cpu_start: App cpu up.
I (447) heap_init: Initializing. RAM available for dynamic allocation:
I (454) heap_init: At 3FFAE6E0 len 0000F480 (61 KiB): DRAM
I (460) heap_init: At 3FFD8930 len 000076D0 (29 KiB): DRAM
I (466) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (473) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (479) heap_init: At 40091EB0 len 0000E150 (56 KiB): IRAM
I (485) cpu_start: Pro cpu start user code
/Users/cyril/esp/esp-idf/components/freertos/queue.c:1156 (xQueueGenericSendFromISR)- assert failed!
abort() was called at PC 0x4008cfe4 on core 0
0x4008cfe4: xQueueGenericSendFromISR at /Users/cyril/esp/esp-idf/components/freertos/queue.c:2037


Backtrace: 0x4008f8bf:0x3ffe3970 0x4008fb8d:0x3ffe3990 0x4008cfe4:0x3ffe39b0 0x4008ec77:0x3ffe39e0 0x40083911:0x3ffe3a10 0x400849e5:0x3ffe3a40 0x4008f0aa:0x0000e150
0x4008f8bf: invoke_abort at /Users/cyril/esp/esp-idf/components/esp32/panic.c:680

0x4008fb8d: abort at /Users/cyril/esp/esp-idf/components/esp32/panic.c:680

0x4008cfe4: xQueueGenericSendFromISR at /Users/cyril/esp/esp-idf/components/freertos/queue.c:2037

0x4008ec77: xTimerPendFunctionCallFromISR at /Users/cyril/esp/esp-idf/components/freertos/timers.c:961

0x40083911: timerISR(void*) at /Users/cyril/esp/nvse_app/main/Timer.cpp:103

0x400849e5: _xt_lowint1 at /Users/cyril/esp/esp-idf/components/freertos/xtensa_vectors.S:1105

0x4008f0aa: xt_ints_on at /Users/cyril/esp/esp-idf/components/freertos/xtensa_intr_asm.S:163


Entering gdb stub now.
$T0b#e6GNU gdb (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-build_apple-darwin16.3.0 --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /Users/cyril/esp/nvse_app/build/nvse.elf...done.
Remote debugging using /dev/cu.SLAB_USBtoUART
0x4008f8bf in invoke_abort () at /Users/cyril/esp/esp-idf/components/esp32/panic.c:155
155	        *((int *) 0) = 0;
(gdb) bt
#0  0x4008f8bf in invoke_abort () at /Users/cyril/esp/esp-idf/components/esp32/panic.c:155
#1  0x4008fb90 in abort () at /Users/cyril/esp/esp-idf/components/esp32/panic.c:170
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Code for registering the timer is:

timer_isr_register(x, y, timerISR, this, ESP_INTR_FLAG_IRAM, NULL);

I've not pinned the http server task to any cpu, so I don't know which one's calling esp_restart

@igrr
Copy link
Member

igrr commented Nov 29, 2018

Is it possible that timer_isr_register is called from a class constructor in your code, and that there is a global instance of this class being created?

In IDF, global constructors are called before FreeRTOS scheduler has started.

It seems like timerISR gets called while xTimerQueue == NULL, which is only possible before the scheduler has started. If this is the case (first ISR is being invoked before scheduler has started) you may need to guard against calling xTimerPendFunctionCallFromISR in your interrupt handler if xTaskGetSchedulerState() returns taskSCHEDULER_NOT_STARTED.

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 29, 2018

It works when powered up regularly.
How would it work in regular startup then ?

The code of the TimerISR is this:

#define SplitID(timerID) (timer_group_t)(timerID >> 1), (timer_idx_t)(timerID & 1)

static EventGroupHandle_t & IRAM_ATTR getTimersEventGroup()
{
    static DRAM_ATTR EventGroupHandle_t timers = xEventGroupCreate();
    return timers;
}

static void IRAM_ATTR timerISR(void * param)
{
    Timer * timer = (Timer*)param;
    uint32 timerID = timer->timerID;

    // Retrieve the interrupt status from the timer that reported the interrupt
    if (timerID & 2)
    {
        uint32 intStatus = TIMERG1.int_st_timers.val;
        TIMERG1.hw_timer[timerID & 1].update = 1;

        if (intStatus & (1<<(timerID & 1)))
        {
            if (timerID & 1) TIMERG1.int_clr_timers.t1 = 1;
            else TIMERG1.int_clr_timers.t0 = 1;
        }
        if (timer->autoReload) TIMERG1.hw_timer[timerID & 1].config.alarm_en = TIMER_ALARM_EN;
    }
    else
    {
        uint32 intStatus = TIMERG0.int_st_timers.val;
        TIMERG0.hw_timer[timerID & 1].update = 1;
        if (intStatus & (1<<(timerID & 1)))
        {
            if (timerID & 1) TIMERG0.int_clr_timers.t1 = 1;
            else TIMERG0.int_clr_timers.t0 = 1;
        }
        if (timer->autoReload) TIMERG0.hw_timer[timerID & 1].config.alarm_en = TIMER_ALARM_EN;
    }
    if (timer->eventID) EventLoop::postEventFromISR(timer->eventID, timer->taskHandle);
    else
    {
        BaseType_t higherPriorityTaskWoken = pdFALSE;
        xEventGroupSetBitsFromISR(getTimersEventGroup(), (1UL<<timerID), &higherPriorityTaskWoken);
        if (higherPriorityTaskWoken == pdTRUE) portYIELD_FROM_ISR();
    }
}

The part that fails is in the EventLoop::postEventFromISR which is like this:


    static QueueHandle_t & getEventLoopQ()
    {
        static uint32 pending[8];
        static StaticQueue_t q;
        static QueueHandle_t handle = xQueueCreateStatic(ArrSz(pending), sizeof(pending), (uint8*)pending, &q);
        return handle;
    }
/** Post an event from an interrupt */
    static bool postEventFromISR(uint32 eventID, void * taskHandle = 0)
    {
        BaseType_t otherTaskWoken = pdFALSE;
        if (pdPASS != xQueueSendToBackFromISR(getEventLoopQ(), &eventID, &otherTaskWoken))
            return false;
        if (otherTaskWoken == pdTRUE) portYIELD_FROM_ISR();
        return true;
    }

So it's not a global variable (I'm using singleton), it's created on first call (getEventLoopQ and getTimersEventGroup).

The timers are created in app_main(), so I guess this is called after FreeRTOS is started, right ?

The issue I experience is mainly that the timer is fired on reboot while it's not yet "created" (although
it's still correctly redirected to the right ISR!). How is it possible ?

Also, what is the second "double exception vector" that happens way after the application is started when the timers are paused on reboot ?

@igrr
Copy link
Member

igrr commented Nov 29, 2018

        if (pdPASS != xQueueSendToBackFromISR(getEventLoopQ(), &eventID, &otherTaskWoken))

With such code, are you sure that xQueueCreateStatic does not get called from ISR context?

The issue I experience is mainly that the timer is fired on reboot while it's not yet "created" (although
it's still correctly redirected to the right ISR!). How is it possible ?

I can't explain this part (how it is redirected to the right ISR before the ISR has been registered). The only possible explanation was that the ISR was registered from a global constructor, but if this isn't the case, that's puzzling. Do you think you can reduce the application to something that still reproduces the issue and that you might be able to share privately?

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 29, 2018

With such code, are you sure that xQueueCreateStatic does not get called from ISR context?

I'm not sure, that's the issue. As I understand it, the ISR shouldn't be called while app_main is not called first. It's clearly not the case, when the system (re)boot from scratch or from UART.
However, when I call esp_restart without stopping the timers, the timer fires during startup, (this is the main issue), while app_main is not called yet, so in that case, yes, xQueueCreateStatic will be called from ISR context. As I understand it, this should not happen logically if the timer wasn't still present/configured/running.

Side question: What is the first entry point I can override (before FreeRTOS is running) so I can dump the HW timer's register to check what's left or not after the reset ?

Also, if I stop the timers, it does not happen or it's not visible it does (the app starts, but crash later on when the SNTP client sets the time).
Can you guarantee nothing is touching the HW timer registers during startup (or while the bootloader is running) ? Is there something that would re-enable the timer's ISR in the startup code (since esp_restart_noos does disable them) ?
Is is possible that timer_(en/dis)able_intr modifies more than it should and enable other timer than expected ?

I'll try to strip this part to a simple app I can share. I'll need few days to make this.

@igrr
Copy link
Member

igrr commented Nov 29, 2018

Can you guarantee nothing is touching the HW timer registers during startup (or while the bootloader is running) ? Is there something that would re-enable the timer's ISR in the startup code (since esp_restart_noos does disable them) ?

esp_restart_noos does not touch the timer (at least, "timer group 0/1" timers), so the timer keeps running while the CPUs are starting up. And it probably sets its interrupt flag while the CPUs are starting up.

However this (theoretically) should not matter, since the CPU interrupt input for respective interrupt is only unmasked when esp_intr_alloc is called. So if you only initialize the timers in app_main, esp_intr_alloc should not be called until app_main.

There is a way to check this, though. First, set log level to "Debug" in menuconfig. Note the message similar to Connected src %d to int %d (cpu %d), where src will be 14/15 for TG0 or 18/19 for TG1. Then add something like

if (esp_reset_reason() == ESP_RST_SW && source == <value you have seen>) { abort(); }

at the start of esp_intr_alloc.
This should generate a panic when esp_intr_alloc is called for this interrupt source, after restart. Then we will see the backtrace and will see where this calls is coming from.

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 29, 2018

Looking at the esp_restart_noos code, particularly this:

// Reset timer/spi/uart
    DPORT_SET_PERI_REG_MASK(DPORT_PERIP_RST_EN_REG,
            DPORT_TIMERS_RST | DPORT_SPI01_RST | DPORT_UART_RST);
DPORT_REG_WRITE(DPORT_PERIP_RST_EN_REG, 0);

You're not touching BIT15, Timer Group1 and BIT13, Timer Group0, only BIT0, Timers. Is it enough to reset all the timers registers to default value ?

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 29, 2018

Ok, I'll try what you're asking to figure this out.

As a side note, if I ask to reset the timer group like this:

DPORT_SET_PERI_REG_MASK(DPORT_PERIP_RST_EN_REG,
            DPORT_TIMERGROUP1_RST | DPORT_TIMERGROUP0_RST);
DPORT_REG_WRITE(DPORT_PERIP_RST_EN_REG, 0);

before restarting, it should actually clean the HW timers's registers, right ?

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 29, 2018

Just thought a bit more about this, that starts to make sense.

My Timer constructor is like this:

(static) uint8 Timer::usedTimers = 0;
Timer::Timer() : call(0), taskHandle(0), eventID(0), timerID(0), autoReload(false)
{
    // Need to find a timer to use
    while ((usedTimers & (1UL<<timerID)) && timerID < 4) timerID++;
    if (timerID == 4)
    {
        Log(Logger::Error|Logger::System, "Not enough HW timer to run");
        return;
    }

    usedTimers |= (1UL<<timerID);
    timer_config_t config = {};
    config.divider     = 8; // 10MHz base clock
    config.counter_dir = TIMER_COUNT_UP;
    config.counter_en  = TIMER_PAUSE;
    config.alarm_en    = TIMER_ALARM_EN;
    config.intr_type   = TIMER_INTR_LEVEL;
    config.auto_reload = TIMER_AUTORELOAD_DIS;
    timer_init(SplitID(timerID), &config);
    timer_set_counter_value(SplitID(timerID), 0x00000000UL);
    timer_enable_intr(SplitID(timerID));
    timer_isr_register(SplitID(timerID), timerISR, this, ESP_INTR_FLAG_IRAM, NULL);
}

The timer is not started when this is called, since there is no timer_start there. I was expecting the timers are not running when the system is booted.

However, if, when calling this the timer is still running like you're saying (because the restart did not stop it), then it can happen that it fires right there after timer_isr_register before the control is given back to app_main

Logically, I have this in my app_main:

app_main()
{
     Timer rgbLedPulse;
     [... some lengthy code here for built in tests ...]
      // Start event loop here, which somehow performs:
     (void)EventLoop::getEventLoopQ();
      // start the timer now
      rgbLedPulse.start(200ms);
}

If the timer is running here before it's really started in the code, then yes, that's the issue, since it'll trigger before EventLoop queue is created. Shouldn't have timer_init stopped the timer ?

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Nov 30, 2018

Ok, I've just tested your idea, and it happens where it should (within app_main).

I've checked that timer_init does not clear a pending interrupt (so if the timer triggers during restart, and the interrupt were disabled (as they are in esp_restart_noop), as soon as it's enabled, the ISR is called even if the timer is "initialized" via timer_init as not running)

IMHO it's a bug. timer_init should disable the interrupts and clear the pending one to avoid leftover from the previous boot/restart period.

I've tried to reset the HW timer (via DPORT_PERIP_RST_EN_REG) just before restarting and it's working well.

The other bug I'm mentionning (double exception) is not due to timer (I've more informations), so I'll open another issue for this to avoid confusion.

@igrr
Copy link
Member

igrr commented Nov 30, 2018

As a side note, if I ask to reset the timer group like this ... before restarting, it should actually clean the HW timers's registers, right ?

Yes, correct.

timer_init should disable the interrupts and clear the pending one to avoid leftover from the previous boot/restart period.

I agree, this would make sense.

@koobest
Copy link
Contributor

koobest commented Dec 3, 2018

Hi, @X-Ryl669
Since we don't have an example to reproduce this issue, could you please help us test this fix? The attachment is timer.c, I am not sure if it can solve your problems.
timer.tar.gz

Thanks in advance :-)

@X-Ryl669
Copy link
Contributor Author

X-Ryl669 commented Dec 3, 2018

Thanks, I'll let you know tomorrow after testing.

@koobest
Copy link
Contributor

koobest commented Dec 10, 2018

Hi, @X-Ryl669
Would you please synchronize your progress to us ?
thank !!

@X-Ryl669
Copy link
Contributor Author

Well, sorry, I was busy with my other development. I've made a fix on my own before and I need to revert and use yours to check if it's working. I'll let you know ASAP.

@X-Ryl669
Copy link
Contributor Author

Ok, your code is working for me.
Thanks!

@igrr igrr closed this as completed in e22c0cc Dec 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@X-Ryl669 @igrr @koobest and others