Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) )) (IDFGH-6030) #7716

Closed
RichFalk opened this issue Oct 17, 2021 · 8 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@RichFalk
Copy link
Contributor

RichFalk commented Oct 17, 2021

Environment

  • Development Kit: DollaTek ESP32 Type-C WiFi Bluetooth Module 4MB Development Board Based on ESP32-WROVER-B
  • Kit version (for WroverKit/PicoKit/DevKitC): N/A
  • Module or chip used: ESP32-WROVER; chip revision 1
  • IDF version (run git describe --tags to find it): v4.4-dev-3042-g220590d599 (but I checked latest code today and it still has the same problem seen in source code so isn't fixed).
  • Build System: idf.py (uses CMake)
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it): 8.4.0
  • Operating System: macOS
  • Using an IDE?: No
  • Power Supply: USB

Problem Description

xRingbufferSend (and related function xRingbufferSendFromISR) in ringbuf.c incorrectly gives back the semaphore indicating the item was successfully sent (rbGET_RX_SEM_HANDLE) so ready for reading BEFORE giving back the semaphore so other tasks can send (rbGET_TX_SEM_HANDLE) so ready for writing. If a separate task reads from the Ringbuffer, returns the read item to the buffer, and then deletes the Ringbuffer because it is no longer needed, then an assertion crash can occur in xQueueGenericSend in queue.c because the deleted Ringbuffer is freed memory that may be reused and the assertion check for uxItemSize being zero may fail.

The problem does not occur when using xRingbufferSendAcquire and xRingbufferSendComplete in sequence because xRingbufferSendAcquire gives back the semaphore to other tasks (rbGET_TX_SEM_HANDLE) before xRingbufferSendComplete gives back the semaphore indicating the item was successfully sent (rbGET_RX_SEM_HANDLE). The sequence of semaphore release in xRingbufferSend and xRingbufferSendFromISR are reversed from what they should be. The incorrect sequence copied from code is the following:

    if (xReturn == pdTRUE) {
        //Indicate item was successfully sent
        xSemaphoreGive(rbGET_RX_SEM_HANDLE(pxRingbuffer));
    }
    if (xReturnSemaphore == pdTRUE) {
        xSemaphoreGive(rbGET_TX_SEM_HANDLE(pxRingbuffer));  //Give back semaphore so other tasks can send
    }

This sequence should be swapped to be the following in both xRingbufferSend and xRingbufferSendFromISR:

    if (xReturnSemaphore == pdTRUE) {
        xSemaphoreGive(rbGET_TX_SEM_HANDLE(pxRingbuffer));  //Give back semaphore so other tasks can send
    }
    if (xReturn == pdTRUE) {
        //Indicate item was successfully sent
        xSemaphoreGive(rbGET_RX_SEM_HANDLE(pxRingbuffer));
    }

Expected Behavior

Deleting a Ringbuffer after a xRingbufferReceive indicating that the Ringbuffer is no longer needed should not cause an assertion crash. It should allow the Ringbuffer deletion to occur without incident.

Actual Behavior

A vRingbufferDelete sometimes causes an assertion crash in xQueueGenericSend though is timing, task priority, and CPU affinity dependent. It is most likely to occur when the two tasks have the same task priority and CPU affinity (i.e. running on the same CPU).

Steps to reproduce

Run the sample code designed to reproduce the crash.
Fix the code in ringbuf.c and see that the crash no longer occurs.

Note that the original crash was in more complex code that used Wi-Fi and lwIP but is not necessary though to force the crash to occur for this much simplified code in this bug report I had to explicitly write to the freed memory since a new malloc would not allocate at the required memory area (though clearly lwIP code was doing just that in the original code).

Code to reproduce this issue

GIST of code to reproduce this issue

Debug Logs

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
:
:
starting app_main and waiting 4 seconds (so time to stop monitor after crash)
before RingbufferReceive
at start of func
size = 4, retcode = 100
end of app_main

assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) ))

Backtrace:0x400818d3:0x3ffb78d00x40085bc1:0x3ffb78f0 0x4008a93d:0x3ffb7910 0x400863fe:0x3ffb7a30 0x40085995:0x3ffb7a70 0x400d48eb:0x3ffb7aa0 0x40088489:0x3ffb7ad0
0x400818d3: panic_abort at /Users/richfalk/esp/esp-idf/components/esp_system/panic.c:391

0x40085bc1: esp_system_abort at /Users/richfalk/esp/esp-idf/components/esp_system/esp_system.c:129

0x4008a93d: __assert_func at /Users/richfalk/esp/esp-idf/components/newlib/assert.c:85

0x400863fe: xQueueGenericSend at /Users/richfalk/esp/esp-idf/components/freertos/queue.c:818 (discriminator 2)

0x40085995: xRingbufferSend at /Users/richfalk/esp/esp-idf/components/esp_ringbuf/ringbuf.c:1049

0x400d48eb: func at /Users/richfalk/esp/ringbuffer_crash/main/ringbuffer_crash.c:147

0x40088489: vPortTaskWrapper at /Users/richfalk/esp/esp-idf/components/freertos/port/xtensa/port.c:159

After the code is fixed, we see the following expected output:

starting app_main and waiting 4 seconds (so time to stop monitor after crash)
before RingbufferReceive
at start of func
size = 4, retcode = 100
end of app_main
xRingbufferSend = 1
about to delete func task

Other items if possible

sdkconfig is standard default (attached with ".txt" appended so that it is attachable to this case)
elf file is also attached (as a .zip file)
no coredump (not really needed since sequence of calling is described below based on printf
sdkconfig.txt
ringbuffer_crash.elf.zip

The following shows the sequence of calls in the two tasks showing why the problem occurs.

app_main task                 func task
---------------------------   -----------------------------------------------------
xTaskCreatePinnedToCore
RingbufferReceive
                              at start of func
                              xRingbufferSend
                                  xSemaphoreGive(rbGET_RX_SEM_HANDLE(pxRingbuffer));
vRingbufferReturnItem
vRingbufferDelete
write to uxItemSize
                                  xSemaphoreGive(rbGET_TX_SEM_HANDLE(pxRingbuffer));
                                      assert on uxItemSize
                              vTaskDelete (doesn't get to here due to assertion)

After bug is fixed the sequence will be the following instead:

app_main task                 func task
---------------------------   -----------------------------------------------------
xTaskCreatePinnedToCore
RingbufferReceive
                              at start of func
                              xRingbufferSend
                                  xSemaphoreGive(rbGET_TX_SEM_HANDLE(pxRingbuffer));
                                  xSemaphoreGive(rbGET_RX_SEM_HANDLE(pxRingbuffer));
vRingbufferReturnItem
vRingbufferDelete
write to uxItemSize
                              vTaskDelete

Remember that in the above the "write to uxItemSize" is just to force simulate
other activity in the system that will allocate and write to memory that was
previously freed.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 17, 2021
@github-actions github-actions bot changed the title assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) )) assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) )) (IDFGH-6030) Oct 17, 2021
RichFalk added a commit to RichFalk/esp-idf that referenced this issue Oct 18, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) espressif#7716 describes the problem in more detail.
@Dazza0 Dazza0 self-assigned this Oct 18, 2021
@sudeep-mohanty sudeep-mohanty self-assigned this Oct 20, 2021
@sudeep-mohanty
Copy link
Collaborator

Hi @RichFalk, Comments added to #7721 that explain how this is not a ringbuf or queue issue. Thanks!

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Oct 22, 2021
@RichFalk
Copy link
Contributor Author

RichFalk commented Oct 22, 2021

Hi @sudeep-mohanty, comments added to #7721 explain that it is a ringbuf issue. xRingbufferSend should work identically in logic sequence as xRingbufferSendAcquire followed by xRingbufferSendComplete. The xRingbufferSendAcquire takes and gives a TX semaphore. The xRingbufferSendComplete gives a RX semaphore (the caller in another task typically took the RX semaphore earlier via xRingbufferReceive).

The xRingbufferSend is only a convenience function that does the memory copy of the object itself rather than having the caller do that. The xRingbufferSend is currently incorrectly releasing RX before TX.

@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally labels Oct 28, 2021
@sudeep-mohanty sudeep-mohanty removed the Status: In Progress Work is in progress label Oct 28, 2021
espressif-bot pushed a commit that referenced this issue Nov 3, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
@Alvin1Zhang
Copy link
Collaborator

Thanks for your reporting, fix is available at 1222f6d, feel free to reopen.

@AxelLin
Copy link
Contributor

AxelLin commented Nov 4, 2021

Need to backport for stable branches.

espressif-bot pushed a commit that referenced this issue Dec 9, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
@RichFalk
Copy link
Contributor Author

RichFalk commented Dec 22, 2021

Just adding a related use case where this bug gets triggered. If one has a worker task where one uses a Ringbuffer to communicate with it to give it data and also to signal to the task that it should delete itself as well as the Ringbuffer, then this will fail on the xRingbufferSend call. This is most likely to occur if the two tasks are pinned to the same core and if the worker task is at a higher priority as would be the most typical case.

Unfortunately, the fix is not in v4.4-beta1. Why is that? While there is a simple workaround by using xRingbufferSendAcquire and xRingbufferSendComplete or by modifying the source ESP-IDF code, this is such a simple and safe fix that I believe it should be put into v4.4.

I have 8 other bugs I've found that I plan to report in various areas (panic using esp_event_loop_delete_default; delayed esp-timer over light sleep loses time sync; "fill" in linker coming after *_start symbol when compiling with default optimization; light and deep sleep do not work with 8.5 MHz clock source; setting logging level to debug causes abort from deep sleep; second bootloader releases hold on all GPIO pins causing glitches; first deep sleep after power on or software restart wakes up immediately; Ringbuffer needs to be at least 9 bytes larger than the documented calculated size or one gets an error on send) and for those I can I will provide fixes, but if they are not going to get into actual releases then what's the point of reporting them here?

@igrr
Copy link
Member

igrr commented Dec 23, 2021

Hi @RichFalk, the fix was merged into release/v4.4 branch slightly later than the testing of v4.4-beta1 was started. Subsequently we have tagged -beta1 on the commit which QA was testing, not the tip of the release branch. This fix commit along with other fixes which were merged later will be included into the next pre-release, v4.4-rc1, which is being tested now.
Sorry for the inconvenience this caused!
Issue reports are always appreciated, if we can't fix something in v4.4 we can still include the fix in v4.4.1, for instance.

@AxelLin
Copy link
Contributor

AxelLin commented Dec 23, 2021

Just remind that v4.3.2 does not include this fix.

@igrr
Copy link
Member

igrr commented Dec 23, 2021

Thanks for the reminder @AxelLin, it is merged now, should appear on Github soon and will be part of v4.3.3.

espressif-bot pushed a commit that referenced this issue Dec 24, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
espressif-bot pushed a commit that referenced this issue Dec 25, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
espressif-bot pushed a commit that referenced this issue Dec 26, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

7 participants