-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) )) (IDFGH-6030) #7716
Comments
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) espressif#7716 describes the problem in more detail.
Hi @sudeep-mohanty, comments added to #7721 explain that it is a ringbuf issue. xRingbufferSend should work identically in logic sequence as xRingbufferSendAcquire followed by xRingbufferSendComplete. The xRingbufferSendAcquire takes and gives a TX semaphore. The xRingbufferSendComplete gives a RX semaphore (the caller in another task typically took the RX semaphore earlier via xRingbufferReceive). The xRingbufferSend is only a convenience function that does the memory copy of the object itself rather than having the caller do that. The xRingbufferSend is currently incorrectly releasing RX before TX. |
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) #7716 describes the problem in more detail. Closes IDFGH-6030, #7716 Closes IDFGH-6036, #7721
Thanks for your reporting, fix is available at 1222f6d, feel free to reopen. |
Need to backport for stable branches. |
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) #7716 describes the problem in more detail. Closes IDFGH-6030, #7716 Closes IDFGH-6036, #7721
Just adding a related use case where this bug gets triggered. If one has a worker task where one uses a Ringbuffer to communicate with it to give it data and also to signal to the task that it should delete itself as well as the Ringbuffer, then this will fail on the xRingbufferSend call. This is most likely to occur if the two tasks are pinned to the same core and if the worker task is at a higher priority as would be the most typical case. Unfortunately, the fix is not in v4.4-beta1. Why is that? While there is a simple workaround by using xRingbufferSendAcquire and xRingbufferSendComplete or by modifying the source ESP-IDF code, this is such a simple and safe fix that I believe it should be put into v4.4. I have 8 other bugs I've found that I plan to report in various areas (panic using esp_event_loop_delete_default; delayed esp-timer over light sleep loses time sync; "fill" in linker coming after *_start symbol when compiling with default optimization; light and deep sleep do not work with 8.5 MHz clock source; setting logging level to debug causes abort from deep sleep; second bootloader releases hold on all GPIO pins causing glitches; first deep sleep after power on or software restart wakes up immediately; Ringbuffer needs to be at least 9 bytes larger than the documented calculated size or one gets an error on send) and for those I can I will provide fixes, but if they are not going to get into actual releases then what's the point of reporting them here? |
Hi @RichFalk, the fix was merged into release/v4.4 branch slightly later than the testing of v4.4-beta1 was started. Subsequently we have tagged -beta1 on the commit which QA was testing, not the tip of the release branch. This fix commit along with other fixes which were merged later will be included into the next pre-release, v4.4-rc1, which is being tested now. |
Just remind that v4.3.2 does not include this fix. |
Thanks for the reminder @AxelLin, it is merged now, should appear on Github soon and will be part of v4.3.3. |
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) #7716 describes the problem in more detail. Closes IDFGH-6030, #7716 Closes IDFGH-6036, #7721
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) #7716 describes the problem in more detail. Closes IDFGH-6030, #7716 Closes IDFGH-6036, #7721
The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer. The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order. Issue (IDFGH-6030) #7716 describes the problem in more detail. Closes IDFGH-6030, #7716 Closes IDFGH-6036, #7721
Environment
git describe --tags
to find it): v4.4-dev-3042-g220590d599 (but I checked latest code today and it still has the same problem seen in source code so isn't fixed).xtensa-esp32-elf-gcc --version
to find it): 8.4.0Problem Description
xRingbufferSend (and related function xRingbufferSendFromISR) in ringbuf.c incorrectly gives back the semaphore indicating the item was successfully sent (rbGET_RX_SEM_HANDLE) so ready for reading BEFORE giving back the semaphore so other tasks can send (rbGET_TX_SEM_HANDLE) so ready for writing. If a separate task reads from the Ringbuffer, returns the read item to the buffer, and then deletes the Ringbuffer because it is no longer needed, then an assertion crash can occur in xQueueGenericSend in queue.c because the deleted Ringbuffer is freed memory that may be reused and the assertion check for uxItemSize being zero may fail.
The problem does not occur when using xRingbufferSendAcquire and xRingbufferSendComplete in sequence because xRingbufferSendAcquire gives back the semaphore to other tasks (rbGET_TX_SEM_HANDLE) before xRingbufferSendComplete gives back the semaphore indicating the item was successfully sent (rbGET_RX_SEM_HANDLE). The sequence of semaphore release in xRingbufferSend and xRingbufferSendFromISR are reversed from what they should be. The incorrect sequence copied from code is the following:
This sequence should be swapped to be the following in both xRingbufferSend and xRingbufferSendFromISR:
Expected Behavior
Deleting a Ringbuffer after a xRingbufferReceive indicating that the Ringbuffer is no longer needed should not cause an assertion crash. It should allow the Ringbuffer deletion to occur without incident.
Actual Behavior
A vRingbufferDelete sometimes causes an assertion crash in xQueueGenericSend though is timing, task priority, and CPU affinity dependent. It is most likely to occur when the two tasks have the same task priority and CPU affinity (i.e. running on the same CPU).
Steps to reproduce
Run the sample code designed to reproduce the crash.
Fix the code in ringbuf.c and see that the crash no longer occurs.
Note that the original crash was in more complex code that used Wi-Fi and lwIP but is not necessary though to force the crash to occur for this much simplified code in this bug report I had to explicitly write to the freed memory since a new malloc would not allocate at the required memory area (though clearly lwIP code was doing just that in the original code).
Code to reproduce this issue
GIST of code to reproduce this issue
Debug Logs
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
:
:
starting app_main and waiting 4 seconds (so time to stop monitor after crash)
before RingbufferReceive
at start of func
size = 4, retcode = 100
end of app_main
assert failed: xQueueGenericSend queue.c:818 (!( ( pvItemToQueue == ((void *)0) ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) ))
Backtrace:0x400818d3:0x3ffb78d00x40085bc1:0x3ffb78f0 0x4008a93d:0x3ffb7910 0x400863fe:0x3ffb7a30 0x40085995:0x3ffb7a70 0x400d48eb:0x3ffb7aa0 0x40088489:0x3ffb7ad0
0x400818d3: panic_abort at /Users/richfalk/esp/esp-idf/components/esp_system/panic.c:391
0x40085bc1: esp_system_abort at /Users/richfalk/esp/esp-idf/components/esp_system/esp_system.c:129
0x4008a93d: __assert_func at /Users/richfalk/esp/esp-idf/components/newlib/assert.c:85
0x400863fe: xQueueGenericSend at /Users/richfalk/esp/esp-idf/components/freertos/queue.c:818 (discriminator 2)
0x40085995: xRingbufferSend at /Users/richfalk/esp/esp-idf/components/esp_ringbuf/ringbuf.c:1049
0x400d48eb: func at /Users/richfalk/esp/ringbuffer_crash/main/ringbuffer_crash.c:147
0x40088489: vPortTaskWrapper at /Users/richfalk/esp/esp-idf/components/freertos/port/xtensa/port.c:159
After the code is fixed, we see the following expected output:
starting app_main and waiting 4 seconds (so time to stop monitor after crash)
before RingbufferReceive
at start of func
size = 4, retcode = 100
end of app_main
xRingbufferSend = 1
about to delete func task
Other items if possible
sdkconfig is standard default (attached with ".txt" appended so that it is attachable to this case)
elf file is also attached (as a .zip file)
no coredump (not really needed since sequence of calling is described below based on printf
sdkconfig.txt
ringbuffer_crash.elf.zip
The following shows the sequence of calls in the two tasks showing why the problem occurs.
Remember that in the above the "write to uxItemSize" is just to force simulate
other activity in the system that will allocate and write to memory that was
previously freed.
The text was updated successfully, but these errors were encountered: