Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assertion xQueueGenericSend queue.c:818 (IDFGH-6036) #7721

Closed
wants to merge 1 commit into from

Conversation

RichFalk
Copy link
Contributor

@RichFalk RichFalk commented Oct 18, 2021

The release of the semaphore indicating the item was successfully sent must be the last semaphore released. The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR. This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) espressif#7716 describes the problem in more detail.
@CLAassistant
Copy link

CLAassistant commented Oct 18, 2021

CLA assistant check
All committers have signed the CLA.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 18, 2021
@github-actions github-actions bot changed the title Fix assertion xQueueGenericSend queue.c:818 Fix assertion xQueueGenericSend queue.c:818 (IDFGH-6036) Oct 18, 2021
@Alvin1Zhang
Copy link
Collaborator

Thanks for your contribution.

Copy link
Collaborator

@sudeep-mohanty sudeep-mohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @RichFalk,
Thanks for reporting this and for submitting the PR.
I went through the problem you have described in #7716 and this PR. However, I'm afraid to say that the problem you are observing is not related to the order in which the rbGET_RX_SEM_HANDLE and rbGET_TX_SEM_HANDLE semaphores are released. Both these semaphores are mutually exclusive, serving different purposes. While, rbGET_RX_SEM_HANDLE is meant for tasks that are waiting to receive data from a ringbuf, rbGET_TX_SEM_HANDLE is meant for tasks waiting for space to free up in a ringbuf to send more items. Here is a description of both and their usage.

The issue, however, occurs because one of the tasks deletes the ring-buffer untimely and therefore causes a crash in the internal ringbuf/queue functions. In the sample code you have shared https://gist.github.com/RichFalk/3501dc1643512affed9005fb3056da0e, it is the task waiting to receive data that deletes the ringbuf. And it just so happens that when you flip the order of releases the semaphores, the code works because there is no ringbuf action happening after the ringbuf is deleted.

I tweaked your code in a way that the opposite happens. I added a task that is waiting to send data to the ringbuf that goes ahead and deletes the ringbuf as well. We again see a crash even with the changed order of the semaphore releases. As explained earlier, this is due to the fact that the ringbuf itself is deleted before its processes are complete.

Here is the sample code gist that you can try out with the changed order of semaphore release. You would notice that this too results in a crash. However, if you do not delete the ringbuf, the code works well.

Hope this helps explain your problem!

@RichFalk
Copy link
Contributor Author

RichFalk commented Oct 22, 2021

I understand what you did but that is not the correct way to view this problem. If two tasks are sending information to one another, the task that is finished using the Ringbuffer will be doing the sending and the other task that created it will be doing the receiving so the receiving semaphore should be the one that comes last. Think of a "passing a return code from a task when it is done" paradigm. If one task creates another and creates a Ringbuffer for communication between tasks, the created task will signal that it is finished after which the Ringbuffer can be deleted safely IF the semaphore sequence has the RX semaphore be last.

What you did having a task waiting for writing then delete the Ringbuffer would make no sense to do since obviously data has been placed into the Ringbuffer so it is not empty and one would not want to delete the Ringbuffer in that case (if there was to be additional reading of the Ringbuffer). Knowing whether a Ringbuffer can be safely deleted can ONLY be determined by reading an "I'm done with the buffer" signal from the other task(s) that use the Ringbuffer.

Also, another way to think about it is to look at the two-step calls xRingbufferSendAcquire followed by xRingbufferSendComplete. The first call is correctly using the TX semaphore because it is acquiring memory slots in the Ringbuffer so needs to block others trying to do the same while the second call correctly uses the RX semaphore because it releases the semaphore for reading when the update is complete (a different task would usually be the one previously taking the RX semaphore as part of xRingbufferReceive). That is the correct and normal sequence to use for releasing these semaphores -- TX before RX (i.e. RX last).

Technically, one ALWAYS acquires a memory slot in the buffer BEFORE adding the item to it so one should ALWAYS use (and also release) the TX semaphore first BEFORE releasing the RX semaphore. Again, the xRingbufferSend is nothing more than a convenience call that is supposed to do the same thing as xRingbufferSendAcquire followed by xRingbufferSendComplete but does the "copying" of data in the function instead of giving a pointer to the caller to do the copying of data. In fact, it would be prudent to have the code in xRingbufferSend actually call xRingbufferSendAcquire, then copy the data, then call xRingbufferSendComplete and return. This would use less code and be more likely to be maintained more consistently.

I don't want to have to use xRingbufferSendAcquire and xRingbufferSendComplete as a workaround for this problem. The xRingbufferSend and xRingbufferSendFromISR should behave consistently with xRingbufferSendAcquire followed by xRingbufferSendComplete.

@RichFalk
Copy link
Contributor Author

RichFalk commented Oct 22, 2021

Let me show the semaphore sequences side-by-side with the correct xRingbufferSendAcquire followed by xRingbufferSendComplete compared to the incorrect xRingbufferSend with its RX then TX semaphore release order.

CORRECT SEMAPHORE SEQUENCE         INCORRECT SEMAPHORE SEQUENCE
--------------------------         ----------------------------
                      [ in Task #1 ]
xRingbufferReceive                 xRingbufferReceive
   prvReceiveGeneric                  prvReceiveGeneric
      Sem.Take RX                        Sem.Take RX

                      [ in Task #2 ]
xRingbufferSendAcquire             xRingbufferSend
   Sem.Take TX                        Sem.Take TX
   xCheckItemFits                     xCheckItemFits
                                      vCopyItem
   prvAcquireItemNoSplit                 prvAcquireItemNoSplit
   Sem.Give TX

memcpy                                   memcpy

xRingbufferSendComplete
   prvSendItemDoneNoSplit                prvSendItemDoneNoSplit
   Sem.Give RX                        Sem.Give RX
                                      Sem.Give TX

You can see in the above on the left which uses the Acquire/Complete calls that the sequence is correct and only gives the RX semaphore after the item is ready to be read AND does not try and release the TX semaphore after that. Instead, the TX semaphore is released earlier after Ringbuffer memory is acquired.

You can see on the right the incorrect sequencing. While the simplest fix would simply reverse the Give (release) sequence to do TX first and then RX, if you really want to fix this in a cleaner way (but changing more code) I would just have the Send call the Acquire/Complete since that reuses code. The xRingbufferSend would do the following:

xRingbufferSendAcquire
memcpy
xRingbufferSendComplete

Of course, there should be error handling. If you want me to write, test, and submit here this perhaps more elegant fix, let me know and I'd be happy to do that though it could only be done for the xRingbufferSend and not the xRingbufferSendFromISR because there is no Acquire/Complete pair that uses portENTER_CRITICAL_ISR instead of portENTER_CRITICAL. My pull request just did the minimum needed to fix the problem in both xRingbufferSend and xRingbufferSendFromISR.

@sudeep-mohanty
Copy link
Collaborator

Hi @RichFalk,
Thanks for the detailed explanation of the semaphore release sequence. As it turns out, the Rx semaphore indeed triggers a task which has no more use of the ring buffer and deletes it. We can also assume that only a task which is receiving data on the ring buffer may delete the buffer. Hence it would be safe to release the Rx semaphore the last. In conclusion, the minimal change of swapping the order of releasing the semaphores should do the trick in this case.
I will be merging your PR. Thanks again for collaborating!

@RichFalk
Copy link
Contributor Author

Hi @sudeep-mohanty, Thanks! Glad to help.

espressif-bot pushed a commit that referenced this pull request Nov 3, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
@Alvin1Zhang
Copy link
Collaborator

Thanks for your contribution again, changes merged with 1222f6d.

@Alvin1Zhang Alvin1Zhang closed this Nov 4, 2021
@Alvin1Zhang Alvin1Zhang added Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress labels Nov 4, 2021
espressif-bot pushed a commit that referenced this pull request Dec 9, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
espressif-bot pushed a commit that referenced this pull request Dec 24, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
espressif-bot pushed a commit that referenced this pull request Dec 25, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
espressif-bot pushed a commit that referenced this pull request Dec 26, 2021
The release of the semaphore indicating the item was successfully sent must be the last semaphore released.  The receiver may be in another task and may delete the Ringbuffer (such as with a return code across tasks design pattern) if they are through with the Ringbuffer.

The function xRingbufferSendAcquire followed by xRingbufferSendComplete had the semaphores released in the proper order and that same pattern should have been used in xRingbufferSend and xRingbufferSendFromISR.  This commit fixes this order.

Issue (IDFGH-6030) #7716 describes the problem in more detail.

Closes IDFGH-6030, #7716
Closes IDFGH-6036, #7721
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants