Skip to content

Fix edpt xfer race condition#508

Merged
hathach merged 9 commits into
masterfrom
fix-edpt-race
Sep 14, 2020
Merged

Fix edpt xfer race condition#508
hathach merged 9 commits into
masterfrom
fix-edpt-race

Conversation

@hathach
Copy link
Copy Markdown
Owner

@hathach hathach commented Sep 9, 2020

Describe the PR

uint32_t tud_cdc_n_write_flush (uint8_t itf)
{
  cdcd_interface_t* p_cdc = &_cdcd_itf[itf];

  // skip if previous transfer not complete yet
  TU_VERIFY( !usbd_edpt_busy(TUD_OPT_RHPORT, p_cdc->ep_in), 0 );

  uint16_t count = tu_fifo_read_n(&p_cdc->tx_ff, p_cdc->epin_buf, sizeof(p_cdc->epin_buf));
  if ( count )
  {
    TU_VERIFY( tud_cdc_n_connected(itf), 0 ); // fifo is empty if not connected
    TU_ASSERT( usbd_edpt_xfer(TUD_OPT_RHPORT, p_cdc->ep_in, p_cdc->epin_buf, count), 0 );
  }

  return count;
}

Although this is fine with noOS since tinyusb API() should always be called in thread mode (never ISR). However with an RTOS, between the edpt_busy() and the edpt_xfer() we can be preempted by higher prio task, which can also calling the same API e.g cdc_write_flush() (often after cdc_write()). This lead to the stack re-submit another transfer to an already busy endpoint after High prio task complete and the Low prio continue. Of course dcd detect and reject this and the call failed nicely. However as above write_flush(), the data that is pull out of the tx_fifo is not processed -> cause missing character with print(). Although we can put it back with fifo_write_front() in theory, however not all processed function is revertible.

In short, this PR introduce usbd_edpt_claim() and release() with the help of RTOS mutex mechanism, to prevent other thread to interfere between edpt_busy() and edpt_xfer(). The claim/release is OPTIONAL for now, since not all class driver need it. For example all MSC transfer is handled by the stack via callback, user never actively submit an transfer, therefore there is no racing. However, later on we can make it mandatory just to be consistent across the API.

Ideally it would be 1 mutex per endpiont, but that seems to be a lot of memory for edge case. Maybe only one general mutex is enough, and the edpt_claim()/edpt_release() is fast enough I guess. Let's me know what do you think about the PR and how we could improve it. I am open to any suggestion.

PS is still WIP, only applies to cdc driver first for feedback.

@duempel
Copy link
Copy Markdown
Collaborator

duempel commented Sep 10, 2020

Ideally it would be 1 mutex per endpiont, but that seems to be a lot of memory for edge case. Maybe only one general mutex is enough, and the edpt_claim()/edpt_release() is fast enough I guess. Let's me know what do you think about the PR and how we could improve it. I am open to any suggestion.

Usually I'd say that one mutex for claiming all endpoints should be ok. There is not much processing and high priority tasks can only be blocked very shortly. But this short blocking is not guaranteed, as we can see with a setup like #507 . If the low priority task won't fall back to it's low priority after releasing the mutex other high prio tasks can be blocked for much longer than expected. This won't result in the error we saw before, but could cause problems to time critical tasks. I also don't like to create seperate mutex for each endpoint and don't see a more elegant solution right now. Just want to mention that the chance of blocking is higher because different endpoints are sharing only one ressource. Also in theory the risk of deadlocks will become higher.

Is it necessary to claim the data out endpoint? I think _prep_out_transaction is only called by tud_task and user tasks won't call a function which prepares incoming data. If this is necessary then I'm missing a release of this endpoint. Maybe cdcd_xfer_cb would be a good place for it. Looks like the _prep_out_transaction can not be triggered a 2nd time right now.

Also I'm not sure if we need to have both: busy and claimed flags. The busy flag is indicating that device controller driver are processing the endpoint at that moment. If it's busy we can't schedule more transfers to this endpoint. But right now we also don't prepare fifos or any other ressources with data to process later. From a functional point of view I'd say that an endpoint should be indicated as busy already if some class just plans to use it and prepares ressources (which is done before dcd processed and usbd sets it busy). This is pretty the same as we use the claim flag right now.

@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 10, 2020

Usually I'd say that one mutex for claiming all endpoints should be ok. There is not much processing and high priority tasks can only be blocked very shortly. But this short blocking is not guaranteed, as we can see with a setup like #507 . If the low priority task won't fall back to it's low priority after releasing the mutex other high prio tasks can be blocked for much longer than expected. This won't result in the error we saw before, but could cause problems to time critical tasks. I also don't like to create seperate mutex for each endpoint and don't see a more elegant solution right now. Just want to mention that the chance of blocking is higher because different endpoints are sharing only one ressource. Also in theory the risk of deadlocks will become higher.

It only happens with FreeRTOS which is scenarios that almost never occur 😅 😅 . Surely using more mutex will increase the chance of HP being blocked, however it is really FreeRTOS issue, and isn't ours to fix. Eventually they will resolve this when more people start to complain :)

Is it necessary to claim the data out endpoint? I think _prep_out_transaction is only called by tud_task and user tasks won't call a function which prepares incoming data. If this is necessary then I'm missing a release of this endpoint. Maybe cdcd_xfer_cb would be a good place for it. Looks like the _prep_out_transaction can not be triggered a 2nd time right now.

_prep_out_transaction() is also invoked by user via cdc_read(). The flow is

  • tud_task() try to submit transfer as long as the FIFO can still has spaces for it. But will stop whenever there no more space (else we got discarded data).
  • When user read() from fifo and it got back the space, the read() will submit the transfer. There is still conflict with multiple read() from different thread. Though it happens much less than write() with printf()

Also I'm not sure if we need to have both: busy and claimed flags. The busy flag is indicating that device controller driver are processing the endpoint at that moment. If it's busy we can't schedule more transfers to this endpoint. But right now we also don't prepare fifos or any other ressources with data to process later. From a functional point of view I'd say that an endpoint should be indicated as busy already if some class just plans to use it and prepares ressources (which is done before dcd processed and usbd sets it busy). This is pretty the same as we use the claim flag right now.

Yeah, I am debating this myself as well, however there is scenario where the driver after claimed the endpoint, decide not to make an transfer due to its internal error, e.g out of data from the fifo, which could be another race issue with others. Then the driver must release the endpoint. I am not entirely sure if we could just simply mark busy = 0 in this case, since it can potentially mask an actually busy-transferring endpoint. However, the more I think about this now, I think we can actually do busy = 0 as release, since it is already claimed, there should be no transfer on it. If it does, then we have another issue to fix. Thanks for the feedback, I will update the PR to only use busy.

@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 10, 2020

@duempel oh, while removing the claimed bit, I realize there may be an issue. Since currently the edpt_claim() is optional. Using only 1 busy bit, the edpt_xfer() won't be able to force and check to see if the endpoint is currently transferring or not. E.g an API can call edpt_xfer() 2 times, and the busy is set just like using the claim_edpt() hmm. Maybe we can leave that later when forcing all API to use claim, however, it is not really necessary for endpoint that isn't directly involved to public API().

@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 11, 2020

@duempel Maybe TinyUSB could implement osal_mutex as normal semaphore in FreeRTOS instead 🤔 . Of course, we will fall into the priority inversion scenario, though I am not sure between the 2 cases, which is worst hmmm.

@duempel
Copy link
Copy Markdown
Collaborator

duempel commented Sep 12, 2020

Surely using more mutex will increase the chance of HP being blocked

Yes using more mutex will increase the chance of blocking. But I actually wanted to say that using more mutex objects we can decrease the chance of blocking since EP3 would not try to get the same mutex as EP2. Well but one is just fine for now. Don't need to waste ressources on this.

_prep_out_transaction() is also invoked by user via cdc_read().

Oh yeah, my fault. I just wonder if we would really need it in user API but this can be discussed later 😀

Since currently the edpt_claim() is optional. Using only 1 busy bit, the edpt_xfer() won't be able to force and check to see if the endpoint is currently transferring or not. E.g an API can call edpt_xfer() 2 times, and the busy is set just like using the claim_edpt() hmm. Maybe we can leave that later when forcing all API to use claim, however, it is not really necessary for endpoint that isn't directly involved to public API().

This is a good point. usbd has to know if dcd is processing the data right now. Better to keep the claim bit instead of adding additional logic to handle those cases.

Maybe TinyUSB could implement osal_mutex as normal semaphore in FreeRTOS instead 🤔 . Of course, we will fall into the priority inversion scenario, though I am not sure between the 2 cases, which is worst hmmm.

I've also thought about this after you came up with #507 . This small change could have fixed the issue. But I would keep it as a mutex. We also should not care that much if priority inversion or priority inheritance is our way to go. If someone is designing a time critical application the developer has to plan the scheduling and ressource sharing anyway. In case of FreeRTOS maybe using both would be our way to go: mutex for fifo blocking and binary semaphore as endpoint claiming.
Our goal is that our API do not influence each other and we do not loose data or run into other errors. Timing is too complex to take care of different applications.

@hathach all in all I feel good with these changes. In _prep_out_transaction maybe claiming the endpoint would make more sense if we do this before availability check. In case tud_task has a higher priority than the user task which is calling a tud_cdc_n_read for example. The user task then processes _prep_out_transaction and reads the available bytes in rx fifo. If the higher priority task then invokes cdcd_xfer_cb and write data to rx fifo there might be less space. In this case the _prep_out_transaction called by tud_task notices there is not enough space and will return without claiming the endpoint. Because the endpoint is not claimed and the user task thinks it's enough space within rx fifo, it will continue with usbd_edpt_xfer and some bytes could get lost. Of course when changing the order there has to be a release call added in case of no available space.

@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 13, 2020

Yes using more mutex will increase the chance of blocking. But I actually wanted to say that using more mutex objects we can decrease the chance of blocking since EP3 would not try to get the same mutex as EP2. Well but one is just fine for now. Don't need to waste ressources on this.

Right, I thought of this as well, 16 mutex is lots of resources. 1 Mutex can easily occupies 32 or 64 bytes.

Oh yeah, my fault. I just wonder if we would really need it in user API but this can be discussed later

Oh yeah, If you have idea to not include it in user API, then just submit PR, no need for discussion :D

This is a good point. usbd has to know if dcd is processing the data right now. Better to keep the claim bit instead of adding additional logic to handle those cases.

OK, for now we can leave it as it is for now, at least it shares storage with busy, and doesn't occupies any SRAM. We can always refactor later.

I've also thought about this after you came up with #507 . This small change could have fixed the issue. But I would keep it as a mutex. We also should not care that much if priority inversion or priority inheritance is our way to go. If someone is designing a time critical application the developer has to plan the scheduling and ressource sharing anyway. In case of FreeRTOS maybe using both would be our way to go: mutex for fifo blocking and binary semaphore as endpoint claiming.
Our goal is that our API do not influence each other and we do not loose data or run into other errors. Timing is too complex to take care of different applications.

I agree, it is a bit too much for a usb stack to worry about and it won't be able to solve it anyway. OSAL currently includes the semaphore API, though I have a plan to remove it to only keep queue and mutex to make it easier to port new RTOS. I am thinking about it, however, I am lean toward keeping both as mutex as it reflect the correct scenario we have mutual exclusive to access shared resource. Semaphore is more about synchronization.

@hathach all in all I feel good with these changes. In _prep_out_transaction maybe claiming the endpoint would make more sense if we do this before availability check. In case tud_task has a higher priority than the user task which is calling a tud_cdc_n_read for example. The user task then processes _prep_out_transaction and reads the available bytes in rx fifo. If the higher priority task then invokes cdcd_xfer_cb and write data to rx fifo there might be less space. In this case the _prep_out_transaction called by tud_task notices there is not enough space and will return without claiming the endpoint. Because the endpoint is not claimed and the user task thinks it's enough space within rx fifo, it will continue with usbd_edpt_xfer and some bytes could get lost. Of course when changing the order there has to be a release call added in case of no available space.

maybe we should include the edpt_busy() check in the edpt_claim() before attempting to call mutex_lock(). It seems to be the need at all time.

@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 14, 2020

@hathach all in all I feel good with these changes. In _prep_out_transaction maybe claiming the endpoint would make more sense if we do this before availability check. In case tud_task has a higher priority than the user task which is calling a tud_cdc_n_read for example. The user task then processes _prep_out_transaction and reads the available bytes in rx fifo. If the higher priority task then invokes cdcd_xfer_cb and write data to rx fifo there might be less space. In this case the _prep_out_transaction called by tud_task notices there is not enough space and will return without claiming the endpoint. Because the endpoint is not claimed and the user task thinks it's enough space within rx fifo, it will continue with usbd_edpt_xfer and some bytes could get lost. Of course when changing the order there has to be a release call added in case of no available space.

Ah right, I think I misread this in previous comment. I will update the PR to claim the endpoint before checking the fifo space. Thanks for you analysis.

- add pre-check to reduce mutex lock in usbd_edpt_claim
@hathach hathach marked this pull request as ready for review September 14, 2020 15:48
@hathach
Copy link
Copy Markdown
Owner Author

hathach commented Sep 14, 2020

merged now since it is ready and I need to move on with other works. Issues such as semaphore vs mutex can be done a follow-up PR if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition with cdc auto flush() on transfer completion

2 participants