Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

RefactorFactory · 2022-08-30T21:14:18Z

Description of Change

The HID semaphore allows USBHID::SendReport() to wait for the completion of
report sending. There seemed to be an extra call to xSemaphoreTake() of this
semaphore that served no purpose.

With this extra call, occasionally, the following would happening:

USBHID::SendReport() would send a report by calling tud_hid_n_report().
The send would complete and (presumably on another thread)
tud_hid_report_complete_cb() would be called and it would xSemaphoreGive()
the semaphore.
In USBHID::SendReport(), the extra xSemaphoreTake(sem, 0) would succeed,
taking the semaphore.
On the next line, xSemaphoreTake(sem, timeout_ms ...) would timeout
because the semaphore was already taken by the previous line of code.

The result would be waiting timeout_ms for no reason.

The fix is to eliminate the extra unnecessary call to xSemaphoreTake().

Tests scenarios

I have tested my Pull Request on Arduino-esp32 core v2.0.4 with ESP32-S2 Board with this scenario.

While testing, I used millis() to time how long the calls to xSemaphoreTake() would take.

SuGlider · 2022-09-05T21:31:02Z

@RefactorFactory - Thanks for the PR. Indeed, this extra xSemaphoreTake(tinyusb_hid_device_input_sem, 0); seems to have no function here unless force an unnecessary extra time out.

@me-no-dev - Please take a look and let me know. Thanks.

me-no-dev · 2022-09-14T08:36:47Z

The reason for this call is to make sure that any previous given semaphore is cleared, so that the next Take will return after the current send is completed. Let's storm a bit on this first. Any chance you can provide a test case that would show an issue?

RefactorFactory · 2022-09-15T05:23:13Z

The reason for this call is to make sure that any previous given semaphore is cleared, so that the next Take will return after the current send is completed. Let's storm a bit on this first. Any chance you can provide a test case that would show an issue?

What is the scenario when a given semaphore is left given / non-cleared? Do you mean a scenario like the following: USBHID::SendReport() is called, it calls tud_hid_n_report(), the callback is called and it gives the semaphore, then for some reason USBHID::SendReport() does not call xSemaphoreTake()? Is this the situation you're trying to address with the xSemaphoreTake(..., 0)?

This gist is a test case: https://gist.github.com/RefactorFactory/62dd9f98d40e55584966bed28fa98868

KeyboardMessage.ino is based on this example. It runs a background task that randomly consumes CPU to try to help reproduce the semaphore race condition.
USBHID.cpp contains the callback fix USBHIDKeyboard: Fix 200ms delay for every key #7218, and it adds instrumentation code to log the return value results from xSemaphoreTake() and it times their executions.
Run the KeyboardMessage.ino sketch (with the USBHID.cpp with instrumentation logging) on an ESP32-S2.
Connect the ESP32-S2 to a computer with a text editor running.
Hold down the boot/0 button on your ESP32-S2 and it will type the letter a and it will also type logging (I didn't log to Serial because I seem to encounter Native USB CDC hangs up on ESP32-S2 after about 320 characters #6221). Keep holding the button down until you see output like this:

a[642579][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 4 ms
[642579][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 0 ms
a[643139][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 1 ms
[643243][E][USBHID.cpp:365] SendReport(): report 1 wait failed

[643243][I][USBHID.cpp:369] SendReport(): take_result0: 1, took 0 ms;  take_result1: 0, took 101 ms
a[644020][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 0 ms
[644021][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 1 ms

This output shows take_result0: 1 which means that the xSemaphoreTake(..., 0) succeeded in taking the semaphore. Then xSemaphoreTake(..., timeout_ms / portTICK_PERIOD_MS) is called and it times out after 101 ms, returning 0, causing report 1 wait failed to be logged in the previous line.

This test case demonstrates that the call to xSemaphoreTake(..., 0) periodically/randomly causes unnecessary timeouts for simple code that is just typing the letter a. It only happens some of the time because it's essentially a race condition between the thread that calls the callback tud_hid_report_complete_cb(), and the thread calling USBHID::SendReport().

Thanks.

me-no-dev · 2022-09-15T12:15:15Z

Maybe it would make more sense if moved above res = tud_hid_n_report(0, id, data, len); to make sure that any possible previous semaphore is taken and that semaphore, if be given before res = tud_hid_n_report(0, id, data, len); returns is handled properly.

RefactorFactory · 2022-09-15T19:56:32Z

Maybe it would make more sense if moved above res = tud_hid_n_report(0, id, data, len); to make sure that any possible previous semaphore is taken and that semaphore, if be given before res = tud_hid_n_report(0, id, data, len); returns is handled properly.

I think I understand your alternative move, but what is the scenario where the previous semaphore is given, but not taken, that you're trying to address?

One does not try to clear a critical section or mutex before entering it, so why clear a semaphore before trying to take it? What am I missing?

Thanks.

me-no-dev · 2022-09-16T13:10:01Z

One option is to send a packet and not get the "sent" event on time, but there have been other cases where an extra "sent" will be received and the semaphore will already be given. The point of this semaphore is to make sending blocking, so send something, wait for the "sent" event and be able to send something else. If you have the semaphore given earlier and not cleared, function will return immediately and can cause USB overflow and other unintended consequences.

The HID semaphore allows USBHID::SendReport() to wait for the completion of report sending. With a zero timeout xSemaphoreTake() after calling tud_hid_n_report(), occasionally, the following would happening: 1. USBHID::SendReport() would send a report by calling tud_hid_n_report(). 2. The send would complete and (presumably on another thread) tud_hid_report_complete_cb() would be called and it would xSemaphoreGive() the semaphore. 3. In USBHID::SendReport(), the zero timeout xSemaphoreTake(sem, 0) would succeed, taking the semaphore. 4. On the next line, xSemaphoreTake(sem, timeout_ms ...) would timeout because the semaphore was already taken by the previous line of code. The result would be waiting timeout_ms for no reason. The purpose of the zero timeout xSemaphoreTake() is to clear the semaphore in case a previous SendReport() timed out waiting for the semaphore. In that case, tud_hid_report_complete_cb() may be called after the timeout, giving the semaphore. Then the next SendReport() would start with the semaphore given, which isn't desired if we want to call xSemaphoreTake(sem, timeout_ms ...) on it. There have also been other cases where tud_hid_report_complete_cb() is called an extra time, causing the same situation. The fix is to move the zero timeout xSemaphoreTake() before the call to tud_hid_n_report(). This eliminates the race between the zero timeout xSemaphoreTake() and tud_hid_report_complete_cb() in the common case when no timeout occurs. There is still a possible race condition between the zero timeout xSemaphoreTake() and tud_hid_report_complete_cb() in the case of a timeout, but that should be rarer.

RefactorFactory · 2022-09-17T04:22:15Z

One option is to send a packet and not get the "sent" event on time, but there have been other cases where an extra "sent" will be received and the semaphore will already be given. The point of this semaphore is to make sending blocking, so send something, wait for the "sent" event and be able to send something else. If you have the semaphore given earlier and not cleared, function will return immediately and can cause USB overflow and other unintended consequences.

Ok, thanks for explaining, it makes sense. I'll update the commit to just move the zero timeout xSemaphoreTake() above the tud_hid_n_report(). I tested it by hand and it seems fine.

VojtechBartoska requested a review from SuGlider September 5, 2022 12:44

SuGlider approved these changes Sep 5, 2022

View reviewed changes

SuGlider self-assigned this Sep 5, 2022

SuGlider added this to the 2.0.5 milestone Sep 5, 2022

RefactorFactory force-pushed the usbhid_semaphore branch from d679b9b to af1f2bd Compare September 17, 2022 04:28

me-no-dev merged commit ccf0d87 into espressif:master Sep 17, 2022

RefactorFactory deleted the usbhid_semaphore branch September 17, 2022 21:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

RefactorFactory commented Aug 30, 2022

SuGlider commented Sep 5, 2022

me-no-dev commented Sep 14, 2022

RefactorFactory commented Sep 15, 2022 •

edited

Loading

me-no-dev commented Sep 15, 2022

RefactorFactory commented Sep 15, 2022

me-no-dev commented Sep 16, 2022

RefactorFactory commented Sep 17, 2022

Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

Conversation

RefactorFactory commented Aug 30, 2022

Description of Change

Tests scenarios

SuGlider commented Sep 5, 2022

me-no-dev commented Sep 14, 2022

RefactorFactory commented Sep 15, 2022 • edited Loading

me-no-dev commented Sep 15, 2022

RefactorFactory commented Sep 15, 2022

me-no-dev commented Sep 16, 2022

RefactorFactory commented Sep 17, 2022

RefactorFactory commented Sep 15, 2022 •

edited

Loading