Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary xSemaphoreTake() of USBHID semaphore #7205

Merged
merged 1 commit into from
Sep 17, 2022

Conversation

RefactorFactory
Copy link
Contributor

Description of Change

The HID semaphore allows USBHID::SendReport() to wait for the completion of
report sending. There seemed to be an extra call to xSemaphoreTake() of this
semaphore that served no purpose.

With this extra call, occasionally, the following would happening:

  1. USBHID::SendReport() would send a report by calling tud_hid_n_report().
  2. The send would complete and (presumably on another thread)
    tud_hid_report_complete_cb() would be called and it would xSemaphoreGive()
    the semaphore.
  3. In USBHID::SendReport(), the extra xSemaphoreTake(sem, 0) would succeed,
    taking the semaphore.
  4. On the next line, xSemaphoreTake(sem, timeout_ms ...) would timeout
    because the semaphore was already taken by the previous line of code.

The result would be waiting timeout_ms for no reason.

The fix is to eliminate the extra unnecessary call to xSemaphoreTake().

Tests scenarios

I have tested my Pull Request on Arduino-esp32 core v2.0.4 with ESP32-S2 Board with this scenario.

While testing, I used millis() to time how long the calls to xSemaphoreTake() would take.

@SuGlider SuGlider self-assigned this Sep 5, 2022
@SuGlider SuGlider added this to the 2.0.5 milestone Sep 5, 2022
@SuGlider
Copy link
Collaborator

SuGlider commented Sep 5, 2022

@RefactorFactory - Thanks for the PR. Indeed, this extra xSemaphoreTake(tinyusb_hid_device_input_sem, 0); seems to have no function here unless force an unnecessary extra time out.

@me-no-dev - Please take a look and let me know. Thanks.

@me-no-dev
Copy link
Member

The reason for this call is to make sure that any previous given semaphore is cleared, so that the next Take will return after the current send is completed. Let's storm a bit on this first. Any chance you can provide a test case that would show an issue?

@RefactorFactory
Copy link
Contributor Author

RefactorFactory commented Sep 15, 2022

The reason for this call is to make sure that any previous given semaphore is cleared, so that the next Take will return after the current send is completed. Let's storm a bit on this first. Any chance you can provide a test case that would show an issue?

What is the scenario when a given semaphore is left given / non-cleared? Do you mean a scenario like the following: USBHID::SendReport() is called, it calls tud_hid_n_report(), the callback is called and it gives the semaphore, then for some reason USBHID::SendReport() does not call xSemaphoreTake()? Is this the situation you're trying to address with the xSemaphoreTake(..., 0)?

This gist is a test case: https://gist.github.com/RefactorFactory/62dd9f98d40e55584966bed28fa98868

  1. KeyboardMessage.ino is based on this example. It runs a background task that randomly consumes CPU to try to help reproduce the semaphore race condition.
  2. USBHID.cpp contains the callback fix USBHIDKeyboard: Fix 200ms delay for every key #7218, and it adds instrumentation code to log the return value results from xSemaphoreTake() and it times their executions.
  3. Run the KeyboardMessage.ino sketch (with the USBHID.cpp with instrumentation logging) on an ESP32-S2.
  4. Connect the ESP32-S2 to a computer with a text editor running.
  5. Hold down the boot/0 button on your ESP32-S2 and it will type the letter a and it will also type logging (I didn't log to Serial because I seem to encounter Native USB CDC hangs up on ESP32-S2 after about 320 characters #6221). Keep holding the button down until you see output like this:
a[642579][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 4 ms
[642579][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 0 ms
a[643139][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 1 ms
[643243][E][USBHID.cpp:365] SendReport(): report 1 wait failed

[643243][I][USBHID.cpp:369] SendReport(): take_result0: 1, took 0 ms;  take_result1: 0, took 101 ms
a[644020][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 0 ms
[644021][I][USBHID.cpp:369] SendReport(): take_result0: 0, took 0 ms;  take_result1: 1, took 1 ms

This output shows take_result0: 1 which means that the xSemaphoreTake(..., 0) succeeded in taking the semaphore. Then xSemaphoreTake(..., timeout_ms / portTICK_PERIOD_MS) is called and it times out after 101 ms, returning 0, causing report 1 wait failed to be logged in the previous line.

This test case demonstrates that the call to xSemaphoreTake(..., 0) periodically/randomly causes unnecessary timeouts for simple code that is just typing the letter a. It only happens some of the time because it's essentially a race condition between the thread that calls the callback tud_hid_report_complete_cb(), and the thread calling USBHID::SendReport().

Thanks.

@me-no-dev
Copy link
Member

Maybe it would make more sense if moved above res = tud_hid_n_report(0, id, data, len); to make sure that any possible previous semaphore is taken and that semaphore, if be given before res = tud_hid_n_report(0, id, data, len); returns is handled properly.

@RefactorFactory
Copy link
Contributor Author

Maybe it would make more sense if moved above res = tud_hid_n_report(0, id, data, len); to make sure that any possible previous semaphore is taken and that semaphore, if be given before res = tud_hid_n_report(0, id, data, len); returns is handled properly.

I think I understand your alternative move, but what is the scenario where the previous semaphore is given, but not taken, that you're trying to address?

One does not try to clear a critical section or mutex before entering it, so why clear a semaphore before trying to take it? What am I missing?

Thanks.

@me-no-dev
Copy link
Member

One option is to send a packet and not get the "sent" event on time, but there have been other cases where an extra "sent" will be received and the semaphore will already be given. The point of this semaphore is to make sending blocking, so send something, wait for the "sent" event and be able to send something else. If you have the semaphore given earlier and not cleared, function will return immediately and can cause USB overflow and other unintended consequences.

The HID semaphore allows USBHID::SendReport() to wait for the completion of
report sending.

With a zero timeout xSemaphoreTake() after calling tud_hid_n_report(),
occasionally, the following would happening:

1. USBHID::SendReport() would send a report by calling tud_hid_n_report().
2. The send would complete and (presumably on another thread)
   tud_hid_report_complete_cb() would be called and it would xSemaphoreGive()
   the semaphore.
3. In USBHID::SendReport(), the zero timeout xSemaphoreTake(sem, 0) would
   succeed, taking the semaphore.
4. On the next line, xSemaphoreTake(sem, timeout_ms ...) would timeout
   because the semaphore was already taken by the previous line of code.

The result would be waiting timeout_ms for no reason.

The purpose of the zero timeout xSemaphoreTake() is to clear the semaphore in
case a previous SendReport() timed out waiting for the semaphore. In that case,
tud_hid_report_complete_cb() may be called after the timeout, giving the
semaphore. Then the next SendReport() would start with the semaphore given,
which isn't desired if we want to call xSemaphoreTake(sem, timeout_ms ...) on
it.

There have also been other cases where tud_hid_report_complete_cb() is called
an extra time, causing the same situation.

The fix is to move the zero timeout xSemaphoreTake() before the call to
tud_hid_n_report(). This eliminates the race between the zero timeout
xSemaphoreTake() and tud_hid_report_complete_cb() in the common case when no
timeout occurs.

There is still a possible race condition between the zero timeout
xSemaphoreTake() and tud_hid_report_complete_cb() in the case of a timeout,
but that should be rarer.
@RefactorFactory
Copy link
Contributor Author

One option is to send a packet and not get the "sent" event on time, but there have been other cases where an extra "sent" will be received and the semaphore will already be given. The point of this semaphore is to make sending blocking, so send something, wait for the "sent" event and be able to send something else. If you have the semaphore given earlier and not cleared, function will return immediately and can cause USB overflow and other unintended consequences.

Ok, thanks for explaining, it makes sense. I'll update the commit to just move the zero timeout xSemaphoreTake() above the tud_hid_n_report(). I tested it by hand and it seems fine.

@me-no-dev me-no-dev merged commit ccf0d87 into espressif:master Sep 17, 2022
@RefactorFactory RefactorFactory deleted the usbhid_semaphore branch September 17, 2022 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants