Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Refactor ChibiOS USB endpoints to be fully async #21656

Merged
merged 4 commits into from
Feb 28, 2024

Conversation

KarlK90
Copy link
Member

@KarlK90 KarlK90 commented Jul 31, 2023

Description

This PR aims to unifies the ChibiOS USB endpoint implementation for all IN and OUT endpoints to fully async operation.

Current ChibiOS USB endpoint implementation

Previously unidirectional sending for all IN endpoints was implemented in a blocking manner. Sending was implemented by invoking the ChibiOS USB API directly, waiting with a fixed timeout of 100ms when the endpoint wasn't ready and discarding any report that couldn't be send.

Bidirectional IN and OUT endpoints used a driver that was derived from the ChibiOS CDC driver that was already async in operation. This implementation wasn't suitable for unidirectional endpoints though and discarded any new report if the sending queue was full.

Why asynchronous report handling instead of simple blocking?

The basic assumption is that we don't know when an IN endpoint is polled the next time or at all and we can't apply a heuristic for how long to wait. Likewise we don't know if an OUT endpoint will ever receive data. Some examples to back this up:

Furthermore USB HID keybord reports (and other absolute mode reports) are stateful. A pressed key report must be followed by a report that releases the key again or it will remain stuck.

From this I derive the following requirements:

  1. We must not block the main processing loop of QMK when sending or receiving reports, otherwise we dead lock the device. See ChibiOS: Use 'usbTransmit' instead of custom function for 'send_report' #22386 (comment)
  2. We must not discard the latest HID reports or we can bring the keyboard state out of sync. e.g. stuck keys.

Which leads to my conclusion that:

  1. That sending and receiving must be implemented async
  2. USB IN reports must be queued in a ring buffer which always holds the latest reports

Which is done in this PR.

As the USB idle rate and get report request handling was tightly coupled to the blocking sending implementation it was refactored as well to not introduce any regression in features. A benefit is that these are now supported for all IN endpoints in a generic manner.

Tests

All tests are run against handwired/onekey/<board>:default keyboards with enabled console, nkro and mousekeys features. So the most used feature set is covered.

Automated USB compliance tests

For compliance tests the USB3CV tool version 3.0.0.0 is used on a W11 hosts.

MCU Chapter 9 Tests (USB 2 Devices) HID Tests
STM32F401 ✔️ ✔️
STM32F411 ✔️ ✔️
STM32F303 ✔️ ✔️
RP2040 ✔️ ✔️

Manual suspend and resume tests

These manual tests exercised that resume and suspend is working as expected, which means:

  • Host is woken on keypress
  • No hanging keys on wake
  • Keyboard works just the same as before suspending
MCU Linux MacOS W11
STM32F401 ✔️ ✔️ ✔️
STM32F411 ✔️ ✔️ ✔️
STM32F303 ✔️ ✔️ ✔️
RP2040 ✔️ ✔️ ✔️

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Keyboard (addition or update)
  • Keymap/layout/userspace (addition or update)
  • Documentation

Issues Fixed or Closed by This PR

  • Inconsistent USB driver for ChibiOS based keyboards
  • Edge cases for suspend after resume bugs with W11/W10
  • Fix suspend and resume handling of QMK on ChibiOS powered MCUs which relied on hard reseting the USB stack as there where numerous unaddressed edge cases such as:
    • STM32F4s not waking the host
    • STM32F4s being unresponsive after wake-up
    • STM32F3s only sending the key press event but not sending the key up event after wake-up on W11 hosts. Leading to hanging keys.
  • Properly implement the following requests for all endpoints:
    • set/get idle
    • get report

Checklist

  • My code follows the code style of this project: C, Python
  • I have read the PR Checklist document and have made the appropriate changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@github-actions github-actions bot added the core label Jul 31, 2023
@KarlK90 KarlK90 force-pushed the feature/buffered-usb-driver branch 6 times, most recently from d546925 to 39feccb Compare August 1, 2023 11:26
@KarlK90 KarlK90 force-pushed the feature/buffered-usb-driver branch from 39feccb to e5881c2 Compare August 1, 2023 13:11
@KarlK90 KarlK90 marked this pull request as ready for review August 1, 2023 13:38
@KarlK90 KarlK90 requested a review from a team August 1, 2023 13:53
@dexter93
Copy link
Contributor

dexter93 commented Aug 2, 2023

Does this supersede #21537?

@KarlK90
Copy link
Member Author

KarlK90 commented Aug 2, 2023

Does this supersede #21537?

This supersedes 6a88a07da6ca15539a88f3181d6ee9de1b51b1b4 of #21537. The other fixes still apply.

Copy link
Member

@drashna drashna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside from already dealt with mouse endpoint issues, works well in testing.

Something about this seems to cause reports to get stuck. Can reproduce on both moonlander (stm32f303) and blackpill based board.

@drashna drashna requested review from a team and drashna August 2, 2023 18:00
@KarlK90
Copy link
Member Author

KarlK90 commented Aug 5, 2023

Something about this seems to cause reports to get stuck. Can reproduce on both moonlander (stm32f303) and blackpill based board.

Thanks for testing and reporting. Can you tell me which kind of reports, all of them or only keyboard reports? How does this manifest and is there a way to reproduce?

Maybe this can be resolved by increasing the default buffer size per endpoint to something like 12 reports by #define USB_DEFAULT_BUFFER_CAPACITY 12.

@KarlK90 KarlK90 force-pushed the feature/buffered-usb-driver branch 2 times, most recently from 8260d0c to 50cff75 Compare August 5, 2023 13:47
@KarlK90
Copy link
Member Author

KarlK90 commented Aug 5, 2023

@drashna I can confirm that mouse scrolling was broken with this implementation as usb reports that where below the size of an endpoint would be buffered and sent in an incomplete manner. I've committed a fix. If you have the time please try it again.

@drashna
Copy link
Member

drashna commented Aug 6, 2023

Yeah, this seems to work properly, and doesn't cause the issue with keys getting stuck.

@KarlK90
Copy link
Member Author

KarlK90 commented Aug 6, 2023

Thanks for the confirmation! I'll wait for another review before merging though as the changes are rather profound and maybe there is an edge case that neither of us triggered during testing.

@drashna
Copy link
Member

drashna commented Aug 6, 2023

unfortunately, the behavior seems to be back. I'm not sure what is causing it specifically. The define above seems to allow it to work a bit longer before going crazy.

And specifically, it looks like the last report sent gets stuck (eg, not cleared properly). A key or two will work fine before the issue triggers.

@KarlK90
Copy link
Member Author

KarlK90 commented Aug 6, 2023

You are using an STM32F4 board as the testing device right?

@drashna
Copy link
Member

drashna commented Aug 6, 2023

STM32F411 and STM32F303

@KarlK90
Copy link
Member Author

KarlK90 commented Aug 7, 2023

STM32F411 and STM32F303

I've been using my F303 split keyboard for multiple hours and couldn't trigger the described behavior (mashing buttons and all that) on a host with a Linux OS (Ubuntu 23.04). Could you describe me some specific actions (e.g. oneshot mods combined with button mashing) that triggers those hangs?

@drashna
Copy link
Member

drashna commented Aug 7, 2023

STM32F411 and STM32F303

I've been using my F303 split keyboard for multiple hours and couldn't trigger the described behavior (mashing buttons and all that) on a host with a Linux OS (Ubuntu 23.04). Could you describe me some specific actions (e.g. oneshot mods combined with button mashing) that triggers those hangs?

honestly, it's frustratingly inconsistent. right now, I can get it to trigger consistently on my moonlander (stm32f303), but no longer will occur.
I'm using arm-none-eabi-gcc 12.2.1 (from homebrew, on a mac), if it helps.

And I just tap a normal keycodes a couple of times and it triggers. Basic keycodes, too.

And can confirm that it happens on RP2040, (adafruit macropad).

This refactoring unifies the endpoint handling for all IN and OUT
endpoints (except for EP0 and IDLE handling) for ChibiOS based
keyboards:

- All endpoints are now backed by a configurable buffer that queues
  received reports, or reports that shall be sent.
- IN endpoints are served by the `send_report` function.
- OUT endpoints are served by the `receive_report` function.

Sending is now asynchronous and doesn't block the main loop when there
is space in the sending buffer. If the sending buffer is full, a short
timeout is applied to wait for new space in the buffer. If this fails a
"disconnected" endpoint is assumed and the next report will no longer
apply a timeout. If sending fails the report the current queue is reset
and the report enqued.

This queueing and timeout of reports makes the sending resilient to:

1. Host USB stack implementations that don't support advanced features
   like NKRO, extra keys or mouse keys.
2. Short hick-ups in the Host USB stack where an endpoint isn't polled
   for a short amount of time. e.g. W11/W10 has a resume from suspend
   quirk where the first report with a key down event is received but a
   fast subsequent key release event can only be received ~500ms-1s
   later. The queueing of these events should prevent the stuck key.

Signed-off-by: Stefan Kerkmann <karlk90@pm.me>
Previously this would be done in the SOF callback which is fired with a
frequency of 1KHz thus wasting many cycles for the context switching
overhead and usually doing no work as the endpoints have no data to send
most of the time. Instead we now flush the endpoints in dedicated
endpoint tasks, which is a cheap check NULL pointer check when no work
is to be done.

Signed-off-by: Stefan Kerkmann <karlk90@pm.me>
The USB idle rate handling for non-nkro keyboard endpoint was
implemented via a virtual timer that fired asynchronously at the host
requested idle rate. Sending was done in the callback of the virtual
timer in an ISR context. This is no longer an option as the asynchronous
endpoints take ownership of all used USB endpoints and thus assume that
all reports will be send via it's API.

This commit introduces a report storage that holds the last successfully
send USB report for all USB IN **HID** endpoints. This report storage is
modelled in a OOP fashion as the shared endpoint with its multiple
report types complicates the handling.

With this report storage we can not only fullfill the idle rate handling
but also the mandatory get report requests for all USB HID IN endpoints
in a generic fashion.

Signed-off-by: Stefan Kerkmann <karlk90@pm.me>
@KarlK90
Copy link
Member Author

KarlK90 commented Jan 14, 2024

I've rebased onto develop and removed the console OUT endpoint which was done #22304 as well. The ring buffering implementation using rbuf is obsolete as all endpoints have an queue attached by default.

Tested on RP2040 and STM32F411 again.

@tzarc tzarc added the develop-fast-track Intended to be merged early in the next develop cycle. label Feb 16, 2024
@tzarc tzarc merged commit 0e02b0c into qmk:develop Feb 28, 2024
3 of 4 checks passed

#define QMK_USB_REPORT_STORAGE(_get_report, _set_report, _reset_report, _get_idle, _set_idle, _idle_timer_elasped, _report_count, _reports...) \
&((usb_report_storage_t){ \
.reports = (_Alignas(4) usb_fs_report_t *[_report_count]){_reports}, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that this kind of usage of _Alignas in compound literals is supported only in GCC >= 8.x. Because of that, GNU Guix is no longer suitable for building the QMK firmware (they have only gcc-arm-none-eabi-7-2018-q2-update).

This particular usage of _Alignas is also questionable — apparently it is applied to an array of pointers, which should be aligned appropriately anyway. Other places where the alignment is applied to byte arrays used for endpoint buffers look reasonable though.

@KarlK90 KarlK90 mentioned this pull request Mar 8, 2024
2 tasks
dunk2k pushed a commit to dunk2k/qmk_firmware that referenced this pull request Mar 11, 2024
mechlovin pushed a commit to mechlovin/qmk_firmware that referenced this pull request Apr 4, 2024
mechlovin pushed a commit to mechlovin/qmk_firmware that referenced this pull request Apr 4, 2024
mechlovin pushed a commit to mechlovin/qmk_firmware that referenced this pull request Apr 15, 2024
0x647262 pushed a commit to 0x647262/qmk_firmware that referenced this pull request Apr 26, 2024
lokher added a commit to Keychron/qmk_firmware that referenced this pull request May 23, 2024
PepperLola pushed a commit to PepperLola/qmk_firmware that referenced this pull request May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core develop-fast-track Intended to be merged early in the next develop cycle.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants