Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce firmware USB transfer size from 16KB to 8KB #1203

Merged
merged 1 commit into from
Dec 1, 2022

Conversation

martinling
Copy link
Member

This PR reduces the size of firmware-side USB transfers from 16KB to 8KB: a quarter of the available buffer, rather than half.

This provides greatly improved throughput in practice:

  • With 16KB transfers, Windows and macOS would both typically fail to maintain sufficient USB throughput above 16-17Msps sample rates, with only Linux sustaining 20Msps. With 8KB transfers, all three operating systems now sustain 20Msps.

  • With 16KB transfers, Linux would sustain 20Msps TX but with occasional brief underruns. With 8KB transfers, there are none. Buffer stats printed from hackrf_transfer -B clearly show a higher and more consistent level of data in the buffer.

The smaller transfer size helps because it reduces sensitivity to latency. Previously, the device had to NAK the IN/OUT tokens from the host until there was a full 16KB of data or buffer space ready. Using the largest possible chunks is efficient, but means the device can miss chances to send some data when it could. If the host then polls late, it may be hard to recover.

It seems like the 16KB size interacted poorly with Windows and macOS, which would both give up polling just before the next chunk became ready, and then retry too late to recover before a shortfall occured. Linux probably performed better due to polling more aggressively.

8KB has been chosen as a sweet spot: going further down to 4KB transfers has been tested but collapses the RX performance.

Copy link
Member

@mossmann mossmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely! Thank you!

@mossmann mossmann merged commit e226a41 into greatscottgadgets:master Dec 1, 2022
@martinling
Copy link
Member Author

martinling commented Dec 10, 2023

This change turns out to have caused problems.

It improves USB throughput to the host, but can cause glitches in the sample stream, as now identified in #1363.

The 32KB sample buffer straddles a block boundary such that each 16KB half-buffer is in a separate SRAM block which has its own independent AHB bus connection. With 16KB transfers, the M0 and the USB peripheral were always accessing different blocks. This change to 8KB transfers allowed the USB peripheral to start reading from the first 8KB of a 16KB half-buffer while the M0 core was still writing to the second 8KB. The contention caused by having two bus masters accessing the same SRAM block added intermittent latency, causing the M0 to sometimes miss its SGPIO timing deadlines.

Reverting this PR will eliminate these glitches, but will also cause a significant regression to host throughput.

I think that with a little more work, it may be possible to get the best of both worlds. This could be done by modifying the buffer ordering in order to switch between SRAM blocks every 8KB. This is straightforward to implement from the M4 side, but timing will be tight on the M0 side where we currently have only 11 cycles spare in the RX loop.

I prototyped this approach previously in martinling@53707ce, as part of an effort to expand the buffer size to 64KB, and the cost was an additional 7 cycles. In theory we have sufficient timing margin for this. There may also be faster ways to implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants