Option 14: unholy alloy of gold and silver #4

Open
corsix opened this issue Aug 1, 2022 · 1 comment


corsix commented Aug 1, 2022

With reference to https://www.corsix.org/content/fast-crc32c-4k, what I call crc32_4k is your option 12 ("8-byte Hardware-accelerated"), and what I call crc32_4k_three_way is your option 13 ("Golden"). The theoretical upper bound on option 13 is 64 bits/cycle, which your implementation gets close to, at 62 bits/cycle. What I realised is that:

  1. There's an inferior option, that I call crc32_4k_pclmulqdq, but you might call "Silver".
  2. Gold and silver use separate execution ports, and thus can be alloyed together, for a theoretical upper bound of 120.89 bits/cycle (this is 64+72 bytes every 9 cycles). I'm measuring 93 bits/cycle for this alloy, and I imagine that a well tuned implementation could get closer to 120.89.
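The arithmetic behind those figures can be checked with a short sketch. The helper name `bits_per_cycle` is hypothetical and exists only for this illustration. The 72-byte term matches the 64 bits/cycle bound quoted for option 13 taken over 9 cycles (8 bytes/cycle × 9). Attributing the remaining 64 bytes to the pclmulqdq path is an inference from that, not something stated above.

```python
from fractions import Fraction


def bits_per_cycle(bytes_processed, cycles):
    """Convert 'bytes per N cycles' into exact bits per cycle."""
    return Fraction(bytes_processed * 8, cycles)


# Figures quoted in the comment above.
alloy_bound = bits_per_cycle(64 + 72, 9)  # "64+72 bytes every 9 cycles"
gold_bound = bits_per_cycle(72, 9)        # option 13 alone (assumed to be the 72-byte term)

measured_alloy = 93  # bits/cycle, as measured for the alloy
measured_gold = 62   # bits/cycle, as reported for option 13

print(f"alloy bound: {float(alloy_bound):.2f} bits/cycle")
print(f"gold bound:  {float(gold_bound):.2f} bits/cycle")
print(f"alloy reaches {float(measured_alloy / alloy_bound):.1%} of its bound")
print(f"gold reaches  {float(measured_gold / gold_bound):.1%} of its bound")
```

On these numbers the alloy sits noticeably further from its bound than option 13 does from its own, which is consistent with there being room for tuning.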
Owner

komrad36 commented Aug 3, 2022

That's awesome. Alloyed, ha!
Given that there are bottlenecks other than execution ports, like decode or the total number of uops scheduled/retired, I'm surprised it's possible to do anything with the remaining bandwidth in the processor. But I'll have to check this out!
