Portable SIMD support #231

Closed
V0ldek opened this issue Aug 30, 2023 · 1 comment · Fixed by #260
Labels: area: library (Improvements to the library API quality), type: feature (New feature or request)

Comments

V0ldek (Member) commented Aug 30, 2023

Is your feature request related to a problem? Please describe.
All our SIMD code has to be precompiled with specific target features, which makes every executable non-portable. That might be fine, and even better for performance, for advanced users, but it will make distributing rq a nightmare.

Currently, there are a number of largely independent variables that determine which SIMD capabilities rsonpath can use.

  • Vector size: 128 or 256 bits? (SSSE3/SSE2 vs AVX2)
  • Supports the pshufb instruction (vpshufb in its AVX2 form)? (AVX2/SSSE3 vs SSE2)
  • Supports pclmulqdq? (independent target feature)
  • Supports fast popcnt? (independent target feature)
  • Native word size: 32 or 64?

Note: It's unlikely that there exists an architecture that would support AVX2 but not popcnt, but for purposes of Rust's compilation we'd need to have that as a separate target_feature guard anyway, or else we'd lose that optimisation (and it's a crucial one for the depth classifier).

Managing a matrix of all possible combinations of those is unfeasible.
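To make the list of variables concrete, here is a minimal sketch of probing each of them independently at runtime. The type and function names (`BaseSimd`, `SimdCapabilities`, `detect`) are made up for illustration and are not rsonpath's API; the only real API used is the standard library's `is_x86_feature_detected!` macro. Pointer width is used as a stand-in for native word size, which is an assumption.

```rust
/// Widest base vector extension available. All names here are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BaseSimd {
    /// No usable vector extension detected.
    Scalar,
    /// 128-bit vectors without a byte shuffle.
    Sse2,
    /// 128-bit vectors with `pshufb`.
    Ssse3,
    /// 256-bit vectors with `vpshufb`.
    Avx2,
}

/// One field per independent variable from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimdCapabilities {
    pub base: BaseSimd,
    pub pclmulqdq: bool,
    pub popcnt: bool,
    /// Pointer width, used here as a proxy for the native word size.
    pub word_bits: u32,
}

impl SimdCapabilities {
    /// Vector size in bits, or 0 when only scalar code is available.
    pub fn vector_bits(&self) -> u32 {
        match self.base {
            BaseSimd::Scalar => 0,
            BaseSimd::Sse2 | BaseSimd::Ssse3 => 128,
            BaseSimd::Avx2 => 256,
        }
    }

    /// Whether a byte shuffle (`pshufb`/`vpshufb`) is available.
    pub fn has_byte_shuffle(&self) -> bool {
        matches!(self.base, BaseSimd::Ssse3 | BaseSimd::Avx2)
    }
}

/// Queries the CPU the program is currently running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect() -> SimdCapabilities {
    let base = if is_x86_feature_detected!("avx2") {
        BaseSimd::Avx2
    } else if is_x86_feature_detected!("ssse3") {
        BaseSimd::Ssse3
    } else if is_x86_feature_detected!("sse2") {
        BaseSimd::Sse2
    } else {
        BaseSimd::Scalar
    };
    SimdCapabilities {
        base,
        pclmulqdq: is_x86_feature_detected!("pclmulqdq"),
        popcnt: is_x86_feature_detected!("popcnt"),
        word_bits: usize::BITS,
    }
}

/// On non-x86 targets this sketch reports no acceleration at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect() -> SimdCapabilities {
    SimdCapabilities {
        base: BaseSimd::Scalar,
        pclmulqdq: false,
        popcnt: false,
        word_bits: usize::BITS,
    }
}
```

Calling `detect()` once and printing the result with `{:?}` shows which combination the current machine falls into. Keeping `pclmulqdq` and `popcnt` as separate booleans mirrors the point that they need their own guards regardless of the base vector extension.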

Describe the solution you'd like
We need a resolver that will figure out which SIMD-specialised types to use based on the CPU it's running on. This resolution should happen once when the engine is built. It's important to not try something crazy like checking the target feature before every call to a classify_block function, as that would have enormous overhead.
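A rough sketch of the "resolve once, then never check again" idea, using a population count as a stand-in for a real classifier. Everything named here (`Engine`, `CountOnes`, `count_ones_popcnt`) is hypothetical. The issue talks about SIMD-specialised types, so a real resolver would more likely pick a generic instantiation; a function pointer is used here only because it keeps the example short.

```rust
/// Signature shared by every specialisation of the (stand-in) kernel.
type CountOnes = fn(&[u64]) -> u32;

/// Portable fallback, compiled without any extra target features.
fn count_ones_portable(words: &[u64]) -> u32 {
    words.iter().map(|w| w.count_ones()).sum()
}

/// Same logic, but compiled with `popcnt` enabled so that the compiler
/// may emit the hardware instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "popcnt")]
unsafe fn count_ones_popcnt_impl(words: &[u64]) -> u32 {
    words.iter().map(|w| w.count_ones()).sum()
}

/// Safe wrapper so the specialisation fits the `CountOnes` signature.
#[cfg(target_arch = "x86_64")]
fn count_ones_popcnt(words: &[u64]) -> u32 {
    // SAFETY: `Engine::new` only selects this function after
    // `is_x86_feature_detected!("popcnt")` returned true.
    unsafe { count_ones_popcnt_impl(words) }
}

/// Hypothetical engine that remembers which kernel was resolved.
pub struct Engine {
    count_ones: CountOnes,
    kernel_name: &'static str,
}

impl Engine {
    /// The runtime check happens here, exactly once per engine.
    pub fn new() -> Self {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("popcnt") {
                return Engine {
                    count_ones: count_ones_popcnt,
                    kernel_name: "popcnt",
                };
            }
        }
        Engine {
            count_ones: count_ones_portable,
            kernel_name: "portable",
        }
    }

    /// Hot path: no feature detection, just a call through the resolved pointer.
    pub fn total_ones(&self, words: &[u64]) -> u32 {
        (self.count_ones)(words)
    }

    /// Name of the kernel chosen at construction time.
    pub fn kernel_name(&self) -> &'static str {
        self.kernel_name
    }
}
```

With this shape, `Engine::new().total_ones(&[0b1011, u64::MAX])` does the feature check only inside `new`. The indirect call still blocks inlining across the pointer, which is one reason to prefer resolving to a concrete type rather than a function pointer in performance-critical code.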

There should be a way to override this runtime check and specify what combination of types to use, for testing purposes.
Testing the entire matrix against all possible targets is probably unfeasible, but we should aim for sensible coverage in the CI pipeline.
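One possible shape for the override, sketched under heavy assumptions: the comma-separated spec format, the token names, and the `resolve` function are all invented for illustration, and nothing here describes an existing rsonpath option. The sketch refuses any override that asks for more than the CPU reports, since running an unsupported instruction would crash the process.

```rust
/// Base vector extension, ordered from weakest to strongest (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BaseSimd {
    Nosimd,
    Sse2,
    Ssse3,
    Avx2,
}

/// A concrete combination of capabilities to build the engine with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimdConfig {
    pub base: BaseSimd,
    pub pclmulqdq: bool,
    pub popcnt: bool,
}

/// Parses a hypothetical spec such as "sse2,pclmulqdq".
/// Features that are not mentioned are treated as disabled.
pub fn parse_override(spec: &str) -> Result<SimdConfig, String> {
    let mut cfg = SimdConfig {
        base: BaseSimd::Nosimd,
        pclmulqdq: false,
        popcnt: false,
    };
    for token in spec.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        match token {
            "nosimd" => cfg.base = BaseSimd::Nosimd,
            "sse2" => cfg.base = BaseSimd::Sse2,
            "ssse3" => cfg.base = BaseSimd::Ssse3,
            "avx2" => cfg.base = BaseSimd::Avx2,
            "pclmulqdq" => cfg.pclmulqdq = true,
            "popcnt" => cfg.popcnt = true,
            other => return Err(format!("unknown SIMD token: {other}")),
        }
    }
    Ok(cfg)
}

/// Returns the detected configuration unless an override is given.
/// An override may only weaken what was detected, never exceed it.
pub fn resolve(detected: SimdConfig, requested: Option<&str>) -> Result<SimdConfig, String> {
    let Some(spec) = requested else {
        return Ok(detected);
    };
    let wanted = parse_override(spec)?;
    let supported = wanted.base <= detected.base
        && (!wanted.pclmulqdq || detected.pclmulqdq)
        && (!wanted.popcnt || detected.popcnt);
    if supported {
        Ok(wanted)
    } else {
        Err(format!("override {wanted:?} exceeds detected {detected:?}"))
    }
}
```

A CI job could then run the same test suite several times on one AVX2 machine, passing specs like `"sse2"` or `"ssse3,popcnt"`, which covers weaker configurations without needing dedicated hardware for each.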

This will also allow us to support SSE2 targets. Currently we require SSSE3, because that's the first extension that has pshufb, which our structural classifier needs. Having a more flexible SIMD resolution approach would allow SSE2 to use all the fancy classifiers for quotes and depth, and then a slower fallback for structural.
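The per-classifier choice described here can be sketched as a small planning function. The names (`Base`, `Kind`, `Plan`, `plan`) are hypothetical, and the rules encoded are just the ones stated in this thread: structural classification needs SSSE3 or better, quotes need `pclmulqdq`, and depth needs `popcnt`.

```rust
/// Base vector extension the CPU offers (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Sse2,
    Ssse3,
    Avx2,
}

/// Which flavour of a classifier gets used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    /// Slower implementation that needs no special instructions.
    Fallback,
    /// 128-bit SIMD implementation.
    Simd128,
    /// 256-bit SIMD implementation.
    Simd256,
}

/// One independent choice per classifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Plan {
    pub structural: Kind,
    pub quotes: Kind,
    pub depth: Kind,
}

/// Picks an implementation for each classifier separately, so that a
/// missing feature only degrades the classifier that depends on it.
pub fn plan(base: Base, pclmulqdq: bool, popcnt: bool) -> Plan {
    let vector = match base {
        Base::Avx2 => Kind::Simd256,
        Base::Sse2 | Base::Ssse3 => Kind::Simd128,
    };
    Plan {
        // The byte shuffle only exists from SSSE3 upwards.
        structural: if base == Base::Sse2 { Kind::Fallback } else { vector },
        quotes: if pclmulqdq { vector } else { Kind::Fallback },
        depth: if popcnt { vector } else { Kind::Fallback },
    }
}
```

For an SSE2-only CPU that still has `pclmulqdq` and `popcnt`, `plan(Base::Sse2, true, true)` keeps the accelerated quote and depth classifiers and falls back only for structural classification.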

Additional context
Some guidance on target features can be found in the packed_simd book.

All the available features can be found in the Rust reference.

@V0ldek V0ldek added the type: feature New feature or request label Aug 30, 2023
@github-actions github-actions bot added the acceptance: triage Waiting for owner's input label Aug 30, 2023
@github-actions

Tagging @V0ldek for notifications

@V0ldek V0ldek added this to the v1.0.0 milestone Aug 30, 2023
@github-actions github-actions bot added acceptance: go ahead Reviewed, implementation can start and removed acceptance: triage Waiting for owner's input labels Aug 30, 2023
@V0ldek V0ldek added mod: engine area: library Improvements to the library API quality area: app Improvements in overall CLI app usability and removed area: app Improvements in overall CLI app usability labels Aug 30, 2023
@V0ldek V0ldek self-assigned this Sep 6, 2023
@V0ldek V0ldek pinned this issue Sep 7, 2023
@V0ldek V0ldek mentioned this issue Sep 10, 2023
3 tasks
V0ldek added a commit that referenced this issue Sep 10, 2023
- SIMD capabilities are now discovered at runtime,
allowing us to distribute one binary per target.
- Requirements for SIMD are now more granular,
allowing weaker CPUs to still get some of the acceleration:
  - Base SIMD is either SSE2, SSSE3, or AVX2.
  - Structural classification works on SSSE3 and above.
  - Quote classification works if `pclmulqdq` is available.
  - Depth classification works if `popcnt` is available.
- To counteract the increased binary size debug info is no longer
included in distributed binaries.
- Codegen for distributed binaries is improved with fat LTO and setting
codegen units to 1.
- SIMD capabilities are listed with `rq --version`.

#231
@github-actions github-actions bot removed the acceptance: go ahead Reviewed, implementation can start label Sep 10, 2023
@V0ldek V0ldek unpinned this issue Sep 17, 2023
Projects
Status: Released

1 participant