Is your feature request related to a problem? Please describe.
All our SIMD is forced to be precompiled with specific target features, making all executables non-portable. While this might be fine and better for performance for advanced users, it will make distributing rq a nightmare.
Currently, a number of largely independent variables determine which SIMD capabilities are available to rsonpath:
- Vector size: 128 or 256 bits? (SSSE3/SSE2 vs AVX2)
- Supports the byte shuffle instruction, `pshufb` (or its VEX-encoded form `vpshufb`)? (AVX2/SSSE3 vs SSE2)
- Supports `pclmulqdq`? (independent target feature)
- Supports fast `popcnt`? (independent target feature)
- Native word size: 32 or 64 bits?
Note: It's unlikely that there exists an architecture that would support AVX2 but not popcnt, but for purposes of Rust's compilation we'd need to have that as a separate target_feature guard anyway, or else we'd lose that optimisation (and it's a crucial one for the depth classifier).
Managing a matrix of all possible combinations of those is unfeasible.
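As a rough illustration, the variables above can be probed at runtime with the standard library's `is_x86_feature_detected!` macro. This is a minimal sketch: the `SimdCapabilities` struct and the `detect` function are hypothetical names chosen for this example, not part of rsonpath.

```rust
/// Hypothetical summary of the variables listed above; not an rsonpath type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SimdCapabilities {
    /// Widest usable vector in bits: 256 (AVX2), 128 (SSSE3/SSE2), or 0 (none).
    pub vector_bits: u32,
    /// Byte shuffle (`pshufb`/`vpshufb`) is available, i.e. SSSE3 or AVX2.
    pub byte_shuffle: bool,
    /// Carry-less multiplication (`pclmulqdq`) is available.
    pub pclmulqdq: bool,
    /// Hardware `popcnt` is available.
    pub popcnt: bool,
    /// Native word size in bits; this one is known at compile time.
    pub word_bits: u32,
}

/// Queries the CPU the program is currently running on.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detect() -> SimdCapabilities {
    let avx2 = std::is_x86_feature_detected!("avx2");
    let ssse3 = std::is_x86_feature_detected!("ssse3");
    let sse2 = std::is_x86_feature_detected!("sse2");
    SimdCapabilities {
        vector_bits: if avx2 {
            256
        } else if ssse3 || sse2 {
            128
        } else {
            0
        },
        byte_shuffle: avx2 || ssse3,
        pclmulqdq: std::is_x86_feature_detected!("pclmulqdq"),
        popcnt: std::is_x86_feature_detected!("popcnt"),
        word_bits: usize::BITS,
    }
}

/// On other architectures this sketch reports no SIMD support at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detect() -> SimdCapabilities {
    SimdCapabilities {
        vector_bits: 0,
        byte_shuffle: false,
        pclmulqdq: false,
        popcnt: false,
        word_bits: usize::BITS,
    }
}
```

Each flag is queried separately, so a CPU that has, say, SSE2 and `popcnt` but no SSSE3 is described accurately instead of being forced into a precompiled profile.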
Describe the solution you'd like
We need a resolver that will figure out which SIMD-specialised types to use based on the CPU it's running on. This resolution should happen once, at runtime, when the engine is constructed. It's important not to try something crazy like checking the target feature before every call to a classify_block function, as that would have enormous overhead.
There should be a way to override this runtime check and specify what combination of types to use, for testing purposes.
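One way to picture "resolve once, with an override" is an engine that stores a function pointer chosen at construction. The sketch below is illustrative only: `Engine`, `SimdChoice`, `with_simd`, and `quote_mask` are hypothetical names, and the 16-byte quote mask is a stand-in for a real classifier.

```rust
pub type Block = [u8; 16];
type ClassifyFn = fn(&Block) -> u16;

/// Hypothetical set of implementations the resolver can pick from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdChoice {
    Scalar,
    Sse2,
}

/// Portable fallback: bit `i` of the result is set if byte `i` is a double quote.
fn classify_scalar(block: &Block) -> u16 {
    block
        .iter()
        .enumerate()
        .fold(0u16, |mask, (i, &b)| mask | (u16::from(b == b'"') << i))
}

/// Returns the SSE2 implementation only if the running CPU supports it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sse2_classifier() -> Option<ClassifyFn> {
    #[target_feature(enable = "sse2")]
    unsafe fn imp(block: &Block) -> u16 {
        #[cfg(target_arch = "x86")]
        use std::arch::x86::*;
        #[cfg(target_arch = "x86_64")]
        use std::arch::x86_64::*;
        unsafe {
            let v = _mm_loadu_si128(block.as_ptr() as *const __m128i);
            let eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(b'"' as i8));
            _mm_movemask_epi8(eq) as u16
        }
    }
    fn checked(block: &Block) -> u16 {
        // SAFETY: this pointer is only handed out after SSE2 was detected.
        unsafe { imp(block) }
    }
    if std::is_x86_feature_detected!("sse2") {
        Some(checked)
    } else {
        None
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn sse2_classifier() -> Option<ClassifyFn> {
    None
}

pub struct Engine {
    choice: SimdChoice,
    classify_block: ClassifyFn,
}

impl Engine {
    /// Detects the CPU once and stores the best available implementation.
    pub fn new() -> Self {
        match sse2_classifier() {
            Some(f) => Engine { choice: SimdChoice::Sse2, classify_block: f },
            None => Engine { choice: SimdChoice::Scalar, classify_block: classify_scalar },
        }
    }

    /// Testing override: forces a choice, refusing one the CPU cannot run.
    pub fn with_simd(choice: SimdChoice) -> Result<Self, String> {
        let classify_block: ClassifyFn = match choice {
            SimdChoice::Scalar => classify_scalar,
            SimdChoice::Sse2 => sse2_classifier()
                .ok_or_else(|| "SSE2 is not available on this CPU".to_string())?,
        };
        Ok(Engine { choice, classify_block })
    }

    pub fn choice(&self) -> SimdChoice {
        self.choice
    }

    /// Hot path: one indirect call and no feature check.
    pub fn quote_mask(&self, block: &Block) -> u16 {
        (self.classify_block)(block)
    }
}
```

A function pointer is only one option; resolving to a generic type parameter once at construction would keep the inner loop monomorphised, at the cost of more generated code. Either way, the feature check happens in the constructor and never inside `quote_mask`.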
Testing the entire matrix against all possible targets is probably unfeasible, but we should aim for sensible coverage in the CI pipeline.
This will also allow us to support SSE2 targets. Currently we require SSSE3, because it is the first extension that provides the `pshufb` byte shuffle our structural classifier relies on. Having a more flexible SIMD resolution approach would allow SSE2 to use all the fancy classifiers for quotes and depth, and then a slower fallback for structural.
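The per-classifier decision described here can be sketched as a pure function from feature flags to choices. All names below (`Features`, `Base`, `Kind`, `Resolution`, `resolve`) are hypothetical, and the rules encode only what this issue states: structural classification needs the byte shuffle (SSSE3 or AVX2), quotes need `pclmulqdq`, and depth needs `popcnt`.

```rust
/// Hypothetical input: feature flags as reported by the CPU.
#[derive(Debug, Clone, Copy)]
pub struct Features {
    pub avx2: bool,
    pub ssse3: bool,
    pub sse2: bool,
    pub pclmulqdq: bool,
    pub popcnt: bool,
}

/// The widest base instruction set that can be used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Base {
    Avx2,
    Ssse3,
    Sse2,
    Scalar,
}

/// Whether a classifier gets its SIMD version or the slower fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Simd,
    Fallback,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Resolution {
    pub base: Base,
    pub structural: Kind,
    pub quotes: Kind,
    pub depth: Kind,
}

/// Decides each classifier independently instead of all-or-nothing.
pub fn resolve(f: Features) -> Resolution {
    let base = if f.avx2 {
        Base::Avx2
    } else if f.ssse3 {
        Base::Ssse3
    } else if f.sse2 {
        Base::Sse2
    } else {
        Base::Scalar
    };
    let simd_if = |ok: bool| if ok { Kind::Simd } else { Kind::Fallback };
    let has_simd = base != Base::Scalar;
    Resolution {
        base,
        // The structural classifier needs the byte shuffle: SSSE3 or AVX2.
        structural: simd_if(matches!(base, Base::Avx2 | Base::Ssse3)),
        // Quotes and depth each depend on one independent feature.
        quotes: simd_if(has_simd && f.pclmulqdq),
        depth: simd_if(has_simd && f.popcnt),
    }
}
```

Keeping the decision separate from detection makes it easy to unit-test combinations that are rare in real hardware, such as an SSE2-only CPU that still has `pclmulqdq` and `popcnt`.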
Additional context
Some guidance on target features can be found in the packed_simd book
- SIMD capabilities are now discovered at runtime,
  allowing us to distribute one binary per target.
- Requirements for SIMD are now more granular,
  allowing weaker CPUs to still get some of the acceleration:
  - Base SIMD is either SSE2, SSSE3, or AVX2.
  - Structural classification works on SSSE3 and above.
  - Quote classification works if `pclmulqdq` is available.
  - Depth classification works if `popcnt` is available.
- To counteract the increased binary size, debug info is no longer
  included in distributed binaries.
- Codegen for distributed binaries is improved with fat LTO and setting
  codegen units to 1.
- SIMD capabilities are listed with `rq --version`.
#231
All the available features can be found in the Rust reference.