Build VPERM At Runtime (Again?) #11385

Merged
merged 7 commits into from Jan 18, 2022

Nekotekina
Member

Includes some other refactoring.
Use efficient runtime dispatch between the SSSE3 version of VPERM (built at runtime) and a fallback. The main point is that the final function contains no branches for runtime dispatch; the selection is done only once, at program startup. While it may seem silly, this completely bypasses compiler shenanigans around differing target options (for example, if the targets of two functions differ, GCC/clang cannot inline one into the other; MSVC has some problems as well).
An ARM version of the VPERM instruction is also written.
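
For readers unfamiliar with the pattern, the startup-time selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the PR builds the SSSE3 function at runtime, whereas this sketch swaps in a plain function pointer plus a GCC/clang `target` attribute, so an indirect call remains. All names (`v128`, `vperm_scalar`, `vperm_ssse3`, `select_vperm`, `vperm`) are hypothetical, and bytes are indexed in plain array order, ignoring whatever byte-order handling the emulator needs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAS_SSSE3_PATH 1
#endif

// Hypothetical 16-byte vector type used only in this sketch.
using v128 = std::array<std::uint8_t, 16>;
using vperm_fn = v128 (*)(const v128&, const v128&, const v128&);

// Portable fallback: the low 5 bits of each control byte index
// into the 32-byte concatenation of a and b.
static v128 vperm_scalar(const v128& a, const v128& b, const v128& c) {
    v128 r{};
    for (int i = 0; i < 16; i++) {
        const unsigned idx = c[i] & 0x1f;
        r[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
    return r;
}

#ifdef SKETCH_HAS_SSSE3_PATH
// SSSE3 path: shuffle both sources with pshufb, then blend by index bit 4.
__attribute__((target("ssse3")))
static v128 vperm_ssse3(const v128& a, const v128& b, const v128& c) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    const __m128i vc = _mm_loadu_si128(reinterpret_cast<const __m128i*>(c.data()));
    const __m128i idx = _mm_and_si128(vc, _mm_set1_epi8(0x1f));
    const __m128i from_b = _mm_cmpgt_epi8(idx, _mm_set1_epi8(15)); // 0xFF where idx >= 16
    const __m128i ra = _mm_shuffle_epi8(va, idx); // pshufb uses only the low 4 index bits
    const __m128i rb = _mm_shuffle_epi8(vb, idx);
    const __m128i res = _mm_or_si128(_mm_andnot_si128(from_b, ra),
                                     _mm_and_si128(from_b, rb));
    v128 r;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r.data()), res);
    return r;
}
#endif

// Runs once during static initialization; this is the only CPU feature check.
static vperm_fn select_vperm() {
#ifdef SKETCH_HAS_SSSE3_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3")) return vperm_ssse3;
#endif
    return vperm_scalar;
}

static const vperm_fn vperm_impl = select_vperm();

// Call site: no feature test here, just a call through the chosen pointer.
static v128 vperm(const v128& a, const v128& b, const v128& c) {
    assert(vperm_impl != nullptr);
    return vperm_impl(a, b, c);
}
```

The feature test lives only in `select_vperm`, so neither implementation carries a dispatch branch, and the two functions never need to be inlined into each other across different target options.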

@Megamouse Megamouse added the CPU label Jan 16, 2022
@Nekotekina Nekotekina force-pushed the master2 branch 3 times, most recently from 867d001 to b8776b6 Compare January 17, 2022 16:18
Notably, runtime-built SSSE3 version of VPERM.
Some other instructions are refactored and vectorized.
Aarch64 impl of multiple instructions including VPERM.
Fix: nearbyint -> roundeven
2 participants