Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use wmemset for fill optimization #3641

Closed
wants to merge 7 commits into from

Conversation

AlexGuteniev
Copy link
Contributor

Towards #3167

Will look into vectorization of 32 and 64 bit types if this passes.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 8, 2023 13:04
@AlexGuteniev AlexGuteniev changed the title Memset Use wmemset for fill optimization Apr 8, 2023
@AlexGuteniev
Copy link
Contributor Author

⚠️ Possibly this should not be merged

See #3642

@StephanTLavavej StephanTLavavej added performance Must go faster decision needed We need to choose something before working on this labels Apr 8, 2023
@StephanTLavavej
Copy link
Member

If the compiler can perform this optimization, I'd prefer to avoid merging this, and potentially even remove the 1-byte memset optimization. (The metaprogramming has a minor throughput cost but a major code complexity cost - we've put a lot of energy into maintaining its correctness. I'm not too concerned about regressing non-optimized debug mode perf by returning to classic loops here.)

Marked as "decision needed" so we can talk about this on Wednesday.

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Apr 8, 2023

and potentially even remove the 1-byte memset optimization

There's even further simplification option: remove multibyte memset optimization. The compiler is still able to insert memset if it is able to propagate compile time known value. However, if zero constant is not propagated, sure the compiler does not insert the branch.

On the other hand, the previously discussed direction for #3167 is to make c32memset and c64memset, which the compiler does not insert automatically, so they could have made some real gain. Well, unless the compiler will extend its automatic memset and rep stosw insertion further, I've created DevCom-10334038 for that.

@StephanTLavavej StephanTLavavej removed the decision needed We need to choose something before working on this label Apr 12, 2023
@StephanTLavavej
Copy link
Member

We talked about this at the weekly maintainer meeting and we concluded:

  • Because the compiler handles the 2-byte case, expanding the STL's metaprogramming (in an already complicated area) to call wmemset doesn't appear to be valuable enough to be worth the risk
  • The compiler should be enhanced to handle 4-byte and 8-byte cases (thanks for filing the DevCom issue! 😻)
  • In the long term, we might be interested in removing the 1-byte memset metaprogramming in the STL, but only if we can comprehensively confirm that the compiler emits equivalent (or better) codegen (with optimizations enabled). Importantly, we'd like to make sure that this happens even for user-defined contiguous iterators that don't participate in iterator unwrapping (as @strega-nil-ms observed).
    • The _Fill_memset_is_safe optimization is intertwined with the similar-but-different _Fill_zero_memset_is_safe optimization, and it occurs frequently enough in the STL that attempting to remove _Fill_memset_is_safe could be invasive and potentially destabilizing; although this could be possible, we're not super eager to make such a change any time soon.

@AlexGuteniev AlexGuteniev deleted the memset branch April 18, 2023 07:35
@AlexGuteniev
Copy link
Contributor Author

In the long term, we might be interested in removing the 1-byte memset metaprogramming in the STL, but only if we can comprehensively confirm that the compiler emits equivalent (or better) codegen (with optimizations enabled). Importantly, we'd like to make sure that this happens even for user-defined contiguous iterators that don't participate in iterator unwrapping (as @strega-nil-ms observed).

  • The _Fill_memset_is_safe optimization is intertwined with the similar-but-different _Fill_zero_memset_is_safe optimization, and it occurs frequently enough in the STL that attempting to remove _Fill_memset_is_safe could be invasive and potentially destabilizing; although this could be possible, we're not super eager to make such a change any time soon.

I think it would be best to eventually remove both of the optimizations, as _Fill_zero_memset_is_safe is even less useful currently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants