Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for slow codegen on x86 atomic load #2110

Merged
merged 1 commit into from
Aug 17, 2021

Conversation

AlexGuteniev
Copy link
Contributor

Compiler emits jumptable or jcc sequence that prevents inlining
of atomic load; separation of order check and barrier condition helps

Compiler emits jumptable or jcc sequence that prevents inlining
of atomic load; separation of order check and barrier condition helps
@AlexGuteniev
Copy link
Contributor Author

Not sure if it should be applied, or the compiler should be fixed instead.

@AlexGuteniev AlexGuteniev changed the title Workaround for bad codegen, ox x86 atomic load Workaround for bad codegen on x86 atomic load Aug 10, 2021
@RSilicon
Copy link
Contributor

Send feedback to the Visual Studio compiler team via visual studio feedback

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Aug 10, 2021

The feedback was already sent by @Chronial , see DevCom-1491677

I'm not really sure if there should be workaround in code. There are at least three optimizer issues:

  • Compiler barrier emits npad 1 (nop instruction) instead of being truly no-op
  • switch does not collapse to if, even though there are only two possibilities
  • Decision whether to inline a function is based on full function size, not the size when constant propagation happens

Fixing just one of them would help. The workaround is a particular solutiin for wider problems.

@AlexGuteniev
Copy link
Contributor Author

Proof that it works: https://godbolt.org/z/5e7re6bGY

@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 10, 2021
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The behavior is equivalent, and the code is simpler, so this is a good perma-workaround (no TRANSITION comment).

Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also like the "workaround" code better than the old code.

@StephanTLavavej
Copy link
Member

I'm mirroring this to an MSVC-internal PR. Please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej changed the title Workaround for bad codegen on x86 atomic load Workaround for slow codegen on x86 atomic load Aug 14, 2021
@StephanTLavavej
Copy link
Member

Changed the title to say "slow codegen", as the compiler back-end team conventionally uses "bad codegen" to mean incorrect codegen.

@StephanTLavavej StephanTLavavej merged commit 7feac35 into microsoft:main Aug 17, 2021
@StephanTLavavej
Copy link
Member

Thanks... for... improving... this... slow... codegen... ! 🐌 😹 🐇

@AlexGuteniev AlexGuteniev deleted the atomic_load_fix branch August 17, 2021 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants