Optimize `atomic_thread_fence` #740

AlexGuteniev · 2020-04-23T14:28:38Z

Fix #739 (without saving stack variable)

StephanTLavavej

I suggest minor comment edits and sorting the suppressed warnings; other than that, this looks good to me. Assuming that interlocked operations on separate guard variables provide a sufficient fence, there should be no ODR concerns with mixing-and-matching; older TUs will share a single separate guard variable.
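For readers following along, here is a minimal portable sketch of the idea under review: using an interlocked (locked read-modify-write) operation on a throwaway guard variable as a sequentially consistent fence on x86/x64. It is not the code from `stl/inc/atomic`. The names `seq_cst_fence_sketch`, `publish_then_observe`, and `demo_fence_sketch` are hypothetical, and `std::atomic<long>` stands in for the MSVC interlocked intrinsics. On other architectures the sketch simply falls back to `std::atomic_thread_fence`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, for illustration only.
inline void seq_cst_fence_sketch() noexcept {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    // Assumption: on x86/x64 a seq_cst read-modify-write is typically emitted
    // as a lock-prefixed instruction, which acts as a full barrier in hardware.
    // The guard is local to each call; its value never matters.
    std::atomic<long> guard{0};
    guard.fetch_add(1, std::memory_order_seq_cst);
#else
    // Elsewhere, use the standard fence rather than rely on x86 behavior.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Store-then-load pattern: the fence keeps the store to `mine` ordered
// before the load from `theirs`.
inline long publish_then_observe(std::atomic<long>& mine,
                                 const std::atomic<long>& theirs) noexcept {
    mine.store(1, std::memory_order_relaxed);
    seq_cst_fence_sketch();
    return theirs.load(std::memory_order_relaxed);
}

// Single-threaded usage check; it exercises the call path only,
// not the cross-thread ordering.
inline void demo_fence_sketch() {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
    assert(publish_then_observe(a, b) == 0);
    assert(a.load() == 1);
}
```

Because every call uses its own guard, nothing is shared between translation units in this sketch, which is in line with the mixing-and-matching remark above.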

stl/inc/atomic

Co-Authored-By: Stephan T. Lavavej <stl@nuwen.net>

AlexGuteniev · 2020-04-25T03:29:27Z

After @pcordes's comment, I now think that `_InterlockedXor(reinterpret_cast<volatile long*>(_AddressOfReturnAddress()), 0);` may be a slightly better alternative; in that case no warning suppression would be needed.

But I would need to add an intrinsic include (`_AddressOfReturnAddress()` is not in `<intrin0.h>`), and you have already approved the current approach, so let's go with that.
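A small portable sketch of why XOR with 0 suits a slot that already holds a live value, as in the alternative mentioned above: the operation is still a read-modify-write, but it writes back the value it read, so the slot is left unchanged. The names `fence_on_live_slot` and `demo_live_slot` are hypothetical, and a `std::atomic<long>` stands in for the return-address slot. This is not the MSVC intrinsic form, and a compiler is free to lower it differently.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: performs a seq_cst read-modify-write on an existing
// slot without changing its contents, and returns the value it observed.
inline long fence_on_live_slot(std::atomic<long>& slot) noexcept {
    return slot.fetch_xor(0, std::memory_order_seq_cst); // x ^ 0 == x
}

// Usage check: the slot keeps its value across the operation.
inline void demo_live_slot() {
    std::atomic<long> slot{0x1234}; // stand-in for a live stack slot
    const long seen = fence_on_live_slot(slot);
    assert(seen == 0x1234);
    assert(slot.load() == 0x1234);
}
```

An increment, by contrast, would modify the slot, which is only acceptable for a dedicated guard variable whose value is never read.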

StephanTLavavej · 2020-04-30T10:02:08Z

Thanks for this performance improvement - I'm really glad you noticed it! 😸

AlexGuteniev · 2020-04-30T10:07:30Z

Thanks to @pcordes for great explanations on this subject!

Fix microsoft#739 (without saving stack variable)

ad3adbd

AlexGuteniev requested a review from a team as a code owner April 23, 2020 14:28

AlexGuteniev added 3 commits April 23, 2020 19:20

Suppress warning, explain it

764c824

Quotes

d38cf4b

Avoid destination register by using xor

1ec772f

AlexGuteniev mentioned this pull request Apr 24, 2020

<atomic>: Optimize atomic_thread_fence #739

Closed

lock inc has smaller encoding

a09db16

BillyONeal approved these changes Apr 24, 2020

StephanTLavavej added the `performance` (Must go faster) label Apr 24, 2020

StephanTLavavej approved these changes Apr 24, 2020

stl/inc/atomic (outdated, resolved)

AlexGuteniev and others added 3 commits April 25, 2020 06:06

Update stl/inc/atomic

2eb0922

Co-Authored-By: Stephan T. Lavavej <stl@nuwen.net>

Shorten comment to avoid enforced bad newline

61cd41b

Merge branch 'optimize_atomic_thread_fence

44a7f31

StephanTLavavej self-assigned this Apr 29, 2020

StephanTLavavej approved these changes Apr 30, 2020

StephanTLavavej merged commit a5a9e49 into microsoft:master Apr 30, 2020

AlexGuteniev deleted the optimize_atomic_thread_fence branch April 30, 2020 10:44

AlexGuteniev mentioned this pull request Jun 9, 2020

Consider not using _mm_mfence even when it is available boostorg/atomic#36

Closed
