
Build new AVX file with /ARCH:AVX #4442

Merged: tracysh merged 2 commits into master from tracysh/fix_avx_build on Jul 7, 2020.


tracysh (Contributor) commented on Jul 7, 2020:

Description: In the Windows build, compile min_max_elements.cpp with /ARCH:AVX so that the compiler emits the vzeroupper instruction at the proper location. With the existing code, vzeroupper executes before another pair of instructions that use ymm registers, which leaves the upper halves of the ymm registers dirty. Subsequent legacy (non-VEX) SSE instructions then run with poor performance because of the AVX-SSE transition penalty. This costs several milliseconds per inference for a quantized BERT model.
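As an illustration, here is a minimal C++ sketch of the kind of min/max reduction such a file might contain. It is not the actual onnxruntime code, and the function name MinMaxElements is hypothetical. The sketch assumes the translation unit is compiled with /arch:AVX (MSVC) or -mavx (GCC/Clang), both of which define __AVX__. In that case all vector code in the file is VEX-encoded, and the code relies on the compiler to place vzeroupper instead of calling _mm256_zeroupper() by hand. Without the flag, it falls back to a plain scalar loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the onnxruntime/MLAS API): returns {min, max}
// of a non-empty float buffer of length n.
// Assumption: this file is built with /arch:AVX (MSVC) or -mavx (GCC/Clang),
// so __AVX__ is defined, all SIMD code here is VEX-encoded, and the compiler
// inserts vzeroupper before returning to code that may run legacy SSE.
std::pair<float, float> MinMaxElements(const float* input, std::size_t n) {
    float lo = input[0];
    float hi = input[0];
    std::size_t i = 0;
#if defined(__AVX__)
    if (n >= 8) {
        // Seed both accumulators with the first 8 floats (one ymm register).
        __m256 vmin = _mm256_loadu_ps(input);
        __m256 vmax = vmin;
        for (i = 8; i + 8 <= n; i += 8) {
            __m256 v = _mm256_loadu_ps(input + i);
            vmin = _mm256_min_ps(vmin, v);
            vmax = _mm256_max_ps(vmax, v);
        }
        // Spill the eight lanes and finish the reduction with scalar code.
        alignas(32) float mins[8];
        alignas(32) float maxs[8];
        _mm256_store_ps(mins, vmin);
        _mm256_store_ps(maxs, vmax);
        lo = *std::min_element(mins, mins + 8);
        hi = *std::max_element(maxs, maxs + 8);
        // No explicit _mm256_zeroupper() here: with AVX enabled for the
        // whole file, vzeroupper placement is left to the compiler.
    }
#endif
    // Scalar tail, or the whole buffer when AVX is not enabled.
    for (; i < n; ++i) {
        lo = std::min(lo, input[i]);
        hi = std::max(hi, input[i]);
    }
    return {lo, hi};
}
```

For an 11-element buffer, the AVX path reduces the first 8 floats in ymm registers and handles the remaining 3 in the scalar tail. Keeping the flag on the whole file, rather than mixing AVX intrinsics into an SSE-compiled file, avoids interleaving VEX-256 and legacy SSE encodings in the same code.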

tracysh requested a review from yufenglee on July 7, 2020, 06:29.
tracysh requested a review from a team as a code owner on July 7, 2020, 06:29.
yufenglee previously approved these changes on Jul 7, 2020.