SIMD: Optimize the performance of einsum's submodule sum. #18012
const @temptype@ a01 = @from@(*data) + @from@(*(data + 1));
const @temptype@ a23 = @from@(*(data + 2)) + @from@(*(data + 3));
Suggested change:
- const @temptype@ a01 = @from@(*data) + @from@(*(data + 1));
- const @temptype@ a23 = @from@(*(data + 2)) + @from@(*(data + 3));
+ const @temptype@ a01 = @from@(*data) + @from@(data[1]);
+ const @temptype@ a23 = @from@(data[2]) + @from@(data[3]);
make it simpler?
accum += a01 + a23;
}
#endif // !NPY_DISABLE_OPTIMIZATION
for (; count > 0; --count, data += 1) {
Suggested change:
- for (; count > 0; --count, data += 1) {
+ for (; count > 0; --count, ++data) {
Well done, thank you!
Thanks @Qiyu8
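For readers without the full diff, the fragments quoted in the review comments above can be read as a four-way unrolled accumulation followed by a scalar tail loop. The following is only a minimal standalone sketch of that pattern, not the PR's code: the function name `sum_unrolled4` is hypothetical, and the `@from@`/`@temptype@` template placeholders are assumed here to be a cast to `double` and the type `double`, with `float` input.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the templated scalar path discussed in the
 * review. Assumptions: float input, double accumulator, and a plain cast
 * in place of the @from@ conversion macro.
 */
static double sum_unrolled4(const float *data, size_t count)
{
    double accum = 0.0;

    /* Main loop: consume four elements per iteration, adding them in pairs. */
    for (; count >= 4; count -= 4, data += 4) {
        const double a01 = (double)data[0] + (double)data[1];
        const double a23 = (double)data[2] + (double)data[3];
        accum += a01 + a23;
    }

    /* Tail loop: handle the remaining 0 to 3 elements one at a time. */
    for (; count > 0; --count, ++data) {
        accum += (double)*data;
    }

    return accum;
}
```

Calling `sum_unrolled4` on the seven values 1 through 7 runs the main loop once and the tail loop three times. The indexed form `data[1]` and the `++data` increment follow the two suggestions made in the review.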
Introduction
This is the sixth part of #17049. The `sum` operation is extracted because three sub-functions depend on it. The optimized code reduces the amount of code by 85%, and performance increases by 45%~50% on x86 and by about 14%~77% on ARM.
Benchmark
Here is the ASV benchmark result.
SSE2 enabled
AVX2 enabled
NEON enabled
System Info