Skip to content
This repository has been archived by the owner. It is now read-only.

std::minmax_element is 3 times slower than hand written loop #40533

Closed
apolukhin mannequin opened this issue Jan 30, 2019 · 5 comments
Closed

std::minmax_element is 3 times slower than hand written loop #40533

apolukhin mannequin opened this issue Jan 30, 2019 · 5 comments

Comments

@apolukhin
Copy link
Mannequin

@apolukhin apolukhin mannequin commented Jan 30, 2019

Bugzilla Link 40533
Resolution INVALID
Resolved on Feb 07, 2019 10:10
Version unspecified
OS Linux
CC @apolukhin,@mclow

Extended Description

Moreover, std::minmax_element is slower than calling std::min_element+std::max_element if there's a lot of data and it does not fit into the CPU cache: http://quick-bench.com/tlgxCx9CUMZgOfYbwhFaEI0WNOg

@apolukhin
Copy link
Mannequin Author

@apolukhin apolukhin mannequin commented Jan 30, 2019

assigned to @mclow

Loading

@mclow
Copy link

@mclow mclow commented Jan 30, 2019

When I visited that Quickbench link, I noticed that it was set to GCC and to libstdc++.

When I changed it to clang and libc++, the standard library implementation was faster than the hand-written version.

Did you mean to report this against libstdc++?

Loading

@apolukhin
Copy link
Mannequin Author

@apolukhin apolukhin mannequin commented Jan 30, 2019

Sorry, wrong link. Here's the right one http://quick-bench.com/RQrBIzB8sBYS932z90b7DTEyEks

There's a good comment from Marc Glisse with description of close behaviour in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120#c1

However, it looks like there's still a room for improvement. If std::minmax_element is called with standard comparators and for fundamental type, then using a simple loop (with more comparisons) will improve performance (for CPUs that have branch predictors and non costly comparison instructions for builtins).

Although I'm not sure that this could/should be solved on the library level - too many CPU specific knowledge in libc++ does not seem right.

Loading

@mclow
Copy link

@mclow mclow commented Jan 30, 2019

Sorry, wrong link. Here's the right one
http://quick-bench.com/RQrBIzB8sBYS932z90b7DTEyEks

That's better; but you are testing different things.
Your "hand written code" returns (in pointers) two values.
The "standard one" returns a pair of iterators.

Also, despite being faster, you're doing more comparisons than the one in the standard:
You: 2N comparisons.
Std: 3N/2 comparisons.

There's a good comment from Marc Glisse with description of close behaviour
in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120#c1

And now I see I've made the same points that he has.

Loading

@mclow
Copy link

@mclow mclow commented Feb 7, 2019

Closing as "Not a bug"

Loading

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant