Simplify and optimize the eg_value() extractor. #211
Speedups vary depending on gcc version but never slower.
i7 mobile, Windows7-64

gcc 4.7.4
Results for 20 tests for each version:
        Base      Test      Diff
Mean    1145602   1145932   -330
StDev   26686     24296     4889
p-value: 0.527

gcc 4.8.2
Results for 20 tests for each version:
        Base      Test      Diff
Mean    1118372   1136727   -18355
StDev   11992     14562     3831
p-value: 1

gcc 4.9.2
Results for 20 tests for each version:
        Base      Test      Diff
Mean    1089072   1091734   -2662
StDev   11955     12205     2827
p-value: 0.827

No functional change.
I just realized the & 0xFFFFU is not needed either. I will post new timings a bit later. |
maybe it works with gcc, but what makes you think it is portable? |
Thx for the patch, but this really opens a can of worms. If you look at On Wednesday, January 14, 2015, mstembera notifications@github.com wrote:
|
At one point we had different implementation for GCC and other compilers, but I wouldn't go back to that UNLESS the speed up is big, let's say >= 2%. |
The patch is not committable as it stands, so I'm closing the pull request. |
Joona, I'm sorry I wasn't able to respond right away but please at least read what I have to say below before dismissing this patch. I agree that this needs to be 100% portable if it is to be committed and NOT be a gcc specific implementation either. return Value((int16_t)s); which also "works" and is deceptively similar but didn't precisely because of portability. However this patch uint32_t a = uint32_t(s); // signed to unsigned (also used in current implementation) I believe each of the above steps is portable. Also there is no dependence on big or little endian as one of the past patches had. Could someone w/ the Intel compiler please give it a go? Also, if you can think of some other platform to test I'm all for it. In light of past portability issues extra caution and scrutiny IS warranted but I also hope that IF everything checks out we can commit because this code is on the hot path and the patch is both a simplification and a speedup. |
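The step-by-step cast sequence described in this comment can be sketched roughly as follows. This is only an illustration: the `Score` and `Value` typedefs, the `make_score_sketch` helper, and the assumed layout (middlegame term in the upper 16 bits, endgame term in the lower 16 bits) are stand-ins, not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the engine's types; assumed for this sketch only.
typedef int32_t Score;
typedef int32_t Value;

// Assumed packing: middlegame term in the upper 16 bits, endgame term in the
// lower 16 bits. Only meant for small illustrative values.
inline Score make_score_sketch(int mg, int eg) {
    return Score(mg * 0x10000 + eg);
}

inline Value eg_value_casts(Score s) {
    uint32_t a = uint32_t(s);  // signed to unsigned: defined, modulo 2^32
    uint16_t b = uint16_t(a);  // 32-bit to 16-bit unsigned: defined, keeps the low 16 bits
    int16_t  c = int16_t(b);   // unsigned to signed: implementation-defined before C++20 if b > 32767
    return Value(c);           // 16-bit to 32-bit signed: sign extension, value preserved
}

// Small self-check with arbitrary illustrative values.
inline void check_eg_value_casts() {
    assert(eg_value_casts(make_score_sketch(3, -5)) == -5);
    assert(eg_value_casts(make_score_sketch(-7, 100)) == 100);
}
```

Only the third cast is not fully pinned down by older C++ standards, which is the point the later comments in the thread focus on.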
Thanks for the response, I've reopened the pull request for further discussion. |
Thanks. In the meantime I've asked on the forum if anyone could try the Intel compiler. |
We should definitely add a comment here, as people would be tempted to simplify the sequence of 3 casts and break things in the process :). If it compiles on Intel, looks good! Thanks for testing out so many platforms. |
If we decide to go that way, then for consistency we should consider: inline Value mg_value(Score s) { |
This seems to be functionally equivalent to the original code when compiled in MSVC (number of nodes searched in the bench is unchanged). EDIT: In fact, the bench remains unchanged even if I remove all of the casts in the new code except the int16_t one. |
@joona |
@joona |
Hmm... and stylistically it's better to use "unsigned" than "uint32_t", as 32 bits means nothing here... inline Value mg_value(Score s) { inline Value eg_value(Score s) { The only question is whether this is standard compliant. |
Yes, that makes sense. The original implementation also casts signed and unsigned of the same size back and forth, so the two new steps are the 32-bit unsigned to 16-bit unsigned cast and the 16-bit signed to 32-bit signed sign extension. I can retest the platforms I already tested w/ the above tonight. |
I think that probably strictly according to the standard uint16_t -> int16_t conversion is implementation specific for "negative values". But then again, it's very likely that all compilers would behave sensibly here, so maybe we can just go ahead with this... |
See: http://en.cppreference.com/w/cpp/language/implicit_cast If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is implementation-defined. (Note that this is different from signed integer arithmetic overflow, which is undefined) |
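One way to avoid relying on the implementation-defined conversion quoted above is to recover the sign with arithmetic on values that always fit in `int`. A minimal sketch, assuming a 16-bit two's complement field; the helper name `to_signed16` is hypothetical and does not come from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Interpret the 16 bits in u as a two's complement number without ever
// converting an out-of-range unsigned value to a signed type.
inline int to_signed16(uint16_t u) {
    // Low 15 bits contribute 0..32767, the top bit contributes -32768 when set.
    return int(u & 0x7FFFu) - int(u & 0x8000u);
}

// Small self-check with arbitrary illustrative values.
inline void check_to_signed16() {
    assert(to_signed16(0xFFFBu) == -5);
    assert(to_signed16(0x8000u) == -32768);
}
```

Both operands of the subtraction are at most 32768, so every intermediate value is representable in `int` and no implementation-defined step is involved.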
To make sure I'm drawing the correct conclusion from your comment: we should be fine, correct? inline Value mg_value(Score s) { inline Value eg_value(Score s) { I think this is uglier though. Speed looks similar. I will post timings. |
Timings of Joonas patch against master: gcc 474 gcc 482 gcc 492 Union patch against master: gcc 482 gcc 492 |
I strongly suggest o
|
@mcostalba: Just for clarity: "undefined" != "implementation defined". |
@mstembera: union solution has endianness issues. |
@joona |
More info... typedef union { which was used to split the 4 byte word into two 2 byte chunks which ended up being order dependent based on endianness. The new union is only used to reinterpret the same 2 bytes safely from unsigned to signed w/o using a cast. I don't have any big-endian hardware to test. |
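The difference between the two uses can be illustrated with a small sketch. Reading a 32-bit word as two 16-bit chunks in memory depends on byte order, while taking the low 16 bits by value does not. The helper names below are hypothetical, and `std::memcpy` is used here instead of a union purely for the illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Index (0 or 1) of the 16-bit chunk that holds the low-order bits of a
// 32-bit word in memory: 0 on little-endian hosts, 1 on big-endian hosts.
inline int low_half_index() {
    const uint32_t one = 1;
    uint16_t halves[2];
    std::memcpy(halves, &one, sizeof one);
    return halves[0] == 1 ? 0 : 1;
}

// Order dependent: must pick the chunk according to the host byte order.
inline uint16_t low_half_via_memory(uint32_t word) {
    uint16_t halves[2];
    std::memcpy(halves, &word, sizeof word);
    return halves[low_half_index()];
}

// Order independent: the conversion works on the value, not on the layout.
inline uint16_t low_half_via_cast(uint32_t word) {
    return uint16_t(word);
}

// Small self-check with an arbitrary illustrative value.
inline void check_low_half() {
    assert(low_half_via_memory(0x12345678u) == low_half_via_cast(0x12345678u));
}
```

A union that only reinterprets the same two bytes as signed or unsigned falls in the second category, since both members cover exactly the same storage.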
@mstembera: OK. You are right. My bad, I misread the code. |
I think that we will have to go ahead with the uglier version. It's the only fully standard compliant version... Or can someone find a flaw? inline Value mg_value(Score s) { inline Value eg_value(Score s) { |
I'll do some benchmarks with different gcc versions as well... |
My speed up results: gcc-4.7 (1.5%) Results are statistically meaningful. I will commit the version below which I believe to be fully standard compliant tomorrow unless someone objects with valid arguments: inline Value mg_value(Score s) { inline Value eg_value(Score s) { |
Nice! Looks good. |
One possibility to mitigate the ugliness is to define the union once outside since we use it in both extractors like this. union us16 { uint16_t u; int16_t s; }; inline Value mg_value(Score s) { inline Value eg_value(Score s) { It's just a matter of taste though and I have no actual preference. Thanks for all your time Joona! |
I think us16 pollutes the global namespace, I'd suggest: union ValueExtractor { uint16_t u; int16_t s; }; Moreover, once we move to c++11 we could write (not tested): inline Value mg_value(Score s) { |
Alternatively, unnamed unions can also work: inline Value mg_value(Score s) { inline Value eg_value(Score s) { And perhaps it is the best solution in C++03 |
OK. Let's use unnamed unions. |
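For readers following along, an unnamed-union version might look roughly like the sketch below. It is a reconstruction under assumptions, not the committed code: the `Score` and `Value` typedefs, the `make_score_sketch` helper, the `_union` suffixes, and the layout (middlegame term in the upper 16 bits, endgame term in the lower 16 bits) are all stand-ins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the engine's types; assumed for this sketch only.
typedef int32_t Score;
typedef int32_t Value;

// Assumed packing, only meant for small illustrative values.
inline Score make_score_sketch(int mg, int eg) {
    return Score(mg * 0x10000 + eg);
}

inline Value eg_value_union(Score s) {
    // Store the low 16 bits as unsigned, read the same two bytes back as signed.
    union { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s)) };
    return Value(eg.s);
}

inline Value mg_value_union(Score s) {
    // Adding 0x8000 undoes the borrow a negative endgame term takes from
    // the upper half before the upper 16 bits are extracted.
    union { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
    return Value(mg.s);
}

// Small self-check with arbitrary illustrative values.
inline void check_union_extractors() {
    assert(mg_value_union(make_score_sketch(-7, 100)) == -7);
    assert(eg_value_union(make_score_sketch(3, -5)) == -5);
}
```

The sketch reads a union member other than the one last written. GCC documents this form of type punning as supported; whether the C++ standard itself guarantees it is worth checking separately.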
I like this also. Thanks Marco. |
Here is an interesting way to write the mg_value() extractor using an endian-dependent union but avoiding the issue by using another endian-dependent union to do the indexing.
const union { uint32_t u32; struct { uint16_t high, low; }; } endianIdx = { 1 };
inline Value mg_value(Score s) {
  union { int32_t s32; int16_t s16[2]; } mg = { s + 0x8000 };
  return Value(mg.s16[endianIdx.high]);
}
It's not any faster but may be useful in dealing with other endian issues in the future. |