Simplify and optimize the eg_value() extractor. #211

mstembera · 2015-01-14T02:06:16Z

Speedups vary depending on gcc version but never slower.

i7 mobile, Windows7-64

gcc 4.7.4
Results for 20 tests for each version:

        Base      Test      Diff
Mean    1145602   1145932   -330
StDev   26686     24296     4889

p-value: 0.527

gcc 4.8.2
Results for 20 tests for each version:

        Base      Test      Diff
Mean    1118372   1136727   -18355
StDev   11992     14562     3831

p-value: 1

gcc 4.9.2
Results for 20 tests for each version:

        Base      Test      Diff
Mean    1089072   1091734   -2662
StDev   11955     12205     2827

p-value: 0.827

No functional change.

Speedups vary depending on gcc version but never slower. i7 mobile, Windows7-64 gcc 4.7.4 Results for 20 tests for each version: Base Test Diff Mean 1145602 1145932 -330 StDev 26686 24296 4889 p-value: 0.527 gcc 4.8.2 Results for 20 tests for each version: Base Test Diff Mean 1118372 1136727 -18355 StDev 11992 14562 3831 p-value: 1 gcc 4.9.2 Results for 20 tests for each version: Base Test Diff Mean 1089072 1091734 -2662 StDev 11955 12205 2827 p-value: 0.827 No functional change.

mstembera · 2015-01-14T03:17:43Z

I just realized the & 0xFFFFU is not needed either. I will post new timings a bit later.

lucasart · 2015-01-14T05:58:59Z

maybe it works with gcc, but what makes you think it is portable?

mcostalba · 2015-01-14T05:59:04Z

Thx for the patch, but this really opens a cans of worms. If you look at
past history of that code (using blame with GitHub) you will see a lot of
compatibility issues with MSVC and iirc Intel compiler.

On Wednesday, January 14, 2015, mstembera notifications@github.com wrote:

I just realized the & 0xFFFFU is not needed either. I will post new
timings a bit later.

—
Reply to this email directly or view it on GitHub
#211 (comment)
.

zamar · 2015-01-14T07:52:12Z

At one point we had different implementation for GCC and other compilers, but I wouldn't go back to that UNLESS the speed up is big, let's say >= 2%.

zamar · 2015-01-14T20:24:34Z

The patch is not committable as it stands, so I'm closing the pull request.

mstembera · 2015-01-14T21:51:14Z

Joona, I'm sorry I wasn't able to respond right away but please at least read what I have to say below before dismissing this patch.
BTW, if anyone finds even one platform where this doesn't work I will immediately concede.

I agree that this needs to be 100% portable if it is to be committed and NOT be a gcc specific implementation either.
I wanted to submit an even simpler patch a long time ago

return Value((int16_t)s);

which also "works" and is deceptively similar but didn't precisely because of portability.
The problem with the above patch is that casting a signed value to a smaller signed value is implementation defined.

However this patch
return Value(int16_t(uint16_t(uint32_t(s))));
never does this. It can be equivalently rewritten as

uint32_t a = uint32_t(s); // signed to unsigned (also used in current implementation)
uint16_t b = uint16_t(a); // keep only the low 16 unsigned bits
int16_t c = int16_t(b); // reinterpret 16 unsigned as 16 signed
Value v = Value(c); // sign extend from 16 to 32 bits
return v;

I believe each of the above steps is portable. Also there is no dependence on big or little endian as one of the past patches had.
More to the point I tested:
AMD PhenomII w/ Ubuntu(14.04) w/ both gcc(4.8.2) and clang(3.4) both 64 and 32 bit compiles.
Intel i7 Windows7-64 gcc(4.7.4, 4.8.2, 4.9.2) and MSVC(2013) both 64 and 32 bit compiles.

Could someone w/ the Intel compiler please give it a go? Also, if you can think of some other platform to test I'm all for it. In light of past portability issues extra caution and scrutiny IS warranted but I also hope that IF everything checks out we can commit because this code is on the hot path and the patch is both a simplification and a speedup.

zamar · 2015-01-14T22:03:12Z

Thanks for the response, I've reopened the rpull equest for further discussion.

mstembera · 2015-01-14T22:16:43Z

Thanks. In the mean time I've asked on the forum if anyone could try the Intel compiler.
https://groups.google.com/forum/#!topic/fishcooking/VCC1TFca78s

glinscott · 2015-01-14T22:27:34Z

We should definitely add a comment here, as people would be tempted to simplify the sequence of 3 casts and break things in the process :).

If it compiles on Intel, looks good! Thanks for testing out so many platforms.

zamar · 2015-01-14T22:32:27Z

If we decide to go that way, then for consistency we should consider:

inline Value mg_value(Score s) {
return Value(int16_t(uint16_t(uint32_t(s + 0x8000) >> 16)));
}

DU-jdto · 2015-01-14T22:40:03Z

This seems to be functionally equivalent to the original code when compiled in MSVC (number of nodes searched in the bench is unchanged).

EDIT: In fact, the bench remains unchanged even if I remove all of the casts in the new code except the int16_t one.

mstembera · 2015-01-14T22:42:12Z

@joona
Wow that was fast :) I did test
return Value(int16_t(uint16_t(uint32_t(s + (uint32_t(s) & 0x8000U)) >> 16)));
(almost identical to yours) yesterday but didn't include it in the submission because it wasn't faster
(maybe a tiny bit slower but not statistically significant) but from a consistency point of view I completely agree.

mstembera · 2015-01-14T22:56:49Z

@joona
Sorry, your version is simpler and faster. I will post new timings w/ it shortly.

zamar · 2015-01-14T23:39:56Z

Hmm... and stylistically it's better use "unsigned" than "uint32_t" as 32-bits means nothing here...
So perhaps my current favourite is:

inline Value mg_value(Score s) {
return Value(int16_t(uint16_t(unsigned(s + 0x8000) >> 16)));
}

inline Value eg_value(Score s) {
return Value(int16_t(uint16_t(unsigned(s))));
}

The only question is that if this is standard compliant?

mstembera · 2015-01-14T23:58:55Z

Yes that makes sense. The original implementation also casts signed and unsigned of the same size back and forth so the two steps that are new is the 32bit unsigned to 16bit unsigned cast and the 16bit signed to 32bit signed sign extend. I can retest the platforms I already tested w/ the above tonight.

zamar · 2015-01-15T00:07:49Z

I think that probably strictly according to the standard uint16_t -> int16_t conversion is implementation specific for "negative values".

But then again, it's very likely that all compilers would behave sensibly here, so maybe we can just go ahead with this...

zamar · 2015-01-15T00:18:25Z

See: http://en.cppreference.com/w/cpp/language/implicit_cast

If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is implementation-defined. (Note that this is different from signed integer arithmetic overflow, which is undefined)

mstembera · 2015-01-15T05:28:43Z

To make sure I'm drawing the correct conclusion from your comment. We should be fine correct?
If it turns out to be a show stopper anyway we can still avoid it like this

inline Value mg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
return Value(mg.s);
}

inline Value eg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s)) };
return Value(eg.s);
}

I think this is uglier though. Speed looks similar. I will post timings.

mstembera · 2015-01-15T05:32:26Z

Timings of Joonas patch against master:

gcc 474
Results for 20 tests for each version:
Base Test Diff
Mean 1114071 1116287 -2216
StDev 25946 26214 3818
p-value: 0.719

gcc 482
Results for 20 tests for each version:
Base Test Diff
Mean 1103317 1123009 -19692
StDev 15890 15481 4077
p-value: 1

gcc 492
Results for 20 tests for each version:
Base Test Diff
Mean 1038425 1042409 -3984
StDev 32922 33162 4728
p-value: 0.8

Union patch against master:
gcc 474
Results for 20 tests for each version:
Base Test Diff
Mean 1179923 1181505 -1582
StDev 15138 15041 2196
p-value: 0.764

gcc 482
Results for 20 tests for each version:
Base Test Diff
Mean 1104881 1124275 -19394
StDev 20022 21807 3472
p-value: 1

gcc 492
Results for 20 tests for each version:
Base Test Diff
Mean 1128140 1130533 -2393
StDev 23758 24802 2220
p-value: 0.859

mcostalba · 2015-01-15T07:17:33Z

I strongly suggest
to avoid any known undefined behavior by standard. This is the
only sane approach.

o
On Thursday, January 15, 2015, Joona Kiiski notifications@github.com
wrote:

I think that probably strictly according to the standard uint16_t ->
int16_t conversion is implementation specific for "negative values".

But then again, it's very likely that all compilers would behave sensibly
here, so maybe we can just go ahead with this...

—
Reply to this email directly or view it on GitHub
#211 (comment)
.

zamar · 2015-01-15T08:35:15Z

@mcostalba: Just for clarity: "undefined" != "implementation defined".

zamar · 2015-01-15T08:35:55Z

@mstembera: union solution has endianess issues.

mstembera · 2015-01-15T08:42:32Z

@joona
This union should not have endianess issues because the union only has 16 bit members and not a mix of 16 and 32 bit like the past solution.

mstembera · 2015-01-15T09:05:56Z

More info...
The previous union looked like

typedef union {
uint32_t full;
struct { int16_t eg, mg; } half;
} ScoreView;

which was used to split the 4 byte word into two 2 byte chunks which ended up being order dependent based on endianess. The new union is only used to reinterpret the same 2 bytes safely from unsigned to signed w/o using a cast. I don't have any big-endian hardware to test.

zamar · 2015-01-15T13:20:24Z

@mstembera: OK. You are right. My bad, I misread the code.

zamar · 2015-01-15T19:40:44Z

I think that we will have to go ahead with the uglier version. It's the only fully standard compliant version... Or can someone find a flaw?

inline Value mg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
return Value(mg.s);
}

inline Value eg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s)) };
return Value(eg.s);
}

zamar · 2015-01-15T19:52:00Z

I'll do some benchmarks with different gcc versions as well...

zamar · 2015-01-15T20:20:15Z

My speed up results:

gcc-4.7 (1.5%)
gcc-4.8 (0.5%)
gcc-4.9 (1.0%)

Results are statistically meaningful.

I will commit the version below which I believe to be fully standard compliant tomorrow unless someone objects with valid arguments:

inline Value mg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
return Value(mg.s);
}

inline Value eg_value(Score s) {
const union us16 { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s)) };
return Value(eg.s);
}

glinscott · 2015-01-15T21:13:06Z

Nice! Looks good.

mstembera · 2015-01-16T00:31:07Z

One possibility to mitigate the ugliness is to define the union once outside since we use it in both extractors like this.

union us16 { uint16_t u; int16_t s; };

inline Value mg_value(Score s) {
const us16 mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
return Value(mg.s);
}

inline Value eg_value(Score s) {
const us16 eg = { uint16_t(unsigned(s)) };
return Value(eg.s);
}

It's just a matter of taste though and I have no actual preference. Thanks for all your time Joona!

…l-stockfish#211 official-stockfish#211

mcostalba · 2015-01-16T08:17:42Z

I think us16 pollutes the global namespace, I'd suggest:

union ValueExtractor { uint16_t u; int16_t s; };

Moreover, once we move to c++11 we could write (not tested):

inline Value mg_value(Score s) {
return ValueExtractor{ uint16_t(unsigned(s + 0x8000) >> 16) }.s;
}

mcostalba · 2015-01-16T08:32:35Z

Alternatively also unnamed unions can work:

inline Value mg_value(Score s) {
union { uint16_t u; int16_t s; } mg = { uint16_t(unsigned(s + 0x8000) >> 16) };
return Value(mg.s);
}

inline Value eg_value(Score s) {
union { uint16_t u; int16_t s; } eg = { uint16_t(unsigned(s)) };
return Value(eg.s);
}

And perhaps is the best solution in C++03

zamar · 2015-01-16T11:43:17Z

OK. Let's use unnamed unions.

mstembera · 2015-01-16T13:54:33Z

I like this also. Thanks Marco.

mstembera · 2016-05-01T21:42:46Z

Here is an interesting way to write the mg_value() extractor using an Endian dependent union but avoiding the issue by using another Endian dependent union to do the indexing.

const union { uint32_t u32; struct { uint16_t high, low; }; } endianIdx = { 1 };

inline Value mg_value(Score s) {
    union { int32_t s32; int16_t s16[2]; } mg = { s + 0x8000 };
    return Value(mg.s16[endianIdx.high]);
}

It's not any faster but may be useful in dealing with other Endian issues in the future.

Tweak crazyhouse SEE

mstembera added 2 commits January 13, 2015 18:01

The & 0xFFFFU is not even necessary.

766bdad

zamar closed this Jan 14, 2015

zamar reopened this Jan 14, 2015

mstembera added 2 commits January 14, 2015 18:09

Merge branch 'master' into EGextractor

96d1eb0

Better version by Joona. No functional change.

d262539

Updating to latest candidate. See discussion for pull reguest officia…

a904c3b

…l-stockfish#211 official-stockfish#211

Unnamed union version by Marco

301ee3b

zamar closed this in 58fdb84 Jan 16, 2015

syzygy1 mentioned this pull request Oct 30, 2016

Simplify and improve portability of Score extraction. #849

Closed

niklasf pushed a commit to niklasf/Stockfish that referenced this pull request Jan 19, 2017

Merge pull request official-stockfish#211 from ianfab/crazyhouse-see

7c78b9c

Tweak crazyhouse SEE

syzygy1 mentioned this pull request Aug 17, 2020

Reformat code in little-endian patch #3016

Merged

Simplify and optimize the eg_value() extractor. #211

Simplify and optimize the eg_value() extractor. #211

Conversation

mstembera commented Jan 14, 2015

mstembera commented Jan 14, 2015

lucasart commented Jan 14, 2015

mcostalba commented Jan 14, 2015

zamar commented Jan 14, 2015

zamar commented Jan 14, 2015

mstembera commented Jan 14, 2015

zamar commented Jan 14, 2015

mstembera commented Jan 14, 2015

glinscott commented Jan 14, 2015

zamar commented Jan 14, 2015

DU-jdto commented Jan 14, 2015

mstembera commented Jan 14, 2015

mstembera commented Jan 14, 2015

zamar commented Jan 14, 2015

mstembera commented Jan 14, 2015

zamar commented Jan 15, 2015

zamar commented Jan 15, 2015

mstembera commented Jan 15, 2015

mstembera commented Jan 15, 2015

mcostalba commented Jan 15, 2015

zamar commented Jan 15, 2015

zamar commented Jan 15, 2015

mstembera commented Jan 15, 2015

mstembera commented Jan 15, 2015

zamar commented Jan 15, 2015

zamar commented Jan 15, 2015

zamar commented Jan 15, 2015

zamar commented Jan 15, 2015

glinscott commented Jan 15, 2015

mstembera commented Jan 16, 2015

mcostalba commented Jan 16, 2015

mcostalba commented Jan 16, 2015

zamar commented Jan 16, 2015

mstembera commented Jan 16, 2015

mstembera commented May 1, 2016 • edited

mstembera commented May 1, 2016 •

edited