
use int16_t for histories #1113

Closed

Conversation

vondele
Member

@vondele vondele commented May 11, 2017

Saves over 2 MB of memory, as in particular CounterMoveHistoryStats is a large object. Improved cache utilization yields a small speedup (0.7% locally).

The patch is non-functional as int16_t is sufficiently large to capture all values that can be reached with the current update scheme.
The accessible range of values is [-32 * D, 32 * D] as a closer analysis of the update formula reveals.
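The [-32 * D, 32 * D] bound follows from a fixed-point analysis of the update rule. The snippet below is an illustrative reconstruction of the scheme from that era, not verbatim Stockfish code; the constant D and the exact form are assumptions (the real values live in the Stockfish source):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Illustrative reconstruction (not verbatim Stockfish code) of the history
// update scheme:  v += 32 * bonus - v * |bonus| / D,  with |bonus| <= D.
// Fixed-point analysis: |v'| <= |v| * (1 - |bonus| / D) + 32 * |bonus|
//                            <= 32 * D * (1 - |bonus| / D) + 32 * |bonus|
//                             = 32 * D,
// so values stay in [-32 * D, 32 * D], which fits in int16_t when 32 * D <= 32767.
constexpr int D = 936;            // illustrative divisor; 32 * D = 29952 < 32767
static_assert(32 * D <= INT16_MAX, "history values must fit in int16_t");

inline void update_history(int16_t& v, int bonus)   // assumes |bonus| <= D
{
    v += 32 * bonus - v * std::abs(bonus) / D;      // arithmetic is done in int
}
```

Note that the storage is int16_t but the arithmetic is still done in plain int after integer promotion, which is the point made later in this thread.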

Two functionally equivalent, but less clean, versions of this patch have been tested for no-regression:

http://tests.stockfishchess.org/tests/view/591353890ebc59035df344e4
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 54476 W: 9949 L: 9886 D: 34641

http://tests.stockfishchess.org/tests/view/59140f6b0ebc59035df3451a
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 27019 W: 4905 L: 4794 D: 17320

No functional change.

@mstembera
Contributor

mstembera commented May 11, 2017

I have verified the bench stays the same (1779683533) up to depth 28. I also get close to a 1% speedup.

Results for 20 tests for each version:

            Base      Test      Diff      
    Mean    913511    921760    -8249     
    StDev   271808    274072    2847      

p-value: 0.998
speedup: 0.009
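The probability figure in such paired bench comparisons can be reproduced with a simple normal approximation on the per-run differences. The sketch below is my own reconstruction of that kind of statistic, not the actual psybench/FishBench script:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reconstruction (an assumption, not the actual bench script): given paired
// nps samples from base and test runs, estimate the probability that the
// true speedup is positive via a normal approximation on the mean difference.
double prob_speedup_positive(const std::vector<double>& base,
                             const std::vector<double>& test)
{
    const std::size_t n = base.size();
    double mean = 0.0, var = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        mean += test[i] - base[i];
    mean /= n;
    for (std::size_t i = 0; i < n; ++i) {
        double d = test[i] - base[i] - mean;
        var += d * d;
    }
    var /= n - 1;                            // sample variance of the differences
    double se = std::sqrt(var / n);          // standard error of the mean diff
    double z  = mean / se;
    return 0.5 * std::erfc(-z / std::sqrt(2.0));   // Phi(z), the normal CDF
}
```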

@MichaelB7
Contributor

MichaelB7 commented May 11, 2017

Nice catch @vondele - 1% speed ups are pretty awesome - will test when I get home but pretty sure I will see it too.

Edit: 0.8% gain on macOS with PGO builds - consistent with various bench levels from depth 16 to depth 24

@lucasart

lucasart commented May 12, 2017

@vondele: Regarding the speedup coming from the improved cache effect, I think this will depend on the hash size. I would expect the speedup to be more measurable with a minimal hash size. The stockfish bench defaults to hash=16, but with hash=1 it should be more measurable. And in real games with a large hash, the effect should be null (as you constantly have cache misses due to random lookups on a large hash table, which in turn pollutes the cache for other things). But what is nice here is that we reduce the memory footprint of stockfish. Let's not forget that stockfish has been successfully run on some exotic hardware with limited resources, such as MIPS routers.

@mcostalba

mcostalba commented May 12, 2017

@vondele I really don't follow the logic of a patch that is NOT a simplification in any way (indeed, it is sold as an optimization) being tested with [-3, 1].

I think this should be at least [0, 4].

Moving from int to a specific int size should be backed by some sound reason; memory alone is not one, because it is only an indirect (and very weak) hint at a possible speed-up.

This is a well-known historical myth of pre-testing-era developers, who tried to use the smallest integer that fits the bill, thinking it would make the engine faster. This is wrong in many ways. It is another case of voodoo programming that still persists in the chess engine world today (possibly because chess engine programmers are hobbyists for the most part).

See this for some longer explanation:

Typically, CPUs are fastest at operating on integers of their native word size

A CPU works more efficiently when the data width equals the native CPU register width

Most of the time the space cost is negligible and you shouldn't worry about it

@vondele
Member Author

vondele commented May 12, 2017

@mcostalba in that test, should test and master use the same resources in terms of memory?

The increased memory efficiency of the test branch could be used to increase the TT and have processes compete at equal process memory footprint.

Furthermore, the savings are 2 MB per thread, so the test would best be done in a multithreaded run. In that case (since L3 is typically shared between two or more cores), the speedup can be expected to be somewhat larger.

Obviously, the speedup is not due to doing arithmetic on smaller integer sizes, as the arithmetic is all done on default int.

Edit: at 32 threads (and without TT), the resident process size of master is 50% more than that of the test branch (210 MB vs. 144 MB)

@mcostalba

@vondele the test should be a usual standard test: if less memory usage speeds up the engine with your patch, this will be shown; if less memory usage has no effect, this will be shown too.

We are not testing whether memory is saved (we already know it); we are testing whether this yields some improvement or not.

You never set the TT to the maximum available RAM in real games; you always have some free RAM left, indeed usually hundreds of MB if not GB of still-free RAM, so your argument is moot.

@vondele
Member Author

vondele commented May 12, 2017

So, not too surprising for a memory reduction patch, failed yellow at [0,4]:

LLR: -2.94 (-2.94,2.94) [0.00,4.00]
Total: 62608 W: 11312 L: 11255 D: 40041

Does anybody have an accurate estimate of what % speed-up is needed to pass [0,4] at STC?

@VoyagerOne
Contributor

@vondele
I think the rule of thumb is that a 1% speedup is worth about 1 Elo.
So I would say you need a speedup > 2% to have a good chance of passing [0, 4].

I also agree with @mcostalba that non-simplification patches should never be tested and accepted with [-3,1] bounds.

A patch may cure cancer... but it won't and shouldn't be accepted if it doesn't prove an elo gain.

@anshulongithub

If a patch is found to be Elo neutral but shows benefits that are evident in terms of speed, memory footprint, or other parameters, then why shouldn't it be accepted?

@VoyagerOne
Contributor

@anshulongithub
Because of what you just said... the patch is ELO neutral.

If the patch is accepted...we can no longer tweak stat values past 16 bits. Right now the max stat value can barely fit into 16 bits. So tweaking the stats a bit higher will cause an overflow.

Now what if down the road we found a significant ELO gain by tweaking max stat value > 2^19?

Every time I would like to submit a patch with stats over 16 bits... I would have to convert all the int16s back to regular int. This would demotivate me from doing any patches for higher stat values.

I am not a fan of patches that make it more difficult to tweak and test ideas...

@ddugovic

@anshulongithub Any non-simplification patch is more complex. Guidelines exist to preserve order.

@Thanar2

Thanar2 commented May 12, 2017

This is a non-functional speed optimization so according to the guidelines in the relevant section of https://github.com/glinscott/fishtest/wiki/Creating-my-first-test :

Benchmark it with psybench or FishBench, then send a pull request. The speed-up is then verified to ensure:

  • The speedup needs to be statistically significant and not just random noise.
  • The speedup needs to be confirmed on different machines. Sometimes a speedup on one machine is a slowdown on another.
  • The speedup needs to be put in perspective with the code changes.
    If the code changes are invasive and/or ugly, only to achieve a small speedup, we will probably not accept the patch. This is subject to our appreciation. If the above steps are not enough to clarify the opportunity to commit the patch, then the patch will go under a special fishtest SPRT[0, 5] test at STC.

There seems to be some evidence of a speed up. Is it confirmed on different machines? The patch doesn't seem invasive and/or ugly to me.

@Thanar2

Thanar2 commented May 12, 2017

Regarding VoyagerOne's comment about patches that make it "more difficult to tweak and test ideas":

I've been working on sorting tweaks that depend on the stats fitting in an int16 (since it seemed to me that they could, which this patch demonstrates). So while this patch may make some tweaks more difficult, it makes other tweaks possible.

@mstembera
Contributor

This patch is not a simplification, but it is non-functional and doesn't add complexity either. Reducing memory usage by 2 MB per thread is a benefit by itself (probably even more so on platforms w/ many cores and small caches, like 8-core cell phones). The speedup is measurable w/ statistical significance, and the yellow fishtest result is in line with it. IMO this patch has upsides w/o downsides, is therefore an improvement to the codebase, and should be committed.

@VoyagerOne I'm not for making things more difficult to tweak, but keeping extra baggage around w/ just the hope it may be useful someday is a bad policy IMO. Also, IF in the future some possible tweak required going over 2^16, wouldn't it be better to just rescale everything down to keep it more efficient at 16 bits anyway?

@mcostalba

I don't believe in measuring speed-ups with benches: this is totally unreliable as long as the speed-up is not very clear (and in the latter case it will easily pass [0, 4]), and it is heavily machine/compiler dependent.

IMO [0, 4] is very sensible and can spot even 1-1.5% speed-ups. To verify this (which alone would be very interesting), maybe we can think of a patch that slows down the engine by a tunable amount and check down to which slowdown SPRT[0, 4] is able to discriminate between the two versions.

Regarding this patch I agree with Voyager: it is not a simplification, it is NOT proved to be a speed-up on average (indeed it failed [0, 4], and I am quite confident that 1% speed-ups are caught by fishtest), and it makes the code worse than before; this is the main point for me.

@vondele
Member Author

vondele commented May 13, 2017

I'll be reporting some benchmarking on 32 threads (quad socket) and 64 threads (xeon phi) later today.

I disagree with @mcostalba's belief that fishtest would be more reliable than benchmarks in providing evidence for speedups in the ~1% range.

@mcostalba

@vondele this should be tested of course, but I am confident SPRT[0, 4] is able to spot a 1% speed-up, and on average over a broader set of machines and OSes, not only in the tester's setup.

@vondele
Member Author

vondele commented May 13, 2017

@mcostalba testing this is indeed a good idea. I'll work on getting a tunable slowdown into the code, and do these tests in the framework.

@mcostalba

mcostalba commented May 13, 2017

@vondele already done, see mcostalba/Stockfish@slowdown

and here is the test, just 5% as a first run to verify everything is ok.
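For reference, here is one way such a tunable slowdown could be implemented. This is my own sketch of the approach, not the actual code in mcostalba/Stockfish@slowdown; the struct name and accounting are assumptions (chosen to match the "Search time" / "TotalDelay" figures reported later in this thread):

```cpp
#include <cassert>
#include <chrono>

// Sketch (an assumption, not the actual slowdown branch): at every periodic
// time check, busy-wait for `pct` percent of the search time elapsed since
// the last check, and track the TotalDelay / Search-time accounting.
struct Slowdown {
    int pct = 0;                   // tunable slowdown in percent
    long long search_us = 0;       // pure search time accumulated
    long long delay_us  = 0;       // artificial delay accumulated

    void on_time_check(long long elapsed_us) {
        search_us += elapsed_us;
        long long d = elapsed_us * pct / 100;
        auto start = std::chrono::steady_clock::now();
        while (std::chrono::duration_cast<std::chrono::microseconds>(
                   std::chrono::steady_clock::now() - start).count() < d)
            ;                      // busy-wait to burn d microseconds
        delay_us += d;
    }
};
```

A busy-wait (rather than a sleep) keeps the core occupied, so the effect resembles a genuinely slower engine rather than one that yields the CPU.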

@vondele
Member Author

vondele commented May 13, 2017

I don't think the test in fishtest is right. Obviously a 5% slowdown will fail to pass [0,4]. You might want to swap master and test, or adjust the bounds?

@mcostalba

@vondele this is just a first check to see if the code has bugs; after this I will lower the slowdown until it is no longer detectable by [0, 4]

@vondele
Member Author

vondele commented May 13, 2017

To detect a slowdown you still need to swap master and test, or use different bounds. All (positive) values of slowdown should fail a [0,4] test.

@mcostalba

mcostalba commented May 13, 2017

@vondele you are right. Test stopped and re-submitted with branches swapped.

@vondele
Member Author

vondele commented May 13, 2017

@mcostalba I realised your branch is also a good way to answer the other question, namely the reliability of benchmarking. This is the estimated speedup based on '40 test + 40 master ./stockfish bench' invocations, i.e. my standard script, which takes 5 min on 1 core of a low-end home PC:

                        test#1  test#2
master      vs master :  0.163,  0.229
slowdown-1% vs master : -0.847, -1.043
slowdown-2% vs master : -2.076, -2.008
slowdown-5% vs master : -4.724, -4.693

so all within a 0.3% error based on only 40 runs.

I have also added the 1% slowdown run to the fishtest framework.

@vondele
Member Author

vondele commented May 13, 2017

Finally, benchmark results for the quad socket and the Xeon Phi (both using 32 threads). To improve the statistics (needed because of the non-determinism of multithreaded runs), I ran a depth 20 bench and repeated it a few hundred times. Results are consistent with the other benchmarks posted.

bench 256 32 20 default depth
Xeon Phi    (500 + 500 runs) : 0.777 %
Quad socket (400 + 400 runs) : 0.561 %

@mcostalba

mcostalba commented May 13, 2017

Ok, it seems testing with [0, 4] at STC is a reliable way to measure speed-ups of 1% and above:

http://tests.stockfishchess.org/tests/view/591701360ebc59035df345e0
http://tests.stockfishchess.org/tests/view/5917014a0ebc59035df345e2
http://tests.stockfishchess.org/tests/view/5916e0b00ebc59035df345cc

Maybe we can even go lower than 1%, but IMO it is not necessary: if a speed-up patch adds complexity and/or makes the code worse and more fragile than the original (as in this case), I think we need at least a 1% speed-up for it to be considered worth it.

If the patch is very invasive / ugly, then even 1% may not be enough, but that is not the case here: this patch is not extremely ugly, just mildly so.

MY SUMMARY:

This patch narrows the type from int to int16_t, which has the following drawbacks:

  • Limiting history values to 16 bits will cause an overflow if the stats are tweaked a bit higher (and it is subtle enough!)

  • Introducing another numerical type is not considered good programming practice and should be avoided if not really necessary: it is better to always use int whenever possible.

  • IF the patch is a speed-up on the average machine, then it is a speed-up of less than 1%.

P.S.: As an added bonus, we now know that SPRT[0, 4] at STC can be used to discriminate speed optimizations of at least 1% (and perhaps even lower).

@joergoster
Contributor

@mcostalba The results of the tests with a 1% slowdown imply an elo gain of about 4-7. These very clear test results look somewhat suspicious, imho.

@vondele
Member Author

vondele commented May 13, 2017

@joergoster I was also surprised by how easily they passed, so far down to 0.3%. Full table:

5%
Total: 4897 W: 983 L: 809 D: 3105
elo =   12.350 +-    5.876 LOS:  100.0%

2%
Total: 14640 W: 2743 L: 2537 D: 9360
elo =    4.889 +-    3.375 LOS:   99.8%

1.5%
Total: 6865 W: 1333 L: 1153 D: 4379
elo =    9.112 +-    4.938 LOS:  100.0%

1.0%
Total: 7906 W: 1518 L: 1335 D: 5053
elo =    8.044 +-    4.593 LOS:  100.0%

Total: 9373 W: 1851 L: 1660 D: 5862
elo =    7.081 +-    4.298 LOS:   99.9%

Total: 21866 W: 4107 L: 3875 D: 13884
elo =    3.686 +-    2.778 LOS:   99.5%

0.7%
Total: 15692 W: 2872 L: 2664 D: 10156
elo =    4.606 +-    3.224 LOS:   99.7%

0.5%
Total: 20542 W: 3801 L: 3575 D: 13166
elo =    3.823 +-    2.843 LOS:   99.6%

0.3%
Total: 23353 W: 4420 L: 4182 D: 14751
elo =    3.541 +-    2.701 LOS:   99.5%
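For reference, the elo figures in this table follow from the standard logistic model applied to the W/L/D counts; a minimal sketch (the +- error bars additionally need the score variance, omitted here):

```cpp
#include <cassert>
#include <cmath>

// Standard logistic elo estimate from W/L/D counts, reproducing the point
// estimates quoted above.
double elo_from_wld(int W, int L, int D)
{
    double score = (W + 0.5 * D) / double(W + L + D);   // mean score per game
    return -400.0 * std::log10(1.0 / score - 1.0);      // invert the logistic
}
```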

@lantonov

SPRT is the most powerful possible test (Neyman-Pearson lemma) so detecting 4-7 elo difference is not that surprising, especially at STC.
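As background on how these tests stop: fishtest accumulates a log-likelihood ratio until it crosses +-2.94. The sketch below uses the common generalized-SPRT normal approximation; the 2017 fishtest implementation was BayesElo-based, so its reported LLR values will not match this formula exactly:

```cpp
#include <cassert>
#include <cmath>

// Generalized-SPRT normal approximation of the log-likelihood ratio for
// elo bounds [elo0, elo1] given W/L/D counts. A textbook sketch, not the
// exact 2017 fishtest (BayesElo-based) formula.
double logistic_score(double elo) { return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0)); }

double gsprt_llr(int W, int L, int D, double elo0, double elo1)
{
    double N   = W + L + D;
    double m   = (W + 0.5  * D) / N;        // mean score per game
    double m2  = (W + 0.25 * D) / N;        // mean squared score per game
    double var = m2 - m * m;                // per-game score variance
    double s0 = logistic_score(elo0), s1 = logistic_score(elo1);
    return (s1 - s0) * (2.0 * m - s0 - s1) * N / (2.0 * var);
}
```

The test stops and passes once the accumulated LLR reaches the upper bound (2.94 for alpha = beta = 0.05), and fails at the lower bound.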

@joergoster
Contributor

@mcostalba I propose to repeat the test with the 1% slowdown at LTC. The UCI protocol, and therefore cutechess-cli, can't provide the accuracy needed at STC, because times are always sent in milliseconds. So even if SF spent only 500 microseconds more, cutechess will eventually send 1 millisecond less with the next 'go' command. And this adds up during the game, possibly magnifying the intended slowdown.

@ppigazzini
Contributor

ppigazzini commented May 26, 2017

Windows 10 make profile-build ARCH=x86-64-modern COMP=mingw
benching 7af755268d6589ce0fadc3afcddae0cef8c1d887 vs. ecd3218

gcc version 6.3.0 (Rev3, Built by MSYS2 project)

bash ./bench-parallel.sh ./stockfish-s-3.exe ./stockfish-master.exe 30
base =    2205113 +/- 7404
test =    2277399 +/- 7715
diff =      72286 +/- 1032
speedup = 0.032781

gcc version 7.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)

bash ./bench-parallel.sh ./stockfish-s-3.exe ./stockfish-master.exe 30
base =    2228808 +/- 6291
test =    2291915 +/- 6490
diff =      63106 +/- 979
speedup = 0.028314

Windows 10 WSL make profile-build ARCH=x86-64-modern COMP=gcc

gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

bash ./bench-parallel.sh ./stockfish-s-3.exe ./stockfish-master.exe 30
base =    2341469 +/- 9495
test =    2397598 +/- 9674
diff =      56128 +/- 860
speedup = 0.023972

@ppigazzini
Contributor

ppigazzini commented May 26, 2017

testing as @joergoster

gcc version 6.3.0 (Rev3, Built by MSYS2 project)

$ ./stockfish-s-3-630.exe go wtime 800000 > /dev/null
Search time (ms): 29554
TotalDelay (ms): 1399 (4%)

TotalDelay/Search time = 0.047337

gcc version 7.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)

$ ./stockfish-s-3-710.exe go wtime 800000 > /dev/null
Search time (ms): 29248
TotalDelay (ms): 1385 (4%)

TotalDelay/Search time = 0.047353

Windows 10 WSL
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

~# ./stockfish-s-3-484 go wtime 800000 > /dev/null
Search time (ms): 46638
TotalDelay (ms): 2214 (4%)

TotalDelay/Search time = 0.047472

@mcostalba

Thanks for testing!

Ok, the first test passed SPRT[0, 4] as expected; now I have submitted this to validate the slowdown code. The test should fail, to verify that the slowdown code is reliable and does not introduce odd overheads like in previous versions.

@ppigazzini
Contributor

ppigazzini commented May 26, 2017

@mcostalba no problems on Windows 10 benching 5ae31a854facbb1a414134065647e55a90732649 vs. ecd3218

gcc version 6.3.0 (Rev3, Built by MSYS2 project)

bash ./bench-parallel.sh ./stockfish-s-0.exe ./stockfish-master.exe 30
base =    2286583 +/- 6794
test =    2296705 +/- 6898
diff =      10121 +/- 916
speedup = 0.004427

@ppigazzini
Contributor

ppigazzini commented May 27, 2017

@mcostalba i7 3770k Windows 10 benches for Check resolution with slowdown below 1% (0564ecd64912d23349af8fedef5b2623c21c0342 vs. ecd3218)

gcc version 6.3.0 (Rev3, Built by MSYS2 project)

bash ./bench-parallel.sh ./stockfish-s10-630.exe ./stockfish-master.exe 30
base =    2240923 +/- 7871
test =    2264533 +/- 8023
diff =      23610 +/- 825
speedup = 0.010536
$ ./stockfish-s10-630.exe go wtime 800000 > /dev/null
Search time (ms): 116112
TotalDelay (ms): 1161 (0%)
TotalDelay / Search time = 0.009999

gcc version 7.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project)

bash ./bench-parallel.sh ./stockfish-s10-710.exe ./stockfish-master.exe 30
base =    2232218 +/- 8827
test =    2258333 +/- 9211
diff =      26115 +/- 1212
speedup = 0.011699
$ ./stockfish-s10-710.exe go wtime 800000 > /dev/null
Search time (ms): 115507
TotalDelay (ms): 1155 (0%)
TotalDelay / Search time = 0.009999

WSL - gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

bash ./bench-parallel.sh ./stockfish-s10-484 ./stockfish-master 30
base =    2367905 +/- 5977
test =    2397156 +/- 6266
diff =      29251 +/- 1123
speedup = 0.012353
:~# ./stockfish-s10-484 go wtime 800000 > /dev/null
Search time (ms): 156662
TotalDelay (ms): 1566 (0%)
TotalDelay / Search time = 0.009996

@mcostalba

...and the test passed! :-)

The (improved) slowdown branch on which the test above is based should be less than a 1% slowdown against master.

Please confirm this is less than 1% for you as well, and then we can finally prove that SPRT[0, 4] is able to detect below-1% speed-ups!!!

@ppigazzini
Contributor

@mcostalba Windows 10 benches in previous post, speed-ups: 1.05% - 1.17% - 1.24%

@joergoster
Contributor

This is what I get.

./stockfish go wtime 80000 > /dev/null
Search time (ms): 14824
TotalDelay (ms): 148 (0%)

And the bench comparison:

Result of  25 runs
==================
base =    1652613  +/- 58502
test =    1634034  +/- 57716
diff =     -18579  +/- 1350

speedup        = -0.0112
P(speedup > 0) =  0.0000

CPU: x86_64

@mcostalba

@ppigazzini @joergoster Thanks for testing and confirming code works as expected!

So it is confirmed that the slowdown in long searches is about 1%; indeed, in real games it is even lower. For instance, with just one second remaining on the clock (a typical STC endgame), the slowdown disappears:

$ ./stockfish go wtime 1000 > /dev/null
Search time (ms): 47
TotalDelay (ms): 0 (0%)

So at STC the slowdown applies mainly in the opening/midgame, but this is already enough to measure a sensible difference.

I am surprised by how well SPRT[0, 4] at STC is able to resolve even very small differences in speed in a reliable way. I guess the key ingredient here is STC, which yields games played at relatively shallow search depth, where even a bit more depth has an impact on the overall engine strength.

Our fishtest framework is really amazing :-)

@ppigazzini
Contributor

ppigazzini commented May 27, 2017

@mcostalba what about testing a 0.5% slowdown?

Anyway, check here how well fishtest works despite relaxed bounds, few CPU contributors, and (I suppose) some CPUs that the fishcooking people would consider too noisy :) Obviously easier tasks.

@mcostalba

@ppigazzini I think we are already at the limit of resolution of the slowdown code (maybe not of fishtest). For instance, the average slowdown over a game is already below 1% because, as I said, when 1 second remains on the clock there is no slowdown. So moving to 0.5% means that the very small slowdown will be effective only at the beginning and in the early midgame. Moreover, bench results show that the speed-up uncertainty is already about 0.5%, so setting the slowdown to something lower than 1% risks measuring just noise.

As it is now, it is good enough to prove that SPRT[0, 4] at STC is able to discriminate optimizations of at least 1%, or even a bit lower, and this is what we wanted to prove.

@vdbergh
Contributor

vdbergh commented May 28, 2017

Nothing was "proved" here. At the very least the test would have to be repeated a number of times to really give an experimental idea of how sensitive an SPRT(0,4) is to 1% speedups.

A much better approach is to first determine the ratio elo/speedup. Under the reasonable assumption that the relation elo <-> speedup is linear for small elo differences, this can be done for a much larger speedup (say 5% or even 10%) using a fixed-length test. (Trying instead to measure directly the elo corresponding to a 1% speedup would require an enormous number of games to get a reliable result.)

Once the ratio elo/speedup is known, the pass probability for a fixed speedup can simply be computed mathematically.
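That last computation can be sketched with Wald's diffusion approximation: treat the accumulated LLR as a Brownian motion whose drift is set by the true elo, and read off the absorption probability at the two bounds. The formula below is my own back-of-the-envelope derivation under the generalized-SPRT model, not fishtest code:

```cpp
#include <cassert>
#include <cmath>

// Back-of-the-envelope pass probability of an SPRT with elo bounds
// [elo0, elo1] and alpha = beta = 0.05, as a function of the true elo.
// Brownian-motion approximation: the normalized drift works out to
//   theta = (2 s(elo) - s0 - s1) / (s1 - s0),
// and P(pass) = (e^{-theta a} - 1) / (e^{-theta a} - e^{-theta b})
// with LLR bounds a = ln(beta / (1 - alpha)), b = ln((1 - beta) / alpha).
double score(double elo) { return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0)); }

double sprt_pass_prob(double elo, double elo0, double elo1)
{
    const double a = std::log(0.05 / 0.95);   // lower LLR bound, ~ -2.944
    const double b = std::log(0.95 / 0.05);   // upper LLR bound, ~ +2.944
    double s0 = score(elo0), s1 = score(elo1);
    double theta = (2.0 * score(elo) - s0 - s1) / (s1 - s0);
    if (std::fabs(theta) < 1e-9)              // zero drift: linear limit
        return -a / (b - a);
    return (std::exp(-theta * a) - 1.0) / (std::exp(-theta * a) - std::exp(-theta * b));
}
```

Under this approximation the [0, 4] test passes with probability about 0.05 at 0 elo, about 0.5 at 2 elo, and about 0.95 at 4 elo; given an elo/speedup ratio, this translates a speedup directly into a pass probability.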

@ppigazzini
Contributor

ppigazzini commented May 28, 2017

@mcostalba the bench process has some juice left :)
There is a 0.2% difference between different compilers, but the difference between builds from the same compiler is lower. Check below the complete output of the bench, or simply run test-speedup.sh with bench-parallel.sh on your pc: linux, windows with msys2 (also with other mingw-w64)

bash ./bench-parallel.sh ./stockfish-s10-630.exe ./stockfish-master.exe 30
run	base	test	diff
  1  2280574  2303306  +22732
  2  2272563  2293509  +20946
  3  2252781  2274161  +21380
  4  2283794  2304127  +20333
  5  2280574  2305769  +25195
  6  2248866  2277363  +28497
  7  2272563  2304947  +32384
  8  2244964  2265402  +20438
  9  2286215  2306591  +20376
 10  2271765  2291884  +20119
 11  2282183  2306591  +24408
 12  2263817  2284600  +20783
 13  2274161  2300849  +26688
 14  2259863  2283794  +23931
 15  2275761  2295136  +19375
 16  2251997  2269375  +17378
 17  2275761  2298397  +22636
 18  2262233  2287832  +25599
 19  2245744  2274161  +28417
 20  2282988  2305769  +22781
 21  2237976  2262233  +24257
 22  2287832  2306591  +18759
 23  2259074  2285407  +26333
 24  2280574  2309885  +29311
 25  2271765  2297581  +25816
 26  2248866  2270968  +22102
 27  2270968  2295136  +24168
 28  2276561  2295136  +18575
 29  2277363  2300031  +22668
 30  2237976  2259863  +21887

base =    2267270 +/- 5358
test =    2290546 +/- 5389
diff =      23275 +/- 1261
speedup = 0.010266

@vondele
Member Author

vondele commented May 28, 2017

So, here is the final version of this patch, rebased on master. It certainly won't pass [0,4], so if, as a result of the discussion in this thread, it is decided that [0,4] is the standard for future speedup patches, feel free to close.

@ppigazzini
Contributor

ppigazzini commented May 28, 2017

@mcostalba fishtest is both a theoretical framework (and in this respect @vdbergh's suggestion to measure/verify elo/speedup and the pass probability is correct) and a practical framework, where the contribution of the workers was never tested/measured. There are some unanswered questions (and some complaint threads) about: disabling HT, using n° of cores - 1, using virtual cores, cloud instances, CPU load from other processes... The instructions and FAQ suggest using only strictly controlled workers, but the worker set today is a big fruit salad :)

IMO it is important to check whether the practical framework is able to work according to the theoretical framework, and the speedup test at STC is an easy way to perform that check.

I'm confident that the big fruit salad is working fine :)

@zamar
Copy link

zamar commented Jun 6, 2017

Time to make a decision on this one:

  • The patch seems to provide a very small speed-up.
  • However, changing from int to int16 is somewhat ugly and makes it more difficult to try further tweaks in this area.

I'll take the 'auto&' changes from this patch, but the data type shall remain int.

@vondele
Member Author

vondele commented Jun 6, 2017

fine.

@zamar zamar closed this in 1781439 Jun 6, 2017
SapphireBrand pushed a commit to SapphireBrand/Stockfish that referenced this pull request Jun 7, 2017
SapphireBrand pushed a commit to SapphireBrand/Stockfish that referenced this pull request Jun 8, 2017
SapphireBrand pushed a commit to SapphireBrand/Stockfish that referenced this pull request Jun 9, 2017
vondele referenced this pull request in protonspring/Stockfish Aug 2, 2017
@vondele vondele mentioned this pull request Aug 18, 2017
atumanian pushed a commit to atumanian/Stockfish that referenced this pull request Jan 24, 2018