Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing fmh2 (follow up move2) and adding CMS pointer to the Search::Stack #636

Closed
wants to merge 2 commits into from

Conversation

loco-loco
Copy link
Contributor

@loco-loco loco-loco commented Apr 15, 2016

There are two main points for this patch:
a) Keeping a pointer to the CounterMoveStats (CMS) history on the Search::Stack:
The reason is that we loose information of the moved piece from previous move, and
trying to guess it from the current position cause problems (e.g. captures, castle, promotion,
multiple move by the same piece, etc)
b) Adding a variation of @locutus2 idea to incorporate a second level followup move (fmh2).

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6438 W: 1229 L: 1077 D: 4132

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 4000 W: 605 L: 473 D: 2922

bench: 7378965

@lantonov
Copy link

Congratulations and admiration!

@loco-loco loco-loco changed the title Implmenting fmh2 (follow up move2) and adding CMS pointer to the Search::Stack Implementing fmh2 (follow up move2) and adding CMS pointer to the Search::Stack Apr 15, 2016
@Stefano80
Copy link
Contributor

Congrats!
Am 15.04.2016 12:08 schrieb "Lyudmil Antonov" notifications@github.com:

Congratulations and admiration!


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#636 (comment)

@AlexandreMasta
Copy link

Woah!!!! Congrats!

@locutus2
Copy link
Member

locutus2 commented Apr 15, 2016

@loco-loco
Basing on the fail of my original fm2 and cmh2 and new cmh2 patch i was skeptical of the usefulness of fmh2 and tries a simplification by removing the fmh2 part.

But after the tests it seems there are synergetic effects of the fmh2 idea and your additional changes.

So i congrats again to this great patch and i' am happy that the my fmh2 idea has some use.

@MichaelB7
Copy link
Contributor

very nice +1

@mcostalba
Copy link

Congratulations!

It is a somewhat a complex patch, it will take some time for me to sort it out and understand properly...

...I will study it once it will be committed to master branch :-)

@mcostalba
Copy link

@loco-loco ok I have done some formatting and some small fixes, please add following non functinal change commit to your patch:

mcostalba/Stockfish@fmh2

Indeed there is more to do becuae your patch adds more than one idea, but further simplifications are deferred once your patch is in.

@zamar When you commit please edit commit msg and add bench number (7378965) that was missed.

And further simplify MovePicker API.

Revert 'cmh' to a reference because it is always set.

by: mcostalba

bench: 7378965 (No functional change)
@AlexandreMasta
Copy link

Just waiting for this patch to start a test. Hope soon. =)

@theo77186
Copy link

@AlexandreMasta Actually you can already compile by yourself, it is what I do to test unreleased Stockfish. But you can also wait for abrok.eu compiles.

@zamar zamar closed this in ec6aab0 Apr 17, 2016
@glinscott
Copy link
Contributor

Congratulations!

mcostalba pushed a commit that referenced this pull request Apr 18, 2016
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6438 W: 1229 L: 1077 D: 4132

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 4000 W: 605 L: 473 D: 2922

bench: 7378965

Resolves #636
IIvec referenced this pull request in IIvec/Stockfish Apr 18, 2016
STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6438 W: 1229 L: 1077 D: 4132

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 4000 W: 605 L: 473 D: 2922

bench: 7378965

Resolves #636
ElbertoOne pushed a commit to ElbertoOne/Stockfish that referenced this pull request Jun 3, 2016
Louis Zulli reported that Stockfish suffers from very occasional hangs with his 20 cores machine.

Careful SMP debugging revealed that this was caused by "a ghost split point slave", where thread
was marked as a split point slave, but wasn't actually working on it.

The only logical explanation for this was double booking, where due to SMP race, the same thread
is booked for two different split points simultaneously.

Due to very intermittent nature of the problem, we can't say exactly how this happens.

The current handling of Thread specific variables is risky though. Volatile variables are in some
cases changed without spinlock being hold. In this case standard doesn't give us any kind of
guarantees about how the updated values are propagated to other threads.

We resolve the situation by enforcing very strict locking rules:
- Values for key thread variables (splitPointsSize, activeSplitPoint, searching)
can only be changed when the thread specific spinlock is held.
- Structural changes for splitPoints[] are only allowed when the thread specific spinlock is held.
- Thread booking decisions (per split point) can only be done when the thread specific spinlock is held.

With these changes hangs didn't occur anymore during 2 days torture testing on Zulli's machine.

We probably have a slight performance penalty in SMP mode due to more locking.

STC (7 threads):
ELO: -1.00 +-2.2 (95%) LOS: 18.4%
Total: 30000 W: 4538 L: 4624 D: 20838

However stability is worth more than 1-2 ELO points in this case.

No functional change

Resolves official-stockfish#422

Scales the endgame score by the number of pawns.

Credits goes also to Stephane Nicolet for his great idea of scaling by pawns.

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 9994 W: 1929 L: 1760 D: 6305

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 11240 W: 1789 L: 1626 D: 7825

bench 7298564

Resolves official-stockfish#423

Remove unnecessary generation check in TT save

Checking for generation is unnecessary because if the key matches then the entry was probed and refreshed earlier.

STC 2MB
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 57391 W: 10671 L: 10613 D: 36107
http://tests.stockfishchess.org/tests/view/55ef59fa0ebc5976a2d6da5d

LTC 8MB
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 60732 W: 9260 L: 9199 D: 42273
http://tests.stockfishchess.org/tests/view/55ef8fe60ebc5976a2d6da6b

STC 16MB
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 23443 W: 4369 L: 4293 D: 14781
http://tests.stockfishchess.org/tests/view/55ef8fe60ebc5976a2d6da6b

No functional change

Resolves official-stockfish#427

Reduce writes in TT::probe().

Only refresh TT entry when it's really necessary.
This should give a small speed boost for some machines.
And it's a risk-free change.

No functional change.

Resolves official-stockfish#429

Refine ranks and increase resulting bonus.

STC:
LLR: 2.94 (-2.94,2.94) [0.00,4.00]
Total: 272379 W: 51773 L: 50658 D: 169948

LTC:
LLR: 3.06 (-2.94,2.94) [0.00,4.00]
Total: 41504 W: 6555 L: 6273 D: 28676

bench: 7658406

Resolves official-stockfish#430

Simplify outposts

This outpost simplification removes a number of unnecessary if statements. No functional change, but maybe I want to test it anyway.

Fix build errors

revert

Rework lock protecting

When changing 'search' and 'splitPointsSize' we have to
use thread locks, not split point ones, because can_join()
is called under the formers.

Verified succesfully with 24 hours toruture tests with 20
cores machine by Louis Zulli: it does not hangs.

Verifyed for no regressions with STC, 7 threads:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 52804 W: 8159 L: 8087 D: 36558

No functional change.

Bonus for checking moves

STC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 14531 W: 2765 L: 2576 D: 9190

LTC:
LLR: 3.20 (-2.94,2.94) [0.00,5.00]
Total: 52518 W: 8107 L: 7782 D: 36629

Bench: 7556477

Resolves official-stockfish#435

File based passed pawn bonus

Add file based bonus for passed pawns. Values tuned by SPSA.

STC:
LLR: 3.33 (-2.94,2.94) [0.00,5.00]
Total: 36889 W: 6805 L: 6507 D: 23577

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 32301 W: 5101 L: 4858 D: 22342

Bench: 8073614

Resolves official-stockfish#436

Run PVS-STUDIO analyzer

Fix issues after a run of PVS-STUDIO analyzer.
Mainly false positives but warnings are anyhow
useful to point out not very readable code.

Noteworthy is the memset() one, where PVS prefers ss-2
instead of stack. This is because memeset() could
be optimized away by the compiler when using 'stack',
due to stack being a local variable no more used after
memset. This should normally not happen, but when
it happens it leads to very sublte and difficult
to find bug, so better to be safe than sorry.

No functional change.

Fix a comment in TTEntry::save

Comment was slightly incorrect.

No functional change.

Add Trevis CI support

Add Travis CI support to GitHub repo.

After every push to master, Travis will build
the sources directly from GitHub repo according
to .travis.yml and verify everything is ok.

No functional change.

Remove queen threat evaluation

Threats by queen seem to be worthless.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 13627 W: 2607 L: 2473 D: 8547

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 19146 W: 2950 L: 2827 D: 13369

Bench: 8222484

Resolves official-stockfish#439

Tuning of assorted values

Passed STC

LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 45401 W: 8590 L: 8274 D: 28537

Passed LTC

LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 36089 W: 5589 L: 5331 D: 25169

Bench: 8397672

Resolves official-stockfish#445

Travis CI: add clang and osx

Extend builds to clang and osx platforms.

And check bench numbers.

No functional change.

Travis CI: add gcc 4.8 for osx

This setup was still missing.

Suggested by Stéphane Nicolet.

No functional change.

Retire rook contact checks

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 34114 W: 6363 L: 6265 D: 21486

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 61776 W: 9349 L: 9289 D: 43138

LTC (after rebasing):
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15261 W: 2343 L: 2214 D: 10704

Bench: 7523382

Resolves official-stockfish#442

Combination of two ideas:

Apply bonus for the prior CMH that caused a fail low.

Balance Stats: CMH and History bonuses are updated differently.
This eliminates the "fudge" factor weight when scoring moves. Also
eliminated discontinuity in the gravity history stat formula. (i.e. stat
scores will no longer inverse when depth exceeds 22)

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 21802 W: 4107 L: 3887 D: 13808

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 46036 W: 7046 L: 6756 D: 32234

Bench: 7677367

Asymmetry bonus for the attacking side

Use asymmetry in the position (king separation, pawn structure) to
compute an "initiative bonus" for the attacking side.

Passed STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 14563 W: 2826 L: 2636 D: 9101

And LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 14363 W: 2317 L: 2141 D: 9905

Bench: 8116244

Resolves official-stockfish#462

Lazy SMP

Start all threads searching on root position and
use only the shared TT table as synching scheme.

It seems this scheme scales better than YBWC for
high number of threads.

Verified for nor regression at STC 3 threads
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40232 W: 6908 L: 7130 D: 26194

Verified for nor regression at LTC 3 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28186 W: 3908 L: 3798 D: 20480

Verified for nor regression at STC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 3607 W: 674 L: 526 D: 2407

Verified for nor regression at LTC 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 4235 W: 671 L: 528 D: 3036

Tested with fixed games at LTC with 20 threads
ELO: 44.75 +-7.6 (95%) LOS: 100.0%
Total: 2069 W: 407 L: 142 D: 1520

Tested with fixed games at XLTC (120secs) with 20 threads
ELO: 28.01 +-6.7 (95%) LOS: 100.0%
Total: 2275 W: 349 L: 166 D: 1760

Original patch of mbootsector, with additional work
from Ivan Ivec (log formula), Joerg Oster (id loop
simplification) and Marco Costalba (assorted formatting
and rework).

Bench: 8116244

Almost passed tuning attempts

Collect and give a second try to some almost passed tuning attempts and
one-line tweaks from the last month.

Passed STC

LLR: 3.07 (-2.94,2.94) [0.00,4.00]
Total: 15124 W: 2974 L: 2756 D: 9394

And LTC

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 21577 W: 3507 L: 3289 D: 14781

Bench: 8855226

Resolves official-stockfish#464

Update authors

Fishtest is a key factor of SF success.

Thanks to Fishtest we have not only greately
improved ELO but, even more important, we
have enabled a kind of joint development and
testing that it is the herat of on open
source project like SF.

Open sourcing is not just about open code, it is
about commuity development. In case of a chess engine
this has never been possible before due to missing
a strong and strict testing environment that allows
many people to contribute in a safe and coordinate way.

Fishtest is a new way of developing chess engines,
something that has never exsisted before.

No functional change.

History pruning

Prune moves with negative History
and CMH scores at low depth.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 24182 W: 4672 L: 4439 D: 15071

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 12579 W: 1959 L: 1792 D: 8828

bench: 8907701

Simplify threats

Using less parameters and code to compute Threats
Includes also a few spacing edits.

Run as a simplification.

Passed STC 10+0.1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 18879 W: 3725 L: 3600 D: 11554

Passed LTC 60+0.4
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 74116 W: 11001 L: 10958 D: 52157

bench: 8004751

Cleanup history stats

And other assorted trivia.

No functional change.

KRPPKRP endgame: Simplify ugly switch statement

No functional change

Resolves official-stockfish#470

Use atomics instead of volatile

Rely on well defined behaviour for message passing, instead of volatile. Three
versions have been tested, to make sure this wouldn't cause a slowdown on any
platform.

v1: Sequentially consistent atomics

No mesurable regression, despite the extra memory barriers on x86. Even with 15
threads and extreme time pressure, both acting as a magnifying glass:

threads=15, tc=2+0.02
ELO: 2.59 +-3.4 (95%) LOS: 93.3%
Total: 18132 W: 4113 L: 3978 D: 10041

threads=7, tc=2+0.02
ELO: -1.64 +-3.6 (95%) LOS: 18.8%
Total: 16914 W: 4053 L: 4133 D: 8728

v2: Acquire/Release semantics

This version generates no extra barriers for x86 (on the hot path). As expected,
no regression either, under the same conditions:

threads=15, tc=2+0.02
ELO: 2.85 +-3.3 (95%) LOS: 95.4%
Total: 19661 W: 4640 L: 4479 D: 10542

threads=7, tc=2+0.02
ELO: 0.23 +-3.5 (95%) LOS: 55.1%
Total: 18108 W: 4326 L: 4314 D: 9468

As suggested by Joona, another test at LTC:

threads=15, tc=20+0.05
ELO: 0.64 +-2.6 (95%) LOS: 68.3%
Total: 20000 W: 3053 L: 3016 D: 13931

v3: Final version: SeqCst/Relaxed

threads=15, tc=10+0.1
ELO: 0.87 +-3.9 (95%) LOS: 67.1%
Total: 9541 W: 1478 L: 1454 D: 6609

Resolves official-stockfish#474

Some code and comment cleanup

- Remove all references to split points
- Some grammar and spelling fixes

No Functional change

Resolves official-stockfish#478

Reduce variation in rootDepth between different threads

Reduce the variation in Root Depth between different threads. This
prevents threads from searching at a depth much higher than Main Thread.

Performed well at STC 24 Threads:
ELO: 3.44 +-3.8 (95%) LOS: 96.1%
Total: 10000 W: 1627 L: 1528 D: 6845

And LTC 24 Threads
LLR: 1.43 (-2.94,2.94) [0.00,4.00]
Total: 3804 W: 500 L: 420 D: 2884
ELO : +7.31
p-value: 73.16%

Passed no regression at STC 3 Threads:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40457 W: 7148 L: 7060 D: 26249

And LTC 3 Threads:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 17704 W: 2489 L: 2364 D: 12851

Raising a pull request early as 24 Thread tests are very expensive and
this is clearly a positive gain at high thread counts and high time
controls. The change is a small parameter tweak with no additional
logic.

No functional change for single thread mode.

Resolves official-stockfish#481

New History Bonus Formula

bonus = d^2 + d - 1

Bench: 8639247

Resolves official-stockfish#484

Assorted trivia in search.cpp

The only interesting change is the moving of
stack[MAX_PLY+4] back to its original position
in id_loop (now renamed Thread::search).

No functional change.

Pick bestmove from the deepest thread.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26930 W: 4441 L: 4214 D: 18275

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7783 W: 1017 L: 876 D: 5890

No functional change in single thread mode

Resolves official-stockfish#485

Get rid of timer thread

Unfortunately std::condition_variable::wait_for()
is not accurate in general case and the timer thread
can wake up also after tens or even hundreds of
millisecs after time has elapsded. CPU load, process
priorities, number of concurrent threads, even from
other processes, will have effect upon it.

Even official documentation says: "This function may
block for longer than timeout_duration due to scheduling
or resource contention delays."

So retire timer and use a polling scheme based on a
local thread counter that counts search() calls and
a small trick to keep polling frequency constant,
independently from the number of threads.

Tested for no regression at very fast TC 2+0.05 th 7:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32969 W: 6720 L: 6620 D: 19629

TC 2+0.05 th 1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 7765 W: 1917 L: 1765 D: 4083

And at STC TC, both single thread
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15587 W: 3036 L: 2905 D: 9646

And with 7 threads
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8149 W: 1367 L: 1227 D: 5555

bench: 8639247

Ensure that rootDepth < DEPTH_MAX

Indeed, if we use a depth >= DEPTH_MAX, we start having negative depth in the
TT (due to int8_t cast).

No functional change in single thread mode

Resolves official-stockfish#490

Avoid friend

operator<<(os, pos) does not need to access any private members of pos.

No functional change.

Resolves official-stockfish#492

Fix broken UCI 'wait for stop'

When we reach the maximum depth, we can finish the
search without a raise of Signals.stop. However, if
we are pondering or in an infinite search, the UCI
protocol states that we shouldn't print the best move
before the GUI sends a "stop" or "ponderhit" command.

It was broken by lazy smp. Fix it by moving the stopping
of the threads after waiting for GUI.

No functional change.

Retire ThreadBase

Now that we don't have anymore TimerThread, there is
no need of this long class hierarchy.

Also assorted reformatting while there.

To verify no regression, passed at STC with 7 threads:
LLR: 2.97 (-2.94,2.94) [-5.00,0.00]
Total: 30990 W: 4945 L: 4942 D: 21103

No functional change.

Bonus for reachable outpost

Give a bonus for outpost squares which in reach of a bishop or knight.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 22725 W: 4570 L: 4339 D: 13816

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 15019 W: 2333 L: 2157 D: 10529

Bench: 8503181

Resolves official-stockfish#495

Do not conceal the invocation of the benchmark program

It is better to be able to see what arguments it is being called with.

No functional change

Resolves official-stockfish#497

History Pruning: Don't prune the main killer move.

Also increased pruned depth to 4.

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 23380 W: 4581 L: 4350 D: 14449

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 28934 W: 4329 L: 4105 D: 20500

Bench: 8369743

Resolves official-stockfish#498

Rewrite how threads are spawned

Instead of creating a running std::thread and
returning, wait in Thread c'tor that the native
thread of execution goes to sleep in idle_loop().

In this way we can simplify how search is started,
because when main thread is idle we are sure also
all other threads will be idle, in any case, even
at thread creation and startup.

After lazy smp went in, we can simpify and rewrite
a lot of logic that is now no more needed. This is
hopefully the final big cleanup.

Tested for no regression at 5+0.1 with 3 threads:
LLR: 2.95 (-2.94,2.94) [-5.00,0.00]
Total: 17411 W: 3248 L: 3198 D: 10965

No functional change.

Fix TT comment and static_assert()

Comment is based on a misunderstanding of what unaligned memory access is. Here
is an article that explains it very clearly:
https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt

No matter how we define TTEntry or TTCluster, there will never be any unaligned
memory access. This is because the complier knows the alignment rules, and does
the necessary adjustments to make sure unaligned memory access does not occur.

The issue being adressed here has nothing to do with unaligned memory access. It
is about cache performance. In order to achieve best cache performance:
- we prefetch the cacheline as soon as possible.
- we ensure that TT clusters do not spread across two cachelines. If they did,
  we would need to prefetch 2 cachelines, which could hurt cache performance.

Therefore the true conditions to achieve this are:
1/ start adress of TT is cache line aligned. void TranspositionTable::resize()
enforces this.
2/ TT cluster size should *divide* the cache line size. Currently, we pack 2
clusters per cache lines. It used to be 1 before "TT sardines". Does not matter
what the ratio is, all we want is to fit an integer number of clusters per cache
line.

No functional change.

Resolves official-stockfish#506

Clean up RootMove less operator

This is used by std::stable_sort() to sort moves from highest score to lowest score.

1) The comment is incorrect since highest to lowest means descending.
2) It's more natural to implement a less operator using another less operator rather than a greater operator.

No functional change.

Resolves official-stockfish#504

Allow cross compilation of Windows binaries on a Linux system

that are PGO, LTO, and statically linked.
Credit: pasquale....@gmail.com

No functional change

Resolves official-stockfish#505

Revert "Allow cross compilation of Windows binaries on a Linux system"

This reverts commit 388630a.

Confuses fishtest build system

Introduce new Threats weights = {350, 256}

Raise the midgame threats weight by 37%.

Passed STC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 8165 W: 1675 L: 1487 D: 5003

and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 28181 W: 4141 L: 3912 D: 20128

Bench: 7824961

Resolves official-stockfish#512

Proper Makefile for cross compiling 64 or 32 bit PGO + LTO + static Windows binaries under Linux.

No functional change

Resolves official-stockfish#511

Simplify outpost code

Also inline defintions of SpaceMask and CenterBindMask.

Verified from assembly that compiler computes the values
at compile time, so it is also theoretical faster.

While there factor out scale factor evaluation.

No functional change.

New Tuned Weights

More accurate evaluation weights

Performed better at STC

LLR: 1.32 (-2.94,2.94) [0.00,4.00]
Total: 190043 W: 37433 L: 36675 D: 115935

Passed LTC

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 30157 W: 4540 L: 4303 D: 21314

Bench: 9264977

Resolves official-stockfish#515

Fix MultiPv and Skill in SMP.

7 threads, 5+0.1:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 55460 W: 9665 L: 9601 D: 36194

No functional change in normal playing mode

Simplify time management and fix 'ponder on' bug

Simplify time management code by removing hard stops for unchanging first root moves.
Search is now stopped earlier at the end iteration if it did not have fail-lows at root.

This simplification also fixes pondering bug. Ponder flag was true by default
and cutechess-cli doesn't change it to false even though no pondering is possible.
Fix the issue by setting the default value of 'Ponder' flag to false.

10+0.1:
ELO: 3.51 +-3.0 (95%) LOS: 99.0%
Total: 20000 W: 3898 L: 3696 D: 12406

40+0.4:
ELO: 1.39 +-2.7 (95%) LOS: 84.7%
Total: 20000 W: 3104 L: 3024 D: 13872

60+0.06:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 37231 W: 5333 L: 5236 D: 26662

Stopped run at 100+1:
LLR: 1.09 (-2.94,2.94) [-3.00,1.00]
Total: 37253 W: 4862 L: 4856 D: 27535

Resolves official-stockfish#523
Fixes official-stockfish#510

Threats retuned

STC:

LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 45239 W: 8913 L: 8591 D: 27735

LTC:

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 21046 W: 3200 L: 2989 D: 14857

Bench: 8012530

Resolves official-stockfish#526

Fix easy move bug in SMP mode

Fix a bug where we could stop the search after only 10% of time used due to a matching easy move but later switch to a different move that was never pre-screened as easy due to SMP thread select.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27227 W: 4910 L: 4800 D: 17517

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 40368 W: 5826 L: 5733 D: 28809

Resolves official-stockfish#521

Distinct iteration paths for Lazy SMP threads

STC 5+0.1, threads 7
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 6026 W: 1047 L: 901 D: 4078

LTC: 20+0.2, threads 7
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 19739 W: 2910 L: 2721 D: 14108

STC 5+0.1, threads 20
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 2493 W: 462 L: 331 D: 1700

LTC 30+0.3, threads 20
ELO: 8.86 +-3.7 (95%) LOS: 100.0%
Total: 8000 W: 1076 L: 872 D: 6052

Bench: 8012530

Resolves official-stockfish#525

Remove unused field SearchStack::ttMove

No functional change

Resolves official-stockfish#533

Remove killer move conditions from LMR

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 8459 W: 1619 L: 1477 D: 5363

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 32239 W: 4404 L: 4299 D: 23536

Bench: 7597031

Resolves official-stockfish#534

New mobility bonus

Tuned the global mobility factor for each piece, as well as some +- delta,

The master mobility factor was {266,334} and tuning gave
{267, 362} +S(-2,-2) for the Knight
{249, 328} +S( 0,-2) for the Bishop
{298, 353} +S(1,1) for the Rook
{265, 358} +S(2,-1) for the Queen

Passed STC
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 49402 W: 9367 L: 9037 D: 30998

and LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 26831 W: 3871 L: 3658 D: 19302

Bench: 8355485

Resolves official-stockfish#536

Remove another unnecessary Search::Stack field

No functional change

Resolves official-stockfish#535

Fix compiling of 32 bit binary on 64-bit Windows

Two versions of mingw-w64 (targeting Win64 and Win32)
can be installed on Windows too.

No functional change

Resolves official-stockfish#532

Revert "Fix compiling of 32 bit binary on 64-bit Windows"

This reverts commit 1e8836d

Broken compile on mingw under Windows:

Config:
debug: 'yes'
optimize: 'yes'
arch: 'i386'
bits: '32'
prefetch: 'yes'
bsfq: 'no'
popcnt: 'no'
sse: 'yes'
pext: 'no'

Flags:
CXX: i686-w64-mingw32-c++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -Wextra -Wshadow -g -O3 -msse
LDFLAGS:  -static

Testing config sanity. If this fails, try 'make help' ...

mingw32-make[1]: Leaving directory 'C:/stockfish/src'
c:/MinGw/bin/mingw32-make ARCH=x86-32 COMP=mingw all
mingw32-make[1]: Entering directory 'C:/stockfish/src'
sh: C:\Program: No such file or directory
i686-w64-mingw32-c++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -Wextra -Wshadow -g -O3 -msse   -c -o benchmark.o benchmark.cpp
<builtin>: recipe for target 'benchmark.o' failed
process_begin: CreateProcess(NULL, i686-w64-mingw32-c++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -Wextra -Wshadow -g -O3 -msse -c -o benchmark.o benchmark.cpp, ...) failed.
make (e=2): Impossibile trovare il file specificato.

mingw32-make[1]: *** [benchmark.o] Error 2
mingw32-make[1]: Leaving directory 'C:/stockfish/src'
makefile:401: recipe for target 'build' failed
mingw32-make: *** [build] Error 2

No functional change.

Move some globals into main thread scope

Make it explicit that those variables are not globals, but
are used only by main thread. I think it is a sensible
clarification because easy move is already tricky enough
and current patch makes the involved actors explicit.

No functional change.

Resolves official-stockfish#537

Stockfish 7 Beta 1

Bench: 8355485

No functional change

Fix assert with very high score position

In case of a very high material score, we can
overflow VALUE_INFINITE.

This patch fixes an assert with:

position fen 7k/QQQQR3/2B5/4KN1Q/3QQ3/8/8/4R3 b - - 0 1
go depth 1

No functional change.

Resolves official-stockfish#546

Correct Pawn Trace Score + Code Clean up

No functional change

Resolves official-stockfish#542

Stockfish 7 Beta 2

Bench: 8355485

No functional change

Update Copyright year

No functional change.

Resolves official-stockfish#554

Update AUTHORS and copyright notice

No functional change

Resolves official-stockfish#555

Stockfish 7

Bench: 8355485

No functional change

Restore development version

Adjust time used for move based on previous score

Use less time if evaluation is not worse than for previous move and even less time if in addition no fail low encountered for current iteration.

STC: 10+0.1
ELO: 5.37 +-2.9 (95%) LOS: 100.0%
Total: 20000 W: 3832 L: 3523 D: 12645

STC: 10+0.1
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 17527 W: 3334 L: 3132 D: 11061

LTC: 60+0.6
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 28233 W: 3939 L: 3725 D: 20569

LTC: 60+0.6
ELO: 2.43 +-1.4 (95%) LOS: 100.0%
Total: 60000 W: 8266 L: 7847 D: 43887

LTC: 60+0.06
LLR: 2.95 (-2.94,2.94) [-1.00,3.00]
Total: 38932 W: 5408 L: 5207 D: 28317

Resolves official-stockfish#547

Fine tuning of unsupported pawn penalty

Adjust the unsupported pawn penalty when the pawn is supporting 2 pawns
(for example g7 in f6-g7-h6)

Passed STC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 23833 W: 4384 L: 4158 D: 15291

Passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 42711 W: 5918 L: 5655 D: 31138

Bench: 8390233

Resolves official-stockfish#549

Retire CenterBind

And compensate in the PSQT.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 27714 W: 5161 L: 5052 D: 17501

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36354 W: 5008 L: 4909 D: 26437

Bench: 8603285

Resolves official-stockfish#556

Tune time management for LTC

60+0.6:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 102533 W: 14270 L: 13842 D: 74421

Resolves official-stockfish#558

Update comments in LMR step

No functional change

Resolves official-stockfish#564

Adjust reductions based on history and cmh tables

STC:
LLR: 4.06 (-2.94,2.94) [0.00,5.00]
Total: 149395 W: 28029 L: 27208 D: 94158

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 9628 W: 1368 L: 1217 D: 7043

bench: 8076724

Resolves official-stockfish#565

Assorted English grammar changes

No functional change

Resolves official-stockfish#567

Rewrite time formula

Time management is really too complex, our aim is
to simplify it, but for time being at least rewrite
in an understandable way.

No functional change.

Makefile: Allow specifying compiler executable

No functional change

Resolves official-stockfish#570

Remove redundant -std=c++0x flag

This flag is functionally identical to '-std=c++11' flag which
is part of standard flags.

No functional change

Resolves official-stockfish#571

Depth margin parameter-tweak in TT-save

Verified that is improvement with multiple threads:

LLR: 2.95 (-2.94,2.94) [0.00,4.00] sprt @ 30+0.3 th 3
Total: 14817 W: 2103 L: 1915 D: 10799

LLR: 2.96 (-2.94,2.94) [0.00,4.00] sprt @ 15+0.15 th 7
Total: 10264 W: 1498 L: 1321 D: 7445

Verified that is not a significant regression with a single thread:

LLR: 2.96 (-2.94,2.94) [-4.00,0.00] sprt @ 60+0.6 th 1
Total: 23975 W: 3294 L: 3210 D: 17471

Resolves official-stockfish#575

Retire RootNode template

There is no reason to compile 3 different copies of search(). PV nodes are on
the cold path, and PvNode is a template parameter, so there is no cost in
computing:

const bool RootNode = PvNode && (ss-1)->ply == 0;

And this simplifies code a tiny bit as well.

Speed impact is negligible on my machine (i7-3770k, linux 4.2, gcc 5.2):

            nps   +/-
test    2378605  3118
master  2383128  2793
diff      -4523  2746

Bench: 7751425

No functional change.

Resolves official-stockfish#568

Do not probe syzygy bases when castling is possible

Almost no functional change. Bench is unchanged.

Resolves official-stockfish#230
Resolves official-stockfish#573

rotating symmetric patterns with increasing skipsize

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00] sprt @ 5+0.1 th 21
Total: 7068 W: 1121 L: 975 D: 4972

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00] sprt @ 12+0.12 th 21
Total: 26691 W: 3594 L: 3481 D: 19616

No functional change with a single thread

Resolves official-stockfish#574

Time management simplification

10+0.1:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 41963 W: 7967 L: 7883 D: 26113

60+0.6:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 132314 W: 17939 L: 17969 D: 96406

Resolves official-stockfish#580

Document HalfDensityMap

No functional change.

Resolves official-stockfish#584

Remove Weights

Removed remaining redundant weights for pawn structure,
passed pawns, space and king safety by redistributing them
into individual evaluation terms.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 15173 W: 2790 L: 2659 D: 9724

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 43433 W: 5936 L: 5846 D: 31651

Bench: 7156237

Resolves official-stockfish#586

Fix futility pruning bug

PredictedDepth can be negative, causing the futility_margin to be negative.
It will be very difficult to tweak moveCount pruning and reduction formula, as they are tuned to prevent this behavior.

No functional change

Resolves official-stockfish#587

History Stat Formula Simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 67476 W: 12561 L: 12521 D: 42394

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 111923 W: 15147 L: 15149 D: 81627

Bench: 8430465

Resolves official-stockfish#588

Remove slowMover

Removes a slowMover and one paramater from move_importance function.

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 77023 W: 14456 L: 14433 D: 48134

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 37175 W: 5190 L: 5092 D: 26893

Resolves official-stockfish#589

Revert "Remove slowMover"

This reverts commit 77fa960.

Resolves official-stockfish#590

Simplify Reduction Formula

Formula now only contains one coefficient. Making it much easier to tune.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 187443 W: 34858 L: 35028 D: 117557

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 88329 W: 11982 L: 11953 D: 64394

Bench: 7521394

Resolves official-stockfish#591

Pass endgame value to evaluate_scale_factor()

No functional change

Resolves official-stockfish#592

Clean up depth reduction calculation

Might also be a slight speed up

No functional change

Resolves official-stockfish#593

Tweak initiative formula

Give more weight to the pawns number and
the vertical king distance in evaluate_initiative()

Passed STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 26729 W: 5067 L: 4825 D: 16837

and LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 60480 W: 8338 L: 8016 D: 44126

Bench: 8295162

Resolves official-stockfish#594

Passed pawn bonus simplification

STC: (yellow)

LLR: -2.96 (-2.94,2.94) [0.00,4.00]
Total: 86114 W: 16063 L: 15921 D: 54130

LTC:

LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 14347 W: 2025 L: 1896 D: 10426

Bench: 8576437

Resolves official-stockfish#595

Add followup moves history for move ordering

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 7955 W: 1538 L: 1378 D: 5039

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 5323 W: 778 L: 642 D: 3903

Bench: 8261839

Resolves official-stockfish#599

Assorted cleanup of latest commits

No functional change.

Resolves official-stockfish#601

Simplify Safe Checks

STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 11796 W: 2211 L: 2074 D: 7511

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 14324 W: 1935 L: 1806 D: 10583

Bench: 8075202

Resolves official-stockfish#600

A small simplification in movepick.h

No functional change

Resolves official-stockfish#597

Simplify pawns King Safety calculation

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 130209 W: 23516 L: 23581 D: 83112

LTC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33541 W: 4563 L: 4460 D: 24518

Bench: 8644370

Resolves official-stockfish#604

Raise endgame passed pawn and material values

STC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 136149 W: 25213 L: 24588 D: 86348

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 54637 W: 7533 L: 7238 D: 39866

Bench: 8546808

Resolves official-stockfish#608

Bonus for loose enemies

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 30504 W: 5743 L: 5485 D: 19276

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 11936 W: 1651 L: 1493 D: 8792

Bench: 8880041

Resolves official-stockfish#606

Rewrite bsfq management

Use compiler intrinsics when possible to
avoid writing platform specific asm code.

Tested on Windows 7 with MSVC 2013 and mingw 4.8.3 (32 and 64 bit)
and on Linux Mint with g++ 4.8.4 and clang 3.4 (32 and 64 bit).

No functional change

Resolves official-stockfish#609

Guard against UB in lsb/msb

lsb(b) and msb(b) are undefined when b == 0. This can lead to subtle bugs, where
the resulting code behaves differently on different configurations:
- It can be the home grown software LSB/MSB
- It can be the compiler generated software LSB/MSB (when using compiler
  intrinsics without the right compiler flags to allow compiler to use hardware
  LSB/MSB). Which of course depends on the compiler.
- It can be hardware LSB/MSB generated by the compiler.
- Not to mention that hardware LSB/MSB can return different value on different
  hardware when b == 0.

No functional change

Resolves official-stockfish#610

A combo patch of two tuning patches

STC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 14223 W: 2700 L: 2494 D: 9029

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 66294 W: 9065 L: 8739 D: 48490

Bench: 7607385

Resolves official-stockfish#612

32-bit/64-bit Makefile fix

Counter intuitively, make build ARCH=x86-32 does NOT produce a 32-bit compile
when running a 64-bit OS. Nor would ARCH=x86-64 produce a 64-bit compile when
running a 32-bit OS (assuming it compiled w/o errors).

No functional change

Resolves official-stockfish#621

Simplify popcnt

Also a speedup(about 1%) on 64-bit w/o hardware popcnt

Retire Max15 and Full template parameters
(Contributed by Marco Costalba)

Now that we have just SW and HW versions, use
template default parameter to get rid of explicit
template parameters.

Retire bitcount.h and move the only defined
function to bitboard.h

No functional change

Resolves official-stockfish#620

Backward simplication

On top of the usual conditions
a) some opponent in front (but no lever)
b) some neighbours (in front) (but no neighbour behind or same rank)
c) < rank_5

to find out if a pawn is backward we look at the squares in front of this pawn to reach the same rank as the next neighbour.

In current master, a pawn is backward if any of those squares is controlled by an enemy pawn on an adjacent file

In this version, a pawn is ALSO backward if any of those squares is occupied by an enemy pawn.

STC:
http://tests.stockfishchess.org/tests/view/56fe7efd0ebc59301a3541f1
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 19051 W: 3557 L: 3433 D: 12061

LTC:
http://tests.stockfishchess.org/tests/view/56febc2d0ebc59301a354209
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 40810 W: 5619 L: 5526 D: 29665

Bench: 7525245

Resolves official-stockfish#614

Undefended King Ring

There was already a penalty for squares only defended by King (undefended)

This test records a penalty for completely undefended squares in the so called extended king-ring
(so if we exclude squares defended by a Kg8 for example, we only look at h6 g6 and f6)

We also exclude squares occupied by opponent pieces in this computation,
based on the following results

Was yellow at STC
LLR: -2.97 (-2.94,2.94) [0.00,5.00]
Total: 112499 W: 20649 L: 20293 D: 71557

and passed LTC
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 36805 W: 5100 L: 4857 D: 26848

Bench: 8430233

Resolves: official-stockfish#619

Small passed pawn simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 21993 W: 4197 L: 4078 D: 13718

LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 67213 W: 9135 L: 9077 D: 49001

Bench: 7482426

Resolves official-stockfish#622

Fix Travis Cl

Broken after "32-bit/64-bit Makefile fix" commit.

Ubuntu "Precise" 12.04.5 supports multilib only until
g++ 4.6 that is not enough to compile Stockfish.

So move to Ubuntu 14.04.4 LTS (Trusty Tahr)

No functional change.

Hide global visibility when not needed

Also move PieceValue definition in psqt.cpp,
where it is initialized.

Fix a warning in popcount16() with Intel compiler

No functional change.

Fix last search info carried over to mate position

When starting search in a mate or stalemate position, Stockfish does not
even care to reinitialize and start worker threads. However after search
all threads are checked for the best move.

This can lead to bestmove and info beeing carried over from the last
search.

Example session:

    setoption name threads value 7
    go movetime 4000
    position startpos moves f2f3 e7e5 g2g4 d8h4
    go movetime 4000

Actual output is like (almost always):

    [...]
    bestmove e2e4
    info depth 0 score mate 0
    info depth 20 seldepth 29 multipv 1 score cp 28 [...] pv e2e4
    bestmove e2e4

Expected output / output after fix:

    [...]
    bestmove e2e4 ponder e7e6
    info depth 0 score mate 0
    bestmove (none)

Resolves official-stockfish#623

StateInfo is usually allocated on the stack by search()

And passed in do_move(), this ensures maximum efficiency and
speed and at the same time unlimited move numbers.

The draw back is that to handle Position init we need to
reserve a StateInfo inside Position itself and use at
init time and when copying from another Position.

After lazy SMP we don't need anymore this gimmick and we can
get rid of this special case and always pass an external
StateInfo to Position object.

Also rewritten and simplified Position constructors.

Verified it does not regress with a 3 threads SMP test:
ELO: -0.00 +-12.7 (95%) LOS: 50.0%
Total: 1000 W: 173 L: 173 D: 654

No functional change.

Add a second level of follow-up moves

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 6438 W: 1229 L: 1077 D: 4132

LTC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 4000 W: 605 L: 473 D: 2922

bench: 7378965

Resolves official-stockfish#636

Fix incorrect draw detection

In this position we should have draw for repetition:

position fen rnbqkbnr/2pppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 moves g1f3 g8f6 f3g1
go infinite

But latest patch broke it.

Actually we had two(!) very subtle bugs, the first is that Position::set()
clears the passed state and in particular 'previous' member, so
that on passing setupStates, 'previous' pointer was reset.

Second bug is even more subtle: SetupStates was based on std::vector
as container, but when vector grows, std::vector copies all its contents
to a new location invalidating all references to its entries. Because
all StateInfo records are linked by 'previous' pointer, this made pointers
go stale upon adding more element to setupStates. So revert to use a
std::deque that ensures references are preserved when pushing back new
elements.

No functional change.

Remove some pointless micro-optimizations

Seems to give around 1% speed-up for CPUs with popcnt support.
Seems to give a very minor speed-up for CPUs without popcnt.

No functional change

Resolves official-stockfish#646

Use -O3 for all compilers (including ICC)

There seems to be no benefit from using -fast over -O3 with icc.
So use -O3 everywhere.

No functional change

Resolves official-stockfish#652

Use FMHs to assist with LMR formula.

STC:
LLR: 2.99 (-2.94,2.94) [0.00,5.00]
Total: 52232 W: 9654 L: 9304 D: 33274

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 115988 W: 15550 L: 15049 D: 85389

Bench: 7890808

Resolves official-stockfish#651

Isolated pawn simplification

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 117822 W: 21697 L: 21744 D: 74381

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 92307 W: 12330 L: 12305 D: 67672

Bench: 8813983

Resolves official-stockfish#659

Remove useless -mbmi flag in Makefile

I could not find anything documented that is necessary that prepending -mbmi to -mbmi2 gives some benefit.
Instead at
https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions

The following built-in functions are available when -mbmi is used. All of them generate the machine instruction that is part of the name.
unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);

The following built-in functions are available when -mbmi2 is used. All of them generate the machine instruction that is part of the name.
unsigned int _bzhi_u32 (unsigned int, unsigned int)
unsigned int _pdep_u32 (unsigned int, unsigned int)
unsigned int _pext_u32 (unsigned int, unsigned int)
unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
unsigned long long _pext_u64 (unsigned long long, unsigned long long)

and at
https://gcc.gnu.org/ml/gcc/2014-02/msg00204.html

( "... The real optimization comes from being able to use pext
(parallel bit extract), which can implement several bextr expressions in
parallel.")

Apart from that we don't use all -msse -msse2 -msse3 -msse4.2 etc. but just -msse3 (or -msse4.2) only.

As regards to the speedup within noise level - this pull request is actually reversal of mcostalba#198 wherein prepending -mbmi to -mbmi2 was claimed to be 0.3% faster and here (removing -mbmi) gives 0.4% speed gain.

Use popcount intrinsic with Interl compiler

It seems that icc used our fallback version of popcount.
Now use intrinsics.

icc version 16.0.2 (gcc version 5.3.0 compatibility)
bmi2 compile
uname -r 4.5.1-1-ARCH

20xbench gives a nice speedup
./stockfish-icc-master 2161515 +- 34462
./stockfish-icc-sse42 2260857 +- 50349

Fix LazySMP when searching to a fixed depth.

Currently, helper threads will only search up to the
specified depth limit. Now let them search until the
main thread has finished the specified depth.

On the other hand, we don't want to pick a thread with
a higher search depth.

This may be considered cheating. ;-)

No functional change.

Fix a warning with MSVC

Introduced by 2dd24dc ("Use popcount intrinsic with Intel")

No functional change.

Simplify History LMR Formula

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 41713 W: 7589 L: 7504 D: 26620

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 41353 W: 5484 L: 5391 D: 30478

Bench: 8946983

Retire __popcnt64 intrinsic

Just use _mm_popcnt_u64() that is available
both for MSVC abd Intel compiler.

Verified on MSVC that the produced assembly
has the hardware 'popcnt' instruction.

No functional change.

Unsafe checks

Introducing a new multi-purpose penalty related to King safety, which
includes all kind of potential checks (from unsafe or unavailable
squares currently occupied by some other piece)

This will indirectly detect and reward some pins, discovered checks, and
motifs such as square vacation, or rook behind its pawn and aligned with
King (example Black Rg8, g7 against Kg1),
and penalize some pawn blockers (if they move, it allows a discovered
check by the pawn).

And since it looks also at protected squares, it detects some potential
defense overloading.

Finally, the rook contact checks had been removed some time ago. This
test will give a small bonus for them, as well as for bishop contact
checks.

Passed STC
http://tests.stockfishchess.org/tests/view/5729ec740ebc59301a354b36
LLR: 2.94 (-2.94,2.94) [0.00,5.00]
Total: 13306 W: 2477 L: 2296 D: 8533

and LTC
http://tests.stockfishchess.org/tests/view/572a5be00ebc59301a354b65
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 20369 W: 2750 L: 2565 D: 15054

bench: 9298175

Merge good and bad quiets

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 58613 W: 10779 L: 10723 D: 37111

LTC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 33608 W: 4539 L: 4436 D: 24633

Bench: 9441294

Double pawn simplification

Try doubled pawn simplification, with psq
table compensation.

STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 36094 W: 6558 L: 6463 D: 23073

LTC:
LLR: 2.94 (-2.94,2.94) [-3.00,1.00]
Total: 102352 W: 13417 L: 13404 D: 75531

Bench: 8716243

Fix a multiPV bug in lazy SMP

Where the helper threads were not doing multiPV at all.

Regression tested sprt @ 5+0.05 th 7

LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 73918 W: 11891 L: 11853 D: 50174

bench: 8716243

Assorted pruning tweaks

LTC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 38257 W: 5206 L: 4961 D: 28090

STC:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 16550 W: 3110 L: 2914 D: 10526

Bench: 8428997

More detailed dependence of time allocation on the magnitude of score change

10+0.1:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 5657 W: 1130 L: 979 D: 3548

60+0.6:
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 36884 W: 5002 L: 4762 D: 27120

bench: 8428997

Simplify doubled pawn

Only use doubled pawn malus when the doubled pawns are on consecutive squares.

Passed STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 7678 W: 1469 L: 1325 D: 4884

And LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 26739 W: 3562 L: 3449 D: 19728

Bench: 8211685

Teach check_blockers to check also non-king pieces

This is a prerequisite for next patch

No functional change.

Pins or discovered attacks on the opponent's queen

Bonus for pins or discovered attacks on the opponent's queen

STC:
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 32020 W: 5914 L: 5652 D: 20454

LTC:
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 10946 W: 1530 L: 1375 D: 8041

Bench: 7031649

Tuned values for piece check and attack unit factors

A middle ground patch of two successful tuning patches,
one at STC, the other at LTC, which now passed both.

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 67893 W: 12777 L: 12384 D: 42732

LLR: 2.95 (-2.94,2.94) [0.00,4.00]
Total: 30165 W: 4189 L: 3960 D: 22016

bench: 9209507
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet