NNUE evaluation threshold
The idea is to use NNUE only on positions with fairly balanced material. This brings a big speedup in search, since the NNUE eval is slower than the classical eval on most hardware, especially on unbalanced positions, where the classical eval can use LazyEval.

STC: https://tests.stockfishchess.org/tests/view/5f2c2680b3ebe5cbfee85b61
LLR: 2.95 (-2.94,2.94) {-0.50,1.50}
Total: 3168 W: 560 L: 400 D: 2208
Ptnml(0-2): 21, 294, 819, 404, 46

LTC: https://tests.stockfishchess.org/tests/view/5f2c2ca6b3ebe5cbfee85b69
LLR: 2.98 (-2.94,2.94) {0.25,1.75}
Total: 3200 W: 287 L: 183 D: 2730
Ptnml(0-2): 4, 149, 1191, 251, 5
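
For readers who want to turn the W/L/D counts above into a rough Elo figure, here is a minimal sketch using the standard logistic Elo formula. The function names are illustrative and are not part of fishtest or Stockfish; the result is a point estimate only and says nothing about the SPRT bounds reported in the LLR lines.

```cpp
#include <cmath>

// Fraction of points scored: a win counts 1, a draw 1/2, a loss 0.
double score_fraction(int wins, int losses, int draws) {
    const double games = static_cast<double>(wins) + losses + draws;
    return (wins + 0.5 * draws) / games;
}

// Logistic Elo difference implied by a score fraction s with 0 < s < 1.
double elo_from_score(double s) {
    return 400.0 * std::log10(s / (1.0 - s));
}

// Convenience wrapper for a W/L/D triple as printed in the test summaries.
double elo_estimate(int wins, int losses, int draws) {
    return elo_from_score(score_fraction(wins, losses, draws));
}
```

Calling elo_estimate(560, 400, 2208) with the STC counts gives roughly 17.6 Elo, and elo_estimate(287, 183, 2730) with the LTC counts gives roughly 11.3 Elo.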

closes #2916

Bench 4746616
MJZ1977 authored and vondele committed Aug 6, 2020
1 parent 84f3e86 commit 3dca13a
Showing 1 changed file with 11 additions and 5 deletions.
16 changes: 11 additions & 5 deletions src/evaluate.cpp
@@ -107,9 +107,10 @@ using namespace Trace;
 namespace {

   // Threshold for lazy and space evaluation
-  constexpr Value LazyThreshold1 = Value(1400);
-  constexpr Value LazyThreshold2 = Value(1300);
+  constexpr Value LazyThreshold1 = Value(1400);
+  constexpr Value LazyThreshold2 = Value(1300);
   constexpr Value SpaceThreshold = Value(12222);
+  constexpr Value NNUEThreshold = Value(500);

   // KingAttackWeights[PieceType] contains king attack weights by piece type
   constexpr int KingAttackWeights[PIECE_TYPE_NB] = { 0, 0, 81, 52, 44, 10 };
@@ -941,9 +942,14 @@ namespace {
 Value Eval::evaluate(const Position& pos) {

   if (Eval::useNNUE)
-      return NNUE::evaluate(pos);
-  else
-      return Evaluation<NO_TRACE>(pos).value();
+  {
+      Value balance = pos.non_pawn_material(WHITE) - pos.non_pawn_material(BLACK);
+      balance += 200 * (pos.count<PAWN>(WHITE) - pos.count<PAWN>(BLACK));
+      // Take NNUE eval only on balanced positions
+      if (abs(balance) < NNUEThreshold)
+          return NNUE::evaluate(pos);
+  }
+  return Evaluation<NO_TRACE>(pos).value();
 }

 /// trace() is like evaluate(), but instead of returning a value, it returns
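
The selection rule in the evaluate() hunk can be tried in isolation. The sketch below is a stand-alone model, not Stockfish code: MaterialSummary is a hypothetical stand-in for the four values the patch reads from Position, and plain int replaces the engine's Value type. Only the constants 500 and 200 and the comparison itself are taken from the diff.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the values the patch reads from Position.
struct MaterialSummary {
    int nonPawnWhite;  // models pos.non_pawn_material(WHITE)
    int nonPawnBlack;  // models pos.non_pawn_material(BLACK)
    int pawnsWhite;    // models pos.count<PAWN>(WHITE)
    int pawnsBlack;    // models pos.count<PAWN>(BLACK)
};

constexpr int NNUEThreshold = 500;  // value from the patch
constexpr int PawnWeight    = 200;  // per-pawn weight from the patch

// Signed material balance from White's point of view.
int material_balance(const MaterialSummary& m) {
    return (m.nonPawnWhite - m.nonPawnBlack)
         + PawnWeight * (m.pawnsWhite - m.pawnsBlack);
}

// True when the patched evaluate() would call the network:
// NNUE must be enabled and the balance strictly inside the threshold.
bool uses_nnue(bool useNNUE, const MaterialSummary& m) {
    return useNNUE && std::abs(material_balance(m)) < NNUEThreshold;
}
```

With equal non-pawn material, a two-pawn difference (balance 400) still selects the network, while a three-pawn difference (balance 600) falls through to the classical evaluation.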

14 comments on commit 3dca13a

@LouisZulli

But now you are not really honoring Use NNUE = true. A user who sets that might want all evaluations to come from the net, not only certain ones. Maybe you need more than just true/false for Use NNUE.

@vondele (Member) commented on 3dca13a Aug 6, 2020

I don't think that's the way we should go. We use it in the best way possible; that is what 'Use NNUE' should mean (IMO).

@LouisZulli

Other UCI options (Analysis Contempt type combo default Both var Off var White var Black var Both) give the user a choice.

Instead of Use NNUE, could simply have an option Evaluation type combo default Hybrid var Classical var NNUE var Hybrid.

By the way, this commit gives about 15% speed-up on bench for my system (bmi2, avx2).
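
The three-way option proposed in this comment could be modeled roughly as follows. This is only a sketch of the suggestion, not something the commit implements: EvalMode, parse_eval_mode and mode_uses_nnue are hypothetical names, and the Hybrid branch simply reuses the patch's threshold test on a precomputed material balance.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical values for the proposed "Evaluation" combo option.
enum class EvalMode { Classical, NNUE, Hybrid };

// Maps the option string to a mode; returns false for unknown strings
// and leaves 'out' untouched in that case.
bool parse_eval_mode(const std::string& s, EvalMode& out) {
    if (s == "Classical") { out = EvalMode::Classical; return true; }
    if (s == "NNUE")      { out = EvalMode::NNUE;      return true; }
    if (s == "Hybrid")    { out = EvalMode::Hybrid;    return true; }
    return false;
}

// Decides whether the network is used for a position with the given
// material balance. Hybrid applies the patch's rule |balance| < threshold.
bool mode_uses_nnue(EvalMode mode, int balance, int threshold = 500) {
    switch (mode) {
        case EvalMode::Classical: return false;
        case EvalMode::NNUE:      return true;
        case EvalMode::Hybrid:    return std::abs(balance) < threshold;
    }
    return false;  // not reached for valid enum values
}
```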

@vondele (Member) commented on 3dca13a Aug 6, 2020

I don't think it is so 'simple', there will be other terms added to NNUE, and 'pure NNUE' will clearly be less strong. Eventually nets will be optimized for this hybrid mode, and it would be just wrong to use it outside of the hybrid context.

@LouisZulli commented on 3dca13a Aug 6, 2020

OK, then why have Use NNUE at all?

Or, why would the default for it be false?

@vondele (Member) commented on 3dca13a Aug 6, 2020

My main reasons would be

  • To maintain the classical eval, which is a remarkable piece of chess software.
  • To have the strongest engine on weaker hardware
  • To be able to play without the necessity of downloading a net

@mstembera (Contributor)

Also note we don't give users the option to disable lazy eval for classic either so this is consistent.

@stockchess

@LouisZulli Can't you get pure NNUE by setting NNUEThreshold to a large value? 500 = 1/2 pawn, I think, so you could set NNUEThreshold = 20000 or similar.
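
As a quick check of the scale involved, the sketch below uses only the per-pawn weight of 200 that appears in the patch. The helper name is hypothetical, and it assumes both sides have equal non-pawn material. Whether a value such as 20000 exceeds every reachable balance also depends on the engine's piece values, which this diff does not show.

```cpp
constexpr int PawnWeight = 200;  // per-pawn weight taken from the patch

// Largest pawn-count difference d with PawnWeight * d < threshold,
// assuming equal non-pawn material. Returns -1 if no d >= 0 qualifies.
constexpr int max_pawn_diff_under(int threshold) {
    return threshold > 0 ? (threshold - 1) / PawnWeight : -1;
}
```

Under the patch's own weighting, max_pawn_diff_under(500) is 2, so the threshold of 500 tolerates a two-pawn difference rather than half a pawn.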

@LouisZulli

@stockchess Sure. What I can do isn't really the point here.

@syzygy1 (Contributor) commented on 3dca13a Aug 7, 2020

I wonder how this will fare in real games where NNUE understands the unbalanced position and the regular eval may not have a clue. In general, stitching together two entirely separate evals seems very hacky.

@vondele (Member) commented on 3dca13a Aug 7, 2020

Well... real games as opposed to fishtest games? However, is the situation very different from playing with tablebases, or with dedicated endgame evaluation functions? Those are also two different evals.

Having said this, this is all early days, and I'm sure that these things are 'details' that will evolve quickly. My expectation is that nets will be trained for these things, and might become better for the subset of positions they have to deal with.

@syzygy1 (Contributor) commented on 3dca13a Aug 7, 2020

@vondele
I agree, this is still early days. I also agree there is very little reason to refuse a patch that is clearly gaining Elo.
Tablebases are different since those evals fit "perfectly" with the regular eval (ignoring queen sacs to reach a very difficult but theoretically won endgame).
Specialised endgame evals are also a bit different, at least those that are meant to guide the search to a win (but I guess they can clash sometimes with the regular eval).

Somehow something seems wrong with gaining Elo from calling a (still not cheap) "more approximate" eval. Why not just use the regular fast lazy eval if we expect a cutoff? If we do not expect a cutoff, shouldn't it be better to use the more accurate NNUE eval?

Instead of looking at the "absolute" material balance, it might make sense to compare with alpha and/or beta. (This does clash with caching the static eval in the TT, though.)

I wouldn't be surprised if a lot of trade offs that are now explicitly or implicitly present in the search have different outcomes with an NNUE eval. It seems inevitable that the NNUE and non-NNUE branches will diverge (even if it is just a single branch now).

Actually, I expect that SF may go in many different directions now with people doing their own non-compatible experiments. Not a bad thing at all!
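
The window-relative alternative raised in this comment could look roughly like the sketch below. It is purely illustrative: wants_accurate_eval and its margin parameter are hypothetical, and nothing like it is part of the commit. The idea is to pay for the slower, more accurate eval only when a cheap estimate lies close enough to the [alpha, beta] window that the exact value could change the outcome.

```cpp
// Hypothetical rule: request the accurate eval only if the cheap estimate
// is within 'margin' of the search window (alpha, beta). Outside that band
// a cutoff is expected either way, so a fast approximate eval would suffice.
bool wants_accurate_eval(int cheapEstimate, int alpha, int beta, int margin) {
    return cheapEstimate > alpha - margin && cheapEstimate < beta + margin;
}
```

Because the decision depends on alpha and beta, the same position can be evaluated differently at different nodes, which is the clash with caching the static eval in the TT that the comment points out.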

@vondele (Member) commented on 3dca13a Aug 7, 2020

Yes, many things to be tried out, and that's a good thing. I think one reason for the success is that the NNUE and classical evals are actually rather commensurate.

The Elo gain of this patch really is a result of speedup, I think, since nps increases roughly 15% with this patch. It can still use the regular lazy eval if it goes in the classical branch, and presumably does so quite often.

Interestingly, nobody ever got lazyEval to work based on comparisons with alpha/beta or the score at the root position. I do agree that the current 'absolute material balance' is a bit rough; there is already a first patch that makes it somewhat more detailed (https://tests.stockfishchess.org/tests/view/5f2c9e2261e3b6af64881eba).

@syzygy1 (Contributor) commented on 3dca13a Aug 7, 2020

I guess I don't really mean comparing with alpha but more like comparing with the material balance in the PV node where the current branch is branching off from. But of course that node doesn't have to be a quiet node, so I am not sure any of this makes sense...
