
Retire 'Cowardice' and 'Aggressiveness' UCI options

They are not self-describing and create a lot of user
requests about them.

Given that the values are already well tuned there
is no need to expose them as UCI options.

No functional change.
1 parent 2d60995 commit bff65a211fcd626c170831878c90a3b440861e2a @mcostalba committed Jan 4, 2013
Showing with 18 additions and 21 deletions.
  1. +0 −2 polyglot.ini
  2. +18 −17 src/evaluate.cpp
  3. +0 −2 src/ucioption.cpp
polyglot.ini
@@ -24,8 +24,6 @@ Mobility (Endgame) = 100
Passed Pawns (Middle Game) = 100
Passed Pawns (Endgame) = 100
Space = 100
-Aggressiveness = 100
-Cowardice = 100
Min Split Depth = 4
Max Threads per Split Point = 5
Threads = 1
src/evaluate.cpp
@@ -75,8 +75,8 @@ namespace {
const int GrainSize = 8;
// Evaluation weights, initialized from UCI options
- enum { Mobility, PassedPawns, Space, KingDangerUs, KingDangerThem };
- Score Weights[6];
+ enum { Mobility, PassedPawns, Space };
+ Score Weights[3];
typedef Value V;
#define S(mg, eg) make_score(mg, eg)
@@ -88,7 +88,7 @@ namespace {
//
// Values modified by Joona Kiiski
const Score WeightsInternal[] = {
- S(252, 344), S(216, 266), S(46, 0), S(247, 0), S(259, 0)
+ S(252, 344), S(216, 266), S(46, 0)
};
// MobilityBonus[PieceType][attacked] contains mobility bonuses for middle and
@@ -195,6 +195,10 @@ namespace {
// the strength of the enemy attack are added up into an integer, which
// is used as an index to KingDangerTable[].
//
+ // King safety evaluation is asymmetrical and different for us (root color)
+ // and for our opponent. These values are used to init KingDangerTable.
+ const int KingDangerWeights[] = { 259, 247 };
+
// KingAttackWeights[PieceType] contains king attack weights by piece type
const int KingAttackWeights[] = { 0, 0, 2, 2, 3, 5 };
@@ -281,19 +285,16 @@ namespace Eval {
void init() {
- Weights[Mobility] = weight_option("Mobility (Middle Game)", "Mobility (Endgame)", WeightsInternal[Mobility]);
- Weights[PassedPawns] = weight_option("Passed Pawns (Middle Game)", "Passed Pawns (Endgame)", WeightsInternal[PassedPawns]);
- Weights[Space] = weight_option("Space", "Space", WeightsInternal[Space]);
- Weights[KingDangerUs] = weight_option("Cowardice", "Cowardice", WeightsInternal[KingDangerUs]);
- Weights[KingDangerThem] = weight_option("Aggressiveness", "Aggressiveness", WeightsInternal[KingDangerThem]);
-
- // King safety is asymmetrical. Our king danger level is weighted by
- // "Cowardice" UCI parameter, instead the opponent one by "Aggressiveness".
- // If running in analysis mode, make sure we use symmetrical king safety. We
- // do this by replacing both Weights[kingDangerUs] and Weights[kingDangerThem]
- // by their average.
+ Weights[Mobility] = weight_option("Mobility (Middle Game)", "Mobility (Endgame)", WeightsInternal[Mobility]);
+ Weights[PassedPawns] = weight_option("Passed Pawns (Middle Game)", "Passed Pawns (Endgame)", WeightsInternal[PassedPawns]);
+ Weights[Space] = weight_option("Space", "Space", WeightsInternal[Space]);
+
+ int KingDanger[] = { KingDangerWeights[0], KingDangerWeights[1] };
+
+ // If running in analysis mode, make sure we use symmetrical king safety.
+ // We do so by replacing both KingDanger weights by their average.
if (Options["UCI_AnalyseMode"])
- Weights[KingDangerUs] = Weights[KingDangerThem] = (Weights[KingDangerUs] + Weights[KingDangerThem]) / 2;
+ KingDanger[0] = KingDanger[1] = (KingDanger[0] + KingDanger[1]) / 2;
const int MaxSlope = 30;
const int Peak = 1280;
@@ -302,8 +303,8 @@ namespace Eval {
{
t = std::min(Peak, std::min(int(0.4 * i * i), t + MaxSlope));
- KingDangerTable[1][i] = apply_weight(make_score(t, 0), Weights[KingDangerUs]);
- KingDangerTable[0][i] = apply_weight(make_score(t, 0), Weights[KingDangerThem]);
+ KingDangerTable[0][i] = apply_weight(make_score(t, 0), make_score(KingDanger[0], 0));
+ KingDangerTable[1][i] = apply_weight(make_score(t, 0), make_score(KingDanger[1], 0));
}
}
src/ucioption.cpp
@@ -70,8 +70,6 @@ void init(OptionsMap& o) {
o["Passed Pawns (Middle Game)"] = Option(100, 0, 200, on_eval);
o["Passed Pawns (Endgame)"] = Option(100, 0, 200, on_eval);
o["Space"] = Option(100, 0, 200, on_eval);
- o["Aggressiveness"] = Option(100, 0, 200, on_eval);
- o["Cowardice"] = Option(100, 0, 200, on_eval);
o["Min Split Depth"] = Option(msd, 4, 7, on_threads);
o["Max Threads per Split Point"] = Option(5, 4, 8, on_threads);
o["Threads"] = Option(cpus, 1, MAX_THREADS, on_threads);
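The king danger initialization in the src/evaluate.cpp hunk can be read in isolation as the following minimal sketch. It is not the actual Stockfish code: `apply_weight_mg` is a hypothetical stand-in for `apply_weight()` that models only the middle-game term and assumes a 0x100 scale, and the loop bounds are assumed because the hunk does not show the loop header.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in for apply_weight(); the 0x100 scale is an assumption
// of this sketch, and only the middle-game term is modelled.
inline int apply_weight_mg(int value, int weight) { return value * weight / 0x100; }

using KingDangerTable = std::array<std::array<int, 100>, 2>;

// Builds both rows of the table from the two weights introduced by the commit.
KingDangerTable build_king_danger_table(bool analyseMode) {
    const int KingDangerWeights[] = { 259, 247 };  // values taken from the diff
    int kingDanger[] = { KingDangerWeights[0], KingDangerWeights[1] };

    // In analysis mode both sides share the averaged weight (symmetric eval).
    if (analyseMode)
        kingDanger[0] = kingDanger[1] = (kingDanger[0] + kingDanger[1]) / 2;

    const int MaxSlope = 30;
    const int Peak = 1280;

    KingDangerTable table{};
    // Assumed loop bounds: the diff only shows the loop body.
    for (int t = 0, i = 1; i < 100; ++i) {
        t = std::min(Peak, std::min(int(0.4 * i * i), t + MaxSlope));
        table[0][i] = apply_weight_mg(t, kingDanger[0]);
        table[1][i] = apply_weight_mg(t, kingDanger[1]);
    }
    return table;
}
```

Under these assumptions the two rows differ in normal play and become identical when `analyseMode` is true, which is the behaviour the new code comment describes.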

13 comments on commit bff65a2

@Kingdefender
Contributor

Aww :) I know some people on the Rybka forum really like their Stockfish with an increased Aggressiveness. But it is okay, maybe they will leave it in the Open Chess version, with Gaviotabase access. I hear there are now also Lomonosov bases, and these are 7 men, soon available in Aquarium!

Marco, I thought it would be a pity if the nullmove patch turned out to be a total regression. The results of Jean François were good enough, although they were not really confirmed by your testing at the time, Marco, but maybe there was some bug in either of your tests? I thought it might be possible that you win something by not doing the evaluation after a nullmove, but then you simply lose more than you gained by forgoing futility pruning altogether. After a nullmove it is always supposed to be an ALL node, so all the futility pruning (and in the normal search LMR and LMP) would be in effect. Something to try would maybe be

        if (fromNull)
        {
            ss->staticEval = bestValue = -(ss-1)->staticEval;
            ss->evalMargin = std::max((ss-2)->evalMargin, (ss-1)->evalMargin);
        }

and then not skip futility pruning after a null move. I just discovered that in Rainbow Serpent I am using -(ss-1)->evalMargin, but that still seemed to work...

Another idea, from Don Dailey, indirectly: in static null move pruning, there is no danger of Zugzwang that I can think of, so it should not be necessary to test for not being in a pawn endgame:

&&  pos.non_pawn_material(pos.side_to_move()))

I accidentally tried this when Don asked about programs not using nullmove in pawn endgames, and it seemed not too bad. But in Rainbow Serpent I also do nullmove pruning in quiescence search, and nullmove was switched on again both in quiescence search and above that at depth < RazorDepth, but not for greater depths. It seemed not too bad, in Rainbow Serpent that is. So there is a chance I will try this a bit more in Rainbow Serpent, but then for quiescence search too, not just the static null move pruning.

A third, very small idea that is now in Rainbow Serpent (it may just be a regression in Stockfish): I increased the penalty for having zero mobility for all pieces by twenty points, and combined that with a tentative change in mobilityArea, so that it is now:

    // Do not include in mobility squares protected by enemy pawns or occupied by our pieces
    const Bitboard mobilityArea = ~(ei.attackedBy[Them][PAWN]| pos.pieces(Us))
            &  pos.pieces(Them, ROOK, QUEEN)
            &  pos.pieces(Them, BISHOP, KNIGHT);

Well, I just thought one of these might maybe help, battling the regression in the Master, sorry guys, but I don't have any elo results for any of it.

Eelco

@lucasart
Contributor

Marco,

Wouldn't it be better to remove Cowardice and Aggressiveness from the code entirely?
This would make your eval symmetric, with the notable exception of the tempo bonus. And hopefully it would mean that the ss->fromNull stuff works better (though still polluted by the tempo bonus, which you can somewhat compensate for in the search, albeit in an ugly way).

Lucas

@Kingdefender
Contributor

I agree with you, Lucas, it would make the eval simpler. In Rainbow Serpent I just made the internal weights the same (symmetrical), but you can still use the UCI options; however, this now probably makes the nullmove eval idea useless. In Stockfish there is the added problem that the evalMargin is highly asymmetrical, and that could be the main problem. Often there is only one side with its king under attack, as Christophe Théron once pointed out. There is no clear way to get Stockfish's evalMargin from the parent node. I tried to make my idea number one work with limited testing, but that failed within wide error margins. It was short matches with four threads and using Shredder's book, which is not really a tournament book. It was messy, but the master from today still won whatever I tried; I did not try removing the patch altogether though. First two matches with the code as above:
Stockfish 130104 - Mastermod_003 ½ -1½ +0 =1 -1
Stockfish 130104 - Mastermod_003 33 - 27 +10 =46 -4 TP +34 ELO

Then with

    else
    {
        if (fromNull)
        {
            // Approximated score. Real one is slightly higher due to tempo
            ss->staticEval = bestValue = -(ss-1)->staticEval;
            ss->evalMargin = -(ss-1)->evalMargin; // Rainbow Serpent has this, by accident
        }
        else if (tte)
        {
            // Never assume anything on values stored in TT
            if (  (ss->staticEval = bestValue = tte->static_value()) == VALUE_NONE
                ||(ss->evalMargin = tte->static_value_margin()) == VALUE_NONE)
                ss->staticEval = bestValue = evaluate(pos, ss->evalMargin);
        }
        else
            ss->staticEval = bestValue = evaluate(pos, ss->evalMargin);

and again full Futility

      // Futility pruning
      if (   !PvNode
          && !InCheck // && !fromNull
          && !givesCheck
          &&  move != ttMove
          &&  enoughMaterial
          &&  type_of(move) != PROMOTION
          && !pos.is_passed_pawn_push(move))

Stockfish 130104 - Mastermod_004 31 - 29 +12 = 38 -10 TP +11 Elo 68%[+55, -9] 95%[+101, -31] 99.7%[+151, -52]

This was not bad considering the illogical choice of a negative evalMargin... And yet Stockfish's implementation does not seem to make sense: you do not use the eval you borrow. Not to stand pat, because with an inverted eval you can't stand pat; not in futility pruning, because that was disabled :); only possibly as a bestValue if it is not improved by the search, but then it is still below beta. It makes no sense, maybe it is better not to drop into qsearch at all after a nullmove...

Final try, now with the positive evalMargin.

        if (fromNull)
        {
            // Get approximate score from parent node if available.
            // Real one is slightly higher due to tempo
            if (  (ss->staticEval = bestValue = -(ss-1)->staticEval) == VALUE_NONE
                ||(ss->evalMargin = (ss-1)->evalMargin) == VALUE_NONE)
            {
                ss->staticEval = bestValue = evaluate(pos, ss->evalMargin);
                fromNull = false;
            }
        }
        else if (tte)
        {
            // Never assume anything on values stored in TT
            if (  (ss->staticEval = bestValue = tte->static_value()) == VALUE_NONE
                ||(ss->evalMargin = tte->static_value_margin()) == VALUE_NONE)
                ss->staticEval = bestValue = evaluate(pos, ss->evalMargin);
        }
        else
            ss->staticEval = bestValue = evaluate(pos, ss->evalMargin);

        // Stand pat. Return immediately if static value is at least beta
        if (bestValue >= beta)
        {
            if (!tte && !fromNull)
                TT.store(pos.key(), value_to_tt(bestValue, ss->ply), BOUND_LOWER,
                         DEPTH_NONE, MOVE_NONE, ss->staticEval, ss->evalMargin);

            return bestValue;
        }

and full Futility.

Stockfish 130104 - Mastermod_005 31 - 29 +10 =42 -8 TP +11 Elo 68%[+55, -7] 95%[+101, -25] 99.7%[+151, -44]
This is what the Shredder GUI outputs for the Elo and margins, unfortunately I have to copy it by hand. This seemed the most logical choice to code an alternative, of course you would need more games, however +11 Elo is not nothing.
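For readers who want to check such figures by hand, here is a minimal sketch of turning a match score into an Elo difference with the standard logistic model. Whether the Shredder GUI uses exactly this formula is an assumption, and the function name `elo_from_score` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Logistic Elo model: expected score s = 1 / (1 + 10^(-d / 400)).
// Inverting it gives d = 400 * log10(s / (1 - s)).
double elo_from_score(double points, int games) {
    const double s = points / games;
    assert(s > 0.0 && s < 1.0);  // the model is undefined for 0% or 100% scores
    return 400.0 * std::log10(s / (1.0 - s));
}
```

With 31 points out of 60 games this gives roughly +11.6, which is in line with the "+11 Elo" quoted above. It says nothing about the error margins, which depend on the number of games and the draw rate.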

Regards, Eelco

@Kingdefender
Contributor

In this Mastermod_005, however, I see a bug in the code, because ss->evalMargin is undetermined here after fromNull. So that is still a thing to try, it seems like a clear bug. I will try another match of 60 games in the Shredder GUI, but it is not my favorite way of testing, and I have had about enough of error margins of -155 points. Later, please. Maybe tomorrow!

Eelco

@lucasart
Contributor

Eelco,

You need to be more careful in your testing methodology. I'm just taking an example:

Stockfish 130104 - Mastermod_004 31 - 29 +12 = 38 -10 TP +11 Elo 68%[+55, -9] 95%[+101, -31] 99.7%[+151, -52]

That's 31+29+12=72 games. There are two reasons why this is meaningless:

  • you stopped the experiment after 72 games, but that "stopping rule" is biased (based on your own judgment). Unless you left the screen off for a few hours, pressed Ctrl+C, and switched it back on to see the results, you should never stop early, otherwise you introduce enormous and unmeasurable bias. Note that due to early stopping, none of the LOS and confidence intervals are correct. Those are based on the assumption that the number of games was decided initially and never changed.
  • even if you did decide on the number of games N=72 at the beginning of the experiment, 31-29-12 gives you only a likelihood of superiority of 60.1%. That's way too small to be accepted. And remember that there are other sources of noise that the model doesn't capture (concurrency introduces some variability in the system resources allocated to each process, and obviously book selection, even if you play positions twice with colors reversed).

For such minor changes, I suggest you play 4000 games. Using cutechess-cli, with 7 games in parallel on my i7 (leaving some free CPU power for the rest, to avoid polluting the experiment), and for games at 10"+0.1", that's (10+0.1*60)*2/7*4000/3600 = 5.08 hours.
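The duration estimate above can be written as a small helper. This is only a sketch: all names are hypothetical, and the 60 moves per side is the assumed average game length implied by the formula, not a measured value.

```cpp
#include <cassert>
#include <cmath>

// Rough wall-clock estimate for a fixed-length test run.
// movesPerSide is an assumed average used to account for the increment.
double estimated_test_hours(double baseSeconds, double incrementSeconds,
                            int movesPerSide, int concurrency, int games) {
    const double perSide = baseSeconds + incrementSeconds * movesPerSide;
    const double perGame = 2.0 * perSide;  // both engines use up their clock
    return perGame * games / concurrency / 3600.0;
}
```

Calling `estimated_test_hours(10.0, 0.1, 60, 7, 4000)` reproduces the figure of about 5.08 hours; it is an upper-bound style estimate, since games that end early use less time.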

It's a bit slow, but it forces you to be much more methodical and careful. You code a patch, test it thoroughly, and leave it to run overnight. In the morning, you only commit if the experiment is finished (early stopping is the root of all evil), and if the LOS is > 90%. Well, sometimes a code change removes a ton of ugly code, and with a LOS of 50% you could accept it on that basis. There's always a margin for judgment.
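As a rough illustration of the LOS figures mentioned above, here is a sketch of the usual normal approximation, in which draws drop out and only wins and losses matter. It is an assumption that the numbers quoted in this thread come from this exact formula; the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Likelihood of superiority under a normal approximation:
// LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
double likelihood_of_superiority(int wins, int losses) {
    assert(wins + losses > 0);  // at least one decisive game is required
    const double z = (wins - losses) / std::sqrt(2.0 * (wins + losses));
    return 0.5 * (1.0 + std::erf(z));
}
```

Reading 31 and 29 as wins and losses, as the comment above does, this gives roughly 0.60, well short of a 90% threshold.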

Lucas

@lucasart
Contributor

What I meant by switching the screen off and pressing Ctrl+C is that the only stopping rule that is not biased is random stopping (uncorrelated with the results).

@Kingdefender
Contributor

Hello Lucas, thanks, but no, there was no stopping rule... I know that for that you need at least an enormously positive or negative result. And even then the statistical theory is very complex. It was simply a test of 60 games each time, one minute for each engine. I should have used fewer threads in this test, because I could see in Task Manager that neither of the engines seemed to get enough time to go to full load this way. It could be that the Task Manager just is not up to measuring this correctly, but then I have no way to be sure other than just looking at the Task Manager. I just took my chances, because this was better than not measuring it at all.

It is just that none of the 3 approaches tried seems very promising, and they have to be better than Stockfish 13-01-04 because that in itself is already a regression, in all probability. I then thought there was a bug in Mastermod_005, but that was a false alarm. I don't think the code ideas are bad. But you are dealing with the eval asymmetries in Stockfish and you can't do anything about that in the search itself.

The only other approach left for my piece of code that I see at the moment is that if eval is based on a real search result in hash, that could in some cases override a shorter nullmove search. You only need the nullmove search then for getting a threatmove, but you already know that a real search failed high, so a threatmove from a nullmove failing low is of some, but limited, use. And you maybe should not allow the nullmove to fail high on a real search, only on a second TT result in the nullmove search, but to prevent that you can also decide not to do a nullmove search in the first place. I think I already tried something like this years back and it did not really make a dent. But that is the only thing I can think of right now.

I appreciate the advice, Lucas, but I already spend way too much time on the other chess program. Sorry, but for now I am not going to set up a Cutechess testing environment.
I just don't like doing testing much that way. No fun anymore, and I just don't think I can do all that for two programs.

This is my last try then for you guys this weekend:

// Step 8. Null move search with verification search (is omitted in PV nodes)

// Null move dynamic reduction based on depth
Depth R = 3 * ONE_PLY + depth / 4;

// Null move dynamic reduction based on value
if (eval - PawnValueMg > beta)
    R += ONE_PLY;

if (   !PvNode
    && !ss->skipNullMove
    &&  depth > ONE_PLY
    && !inCheck
    &&  eval >= beta
    &&  abs(beta) < VALUE_MATE_IN_MAX_PLY
    &&  pos.non_pawn_material(pos.side_to_move())
    && !(depth < 6 * ONE_PLY
         && tte
         && tte->depth() > depth-R
         && ttValue >= beta
         && tte->type() & BOUND_LOWER))
{
    ss->currentMove = MOVE_NULL;        

    pos.do_null_move<true>(st);
    (ss+1)->skipNullMove = true;

but I don't have this under testing yet, just Sune's pawn ending where it is not doing so bad:

6k1/1pp4p/p1p5/8/3P4/P4K2/1P4PP/8 b - -

Engine: Mastertest 006 (Athlon, 2009 MHz, one thread only, 48 MB)
by Tord Romstad, Marco Costalba and Joona Kiiski

1/01 0:00 -0.44 1...Kf7 (13) 13

2/02 0:00 -0.60 1...Kf7 2.Ke4 (66) 66

3/03 0:00 -0.44 1...Kf7 2.Ke4 Ke6 (149) 149

4/04 0:00 -0.40 1...Kf7 2.Ke4 Ke6 3.g4 (343) 343

5/05 0:00 -0.44 1...Kf7 2.Ke4 Ke6 3.g4 Kd6 (603) 603

6/06 0:00 -0.40 1...Kf7 2.Ke4 Ke6 3.g4 h6 4.b3 (1.558) 1558

7/07 0:00 -0.44 1...Kf7 2.Ke4 Ke6 3.g4 h6 4.h3 Kd6 (2.879) 2879

8/11 0:00 -0.48 1...Kf7 2.Ke4 Ke6 3.g4 h6 4.h3 b6
5.h4 Kf6 6.b3 Ke6 (6.445) 6445

9/11 0:00 -0.48 1...Kf7 2.Ke4 Ke6 3.g4 h6 4.h3 b6
5.h4 Kf6 6.b3 Ke6 (10.940) 683

10/11 0:00 -0.48 1...Kf7 2.g4 Ke6 3.Ke4 h6 4.h3 b6
5.h4 Kf6 6.b3 Ke6 7.g5 (21.207) 662

11/14 0:00 -0.40 1...Kf7 2.Ke4 Kf6 3.b3 h5 4.g3 Ke6
5.h3 Kd6 6.g4 hxg4 7.hxg4 (45.289) 718

12/14 0:00 -0.40 1...Kf7 2.Ke4 Kf6 3.b3 h5 4.g3 Kg5
5.h3 a5 6.Ke5 b6 7.Ke4 (69.145) 875

13/20 0:00 -0.44 1...Kf7 2.Ke4 Ke6 3.g4 a5 4.b3 Kd6
5.g5 c5 6.dxc5+ Kxc5 7.Ke5 b6 8.Ke4 c6 (159.396) 847

14/22 0:00 -0.92 1...Kf7 2.g4 Ke6 3.h4 Kf7 4.g5 Kg6
5.Kg4 Kg7 6.h5 a5 7.Kf5 Kf7 8.h6 Ke7
9.g6 (760.170) 900

15/22 0:00 -0.92 1...Kf7 2.g4 Ke6 3.h4 Kf7 4.g5 Kg6
5.Kg4 Kg7 6.h5 a5 7.Kf5 Kf7 8.h6 Ke7
9.g6 (796.595) 894

16/22 0:01 -0.96 1...Kf7 2.g4 Kg6 3.h4 a5 4.Kf4 h6
5.b3 Kf6 6.g5+ hxg5+ 7.hxg5+ Kg6
8.Kg4 Kg7 9.Kf5 Kf7 10.Kf4 Kg6
11.Kg4 Kg7 12.Kf5 Kf7 13.Kf4 Kg6 (1.031.755) 904

17/26 0:01 -1.01 1...Kf7 2.g4 Kg6 3.h4 Kf6 4.Kf4 Kg6
5.h5+ Kh6 6.Kf5 a5 7.Kf4 b6 8.Kf5 b5
9.Kf6 b4 10.Kf5 bxa3 11.bxa3 Kg7 (1.380.081) 929

18/28 0:02 -1.17 1...Kf7 2.g4 Kf6 3.h4 a5 4.Kf4 Kg6
5.h5+ Kh6 6.Kf5 a4 7.Kf4 b6 8.Kf5 c5
9.dxc5 bxc5 10.Kf4 c6 11.Kf5 (2.434.637) 973

19/28 0:03 -1.09++ 1...Kf7 2.g4 Kf6 3.h4 a5 4.Kf4 h6
5.b3 b6 6.h5 c5 7.dxc5 bxc5 8.a4 c6
9.g5+ hxg5+ 10.Kg4 c4 11.bxc4 c5
12.h6 Kg6 (3.311.091) 976

19/31 0:03 -1.33-- 1...Kf7 2.g4 Kf6 3.h4 a5 4.Kf4 h6
5.b3 b6 6.h5 c5 7.dxc5 bxc5 8.a4 c6
9.Ke4 Kg5 10.Kf3 Kf6 11.Kf4 Kg7
12.g5 (3.775.368) 985

19/31 0:05 -1.51-- 1...Kf7 2.h4 Kf6 3.g4 a5 4.Kf4 h6
5.b3 b6 6.h5 c5 7.dxc5 bxc5 8.a4 c6
9.Ke4 Kg5 10.Kf3 Kf6 11.Kf4 Kg7
12.g5 (5.028.461) 990

19/38 0:06 -1.73 1...Kf7 2.h4 Kf6 3.g4 a5 4.Kf4 a4
5.g5+ Kg6 6.Kg4 h6 7.h5+ Kg7 8.g6 Kf6
9.Kg3 Kg7 10.Kf4 Kf6 11.Ke4 b6
12.Kf4 Ke6 13.Ke4 Kf6 14.Kf4 Ke6 (6.420.962) 987

20/38 0:10 -1.77 1...Kf7 2.g4 h6 3.Kf4 a5 4.Kf5 a4
5.h4 b6 6.g5 hxg5 7.Kxg5 Kg7 8.h5 Kh7 (9.733.196) 964

21/38 0:11 -1.77 1...Kf7 2.g4 h6 3.Kf4 a5 4.Kf5 a4
5.h4 b6 6.g5 hxg5 7.Kxg5 Kg7 8.h5 Kh7
9.h6 c5 10.dxc5 bxc5 11.Kh5 (10.877.797) 961

22/38 0:14 -1.77 1...Kf7 2.g4 h6 3.Kf4 a5 4.Kf5 a4
5.h4 b6 6.g5 hxg5 7.Kxg5 Kg7 8.h5 Kh7
9.h6 c5 10.dxc5 bxc5 11.Kh5 (13.516.689) 954

23/40 0:21 -1.73 1...Kf7 2.g4 Kg6 3.h4 a5 4.Kf4 a4
5.h5+ Kf7 6.g5 Kf8 7.Kf5 Kf7 8.Kf4 Kf8
9.Kf5 Kf7 10.Kf4 (20.239.243) 931

24/40 0:25 -1.81-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.Kf4 a4
5.h5+ Kf7 6.Kf5 Ke7 7.Kg5 Kf7 8.Kh6 Kg8
9.g5 b6 10.g6 hxg6 11.Kxg6 Kf8 12.h6 Kg8 (23.415.412) 927

24/40 0:29 -1.89-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.Kf4 a4
5.h5+ Kf7 6.Kf5 Ke7 7.Kg5 (27.199.160) 928

24/40 0:34 -2.02-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.Kf4 a4
5.h5+ Kf7 6.Kf5 Ke7 7.Kg5 Kf7 8.Kh6 Kg8
9.g5 b6 10.g6 hxg6 11.Kxg6 Kf8 12.h6 Kg8
13.h7+ Kh8 14.Kh6 (31.695.945) 925

24/43 0:47 -2.06 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 h6 (43.419.098) 905

25/43 1:03 -2.06 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 h6 6.Kf4 b6 7.Ke3 Ke6 (55.907.259) 887

26/43 1:24 -2.14-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 h6 6.Kf4 Kf7 7.Kf5 b6 8.h5 (74.105.446) 875

26/45 1:43 -2.22-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 (90.407.781) 870

26/45 2:17 -2.34-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 Kg6 6.Ke5 Kf7 7.Kf5 h6 8.g5 hxg5
9.hxg5 Kg7 10.g6 Kg8 11.Kf6 b6
12.Kf5 Kf8 13.Kg4 c5 14.dxc5 (118.781.823) 860

26/45 2:52 -2.26 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 Kg6 6.Ke5 Kf7 7.Kf5 h6 8.g5 (146.205.703) 849

27/45 3:21 -2.26 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf6
5.Ke4 Kg6 6.Ke5 Kf7 7.Kf5 h6 8.b4 (169.923.104) 841

28/45 3:56 -2.34-- 1...Kf7 2.g4 Kg6 3.h4 a5 4.b3 Kf7
5.Kf4 Kf6 6.Ke4 h6 7.Kf4 b6 8.Ke4 Ke6
9.b4 a4 10.g5 hxg5 11.hxg5 Kf7
12.Kf5 Kg7 13.Ke6 Kg6 14.Kd7 Kxg5 (198.291.693) 838

28/48 5:56 -2.42-- 1...Kf7 2.Kf4 Kf6 3.g4 a5 4.b3 Kg6
5.h4 Kf6 6.g5+ Kg6 7.Kg4 h6 8.gxh6 (296.191.023) 830

28/51 7:54 -2.54-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kd6
5.g5 a4 6.bxa4 Ke7 7.h4 Kf7 8.h5 Kg7 (392.363.812) 826

28/54 9:36 -2.72-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kd6
5.g5 a4 6.bxa4 Kd7 (472.580.348) 819

28/54 11:44 -3.00-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kd6
5.g5 a4 6.bxa4 Kd7 7.h4 (572.577.927) 812

28/55 15:03 -3.15 1...Kf7 2.g4 Kf6 3.Kf4 Kg6 4.h4 Kf6
5.b4 h6 6.g5+ hxg5+ 7.hxg5+ Kg6
8.Kg4 Kf7 9.Kf5 b6 (724.777.802) 802

29/55 16:57 -3.07++ 1...Kf7 2.g4 a5 (812.103.193) 798

29/55 17:29 -2.98++ 1...Kf7 2.g4 a5 (835.243.099) 795

29/55 18:34 -3.23-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 b6
5.h4 c5 6.dxc5 bxc5 7.g5 Kd6 8.h5 Ke6 (887.419.161) 796

29/55 19:39 -2.86++ 1...Kf7 2.g4 a5 (936.348.160) 793

29/55 22:31 -3.41-- 1...Kf7 2.g4 a5 (1.071.208.364) 792

29/58 26:47 -3.82-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kf6
5.h4 Ke6 6.g5 Kf7 (1.266.397.673) 788

29/59 27:43 -2.59++ 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kf6
5.h4 Kg6 6.Ke5 Kf7 7.Kf5 (1.308.585.570) 786

29/59 32:14 -4.43-- 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kf6
5.h4 Ke6 6.g5 Kf7 7.Kf5 Ke7 8.g6 hxg6+
9.Kxg6 Kf8 (1.513.660.959) 782

29/66 33:36 -3.91 1...Kf7 2.g4 a5 3.b3 Ke6 4.Ke4 Kf6
5.h4 Kg6 6.Kf4 Kf6 7.g5+ Kg6 8.Kg4 Kg7
9.Kf5 Kf7 10.h5 Ke7 11.g6 (1.575.850.776) 781

30/66 34:52 -3.83++ 1...Kf7 2.g4 a5 3.b3 Kg7 4.h4 Kf6
5.Ke4 Kg6 (1.630.593.133) 779

30/66 35:23 -3.75++ 1...Kf7 2.g4 a5 (1.651.466.866) 777

30/66 35:53 -3.63++ 1...Kf7 2.g4 a5 (1.671.954.350) 776

30/66 36:26 -3.45++ 1...Kf7 2.g4 a5 3.h4 a4 (1.694.997.030) 775

30/66 37:01 -3.18++ 1...Kf7 2.g4 a5 (1.720.543.571) 774

Regards, Eelco

@mcostalba
Owner
@lucasart
Contributor

Sorry, I didn't realize that even with the default Aggressiveness / Cowardice, SF tends to be a little more conservative with its own king safety than with the opponent's:

const int KingDangerWeights[] = { 259, 247 };

So that is the only source of asymmetry, apart from tempo bonus ?

@mcostalba
Owner
@Kingdefender
Contributor

The results of my last attempt to "save" the fromNull are a bit better, but within error bars obviously. I still think there could be something in this one. I did watch the whole match, and I saw no big irregularities. The book sometimes gives positions with a slight advantage for one of the sides, and playing with a book means you do not get the exact same positions with reversed colors. So there is quite a bit of luck possible, and it is just 60 games.
From the perspective of Mastermod_006 versus Marco's version of January 4:

Mastermod_006 Stockfish 130104 32.0 - 28.0 (60: +11 =42 -7) 53.3% TP = +23 Elo 68%->[+4, +66] 95%->[-13, +111] 99.7%->[-32, +160]

+4 as the lowest estimate within +/- one standard deviation seems a reasonable guess to me. This would help "soften" any regression elsewhere, if there really is one. I mean, Stockfish is still only self-testing, so against a gauntlet of opponents there might not be any real regression... There might be a number of effects that would at least change with a second method of measurement. But let me try to be on topic just a little bit: what do you guys think of this patch? I could no longer preview the code above in Internet Explorer 8 (no longer supported by GitHub) on my old Athlon for some reason, so the indents were all wrong again, sorry about that. Testing for VALUE_NONE in values from the search stack was maybe a bit over the top (read: better to make it an assert or something?), but sometimes I like 'defensive programming', if that is a real term. And that is what Stockfish now does if you get a TTValue from hash directly (I copied that piece of code quickly into Rainbow Serpent, thanks Marco!).

Eelco

@glinscott
Contributor

Eelco, thanks for the patch, but I think it would be better to test your new fromNull idea against Stockfish without the original fromNull idea. That way we can get a clearer idea of whether it is a net benefit.

Also, tests with 60 games are not a good indicator of results unfortunately. A few thousand games is probably required, unless the idea is incredibly good. It sucks waiting so long for results, but otherwise it is so easy to introduce regressions.

@Kingdefender
Contributor

Hi Gary,

Yes, I agree both tests would be good but do we have data yet if it was a real regression? I don't think the measurement old fromNull against no fromNull in the master now is in yet from Marco? How many elo? I mean Jean-Francois measured it at

Final result after 5000 games :
Score of c581b7e vs a878312: 1163 - 970 - 2867 [0.519] 5000
ELO: 13.35 +- 99%: 12.71 95%: 9.65
LOS: 100.00%
Wins: 1163 Losses: 970 Draws: 2867 Total: 5000

so there was no regression then measured by him in several tests... Of course I would run the test again but I just wanted to get an opinion on the code. Actually I still suspect this (fromNull patch) might not actually be a regression :) Sorry I mean, I still have to see new data on this one. But if there really is a regression here or elsewhere have we identified it yet?

Regards, Eelco

Update: I am doing a test of ten matches of 60 games each of your suggestion, Gary. It takes a while even with this limited number of games. The results so far are in favour of Stockfish with fromNull disabled. I just put in fromNull = false, I think that is good enough? Anyway, it is clearly winning almost all matches so far, so that confirms the other test results against the patch, yours and Marco's.

Eelco

Four matches are finished, where Stockfish 130104 II is Gary's suggested version that has fromNull disabled:

Stockfish 130104 II - Mastermod_006 32.0 - 28.0 (60: +9 =46 -5) 53.3% TP = +23 Elo 68%->[+7, +66] 95%->[-7, +111] 99.7%->[-22, +160]

Stockfish 130104 II - Mastermod_006 29.5 - 30.5 (60: +7 =45 -8) 49.1% TP = -5 Elo 68%->[-50, +11] 95%->[-97, +28] 99.7%->[-147, +45]

Stockfish 130104 II - Mastermod_006 33.5 - 26.5 (60: +16 =35 -9) 55.8% TP = +40 Elo 68%->[+17, +81] 95%->[-6, +125] 99.7%->[-29, +173]

Stockfish 130104 II - Mastermod_006 32.0 - 28.0 (60: +12 =40 -8) 53.3% TP = +23 Elo 68%->[+3, +66] 95%->[-16, +111] 99.7%->[-36, +160]

The rest will follow at a later date.

(Friday update:) I did one short match yesterday at one minute + one second Fischer bonus instead of game in one minute, and that has a slightly better result. Games lost by Mastermod_006 in other matches often seemed to be lost quickly in the opening or middlegame, which could be a king safety effect, and there were some saves in the endgame from seemingly very bad scores. So maybe an effect of more search depth is just strengthened a bit now. But it is too few games; you could also argue it would help the other side to have more time in the ending...

At 1' + 1":

Stockfish 130104 II - Mastermod_006 29.0 - 31.0 (60: +6 =46 -8) 48.3% TP = -11 Elo 68%->[-55, +4] 95%->[-101, +20] 99.7%->[-151, +36]

(Update:) next matches however at 1' + 1":

Stockfish 130104 II - Mastermod_006 33.0 - 27.0 (60: +13 =40 -7) 55.0% TP = +34 Elo 68%->[+15, +76] 95%->[-4, +120] 99.7%->[-24, +169]

Stockfish 130104 II - Mastermod_006 30.0 - 30.0 (60: +10 =40 -10) 50.0% TP = 0 Elo 68%->[-45, +45] 95%->[-92, +92] 99.7%->[-143, +143]
