Bonus for rook/queen attacking pawns on same rank #23

Closed
@glinscott wants to merge 1 commit

glinscott
Contributor

Based off of the idea from @RyanTaker, this did very well in game testing.

Wins: 3390 Losses: 2972 Draws: 11323
LOS: 99.999992%
ELO: 8.213465 +- 99%: 6.746506 95%: 5.124415
Win%: 51.181792 +- 99%: 0.969791 95%: 0.736740

@mcostalba
Owner

Thanks Gary,

could you please post test conditions?

I will retest at a longer TC for verification in the next few days.

Marco

On Sun, Sep 16, 2012 at 4:04 PM, Gary Linscott notifications@github.com wrote:

You can merge this Pull Request by running:

git pull https://github.com/glinscott/Stockfish flatten_rook7th

Or view, comment on, or merge it at:

#23
Commit Summary

  • Bonus for rook/queen attacking pawns on same rank

File Changes

  • M src/evaluate.cpp (29)

@glinscott
Contributor Author

40/4+0.05 for cutechess.

The full command line is:
cutechess-cli -repeat -recover -rounds 24000 -resign 3 500 -draw 20 5 -concurrency 4 -engine cmd=base proto=uci option.Threads=1 name=base -engine cmd=stockfish proto=uci option.Threads=1 -each tc=40/4+0.05 book=varied.bin

@mcostalba
Owner

I have started the test; I'll post results in a few days.

In the meantime I have cooked up this cheap version, which should be a bit
faster although not equivalent:

if (   (Piece == ROOK || Piece == QUEEN)
    && relative_rank(Us, s) > RANK_5)
{
    // Pawns on same rank as rook or queen
    Bitboard pawns = pos.pieces(Them, PAWN) & RankBB[rank_of(s)];
    if (pawns)
        score += (more_than_one(pawns) ? 3 : 1)
               * (Piece == ROOK ? RookBonusPerPawn : QueenBonusPerPawn);

    // Queen or rook on 7th rank
    if (   relative_rank(Us, s) == RANK_7
        && relative_rank(Us, pos.king_square(Them)) == RANK_8)
        score += (Piece == ROOK ? RookOn7thBonus : QueenOn7thBonus);
}
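
To make the "not equivalent" remark concrete, here is a minimal, self-contained sketch (not Stockfish code) that compares the two scoring rules on a plain 64-bit bitboard. It assumes bit 0 is a1 and bit 63 is h8; rank_mask, popcount64, more_than_one64, bonus_popcount and bonus_cheap are hypothetical stand-ins for the engine's own helpers, and the bonus is a plain int rather than a Score.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// All eight squares of a rank (0 = first rank ... 7 = eighth rank).
constexpr Bitboard rank_mask(int rank) { return 0xFFULL << (8 * rank); }

// Portable bit count (Kernighan's loop): one iteration per set bit.
inline int popcount64(Bitboard b) {
    int n = 0;
    for (; b; b &= b - 1) ++n;
    return n;
}

// True when at least two bits are set: clearing the lowest bit leaves something.
constexpr bool more_than_one64(Bitboard b) { return (b & (b - 1)) != 0; }

// Popcount rule: the bonus scales with the exact number of enemy pawns on the rank.
inline int bonus_popcount(Bitboard enemyPawns, int rank, int perPawn) {
    return perPawn * popcount64(enemyPawns & rank_mask(rank));
}

// Cheap rule from the comment above: 0, 1x or 3x the bonus, with no counting.
inline int bonus_cheap(Bitboard enemyPawns, int rank, int perPawn) {
    const Bitboard pawns = enemyPawns & rank_mask(rank);
    if (!pawns) return 0;
    return (more_than_one64(pawns) ? 3 : 1) * perPawn;
}
```

With these definitions the two rules agree for zero, one or three pawns on the rank and differ otherwise: two pawns give 2x under the popcount rule but 3x under the cheap rule, and four pawns give 4x versus 3x.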

@glinscott
Contributor Author

Thanks! The speed improvement looks nice. I'll give that a run tonight at 4".

@glinscott
Contributor Author

I'll let the modified version keep running for a while, so far it's still within error bars:
Wins: 2359 Losses: 2138 Draws: 8887
LOS: 99.950931%
ELO: 5.711500 +- 99%: 7.753670 95%: 5.889452
Win%: 50.821877 +- 99%: 1.114947 95%: 0.847014

@glinscott
Contributor Author

Still within error bars, but seems not as good as first version. I'm running 64 bit though, so impact of popcount probably not as high as it would be on 32 bit.

Wins: 3327 Losses: 3100 Draws: 12641
LOS: 99.768400%
ELO: 4.118110 +- 99%: 6.494266 95%: 4.933123
Win%: 50.592616 +- 99%: 0.934153 95%: 0.709667

@mcostalba
Owner

Thanks Gary,

My partial result with 15"+0.05 per game is Wins: 667 Losses: 561 Draws: 1769

I'd like to commit one of the two patches currently under test by you and
me, and release it with 2.3.1.

Normally a fix release is "no functional change" so that people don't have
to retest, but in this case I'm inclined to commit it, so that the new
official release 2.3.1 is not made obsolete on the same day by private
compiles.

@glinscott
Contributor Author

Cool, still very much within error bars, but looking solid! This is with the popcount version? It seems slightly stronger.

@mcostalba
Owner

Yes it is with the popcount version. I am a little bit worried about
introducing a regression in the 32 bit version due to slower popcount().

@mcostalba
Owner

Putting a dbg_hit_on():

    if (relative_rank(Us, s) >= RANK_5) {

        dbg_hit_on(relative_rank(Us, s) == RANK_5);

        const Bitboard pawns = pos.pieces(Them, PAWN) & RankBB[rank_of(s)];
        if (pawns) {
            score += (Piece == ROOK ? RookBonusPerPawn : QueenBonusPerPawn) * popcount(pawns);
        }
    }

Shows on a bench run: Total 1600773 Hits 758737 hit rate (%) 47

So almost half of the time the code is triggered by relative_rank(Us, s) ==
RANK_5. I am tempted to ease the pressure by committing:

    if (relative_rank(Us, s) > RANK_5) {
        .....
    }

instead of your original; this should roughly halve the calls to
popcount. What do you think?
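
For readers unfamiliar with this kind of instrumentation, a minimal stand-in for a hit counter is sketched below. It is not Stockfish's dbg_hit_on() implementation; HitCounter and its members are hypothetical names, and the only figures taken from the thread are the Total and Hits numbers above.

```cpp
#include <cstdint>

// Counts how often a condition holds out of all the times it is evaluated.
struct HitCounter {
    std::uint64_t total = 0;
    std::uint64_t hits = 0;

    // Record one evaluation of the condition.
    void hit_on(bool condition) {
        ++total;
        if (condition) ++hits;
    }

    // Integer percentage, truncated; 0 when nothing was recorded.
    std::uint64_t rate_percent() const {
        return total ? 100 * hits / total : 0;
    }
};
```

Plugging in Total 1600773 and Hits 758737 gives a truncated rate of 47, in line with the bench output quoted above.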

@glinscott
Contributor Author

So, I just took another look at the test I had run with your optimization,
and realized I munged it up. I had left a changed queen attacks pawn value
in, and I in fact tested with rank > 5. Bad code here:
glinscott/Stockfish@master...rook7th.

I'm running a new test with your optimized version, with the correct queen
value, and rank >= 5 now, and we'll see how it goes.

I had run a rank > 5 test with popcount a while back, and it didn't do as
well, I seem to recall, but I may have changed some weights as well. So I'm
a bit leery of changing it. We could do an if (pawns), if (more_than_one)
popcount style check. The popcount should then be hit very infrequently.
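
The "if (pawns), if (more_than_one) popcount" idea can be sketched as follows: handle the empty and single-pawn cases with cheap bit tests, and fall back to a full count only when at least two pawns share the rank. This is a standalone illustration, not engine code; count_guarded and slow_popcount are hypothetical names.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Stand-in for a software popcount, the costly path on builds without
// a hardware instruction.
inline int slow_popcount(Bitboard b) {
    int n = 0;
    for (; b; b &= b - 1) ++n;
    return n;
}

// Same result as slow_popcount, but the loop is only reached for 2+ bits.
inline int count_guarded(Bitboard pawns) {
    if (!pawns) return 0;                  // no pawns on the rank
    if (!(pawns & (pawns - 1))) return 1;  // exactly one bit set
    return slow_popcount(pawns);           // two or more pawns
}
```

Unlike the 1x/3x shortcut, this keeps the evaluation identical to the popcount version, so it changes only the speed and not the bench signature.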

@glinscott
Contributor Author

In progress run with your optimization (this time without messing things up :). Looks pretty good, so I'd say we go with your version. I'll let this keep running to get a better comparison though.

ELO: 10.18 +- 99%: 13.64 95%: 10.36
LOS: 99.95%
Wins: 816 Losses: 689 Draws: 2830 Total: 4335

@mcostalba
Owner

Thanks! Yes please, let it keep running. I will also switch to testing the
cheaper version when done with the current one, probably tomorrow.

@glinscott
Contributor Author

Still looking very solid. I think it's safe to go with your version, assuming the longer test pans out for popcount version.

ELO: 8.49 +- 99%: 9.44 95%: 7.17
LOS: 100.00%
Wins: 1649 Losses: 1427 Draws: 5965 Total: 9041

@glinscott
Contributor Author

I've updated the rook7th branch with the version I'm testing currently, just to make sure we are on the same page :).

glinscott/Stockfish@master...rook7th

@mcostalba
Owner

Actually we are not on the same page!

In my version the condition is:

(Piece == ROOK || Piece == QUEEN) && relative_rank(Us, s) > RANK_5

instead of

(Piece == ROOK || Piece == QUEEN) && relative_rank(Us, s) >= RANK_5

So you are testing with RANK_5 included.

To avoid misunderstandings I have pushed a new branch:

https://github.com/mcostalba/Stockfish/commits/major_attacks_pawns

There, in the latest 3 commits, you can see the original version with
popcount that I have tested, then the cheap version (without rank 5), which
shows a regression although the number of games is low, and finally the
latest commit, which is the one I am testing now.

@glinscott
Contributor Author

Ah, yes, so, having the condition be >= RANK_5 is important I think. Here are the final results from the rook7th branch, with your optimization.

ELO: 8.22 +- 99%: 5.79 95%: 4.40
LOS: 100.00%
Wins: 4406 Losses: 3837 Draws: 15757 Total: 24000

@glinscott
Contributor Author

In the major_attacks_pawns branch, I see "if (relative_rank(Us, s) == RANK_5)", which could explain the regression.

@mcostalba
Owner

Actually this is what I am testing now, because I have understood that rank 5
is important. Please see the log messages of the two previous commits for
test results.

@glinscott
Contributor Author

Ah, sorry, I missed that. So, for the cheap version commit, you had the
rank > 5 test. It would be interesting if rank == 5 was as good, but my
suspicion says no (although who knows in testing!). I'd bet on rank >= 5,
with no popcount, which seems equivalent to the popcount version when
tested at my time control.

@mcostalba
Owner

OK, your suspicion was right: after 1K games the results are inconclusive
(same ELO), so I have pushed and started to test the one you bet on ;-).
Could you please verify that the version now under test:

https://github.com/mcostalba/Stockfish/tree/major_attacks_pawns

is the correct one?

The 'bench' signature of this candidate is: 5714962

@glinscott
Contributor Author

Argh, my bench is 4937286. I checked for the difference, and I was running
with @RyanTaker's updated RookBonusPerPawn and QueenBonusPerPawn. I can't
test the right thing, apparently! The bonuses I used for the test that
finished the 24k games are:

const Score RookBonusPerPawn = make_score(3, 38);
const Score QueenBonusPerPawn = make_score(1, 30);

And the originals from the popcount version:

const Score RookBonusPerPawn = make_score(3, 48);
const Score QueenBonusPerPawn = make_score(1, 40);

I am now re-running the test with the 48,40 values for non-popcount,
rank >= 5, and I do see bench of 5714962 for that. I think either set of values
would be fine. But since the 24k game test was done with the 38,30, might
be better to go with those.

Gary

@glinscott
Contributor Author

Well, results are certainly not conclusive, but not looking great for the 48,40 test right now.

ELO: -13.44 +- 99%: 36.79 95%: 27.93
LOS: 6.58%
Wins: 95 Losses: 117 Draws: 383 Total: 595

@glinscott
Contributor Author

Still pretty bad with 48,40. Looks like those weights are pretty elo sensitive!

ELO: -14.30 +- 99%: 25.21 95%: 19.15
LOS: 0.76%
Wins: 195 Losses: 246 Draws: 823 Total: 1264

@jromang
Contributor

jromang commented Sep 21, 2012

I'm trying to run some tests on my side with cutechess-cli; what tool do
you use to calculate the Elo?

@glinscott
Contributor Author

It's a really simple python script. https://gist.github.com/3762136.
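
The linked gist is not reproduced here. As a rough illustration of how such figures are commonly derived from win/loss/draw counts, the sketch below (in C++, to match the other code in this thread) uses the standard logistic Elo formula and a normal-approximation LOS. These formulas reproduce the ELO, Win% and LOS values in the opening comment of this thread, but that does not confirm they are what the script uses. The function names are hypothetical, and the error-bar columns are not computed.

```cpp
#include <cmath>

// Score fraction: a win counts 1, a draw 1/2, a loss 0.
inline double score_fraction(double wins, double losses, double draws) {
    return (wins + 0.5 * draws) / (wins + losses + draws);
}

// Elo difference implied by a score fraction under the logistic model.
// Only meaningful for 0 < score < 1.
inline double elo_from_score(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Likelihood of superiority: normal approximation on decisive games only.
inline double los(double wins, double losses) {
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * (wins + losses))));
}
```

For Wins: 3390 Losses: 2972 Draws: 11323, score_fraction gives about 0.5118 and elo_from_score gives about 8.21. Draws do not enter the LOS formula at all, which is why a long run with many draws can still reach a LOS very close to 100%.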

@jromang
Contributor

jromang commented Sep 21, 2012

Thanks ! :-)
My results so far : 603053b vs sf_2.3
ELO: 3.42 +- 99%: 31.59 95%: 23.97
LOS: 67.54%
Wins: 158 Losses: 150 Draws: 504 Total: 812

@mcostalba
Owner

OK, I will check how my running test is going later this evening.

I am coming to the conclusion that the wisest choice is to release with the
popcount version, which is the only one tested by both of us ;-) with good
results.

Then, after the release, we can tune the parameters with CLOP+cutechess
sessions, always with the popcount version. As a last step, once the
optimal coefficients have been found, we can try to downgrade to the simpler
no-popcount version and verify whether it still holds.

Because a CLOP session takes some days and I'd really like to
release tomorrow, I'd go with the popcount version.

Comments?

@jromang
Contributor

jromang commented Sep 21, 2012

If you have a CLOP howto, I would be happy to launch the tests :-)

So far, here are the results of your branch:
Score of 603053b vs sf_2.3: 545 - 562 - 1893 [0.497] 3000

@glinscott
Contributor Author

@mcostalba, Yes, I think I agree with you :). We can tune afterwards, but popcount tested solidly at both time controls. Tuning for speed/32 bit seems like an excellent job for CLOP.

@jromang, CLOP is a bit of work to set up, and requires that the parameters to be tuned are accessible as uci options, so requires minor source changes usually. But it would be great having someone else tuning things! What platform are you running on?

@mcostalba
Owner

OK, I'd suggest you stop it (not good).

cutechess already includes a glue script, clop-cutechess-cli.py, to
connect CLOP:

https://github.com/cutechess/cutechess/tree/master/tools

It is very easy to use. I can send you an example later this evening when I
come back home.

@jromang
Contributor

jromang commented Sep 21, 2012

I'm on linux (64bit), no problems for the uci options changes...but I don't
know where to start with clop :-) I think I will have to expose the
parameters I want to tune as UCI parameters ?

@mcostalba
Owner

Yes, you have to expose the parameters as UCI parameters, but this is very
easy. Then you have to refer to those parameters in the CLOP config file, and
bind it to the cutechess glue script.

It is more difficult to explain than to do: later this evening I will set up
an example tuning branch and everything will be clear :-)
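
What "exposing a parameter as a UCI option" amounts to can be sketched independently of the engine: the tuner sends a "setoption name <id> value <x>" command (standard UCI syntax), and the engine must store that value where the evaluation reads it. The code below is not taken from the clop_tuning branch; Tunables, apply_setoption and the option names used with it are assumptions for illustration only.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical registry of tunable integer parameters, keyed by option name.
using Tunables = std::map<std::string, int>;

// Applies a "setoption name <id> value <x>" line to a known tunable.
// Returns true only if the option exists and the value parsed as an integer.
inline bool apply_setoption(Tunables& params, const std::string& line) {
    std::istringstream is(line);
    std::string token, name;

    if (!(is >> token) || token != "setoption") return false;
    if (!(is >> token) || token != "name") return false;

    // The option name may contain spaces; read tokens up to "value".
    while (is >> token && token != "value")
        name += (name.empty() ? "" : " ") + token;

    int value = 0;
    if (token != "value" || !(is >> value)) return false;

    auto it = params.find(name);
    if (it == params.end()) return false;
    it->second = value;
    return true;
}
```

Rejecting unknown names rather than silently ignoring them is a deliberate choice in this sketch: a misspelled option in the tuner configuration would otherwise leave the engine playing with its default values for the whole session.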

@jromang
Contributor

jromang commented Sep 21, 2012

I made the UCI changes :
41fce19dd9c3b42df1c59c498dd533534a10b73e
Is this the right way to do it ?

@mcostalba
Owner

No it is not.

I have pushed branch "clop_tuning":

https://github.com/mcostalba/Stockfish/tree/clop_tuning

There I have done all the setup to tune the endgame values for both rook and
queen, in a step-by-step approach to make it clear how it works. Please
read each of the 3 commits and it should be evident. In case of doubt, ask.

@jromang
Contributor

jromang commented Sep 21, 2012

Thanks marco, this is crystal clear :-)
I think my computer will be 'clopping' a lot in the future !

@jromang
Contributor

jromang commented Sep 22, 2012

Here are my results so far (still running) :

Results
Plot1
Plot2

If I understand correctly, it seems RookOn7thBonus and QueenOn7thBonus are not useful for Elo gains?

@mcostalba
Owner

Have you double-checked that everything works? Namely, that the options are
actually changed in SF?

Before a long test session I add some file logging in SF to be sure the
parameters are actually changed according to the values in CLOP, just to be
sure not to run 30K games for nothing ;-)

You can use the Log class, which comes in handy in these cases.

@mcostalba
Owner

I have pushed a patch to clop_tuning to better clarify what I mean:

a2976fd

@glinscott
Contributor Author

It looks like RookBonusPerPawn/QueenBonusPerPawn are much more ELO sensitive, as CLOP is focusing in on that narrow area. Might be worth trying just optimizing those. Also, the values it has right now could be good, CLOP's elo estimates are sometimes off by a bit (usually too optimistic, but not always).

@jromang
Contributor

jromang commented Sep 22, 2012

I will let my test run until I have some 'Max' values, but the second test
confirms that RookBonusPerPawn/QueenBonusPerPawn is more sensitive.
Thanks for all your explanations about CLOP, Gary... I'm really having fun
with this tool :-) I wish I had a cluster to run hundreds of games
simultaneously!

@glinscott closed this on Sep 22, 2012
joergoster pushed a commit to joergoster/Stockfish-old that referenced this pull request Jun 30, 2020