
Improved View of the 7th Rank #22

Closed
wants to merge 11 commits into from

Conversation

RyanTaker
Contributor

In the old version of evaluate.cpp, the 7th-rank bonus was based solely on the positions of the rook and the enemy king.

My revision divides this bonus into two parts. I kept the old bonus, but lowered it by a large margin. It also gives a bonus based on the number of pawns on the rook's rank.

This change is based on the idea that a rook on the 7th rank is useless (or nearly so) unless there are pawns on that rank.

The most promising part of this update is that most of the games resulted in a win or a loss, suggesting considerable room for improvement on top of it.
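The two-part scheme described above can be sketched with bitboards roughly like this (a minimal illustration only: the constant names, values, and function signature are hypothetical, not the actual evaluate.cpp code):

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = uint64_t;

// Hypothetical values; the real bonuses were tuned separately.
constexpr int RookOn7thBase      = 10; // old 7th-rank bonus, "lowered by a large margin"
constexpr int BonusPerPawnOnRank = 12; // extra bonus per enemy pawn on the rook's rank

// Bitboard of all squares on rank r (0-7, from White's point of view).
inline Bitboard rank_bb(int r) { return 0xFFULL << (8 * r); }

// Count set bits (Kernighan's method).
inline int popcount(Bitboard b) {
    int n = 0;
    while (b) { b &= b - 1; ++n; }
    return n;
}

// Sketch: a rook on rank r keeps a reduced 7th-rank bonus, plus a
// per-pawn bonus for enemy pawns on the same rank.
int rook_rank_bonus(int r, Bitboard enemyPawns) {
    int bonus = 0;
    if (r == 6) // the 7th rank is index 6 when ranks are 0-based
        bonus += RookOn7thBase;
    bonus += BonusPerPawnOnRank * popcount(enemyPawns & rank_bb(r));
    return bonus;
}
```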

@Gleperlier

Hello.
I am up for longer time control testing. Just tell me what kind of testing you want, and as I am a Windows user, may I have a link to download an exe?
Gab

@jromang
Contributor

jromang commented Sep 1, 2012

Here is a windows executable of Ryan's branch : https://dl.dropbox.com/u/3042900/stockfish.7z

@mcostalba
Owner

Thanks Ryan for your efforts.

I have tested your first version at fast TC of 15"+0.05 just to get an idea:

Grand totals after 2815 games (crashes: 0) for ryan

ryan_2c577 - master_3df2c: 479 - 498 - 1838 ELO -2 (+- 7.7)

So I failed to measure any increase.

Best
Marco

@glinscott
Contributor

Hi Ryan,

Very interesting idea! Is your intent to have the pawn bonus be applied whenever a rook/queen is on the same rank as enemy pawns (not just the seventh rank)?

One thing, when using bitboards, it's usually more efficient to operate directly on the bitboards. Check out the commit at glinscott@2c5acc7 for an example of how this could be done.

For the performance of the patch, an ELO gain of 10 would be fantastic. If you are seeing an ELO gain of 80, there is probably something wrong with the tests.

Thanks,
Gary

[Edit: Just saw Marco's test. I think there are good ideas to be explored here, they just need to be tested thoroughly.]

@mcostalba
Owner

Yes, I second Gary: a failure doesn't mean the idea is not good, simply that it perhaps should be 'massaged' a bit.

As a hint, I'd suggest first testing at short TC; if you see something interesting at these TC, then verify with longer ones. It is faster than testing directly at longer ones (40 moves in 4 minutes is a very long TC by normal engine-development standards; on the other hand, a test run on just 150 games brings definitely little information to the table).

@glinscott
Contributor

Just as an example of how many games it takes to become certain of an improvement, I'm running this as a test locally at 4 seconds + 0.05/move, with the results below. It looks great so far, showing +19.7 ELO, but the error bar at 99% is +-50 ELO. So, in reality it could be a -30 ELO change, or +70 ELO change. And that is after 336 games.

Wins: 71 Losses: 52 Draws: 213
LOS: 95.622362%
ELO: 19.667636 +- 99%: 49.785935 95%: 32.394194
Win%: 52.827381 +- 99%: 7.036745 95%: 4.609341
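The ELO figures in these reports come from the standard logistic mapping of the score fraction. A minimal sketch follows; the point estimate reproduces the +19.67 above, but note that error-bar conventions vary between testing tools, so the interval computed here is just one common trinomial-model choice and does not match the quoted bounds exactly:

```cpp
#include <cassert>
#include <cmath>

// Score fraction: wins count 1, draws 0.5, losses 0.
double score(int W, int L, int D) {
    return (W + 0.5 * D) / (W + L + D);
}

// Logistic ELO difference corresponding to a score fraction s.
double elo(double s) {
    return -400.0 * std::log10(1.0 / s - 1.0);
}

// 95% error bar on the ELO estimate, using the per-game score variance
// of the win/draw/loss (trinomial) outcome model.
double elo_error95(int W, int L, int D) {
    int N = W + L + D;
    double s = score(W, L, D);
    double var = (W * (1.0 - s) * (1.0 - s)
                + D * (0.5 - s) * (0.5 - s)
                + L * s * s) / N;
    double se = std::sqrt(var / N);                            // std. error of the mean score
    double dEloDs = 400.0 / (std::log(10.0) * s * (1.0 - s));  // slope of elo(s) at s
    return 1.96 * se * dEloDs;
}
```

Note how slowly the error shrinks: it falls only with the square root of the number of games, which is why thousands of games are needed before small ELO differences become significant.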

@glinscott
Contributor

And to show how fickle the testing gods can be, here is an update 20 minutes later:

Wins: 134 Losses: 128 Draws: 474
LOS: 64.426182%
ELO: 2.832418 +- 99%: 33.190988 95%: 21.697846
Win%: 50.407609 +- 99%: 4.758079 95%: 3.116726

@RyanTaker
Contributor Author

Thank you for your interest in my project. @glinscott The intent of this branch is as you thought.
It started as applying exclusively to the 7th rank, but was later expanded in scope.

While testing, some elements should be noted.

Please ensure that the build you are running is compiled on the same computer as the other.
In addition, blitz is notorious for being a bad indicator of improvement.

My own tests have yielded:
Tournament game - 4 mins per 40 moves
Ryan Stockfish - Benchmark Stockfish

Wins: 32
Losses: 19
Draws: 31

ELO increase: 56 points

Regardless, thank you for your testing.

@glinscott
Contributor

Your idea seems very promising, that's why I gave it a test! I'm testing on Linux, so the binaries are definitely compiled on the same system ;).

Those are solid results indeed, and warrant continuing the test. However, if we compute the 95% error bar, it's still +- 69 elo. 99% error is at +-109 elo. You'll want to play at least 1000 games before the error bars start becoming reasonable. That's why so many engine authors are testing at super-blitz controls. There just isn't enough time/computer power to verify otherwise.

@RyanTaker
Contributor Author

To me, it seems that the largest problem is performance.
EDIT: Your commit seems to be a much more efficient method of performing the change.

In a 1-second control, an extra 10% speed is very important and may yield 3 extra ply (depending on computer speed).
In a 3-hour control, an extra 10% will potentially do nothing.

This should be noted when testing. Perhaps once the performance issues are worked on, the tests will work better.

Although I am not an expert, I believe that using pure mathematics to find the error margin is inaccurate. In computer chess, a code change will cause the computer to do better in some positions and potentially worse in others.
Because of this, the positions in which a computer does poorly are usually the same positions in which it will continue to do poorly.

Therefore, although there is certainly still a margin of error in computer chess, the number of games required to prove improvement is lower than in the most common application, polling.

Regardless, if you disagree: you are much more qualified than I am in the subject of computer chess.

@RyanTaker
Contributor Author

I tested the speed and found that my change makes the nodes/second 7% slower. Due to this, it makes more sense to test glinscott's revision as it resolves the bulk of the performance issues.

Although the tests are inconclusive, glinscott's version tested at only 2% slower.
@mcostalba this should be enough for an increase in ELO.

@glinscott
Contributor

We can increase the speed even a little more by only computing the rook/queen pawn-attack bonus past relative rank 4. I'm testing with that locally. No results yet, unfortunately.

Attacking pawns by rooks/queens is already calculated in the evaluate_threats function, so there is a little overlap there. Might be worth decreasing the entries for rook attacks pawn, queen attacks pawn a little.

@mobilewebdevs

Any results from the optimization? It was sounding promising.

@glinscott
Contributor

I tried a few things, but couldn't get anything showing gains beyond error bars.

@RyanTaker
Contributor Author

I have a new computer that should be better for testing. I am going to implement a temporary UCI option and work on the bonus amount.

@mcostalba
Owner

If you want to do some serious testing I'd suggest greatly increasing the number of tested games, to at least 5000.

OTOH you can reduce the time control to, say, 15"+0.1 per game; you can also test with a single thread if you, like me, don't like to hear the fan noise ;-)

All in all, each test requires about 2 days.

@RyanTaker
Contributor Author

Right now, I am going for a 90% LOS. It should also be more accurate, as I am running it with bounds of half-moves instead of time. This should not cause an inaccuracy, as the speed is the same as in the other variations.

My current test displays an 85% LOS at the moment.

@mcostalba
Owner

No, testing at fixed depth is not accurate, because it is an artificial limit: depth at midgame is different than at endgame, and each move can require a different depth.

The correct testing is with time limits. You can use fixed depth as a kind of pre-filtering, and then verify at time-control limits. Please take this just as a comment from someone who has done a lot of tests in the past: of course you are free to do whatever you want :-)

@RyanTaker
Contributor Author

Would this mistake modify the result of an LOS test? Also, what goal do you generally aim for in an LOS test?

@mcostalba
Owner

An LOS test has meaning only if you play at least some thousands of games. Sorry, but there are no shortcuts in testing (at least none that I know of).

As I said, testing at fixed depth does not give a definitive answer, so a verification test at time limits is mandatory for a patch to be committed, at least in SF.

@RyanTaker
Contributor Author

Hmm, from what I gathered, LOS calculations account for the variation in testing. For example, if a computer wins 10 out of 10 games, there is not a sufficient chance of the other computer being better.

Also, I am not currently testing against the current branch, but instead against my own version with only the bonus adjusted.

I am going to reset the test to your recommended time constraints. One question though: does it make sense to test 3 versions in a gauntlet, or simply 2 versions and go from there?

@RyanTaker
Contributor Author

Another quick question: do testing suites work well? They seem like a good idea, but they appear to have flaws.

@glinscott
Contributor

Testing suites can give an indication if a change is helpful, but games are really the gold standard.

I tested the latest change I made (restricting the rook/queen bonus to rank 6 and above), and here are the results:
Wins: 2887 Losses: 2799 Draws: 10472
LOS: 87.837990%
ELO: 1.892229 +- 99%: 7.053875 95%: 5.358304
Win%: 50.272311 +- 99%: 1.014852 95%: 0.770973

So, still well within margin of error, even though LOS is 88%. Also, to determine the improvement, you should test against the version without any changes.
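The LOS percentages quoted in these reports follow the standard normal-approximation formula, in which draws carry no signal and only the win/loss difference matters. A minimal sketch:

```cpp
#include <cassert>
#include <cmath>

// Likelihood of superiority: the probability that the first engine is
// stronger, given W wins and L losses (draws are ignored in this model).
// LOS = Phi((W - L) / sqrt(W + L)), written here via the error function.
double los(int W, int L) {
    return 0.5 * (1.0 + std::erf((W - L) / std::sqrt(2.0 * (W + L))));
}
```

Plugging in the figures above (2887 wins, 2799 losses) recovers the ~87.8% LOS quoted, which illustrates the point being made: an LOS near 90% can coexist with an ELO estimate still well inside its error bars.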

@RyanTaker
Contributor Author

Even to determine improvement over the other version? I am looking to put up the best possible version so that I will be able to get a solid elo improvement in the pushed version.

@glinscott
Contributor

The number of games it takes to get a significant result is the problem when testing modified versions against each other. Eventually you will have to test against the baseline anyway, to confirm it's an improvement.

Majorly improved the bonus amount to better evaluate a position.
We do not give a bonus if the piece is on the 8th or 1st rank, as pawns can never be there.
In addition, we check to ensure that there is at least one pawn on the rank before calculating its population.
@RyanTaker
Contributor Author

@glinscott Your version in which it must be above a certain rank seems to be unnecessary with my new optimization. If there are no pawns on the rank, then it will not calculate the population. In my experiments, this should cut the time of calculating the bonus (excluding the construction of the bitboard) by a factor of seven.

In addition, we no longer check for pawns when the rook is on the eighth or first rank, as it often is.
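The two early exits described here can be sketched like this (a hypothetical illustration with invented names, not the actual patch; note the rank test is written in the correct direction, which Glinscott points out below was reversed in the patch itself):

```cpp
#include <cassert>
#include <cstdint>

using Bitboard = uint64_t;

// Bitboard of all squares on rank r (0-7, from White's point of view).
inline Bitboard rank_bb(int r) { return 0xFFULL << (8 * r); }

// Count set bits (Kernighan's method).
inline int popcount(Bitboard b) {
    int n = 0;
    while (b) { b &= b - 1; ++n; }
    return n;
}

// Sketch of the optimization: skip ranks 1 and 8 entirely (pawns can
// never stand there), and only pay for the popcount when at least one
// enemy pawn is actually on the piece's rank.
int pawns_on_rank_bonus(int r, Bitboard enemyPawns, int perPawn) {
    if (r == 0 || r == 7)            // first or eighth rank: no pawns possible
        return 0;
    Bitboard pawns = enemyPawns & rank_bb(r);
    if (!pawns)                      // cheap zero test avoids the popcount
        return 0;
    return perPawn * popcount(pawns);
}
```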

@glinscott
Contributor

That is a good optimization. However, the idea behind checking only rank 5 or greater was also to limit the bonus a little. Probably doesn't make much difference though.

For your new patch, the condition for checking for rank 1 or 8 is backwards. Also, the check for the rook being on rank 7 is incorrect.

@glinscott
Contributor

One really useful tip when doing performance optimizations, which Marco showed me: make sure that the stockfish bench command's total node count matches after doing your optimization.

@glinscott
Contributor

I tweaked things a bit, and the current version is looking promising after a few thousand games. Also added in your optimization, which did indeed help NPS. https://github.com/glinscott/Stockfish/compare/rook7th

Wins: 873 Losses: 733 Draws: 2758
LOS: 99.976244%
ELO: 11.149789 +- 99%: 13.601650 95%: 10.327687
Win%: 51.604033 +- 99%: 1.951972 95%: 1.482893

Fixed problems with the 7th rank bonus (thanks to Glinscott).
Correctly implemented the previous change in value.
Reimplemented the optimization cutting the calculation time when a rook or queen is on the first or eighth rank.
@RyanTaker
Contributor Author

Your corrections to my code were implemented. Besides a formatting change, I removed your rank-six-and-above edit and added back my optimization that skips the calculation on the eighth and first ranks.

My major bonus amount tweak was added back as I accidentally pushed the testing version, not the compile-ready version.

@glinscott
Contributor

Cool, did you get a chance to check the bench nodes though? The check for the rook being on rank 1 or 8 is still backwards.

Also, I'm not sure we really need that check, as the "if (pawns)" check is pretty quick by itself.

@glinscott
Contributor

@RyanTaker, I'm now running a test of your current version (with the fixed rank 1/8 check), against this version #23. Will update in a few hours when the error bars are reasonable.

@RyanTaker
Contributor Author

Alright, thank you for helping out.

@glinscott
Contributor

Current results are inconclusive, but looking like it could be a small help. This is running this pull version against #23.

Wins: 813 Losses: 763 Draws: 2997
LOS: 89.600674%
ELO: 3.722938 +- 99%: 13.269499 95%: 10.077641
Win%: 50.535753 +- 99%: 1.907707 95%: 1.449266

@RyanTaker
Contributor Author

Okay, we will have to see how mcostalba's tests do on your pull.

@RyanTaker
Contributor Author

I am closing this pull as it has been made irrelevant by glinscott's pushed version.

@RyanTaker RyanTaker closed this Sep 22, 2012
@glinscott
Contributor

Thanks for the awesome idea Ryan! Your weights were really good as well. How did you pick them?

@RyanTaker
Contributor Author

I started by deriving them from the original 7th-rank bonus, dividing it by 5. From there, I just did small tests for tuning.
