Improved View of the 7th Rank #22
Conversation
A rook on the 7th rank does not deserve a large bonus if there are no pawns on that rank.
|
Hello. |
|
Here is a Windows executable of Ryan's branch: https://dl.dropbox.com/u/3042900/stockfish.7z |
|
Thanks Ryan for your efforts. I have tested your first version at a fast TC of 15"+0.05 just to get an idea. Grand totals after 2815 games (crashes: 0) for ryan_2c577 - master_3df2c: 479 - 498 - 1838, ELO -2 (+- 7.7). So I failed to measure any increase. Best |
|
Hi Ryan, Very interesting idea! Is your intent to have the pawn bonus be applied whenever a rook/queen is on the same rank as enemy pawns (not just the seventh rank)? One thing, when using bitboards, it's usually more efficient to operate directly on the bitboards. Check out the commit at glinscott@2c5acc7 for an example of how this could be done. For the performance of the patch, an ELO gain of 10 would be fantastic. If you are seeing an ELO gain of 80, there is probably something wrong with the tests. Thanks, [Edit: Just saw Marco's test. I think there are good ideas to be explored here, they just need to be tested thoroughly.] |
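To make the bitboard point concrete, here is a minimal sketch (my own illustration, not the linked commit; Bitboard, RankMask, and pawns_on_rank are assumed names): masking the enemy pawn bitboard with a precomputed rank mask and counting bits replaces a per-square loop with a single AND and a popcount.

```cpp
// A minimal sketch of the bitboard approach (illustrative, not the
// linked commit): mask the enemy pawn bitboard with a precomputed
// rank mask and count bits, instead of looping over squares.

#include <cstdint>

typedef uint64_t Bitboard;

// Hypothetical precomputed masks: RankMask[r] has the 8 bits of rank r set.
extern Bitboard RankMask[8];

int pawns_on_rank(Bitboard enemyPawns, int rookRank) {
    // One AND plus one popcount (GCC/Clang builtin; engines typically
    // wrap this behind a portable popcount helper).
    return __builtin_popcountll(enemyPawns & RankMask[rookRank]);
}
```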
|
Yes, I second Gary: a failure doesn't mean the idea is not good, simply that it perhaps should be 'massaged' a bit. As a hint I'd suggest to first test with short TC; if you see something interesting at these TC then verify with longer ones. It is faster than testing directly at longer ones (40 moves in 4 minutes is a very long TC by normal engine development standards; OTOH a test run on just 150 games definitely brings little information to the table). |
|
Just as an example of how many games it takes to become certain of an improvement, I'm running this as a test locally at 4 seconds + 0.05/move, with the results below. It looks great so far, showing +19.7 ELO, but the error bar at 99% is +-50 ELO. So, in reality it could be a -30 ELO change, or +70 ELO change. And that is after 336 games. Wins: 71 Losses: 52 Draws: 213 |
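For reference, error bars like these can be reproduced roughly as follows. This is a sketch of the standard approach, not code from the thread: convert the mean per-game score to ELO with the logistic formula and push a normal-approximation confidence interval through it. Depending on the exact method used, the resulting interval may differ somewhat from the figures quoted above.

```cpp
// A sketch (not from the thread) of estimating ELO and a confidence
// interval from a match result: elo = -400 * log10(1/score - 1),
// with a normal approximation on the per-game score
// (win = 1, draw = 0.5, loss = 0).

#include <cmath>
#include <cstdio>

double elo(double score) { return -400.0 * std::log10(1.0 / score - 1.0); }

int main() {
    double W = 71, L = 52, D = 213;   // sample input: the figures above
    double N = W + L + D;
    double score = (W + 0.5 * D) / N; // mean per-game score
    // Sample variance of the per-game score, then standard error of the mean.
    double var = (W * (1.0 - score) * (1.0 - score)
                + L * score * score
                + D * (0.5 - score) * (0.5 - score)) / (N - 1.0);
    double se  = std::sqrt(var / N);
    double z99 = 2.576;               // two-sided 99% normal quantile
    std::printf("ELO %+.1f, 99%% interval [%+.1f, %+.1f]\n",
                elo(score), elo(score - z99 * se), elo(score + z99 * se));
    return 0;
}
```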
|
And to show how fickle the testing gods can be, here is an update 20 minutes later: Wins: 134 Losses: 128 Draws: 474 |
|
Thank you for your interest in my project. @glinscott The intent of this branch is as you thought. A few things should be noted while testing: please ensure that the build you are running is compiled on the same computer as the other. My own tests have yielded: Wins: 32, ELO increase: 56 points. Regardless, thank you for your testing. |
|
Your idea seems very promising, that's why I gave it a test! I'm testing on Linux, so the binaries are definitely compiled on the same system ;). Those are solid results indeed, and warrant continuing the test. However, if we compute the 95% error bar, it's still +- 69 elo. 99% error is at +-109 elo. You'll want to play at least 1000 games before the error bars start becoming reasonable. That's why so many engine authors are testing at super-blitz controls. There just isn't enough time/computer power to verify otherwise. |
|
To me, it seems that the largest problem is performance. At a 1 second control, an extra 10% speed is very important and may yield 3 extra ply (based on computer speeds). This should be noted when testing. Perhaps when the performance issues are worked on, the tests will go better. Although I am not an expert, I believe that using pure mathematics to find the error margin is inaccurate. In computer chess, a code change will cause the computer to do better in some positions and potentially worse in others. Therefore, although there is certainly still a margin of error in computer chess, the number of games required to prove improvement is lower than in the most common application, polling. Regardless, if you disagree, you are much more qualified than I am in the subject of computer chess. |
|
I tested the speed and found that my change makes the nodes/second 7% slower. Due to this, it makes more sense to test glinscott's revision as it resolves the bulk of the performance issues. Although the tests are inconclusive, glinscott's version tested at only 2% slower. |
|
We can increase the speed even a little more by only doing the rook/queen attacking pawns past relative rank 4. I'm testing with that locally. No results yet unfortunately. Attacking pawns by rooks/queens is already calculated in the evaluate_threats function, so there is a little overlap there. Might be worth decreasing the entries for rook attacks pawn, queen attacks pawn a little. |
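The relative-rank restriction can be sketched like this (illustrative names and layout, not glinscott's actual diff): mask out the near half of the board so the bonus computation is skipped entirely when the piece hasn't crossed relative rank 4.

```cpp
// Illustrative sketch (not the actual diff): skip the bonus
// computation unless the rook/queen sits past relative rank 4.
// Assumes the usual little-endian layout where A1 is bit 0.

#include <cstdint>

typedef uint64_t Bitboard;

// Ranks 5-8 from White's point of view; mirrored for Black.
const Bitboard FarHalfForWhite = 0xFFFFFFFF00000000ULL;

bool past_rank4(Bitboard rookOrQueenBB, bool white) {
    Bitboard farHalf = white ? FarHalfForWhite : ~FarHalfForWhite;
    return (rookOrQueenBB & farHalf) != 0;  // otherwise skip the bonus
}
```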
|
Any results from the optimization? It was sounding promising. |
|
I tried a few things, but couldn't get anything showing gains beyond error bars. |
|
I have a new computer that should be better for testing. I am going to implement a temporary UCI option and work on the bonus amount. |
|
If you want to do some serious testing I'd suggest to greatly increase the number of games; OTOH you can reduce the time control to, say, 15"+0.1 per game. All in all each test requires about 2 days. |
|
Right now, I am going for a 90% LOS. It should also be more accurate as I am running it with a fixed depth in half-moves instead of a time limit. This should not cause an inaccuracy, as the speed is the same as the other variations. My current test displays an 85% LOS at the moment. |
|
No, testing at fixed depth is not accurate because it is an artificial condition. The correct testing is with time limits. You can test at fixed depth only as a first indication. |
|
Would this mistake modify the result of an LOS test? Also, what goal do you generally aim for in an LOS test? |
|
An LOS test has meaning only if you play at least some thousands of games. Sorry, but as I said, testing at fixed depth does not give a definitive answer, so a test like that can only be indicative. |
|
Hmm, from what I gathered, LOS calculations account for the variation in testing. For example, if a computer wins 10 out of 10 games, there is little chance of the other computer being better. Also, I am not currently testing against the current branch, but instead my own version with only the bonus adjusted. I am going to reset the test to your recommended time controls. One question though: does it make sense to test 3 versions in a gauntlet, or simply 2 versions and go from there? |
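For what it's worth, the LOS figures traded back and forth in this thread follow the standard engine-testing formula, in which draws cancel out. A small sketch (my own, using the earlier 134-128 win/loss figures only as sample input):

```cpp
// The standard likelihood-of-superiority formula used in engine testing
// (illustration, not code from the thread). Draws cancel out, so
// LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).

#include <cmath>
#include <cstdio>

double los(double wins, double losses) {
    return 0.5 * (1.0 + std::erf((wins - losses)
                                 / std::sqrt(2.0 * (wins + losses))));
}

int main() {
    // Sample input only: the 134-128 win/loss figures quoted earlier.
    std::printf("LOS = %.1f%%\n", 100.0 * los(134, 128));
    return 0;
}
```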
|
Another quick question: do testing suites work well? They seem like a good idea, but they also seem to have their flaws. |
|
Testing suites can give an indication of whether a change is helpful, but games are really the gold standard. I tested the latest change I made (restricting the rook/queen bonus to rank 6 and above); the results are still well within the margin of error, even though LOS is 88%. Also, to determine the improvement, you should test against the version without any changes. |
|
Even to determine improvement over the other version? I am looking to put up the best possible version so that I will be able to get a solid elo improvement in the pushed version. |
|
The number of games it takes to get a significant result is the problem when testing modified versions against each other. Eventually you will have to test against the baseline anyway, to confirm it's an improvement. |
Greatly improved the bonus amount to better evaluate a position.
We do not give a bonus if the piece is on the 8th or 1st rank, as pawns can never be there. In addition, we check that there is at least one pawn on the rank before calculating its population.
|
@glinscott Your version, in which the piece must be above a certain rank, seems to be unnecessary with my new optimization. If there are no pawns on the rank, then it will not calculate the population. In my experiments, this should cut the time of calculating the bonus (excluding the construction of the bitboard) by a factor of 7. In addition, we no longer check for pawns when the rook is on the eighth or first rank, as it often is. |
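A minimal sketch of the two early-outs described here (illustrative names and structure, not the actual patch): skip everything on ranks 1 and 8, and skip the popcount when no enemy pawns share the piece's rank.

```cpp
// Sketch of the early-outs described above (illustrative names).

#include <cstdint>

typedef uint64_t Bitboard;
extern Bitboard RankMask[8];  // RankMask[r]: the 8 squares of rank r

int rank_pawn_bonus(int pieceRank, Bitboard enemyPawns, int weight) {
    if (pieceRank == 0 || pieceRank == 7)
        return 0;                      // pawns can never be on rank 1 or 8

    Bitboard pawnsOnRank = enemyPawns & RankMask[pieceRank];
    if (!pawnsOnRank)
        return 0;                      // cheap zero test before popcount

    return weight * __builtin_popcountll(pawnsOnRank);
}
```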
|
That is a good optimization. However, the idea behind checking only rank 5 or greater was also to limit the bonus a little. Probably doesn't make much difference though. For your new patch, the condition for checking for rank 1 or 8 is backwards. Also, the check for the rook being on rank 7 is incorrect. |
|
One really useful tip when doing performance optimizations, which Marco showed me: make sure that the total node count of the stockfish bench command matches after doing your optimization. |
|
I tweaked things a bit, and the current version is looking promising after a few thousand games. Also added in your optimization, which did indeed help NPS. https://github.com/glinscott/Stockfish/compare/rook7th Wins: 873 Losses: 733 Draws: 2758 |
Fixed problems with the 7th rank bonus (thanks to Glinscott). Correctly implemented the previous change in value.
Reimplemented the optimization that cuts the calculation time when a rook or queen is on the first or eighth rank.
|
Your corrections to my code have been implemented. Besides a formatting change, I removed your rank-six-and-above edit and added back my optimization that skips the calculation on the eighth and first ranks. My major bonus-amount tweak was added back, as I had accidentally pushed the testing version, not the compile-ready version. |
|
Cool, did you get a chance to check the bench nodes though? The check for the rook being on rank 1 or 8 is still backwards. Also, I'm not sure we really need that check, as the "if (pawns)" check is pretty quick by itself. |
|
@RyanTaker, I'm now running a test of your current version (with the fixed rank 1/8 check), against this version #23. Will update in a few hours when the error bars are reasonable. |
|
Alright, thank you for helping out |
|
Current results are inconclusive, but looking like it could be a small help. This is running this pull version against #23. Wins: 813 Losses: 763 Draws: 2997 |
|
Okay, we will have to see how mcostalba's tests do on your pull. |
|
I am deleting this pull as it has been made irrelevant by glinscott's pushed version. |
|
Thanks for the awesome idea Ryan! Your weights were really good as well. How did you pick them? |
|
I started by deriving them from the original 7th rank bonus, dividing it by 5. From there, I just did small tests for tuning. |
In the old version of evaluate.cpp, the 7th rank bonus was based solely on the rook's position and the king's.
My revision divides this bonus into two parts. I kept the old bonus, but lowered it by a large margin; a second bonus is given based on the number of pawns on the rook's rank.
This change is based on the idea that a rook on the 7th rank is useless (or nearly so) unless there are pawns on that rank.
The most promising part of this update is that most of the games resulted in a win or a loss, suggesting there is a lot of room for improvement on top of it.
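Putting the description together, the evaluation change can be sketched as follows. The weights and names here are placeholders, not the values actually committed: a much-reduced flat 7th-rank bonus, plus a per-pawn term on the piece's rank.

```cpp
// Hedged sketch of the two-part bonus the description outlines;
// weights and names are placeholders, not the committed values.
// Simplified to White's point of view, ranks 0-based.

#include <cstdint>

typedef uint64_t Bitboard;
extern Bitboard RankMask[8];

const int FlatSeventhBonus = 20;  // placeholder: old flat bonus, much reduced
const int PerPawnBonus     = 10;  // placeholder: per enemy pawn on the rank

int rook_rank_bonus(int rank, Bitboard enemyPawns) {
    int bonus = 0;
    if (rank == 6)                // the 7th rank
        bonus += FlatSeventhBonus;
    bonus += PerPawnBonus
           * __builtin_popcountll(enemyPawns & RankMask[rank]);
    return bonus;
}
```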