Change of the score sent to the GUI #1868
Centipawn scoring has no intrinsic meaning. People simply agree that the scale is based on a pawn being worth roughly 100, as you quoted in its original form. Changing it to any value you see fit is simple, since you can make the change and compile the code yourself, but there is no point in applying it to everyone else.
@noobpwnftw with 100 / PawnValueEg rather than 70 / PawnValueEg, the "GUI" may adjudicate "Resign at N centipawns" earlier.
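A minimal sketch of why the scaling matters for adjudication. It assumes PawnValueEg = 208 purely for illustration, and models the GUI rule as one-sided: resign once the reported score has been at or below -N centipawns for `movecount` consecutive moves. This is similar in spirit to cutechess-cli's `-resign movecount=... score=...` option, but it is not that tool's code. `to_cp` and `resign_index` are hypothetical helpers.

```cpp
#include <vector>

// Assumed for illustration only; the real constant is defined in Stockfish's source.
constexpr int PawnValueEg = 208;

// Same expression as in uci.cpp, with the numerator made a parameter
// (100 is the current scaling, 70 the proposed one).
constexpr int to_cp(int v, int numerator) { return v * numerator / PawnValueEg; }

// Index of the move at which the (simplified, hypothetical) resign rule fires, or -1.
// `internal` holds one engine's internal scores, from its own point of view, one per move.
int resign_index(const std::vector<int>& internal, int numerator, int n, int movecount) {
    int streak = 0;
    for (int i = 0; i < static_cast<int>(internal.size()); ++i) {
        // Count consecutive moves whose reported score is at least n centipawns below zero.
        streak = (to_cp(internal[i], numerator) <= -n) ? streak + 1 : 0;
        if (streak >= movecount) return i;
    }
    return -1;
}
```

With N = 400 and `movecount` = 3, the internal scores {-624, -832, -936, -1040, -1248, -1248} are reported as -300, -400, -450, -500, ... with numerator 100, so the rule fires at the fourth move; with numerator 70 they read -210, -280, -315, -350, -420, -420 and the rule never fires within these six moves.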
I agree with @noobpwnftw: just increase the N value to make it resign later, or change the code for your own personal use. There is no right or wrong answer; it is what the user decides.
In my opinion, the problem isn't the resign threshold you can set in the GUI but coherence with the GUI's own score scale (in cp), which we know we can't change. For example, if you launch Stockfish in infinite analysis with contempt disabled, the score displayed by the GUI is about 36, and to me that is too optimistic.
What is this "the GUI" that has these properties (and that requires changing an engine used on millions of devices)?
Every known GUI: Fritz, Arena, etc.
According to Stack Exchange, the "decisive advantage" threshold is 150 between engines, 200 between GMs, and so on. A regression on Stockfish evaluations yielded a "winning chances" curve. Perhaps a similar experiment could be conducted for other engines.
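A "winning chances" curve of this kind is commonly modeled as a logistic function of the eval. The sketch below assumes that form; the scale `k` is a hypothetical parameter, not the coefficient from the regression mentioned above. `k_for_threshold` shows one way to read the different "decisive" thresholds: if "decisive" is defined as a fixed expected score p (the p = 0.75 used in the tests is an arbitrary assumption), then a 150 cp threshold implies a steeper curve than a 200 cp one.

```cpp
#include <cmath>

// Expected score (win = 1, draw = 0.5, loss = 0) for the side with a `cp` centipawn eval,
// under an assumed logistic model with scale k (per centipawn).
double expected_score(double cp, double k) {
    return 1.0 / (1.0 + std::exp(-k * cp));
}

// Scale k that makes an eval of `threshold` centipawns correspond to expected score p,
// for 0.5 < p < 1. Obtained by inverting the logistic: k = ln(p / (1 - p)) / threshold.
double k_for_threshold(double threshold, double p) {
    return std::log(p / (1.0 - p)) / threshold;
}
```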
Internally, we can continue to process our scores as we like. From ChessBase: The winning-chance curve published in the post just above is from 2014; one would have to repeat the experiment, and such a result could be used to produce calibrated output.
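Repeating the experiment amounts to fitting the curve's scale to game results. A sketch of one way to do it, assuming a one-parameter logistic model with no intercept, draws counted as 0.5, and a maximum-likelihood fit by Newton's method. `Sample` and `fit_k` are hypothetical names, and the data in the tests is made up.

```cpp
#include <cmath>
#include <vector>

// One observation: an eval in centipawns and the final result from that side's view
// (1 = win, 0.5 = draw, 0 = loss).
struct Sample {
    double cp;
    double result;
};

// Fit k in p = 1 / (1 + exp(-k * cp)) by maximising the log-likelihood with Newton's method.
// Assumes the data is not perfectly separable (otherwise the optimum is at infinite k).
double fit_k(const std::vector<Sample>& data, int iterations = 50) {
    double k = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double grad = 0.0, curv = 0.0;
        for (const Sample& s : data) {
            double p = 1.0 / (1.0 + std::exp(-k * s.cp));
            grad += (s.result - p) * s.cp;        // first derivative of log-likelihood
            curv += p * (1.0 - p) * s.cp * s.cp;  // minus the second derivative
        }
        if (curv == 0.0) break;
        k += grad / curv;  // Newton step on a concave objective
    }
    return k;
}
```

A real calibration would need a large, representative set of games; the fitted `k` is only as meaningful as the data behind it.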
This is not the question at all.
@amchess I do not understand: if that is not the question, why is it not possible for people to adjust the centipawn scale as they see fit by building a custom version of Stockfish?
One final thing: I made this modification in my derivative, ShashChess, and a correspondence chess GM told me its evaluation is aligned with, for example, Houdini and Komodo (and the GUIs).
For what it's worth, I do think the current Stockfish evaluation is "inflated," in the sense that if you analyze the start position with odds, the evals are much higher in magnitude than you'd expect:
b1-knight odds
a1-rook odds
queen odds
g2-pawn odds (I didn't want to cherry-pick the worst pawn to remove, which is likely f2, so I picked one of the worse ones, keeping in mind that this is balanced by the fact that it's White to move)
Normally you'd expect (as an approximation) -3 for knight odds at low depth, -5 for rook odds and -9 for queen odds, but these evals are higher in magnitude by about 30-50%.
As it turns out,
If people think this is an issue, one possibility that doesn't require periodically calibrating output against win rate is to use
Of course, there are objections to this approach:
I just wanted to offer this as a suggestion to keep the discussion going: not only is adjudication affected, but many people also use Stockfish to analyze human games and would benefit from knowing that, for example, +1 really does correspond to being a pawn up on average.
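To make "+1 corresponds to a pawn up on average" concrete, here is one possible calibration, offered as an assumption rather than the approach the commenter had in mind: collect the scores currently reported (with numerator 100) for positions where one side is exactly a pawn up, then choose the `uci.cpp` numerator so that their average reads +100. `calibrated_numerator` is a hypothetical helper.

```cpp
#include <numeric>
#include <vector>

// Given reported centipawn scores (current numerator 100) for positions where one side is
// exactly a pawn up, return the numerator that would make their average read +100.
// Assumes a non-empty input with a positive average.
int calibrated_numerator(const std::vector<int>& pawn_up_cp) {
    double mean = std::accumulate(pawn_up_cp.begin(), pawn_up_cp.end(), 0.0) / pawn_up_cp.size();
    return static_cast<int>(100.0 * 100.0 / mean + 0.5);  // round to the nearest integer
}
```

For example, if such positions averaged 143 cp, the helper would return 70; the choice of test positions is what would make or break this approach.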
Since when is depth 20 "low"?
It's "low" enough that the eval is quite stable around depth 15-25. It's enough to see that White has a much worse position by a knight, but that's about it. This is why I chose a quiet position rather than a tactical one. But I realize an argument can still be made that depth 20 is not "low." Instead, let's look at the static evals rather than the depth 20 evals:
So, it's true that these evals appear less inflated than before, but they're still inflated by 20-40%, and the point still stands. (For the g2 pawn, with the benefit of hindsight, removing g2 gives partial compensation by allowing an immediate fianchetto. The static eval sees a +0.31 mobility bonus as a result, so this wasn't a great example of a position worth -1.) This is a reflection of the fact that the ratio
I’m not opposed to changing this. I would just hope that any changes are well thought out, as I don’t believe the real solution is a simple multiplication of x by y (although if that is the solution, it would not bother me).
g2 pawn odds are -0.89. |
The sole reason for this is the +0.31 mobility bonus in the static eval, because the f1-bishop now has 2 squares instead of 0. (The bonus is -48 for 0 squares and 16 for 2 squares, so
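The arithmetic behind the +0.31 figure can be checked under two stated assumptions: that the quoted -48 and 16 are the middlegame bonuses (which dominate at the start position), and that the static-eval output converts internal units to pawns by dividing by PawnValueEg, taken here as 208 for illustration.

```cpp
// Assumed for illustration: static-eval terms shown in pawns = internal value / PawnValueEg.
constexpr int PawnValueEg = 208;

// Bishop mobility bonuses quoted above (treated as middlegame values):
// 0 reachable squares -> -48, 2 reachable squares -> 16.
constexpr int mobility_0 = -48;
constexpr int mobility_2 = 16;

constexpr int mobility_gain = mobility_2 - mobility_0;  // 64 internal units

constexpr double in_pawns(int v) { return static_cast<double>(v) / PawnValueEg; }
```

Under these assumptions the gain is 64 / 208 ≈ 0.31 pawns, consistent with the value quoted in the thread.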
PawnValueMg is not the actual value of a pawn, and KnightValueMg is not the actual value of a knight. As I stated in previous posts, the piece-square tables and other bonuses such as mobility skew the piece values. If I change PawnValueMg from 136 to 36 and increase the PSQT for middlegame pawns by 100, does Stockfish now think a knight is worth
Here's a good way to find how many pawns a knight is worth:
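The point about non-orthogonal terms can be shown with a toy model in which a piece's middlegame contribution is its material value plus a flat PSQT bonus. The numbers used in the tests (PawnValueMg = 136, KnightValueMg = 782, a flat knight bonus of -12) are illustrative assumptions, and real PSQTs vary by square. The sketch only shows that moving 100 from the pawn value into the pawn PSQT leaves every evaluation, and hence the knight-to-pawn ratio the evaluation implies, unchanged, even though the raw constants now suggest a very different ratio.

```cpp
// Toy model: a piece contributes its material value plus a (flat, assumed) PSQT bonus.
struct Piece {
    int value_mg;
    int psqt_mg;
};

constexpr int effective(Piece p) { return p.value_mg + p.psqt_mg; }

// Knight-to-pawn ratio as the evaluation "sees" it (middlegame only, other terms ignored).
constexpr double knight_in_pawns(Piece knight, Piece pawn) {
    return static_cast<double>(effective(knight)) / effective(pawn);
}

// Ratio of the raw material constants alone, which is what changes under the shift.
constexpr double raw_ratio(Piece knight, Piece pawn) {
    return static_cast<double>(knight.value_mg) / pawn.value_mg;
}
```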
I agree with @Vizvezdenec: our pawn eval is "deflated" because this raw eval is a small part of the pawn evaluation relative to other pieces. @man4 is also right that our evaluation features are not orthogonal; it would maybe be nice to change the piece values/PSQTs so that the PSQTs average to 0. Unfortunately this is not a trivial change, since we use the pawn value in other places in the code. Edit: To illustrate, the knight PSQT averages to -12 and -14 for mg and eg respectively. To be fair, it would be more interesting to measure the average PSQT of pieces in actual games. Also, the raw piece value is used for SEE, so subtracting 100 from the raw value and adding 100 to the PSQT would never be a non-functional change.
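A sketch of re-centering a PSQT so that it averages to zero, with the average moved into the piece value. The 4-entry table in the tests is hypothetical, chosen so its average is -12 to echo the knight mg figure; it is not Stockfish's table. Because anything that reads the raw value alone (such as SEE) sees a different number afterwards, the change is functional even though value + PSQT is preserved on every square.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Shift a piece-square table so it averages (approximately) to zero and move that average
// into the piece value. value + psqt[sq] is unchanged on every square.
// Returns the amount moved. Integer mean truncates toward zero, so a small residue may remain.
template <std::size_t N>
int recenter(std::array<int, N>& psqt, int& piece_value) {
    int sum = std::accumulate(psqt.begin(), psqt.end(), 0);
    int mean = sum / static_cast<int>(N);
    for (int& bonus : psqt) bonus -= mean;
    piece_value += mean;
    return mean;
}
```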
To get back to the discussion, I think what should be done is to establish a clearer correspondence between win rate and evaluation, maybe like Houdini does.
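One way to sketch such a correspondence, assuming a logistic relation between eval and expected score: convert the engine's raw score to an expected score with a scale `k_engine` fitted to its own games, then report the eval that implies the same expected score on a chosen reference scale `k_ref`. Both parameters and the function names are hypothetical. Under this model the mapping reduces to a linear rescale by k_engine / k_ref, which is the kind of constant factor the 100 to 70 proposal applies; whether the logistic form holds is itself an assumption.

```cpp
#include <cmath>

// Expected score implied by a centipawn eval under an assumed logistic model with scale k.
double expected_score(double cp, double k) {
    return 1.0 / (1.0 + std::exp(-k * cp));
}

// Eval on a reference scale k_ref that implies the same expected score as raw_cp does
// under the engine's own fitted scale k_engine.
double calibrated_cp(double raw_cp, double k_engine, double k_ref) {
    double p = expected_score(raw_cp, k_engine);
    return std::log(p / (1.0 - p)) / k_ref;  // inverse logistic (logit) on the reference scale
}
```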
The score sent to the "GUI" (in this case cutechess-cli) has major implications since
What I'm curious about is this: if the PSQT and pawn values (and/or other pawn parameters) were re-normalized to increase a pawn's value and reduce the bonuses (which I assume would be a functional change), could Elo be gained? Granted, these bonuses are already quite low, and this is a rather open-ended question, but still:
Lines 93 to 102 in 14c4a40
Lines 154 to 173 in 14c4a40
@ddugovic Following my comment, I ran the following tuning, but without success. Feel free to do it however you like, though; I agree that there is probably Elo to be gained. http://tests.stockfishchess.org/tests/view/5c1cb6980ebc5902ba128ff2 (There was a first tuning run in which I forgot to set the base engine to my tuning branch, and I used its output values for the second tuning nonetheless, figuring it should be OK. Keep that in mind if you want to redo it properly.)
The Stockfish score, unlike those of other known engines, is not at all aligned with the meaning the GUIs give it.
Based on my tests, I propose this simple modification in uci.cpp (it does not affect playing strength):
ss << "cp " << v * 100 / PawnValueEg;
to
ss << "cp " << v * 70 / PawnValueEg;
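For context, the changed line sits inside the function that formats a score for the UCI `info` output. The sketch below reconstructs that logic under stated assumptions: the constants `VALUE_MATE = 32000`, `MAX_PLY = 128` and `PawnValueEg = 208` are assumed for illustration, and the extra `numerator` parameter is hypothetical, added only to compare 100 with the proposed 70. Mate scores are unaffected by the change.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Assumed constants for illustration; the real ones are defined in Stockfish's source.
constexpr int VALUE_MATE = 32000;
constexpr int MAX_PLY = 128;
constexpr int PawnValueEg = 208;

// Format an internal score as a UCI "cp" or "mate" string.
// numerator = 100 reproduces the current scaling; 70 is the proposed one.
std::string uci_value(int v, int numerator) {
    std::stringstream ss;
    if (std::abs(v) < VALUE_MATE - MAX_PLY)
        ss << "cp " << v * numerator / PawnValueEg;  // integer division truncates toward zero
    else  // mate distance in moves: positive if we mate, negative if we are mated
        ss << "mate " << (v > 0 ? VALUE_MATE - v + 1 : -VALUE_MATE - v) / 2;
    return ss.str();
}
```

For example, an internal score of 208 is sent as "cp 100" today and would be sent as "cp 70" with the proposed change.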