Draw faint shaded areas around the eval line to indicate likeliness of a decisive result #248
base: master
a decisive result according to WDL
* When eval is close to zero, this allows users to more easily identify which "drawn" positions are actually dead drawn vs. unclear/complex/sharp
* When eval favors one side, this allows users to more easily identify which portions of the game contained more counterplay/complexity despite the advantage
Reduce potential merge conflicts w/ rooklift#248
cf4f1ae to db77356
Reduce potential merge conflicts w/ rooklift#248
Help to reduce merge conflicts w/ rooklift#237
db77356 to 279e905
It's an interesting idea. The main line of the graph seems to be missing for the start position somehow?
Also, I suspect a lot of people want a WDL graph, which I don't think this quite is? (Not that I'd promise to accept such a thing.)
Oh, thanks! Fixed in be6715d.
By the way, I recall telling this to someone but not exactly who I told it to: Nibbler is barely maintained these days, and the codebase is a mess; I'm not enthusiastic about making any changes at all, unless there are bugs or feature requests from Lc0 devs... At least a couple of people maintain their own Nibbler forks (e.g. this one) and that might be the happier way to proceed.
Oh, yes! Maybe a top-level message in the README.md that endorses a specific fork could be a great way to gather community interest in one place. Otherwise, https://github.com/rooklift/nibbler/forks currently lists 59 different forks, so new contributors (such as myself) will simply end up pushing PRs into the main repo instead.
rooklift#248 (comment) Draw as a "WDL graph" instead
Take it from someone who has been in Software Engineering for a long time… all codebases eventually become a mess. The more widely used a product is, the more of a mess its code becomes over time 🙂
In any case, whether you ultimately decide to accept/merge this or not, I think you're right that the WDL version turned out to be both (a) more aesthetically pleasing, and (b) easier for a new user to understand, so the pull request has been updated to the WDL version.
Hmm - is it simply drawing the Draw score as centred on the main line of the graph? I don't think that would be correct, e.g. in this image: The infobox tells me the current position has Black win of only 15 out of 1000, but it's certainly drawn as if it's more than that. I think to do it correctly you would need to actually use all 3 of the WDL numbers? (Or at least 2, the third can be inferred...)
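To illustrate the point about using all three WDL numbers, here is a minimal sketch of how the band boundaries could be derived from just W and L, with D inferred. The helper name `wdlBands`, the per-mille input scale, and the choice to stack White's share at the bottom are assumptions made for illustration; none of this is taken from Nibbler's code.

```javascript
// Illustrative sketch only: wdlBands is a hypothetical helper, not a Nibbler function.
// Assumes w and l are per-mille values (0..1000) from White's point of view.
function wdlBands(w, l) {
  const d = 1000 - w - l; // the draw share is inferred from the other two
  if (!Number.isFinite(d) || w < 0 || l < 0 || d < 0) {
    throw new RangeError("w and l must be non-negative and sum to at most 1000");
  }
  return {
    win: w / 1000,
    draw: d / 1000,
    loss: l / 1000,
    // Boundaries measured upward from the bottom of the graph (0 = bottom, 1 = top),
    // assuming White's share is stacked at the bottom.
    winDrawBoundary: w / 1000,
    drawLossBoundary: (w + d) / 1000,
  };
}

// Example with a Black win of 15 out of 1000; the White value of 85 is made up.
const bands = wdlBands(85, 15);
console.log(bands.loss, bands.draw, bands.win);
```

With boundaries computed this way, the loss band occupies only the top 1.5% of the graph height, which is the proportion the infobox value implies.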
Yeah.
I did test various things, but ultimately ran into the same question that @Naphthalin describes when trying to read winrates directly from the engine:
This way, the centipawn value coming from If we were to calculate, by hand, the "translated" win% and loss% the same way as in #237 (comment) by treating
Interesting work, I really like the shaded background! I personally see two possibilities; one is displaying the actual WDL (and leaving it to the user to use adequate WDL sharpness through the contempt settings), the other is to do even more maths, and directly use the
@Naphthalin We'd want to use nibbler/files/src/renderer/50_table.js (line 40 in 34dba0a), right?
No, idea would be to calculate
Got it, yes I see that here. And would we want a 50% confidence interval (rather than 95%) so that it expresses something akin to:
UPDATE: Worked! It's looking good: I've committed 107710b and will update the remaining screenshots.
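As a rough illustration of the 50% vs. 95% question, the sketch below computes a central interval around the eval line. It assumes the eval is modelled as a logistic distribution with centre `mu` and scale `s`; that modelling choice and the function names `logisticHalfWidth` and `evalBand` are assumptions for this example, not something confirmed by the PR's code.

```javascript
// Illustrative sketch; both function names are hypothetical.
// ASSUMPTION: the eval follows a logistic distribution with centre mu and scale s.

// Multiplier k such that [mu - k*s, mu + k*s] covers the given central probability.
// Derived from the logistic quantile function: q(u) = mu + s * ln(u / (1 - u)).
function logisticHalfWidth(coverage) {
  if (!(coverage > 0 && coverage < 1)) {
    throw new RangeError("coverage must be strictly between 0 and 1");
  }
  return Math.log((1 + coverage) / (1 - coverage));
}

// Lower and upper edge of the shaded band around the eval line.
function evalBand(mu, s, coverage = 0.5) {
  const h = s * logisticHalfWidth(coverage);
  return { lower: mu - h, upper: mu + h };
}

console.log(logisticHalfWidth(0.5));  // ln(3), about 1.10 scales either side
console.log(logisticHalfWidth(0.95)); // ln(39), about 3.66 scales either side
```

Under this assumption a 95% band is more than three times as wide as a 50% band, so it would fill most of the graph in any position that is not completely settled.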
rooklift#248 (comment) Draw as a "WDL graph" instead
3f345dd to 51c4687
* Handle null values correctly
Directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to calculate the interval around the eval line.
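A minimal sketch of the formula in that commit message, `2 / (ln(1/W-1) + ln(1/L-1))`, written as a standalone function. The name `wdlScale`, the use of probabilities in (0, 1) rather than per-mille values, and returning `null` for unusable inputs are assumptions for illustration, not the PR's actual code.

```javascript
// Illustrative sketch; wdlScale is a hypothetical name.
// w and l are win and loss probabilities, each strictly between 0 and 1.
function wdlScale(w, l) {
  // The logarithms are undefined or infinite at 0 and 1, so reject those inputs.
  if (!(w > 0 && w < 1 && l > 0 && l < 1)) {
    return null;
  }
  const a = Math.log(1 / l - 1);
  const b = Math.log(1 / w - 1);
  const denom = a + b;
  // denom is positive exactly when w + l < 1, i.e. when some draw probability remains.
  if (!(denom > 0)) {
    return null;
  }
  return 2 / denom;
}

console.log(wdlScale(0.25, 0.25)); // 1 / ln(3), roughly 0.91
console.log(wdlScale(0, 0.5));     // null: W reported as 0
```

The result shrinks as the draw probability grows, so a dead-drawn position gives a narrow interval and a sharp position gives a wide one.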
I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead. There are probably some edge case issues with `W` or `L` being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line.
"I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead"
"There are probably some edge case issues with `W` or `L` being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line."
Yup, incorporated in bfe752a. Looks like there was a similar warning here as well: https://github.com/LeelaChessZero/lc0/blob/076299b1f1ca21993b2c5e82ab3e80edb5367057/src/mcts/search.cc#L232-L236
Aha yes, committed 696c576.
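The smoothing suggested in the quoted comment (scale WDL by 0.994, then add 0.2% to each of W, D, L) can be sketched as follows. The function name `smoothWdl` and the use of probabilities that sum to 1 are assumptions for illustration; the committed code may differ.

```javascript
// Illustrative sketch; smoothWdl is a hypothetical name.
// w, d, l are probabilities that sum to 1.
function smoothWdl(w, d, l) {
  const SCALE = 0.994; // shrink every share slightly...
  const FLOOR = 0.002; // ...then add 0.2% to each, so the total stays at 1
  return {
    w: w * SCALE + FLOOR,
    d: d * SCALE + FLOOR,
    l: l * SCALE + FLOOR,
  };
}

// A position reported as a certain draw no longer has W or L equal to 0.
console.log(smoothWdl(0, 1, 0));
```

Because 0.994 + 3 × 0.002 = 1, the smoothed values still sum to 1, and neither W nor L can be exactly 0 when the logarithms are taken.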
"There are probably some edge case issues with `W` or `L` being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line."
2cf337d to bfe752a
I doubt I'm going to accept this honestly. As I say I barely maintain Nibbler these days.
(and leaving it to the user to use adequate WDL sharpness through the contempt settings)
In the interest of reducing maintenance burden to the bare minimum, I'll leave everything in "display raw WDL" mode so the code can be kept as simple as possible. Screenshots have been updated at the top, and it does add a nice aesthetic to Nibbler overall. Maybe this final simplified version is simple enough to be worth considering? Anyway, the final decision is still yours, of course; hopefully this helps in some small part to make the decision easier.
Showing "decisive" vs. "dead drawn" in some way allows Nibbler users to:
Testing
According to Lc0 evaluating Kramnik vs. Topalov's World Chess Championship 2006 Round 4:
* 36. …Qh4 but Topalov plays 37. Ra1 rather than 37. e4, to keep the position sharp
* 47. …Bxc4 48. Raxc4
The "Immortal Draw" from 1872: Karl Hamppe vs. Philipp Meitner
↓
(screenshot using the famous Karpov vs. Kasparov World Chess Championship 1987, Game 24)
↓
According to Lc0 evaluating Gelfand vs. Anand World Chess Championship 2012 Round 7:
* 25. …f6 by Anand, giving up his final winning chances
(one of the more "exciting" draws in recent history: Karjakin vs. Carlsen World Chess Championship 2016, Game 2)
↓
And, just for fun: Deep Blue vs. Humankind 1997 Game 2
Old screenshots (ignore)
without laplace smoothing
raw drawrate centered on eval line