adjust scale and optimism #4036
The double-kill numbers are small, so I am not sure how important they are, although they do seem consistent across these tests, progressively weaker in the single / double / triple tests. For more aggressive play (higher contempt as it used to be) presumably a lower draw ratio is the real aim and the draw ratio in the passed LTC (46.5%) seems no worse than recent LTC yellow tests and better than some, contradicting the lower double kill rates. (Fewer wins as black but more as white?) Maybe others have thoughts on that? So let's go with this elo gain if we can, maybe it will lead to further changes ...
The real issue is the failed STC. Does that indicate good scaling (good) or an unreliable patch that has only passed one test (bad)? Up to maintainers to decide ...
FYI, if this is to be merged, maintainers prefer to have a single commit with the PR comment ready to merge, so that suggests a new branch & PR instead of this one, but we could just keep this PR for discussion for now.
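As a quick check on the 46.5% draw ratio quoted above, here is a minimal Python sketch that recomputes it from the W/L/D totals reported for the two tests. The helper name `summarize` is hypothetical, and the Elo figure uses the standard logistic formula as an illustrative assumption; it is not necessarily how fishtest reports Elo.

```python
import math

def summarize(wins, losses, draws):
    """Return (draw_ratio, score, logistic_elo) for a W/L/D record."""
    games = wins + losses + draws
    draw_ratio = draws / games
    # Score from the patch's point of view: a win is 1 point, a draw is 0.5.
    score = (wins + 0.5 * draws) / games
    # Logistic Elo difference implied by that score (assumed model).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return draw_ratio, score, elo

# W/L/D totals copied from the test results quoted in this PR.
ltc = summarize(wins=18932, losses=18584, draws=32596)
stc = summarize(wins=15919, losses=16018, draws=28039)

for name, (draw_ratio, score, elo) in (("LTC", ltc), ("STC", stc)):
    print(f"{name}: draw ratio {draw_ratio:.1%}, score {score:.4f}, Elo {elo:+.2f}")
```

The LTC draw ratio rounds to the 46.5% mentioned in the comment, and the sign of the Elo estimate matches the pass/fail outcome of each test.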
@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked! (?)
LTC green: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55
STC red (not terribly so): https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219
xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this first tweak, I tried twice more to further exaggerate the param, giving two yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45
For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.
bench 6410652
Meh, I was hoping that this PR would preserve the commit history, giving you credit, but I just squashed it myself and force pushed. Also rebased on master, which I think was the cause of the bench test error. What needs to be improved about the commit message?
You could edit out the "xoto thoughts?" line in GitHub, but I think the real question is whether maintainers want any more tests running. I imagine they'll get back to us after the weekend ...
I don't think more tests are needed, LTC pass with simple param tweaks.
Nice work @dubslow ! 👍 😆
xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so further increases of that param were tried. Joint work xoto10 and dubslow.
passed LTC: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55
failed STC: https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219
similar LTC's were yellow
first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45
closes official-stockfish#4036
bench 6410652
@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked!
LTC green: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55
STC red (not terribly so): https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219
xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this try, I tried twice more to further exaggerate the param, giving two very-borderline yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45
For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.
xoto, thoughts?
bench 6800167
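The double-kill comparison can be checked against the Ptnml lines above. This is a rough sketch that assumes Ptnml(0-2) lists game-pair counts by the patch's pair score (0, 0.5, 1, 1.5, 2), so the first bucket counts pairs that master won 2-0 and the last bucket counts pairs that the patch won 2-0. The helper name `double_kill_rates` is hypothetical.

```python
def double_kill_rates(ptnml):
    """Return (master_rate, patch_rate) of 2-0 game pairs.

    Assumes ptnml holds pair counts for patch pair scores 0, 0.5, 1, 1.5, 2.
    """
    pairs = sum(ptnml)
    master_rate = ptnml[0] / pairs  # patch scored 0 of 2 in the pair
    patch_rate = ptnml[4] / pairs   # patch scored 2 of 2 in the pair
    return master_rate, patch_rate

# Pentanomial counts copied from the two test results quoted in this PR.
tests = {
    "LTC": [66, 6904, 20757, 7274, 55],
    "STC": [250, 6791, 15974, 6754, 219],
}

rates = {name: double_kill_rates(counts) for name, counts in tests.items()}
for name, (master_rate, patch_rate) in rates.items():
    print(f"{name}: master {master_rate:.3%}, patch {patch_rate:.3%}")
```

Under that reading, master's double-kill rate is higher than the patch's in both tests, which is consistent with the observation in the PR text.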