adjust scale and optimism #4036

dubslow · 2022-05-28T00:20:24Z

@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked!

LTC green: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

STC red (not terribly so): https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219
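
For readers who want to turn the W/L/D totals above into a rough Elo figure, here is a minimal sketch. It uses the standard logistic Elo formula on the raw score, which is an assumption of this example and not necessarily how fishtest computes its own estimates; the helper name `elo_from_wdl` is hypothetical.

```python
import math

def elo_from_wdl(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a win/loss/draw record (hypothetical helper)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game for the patch
    return -400.0 * math.log10(1.0 / score - 1.0)

# Totals copied from the two tests quoted above.
ltc_elo = elo_from_wdl(wins=18932, losses=18584, draws=32596)
stc_elo = elo_from_wdl(wins=15919, losses=16018, draws=28039)

print(f"LTC: {ltc_elo:+.2f} Elo")
print(f"STC: {stc_elo:+.2f} Elo")
```

Under this approximation the LTC result is a gain of a bit under 2 Elo and the STC result a loss of well under 1 Elo, which fits the "not terribly so" description of the red STC.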

xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this try, I tried twice more to further exaggerate the param, giving two very-borderline yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.
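
The double-kill comparison can be read off the Ptnml lines quoted above. This is a rough sketch, assuming that Ptnml(0-2) counts game pairs by the patch's pair score (0, 0.5, 1, 1.5, 2), so the first entry is master winning both games of a pair and the last entry is the patch winning both. The function name is hypothetical.

```python
def double_kill_rates(ptnml):
    """Return (master_rate, patch_rate) of pairs in which one side won both games.

    Assumes ptnml = [pairs scoring 0, 0.5, 1, 1.5, 2] from the patch's point of view.
    """
    pairs = sum(ptnml)
    return ptnml[0] / pairs, ptnml[4] / pairs

# Pentanomial counts copied from the LTC and STC tests above.
ltc_master, ltc_patch = double_kill_rates([66, 6904, 20757, 7274, 55])
stc_master, stc_patch = double_kill_rates([250, 6791, 15974, 6754, 219])

print(f"LTC double-kills: master {ltc_master:.3%}, patch {ltc_patch:.3%}")
print(f"STC double-kills: master {stc_master:.3%}, patch {stc_patch:.3%}")
```

In both tests the master figure is the larger of the two, though the absolute counts are small (66 vs 55 pairs at LTC).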

xoto, thoughts?

bench 6800167

xoto10 · 2022-05-28T04:36:04Z

The double-kill numbers are small, so I am not sure how important they are, although they do seem consistent across these tests, getting progressively weaker in the single / double / triple tests. For more aggressive play (higher contempt, as it used to be), presumably a lower draw ratio is the real aim, and the draw ratio in the passed LTC (46.5%) seems no worse than recent LTC yellow tests and better than some, which contradicts the lower double-kill rates. (Fewer wins as black but more as white?) Maybe others have thoughts on that?
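
The 46.5% draw ratio mentioned here follows directly from the LTC totals in the opening comment. A small sketch of the arithmetic, with a hypothetical helper name, also shows the STC figure for comparison:

```python
def draw_ratio(wins: int, losses: int, draws: int) -> float:
    """Fraction of games drawn (hypothetical helper)."""
    return draws / (wins + losses + draws)

# Totals copied from the passed LTC and the failed STC.
ltc_draws = draw_ratio(wins=18932, losses=18584, draws=32596)
stc_draws = draw_ratio(wins=15919, losses=16018, draws=28039)

print(f"LTC draw ratio: {ltc_draws:.1%}")
print(f"STC draw ratio: {stc_draws:.1%}")
```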

So let's go with this elo gain if we can, maybe it will lead to further changes ... The real issue is the failed STC. Does that indicate good scaling (good) or an unreliable patch that has only passed one test (bad)? Up to maintainers to decide ...

FYI, if this is to be merged, maintainers prefer to have a single commit with the PR comment ready to merge, so that suggests a new branch & PR instead of this one, but we could just keep this PR for discussion for now.

@xoto10

@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked! (?)

LTC green: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

STC red (not terribly so): https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219

xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this first tweak, I tried twice more to further exaggerate the param, giving two yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.

bench 6410652

dubslow · 2022-05-29T08:45:47Z

Meh, I was hoping that this PR would preserve the commit history, giving you credit, but I just squashed it myself and force-pushed. I also rebased on master, which I think was the cause of the bench test error.

What needs to be improved about the commit message?

xoto10 · 2022-05-29T10:12:58Z

You could edit out the "xoto, thoughts?" line on GitHub, but I think the real question is whether maintainers want any more tests run. I imagine they'll get back to us after the weekend ...

vondele · 2022-05-29T13:54:30Z

I don't think more tests are needed, LTC pass with simple param tweaks.

xoto10 · 2022-05-29T18:00:07Z

Nice work @dubslow ! 👍 😆

xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so further increases of that param were tried. Joint work xoto10 and dubslow.

passed LTC: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

failed STC: https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219

similar LTC's were yellow

first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

closes official-stockfish#4036

bench 6410652

dubslow force-pushed the scaleopt branch from 808019f to 4541760 May 29, 2022 08:44

vondele added the "to be merged" label May 29, 2022

vondele closed this in 4c7de9e May 29, 2022
