adjust scale and optimism #4036

Closed
wants to merge 1 commit into from

dubslow
Contributor

dubslow commented May 28, 2022

@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked!

LTC green: https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

STC red (not terribly so): https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219
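
For a rough sense of the size of these results, the W/L/D totals above can be converted to an Elo point estimate. This is a minimal sketch using the standard logistic Elo formula; the function name `elo_from_wld` is made up for illustration, and this is not necessarily the exact computation fishtest performs (it ignores error bars and the pentanomial model).

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a W/L/D record (point estimate only)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game for the patch
    return -400.0 * math.log10(1.0 / score - 1.0)

# W/L/D totals copied from the two tests quoted above
ltc_elo = elo_from_wld(18932, 18584, 32596)
stc_elo = elo_from_wld(15919, 16018, 28039)
print(f"LTC: {ltc_elo:+.2f} Elo, STC: {stc_elo:+.2f} Elo")
```

With these totals the estimate comes out at roughly +1.7 Elo for the LTC and roughly -0.6 Elo for the STC, so both results sit close to zero.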

xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this try, I tried twice more to further exaggerate the param, giving two very-borderline yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.
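
The double-kill comparison can be read off the Ptnml lines. The sketch below assumes the usual fishtest convention that the five Ptnml buckets count game pairs in which the patch scored 0, 0.5, 1, 1.5 and 2 points, so the first bucket is master winning both games of a pair and the last bucket is the patch doing so. The helper name `double_kill_rates` is hypothetical.

```python
def double_kill_rates(ptnml):
    """Return (master_rate, patch_rate) of pairs won 2-0, as fractions of all pairs.

    Assumes ptnml = [pairs with patch score 0, 0.5, 1, 1.5, 2].
    """
    pairs = sum(ptnml)
    return ptnml[0] / pairs, ptnml[4] / pairs

# Ptnml counts copied from the tests quoted above
ltc_master, ltc_patch = double_kill_rates([66, 6904, 20757, 7274, 55])
stc_master, stc_patch = double_kill_rates([250, 6791, 15974, 6754, 219])

print(f"LTC: master {ltc_master:.3%}, patch {ltc_patch:.3%}")
print(f"STC: master {stc_master:.3%}, patch {stc_patch:.3%}")
```

In both tests the first bucket is larger than the last (66 vs 55 at LTC, 250 vs 219 at STC), which is the effect described above.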

xoto, thoughts?

bench 6800167

@xoto10
Contributor

xoto10 commented May 28, 2022

The double-kill numbers are small, so I am not sure how important they are, although they do seem consistent across these tests, getting progressively weaker in the single / double / triple tests. For more aggressive play (higher contempt, as it used to be), presumably a lower draw ratio is the real aim, and the draw ratio in the passed LTC (46.5%) seems no worse than in recent yellow LTC tests and better than in some, which contradicts the lower double-kill rates. (Fewer wins as black but more as white?) Maybe others have thoughts on that?
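
The 46.5% figure can be checked directly from the LTC totals. A minimal sketch (the helper name `draw_ratio` is just for illustration):

```python
def draw_ratio(wins: int, losses: int, draws: int) -> float:
    """Fraction of games that ended in a draw."""
    return draws / (wins + losses + draws)

# W/L/D totals from the passed LTC
ltc_draw_ratio = draw_ratio(18932, 18584, 32596)
print(f"LTC draw ratio: {ltc_draw_ratio:.1%}")
```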

So let's go with this elo gain if we can, maybe it will lead to further changes ... The real issue is the failed STC. Does that indicate good scaling (good) or an unreliable patch that has only passed one test (bad)? Up to maintainers to decide ...

FYI, if this is to be merged, maintainers prefer to have a single commit with the PR comment ready to merge, so that suggests a new branch & PR instead of this one, but we could just keep this PR for discussion for now.

@xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule, so I tried exaggerating that param. It worked! (?)

LTC green:
https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

STC red (not terribly so):
https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219

xoto10's first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
Given the success of this first tweak, I tried twice more to further exaggerate the param, giving two yellows.
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

For all tests listed here, master's double-kill rate noticeably exceeded the patch's double-kill rate. This seems to indicate that this patch loses a bit of aggression, against xoto's original goal, but LTC elo is LTC elo.

bench 6410652
@dubslow
Contributor Author

dubslow commented May 29, 2022

Meh, I was hoping that this PR would preserve the commit history, giving you credit, but I just squashed it myself and force-pushed. I also rebased on master, which I think was the cause of the bench test error.

What needs to be improved about the commit message?

@xoto10
Contributor

xoto10 commented May 29, 2022

You could edit out the "xoto thoughts?" line in github, but I think the real question is whether maintainers want any more tests running. I imagine they'll get back to us after the weekend ...

@vondele
Member

vondele commented May 29, 2022

I don't think more tests are needed: this is an LTC pass with simple param tweaks.

@vondele added the "to be merged" label May 29, 2022
@vondele closed this in 4c7de9e May 29, 2022
@xoto10
Contributor

xoto10 commented May 29, 2022

Nice work @dubslow ! 👍 😆

dav1312 pushed a commit to dav1312/Stockfish that referenced this pull request Oct 21, 2022
xoto10's scaleopt tune resulted in a yellow LTC, but the main parameter shift looked almost exactly like the tune rate reduction schedule,
so further increases of that param were tried. Joint work xoto10 and dubslow.

passed LTC:
https://tests.stockfishchess.org/tests/view/628c709372775f382300f03e
LLR: 2.93 (-2.94,2.94) <0.50,3.00>
Total: 70112 W: 18932 L: 18584 D: 32596
Ptnml(0-2): 66, 6904, 20757, 7274, 55

failed STC:
https://tests.stockfishchess.org/tests/view/6290e4441e7cd5f29966bdc8
LLR: -2.96 (-2.94,2.94) <0.00,2.50>
Total: 59976 W: 15919 L: 16018 D: 28039
Ptnml(0-2): 250, 6791, 15974, 6754, 219

similar LTCs were yellow:
first yellow LTC: https://tests.stockfishchess.org/tests/view/6288a33f817227d3e5c5b05d
double exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e140372775f38230129a6
triple exaggerate yellow: https://tests.stockfishchess.org/tests/live_elo/628e2caf72775f3823012d45

closes official-stockfish#4036

bench 6410652