current move based WDL model is 8 moves off for standard fishtest LTC data #34

Closed
robertnurnberg opened this issue Sep 5, 2023 · 13 comments

@robertnurnberg
Contributor

That is because cutechess-cli saves pgns with FEN move counters 0 1.

See the discussion on discord.

@robertnurnberg
Contributor Author

One way to fix historical data (for classical chess, where we can be sure an 8-move book was used) is to run the command `find . -type f -name "*.pgn" -exec sed -i '/FEN/ s/0 1/0 9/g' {} +` in the directory with the pgns.
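For anyone who prefers not to run sed over the whole tree, here is a minimal Python sketch of the same fix. It is stricter than the one-liner: it only rewrites `[FEN "..."]` header lines whose last two fields are exactly `0 1`. The function names `fix_fen_line`, `fix_pgn_file` and `fix_tree` are hypothetical, and the default start move of 9 assumes the 8-move book mentioned above.

```python
import re
from pathlib import Path

# Matches a PGN FEN header whose halfmove/fullmove counters are exactly "0 1".
FEN_TAG = re.compile(r'^(\[FEN "[^"]* )0 1("\]\s*)$')


def fix_fen_line(line: str, start_move: int = 9) -> str:
    """Rewrite the trailing '0 1' of a [FEN ...] header to '0 <start_move>'."""
    return FEN_TAG.sub(
        lambda m: f"{m.group(1)}0 {start_move}{m.group(2)}", line
    )


def fix_pgn_file(path: Path, start_move: int = 9) -> int:
    """Fix all FEN headers of one file in place; return the number of changed lines."""
    lines = path.read_text().splitlines(keepends=True)
    fixed = [fix_fen_line(line, start_move) for line in lines]
    changed = sum(old != new for old, new in zip(lines, fixed))
    if changed:
        path.write_text("".join(fixed))
    return changed


def fix_tree(root: Path, start_move: int = 9) -> int:
    """Apply fix_pgn_file to every *.pgn below root; return the total changed lines."""
    return sum(fix_pgn_file(p, start_move) for p in root.rglob("*.pgn"))
```

Calling `fix_tree(Path("."))` in the pgn directory would mirror the `find ... sed` command. Lines that are not FEN headers, and FEN headers that already carry other counters, are left untouched.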

By the way, it may be a good idea to use a separate frc subdir in our download script for chess960. I am not sure how to catch that in the LTC overview page on fishtest. If someone could point me to an example html file, I could try to modify our download script accordingly.

@robertnurnberg
Contributor Author

Thanks. So at the moment the date of the test is extracted from https://tests.stockfishchess.org/tests/finished?ltc_only=1, together with the testID. Ideally we would also collect the (d)frc info from there. Is that possible?

@vondele
Member

vondele commented Sep 8, 2023

One can fetch the book used from the test info page, e.g. https://tests.stockfishchess.org/tests/view/64f9a5910de4a3bb72fbe574. If the book name contains FRC, the test is treated as FRC.

Some test info (e.g. a json containing key information) could now be saved along with the pgns (since we store them in a separate dir), which would allow for taking some steps based on that information.
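A minimal sketch of how the book-name rule could drive a separate frc subdir, as suggested earlier in the thread. The helper names, the `classical` directory name, and the case-insensitive match are assumptions for illustration and not what the download script currently does.

```python
from pathlib import Path


def is_frc_book(book_name: str) -> bool:
    """Treat a test as (D)FRC if its book name contains 'FRC'.

    Assumption: the match is case-insensitive, so 'DFRC' and 'frc' also count.
    """
    return "FRC" in book_name.upper()


def pgn_dir_for(root: Path, test_id: str, book_name: str) -> Path:
    """Pick a target directory for a test's pgns (hypothetical layout)."""
    subdir = "frc" if is_frc_book(book_name) else "classical"
    return root / subdir / test_id
```

With this layout the fitting code could simply skip everything below `frc/` while the move counters there remain unreliable.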

Concerning the use of 8 moves deep lines to start, yes, I was aware of that. The point is, it is related to the book used, and maybe even the software used to play the games. Game ply seems to work pretty well for the WDL model, but indeed suffers from this limitation. Ultimately there is a limit to what can be put in the model (e.g. the same FEN could be a win or a draw depending on the move counter).

@robertnurnberg
Contributor Author

> Concerning the use of 8 moves deep lines to start, yes, I was aware of that. The point is, it is related to the book used, and maybe even the software used to play the games. Game ply seems to work pretty well for the WDL model, but indeed suffers from this limitation. Ultimately there is a limit to what can be put in the model (e.g. the same FEN could be a win or a draw depending on the move counter).

I believe the 8-move offset could and should be fixed, agreed? I.e. both the fitting of the model and the playing SF should use the move counter from the start position. Playing SF already does this in 99% of cases, i.e. when used correctly by competent users.

The alternatives to fixing this would be either to go to a material based model (not sure how robust that would be at present), or to have a flat eval to wdl conversion w/o moves or material information.
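To make the difference between a move-based and a flat conversion concrete, here is a rough sketch. It assumes a logistic win-rate curve whose midpoint `a` and spread `b` are cubic polynomials in `move / anchor`. The coefficients and the anchor value of 32 are invented for illustration and are not fitted values from this repo.

```python
import math
from typing import Sequence


def poly3(c: Sequence[float], x: float) -> float:
    """Evaluate c[0]*x^3 + c[1]*x^2 + c[2]*x + c[3]."""
    return ((c[0] * x + c[1]) * x + c[2]) * x + c[3]


def win_rate(eval_cp: float, a: float, b: float) -> float:
    """Logistic win probability for an eval, with midpoint a and spread b."""
    return 1.0 / (1.0 + math.exp((a - eval_cp) / b))


def win_rate_move_based(
    eval_cp: float,
    move: int,
    coeffs_a: Sequence[float],
    coeffs_b: Sequence[float],
    anchor: float = 32.0,  # assumption, not taken from the thread
) -> float:
    """Move-dependent variant: a and b are polynomials in move / anchor."""
    m = move / anchor
    return win_rate(eval_cp, poly3(coeffs_a, m), poly3(coeffs_b, m))


# Invented coefficients, for illustration only.
COEFFS_A = (0.0, -5.0, 20.0, 300.0)
COEFFS_B = (0.0, 0.0, 5.0, 60.0)
```

A flat conversion corresponds to calling `win_rate` with constant `a` and `b`. In the move-based variant, evaluating at `move` instead of `move + 8` shifts both parameters, which is the mismatch this issue is about.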

@vondele
Member

vondele commented Sep 8, 2023

I think the 8 moves offset should be fixed, but I'm not sure how this can be done most cleanly. In this case, fixing things basically requires some knowledge of the book. Probably, the code that downloads the pgns could start keeping some kind of side-info in a .json next to the pgns that documents what is there. Things like, e.g.,

  • starting move
  • NormalizeToPawn actual value
  • Elo difference implied by the test

are all things that in principle feed into the model.
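A minimal sketch of what such a side-info file could look like, assuming one .json per test stored next to its pgns. The class name `SideInfo`, the field names, and the file naming are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SideInfo:
    """Per-test metadata kept next to the pgns (hypothetical schema)."""

    test_id: str
    book: str
    start_move: int                          # e.g. 9 for an 8-move book
    normalize_to_pawn: Optional[int] = None  # value used by the tested SF
    elo_diff: Optional[float] = None         # Elo difference implied by the test


def write_side_info(pgn_dir: Path, info: SideInfo) -> Path:
    """Write <test_id>.json into pgn_dir and return its path."""
    pgn_dir.mkdir(parents=True, exist_ok=True)
    out = pgn_dir / f"{info.test_id}.json"
    out.write_text(json.dumps(asdict(info), indent=2))
    return out


def read_side_info(path: Path) -> SideInfo:
    """Load a side-info file written by write_side_info."""
    return SideInfo(**json.loads(path.read_text()))
```

The analysis code could then read `start_move` from this file instead of relying on the FEN move counters in the pgns.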

I would not switch to a material based model so far, that's a bigger change.

@robertnurnberg
Contributor Author

A quick fix for now is to only use classical chess for fitting, and use the sed-one-liner from #34 (comment). Long term we should switch to fastchess for fishtest, which will store the correct FEN (w/ correct move counters) in .pgn.
Then the only extra fix needed is in @Disservin's cpp code to read the move counter from .pgn and use that. (I can look at that over the weekend.)

The .json thing would be a bigger change, and require more changes to both the download script and the cpp code.

@Disservin
Member

Have you tested how long that sed takes with 40 GB of files?

@robertnurnberg
Contributor Author

Not yet, but I would hope less time than the download.

@Disservin
Member

Regarding the analysis code, you only need to add the current fullmoves * 2 to the ply, I guess?

Might be a dumb question, but how will the new model behave when the moves are shifted by 8?
How good will the fitted equation be for moves < 8?

@robertnurnberg
Contributor Author

I'd store moves from now on, not plies. Read the counter from the board, and only increase it if the side to move is white. Or always read it from the board.
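A minimal sketch of that bookkeeping, written in Python for brevity rather than as the actual cpp analysis code. It reads the fullmove counter and the side to move from the start FEN, then increases the counter each time the side to move becomes white again, i.e. after each black move. The function names are hypothetical.

```python
def fullmove_from_fen(fen: str) -> int:
    """Return the fullmove counter, the sixth field of a FEN string."""
    return int(fen.split()[5])


def move_numbers(start_fen: str, n_plies: int) -> list[int]:
    """Fullmove number of the position before each of n_plies half-moves."""
    move = fullmove_from_fen(start_fen)
    white_to_move = start_fen.split()[1] == "w"
    numbers = []
    for _ in range(n_plies):
        numbers.append(move)
        if not white_to_move:
            # Black is about to move: afterwards white is to move again,
            # so the fullmove counter goes up by one.
            move += 1
        white_to_move = not white_to_move
    return numbers
```

For a start FEN ending in `w KQkq - 0 9`, the first four plies are assigned moves 9, 9, 10, 10. With the cutechess-cli counters `0 1` the same plies would be assigned 1, 1, 2, 2, which is the 8-move offset.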

@robertnurnberg
Contributor Author

Fitting will hardly change. But we may want to move the anchor to move 40. That's for Joost to decide.

@robertnurnberg
Contributor Author

I think this issue can be closed, as all things this repo has influence over have been fixed. The only piece of the puzzle left is to get pgns from fishtest with correct move counters. (Or convert the pgns manually.)
