Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pgn.BaseVisitor.begin_parse_san() method #984

Merged
merged 1 commit into from
Apr 25, 2023

Conversation

snowdrop4
Copy link
Contributor

@snowdrop4 snowdrop4 commented Mar 30, 2023

Add a new pgn.BaseVisitor method: begin_parse_san().

This is called when the parser encounters a SAN, but before the very expensive step of calling board.parse_san() on the SAN (which entails generating all the possible legal moves).

If the begin_parse_san() method returns SKIP, parsing of the SAN is skipped.

There are a couple of potential use-cases:

  • only parsing the first $x moves from each game mainline (perhaps you're only interested in the opening stages of each game?)
  • only parsing the first $x moves from each variation (perhaps the follow-up moves inside each variation do not matter to you?)
  • trawling through lots of PGNs to find games where a specific opening sequence was played

As parsing the moves is by far the most time-consuming part of parsing a PGN, skipping any moves can result in huge time-savings.

On a real-world codebase that reads in PGNs with variations specified for most moves (and does a lot of computations afterwards), I'm seeing around a 20% speedup in overall program runtime, entirely from skipping parsing all but the first move of each variation.

@niklasf
Copy link
Owner

niklasf commented Apr 9, 2023

Thanks, this makes a lot of sense. In the more common case where every move is needed, how much does calling an extra method for every move cost?

@snowdrop4
Copy link
Contributor Author

snowdrop4 commented Apr 12, 2023

Reading all games in a tight loop from a PGN with 74 games in it, 100 times. Measured with timeit.

Master:

run 1 = 131.732180291001
run 2 = 132.3093085839937
run 3 = 131.67879545799224

mean = 131.906761444 seconds

PR branch:

run 1 = 132.53046183301194
run 2 = 132.58470095800294
run 3 = 132.302952500002

mean = 132.472705097 seconds

For an increase of 0.004% in total runtime.

Python 3.11.2 on macOS 13.3.1 on an M1 MacBook air.

@snowdrop4
Copy link
Contributor Author

Bonus benchmark, run the same way on the PR branch, but making use of the begin_parse_san() visitor to skip all but the first move from each variation:

run 1 = 95.46104741700401
run 2 = 95.54273075000674
run 3 = 95.95566983299796

mean = 95.6531493333 seconds

@snowdrop4
Copy link
Contributor Author

Semi-unrelatedly, but would you be amenable to a PR that ported the move generation to Rust with PyO3?

@niklasf niklasf merged commit bb3a2a4 into niklasf:master Apr 25, 2023
@niklasf
Copy link
Owner

niklasf commented Apr 25, 2023

Thanks again.

Regarding #722, I am not sure, honestly. Priority would memory safety and correctness, of course, followed by simplicity of the build/release process. I've also considered ditching the aged python-chess API and making bindings for shakmaty, instead.

@niklasf niklasf added this to the v1.10.0 milestone Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants