PGN mirror. Expect duplicates, dirty data, errors, GM draws, etc. The data will probably need to be post-processed, filtered, and deduplicated before use.
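As a rough illustration of that kind of post-processing, here is a minimal Python sketch that deduplicates games and drops short draws. It is not part of this repository. The function names (`split_games`, `move_key`, `clean`) and the `min_draw_plies` threshold are hypothetical, and it assumes reasonably well-formed PGN in which tag lines start with `[` and end with `]`, with no nested variations.

```python
import hashlib
import re


def split_games(pgn_text):
    """Yield (header_lines, movetext) for each game in a PGN string.

    Assumption: a tag line that follows movetext starts a new game.
    """
    headers, moves = [], []
    for line in pgn_text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            if moves:
                yield headers, " ".join(moves)
                headers, moves = [], []
            headers.append(line)
        elif line:
            moves.append(line)
    if headers or moves:
        yield headers, " ".join(moves)


def move_key(movetext):
    """Return (hash, ply_count) of the moves, ignoring comments,
    NAGs, move numbers, and the trailing result marker."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)              # {comments}
    text = re.sub(r"\$\d+", " ", text)                       # NAGs such as $1
    text = re.sub(r"\d+\.+", " ", text)                      # "1." or "1..."
    text = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", text)   # game result
    san = text.split()
    digest = hashlib.sha1(" ".join(san).encode("utf-8")).hexdigest()
    return digest, len(san)


def clean(pgn_text, min_draw_plies=40):
    """Keep the first copy of each distinct move sequence and drop
    draws shorter than min_draw_plies (an arbitrary cutoff)."""
    seen = set()
    kept = []
    for headers, movetext in split_games(pgn_text):
        key, plies = move_key(movetext)
        if plies == 0 or key in seen:
            continue
        seen.add(key)
        is_draw = movetext.rstrip().endswith("1/2-1/2")
        if is_draw and plies < min_draw_plies:
            continue
        kept.append((headers, movetext))
    return kept
```

Hashing only the normalized moves means two copies of the same game with different headers or annotations count as duplicates. For real data a proper PGN parser would be more robust than these regular expressions.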

In the news:

Command-line tools can be 235x faster than your Hadoop cluster

The first thing to do is get a lot of game data. This proved more difficult than I thought it would be, but after some looking around online I found a git repository on GitHub from rozim that had plenty of games. I used this to compile a set of 3.46GB of data, which is about twice what Tom used in his test. The next step is to get all that data into our pipeline.
