Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

ChessData

PGN Mirror. There will be dups, dirty data, errors, GM draws etc -- the data will probably need to be post-processed, filtered, deduped etc.

In the news:

Command-line tools can be 235x faster than your Hadoop cluster

The first thing to do is get a lot of game data. This proved more difficult than I thought it would be, but after some looking around online I found a git repository on GitHub from rozim that had plenty of games. I used this to compile a set of 3.46GB of data, which is about twice what Tom used in his test. The next step is to get all that data into our pipeline

You can’t perform that action at this time.