This was an analysis of a poker database of 120 million hands. I hope to pick this back up using Hadoop in the future, since most of the analysis requires aggregation.
A walkthrough of the first part of the analysis is below:
- Part A
- Part B
- Part C
- Trying a D3-based animation library in Python
- Part D
- Charts generated by comparing different dimensions are in Stage E
The other parts are mostly Python scripts that manipulate the CSV files (roughly 160 GB in total) chunk by chunk.
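
Chunk-by-chunk processing of this kind might look roughly like the sketch below. It is not taken from the actual scripts: the column names `player` and `amount_won`, the helper names, and the chunk size are all hypothetical, chosen only to illustrate how an aggregation can run over a file too large to fit in memory.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterator."""
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def total_won_by_player(csv_file, chunk_size=100_000):
    """Sum the (hypothetical) 'amount_won' column per 'player', one chunk at a time."""
    totals = defaultdict(float)
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunk_size):
        # Only one chunk of rows is held in memory at any point.
        for row in chunk:
            totals[row["player"]] += float(row["amount_won"])
    return dict(totals)


# Tiny in-memory stand-in for one of the large CSV files.
sample = io.StringIO(
    "player,amount_won\n"
    "alice,10.0\n"
    "bob,-4.5\n"
    "alice,-2.0\n"
    "bob,1.5\n"
    "carol,3.0\n"
)
result = total_won_by_player(sample, chunk_size=2)
print(result)
```

Because only the running totals persist between chunks, memory use depends on the number of distinct players rather than on file size. Per-key sums like this are also the kind of aggregation that maps naturally onto a map/reduce job, which is why Hadoop is a plausible next step.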