This was a class project for MIT 6.824, the distributed computer systems course.
We (Eben and Jonas) implemented a basic version of Berkeley's Resilient Distributed Datasets (RDDs, the core abstraction behind Spark) in Python.
The original RDD paper is here: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
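For readers new to RDDs, the core idea from the paper is that a dataset is defined by its lineage: each RDD records its parent and the function that derives it. Transformations like `map` and `filter` are lazy and only extend that lineage. Actions like `collect` and `reduce` trigger evaluation by replaying the lineage from the source data. The sketch below illustrates only that idea in a single process. `MiniRDD` and all of its methods are hypothetical names chosen for illustration; this is not the API of this project. It also leaves out partitions, workers, scheduling, and fault tolerance across machines.

```python
import functools
from typing import Callable, Iterable, Iterator, List, Optional


class MiniRDD:
    """Hypothetical single-process sketch of an RDD: an immutable dataset
    described by its parent RDD (lineage) plus the function that derives it."""

    def __init__(self, parent: Optional["MiniRDD"] = None,
                 compute_fn: Optional[Callable[[Iterator], Iterator]] = None,
                 source: Optional[List] = None):
        self.parent = parent
        self.compute_fn = compute_fn
        self.source = source

    @staticmethod
    def parallelize(data: Iterable) -> "MiniRDD":
        # Root of the lineage: holds the base data (not actually distributed here).
        return MiniRDD(source=list(data))

    # Transformations are lazy: they only record a new step in the lineage.
    def map(self, f: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, compute_fn=lambda it: (f(x) for x in it))

    def filter(self, pred: Callable) -> "MiniRDD":
        return MiniRDD(parent=self, compute_fn=lambda it: (x for x in it if pred(x)))

    def compute(self) -> Iterator:
        # Walk back to the source and replay each recorded function.
        # A fresh iterator is built on every call, so results can be recomputed.
        if self.parent is None:
            return iter(self.source)
        return self.compute_fn(self.parent.compute())

    # Actions force evaluation of the lineage.
    def collect(self) -> List:
        return list(self.compute())

    def reduce(self, f: Callable):
        return functools.reduce(f, self.compute())


# Usage: nothing is computed until an action runs.
rdd = MiniRDD.parallelize(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())                   # [0, 4, 16, 36, 64]
print(evens_squared.reduce(lambda a, b: a + b))  # 120
```

Because each action replays the lineage from the source, calling `collect()` twice recomputes the same result. In the paper, the same property lets lost partitions be rebuilt instead of replicated.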