Course labs from the Berkeley course on Spark, written as Jupyter notebooks.


Exercise problems from cs110. The notebooks can be run on a Databricks-hosted Spark instance at

The data files are also available by following the comments in each notebook. Most of the data files are hosted internally on Databricks' S3 storage and can only be accessed through their notebooks.

Caution: this code was written for Python 2.6 and Spark 1.6. Some of it needs to be changed to run under Spark 2.0, because RDD-to-DataFrame conversions are handled differently there.
