Course labs from the Berkeley course on Spark, written in Jupyter notebooks.
README.md
cs110_lab1_power_plant_ml_pipeline.ipynb
cs110_lab2_als_prediction.ipynb
cs110_lab3a_word_count_rdd.ipynb
cs110_lab3b_text_analysis_and_entity_resolution.ipynb


Spark-exercise-problems

Exercise problems from CS110. The notebooks can be run on the Databricks-hosted Spark instance at http://community.cloud.databricks.com. The data files can also be obtained by following the comments in each notebook. Most of the data files are hosted internally on Databricks' S3 storage and can only be accessed from within their notebooks.

Caution: this code was written for Python 2.6 and Spark 1.6. Some of it needs to be changed to run under Spark 2.0, because RDD-to-DataFrame conversions have to be handled differently.
