This project contains Apache Spark experiments I implemented as part of learning Spark.

These experiments are based on:

  • Python 3.5
  • Apache Spark 2.X
  • pyspark
  • IPython / Jupyter
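
Before opening the notebooks, it can help to confirm that the interpreter and packages listed above are actually available. The following is a minimal sketch, not part of this repository: the helper name `check_environment` and the module names it probes (`pyspark`, `IPython`) are assumptions about how the dependencies are importable in your environment.

```python
import importlib.util
import sys


# Hypothetical helper for illustration; not shipped with this project.
def check_environment(min_python=(3, 5), modules=("pyspark", "IPython")):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print("%-12s %s" % (requirement, "ok" if ok else "MISSING"))
```

If anything is reported as missing, the Docker route described below is likely the quicker option.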

The experiments are executable Jupyter notebooks.

The OS commands in this guide assume a *nix system or macOS.


You'll need the dependencies above to run the notebooks. You can install them yourself, but if you don't already have them set up, it is easier to use the ready-made jupyter/pyspark-notebook Docker image.

To install and run:

  1. Install Docker:
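  2. From the root of this repository, create and run a Docker container based on the jupyter/pyspark-notebook image. The current directory is mounted into the container at /home/jovyan/work: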
> export SPARK_EXPERIMENTS="$(pwd)"
> docker run -d -p 8888:8888 -v "$SPARK_EXPERIMENTS":/home/jovyan/work --name spark-experiments jupyter/pyspark-notebook
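  3. Open the notebook server in your browser; it listens on port 8888, as published by the command above.
  4. If that doesn't work, use the IP address of the Docker machine in place of the host name: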
> docker-machine ip <name of your running Docker machine>   # See 'docker-machine ls' if you don't know the name
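
If the browser cannot connect, one way to narrow things down is to test whether anything is accepting TCP connections on the published port. This is a rough sketch under stated assumptions: `port_is_open` is a hypothetical helper, and `localhost` is only an assumed default host (pass the Docker machine's IP address instead if you use docker-machine).

```python
import socket


# Hypothetical helper for illustration; "localhost" is an assumed default.
def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False


if __name__ == "__main__":
    print("notebook port reachable:", port_is_open())
```

A `False` result suggests the container is not running or the port is not published on that host; `docker ps` should show which of the two it is.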

Once you're done, you can remove the above Docker container using:

> docker stop spark-experiments && docker rm spark-experiments

Exported notebooks

Following are HTML exports of the notebooks:




