loum/jupyter-pyspark

Jupyter Notebook (with PyPI Apache Spark)

Overview

Jupyter Notebook running in a Docker container with its own Apache Spark compute engine.

Quick start

Impatient and just want Jupyter with Apache Spark quickly? Place your notebooks under the notebooks directory and, optionally, list your Python dependencies in a requirements.txt file. Then run:

docker run --rm -d \
  --name jupyter-pyspark \
  --hostname jupyter-pyspark \
  --env JUPYTER_PORT=8889 \
  --volume "$PWD/notebooks:/home/dummy/notebooks" \
  --volume "$PWD/requirements.txt:/requirements.txt" \
  --publish 8889:8889 \
  loum/jupyter-pyspark:latest
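
Both host paths mounted by the command above should exist before the container starts. If requirements.txt is missing, Docker's --volume option will typically create a directory at that path instead of a file. A minimal sketch of the preparation step follows; the helper name prepare_workspace is hypothetical and not part of this project:

```shell
# Hypothetical helper (not part of this project): create the host paths
# that the quick start command mounts into the container.
prepare_workspace() {
    dir="${1:-$PWD}"
    # Directory mounted at /home/dummy/notebooks.
    mkdir -p "$dir/notebooks"
    # Create an empty requirements.txt only if one does not already exist.
    [ -f "$dir/requirements.txt" ] || : > "$dir/requirements.txt"
}
```

Calling prepare_workspace with no argument prepares the current directory. An existing requirements.txt is left untouched.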

To get the URL of your local server:

docker exec -ti jupyter-pyspark bash -c "jupyter notebook list"
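
If you only want the URL, for example to pass it to a browser command, the listing can be filtered. The sketch below assumes the output format is one line per server starting with the URL, followed by " :: " and the notebook directory. The function name first_notebook_url is hypothetical:

```shell
# Hypothetical helper: read `jupyter notebook list` output on stdin and
# print the URL of the first server listed.
first_notebook_url() {
    # Server lines start with "http"; the URL is the first whitespace-separated field.
    awk '/^http/ { print $1; exit }'
}

# Example (drops -t so no TTY carriage returns end up in the pipe):
# docker exec jupyter-pyspark jupyter notebook list | first_notebook_url
```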

Prerequisites

Getting started

Get the code and change into the top level git project directory:

git clone https://github.com/loum/jupyter-pyspark.git && cd jupyter-pyspark

NOTE: Run all commands from the top-level directory of the git repository.

For first-time setup, get the Makester project:

git submodule update --init

Keep the Makester project up to date with:

make submodule-update

Set up the environment:

make init

Getting help

There should be a make target to get most things done. Check the help for more information:

make help

Docker container image management

Image build and tagging

make image-buildx

The container image tagging convention is:

  • <jupyter-version>-<spark-version>-<image-release-number>
  • latest
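
As an illustration of the first form, the tag is the three components joined with hyphens. The version values and the image_tag function below are placeholders chosen for the example, not the versions this project actually ships:

```shell
# Placeholder values for illustration only; substitute the real versions.
JUPYTER_VERSION="7.0.0"
SPARK_VERSION="3.5.0"
RELEASE_NUMBER="1"

# Hypothetical helper: join the three components into a tag.
image_tag() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

image_tag "$JUPYTER_VERSION" "$SPARK_VERSION" "$RELEASE_NUMBER"
```

With these placeholder values the resulting image reference would be loum/jupyter-pyspark:7.0.0-3.5.0-1.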

Image searches

Search for existing Docker image tags with:

make image-search

Interact with Jupyter as a Docker container

To start the container and wait for the Jupyter Notebook service to come up:

make controlled-run

Once all services stabilise, you should be presented with a list of running Jupyter Notebook servers:

Currently running servers:
http://0.0.0.0:8889/?token=5ffb5233ac5d52371fa4b7cfcc9aaaf425e749574ae32fc3 :: /home/dummy/notebooks

Browse to the URL to start interacting with the notebooks.

To stop:

make container-stop