- Overview
- Quick links
- Quick start
- Prerequisites
- Getting started
- Getting help
- Docker container image management
- Interact with Jupyter as Docker Container
Jupyter Notebook on Docker with its own Apache Spark compute engine.
Impatient and just want Jupyter with Apache Spark quickly? Place your notebooks under the `notebooks`
directory and, optionally, list your Python dependencies in a `requirements.txt`
file. Then run:
```
docker run --rm -d \
  --name jupyter-pyspark \
  --hostname jupyter-pyspark \
  --env JUPYTER_PORT=8889 \
  --volume $PWD/notebooks:/home/dummy/notebooks \
  --volume $PWD/requirements.txt:/requirements.txt \
  --publish 8889:8889 \
  loum/jupyter-pyspark:latest
```
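The mounted `requirements.txt` is a standard pip requirements file that is installed when the container starts. A minimal sketch (package names and version pins are illustrative, not part of the image):

```text
# requirements.txt: extra Python packages for the notebook environment.
pandas==2.2.2
matplotlib>=3.8
```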
To get the URL of your local server:
```
docker exec -ti jupyter-pyspark bash -c "jupyter notebook list"
```
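To capture just the URL (handy for scripting), the server list output can be filtered. A sketch, assuming the output format shown later in this document:

```shell
# Extract the first http(s) URL from the running-server listing.
docker exec jupyter-pyspark bash -c "jupyter notebook list" \
  | grep -oE 'https?://[^ ]+' \
  | head -n 1
```

The ad-hoc container can be stopped with `docker stop jupyter-pyspark`; it is removed automatically because of the `--rm` flag.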
Get the code and change into the top-level git project directory:

```
git clone https://github.com/loum/jupyter-pyspark.git && cd jupyter-pyspark
```

NOTE: Run all commands from the top-level directory of the git repository.
For first-time setup, get the Makester project:
```
git submodule update --init
```
Keep the Makester project up-to-date with:

```
make submodule-update
```
Set up the environment:

```
make init
```
There should be a make target to get most things done. Check the help for more information:

```
make help
```
Build the container image:

```
make image-buildx
```
The container image build tagging convention is:

```
<jupyter-version>-<spark-version>-<image-release-number>
```

The most recent image build is also tagged `latest`.
Search for existing Docker image tags with:

```
make image-search
```
To start the container and wait for the Jupyter Notebook service to initiate:
```
make controlled-run
```
Once all services stabilise, you should be presented with a list of running Jupyter Notebook servers:

```
Currently running servers:
http://0.0.0.0:8889/?token=5ffb5233ac5d52371fa4b7cfcc9aaaf425e749574ae32fc3 :: /home/dummy/notebooks
```
Browse to the URL to start interacting with the notebooks.
To stop:
```
make container-stop
```