This image can be used for local development.
The local Spark installation supports:
- AWS S3
- GCP BigQuery
- GCP Cloud Storage (GCS)
It does not support the AWS Glue Data Catalog. A minimal read sketch for the GCP backends follows.
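The sketch below shows how reads from the GCP backends could look, assuming the GCS connector (`gs://` scheme) and the spark-bigquery connector are on the image's classpath; the bucket, project, dataset, and table names are placeholders:

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the GCS connector (gs:// scheme) and the
# spark-bigquery connector are bundled with this image.
spark = SparkSession.builder \
    .master("local") \
    .appName("GCP read sketch") \
    .getOrCreate()

# Parquet files from a Cloud Storage bucket (placeholder path).
gcs_df = spark.read.parquet("gs://BUCKET/PATH/*.parquet")

# A BigQuery table via the spark-bigquery connector (placeholder table).
bq_df = spark.read.format("bigquery") \
    .option("table", "PROJECT.DATASET.TABLE") \
    .load()
```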
Build the Docker image by running:

```bash
task build-docker-image
```
Start a Docker container from the Docker image by running:
```bash
task run-docker-container
# or
docker run -it --rm \
  -p 8888:8888 \
  -p 4040:4040 \
  --user root \
  -e GRANT_SUDO=yes \
  -e AWS_PROFILE=$AWS_PROFILE \
  -v "$HOME/.aws":/home/jovyan/.aws \
  -v "$HOME":/home/jovyan/work \
  local.io/mleuthold/jupyter/pyspark-notebook:spark-3.2.0
```
-v "$HOME":/home/jovyan/work
mounts HOME folder into Docker container to access all local files-e AWS_PROFILE=$AWS_PROFILE
use locally provided AWS profile in Docker container-v "$HOME/.aws":/home/jovyan/.aws
use locally provided AWS credentials in Docker container
Example code to read data from S3:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local") \
    .appName("Word Count") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Read Snappy-compressed Parquet files from S3 via the s3a scheme
# (BUCKET and PATH are placeholders).
df = spark.read.parquet("s3a://BUCKET/PATH/*.snappy.parquet")
```
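If the read fails with a credentials error, the s3a connector may not consult the mounted profile by default. A minimal sketch, assuming the image's hadoop-aws/AWS SDK ships `com.amazonaws.auth.profile.ProfileCredentialsProvider`:

```python
from pyspark.sql import SparkSession

# Assumption: the bundled AWS SDK provides ProfileCredentialsProvider,
# which reads the profile named in AWS_PROFILE from the mounted ~/.aws.
spark = SparkSession.builder \
    .master("local") \
    .appName("S3 with AWS profile") \
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.profile.ProfileCredentialsProvider",
    ) \
    .getOrCreate()

df = spark.read.parquet("s3a://BUCKET/PATH/*.snappy.parquet")
```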