
Docker build for Apache Spark based on lightweight Alpine Linux image.


riyadparvez/spark-in-docker

 
 

Alpine Linux

Many Apache Spark Docker images are built on heavyweight Debian base images. This one is built on Alpine Linux, a lightweight distribution well suited to containers.

spark

A Spark container. Use it in a standalone cluster with the accompanying docker-compose.yml, or as a base for more complex recipes.

docker example

To run SparkPi, run the image with Docker:

docker run --rm -it -p 4040:4040 riyadparvez/spark bin/run-example SparkPi 10

To start spark-shell with your AWS credentials:

docker run --rm -it -e "AWS_ACCESS_KEY_ID=YOURKEY" -e "AWS_SECRET_ACCESS_KEY=YOURSECRET" -p 4040:4040 riyadparvez/spark bin/spark-shell
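To keep the secrets out of the command line and shell history, Docker can copy variables from the host environment when `-e` is given a name without a value. A minimal sketch, assuming both variables are already exported in the calling shell; `spark_shell_with_aws` is a hypothetical helper name, not something the image provides:

```shell
# Hypothetical helper (not part of the image): start spark-shell with AWS
# credentials taken from the calling shell's environment.
spark_shell_with_aws() {
  # Refuse to start if either variable is missing or empty.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  # "-e NAME" with no "=value" makes Docker copy NAME from the host
  # environment; the variable must be exported for Docker to see it.
  docker run --rm -it \
    -e AWS_ACCESS_KEY_ID \
    -e AWS_SECRET_ACCESS_KEY \
    -p 4040:4040 \
    riyadparvez/spark bin/spark-shell
}
```

After exporting the two variables, call `spark_shell_with_aws` with no arguments.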

To run a PySpark script:

printf 'import pyspark\nprint(pyspark.SparkContext().parallelize(range(0, 10)).count())\n' > count.py
docker run --rm -it -p 4040:4040 -v "$(pwd)/count.py:/count.py" riyadparvez/spark bin/pyspark /count.py
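For anything longer than a one-liner, a quoted heredoc is an easier way to write the script, since the shell passes newlines and quotes through untouched. A sketch of the same count, with an explicit `sc.stop()` added as a tidy-up step that the original one-liner does not have:

```shell
# Write the PySpark script with a quoted heredoc ('EOF') so the shell leaves
# newlines, quotes, and $ characters exactly as typed.
cat > count.py <<'EOF'
import pyspark

sc = pyspark.SparkContext()
print(sc.parallelize(range(0, 10)).count())
sc.stop()
EOF
```

Mount and run the file with the same docker run command as above. If the image ships Spark 2.0 or later, bin/pyspark no longer accepts a script argument, and bin/spark-submit /count.py is the equivalent.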

docker-compose example

To create a simplistic standalone cluster with docker-compose:

docker-compose up

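The Spark master web UI will be available at http://${YOUR_DOCKER_HOST}:8080 with one worker listed. To run pyspark, exec into the master container (the container name is derived from the Compose project name, so check docker ps if yours differs):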

docker exec -it dockerspark_master_1 /bin/bash
bin/pyspark

To run SparkPi, exec into a container:

docker exec -it dockerspark_master_1 /bin/bash
bin/run-example SparkPi 10
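The two-step exec can also be collapsed into a single command that does not depend on the generated container name. A minimal sketch, assuming the service is called `master` in docker-compose.yml (the container name dockerspark_master_1 suggests this, but it is not confirmed here) and that it is run from the directory containing that file; `spark_master_exec` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: run one command in the master service and return.
# Assumes the Compose service is named "master".
spark_master_exec() {
  if [ "$#" -eq 0 ]; then
    echo "usage: spark_master_exec COMMAND [ARGS...]" >&2
    return 2
  fi
  # docker-compose resolves the container from the service name, so the
  # project-specific container name is not needed.
  docker-compose exec master "$@"
}
```

For example, `spark_master_exec bin/run-example SparkPi 10` runs SparkPi, and `spark_master_exec bin/pyspark` opens the PySpark shell. The relative paths rely on the container's working directory being the Spark home, as in the commands above.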

license

MIT
