$ sudo docker pull seedotech/spark:2.4.0
$ sudo docker network create --driver=bridge spark
# The 1st argument (CREATE_SPARK_MASTER) is set to YES if you want to create a Spark master; the default is YES
# The 2nd argument (SPARK_MASTER_IP) is the Spark master IP (the IP of the host machine running the Spark master Docker container); the default is spark-master
# The 3rd argument (SPARK_WORKER_NUMBER) is the number of workers you want to create; the default is 2
# The default Spark cluster has 3 nodes: 1 master and 2 workers
$ start_containers.sh
# If you want to create 1 Spark worker on another machine (assuming the IP of the Spark master is 192.168.1.10), run:
$ start_containers.sh NO 192.168.1.10 1
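# Similarly, to create a Spark master plus 3 workers on the current host, you can pass all three arguments explicitly (values here are illustrative):
$ start_containers.sh YES spark-master 3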
$ sudo docker exec -it spark-master bash
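Once inside the spark-master container, you can optionally run a quick smoke test against the cluster. This sketch assumes a standard Spark 2.4.0 layout inside the image, with spark-submit on the PATH and the bundled examples jar under $SPARK_HOME/examples/jars:
# Run the bundled SparkPi example against the cluster (jar path assumes a standard Spark distribution layout)
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://spark-master:7077 \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100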
Access http://spark-master:8080 to monitor the Spark cluster.
$ sudo docker network create --driver=bridge spark-hadoop
e.g. start the Hadoop master container on the spark-hadoop network:
$ sudo docker run -itd \
--net=spark-hadoop \
-p 50070:50070 \
-p 8088:8088 \
-e HADOOP_SLAVE_NUMBER=$HADOOP_SLAVE_NUMBER \
--name hadoop-master \
--hostname hadoop-master \
seedotech/hadoop:2.9.2 &> /dev/null
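You can optionally verify that HDFS is up before wiring Spark to it. This assumes the Hadoop binaries are on the PATH inside the image:
# Print the HDFS status and datanode report from inside the hadoop-master container
$ sudo docker exec -it hadoop-master hdfs dfsadmin -report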
e.g. start the Spark master container on the same spark-hadoop network:
$ sudo docker run -itd \
--net=spark-hadoop \
-p 6066:6066 -p 7077:7077 -p 8080:8080 \
-e IS_SPARK_MASTER=YES \
--name spark-master \
--hostname spark-master \
seedotech/spark:2.4.0 &> /dev/null
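Before submitting the job below, the application jar and the input file must exist in HDFS. One possible way to put them there (assuming the files are on the host and the Hadoop binaries are on the PATH inside the hadoop-master container; local paths are illustrative):
# Copy the job artifacts from the host into the hadoop-master container
$ sudo docker cp spark-java-1.0.jar hadoop-master:/tmp/
$ sudo docker cp demo.txt hadoop-master:/tmp/
# Create the target folder in HDFS and upload the jar and the input file
$ sudo docker exec -it hadoop-master hdfs dfs -mkdir -p /apps/spark
$ sudo docker exec -it hadoop-master hdfs dfs -put /tmp/spark-java-1.0.jar /tmp/demo.txt /apps/spark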
$ spark-submit --class com.seedotech.spark.SparkJavaApp --master spark://spark-master:7077 --deploy-mode cluster hdfs://hadoop-master:9000/apps/spark/spark-java-1.0.jar hdfs://hadoop-master:9000/apps/spark/demo.txt hdfs://hadoop-master:9000/apps/spark/out
Use the container logs to trace bugs/issues:
$ sudo docker logs -f spark-master
$ sudo docker logs -f spark-worker1
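To narrow down a failure, you can also tail only the most recent lines or filter the log output for errors:
$ sudo docker logs --tail 100 spark-master
$ sudo docker logs spark-worker1 2>&1 | grep -i error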
NOTE: the job might fail the second time you run it because the folder hdfs://hadoop-master:9000/apps/spark/out already exists. Remove it before running again (use http://localhost:50070/explorer.html to upload/delete folders/files).
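Alternatively, the output folder can be removed from the command line; this assumes the Hadoop binaries are on the PATH inside the hadoop-master container:
# Delete the previous job output so spark-submit can run again
$ sudo docker exec -it hadoop-master hdfs dfs -rm -r /apps/spark/out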