Create an n-node Cluster and Run Jobs on Docker

Prerequisites

  1. Java - Installation Instructions
  2. Git - Installation Instructions
  3. Docker - Installation Instructions

To Run

  1. Clone this repo to your local machine.
  2. Execute the script RunSparkJobOnDocker.sh.

RunSparkJobOnDocker.sh Details

This repository contains all the files required to create an n-node Spark cluster and run a simple app on it. In this project, the script RunSparkJobOnDocker.sh does the following:

  1. Pull the Spark image (tag 2.2.1) from Docker Hub.
  2. Build and create an n-node cluster. Here I'm creating a 3-node cluster; the size can be changed by specifying docker-compose up -d --scale slave=$number_of_nodes
  3. Wait for 10 seconds so that Docker fully establishes the network connections.
  4. Run the job on the cluster. You can either pull this source code and build it with Gradle, or try something of your own.
  5. Optional - after the job completes successfully, bring down the cluster by running docker-compose down.
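The five steps above can be sketched as a single shell function. This is a minimal sketch, not the script's exact contents: the master container name, jar path inside the container, and spark-submit arguments are assumptions based on the repository's file listing.

```shell
#!/usr/bin/env bash
# Sketch of the steps RunSparkJobOnDocker.sh performs.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_spark_job() {
  local workers="${1:-3}"        # number of slave nodes (default: 3)
  local run="${DRY_RUN:+echo}"   # "echo" in dry-run mode, empty otherwise

  # 1. Pull the Spark image from Docker Hub (tag 2.2.1)
  $run docker pull pavanpkulkarni/spark_image:2.2.1

  # 2. Build and create an n-node cluster
  $run docker-compose up -d --scale slave="$workers"

  # 3. Give Docker time to establish the network connections
  $run sleep 10

  # 4. Submit the word-count jar on the master (container name and
  #    in-container paths below are illustrative assumptions)
  $run docker exec create-and-run-spark-job_master_1 \
    spark-submit --master spark://master:7077 \
    /opt/Docker_WordCount_Spark-1.0.jar /opt/sample.txt

  # 5. Optional - bring the cluster down once the job finishes
  $run docker-compose down
}
```

For example, `DRY_RUN=1 run_spark_job 5` prints the command sequence for a 5-node cluster without touching Docker.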

Spark UI:

Master - localhost:8080
History Server - localhost:18080
Executors - the port bindings can be found by running docker ps -a, e.g.:

```
Pavans-MacBook-Pro:create-and-run-spark-job pavanpkulkarni$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS              PORTS                                                                                                                 NAMES
30a51b5f5a77        pavanpkulkarni/spark_image:2.2.1   "/usr/bin/supervisor…"   12 seconds ago      Up 16 seconds       4040/tcp, 6066/tcp, 7077/tcp, 8080/tcp, 18080/tcp, 0.0.0.0:32854->8081/tcp                                            create-and-run-spark-job_slave_3
```
So, the executor can be accessed using localhost:32854
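Rather than reading the host port off the docker ps table by eye, the binding string can be split with shell parameter expansion. This small helper is a sketch, assuming the binding format shown above (host_ip:host_port->container_port/proto):

```shell
# Extract the host port from a Docker port-binding string such as
# "0.0.0.0:32854->8081/tcp".
host_port() {
  local binding="$1"
  local host_side="${binding%%->*}"   # keep everything before "->": "0.0.0.0:32854"
  echo "${host_side##*:}"             # keep everything after the last ":": "32854"
}
```

For example, `host_port "0.0.0.0:32854->8081/tcp"` prints 32854, so the executor UI is at localhost:32854. (`docker port <container> 8081` reports the same mapping directly.)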

Spark Docker Image Details

Check this repo for Docker image details

Remove all Containers and Images

Run the script RemoveContainersAndImages.sh to remove all the containers and images.
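The cleanup can be sketched roughly as below. This is assumed behavior inferred from the script's name, not its exact contents, and note that pruning removes ALL stopped containers and unused images on the host, not just the ones this project created. Set DRY_RUN=1 to preview the commands.

```shell
# Sketch of a cleanup in the spirit of RemoveContainersAndImages.sh.
# WARNING: prune affects the whole Docker host, not only this project.
cleanup_docker() {
  local run="${DRY_RUN:+echo}"   # "echo" in dry-run mode, empty otherwise

  $run docker-compose down          # stop and remove the cluster's containers and network
  $run docker container prune -f    # remove all stopped containers
  $run docker image prune -a -f     # remove all images not used by a container
}
```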