Commit 61fe4aa (parent: 4e063cf), showing 1 changed file with 15 additions and 5 deletions.
# Spark 2.1.0: Create and Run Jobs on Docker

This repository contains all the files required to create an n-node Spark cluster and run a simple app on it. The whole workflow is driven by the script [RunSparkJobOnDocker.sh](RunSparkJobOnDocker.sh).
### Prerequisites
1. java - [Installation Instructions](https://www.java.com/en/download/help/download_options.xml)
2. git - [Installation Instructions](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
3. docker - [Installation Instructions](https://docs.docker.com/engine/installation/)
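A quick way to confirm these are installed is to check that each command is on `PATH`. This is a small illustrative sketch (the `check` helper is not part of the repo):

```shell
# Report whether each prerequisite is on PATH
check() { command -v "$1" >/dev/null 2>&1 && echo "$1: ok" || echo "$1: MISSING"; }

for tool in java git docker; do
  check "$tool"
done
```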
### To Run
1. Clone this repository to your local machine.
2. Execute the script: [RunSparkJobOnDocker.sh](RunSparkJobOnDocker.sh)
### RunSparkJobOnDocker.sh Details
The script [RunSparkJobOnDocker.sh](RunSparkJobOnDocker.sh) does the following:
1. Pull the image from [docker-hub](https://hub.docker.com/r/pavanpkulkarni/spark_image_2.0.1/).
2. Create an n-node cluster; in our case, a 5-node cluster. The cluster size can be changed with `docker-compose scale slave=n`.
3. Build an image that acts as an external client, used to submit the job to the cluster we just created.
4. Run the job on the cluster. You can either pull this [Source Code](https://github.com/pavanpkulkarni/SparkDocker) or try something of your own.
5. Finally, once the job has finished, bring down the cluster.
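The steps above can be sketched as a short shell script. This is a hedged outline, not the actual RunSparkJobOnDocker.sh: the service name `slave`, the submitter build context `./submitter`, and the compose file layout are assumptions. It prints each command via `run` instead of executing it, so it can be read without Docker installed; replace the body of `run` with `"$@"` to execute for real.

```shell
# Dry-run outline of the workflow; `run` only prints each command.
# Assumed names: compose service "slave", submitter build context "./submitter".
run() { echo "+ $*"; }

IMAGE="pavanpkulkarni/spark_image_2.0.1"
SLAVES=5

run docker pull "$IMAGE"                          # 1. pull the base image
run docker-compose up -d                          # 2. start the cluster
run docker-compose scale slave="$SLAVES"          #    grow it to n worker nodes
run docker build -t spark-submitter ./submitter   # 3. build the external submitter image
run docker run --rm spark-submitter               # 4. submit the job to the cluster
run docker-compose down                           # 5. tear the cluster down
```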
### Spark Job Description
This is a simple Spark job in Scala that reads a file, [sample.txt](sample.txt), and performs a basic word count on it.
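The job's Scala source lives in the linked repository. As a quick sanity check of what it computes, the same word count on a small file can be reproduced locally with plain shell (this pipeline is an illustration, not part of the repo):

```shell
# Count word occurrences in a file: split on whitespace, drop empty lines,
# then count duplicates, most frequent first
wordcount() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -rn
}
```

Running `wordcount sample.txt` should match the counts the Spark job reports for the same file.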
### Image Dockerfile
[Dockerfile](https://github.com/pavanpkulkarni/docker-spark-image_2.1.0)