Grid’5000 Apache Spark Cluster

This script deploys a basic Apache Spark cluster on your reserved nodes in Grid’5000. I hope to improve it in the future; any help is welcome.

Dependencies

Python 3.x

How to run it

First, clone this repository on the Grid’5000 frontend node.

Download the disk image and the env file from here, then move them inside the repository folder.

Reserve resources

To reserve nodes in Grid’5000, run the following command, adapted to your needs:

frontend > oarsub -t deploy -p "cluster='suno'" -I -l nodes=4,walltime=2 -k

In this example we are on the Sophia site and request 4 nodes for 2 hours on the cluster named “suno”.
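If you reserve nodes often, it can help to generate the command from a few parameters. The following is a minimal sketch, not part of the repository: `build_oarsub_command` is a hypothetical helper that reproduces the example command above for any cluster, node count and walltime (in hours). The flags are copied verbatim from the example; nothing else about `oarsub` is assumed.

```python
import re


def build_oarsub_command(cluster: str, nodes: int, walltime_hours: int) -> str:
    """Return the interactive deploy reservation command shown in this README."""
    # Reject names that would break the quoting of the -p property filter.
    if not re.fullmatch(r"[a-z0-9-]+", cluster):
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    if nodes < 1 or walltime_hours < 1:
        raise ValueError("nodes and walltime must be positive")
    return (
        "oarsub -t deploy "
        f"-p \"cluster='{cluster}'\" "
        f"-I -l nodes={nodes},walltime={walltime_hours} -k"
    )


if __name__ == "__main__":
    print(build_oarsub_command("suno", 4, 2))
```

Running it prints exactly the command used in the example above.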

For further and more specific information follow the Grid’5000’s Getting Started tutorial.

Pre-tasks

Prepare the config file

Open the file config.conf and modify the parameters (g5k, spark and folders) to match your system configuration. Be sure to replace the username with your Grid’5000 username.
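Before running the deployment, you may want to check that the username was actually changed. The sketch below assumes that config.conf consists of `key=value` lines with dotted keys (as the `oar.file.location` and `multi.cluster` options mentioned later suggest), optionally grouped under `[section]` headers; the key `g5k.username` and the placeholder value `username` are assumptions, so adapt them to the real file.

```python
from __future__ import annotations

# Hypothetical value left in an unedited config.conf (an assumption).
PLACEHOLDER_USER = "username"


def load_conf(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blanks, comments and [section] headers."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        conf[key.strip()] = value.strip()
    return conf


def check_username(conf: dict[str, str]) -> None:
    """Fail early if the (assumed) g5k.username key is missing or unchanged."""
    user = conf.get("g5k.username", "")
    if not user or user == PLACEHOLDER_USER:
        raise ValueError("set g5k.username to your Grid'5000 login in config.conf")
```

For example, `check_username(load_conf(open("config.conf").read()))` raises a `ValueError` if you forgot to edit the username.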

Install Spark in your frontend

Download and extract a Spark binary distribution (version 2.2.4) into your home directory on the frontend.
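Spark binary tarballs unpack into a directory named like `spark-<version>-bin-<hadoop>` that contains `bin/spark-submit`. A minimal sketch (the helper `find_spark_homes` is hypothetical, not part of the repository) for confirming that an extracted distribution is present in your home directory:

```python
from __future__ import annotations

from pathlib import Path


def find_spark_homes(home: Path | None = None) -> list[Path]:
    """Return extracted Spark directories (spark-*-bin-*) found in the home directory."""
    home = home or Path.home()
    # Keep only directories that really contain the spark-submit launcher.
    return sorted(
        p for p in home.glob("spark-*-bin-*")
        if (p / "bin" / "spark-submit").is_file()
    )


if __name__ == "__main__":
    homes = find_spark_homes()
    print(homes if homes else "no Spark distribution found in your home directory")
```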

Run it

frontend > python3 deploy.py

To access your nodes use:

ssh root@node-name
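To know which names to use in place of `node-name`, you can list the nodes of your job. Inside the job shell, OAR exposes the node file through the `OAR_NODE_FILE` environment variable, and that file typically lists one line per reserved core, so host names repeat. The sketch below (hypothetical helper `reserved_nodes`, not taken from deploy.py) returns each host once, in file order:

```python
from __future__ import annotations

import os


def reserved_nodes(path: str | None = None) -> list[str]:
    """Return the distinct host names of the OAR node file, preserving order."""
    path = path or os.environ["OAR_NODE_FILE"]
    with open(path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(hosts))


if __name__ == "__main__":
    for host in reserved_nodes():
        print(f"ssh root@{host}")
```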

That’s all. Simple, no?

Post-Run

Connect to Spark Master Dashboard

To connect to the Spark master Web UI, open an SSH tunnel to its web service port (8080):

localhost > ssh {{ g5k.username }}@access.grid5000.fr -N -L8080:{{ master_node_address }}:8080

The Spark master Web UI should now be reachable at localhost:8080.
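If you script the tunnel, it is safer to build the argument list than to concatenate a shell string. A minimal sketch, assuming you already know the address of the node running the Spark master (the helper `tunnel_command` and the host name in the usage line are hypothetical). It places the options before the destination, which is the canonical `ssh [options] destination` order:

```python
def tunnel_command(username: str, master: str, port: int = 8080) -> list[str]:
    """Build an ssh command that forwards localhost:<port> to <master>:<port>."""
    return [
        "ssh",
        "-N",                        # no remote command, only port forwarding
        f"-L{port}:{master}:{port}",  # local port -> master web UI port
        f"{username}@access.grid5000.fr",
    ]


# Usage (blocks until interrupted):
#   import subprocess
#   subprocess.run(tunnel_command("jdoe", "suno-1.sophia.grid5000.fr"))
```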

Multi-Cluster Run

The script can also deploy Spark in a multi-cluster environment. To make the reservation, use:

frontend > oargridsub -t deploy -w '0:59:00' suno:rdef="/nodes=6",parapide:rdef="/nodes=6"

In this case we do not enter the job shell, so the OAR_NODE_FILE environment variable is not set. We can retrieve the list of reserved machines using:

frontend > oargridstat -w -l {{ GRID_RESERVATION_ID  }} | sed '/^$/d' > ~/machines
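The `sed '/^$/d'` filter simply drops empty lines before saving the host list. If you prefer to do this step from Python, a minimal sketch (hypothetical helper `write_machines`, not part of the repository) follows; unlike the sed expression, it also drops lines containing only whitespace:

```python
from pathlib import Path


def write_machines(oargridstat_output: str, dest: Path) -> list[str]:
    """Drop blank lines from oargridstat output and save the hosts, one per line."""
    hosts = [line for line in oargridstat_output.splitlines() if line.strip()]
    dest.write_text("\n".join(hosts) + ("\n" if hosts else ""))
    return hosts


# Usage (GRID_RESERVATION_ID is your grid job id):
#   import subprocess
#   out = subprocess.run(["oargridstat", "-w", "-l", GRID_RESERVATION_ID],
#                        capture_output=True, text=True, check=True).stdout
#   write_machines(out, Path("~/machines").expanduser())
```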

Finally, edit the configuration file: set the location of the file you just created (oar.file.location=~/machines) and enable the multi-cluster option (multi.cluster=yes).
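These two changes can also be applied programmatically. The sketch below assumes config.conf stores each option on its own `key=value` line; the helper `set_conf_value` is hypothetical and not part of the repository. It rewrites the line if the key exists and appends it otherwise:

```python
import re


def set_conf_value(text: str, key: str, value: str) -> str:
    """Set key=value in config text, replacing an existing line or appending one."""
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `value`.
        return pattern.sub(lambda m: f"{m.group(1)}{key}={value}", text)
    prefix = text if not text or text.endswith("\n") else text + "\n"
    return prefix + f"{key}={value}\n"


# Usage:
#   from pathlib import Path
#   conf = Path("config.conf")
#   text = set_conf_value(conf.read_text(), "oar.file.location", "~/machines")
#   conf.write_text(set_conf_value(text, "multi.cluster", "yes"))
```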

For more information, see the Grid'5000 Multi-site jobs tutorial.

More

Check out the other scripts for:

Apache Storm

Apache Flink

mirfarzam/g5k-spark-cluster

Folders and files

Latest commit

History

Repository files navigation

Grid’5000 Apache Spark Cluster

Dependencies

How to run it

Reserve resources

Pre-tasks

Prepare the config file

Install Spark in your frontend

Run it

Post-Run

Connect to Spark Master Dashboard

Multi-Cluster Run

More

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages