Typically, Spark runs on YARN, which is inconvenient when we need finer control over executor placement (for example, running on a single machine with a specific number of executors under an exact configuration). Standalone mode suits these use cases better.
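For illustration, a standalone master lets a job request an exact executor layout through standard spark-submit flags; the master hostname, resource sizes, and example jar path below are placeholders rather than spark-kit defaults:

# Hypothetical submission against a standalone master: exactly 4 executors
# (8 total cores / 2 cores each), 4 GiB of heap per executor.
$SPARK_HOME/bin/spark-submit \
  --master spark://<master-host>:7077 \
  --total-executor-cores 8 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 100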
- To use spark-kit:
git clone https://github.com/stevenybw/spark-kit
cd spark-kit
source manage-standalone.sh
- Get the official Spark release
wget https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
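One way to unpack the release and record its location (SPARK_HOME as the variable name is an assumption; the check_environment step below reports what the scripts actually expect):

# Unpack the downloaded archive and export where it lives.
tar xzf spark-2.4.4-bin-hadoop2.7.tgz
export SPARK_HOME=$PWD/spark-2.4.4-bin-hadoop2.7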
- Check the environment and follow the directions
check_environment
- Adjust the parameters in
manage-standalone.sh
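The authoritative parameter names and formats are the ones inside manage-standalone.sh; the lines below only illustrate the kind of values to review (hostnames and paths are placeholders):

# Illustrative only -- consult manage-standalone.sh for the real variable names and formats.
SLAVES_HOSTLIST="node01 node02 node03"          # hosts that should join the cluster (format assumed)
SPARK_HOME="$HOME/spark-2.4.4-bin-hadoop2.7"    # hypothetical: where the unpacked release lives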
- Establish a Spark standalone cluster with all the nodes in ${SLAVES_HOSTLIST}
reset_environment $DIST
- Establish a Spark standalone cluster on a single node (the first node in ${SLAVES_HOSTLIST})
reset_environment $LOCAL
- Establish a Spark standalone cluster on a single node (the node currently running the script)
reset_environment_locally $LOCAL
- Check the web UI of the Spark standalone master (the cluster's resource manager)
show_master_webui
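The standalone master's web UI listens on port 8080 by default; a quick reachability check from a terminal (the hostname is a placeholder):

# Replace <master-host> with the node that runs the standalone master.
curl -sf http://<master-host>:8080 >/dev/null && echo "master web UI is up"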
- Show the command to launch a Spark shell (its argument must match the one used to set up the environment; the distributed setup is assumed here)
show_spark_shell_command $DIST
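The printed command is the authoritative one; against a standalone cluster it typically has the following shape (the master host and resource flags here are placeholders):

# Typical spark-shell invocation for a standalone master (7077 is the default master port).
$SPARK_HOME/bin/spark-shell \
  --master spark://<master-host>:7077 \
  --total-executor-cores 8 \
  --executor-memory 4g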
- Or launch a Spark shell directly
enter_spark_shell $DIST
- View the session web UI of the Spark job at port 4040
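To have something to look at in that UI, any small job will do. A minimal sketch that pipes a one-liner into the shell (the master URL is a placeholder; the Thread.sleep keeps the driver, and therefore the port-4040 UI, alive for a minute after the job finishes):

# Run a trivial count so a completed job shows up at http://<driver-host>:4040.
echo 'sc.parallelize(1 to 1000000).map(_ + 1).count(); Thread.sleep(60000)' | \
  $SPARK_HOME/bin/spark-shell --master spark://<master-host>:7077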