mauriciojost/spark-example

This is a Spark getting-started project. Its main aim is to help you understand Spark fundamentals and the performance impact of choosing different algorithms when writing Spark-based applications.

What is this for?

With this project you can see what using coalesce(2) instead of coalesce(4) means in terms of performance.

First, clone the repository. Then commit one version of the application that uses groupBy, and another that uses reduceByKey instead. Once you have committed every version you want to compare, run batch.bash, providing the target commits. The framework runs each version of the application and, once it finishes, you can compare their performance in the Spark UI (DAGs, memory usage, execution time, etc.).

The input data is generated automatically the first time you launch the application and stays the same for all runs.

This is the related Google Document where I log my conclusions from some basic experiments.

How to use it?

To get started you need to first download Spark (v1.4 or older). Then launch its history server:

cd $SPARK_HOME
./sbin/start-history-server.sh

The history server lets you inspect the performance of each run of the different versions of your application, so you can compare how Spark behaves as you change your algorithms.

The history server can be browsed here:

http://localhost:18080/
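The history server only lists applications that wrote event logs to the directory it reads. The project's scripts may already take care of this; if not, a minimal sketch of the setup, assuming Spark's default location /tmp/spark-events (the EVENT_LOG_DIR and EVENT_LOG_CONF variable names are illustrative, not part of this project), could look like this:

```shell
# Assumption: Spark's default event-log location; set EVENT_LOG_DIR to override it.
EVENT_LOG_DIR="${EVENT_LOG_DIR:-/tmp/spark-events}"

# The history server expects this directory to exist when it starts.
mkdir -p "$EVENT_LOG_DIR"

# Options an application needs so that its run is recorded in that directory.
# They can be passed to spark-submit as --conf key=value pairs.
EVENT_LOG_CONF="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file://$EVENT_LOG_DIR"
echo "$EVENT_LOG_CONF"
```

If the history server page stays empty after a run has finished, checking that this directory contains log files is a reasonable first step.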

The history server will not show anything at first. You can now launch your tests over a range of commits: from a specified commit (identified by its Git commit ID) up to the present, using only the commits whose message begins with the string TESTME. Launch the batch test:

cd $THIS_APP_HOME
./batch.bash <from_commit_id>
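To preview which commits such a run would pick up, the selection rule described above can be sketched as follows. This is not the code of batch.bash: filter_testme and list_testme_commits are hypothetical helpers, and whether the starting commit itself is included is an assumption (here it is excluded).

```shell
# Print the hashes of commits whose subject begins with TESTME.
# Input: lines of the form "<hash> <subject>" on stdin.
filter_testme() {
  while read -r hash subject; do
    case "$subject" in
      TESTME*) printf '%s\n' "$hash" ;;
    esac
  done
}

# List matching commits made after the given commit, oldest first.
# Must be called from inside the Git repository.
list_testme_commits() {
  git log --reverse --format='%H %s' "$1..HEAD" | filter_testme
}

# Example with hand-written input instead of a real repository;
# prints a1b2c3 and 0a1b2c, one per line.
printf '%s\n' \
  'a1b2c3 TESTME groupBy version' \
  'd4e5f6 Fix typo' \
  '0a1b2c TESTME reduceByKey version' | filter_testme
```

Inside the repository, `list_testme_commits <from_commit_id>` would then print the commits expected to be tested.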

How to run a single test?

To launch a single test run (for debugging purposes):

cd $THIS_APP_HOME
./run.bash

While it runs, open the Spark web interface at localhost:4040.
