history-server

An implementation of the Spark history server that uses MongoDB as the backend to store events.

Overview

History server provides persistence and quick access to application logs without keeping data in memory. It is inspired by the work done for Spark UI in hammerlab/spree, and aims to make installation and operations easy: the project is designed to be a drop-in replacement for the Spark history server.

The project is at a very early stage, and some notable features are missing, such as the RDD operation graph, event timeline, and cache timeline. I will be working on adding them, and contributions are always welcome.

Get the latest release

Dependencies

  • Spark 2.x
  • Java 7+
  • MongoDB 3.2+ (see Install and run below)

Download distribution

Distributions (.tgz) are uploaded with every release and are available in the releases tab on GitHub. You can also build your own; see the Build section.

Install and run

Download one of the distributions history-server-bin-X.Y.Z.tgz, unpack the archive, and edit the configuration parameters in conf/history-server-env.sh as needed (see Configuration).

$ tar -xzf history-server-bin-X.Y.Z.tgz
# optionally edit configuration
$ vi conf/history-server-env.sh

Make sure MongoDB is running before you start the application (the application reports an error if the database is not accessible). You can also run MongoDB in a Docker container; in that case you do not need to change any settings in conf/history-server-env.sh, unless you change the container host or port.

$ docker run -it -p 27017:27017 mongo:3.2

The application creates the history_server database and the necessary collections automatically.
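Before launching, it can help to confirm that MongoDB accepts connections at the configured address. The following is a minimal sketch, not part of the project: the function names are hypothetical, it only handles simple URLs of the form mongodb://host[:port][/db] (no credentials or replica sets), and it assumes an nc binary that supports the -z flag.

```shell
# Extract the host from a simple mongodb://host[:port][/db] URL.
mongo_host() {
  rest="${1#mongodb://}"   # drop the scheme
  rest="${rest%%/*}"       # drop an optional /database path
  echo "${rest%%:*}"       # keep the text before the port separator
}

# Extract the port, falling back to MongoDB's default 27017.
mongo_port() {
  rest="${1#mongodb://}"
  rest="${rest%%/*}"
  case "$rest" in
    *:*) echo "${rest##*:}" ;;
    *)   echo "27017" ;;
  esac
}

# Return 0 if a TCP connection to the parsed host and port succeeds.
# Assumption: nc is installed and supports -z (connect without sending data).
mongo_reachable() {
  nc -z "$(mongo_host "$1")" "$(mongo_port "$1")" >/dev/null 2>&1
}
```

With these functions loaded, something like mongo_reachable "mongodb://localhost:27017" && sbin/start.sh would start the server only when the port is open.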

To launch the application, run:

$ sbin/start.sh

The following options can be specified with start.sh:

  • -d, --daemon=true/false launch service as daemon process
  • --help show help for script

To stop the application, use Ctrl-C or sbin/stop.sh. The script does not stop the MongoDB database or Docker container as part of shutdown.

Configuration

Configuration for the history server is available in conf/history-server-env.sh. You can set the following options:

  • HISTORY_SERVER_HOST host to use for history server, default is localhost
  • HISTORY_SERVER_PORT port to use for history server, default is 8080
  • SPARK_EVENT_LOG_DIR directory with Spark application logs, normally configured as the spark.eventLog.dir option in Spark; can be either a file:/ or hdfs:/ path. The directory must exist, otherwise an error is raised
  • MONGO_CONNECTION connection URL for MongoDB, default is mongodb://localhost:27017
  • LOG4J_CONF_FILE alternative path to the log4j configuration file, in the form file:/path/to/file; if not provided, the default in the conf/ directory is used

You can also configure logging in conf/log4j.properties; the default logging level is INFO.
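As an illustration, a conf/history-server-env.sh using the documented defaults might look like the sketch below. It assumes the file is a plain shell script that exports variables; the event log directory and the log4j path are example values only, so check the file shipped with your distribution for the exact syntax.

```shell
# conf/history-server-env.sh (sketch; assumes plain shell variable exports)

# Address and port of the history server (documented defaults)
export HISTORY_SERVER_HOST="localhost"
export HISTORY_SERVER_PORT="8080"

# Directory must already exist and should match spark.eventLog.dir
# used by your Spark jobs. The path here is only an example.
export SPARK_EVENT_LOG_DIR="file:/tmp/spark-events"

# MongoDB connection URL (documented default)
export MONGO_CONNECTION="mongodb://localhost:27017"

# Optional: uncomment to use a custom log4j file (hypothetical path)
# export LOG4J_CONF_FILE="file:/path/to/log4j.properties"
```

For logs to appear in that directory, Spark applications need event logging turned on, for example with spark.eventLog.enabled=true and spark.eventLog.dir set to the same location.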

Development

Build

To build the project, follow the instructions below.

Build requirements

  • Java 7+
  • Node 6+ (npm 3.9.5 works)

Clone repository:

$ git clone https://github.com/lightcopy/history-server.git
$ cd history-server

# prepare code and dev files
$ sbt compile # pull dependencies and compile code
$ npm install # install frontend dependencies

To make a distribution, run bin/make-distribution. The script compiles sources, assembles the jar, creates static files (html/css/js), and copies them into the target/history-server-bin directory.

The following options are available:

  • --name adds a suffix to the name, e.g. --name=xyz will result in target/history-server-bin-xyz
  • --tgz creates a .tgz archive; the release directory is removed afterwards. If not provided, only the directory is created
  • --help show help for script

Note that there is no need to build a distribution to test code, since the repository itself acts like a distribution (all scripts work the same way). The following process might be useful:

# build code and assembly jar
$ sbt assembly

# build static files
$ npm run dev

# run start script (Mongo should be running)
$ sbin/start.sh

start.sh discovers the jars that need to be added to the classpath.

The bin/start-dev.sh script is also available for testing the frontend or basic functionality. It runs a server that does not require MongoDB or scan any event logs, and returns sample data when the API is invoked.

You can also run individual build commands declared in package.json; for example, to rebuild the JavaScript code, run npm run make_js.

Run tests

Run sbt test to launch tests.

Prepare release

Run bin/make-release with --release set to the release version (e.g. 0.1.2) and --next set to the next development version (e.g. 0.1.3-SNAPSHOT).
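If the next development version is always the release version with the last component bumped, it can be derived rather than typed by hand. The helper below is a hypothetical sketch and not part of the repository; it prints the command instead of running it, and the --flag=value form is an assumption modelled on the --name=xyz example, so check the script for the exact syntax.

```shell
# Derive the next development version by bumping the last numeric
# component of a release version and appending -SNAPSHOT.
next_snapshot() {
  ver="$1"
  prefix="${ver%.*}"    # everything before the last dot, e.g. 0.1
  patch="${ver##*.}"    # last component, e.g. 2
  echo "${prefix}.$((patch + 1))-SNAPSHOT"
}

release="0.1.2"
next="$(next_snapshot "$release")"

# Print the command for review rather than executing it.
echo "bin/make-release --release=$release --next=$next"
```

The helper assumes a purely numeric last component without leading zeros.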