An implementation of the Spark history server that uses MongoDB as a backend to store events.
The history server provides persistence and quick access to application logs without keeping data in memory. It is inspired by the amazing work done on the Spark UI in hammerlab/spree, and it is designed to be a drop-in replacement for the Spark history server, which keeps installation and operations simple.
The project is at a very early stage and some notable features are missing, such as the RDD operation graph, event timeline, and cache timeline. I will be working on adding them, and contributions are always welcome.
- Spark 2.x
- Java 7+
- Mongo 3.2+ (see install below)
Distributions (.tgz) are uploaded with every release and are available in the releases tab on GitHub. You can also build your own; see the Build section.
Download one of the distributions (history-server-bin-X.Y.Z.tgz), unpack the archive, and edit a few configuration parameters in conf/history-server-env.sh (see Configuration).
$ tar -xzf history-server-bin-X.Y.Z.tgz
# optionally edit configuration
$ vi conf/history-server-env.sh
Make sure that MongoDB is running before you start the application (the application will report an error if the database is not accessible). You can also run MongoDB in a Docker container; in this case you do not need to change any settings in conf/history-server-env.sh, unless you also change the container host or port.
$ docker run -it -p 27017:27017 mongo:3.2
The application will create the history_server database and the necessary collections automatically.
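If the container is published on a different host port, the connection URL has to change with it. The sketch below assumes a hypothetical host port 27018 and only prints the two values that must agree; it does not start anything. The MONGO_CONNECTION name comes from conf/history-server-env.sh, while MONGO_HOST_PORT is an illustrative variable that is not part of the project.

```shell
# Hypothetical non-default host port; the container side stays 27017.
MONGO_HOST_PORT=27018

# Docker command with the changed host side of the port mapping.
echo "docker run -it -p ${MONGO_HOST_PORT}:27017 mongo:3.2"

# Matching value to put into conf/history-server-env.sh.
MONGO_CONNECTION="mongodb://localhost:${MONGO_HOST_PORT}"
echo "MONGO_CONNECTION=${MONGO_CONNECTION}"
```

With the default mapping (-p 27017:27017) none of this is needed, since the default connection URL already points at localhost:27017.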
To launch the application, run:
$ sbin/start.sh
The following options can be specified with start.sh:
- -d, --daemon=true/false: launch the service as a daemon process
- --help: show help for the script
To stop the application, use Ctrl-C or sbin/stop.sh. The script does not stop the Mongo database or the Docker container as part of shutdown.
Configuration for the history server is available in conf/history-server-env.sh. You can set the following options:
- HISTORY_SERVER_HOST: host to use for the history server, default is localhost
- HISTORY_SERVER_PORT: port to use for the history server, default is 8080
- SPARK_EVENT_LOG_DIR: directory with Spark application logs, normally configured as the spark.eventLog.dir option in Spark; can be either file:/ or hdfs:/; the directory must exist, otherwise an error is raised
- MONGO_CONNECTION: connection URL for MongoDB, default is mongodb://localhost:27017
- LOG4J_CONF_FILE: alternative path to a log4j configuration file, in the form file:/path/to/file; if not provided, the default in the conf/ directory is used
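As an illustration, a conf/history-server-env.sh that sets every option might look like the sketch below. It assumes the file is a plain shell script that uses export assignments, which may differ from the file shipped in the distribution. The event log path and the log4j path are hypothetical placeholders; the other values are the documented defaults.

```shell
# Sketch of conf/history-server-env.sh (assumes plain `export` assignments).
export HISTORY_SERVER_HOST="localhost"                 # documented default
export HISTORY_SERVER_PORT="8080"                      # documented default

# Hypothetical path; it should match spark.eventLog.dir and must already exist.
export SPARK_EVENT_LOG_DIR="file:/tmp/spark-events"

export MONGO_CONNECTION="mongodb://localhost:27017"    # documented default

# Optional: uncomment to use a custom log4j configuration (hypothetical path).
# export LOG4J_CONF_FILE="file:/path/to/log4j.properties"
```

On the Spark side, applications write their event logs to that directory when spark.eventLog.enabled is set to true and spark.eventLog.dir points at the same location.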
You can also configure logging in conf/log4j.properties; by default the logging level is set to INFO.
If you want to build the project, follow the instructions below.
Build requirements:
- Java 7+
- Node 6+ (npm 3.9.5 works)
Clone repository:
git clone https://github.com/lightcopy/history-server.git
cd history-server
# Prepare code and dev files
sbt compile # pull dependencies and compile code
npm install # install frontend dependencies
To make a distribution, run bin/make-distribution. The script compiles the sources, assembles the jar, creates the static files (html/css/js), and copies them into the target/history-server-bin directory.
The following options are available:
- --name: adds a suffix to the name, e.g. --name=xyz will result in target/history-server-bin-xyz
- --tgz: create a .tgz archive; the release directory is removed afterwards. If not provided, only the directory is created
- --help: show help for the script
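For example, building a named distribution directory could look like the sketch below. It uses only the --name option described above; the suffix xyz, the SUFFIX and DIST_DIR variables, and the existence check are illustrative additions, not part of the script.

```shell
SUFFIX="xyz"                                     # hypothetical suffix
DIST_DIR="target/history-server-bin-${SUFFIX}"   # directory name implied by --name

# Run from the repository root; the guard only avoids a confusing error elsewhere.
if [ -x bin/make-distribution ]; then
  bin/make-distribution --name="${SUFFIX}"       # directory only, no archive
  ls "${DIST_DIR}"
else
  echo "bin/make-distribution not found; run from the repository root" >&2
fi
```

Adding --tgz to the same command would produce an archive instead and remove the directory once the archive is created.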
Note that there is no need to build a distribution to test code, since the repository itself acts like a distribution (all scripts work the same way). The following process might be useful:
# build code and assembly jar
$ sbt assembly
# build static files
$ npm run dev
# run start script (Mongo should be running)
$ sbin/start.sh
start.sh will discover the jars that need to be added to the classpath.
The bin/start-dev.sh script is also available for testing the frontend or some basic functionality. It runs a server that does not require MongoDB or scan any event logs, and returns sample data when the API is invoked.
You can also run the individual build commands declared in package.json, e.g. to rebuild the JavaScript code, run npm run make_js.
Run sbt test to launch the tests.
Run bin/make-release with --release set to the release version (e.g. 0.1.2) and --next set to the next development version (e.g. 0.1.3-SNAPSHOT).
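A release invocation might look like the sketch below. The --flag=value form is an assumption based on how the other scripts take options (for instance --name=xyz); check bin/make-release itself for the exact syntax before running it. The RELEASE and NEXT variables and the existence check are illustrative.

```shell
RELEASE="0.1.2"           # release version (example value from the text)
NEXT="0.1.3-SNAPSHOT"     # next development version (example value from the text)

# Run from the repository root.
if [ -x bin/make-release ]; then
  # Assumption: values are passed as --flag=value, as with --name=xyz.
  bin/make-release --release="${RELEASE}" --next="${NEXT}"
else
  echo "bin/make-release not found; run from the repository root" >&2
fi
```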