Timberlake is a Job Tracker for Hadoop.

Intro

Timberlake is a Go server paired with a React.js frontend. It improves on existing Hadoop job trackers by providing a lightweight, real-time view of your running and finished MapReduce jobs. Timberlake exposes the most useful counters and configuration, so you can get a quick overview of the whole cluster or dig into the performance and behavior of a single job.

It also provides waterfall and boxplot visualizations for jobs. We've found that these visualizations can be really helpful for figuring out why a job is slow. Is it launching too many mappers and overloading the cluster? Are reducers launching early and starving the mappers? Does the job have reducer skew? You can use the counters of bytes written, shuffled, and read to understand the network and I/O behavior of your jobs. And when there's a crash, Timberlake will show you tracebacks from the logs to help you debug the job.

Timberlake pairs well with Scalding and Cascading. It uses extra data from the Cascading planner to show the relationships between steps, and to clarify which jobs' outputs are used as inputs to other jobs in the flow. Visualizing that flow makes it much easier to figure out which steps are causing bottlenecks.

Finally, we've included a Slackbot that has significantly improved our Hadooping lives. The bot can notify you when your jobs start and finish, and provides links back to Timberlake.

Screenshots

Job Details

[Screenshot: Job Details]

List of Jobs

[Screenshot: List of Jobs]

Installation

The best way to install Timberlake is from a release tarball, available on the releases page.

Download the tarball to your server, then untar it:

$ tar zxvf timberlake-v1.0.2-linux-amd64.tar.gz
$ mv -T timberlake-v1.0.2-linux-amd64 /opt/timberlake

Now you can start the server:

$ /opt/timberlake/bin/timberlake \
    --bind :8000 \
    --resource-manager-url http://resourcemanager:8088 \
    --history-server-url http://resourcemanager:19888 \
    --namenode-address namenode:9000
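
The flags above take two different address shapes: full HTTP URLs for the resource manager and history server, and a bare host:port for the namenode. As a rough sketch, assuming you want to sanity-check those values in a wrapper before launching the server, something like the following could work. The `checkHTTPURL` and `checkHostPort` helpers are hypothetical and are not part of Timberlake.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// checkHTTPURL returns an error unless raw is an absolute http(s) URL
// with a host, e.g. "http://resourcemanager:8088".
func checkHTTPURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return fmt.Errorf("%q is not an absolute http(s) URL", raw)
	}
	return nil
}

// checkHostPort returns an error unless raw has the form host:port,
// e.g. "namenode:9000".
func checkHostPort(raw string) error {
	host, port, err := net.SplitHostPort(raw)
	if err != nil {
		return err
	}
	if host == "" || port == "" {
		return fmt.Errorf("%q is missing a host or a port", raw)
	}
	return nil
}

func main() {
	// The same example values used in the command above.
	fmt.Println(checkHTTPURL("http://resourcemanager:8088"))
	fmt.Println(checkHTTPURL("http://resourcemanager:19888"))
	fmt.Println(checkHostPort("namenode:9000"))
}
```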

And optionally, start the Slackbot:

$ /opt/timberlake/bin/slack \
    --internal-timberlake-url http://localhost:8000 \
    --external-timberlake-url https://timberlake.example.com \
    --slack-url https://hooks.slack.com/services/...

You'll need to create a new Incoming Webhook to generate the Slack URL for your bot.
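
If you want to confirm that a webhook URL works before wiring up the bot, a small sketch like the one below can post a test message. Slack Incoming Webhooks accept a JSON body with a `text` field. The message wording, the `buildPayload` and `notify` helpers, and the job name are made up for illustration and do not reflect what the bundled Slackbot actually sends.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// slackMessage is the minimal Incoming Webhook payload.
type slackMessage struct {
	Text string `json:"text"`
}

// buildPayload formats a hypothetical job notification as JSON.
func buildPayload(jobName, state, link string) ([]byte, error) {
	msg := slackMessage{Text: fmt.Sprintf("Job %s %s: %s", jobName, state, link)}
	return json.Marshal(msg)
}

// notify posts the payload to the webhook URL and treats any
// non-200 response as an error.
func notify(webhookURL string, payload []byte) error {
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	payload, err := buildPayload("wordcount", "finished", "https://timberlake.example.com")
	if err != nil {
		panic(err)
	}
	// Print the payload only; call notify with your real webhook URL to send it.
	fmt.Println(string(payload))
}
```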

Building from Source

You'll need npm, Go, and Node on your PATH.

$ git clone https://github.com/stripe/timberlake.git
$ cd timberlake
$ make

Limitations

Timberlake only works with the YARN Resource Manager API. It's been tested on v2.4.x and v2.5.x, but the Kill Job feature uses an endpoint that's only available in v2.5.x+.
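
To check in advance whether Kill Job should work on your cluster, you could compare the cluster's Hadoop version against 2.5. The sketch below assumes the Resource Manager's cluster info endpoint (`/ws/v1/cluster/info`) and its `hadoopVersion` field; the `supportsKill` and `hadoopVersion` helpers are hypothetical and are not how Timberlake itself decides anything. Version strings whose major or minor part is not a plain number are conservatively treated as unsupported.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// supportsKill reports whether a version string such as "2.5.1"
// is 2.5 or newer. Unparseable versions return false.
func supportsKill(version string) bool {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 2 || (major == 2 && minor >= 5)
}

// clusterInfo mirrors only the part of the cluster info response used here.
type clusterInfo struct {
	ClusterInfo struct {
		HadoopVersion string `json:"hadoopVersion"`
	} `json:"clusterInfo"`
}

// hadoopVersion asks the Resource Manager at rmURL for its Hadoop version.
func hadoopVersion(rmURL string) (string, error) {
	resp, err := http.Get(strings.TrimRight(rmURL, "/") + "/ws/v1/cluster/info")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var info clusterInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return "", err
	}
	return info.ClusterInfo.HadoopVersion, nil
}

func main() {
	// Offline demonstration; pass hadoopVersion's result in practice.
	for _, v := range []string{"2.4.1", "2.5.0"} {
		fmt.Printf("%s supports kill: %v\n", v, supportsKill(v))
	}
}
```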

Our cluster has 10-40 jobs running simultaneously and about 2,000 jobs running per day. Timberlake's performance has not been tested outside these bounds.