Real-time Heroku operations dashboard, powered by the Heroku event stream.
Pulse provides web-accessible, real-time metrics for the Heroku platform kernel. Heroku engineers use Pulse internally to maintain ambient awareness of Heroku's distributed infrastructure and to quickly diagnose platform issues.
Pulse works by consuming, processing, and performing statistical calculations against the platform's unified event log stream. All Heroku components log extensively and in an easily machine-parsable format; by processing these logs, we can build various platform metrics, analytics, monitoring, auditing, and debugging tools orthogonally to the critical-path components themselves. Pulse is one such tool.
The log messages that ultimately power Pulse originate from hosts across the Heroku platform. These messages are sent to a load-balanced pool of log forwarders. The log forwarders in turn distribute messages to an internal Splunk cluster for archival and long-term analytics, and also host "Aorta" servers which provide Pulse access to the log stream.
Pulse itself is a distributed application implemented with three process types: receiver, merger, and web.
Processes of the receiver type are responsible for consuming a load-balanced fraction of the platform event stream, parsing each text-encoded event into a data structure, and streaming that data through the initial phase of the Pulse statistical pipeline. The receiver processes periodically broadcast "stat fragments", partial roll-ups of the various stats that Pulse tracks, for consumption by the merger process. To ensure that the production Pulse deployment can process the entire Heroku log stream at peak platform load, the receiver process type is horizontally scaled.
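The receiver's parse-then-roll-up step can be sketched roughly as follows. Pulse itself is written in Clojure; this Python version is only an illustration, and the `key=value` log format, the `event` key, and the function names `parse_event` and `roll_up` are all assumptions made for the example, not Pulse's actual wire format or API.

```python
from collections import Counter


def parse_event(line):
    """Parse one text-encoded event into a dict.

    Assumes a hypothetical 'key=value key=value' layout; tokens that are
    not key=value pairs are ignored.
    """
    event = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            event[key] = value
    return event


def roll_up(lines):
    """Build one stat fragment: a partial count of events by type.

    The 'event' key is a hypothetical field name used for illustration.
    """
    fragment = Counter()
    for line in lines:
        event = parse_event(line)
        if "event" in event:
            fragment[event["event"]] += 1
    return dict(fragment)


# A receiver would see only its load-balanced share of the stream.
sample = [
    "host=router.1 event=request status=200",
    "host=router.2 event=request status=503",
    "host=runtime.7 event=dyno_start",
    "malformed line without pairs",
]
fragment = roll_up(sample)
```

Because each fragment covers only the lines one receiver saw, it is a partial view; adding more receivers spreads the parsing load without changing the shape of the fragments.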
The merger is a singleton process responsible for providing a unified statistical view for the Pulse deployment. It continuously combines stat fragments as they arrive from the receiver processes and periodically emits the resulting "stat snapshots" for consumption by front-end clients.
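A minimal Python sketch of the merge step, assuming that every stat is an additive counter (real stats such as rates or maxima would need their own merge functions). The `Merger` class and its method names are hypothetical and are not taken from the Pulse source.

```python
from collections import Counter


class Merger:
    """Combine partial stat fragments into unified stat snapshots."""

    def __init__(self):
        self._pending = Counter()

    def add_fragment(self, fragment):
        # Counter.update adds counts key by key instead of replacing them.
        self._pending.update(fragment)

    def emit_snapshot(self):
        # Return the merged view and start a fresh accumulation window.
        snapshot = dict(self._pending)
        self._pending = Counter()
        return snapshot


merger = Merger()
merger.add_fragment({"request": 2, "dyno_start": 1})  # from one receiver
merger.add_fragment({"request": 3})                   # from another receiver
snapshot = merger.emit_snapshot()
```

Keeping a single merger means there is exactly one place where fragments are summed, so every web process sees the same snapshot.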
The key consumers of the stat snapshot stream are processes of the web type. These processes consume and buffer a short history of stat snapshots, and return the buffered history in response to web requests. Web clients use this data to display corresponding sparkline graphs.
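The buffering behaviour can be sketched in Python with a fixed-size deque. The `SnapshotBuffer` class, its default size, and the text-based `sparkline` helper are assumptions for illustration; the text sparkline merely stands in for the graphs that the real web clients draw.

```python
from collections import deque


class SnapshotBuffer:
    """Keep only the most recent stat snapshots."""

    def __init__(self, size=60):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._snapshots = deque(maxlen=size)

    def push(self, snapshot):
        self._snapshots.append(snapshot)

    def dump(self):
        # What a web request would receive: the whole buffered history.
        return list(self._snapshots)

    def series(self, stat):
        # One stat's values over time, oldest first; missing values count as 0.
        return [s.get(stat, 0) for s in self._snapshots]


BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a numeric series as a one-line text sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for flat series
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


buf = SnapshotBuffer(size=3)
for count in (1, 2, 3, 4):
    buf.push({"request": count})
history = buf.series("request")
```

A bounded buffer keeps memory use constant per web process regardless of how long it has been running.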
Pulse is written in Clojure and deployed to Heroku itself using the platform's native Clojure support.
Ensure that Aorta is running at
    $ cp .env.sample .env
    $ mate .env
    $ export $(cat .env)
    $ lein deps
    $ foreman start
Ensure that Aortas are running and dyno-reachable at AORTA_URLS, and that the usernames in AORTA_URLS are scoped to this particular deployment of Pulse. Then:
    $ heroku create pulse-production --stack cedar
    $ heroku addons:add redistogo:small
    $ heroku config:add ...
    $ git push heroku master
    $ heroku scale receiver=16 merger=1 web=2
    $ heroku open
    $ heroku config
    $ export REDIS_HOST="..."; export REDIS_PORT="..."; export REDIS_PASS="..."
    $ redis-cli -h $REDIS_HOST -p $REDIS_PORT -a $REDIS_PASS subscribe stats.merged