Monitoring & Alerting

akshat edited this page Dec 26, 2023 · 10 revisions

Goose emits metrics for job execution timings, success rates, latency, and queue sizes. A metrics plugin receives these metrics and forwards them to its backend. Goose ships a StatsD plugin as a sample metrics backend.

For Prometheus, or other metrics backends, a plugin can be injected by following this guide.

List of all Metrics

| Metric | Type | Description |
| --- | --- | --- |
| `enqueued_jobs.<my-queue>.count` | gauge | Count of jobs in my-queue |
| `total_enqueued_jobs.count` | gauge | Total count of all jobs |
| `scheduled_jobs.count` | gauge | Count of scheduled jobs |
| `batches.count` | gauge | Count of batches |
| `periodic_jobs.count` | gauge | Count of cron jobs |
| `dead_jobs.count` | gauge | Count of dead jobs |
| `jobs.processed` | count | Count of processed jobs |
| `jobs.succeeded` | count | Count of successful jobs |
| `jobs.failed` | count | Count of failed jobs |
| `jobs.recovered` | count | Count of orphan jobs which were recovered |
| `batch.success` | count | Count of batches where all jobs succeeded |
| `batch.dead` | count | Count of batches where all jobs died |
| `batch.partial-success` | count | Count of batches with a mix of successful and dead jobs |
| `execution.latency` | timing | Latency (in ms) between enqueue -> start of execution |
| `scheduled.latency` | timing | Latency (in ms) between theoretical schedule time -> start of execution |
| `cron_scheduled.latency` | timing | Latency (in ms) between theoretical schedule time -> start of execution |
| `retry.latency` | timing | Latency (in ms) between theoretical retry time -> start of execution |
| `job.execution_time` | timing | Time taken to execute a job (in ms) |
| `batch.completion_time` | timing | Time taken to complete a batch (in ms) |

Configuring StatsD

The StatsD plugin can be configured with the following keys:

| Key | Description |
| --- | --- |
| `:enabled?` | Boolean flag for enabling/disabling metrics |
| `:host` | Host of StatsD Aggregator |
| `:port` | Port of StatsD Aggregator |
| `:prefix` | Prefix for all metrics. Can be a generic term like "goose." or specific to microservice's name |
| `:sample-rate` | Sample rate of metric collection |
| `:tags` | Map of key-value pairs to be attached to every metric |
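To make the effect of `:prefix`, `:sample-rate`, and `:tags` more concrete, here is a minimal sketch of how a metric from the list above could look as a StatsD datagram. It is not code from Goose or clj-statsd: `statsd-line` and `type-codes` are hypothetical names, and the `|#key:value` tag suffix assumes a backend that accepts DogStatsD-style tags.

```clojure
(ns statsd-line-sketch
  (:require [clojure.string :as str]))

;; StatsD type codes for the metric types used in the table above.
(def type-codes {:gauge "g" :count "c" :timing "ms"})

(defn statsd-line
  "Hypothetical helper: renders one metric as a StatsD datagram string.
   Assumes DogStatsD-style tags (|#k:v,k2:v2)."
  [{:keys [prefix sample-rate tags]} metric-type metric value]
  (let [tag-str (->> tags
                     (map (fn [[k v]]
                            (str (name k) ":" (if (keyword? v) (name v) (str v)))))
                     (str/join ","))]
    (str prefix metric ":" value "|" (type-codes metric-type)
         ;; The sample rate is only written when sampling is in effect.
         (when (and sample-rate (< sample-rate 1)) (str "|@" sample-rate))
         (when (seq tag-str) (str "|#" tag-str)))))

(statsd-line {:prefix "maverick." :sample-rate 0.9 :tags {:top :gun}}
             :count "jobs.processed" 1)
;; => "maverick.jobs.processed:1|c|@0.9|#top:gun"
```

With a sample rate below 1, a StatsD client typically sends only that fraction of events and the aggregator scales counts back up, so dashboards stay approximately correct while network traffic drops.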

Usage

Note: Goose uses clj-statsd, which uses agents internally. After stopping a worker, (shutdown-agents) must be called for the program to exit.

(ns statsd-metrics
  (:require
    [goose.metrics.statsd :as statsd]
    [goose.worker :as w]))

(let [statsd-opts {:enabled?    true
                   :host        "localhost"
                   :port        8125
                   :prefix      "maverick."
                   :sample-rate 0.9
                   :tags        {:top :gun}}
      statsd (statsd/new statsd-opts)
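      ;; The `worker-opts` being assoc'd into is assumed to be an existing
      ;; worker options map defined elsewhere in your application.
      worker-opts (assoc worker-opts :metrics-plugin statsd)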
      worker (w/start worker-opts)]
  ;; When shutting down worker...
  (w/stop worker)
  ;; clj-statsd uses agents internally. Call (shutdown-agents) to exit the program.
  (shutdown-agents))
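The (shutdown-agents) call in the example above matters because Clojure's agent thread pools use non-daemon threads, which can keep the JVM alive after the main thread finishes. The sketch below reproduces that situation in plain Clojure, without Goose or clj-statsd; `counter`, `record!`, and `-main` are hypothetical names used only for illustration.

```clojure
(ns agent-shutdown-sketch)

;; An agent standing in for the one a metrics client might use internally.
(def counter (agent 0))

(defn record!
  "Asynchronously increments the counter on the agent thread pool."
  []
  (send counter inc))

(defn -main [& _]
  (record!)
  ;; Block until the agent has processed the action sent above.
  (await counter)
  (println "recorded:" @counter)
  ;; Without this call, the agent pool's non-daemon threads
  ;; can prevent the JVM from exiting once -main returns.
  (shutdown-agents))
```

Calling (shutdown-agents) should be the last step of shutdown, since actions sent to agents afterwards are rejected.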
