Mohammad A. Ali edited this page Jul 17, 2023 · 6 revisions

The Litemetric guide

Litemetric, a part of Litestack, is a low overhead, simple and generic telemetry tool that collects runtime data about Ruby and Rails applications.

Litemetric, like other Litestack components, is built on top of SQLite. It uses the embedded database engine to store and query telemetry data. As a result, Litemetric is a very low-maintenance system: there is no need to set up, maintain, or monitor any service aside from the application that integrates Litemetric.

Litemetric follows a simple approach: it provides easy APIs that cover most common needs for event acquisition and measurement. It does not attempt to be an elaborate performance monitoring system, but it can be sufficient for many applications, and zero administrative overhead is a plus!

Litestack components (e.g. Litejob, Litecache, Litecable) can optionally use Litemetric to report on usage and performance.

Features & Roadmap

  • Capture single/multishot events
  • Measure single/multishot events
  • Snapshot information capturing
  • In memory aggregation
  • Background aggregator
  • Background garbage collector
  • Thread safety
  • Async/Fiber Scheduler integration
  • Graceful shutdown
  • Fork resilience
  • Polyphony integration
  • Web reporting interface

How to use Litemetric?

For any class for which you need to collect metrics, just include the Litemetric::Measurable module. You can then set a unique identifier for the class by overriding the #metrics_identifier method.

Capturing and measuring events can then happen whenever required in the object methods.

# note that we only need to require litestack
# you could still do require 'litestack/litemetric'
require 'litestack'

class ImportantClass
  include Litemetric::Measurable

  # override the default identifier
  def metrics_identifier
    self.class.name
  end

  # the captured action will only be counted;
  # the database will have a count of times the event was captured
  def simple
    # do something
    capture("simple")
  end

  # the measured action will also record the runtime of the action;
  # the database will have a count of times the event was measured and the total time measured
  def complex
    measure("complex") do
      # do something
    end
  end
end

Events can optionally have keys to differentiate them. Here is an example that uses both an event name and a key name when reporting a metric:

class Ticker
  include Litemetric::Measurable

  def change(symbol, value)
    capture("change", symbol, value)
  end
end

This results in each symbol being unique in the metrics database, so it can be reported on alone or aggregated with other symbols under the same "change" event.

Sometimes an action needs to be reported under multiple keys at the same time, for example when you need to report the job insertion rate for each named queue and for all queues at once. Litemetric provides a simple way to achieve this:

  # capture multiple events in one shot
  def enqueue(queue_name, job)
    # do the action
    capture("enqueue", ["all", queue_name])
  end

  # also with measurement
  def perform(queue_name, job)
    measure("perform", ["all", queue_name]) do
      # do the action
    end
  end  

The above results in two entries being captured/measured, one for the specific queue and one that aggregates over all queues.
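Conceptually, a multishot capture simply fans out into one counter per key. Here is a minimal, self-contained sketch of that idea (an illustrative toy, not Litemetric's actual implementation, which buffers events and persists them to SQLite):

```ruby
# A toy counter that mimics how a multishot capture fans out:
# every key in the array gets its own entry for the same event.
class ToyMetrics
  def initialize
    # counts[event][key] => number of captures
    @counts = Hash.new { |h, event| h[event] = Hash.new(0) }
  end

  def capture(event, keys)
    # Array() lets a single key and an array of keys share one code path
    Array(keys).each { |key| @counts[event][key] += 1 }
  end

  attr_reader :counts
end

metrics = ToyMetrics.new
metrics.capture("enqueue", ["all", "mailers"])
metrics.capture("enqueue", ["all", "default"])

metrics.counts["enqueue"]
# => {"all"=>2, "mailers"=>1, "default"=>1}
```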

Configuring Litemetric

Litemetric looks for a litemetric.yml file in its working directory. The syntax and defaults for the file are as follows:

  path: path/to/your/db/file
  flush_interval: 10 # how long events are buffered before flushing to the db
  summarize_interval: 30 # delay between data summarizer runs
  snapshot_interval: 600 # how often to take snapshots from client libraries (10 minutes)

The db path should preferably point outside of your application folder, to prevent the database from being accidentally overwritten during deployment.
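The configuration pattern here is a plain YAML file merged over built-in defaults. A rough, self-contained sketch of that mechanism (the default values mirror the documented keys above; the default db path is hypothetical, and this is not the gem's actual loader):

```ruby
require 'yaml'

# Illustrative defaults matching the documented configuration keys.
# The "path" value here is a hypothetical placeholder.
LITEMETRIC_DEFAULTS = {
  "path" => "metrics.sqlite3",
  "flush_interval" => 10,
  "summarize_interval" => 30,
  "snapshot_interval" => 600
}.freeze

# Merge a litemetric.yml (if present) over the defaults,
# roughly how a YAML-backed configuration is typically loaded.
def load_litemetric_config(file = "litemetric.yml")
  user = File.exist?(file) ? YAML.safe_load(File.read(file)) : nil
  LITEMETRIC_DEFAULTS.merge(user || {})
end
```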

Enabling metric collection for Litedb, Litejob & Litecache

In their respective configuration files, you need to add this directive:

 metrics: true # default is false

Events, keys, and values captured for the different Litestack components are interpreted differently. Here is a quick list:

Litedb

event           key                               value
Read            SQL text of read queries          time taken to run the query
Write           SQL text of write queries         time taken to run the query
Schema change   SQL text of the DDL query         time taken to run the query
Pragma          pragma statements run by SQLite   time taken to run the statements

Litejob

event     key                                        value
enqueue   name of the queue that received the job    none
dequeue   name of the queue that delivered the job   none
perform   name of the queue that had the job         time taken to run the job

Litecache

event   key                           value
get     the key of the cache object   the hit rate
set     the key of the cache object   none

Liteboard

Litestack comes with a simple web interface to report on the data collected by Litemetric. You can run the reporting tool with the liteboard command in the console. You will need to provide Liteboard with the location of your metrics database file; for the exact syntax, run:

liteboard -h

Once started properly, liteboard shows a simple breakdown of the events collected by Litemetric. It consists of 3 pages:

index => /

Shows the list of topics for which events were captured in the selected time range. For each topic you see the count of captured events and a historical trend of event counts over time.


topic => /topics/:topic

Shows the data collected for a specific topic: the different event names and their counts, and, where events carry values, the average, total, min and max values. It also shows trend lines for counts and average values over time.


Optionally, a topic can publish snapshots of its state to Litemetric; if a snapshot exists, it is displayed on that page.

event => /topics/:topic/events/:event

This page shows data for a specific event type, listing keys with their counts, values (avg, total, min & max), and the same trend lines as in the topic page.


Data Aggregation & Summarization

Litemetric strives to be simple and lightweight, so it does not keep data at the highest resolution indefinitely. Instead, it aggressively aggregates and summarizes data, keeping finer resolutions only while the data is fresh. The general breakdown of data granularity is:

data resolution   data kept for
every 5 minutes   60 minutes
every 1 hour      24 hours
every 24 hours    7 days
every week        52 weeks

This means that if you are looking at data from the last 7 days, it will only be available at day resolution. Beyond 52 weeks (1 year), the data is still stored at the same resolution (one data point per week per event key) but it is not currently viewable in the dashboard (this should be fixed in a later release).
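The retention schedule above can be sketched as a simple lookup from a data point's age to the finest resolution still available for it (an illustrative model of the table, not Litemetric's actual summarizer):

```ruby
# Map a data point's age (in seconds) to the finest resolution
# still retained for it, per the schedule above.
MINUTE = 60
HOUR   = 60 * MINUTE
DAY    = 24 * HOUR
WEEK   = 7 * DAY

def finest_resolution(age_seconds)
  case age_seconds
  when 0...(60 * MINUTE) then "5 minutes"  # kept for 60 minutes
  when 0...(24 * HOUR)   then "1 hour"     # kept for 24 hours
  when 0...(7 * DAY)     then "24 hours"   # kept for 7 days
  else                        "1 week"     # kept from then on
  end
end

finest_resolution(10 * MINUTE)  # => "5 minutes"
finest_resolution(3 * HOUR)     # => "1 hour"
finest_resolution(2 * DAY)      # => "24 hours"
finest_resolution(30 * DAY)     # => "1 week"
```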