
Is LevelDB a good fit for a RRD ? #55

Closed
jpillora opened this issue Jul 2, 2013 · 13 comments


@jpillora

jpillora commented Jul 2, 2013

Thinking about implementing a backend for https://github.com/etsy/statsd/ using levelup. Are there any insights anyone can provide me with :) ?

@rvagg
Member

rvagg commented Jul 2, 2013

cc @wolfeidau

@mcollina
Member

mcollina commented Jul 2, 2013

The main issue with the statsd + graphite integration is that it is painfully hard to set up.
Node.js + LevelUP can provide a low-scale solution that Just Works with a single npm install.

@juliangruber
Member

Also have a look at tsd and levelweb.

@jpillora
Author

jpillora commented Jul 2, 2013

@mcollina exactly what I had in mind, a pure Node statsd, though I think we'd need to make it more like an RRD for it to be usable...

@dominictarr
Contributor

what does RRD stand for?

@ralphtheninja
Member

Round robin database http://en.wikipedia.org/wiki/RRDtool ?

@mcollina
Member

mcollina commented Jul 2, 2013

If you want to expire stuff, you can try https://github.com/rvagg/node-level-ttl, which expires values older than X.
It works well, better than MongoDB's TTL support.
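For illustration only, here is a tiny in-memory sketch of the TTL idea (expire values older than X) in plain JS; level-ttl does this on top of LevelDB, but the store and its API below are hypothetical, not level-ttl's actual interface:

```javascript
// Minimal sketch of TTL-style expiry: each key gets a deadline,
// and expired keys are lazily dropped on read. The injectable clock
// is just to make the behaviour easy to demonstrate.
function createTtlStore(now = () => Date.now()) {
  const data = new Map();      // key -> value
  const deadlines = new Map(); // key -> expiry timestamp (ms)
  return {
    put(key, value, { ttl }) {
      data.set(key, value);
      deadlines.set(key, now() + ttl);
    },
    get(key) {
      if (deadlines.has(key) && deadlines.get(key) <= now()) {
        data.delete(key); // lazily expire on read
        deadlines.delete(key);
      }
      return data.get(key);
    },
  };
}
```

A real LevelDB version would instead sweep a secondary "expiry index" keyspace in the background, which is roughly what level-ttl does.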

@wolfeidau

I had a shot at using LevelDB for storing data similar to how RRD and graphite whisper files work and ran into a few challenges.

First, some background: RRD doesn't store the raw time series data, it stores a rolling series of values based on aggregates like average, mean and percentile. RRD pre-allocates the buckets for the data being stored, say averages at intervals of 1 min for a week, 1 hour for a month and 1 day for a year. When time series data is fed to RRD, it updates these buckets to reflect the changing average across the time periods.

So back to LevelDB: in my case I employed a rather simplistic sort of continuous map-reduce job where data was fed in and rolled into the aggregates based on a trigger. This trigger had quite a bit of work to do; it would update each of the windows I had specified.

The resulting implementation's main flaw was that it just stored way too much data, mainly because of how level map-reduce works.

I moved on to hacking on another implementation using the raw triggers and my own state table, but this again had issues with data volume and how much I churned through LevelDB.

That said, all is not lost: there are people using log-structured data stores for this kind of data, I just haven't had a chance to search for papers or ideas on how to adapt this type of data to LevelDB.
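The RRD model described above (pre-allocated buckets holding rolling aggregates rather than raw samples) can be sketched in plain JS; the interval names and structure here are illustrative assumptions, not any real RRD implementation:

```javascript
// Sketch of RRD-style rollup buckets: one fixed bucket per resolution,
// each keeping only a running average for its current time window.
// New samples update every resolution at once; old windows are discarded.
function createRrd(intervals) {
  // intervals: e.g. { '1min': 60e3, '1hour': 3600e3 } (name -> window width ms)
  const buckets = {};
  for (const name of Object.keys(intervals)) {
    buckets[name] = { bucketStart: null, sum: 0, count: 0 };
  }
  return {
    update(timestamp, value) {
      for (const [name, width] of Object.entries(intervals)) {
        const b = buckets[name];
        const start = Math.floor(timestamp / width) * width;
        if (b.bucketStart !== start) { // new window: reset the rollup
          b.bucketStart = start;
          b.sum = 0;
          b.count = 0;
        }
        b.sum += value;
        b.count += 1;
      }
    },
    average(name) {
      const b = buckets[name];
      return b.count ? b.sum / b.count : null;
    },
  };
}
```

Note this stores one number per resolution, not the samples themselves, which is exactly why RRD's disk usage is constant and why a naive LevelDB map-reduce that keeps intermediate values stores "way too much data" by comparison.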

@jpillora
Author

jpillora commented Jul 3, 2013

@mcollina just using the TTL isn't enough for a round robin database.

Ahhh @wolfeidau, thanks for the writeup. I did think of using map-reduce, though I had a feeling there'd be a better way involving less recomputation. It may need some statistical optimisation, which would require some math smarts. Nevertheless, here's my LevelDB RRD design:

So a Round Robin Database is essentially a circular buffer, and let's say our circular buffer can store 1 MB of data. We need to fit this data not in an array, but in a set of sorted key-value pairs. So, if we use the key naming convention:

rrd-data-<epoch time>

the data will be sorted from oldest to newest. Maybe an extra key rrd-total-size to store the current size (or we could use approximateSize).
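One caveat with the convention above: LevelDB sorts keys lexicographically as bytes, so a raw epoch number only sorts correctly while all timestamps have the same digit count. A zero-padded variant (padding width here is an illustrative choice) keeps the byte order aligned with numeric order:

```javascript
// Sketch of the rrd-data-<epoch time> key convention with fixed-width
// zero padding, so lexicographic (LevelDB) order matches numeric order.
function rrdKey(epochMs) {
  return 'rrd-data-' + String(epochMs).padStart(15, '0');
}

// Without padding, '1000' would sort before '999' as strings;
// with padding, byte order and time order agree.
const keys = [rrdKey(999), rrdKey(1000), rrdKey(12)].sort();
```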

Note: each entry will look something like:

{
  "counters": {
    "statsd.bad_lines_seen": 0,
    "statsd.packets_received": 98,
    "bucket": 26
  },
  "timers": {},
  "gauges": {
    "gaugor": 303
  },
  "timer_data": {},
  "counter_rates": {
    "statsd.bad_lines_seen": 0,
    "statsd.packets_received": 9.8,
    "bucket": 2.6
  },
  "sets": [
    [
      "5"
    ]
  ],
  "pctThreshold": [
    90
  ]
}

So the compression step would be: as the size reaches our arbitrary limit, we stream off as much of the oldest data (the top of the stream) as required to fit the new entries, and statistically combine the old values into a single value (the data would need to include the time range somehow).

This would cause every batch of data to trigger this "compression" process, and I'm not sure how well that would perform.
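The compression step described above could be sketched like this; entries are plain objects here and the averaging choice is an assumption (a real version would stream the oldest keys out of LevelDB and delete them in a batch):

```javascript
// Rough sketch of the "compression" step: when the store exceeds a
// limit, fold the oldest entries into a single aggregate that records
// the time range it covers, as suggested above.
function compress(entries, maxEntries) {
  // entries: [{ time, value }] sorted oldest -> newest
  if (entries.length <= maxEntries) return entries;
  const excess = entries.length - maxEntries + 1; // fold oldest `excess` into one
  const oldest = entries.slice(0, excess);
  const combined = {
    time: oldest[0].time,                     // start of the combined range
    rangeEnd: oldest[oldest.length - 1].time, // end of the combined range
    value: oldest.reduce((s, e) => s + e.value, 0) / oldest.length, // average
  };
  return [combined, ...entries.slice(excess)];
}
```

Since every write batch could trigger this, in practice you'd probably want to run it only when the size overshoots by some slack factor rather than on every put.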

Thoughts?

@jpillora
Author

jpillora commented Jul 3, 2013

Also note, this is just a rough outline; we would need to make the compression algorithm smarter, so we're not only combining the oldest data. Instead we need to combine the data by specific time periods. What we want, for example, is to use 33% of capacity for data from now to -1 month, another 33% for -1 month to -6 months, and the remaining capacity for -6 months back to the beginning of time, with configuration to set these thresholds and time periods.

Also, it would be handy to be able to set the amount of granularity for each compression step, though by default each step might follow S -> M -> h -> d -> m -> y. And we should only allow combining data of approximately the same granularity: for example, there'd be no point in combining 5s data with 1d data, the 5s entry would just be swallowed up. We want evenly balanced data.

Anyway, I'll probably run into more issues, but that's all I've got so far. Especially if we're scaling up to GBs of data...
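The tiered layout proposed above could be expressed as configuration; the tier boundaries and granularities below are just the example values from the comment, not a recommendation:

```javascript
// Sketch: pick a rollup granularity by the age of a data point, so
// recent data stays fine-grained and old data is coarsely combined.
const MS = { s: 1e3, m: 60e3, h: 3600e3, d: 86400e3 };
const tiers = [
  { maxAge: 30 * MS.d,  granularity: MS.m }, // now .. -1 month: 1-minute buckets
  { maxAge: 180 * MS.d, granularity: MS.h }, // -1 .. -6 months: 1-hour buckets
  { maxAge: Infinity,   granularity: MS.d }, // older: 1-day buckets
];
function granularityFor(ageMs) {
  return tiers.find((t) => ageMs <= t.maxAge).granularity;
}
```

The compression pass would then only merge neighbouring entries whose tier granularity matches, which avoids the "5s swallowed by 1d" problem.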

@jpillora
Author

jpillora commented Jul 6, 2013

Probably won't get time to start this for a few weeks, so if anyone else does, please post the link to the repo here 😄

@ralphtheninja ralphtheninja reopened this Dec 18, 2018
@ralphtheninja ralphtheninja changed the title Question: Is LevelDB a good fit for a RRD ? Is LevelDB a good fit for a RRD ? Dec 18, 2018
@ralphtheninja ralphtheninja transferred this issue from Level/levelup Dec 18, 2018
@jpillora
Author

Blast from the past! I'll close this now; there are a lot of tools nowadays that do this: InfluxDB, Prometheus, etc.

@ralphtheninja
Member

Fair enough. I realized I was jumping the gun on closing a lot of issues, so I changed my mind, re-opened them and moved them to the community repo instead, because a lot of the discussions are really interesting to keep around.
