
Distribution Storage Requirement #673

Open
ao2017 opened this issue Jul 8, 2020 · 1 comment
Labels: project: adding distributions (Adding distributions to Heroic)

ao2017 (Contributor) commented Jul 8, 2020

To improve percentile accuracy, we are introducing a new data type to support distributions. This task is to evaluate the storage requirements of that new data type.

ao2017 (Contributor, Author) commented Jul 17, 2020

STORAGE IN MOTION (Memory)

Current Histogram
For every histogram, Heroic currently emits 7 data points per reporting interval by default: max, min, mean, median, stdev, P99, and P75. Each data point also includes metadata (key, tags, and attributes).
Assuming a reporting interval of 30 seconds (two reports per minute), and an 8-byte value and 8-byte timestamp per point:

Bytes per minute = 7 * NumOfSources * (MetadataSize + PointValue + PointTimestamp) * 2
Bytes per minute = 14 * NumOfSources * (MetadataSize + 16)
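
For concreteness, a minimal sketch of this estimate in Java (the source count and metadata size in `main` are made-up example values, not measurements from Heroic):

```java
// Sketch of the current-histogram in-motion estimate.
// POINT_VALUE and POINT_TIMESTAMP are assumed to be 8 bytes each,
// matching the "(MetadataSize + 16)" term above.
public class CurrentHistogramMemory {
    static final int POINTS_PER_INTERVAL = 7;   // max, min, mean, median, stdev, P99, P75
    static final int INTERVALS_PER_MINUTE = 2;  // 30-second reporting interval
    static final int POINT_VALUE = 8;
    static final int POINT_TIMESTAMP = 8;

    static long bytesPerMinute(long numOfSources, long metadataSize) {
        return POINTS_PER_INTERVAL * INTERVALS_PER_MINUTE
                * numOfSources * (metadataSize + POINT_VALUE + POINT_TIMESTAMP);
    }

    public static void main(String[] args) {
        // Hypothetical example: 10 sources, 200-byte metadata.
        System.out.println(bytesPerMinute(10, 200)); // 14 * 10 * 216 = 30240
    }
}
```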

New Histogram (TDigest)
The new histogram will emit one data point per reporting interval, so again two reports per minute:
Bytes per minute = (1038 + MetadataSize + 8) * NumOfSources * 2 = 2 * NumOfSources * (MetadataSize + 1046)

Here 1038 bytes is the serialized digest and 8 bytes the timestamp. The digest size was measured using TDigest with its compact serialization (smallByteSize / asSmallBytes). With the regular serialization and a typical latency-style data distribution, the size is about 2048 bytes.
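
A sketch of how such a size measurement can be reproduced with the com.tdunning t-digest library (the compression setting of 100 and the synthetic latency-like data are assumptions; the exact byte counts depend on both):

```java
import com.tdunning.math.stats.TDigest;
import java.nio.ByteBuffer;
import java.util.Random;

public class DigestSize {
    public static void main(String[] args) {
        // Compression of 100 is an assumed setting; higher values
        // grow the digest and improve accuracy.
        TDigest digest = TDigest.createMergingDigest(100);

        // Synthetic latency-like (log-normal) samples.
        Random rnd = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            digest.add(Math.exp(rnd.nextGaussian()));
        }

        // Compact vs. regular serialization.
        ByteBuffer small = ByteBuffer.allocate(digest.smallByteSize());
        digest.asSmallBytes(small);
        ByteBuffer regular = ByteBuffer.allocate(digest.byteSize());
        digest.asBytes(regular);

        System.out.println("small:   " + small.position() + " bytes");
        System.out.println("regular: " + regular.position() + " bytes");
    }
}
```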

Conclusion
The new histogram will require more memory when the metadata size is below roughly 156 bytes, the break-even point of 14 * (MetadataSize + 16) = 2 * (MetadataSize + 1046). In a typical metric setup the metadata (key, tags, and attributes) is larger than that, so the new data type should not increase memory in motion.
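
A quick way to sanity-check that break-even point (pure arithmetic, independent of any Heroic code):

```java
// Compare the two in-motion per-source estimates across metadata sizes.
// Break-even is where 14 * (m + 16) == 2 * (m + 1046), i.e. m ≈ 155.7.
public class InMotionBreakEven {
    public static void main(String[] args) {
        for (int m : new int[] {64, 128, 156, 256, 512}) {
            long current = 14L * (m + 16);   // bytes/minute per source, current
            long tdigest = 2L * (m + 1046);  // bytes/minute per source, TDigest
            System.out.printf("metadata=%d current=%d tdigest=%d%n", m, current, tdigest);
        }
    }
}
```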

STORAGE AT REST (Bigtable)
Current Histogram:
We currently save 7 data points per source for each histogram.

Storage per histogram = NumOfSources * (PointValue + PointTimestamp) * 7 + rowKeySize
Storage per histogram in bytes = NumOfSources * (8 + 16) * 7 + rowKeySize = NumOfSources * 168 + rowKeySize

New Histogram (TDigest):

We don't have to store data points from each source, because we are interested in the overall distribution of the data; data points from all sources are merged before storage (see the sketch below).
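
As a sketch of that merge step, again using the com.tdunning library (`digestsFromSources` is a hypothetical collection holding one deserialized digest per source, and the compression of 100 is an assumed setting):

```java
import com.tdunning.math.stats.Centroid;
import com.tdunning.math.stats.TDigest;
import java.util.List;

public class MergeDigests {
    // Merge per-source digests into a single digest for storage,
    // copying each centroid into the combined digest.
    static TDigest merge(List<TDigest> digestsFromSources) {
        TDigest merged = TDigest.createMergingDigest(100);
        for (TDigest perSource : digestsFromSources) {
            for (Centroid c : perSource.centroids()) {
                merged.add(c.mean(), c.count());
            }
        }
        return merged;
    }
}
```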

Storage per histogram = (PointValue + PointTimestamp) + rowKeySize
Storage per histogram in bytes = (1038 + 16) + rowKeySize = 1054 + rowKeySize

Conclusion
The new histogram will require more storage unless the number of sources is greater than 6 (break-even at 1054 / 168 ≈ 6.3 sources).
If we instead stored a digest from every source, at-rest storage would grow by about 1054 bytes per source (1054 * NumOfSources + rowKeySize) rather than 168, roughly a 6x increase.
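
The same kind of sanity check for the at-rest break-even point:

```java
// At-rest storage per histogram, per interval, ignoring rowKeySize
// (it is added to both sides). Break-even: 168 * n == 1054 => n ≈ 6.3.
public class AtRestBreakEven {
    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            long current = 168L * n; // 7 points * (8 + 16) bytes per source
            long tdigest = 1054L;    // one merged digest, all sources
            System.out.printf("sources=%d current=%d tdigest=%d%n", n, current, tdigest);
        }
    }
}
```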
