
Distribution Storage Requirement #673

Open
ao2017 opened this issue Jul 8, 2020 · 1 comment
Labels: project: adding distributions (Adding distributions to Heroic)

ao2017 (Contributor) commented Jul 8, 2020

To improve percentile accuracy, we are introducing a new data type to support distributions. This task is to evaluate the storage requirements of that new data type.

ao2017 (Contributor, Author) commented Jul 17, 2020

STORAGE IN MOTION (Memory)

Current Histogram
For every histogram, Heroic currently emits 7 data points per reporting interval by default: max, min, mean, median, stdev, P99, and P75. Each data point also includes metadata (key, tags, and attributes).
Assuming a reporting interval of 30 seconds (two reports per minute), and an 8-byte value and 8-byte timestamp per point:

Bytes per minute = 7 * NumOfSources * (MetadataSize + PointValue + PointTimestamp) * 2
Bytes per minute = 14 * NumOfSources * (MetadataSize + 16)
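
For concreteness, a minimal sketch of this estimate in Java (the source count and metadata size in `main` are made-up example values, not measurements from Heroic):

```java
// Sketch of the current-histogram in-motion estimate.
// POINT_VALUE and POINT_TIMESTAMP are assumed to be 8 bytes each,
// matching the "(MetadataSize + 16)" term above.
public class CurrentHistogramMemory {
    static final int POINTS_PER_INTERVAL = 7;   // max, min, mean, median, stdev, P99, P75
    static final int INTERVALS_PER_MINUTE = 2;  // 30-second reporting interval
    static final int POINT_VALUE = 8;
    static final int POINT_TIMESTAMP = 8;

    static long bytesPerMinute(long numOfSources, long metadataSize) {
        return POINTS_PER_INTERVAL * INTERVALS_PER_MINUTE
                * numOfSources * (metadataSize + POINT_VALUE + POINT_TIMESTAMP);
    }

    public static void main(String[] args) {
        // Hypothetical example: 10 sources, 200-byte metadata.
        System.out.println(bytesPerMinute(10, 200)); // 14 * 10 * 216 = 30240
    }
}
```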

New Histogram (TDigest)
The new histogram will emit one data point per reporting interval, so again two reports per minute:
Bytes per minute = (1038 + MetadataSize + 8) * NumOfSources * 2 = 2 * NumOfSources * (MetadataSize + 1046)

Here 1038 bytes is the serialized digest and 8 bytes the timestamp. The digest size was measured using TDigest with its compact serialization (smallByteSize / asSmallBytes). With the regular serialization and a typical latency-style data distribution, the size is about 2048 bytes.
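
A sketch of how such a size measurement can be reproduced with the com.tdunning t-digest library (the compression setting of 100 and the synthetic latency-like data are assumptions; the exact byte counts depend on both):

```java
import com.tdunning.math.stats.TDigest;
import java.nio.ByteBuffer;
import java.util.Random;

public class DigestSize {
    public static void main(String[] args) {
        // Compression of 100 is an assumed setting; higher values
        // grow the digest and improve accuracy.
        TDigest digest = TDigest.createMergingDigest(100);

        // Synthetic latency-like (log-normal) samples.
        Random rnd = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            digest.add(Math.exp(rnd.nextGaussian()));
        }

        // Compact vs. regular serialization.
        ByteBuffer small = ByteBuffer.allocate(digest.smallByteSize());
        digest.asSmallBytes(small);
        ByteBuffer regular = ByteBuffer.allocate(digest.byteSize());
        digest.asBytes(regular);

        System.out.println("small:   " + small.position() + " bytes");
        System.out.println("regular: " + regular.position() + " bytes");
    }
}
```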

Conclusion
The new histogram will require more memory when the metadata size is below roughly 156 bytes, the break-even point of 14 * (MetadataSize + 16) = 2 * (MetadataSize + 1046). In a typical metric setup the metadata (key, tags, and attributes) is larger than that, so the new data type should not increase memory in motion.
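
A quick way to sanity-check that break-even point (pure arithmetic, independent of any Heroic code):

```java
// Compare the two in-motion per-source estimates across metadata sizes.
// Break-even is where 14 * (m + 16) == 2 * (m + 1046), i.e. m ≈ 155.7.
public class InMotionBreakEven {
    public static void main(String[] args) {
        for (int m : new int[] {64, 128, 156, 256, 512}) {
            long current = 14L * (m + 16);   // bytes/minute per source, current
            long tdigest = 2L * (m + 1046);  // bytes/minute per source, TDigest
            System.out.printf("metadata=%d current=%d tdigest=%d%n", m, current, tdigest);
        }
    }
}
```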

STORAGE AT REST (Bigtable)
Current Histogram:
We currently save 7 data points per source for each histogram.

Storage per histogram = NumOfSources * (PointValue + PointTimestamp) * 7 + rowKeySize
Storage per histogram in bytes = NumOfSources * (8 + 16) * 7 + rowKeySize = NumOfSources * 168 + rowKeySize

New Histogram (TDigest):

We don't have to store data points from each source, because we are interested in the overall distribution of the data; data points from all sources are merged before storage (see the sketch below).
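
As a sketch of that merge step, again using the com.tdunning library (`digestsFromSources` is a hypothetical collection holding one deserialized digest per source, and the compression of 100 is an assumed setting):

```java
import com.tdunning.math.stats.Centroid;
import com.tdunning.math.stats.TDigest;
import java.util.List;

public class MergeDigests {
    // Merge per-source digests into a single digest for storage,
    // copying each centroid into the combined digest.
    static TDigest merge(List<TDigest> digestsFromSources) {
        TDigest merged = TDigest.createMergingDigest(100);
        for (TDigest perSource : digestsFromSources) {
            for (Centroid c : perSource.centroids()) {
                merged.add(c.mean(), c.count());
            }
        }
        return merged;
    }
}
```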

Storage per histogram = (PointValue + PointTimestamp) + rowKeySize
Storage per histogram in bytes = (1038 + 16) + rowKeySize = 1054 + rowKeySize

Conclusion
The new histogram will require more storage unless the number of sources is greater than 6 (break-even at 1054 / 168 ≈ 6.3 sources).
If we instead stored a digest from every source, at-rest storage would grow by about 1054 bytes per source (1054 * NumOfSources + rowKeySize) rather than 168, roughly a 6x increase.
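
The same kind of sanity check for the at-rest break-even point:

```java
// At-rest storage per histogram, per interval, ignoring rowKeySize
// (it is added to both sides). Break-even: 168 * n == 1054 => n ≈ 6.3.
public class AtRestBreakEven {
    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            long current = 168L * n; // 7 points * (8 + 16) bytes per source
            long tdigest = 1054L;    // one merged digest, all sources
            System.out.printf("sources=%d current=%d tdigest=%d%n", n, current, tdigest);
        }
    }
}
```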
