The following is a list of questions commonly asked on IRC or the blueflood-discuss Google group:
Does Blueflood have a UI component?
No. Today, Blueflood is a timeseries data-store with APIs. We're primarily focused on making Blueflood a high-quality backend. We do, however, have plans to provide a modified Graphite-web front-end that is compatible with Blueflood.
Do you support other methods of queries apart from querying by metric name?
Not currently. We call this tag-based querying; it is a work in progress.
What is your story for indexing metric names?
We have been playing around with using Elasticsearch for our indexing needs. The blueflood-elasticsearch module should be considered an experimental feature at this point.
Can Blueflood accept metrics from StatsD?
Yes. We just added support to accept metrics from StatsD. blueflood-statsd-backend is also an experimental feature.
What kind of rollups do you support?
Rollups occur at 5-, 20-, 60-, 240-, and 1440-minute (1 day) resolutions. We support two types of numeric input data: basic and pre-aggregated.
Basic data is made up of pairs of (timestamp, numeric value) with an associated metric name. These are rolled up into (min, max, variance, average). We also build histograms from these samples.
Pre-aggregated input data represents multiple numeric values associated with a timestamp and metric name. We support StatsD metric types: SET, TIMER, GAUGE, COUNTER.
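To make the basic rollup concrete, here is a minimal illustrative sketch (not Blueflood's actual Java implementation) of turning raw (timestamp, value) samples into the (min, max, average, variance) statistics described above:

```python
import statistics

def basic_rollup(samples):
    """Illustrative only: compute a basic rollup in the style described
    above from a list of (timestamp, numeric value) samples."""
    values = [value for _, value in samples]
    return {
        "numPoints": len(values),
        "min": min(values),
        "max": max(values),
        "average": sum(values) / len(values),
        # Population variance over the samples in this rollup window.
        "variance": statistics.pvariance(values),
    }

# Three raw samples falling into one 5-minute window.
samples = [(1000, 4.0), (1060, 6.0), (1120, 5.0)]
print(basic_rollup(samples))
```

Each coarser granularity (20, 60, 240, 1440 minutes) can be derived the same way, by merging the statistics of finer-grained rollups rather than re-reading the raw samples.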
Can you do custom rollups?
There is currently no easy way for us to do this.
How would I estimate storage costs if I want to deploy Blueflood?
The storage estimate is a function of the following:
- Number of individual metrics
- Number of datapoints you want to retain per granularity (TTL)
- Data type of the datapoints (numeric, string, or boolean)
- Rollup type (basic, histogram, set, timer, or gauge)
All numeric metrics use variable-length encoding, so the more of your values fit in short ints, the lower the average storage cost. If all of your datapoints are doubles, no such savings apply, so plan for the worst case: 8 bytes per raw numeric datapoint. Multiply that by the number of points you want to keep, then by the number of metrics you want to store.
A basic rollup includes numPoints (int), min, max, and average (each a double, int, or long), and variance (double). You can do the math to figure out the worst-case storage cost of a single rollup, then multiply by the number of rollups you want to keep and by the number of metrics you want to store. Repeat the procedure for every granularity.
If you iterate this procedure for every rollup type, you end up with a total. Then decide how to split the data across Cassandra nodes: depending on the server type you use and the amount of disk per node, you can work out the required cluster size.
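The arithmetic above can be sketched in a few lines. The metric counts, retention periods, and per-field sizes below are hypothetical example inputs, not recommendations; the 8-bytes-per-double and per-rollup-field sizes follow the worst-case assumption described above:

```python
def worst_case_bytes(num_metrics, points_per_metric, bytes_per_point=8):
    """Worst-case storage estimate: every datapoint stored at its full
    fixed size, with no variable-length-encoding savings."""
    return num_metrics * points_per_metric * bytes_per_point

# Hypothetical example: 10,000 metrics, one raw sample every 30 seconds,
# raw data retained for 7 days.
raw_points_per_metric = 7 * 24 * 60 * 2        # 20,160 samples per metric
raw_bytes = worst_case_bytes(10_000, raw_points_per_metric)

# Basic rollups at 5-minute granularity, also kept for 7 days:
# numPoints (4-byte int) + min, max, average, variance (8 bytes each).
rollup_size = 4 + 4 * 8                         # 36 bytes per rollup
rollups_per_metric = 7 * 24 * 12                # 2,016 rollups per metric
rollup_bytes = worst_case_bytes(10_000, rollups_per_metric, rollup_size)

print(raw_bytes, rollup_bytes)
```

Repeating the same calculation for each remaining granularity and rollup type, then summing, gives the total to divide across your Cassandra nodes.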
At some point we'll have a tool to compute the storage cost, which will make this easier.
My question wasn't answered, or I want to contribute. Where do I start?
Welcome! We are a friendly bunch of people and would love to help you use or, better yet, contribute to Blueflood. The fastest way to get in touch is via IRC, in #blueflood on Freenode. Feel free to send an email to our mailing list, or to create a GitHub issue. We're always happy to help out.