
tsm1 storage engine reports strange spikes in values #4357

Closed
dswarbrick opened this issue Oct 7, 2015 · 10 comments · Fixed by #4362
@dswarbrick

Using collectd to feed perpetually increasing, counter-type values (e.g. eth0 rx/tx packet counters) into InfluxDB with the tsm1 engine occasionally produces strange outliers:

SELECT "value" FROM "iolatency_read" WHERE "host" = 'foo.example.com' AND "instance" = 'md400' AND "type" = 'req_latency' AND "type_instance" = 'lt_8' AND time > now() - 10m

time                value
1444231716020334000 3.73705375e+08
1444231736022761000 3.73705896e+08
1444231756020388000 3.73706677e+08
1444231776016345000 3.73707002e+08
1444231796016225000 3.73708699e+08
1444231816019400000 3.73708855e+08
1444231836020839000 -8.551788012208318e-238
1444231856020667000 3.73709352e+08
1444231876022680000 3.737098e+08
1444231896014367000 3.7371048e+08
1444231916020391000 3.73711277e+08
1444231936020522000 3.73711405e+08

This in turn causes derivative(mean("value"), 1s) to go bananas, and results in unusable Grafana graphs.

Nothing has changed on the sending (collectd) side - only the storage engine in InfluxDB.
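
For illustration only (this is not InfluxDB or collectd code), here is a minimal Go sketch that computes the raw rate of change across three of the points above. The single corrupted sample swings the rate by roughly ±1.9e7 per second, while the true rate between healthy points is under a hundred per second, which is why the graphs become unusable:

package main

import "fmt"

func main() {
	type point struct {
		ts  int64   // nanosecond timestamp, copied from the query output above
		val float64 // stored field value
	}
	pts := []point{
		{1444231816019400000, 3.73708855e+08},
		{1444231836020839000, -8.551788012208318e-238}, // the corrupted sample
		{1444231856020667000, 3.73709352e+08},
	}
	// Simple rate of change between consecutive points, analogous to what
	// derivative(...) computes per interval.
	for i := 1; i < len(pts); i++ {
		dtSec := float64(pts[i].ts-pts[i-1].ts) / 1e9
		rate := (pts[i].val - pts[i-1].val) / dtSec
		fmt.Printf("rate at %d: %g/s\n", pts[i].ts, rate)
	}
}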

@jwilder
Contributor

jwilder commented Oct 7, 2015

Are you using the native collectd plugin?

@jwilder
Contributor

jwilder commented Oct 7, 2015

I'm able to reproduce some odd values using the native collectd plugin.

@jwilder jwilder self-assigned this Oct 7, 2015
@dswarbrick
Author

@jwilder Yes, I'm using the native collectd plugin, just as before, when running 0.94.2.

@jmcook

jmcook commented Oct 7, 2015

I'm seeing this even with telegraf data, specifically in the mem_available_percent measurement.

@jwilder
Contributor

jwilder commented Oct 7, 2015

I've verified that there is a bug in the float encoding when similar values are recorded. Working on a fix.

@jmcook

jmcook commented Oct 7, 2015

Sweet, thanks Jason.

@johnl

johnl commented Oct 7, 2015

@dgryski
Contributor

dgryski commented Oct 7, 2015

You're going to want to pull in dgryski/go-tsz@918e888 too.

@jwilder
Contributor

jwilder commented Oct 7, 2015

@dgryski Thanks. Did not see that other change.

jwilder added a commit that referenced this issue Oct 7, 2015
If similar float values were encoded, the number of leading bits would
overflow the 5 available bits to store them (e.g. store 33 in 5 bits).  When
decoding, the values after the overflowed value would spike to very large and
small values.

To prevent the overflow, we clamp the value to 31, which is the maximum
number of leading zero bits we can encode.

Fixes #4357
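
For readers following along, here is a minimal, self-contained Go sketch of the clamp described in the commit message, assuming a Gorilla-style XOR float encoder like the one in dgryski/go-tsz. The function and variable names are illustrative, not the actual tsm1 code:

package main

import (
	"fmt"
	"math"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits in the XOR of two consecutive
// float64 values and clamps the count at 31 so it always fits the 5-bit
// field used by the encoding. Illustrative sketch only; the real fix lives
// in the tsm1 float encoder and in dgryski/go-tsz.
func leadingZeroBits(prev, cur float64) uint8 {
	xor := math.Float64bits(prev) ^ math.Float64bits(cur)
	lz := bits.LeadingZeros64(xor)
	if lz >= 32 {
		// A count such as 33 (0b100001) needs 6 bits; writing it into a
		// 5-bit field overflows and corrupts every value decoded after it.
		lz = 31
	}
	return uint8(lz)
}

func main() {
	// Two nearly identical values: their XOR has 63 leading zeros, which
	// would overflow the 5-bit field without the clamp.
	fmt.Println(leadingZeroBits(1.0, math.Nextafter(1, 2))) // prints 31
}

The trade-off of clamping is that a few leading zero bits may be stored explicitly as part of the meaningful bits, which costs a little space but decodes correctly.
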
jwilder added a commit that referenced this issue Oct 8, 2015
If similar float values were encoded, the number of leading bits would
overflow the 5 available bits to store them (e.g. store 33 in 5 bits).  When
decoding, the values after the overflowed value would spike to very large and
small values.

To prevent the overflow, we clamp the value to 31, which is the maximum
number of leading zero bits we can encode.

Fixes #4357
@johnl

johnl commented Oct 9, 2015

This looks fixed to me. I've not had one strange metric for interface_rx/if_packets since I deployed this code yesterday. Previously I'd had them fairly regularly! Thanks!
