This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

address data corruption in chunk encoding for long chunks with >=4.55 hours of nulls #1126

Merged
11 commits merged into master on Nov 2, 2018

Conversation

@Dieterbe (Contributor) commented Oct 31, 2018

gorilla / go-tsz only has 14 bits for the delta between the t0 of the chunk and the first point.
2^14 is 16384 seconds, so about 4.55 hours.
Thus:

  • when using long chunks (e.g. 6 hours) and there are >= 4.55 hours between the start of the chunk and the first point, the delta overflows and data corruption ensues. If the delta is less than 9.1 hours and the chunk has more than 1 datapoint, it is recoverable at read time, see below
  • If the delta is >= 4.55 hours and there is only 1 point, we cannot recover at read time
  • If the delta is >= 9.10 hours (this also requires chunks of more than 9 hours), the data is also not recoverable at read time

A more detailed explanation of the problem, based on an example testcase: "no data for 5 hours, then 1 hour of 60s dense data".

It starts with t0 = 1540728000, which is stored in full.

The first point should have a delta of 5h (18000s), but due to the 14-bit overflow
we instead store a delta of 1616 (18000 - 2^14).

When decoding the first point, the timestamp should have been:
1540728000 + 18000 = 1540746000
instead we get:
1540728000 + 1616 = 1540729616

From the 2nd point onwards, we use delta-of-delta (dod) encoding.
So the 2nd point has a dod of -17940 (because the delta should change from 18000 to 60s);
this is supported fine and we store this dod.

However, at decode time, what should have happened is:
1540746000 + 18000 -17940 = 1540746060
Instead, what happens is:
1540729616 + 1616 -17940 = 1540713292

All subsequent points have dod 0, so instead of the delta:
60 + 0 = 60s
they get:
(1616 - 17940) + 0 = -16324
and they keep going back in time.
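
To make the above easy to verify, here is a minimal standalone sketch (plain arithmetic only, not the go-tsz code itself) that reproduces the numbers from the example:

package main

import "fmt"

// minimal arithmetic sketch (not the go-tsz code) reproducing the example:
// t0 = 1540728000, first point 5h (18000s) later, then 60s-interval data.
func main() {
    t0 := int64(1540728000)

    // encoding: the first delta only gets 14 bits, so 18000 wraps to 18000 - 2^14
    firstDelta := int64(18000)
    storedDelta := firstDelta % (1 << 14) // 1616

    // decoding the first point
    correctT1 := t0 + firstDelta  // 1540746000
    corruptT1 := t0 + storedDelta // 1540729616

    // 2nd point: dod = 60 - 18000 = -17940 is stored fine, but at decode time
    // it gets applied to the already-wrong delta
    dod := int64(-17940)
    correctT2 := correctT1 + firstDelta + dod  // 1540746060
    corruptT2 := corruptT1 + storedDelta + dod // 1540713292

    // all later points have dod 0, so the corrupted delta stays negative
    // and the timestamps keep going back in time
    fmt.Println(correctT1, corruptT1)
    fmt.Println(correctT2, corruptT2)
    fmt.Println("corrupted per-point delta:", storedDelta+dod) // -16324
}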

@Dieterbe (Contributor, Author) commented Oct 31, 2018

As for how to address this, there are 2 separate concerns: remediation and a long term fix

remediation (being able to decode the corrupted chunk data we currently have)

This can use some more thinking, but I see 2 solutions:

  • points should never go back in time, so if delta+dod < 0 we can use 2^14+delta+dod instead, though this requires decoding up to the 2nd point just to get the correct value for the first point, which doesn't work well with our pointwise iterator api
  • we can give "hints" to our iterator. pretty sure all our >4h chunks are rollup chunks, which always have points at a timestamp that is divisible by an interval such as 30, 60, etc. When this is the case, we can tell the iterator about it, so when it decodes the first point and the timestamp is not clean, it can try adding 2^14 to the delta (see the sketch after this list). though we may have one or two deployments for customers with very large intervals (e.g. 1h) that have large chunks with raw data
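
A rough sketch of what that second option could look like (fixFirstDelta and its signature are made up for illustration; this is not what the PR implements):

package main

import "fmt"

// hypothetical "hint" approach: the caller tells the iterator the interval of
// the rollup chunk, and if the decoded first timestamp is not aligned to it we
// assume the 14-bit delta wrapped exactly once and undo the wrap.
func fixFirstDelta(t0, decodedDelta, interval uint32) uint32 {
    if interval > 0 && (t0+decodedDelta)%interval != 0 {
        return decodedDelta + 1<<14 // timestamp not on a clean boundary: unwrap
    }
    return decodedDelta
}

func main() {
    // example from the description: stored delta 1616, rollup interval 60s
    fmt.Println(fixFirstDelta(1540728000, 1616, 60)) // 18000
}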

long term fix

  • new chunk format that uses 15 bits? looking a bit deeper though, it seems strange to store the t0 and a delta at all. we may as well just store the timestamp of the first point in full I think.
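
For illustration only (hypothetical, not an actual chunk format definition), the difference between the two header encodings comes down to:

package main

import "fmt"

// storing the first point's timestamp in full instead of t0 plus a 14-bit
// delta means the header can never overflow, no matter how long the chunk is.
func main() {
    const t0 = uint32(1540728000)
    const firstTs = uint32(1540746000) // 5h after t0

    // current format: 14-bit delta, which silently wraps for deltas >= 2^14
    delta := (firstTs - t0) & (1<<14 - 1)
    fmt.Println("14-bit delta stored:", delta) // 1616, i.e. corrupted

    // proposed: just store firstTs itself in full (e.g. 32 bits)
    fmt.Println("full timestamp stored:", firstTs) // no overflow possible
}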

@Dieterbe (Contributor, Author) commented Nov 1, 2018

points should never go back in time, so if delta+dod < 0 we can use 2^14+delta+dod instead, though this requires decoding up to the 2nd point just to get the correct value for the first point, which doesn't work well with our pointwise iterator api

i've gone with this approach for now. when reading the first point, we clone the stream, read the upcoming dod, make adjustments as needed, and restore the stream.
this can't recover the point if there's only a single point in the chunk, and there's also the clone (allocation) hit, but otherwise seems like a decent remediation.
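
A self-contained sketch of that approach; the stream and iter types below are simplified stand-ins for go-tsz's bstream and iterator, not the actual vendored code:

package main

import "fmt"

// stream is a trivial stand-in for go-tsz's bstream: a cursor over already
// decoded values, cheap to clone so we can peek ahead and then restore.
type stream struct {
    vals []int64
    pos  int
}

func (s *stream) clone() *stream { c := *s; return &c }
func (s *stream) next() int64    { v := s.vals[s.pos]; s.pos++; return v }

type iter struct {
    t0, t, tDelta int64
    s             *stream
}

// readFirstPoint decodes the (possibly wrapped) 14-bit first delta, then peeks
// at the upcoming dod without consuming it: if delta+dod would move time
// backwards, the delta must have wrapped past 2^14, so undo the wrap.
func (it *iter) readFirstPoint() {
    it.tDelta = it.s.next() // 14-bit first delta, possibly wrapped
    if it.s.pos < len(it.s.vals) {
        saved := it.s.clone() // clone the stream so we can restore it afterwards
        if dod := it.s.next(); dod+it.tDelta < 0 {
            it.tDelta += 1 << 14
        }
        it.s = saved // restore, so the 2nd point re-reads its dod normally
    }
    it.t = it.t0 + it.tDelta
}

func main() {
    // example from the description: stored first delta 1616 (wrapped from
    // 18000), then dod -17940 for the 2nd point
    it := &iter{t0: 1540728000, s: &stream{vals: []int64{1616, -17940}}}
    it.readFirstPoint()
    fmt.Println(it.t) // 1540746000, the correct first timestamp
}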

@Dieterbe (Contributor, Author) commented Nov 1, 2018

using a quick and dirty benchmark with docker-dev-custom-cfg-kafka and

fakemetrics feed --kafka-mdm-addr localhost:9092 --mpo 10000
echo 'GET http://localhost:6060/render?target=some.id.of.a.metric.1*&from=-30min' | vegeta attack -rate 5 -duration 5000s  | vegeta report

the alloc_objects overhead of go-tsz.(*bstream).clone is about 0.8%, which is the thing i was most interested in. alloc_space is about 0.15% (and it doesn't show up for inuse, which is to be expected).

@woodsaj (Member) left a comment

I don't see any changes here to how chunks are handled, just changes to comments/docs, a unit test, and an updated dependency.

@Dieterbe (Contributor, Author) commented Nov 1, 2018

@woodsaj look at the last 2 commits, they modify tsz.go ; because it's vendored, GH doesn't show the changes by default. if we take this path, i'll move these into our go-tsz fork so that dep doesn't complain.

    return true
}
// the first delta plus the upcoming dod would make time go backwards, which
// can't happen: the 14-bit delta must have wrapped past 2^14, so undo the wrap
if dod+int32(tDelta) < 0 {
    it.tDelta += 16384
A reviewer (Contributor) commented:

Am I understanding it right that this will only work if the tDelta has been wrapped around once, but e.g. if we had 10h chunks then this fix might not work if the tDelta has been wrapped twice? I guess for us that's fine, it's just something we should remember.

@Dieterbe (Contributor, Author) replied:

correct.
note that the plan is to, after putting the remediation in place, deprecate this chunk format asap (at least for >4h chunks) and start using a chunk format that doesn't have this bug.

@replay (Contributor) commented Nov 2, 2018

I wonder if that copying of .br could be saved if .dod() had a small read buffer that can store 1 return value. If there's a value in the read buffer when .dod() is called, it returns the content of the read buffer and clears it; otherwise it goes to the bstream as it does now. After the first call site of .dod(), where we know that the next call to .dod() needs to return the same value one more time, we could then just put that value into the read buffer. That would save the copying of that bstream, but it makes everything a little more complicated, so it might not be worth it.
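
A hypothetical sketch of that read-buffer idea (all names here are made up, and none of this is part of the PR):

package main

import "fmt"

// dodReader sketches a dod() with a one-value push-back buffer, so the
// first-point code can consume a dod for validation and then "unread" it,
// avoiding the bstream clone. vals is a stand-in for decoding from the stream.
type dodReader struct {
    vals     []int32
    pos      int
    buffered bool
    buf      int32
}

// dod returns the pushed-back value if there is one, otherwise it decodes the
// next delta-of-delta as it does today.
func (r *dodReader) dod() int32 {
    if r.buffered {
        r.buffered = false
        return r.buf
    }
    v := r.vals[r.pos]
    r.pos++
    return v
}

// unreadDod pushes a value back so the next dod() call returns it again.
func (r *dodReader) unreadDod(v int32) {
    r.buf = v
    r.buffered = true
}

func main() {
    r := &dodReader{vals: []int32{-17940, 0, 0}}
    d := r.dod() // peek at the 2nd point's dod while decoding the first point
    r.unreadDod(d)
    fmt.Println(r.dod(), r.dod()) // -17940 0: the 2nd point still sees its dod
}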

@robert-milan (Contributor) left a comment

I think your current fix works for most cases. As replay already pointed out though, there is a possibility that it will wrap more than once, considering our max chunkspan is 24 hours. I think this also means we will need to use 17 bits to cover the entire range (24h is 86400s, and 2^16 = 65536 < 86400 <= 2^17 = 131072), if we decide to pursue that course of action.

As to the implementation, based on your numbers it doesn't seem like the extra copying / allocation is a big deal. If that changes we could look at implementing a peek function to avoid the allocations, although that of course brings its own computational overhead. Just a thought.

Other than that, it looks good to me.

@replay (Contributor) left a comment

Looks good to me.
I added one comment about how I think it could possibly be improved, but I'm not sure if my suggestion is really better than the current state. I'd also prefer if some more people take a look at this, because I'm not fully confident about it, as this isn't simple.

@Dieterbe Dieterbe changed the title from "WIP: address data corruption in chunk encoding for long chunks with >=4.55 hours of nulls" to "address data corruption in chunk encoding for long chunks with >=4.55 hours of nulls" on Nov 2, 2018
@Dieterbe Dieterbe merged commit 3dc1937 into master Nov 2, 2018
Dieterbe added a commit that referenced this pull request Nov 27, 2018
@Dieterbe Dieterbe deleted the chunk-4h-sparse-bugfix branch March 27, 2019 21:09