
Get chunking code ready for extraction #1941

Closed
fabxc opened this Issue Sep 2, 2016 · 13 comments


fabxc commented Sep 2, 2016

Conceptually, a hard-coded chunk length does not seem necessary for chunk encoding in general, and it is a blocker for making the chunk encodings public so they can be reused by alternative storages. So this issue is mostly about verifying that there is no dependency on the length being exactly 1024.

Also, I cannot find any tests for doubledelta and varbit encoding.
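One way to verify that would be a test that exercises each encoding at several non-default lengths, along these lines (a hypothetical sketch; Encoding, NewChunkWithLen, and Add are stand-in names, not the actual storage/local API):

```go
package chunk

import "testing"

// Hypothetical sketch: flush out hidden "length == 1024" assumptions
// by filling each encoding at several non-default chunk lengths.
// Encoding, NewChunkWithLen, and Add are stand-ins, not the real API.
func TestNonDefaultChunkLengths(t *testing.T) {
	for _, enc := range []Encoding{Delta, DoubleDelta, Varbit} {
		// Deliberately mix powers of two with odd sizes such as 1000.
		for _, n := range []int{256, 1000, 1024, 1536, 4096} {
			c, err := NewChunkWithLen(enc, n)
			if err != nil {
				t.Fatalf("encoding %v, length %d: %v", enc, n, err)
			}
			// Add appends a (timestamp, value) sample and reports
			// whether the chunk still has room (assumed semantics).
			for i := 0; c.Add(int64(i), float64(i)); i++ {
			}
		}
	}
}
```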


beorn7 commented Sep 2, 2016

If you are looking for separate low-level unit tests per chunk encoding: there are only very few, in delta_test.go and varbit_test.go. The main tests are in storage_test.go (those where each chunk encoding has to deliver the same results). You'll find many tests there that go through each chunk encoding, e.g. https://github.com/prometheus/prometheus/blob/master/storage/local/storage_test.go#L1491-L1501 . Test coverage for chunks is pretty good. (I think we have never had an encoding error in a released version.)

WRT chunk length (the actual issue here): in principle, everything should refer to https://github.com/prometheus/prometheus/blob/master/storage/local/storage.go#L37 , which you should thus be able to change. However, that has never been tried, so there might be code paths that don't deal properly with a different length, and there are certainly some hard limits: wherever we save an offset within a chunk in a fixed number of bytes, we cannot go above a certain length.

As an example: https://github.com/prometheus/prometheus/blob/master/storage/local/varbit.go#L51 . A uint16 can address 65536 bit positions (values 0 through 65535), and since that offset is counted in bits, it limits the max chunk length to 8 KiB.
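Spelled out, the arithmetic behind that ceiling (a back-of-the-envelope sketch, not code from the repository):

```go
package main

import "fmt"

func main() {
	// varbit stores some in-chunk positions as bit offsets in a uint16.
	// A uint16 holds values 0..65535, i.e. it can address 65536 bit
	// positions; divided by 8 bits per byte, that caps the chunk size.
	const addressableBits = 1 << 16
	const bitsPerByte = 8

	fmt.Println(addressableBits / bitsPerByte) // prints 8192, i.e. 8 KiB
}
```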

There are certainly other, similar uses of offsets, which might limit the chunk length even more.

Then there is a minimum chunk length, e.g. the header and footer of a varbit chunk must not overlap.
Upper and lower limits have to be established firmly, and we would need some tests for that.
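A minimal sketch of what firmly established bounds plus a guard could look like (the names and the minimum value are illustrative assumptions, not taken from storage/local):

```go
package chunk

import "fmt"

// Illustrative bounds only: the real minimum must leave room for the
// varbit header and footer, and the maximum follows from the uint16
// bit offsets (65536 bits = 8 KiB) noted above.
const (
	minVarbitChunkLen = 32   // assumed: enough for header + footer
	maxVarbitChunkLen = 8192 // from the uint16 bit-offset limit
)

// validateVarbitChunkLen rejects lengths outside the supported range;
// dedicated tests would probe both boundaries and one step beyond.
func validateVarbitChunkLen(n int) error {
	if n < minVarbitChunkLen || n > maxVarbitChunkLen {
		return fmt.Errorf("varbit chunk length %d outside [%d, %d]",
			n, minVarbitChunkLen, maxVarbitChunkLen)
	}
	return nil
}
```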


fabxc commented Sep 2, 2016

Thanks for the insight.

Yes, we certainly have to define upper and lower bounds for each. And yes, from my quick check the global constant is used consistently. I mostly wonder whether we have any implicit assumptions of chunkLen being a power of two, for example: I could imagine chunks being embedded in further layers that also have headers or footers, hence leaving 2^n - lenHeader bytes for the sample chunk.

Fuzz testing is certainly a very good approach here, but I mostly think of it as an addition. The code was certainly written with a lot of diligence and is possibly entirely correct, but things involving that much bit-shifting should have dedicated unit tests directly addressing all possible states. That's certainly a TODO item when extracting this into a chunk package. Then we should probably also pull the fuzz testing down with it and not (or better, not only) run it as an e2e test.
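As a concrete shape for such dedicated tests, a table-driven round-trip check could look like this sketch (Encode and Decode are hypothetical stand-ins for the real encoding entry points):

```go
package chunk

import (
	"math"
	"testing"
)

// Table-driven round-trip test sketch: every value written into a
// chunk must come back out unchanged. Encode and Decode are
// hypothetical stand-ins for the encoding under test.
func TestEncodingRoundTrip(t *testing.T) {
	cases := [][]float64{
		{0},
		{1.5, 1.5, 1.5},                     // constant series: zero deltas
		{-1e9, 0, 1e9},                      // large jumps: widest bit widths
		{0.1, 0.2, 0.3, 0.4},                // small fractional increments
		{math.MaxFloat64, -math.MaxFloat64}, // extreme magnitudes
	}
	for i, want := range cases {
		got := Decode(Encode(want))
		if len(got) != len(want) {
			t.Fatalf("case %d: got %d values, want %d", i, len(got), len(want))
		}
		for j := range want {
			if got[j] != want[j] {
				t.Errorf("case %d, sample %d: got %v, want %v", i, j, got[j], want[j])
			}
		}
	}
}
```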


juliusv commented Sep 2, 2016

FWIW, I'm pretty sure I never baked any power-of-2 assumptions into the original chunking code or the delta encoding. I'd be surprised if the varbit encoding cared about that, but I'm not an expert on it.


beorn7 commented Sep 2, 2016

I don't recall any power-of-2 assumptions, either.

@fabxc if you care about the unit tests, you should probably file a separate issue as this one is titled quite differently. For the record: I only wrote limited unit tests as I could see how the higher-level tests would explore all the different code paths and exposed many bugs back then. Or in other words: We haven't had any bugs in released code not because the code is so diligently written but because the test coverage is so good and detected many bugs in the (apparently not so diligently written) code.

fabxc changed the title from "Remove hard-coded chunk length" to "Get chunking code ready for extraction" on Sep 2, 2016


fabxc commented Sep 2, 2016

Renamed the issue instead to capture anything we have to do to make the chunking functionality a public package.


beorn7 commented Sep 3, 2016

I see. With the chunking code extracted, the integrated tests in storage_test.go (which are more than just the fuzzing) are not a natural part of it anymore. One could see the Prometheus storage as a test framework for the chunk implementation. ;)

Anyway, the issue title is meaningful enough now.


fabxc commented Oct 10, 2016

The things discussed here were completely ignored and a chunk package was extracted. @juliusv


beorn7 commented Oct 10, 2016

I understand @juliusv's chunk extraction as WIP. There are a number of things to clean up.


fabxc commented Oct 10, 2016

It's out there and being used now.

In reality, I cannot recall many incidents where extensive cleanup and testing was added once the changes fixing the use case at hand were pushed out.

It's not an orthogonal problem; it's tech debt.



beorn7 commented Oct 10, 2016

But tech debt was not increased by the change. Previously, the lack of separation was easier to excuse because we could claim there was deliberately no separation. Now that we want the separation, it is tech debt that the concerns are not separated in the code. Since we are not talking about publicly advertised APIs or libraries, I find it useful to take an incremental approach. Also, the main user of the separated package is the person who separated it, namely @juliusv. So he has some incentive to keep working on it.

But I leave it to @juliusv to justify it himself.


juliusv commented Oct 10, 2016

Apologies, I simply did not have this issue and its demands on my radar when I did the extraction. I also think that it doesn't introduce more tech debt; you can even argue that it reduces it by at least attempting to draw some kind of clearer (still horribly messy) boundary between the chunk world and the rest of the storage.

Personally, just from a code correctness standpoint, I trust the storage fuzz tests and other high-level tests to exercise the chunk implementations quite thoroughly. I can see that it would be nicer from a code hygiene perspective to have tests directly in the package, so that things like coverage reporting would work and people can trust the package as a standalone entity without knowing its context. I see that as desirable, but not super high priority.


gouthamve commented Jun 14, 2017

Umm, with us moving to tsdb in 2.0, can this issue be closed?

fabxc closed this Jul 3, 2017


lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
