Get chunking code ready for extraction #1941
Comments
If you are looking for separate low-level unit tests per chunk encoding: there are only very few. With regard to the chunk length (the actual issue here): in principle, everything should refer to https://github.com/prometheus/prometheus/blob/master/storage/local/storage.go#L37 , so you should be able to change that constant. However, that has never been tried, so there might be code paths that don't deal properly with it, and there are certainly limits: wherever we save an offset within a chunk in a fixed number of bytes, we cannot go above certain lengths. As an example: https://github.com/prometheus/prometheus/blob/master/storage/local/varbit.go#L51 . A uint16 can hold values up to 65535, and since that offset counts bits, it limits the maximum chunk length to 8KiB. There are certainly other, similar uses of offsets, which might limit the chunk length even more. There is also a minimum chunk length, e.g. the header and footer of a varbit chunk must not overlap.
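The arithmetic behind the 8KiB cap mentioned above can be sketched as follows. This is a minimal illustration, not the actual Prometheus code; the names are assumptions:

```go
package main

import "fmt"

// If a chunk stores internal offsets as a uint16 counting bits (as the
// varbit encoding does per the comment above), the addressable range
// caps the usable chunk length.
const maxBitOffset = 1<<16 - 1 // largest value a uint16 can hold: 65535

// maxChunkLenBytes derives the byte limit from the bit-offset range.
func maxChunkLenBytes() int {
	// 65536 addressable bit positions / 8 bits per byte = 8192 bytes (8KiB).
	return (maxBitOffset + 1) / 8
}

func main() {
	fmt.Println(maxChunkLenBytes()) // prints 8192
}
```

Other fixed-width offsets elsewhere in the encodings would impose their own, possibly tighter, caps by the same arithmetic.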
beorn7 added the kind/enhancement and component/local storage labels on Sep 2, 2016
Thanks for the insight. Yes, we certainly have to define upper and lower bounds for each. And yes, from my quick check the global constant is used consistently. I mostly wonder whether we have any implicit assumptions of the chunk length being a power of two. Fuzz testing is certainly a very good approach here, but I mostly think of it as an addition.
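A low-level unit test along these lines could probe lengths around the bounds directly. A minimal sketch; the constants and the `validChunkLen` helper are hypothetical illustrations, not the real Prometheus API:

```go
package main

import "fmt"

// Hypothetical bounds for a configurable chunk length, reflecting the
// limits discussed in this thread (assumed values, not real constants).
const (
	minChunkLen = 64   // assumed: must at least fit a chunk header and footer
	maxChunkLen = 8192 // uint16 bit offsets cap the chunk at 8KiB
)

// validChunkLen reports whether a candidate chunk length is within bounds.
func validChunkLen(n int) bool {
	return n >= minChunkLen && n <= maxChunkLen
}

func main() {
	// A table-driven test (or a fuzzer) would probe values around the edges.
	for _, n := range []int{32, 1024, 8192, 16384} {
		fmt.Println(n, validChunkLen(n))
	}
}
```

A fuzzer would then complement this by exercising the encode/decode paths with arbitrary in-range lengths, rather than just checking the bounds.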
FWIW, I'm pretty sure I never baked in any power-of-2 assumptions into the original chunking code or the delta encoding. I'd be surprised if the varbit encoding cared about that, but I'm not an expert on that.
I don't recall any power-of-2 assumptions, either. @fabxc if you care about the unit tests, you should probably file a separate issue, as this one is titled quite differently. For the record: I only wrote limited unit tests, as I could see how the higher-level tests would explore all the different code paths, and they exposed many bugs back then. Or in other words: we haven't had any bugs in released code not because the code is so diligently written, but because the test coverage is so good and detected many bugs in the (apparently not so diligently written) code.
fabxc changed the title from Remove hard-coded chunk length to Get chunking code ready for extraction on Sep 2, 2016
Renamed the issue instead to capture anything we have to do to make the chunking functionality a public package.
I see. With the chunking code extracted, the integrated tests in … Anyway, the issue title is meaningful enough now.
The things discussed here were completely ignored and a chunk package was extracted anyway. @juliusv
I understand @juliusv 's chunk extraction as WIP. There are a number of things to clean up.
It's out there and being used now. In reality, I cannot recall many incidents where extensive cleanup and … It's not an orthogonal problem; it's tech debt.
But tech debt was not increased by the change. Previously, the lack of separation was merely better excused, because we could claim there was deliberately no separation. Now that we want the separation, it is tech debt that the concerns are not separated in the code. Since we are not talking about publicly advertised APIs or libraries, I find it useful to take an incremental approach. Also, the main user of the separated package is the person who separated it, namely @juliusv, so he has some incentive to keep working on it. But I leave it to @juliusv himself to justify that.
Apologies, I simply did not have this issue and its demands on the radar when I did the extraction. I also think that it doesn't introduce more tech debt; you could even argue that it reduces it, by at least attempting to draw some kind of clearer (still horribly messy) boundary between the chunk world and the rest of the storage. Personally, just from a code correctness standpoint, I trust the storage fuzz tests and other high-level tests to exercise the chunk implementations quite thoroughly. I can see that it would be nicer from a code hygiene perspective to have tests directly in the package, so that things like coverage reporting would work and people could trust the package as a standalone entity without knowing its context. I see that as desirable, but not super high priority.
Umm, with us moving to …
fabxc closed this on Jul 3, 2017
lock bot commented on Mar 23, 2019: This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
fabxc commented on Sep 2, 2016 (edited)
Conceptually, a hard-coded chunk length does not seem necessary for chunk encoding in general, and it is a blocker for making the chunk encodings public so they can be reused by alternative storages. So this issue is mostly about verifying that there is no dependency on the length being exactly 1024.
Also, I cannot find any tests for doubledelta and varbit encoding.
@beorn7
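One implicit assumption worth checking for, raised later in this thread, is code that relies on the chunk length being a power of two (e.g. masking with `chunkLen-1` instead of using modulo, which only works for powers of two). A quick sketch using the standard bit trick, shown only for illustration:

```go
package main

import "fmt"

// isPowerOfTwo is the standard bit trick: a power of two has exactly
// one bit set, so n & (n-1) clears it to zero. Not Prometheus code,
// just an illustration of the kind of assumption to audit for.
func isPowerOfTwo(n int) bool {
	return n > 0 && n&(n-1) == 0
}

func main() {
	fmt.Println(isPowerOfTwo(1024)) // the current hard-coded chunk length
	fmt.Println(isPowerOfTwo(1000)) // a length a fuzz test might try
}
```

Fuzzing with non-power-of-two lengths such as 1000 would surface any code path that silently depends on this property.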