return err instead of panic for corrupted chunk #6040
Conversation
check that the chunk segment has enough data to read all chunk pieces. fixes: prometheus#5991 Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Nice, and thanks for helping with this. 👍 (I just added some comments; sorry if they disturb this PR.)
@zhulongcheng updated, thanks for the review!
Force-pushed from 238bac4 to f9875b6
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Force-pushed from f9875b6 to 2d710a9
If I understand correctly, the panic is avoided by

prometheus/tsdb/chunks/chunks.go, lines 515 to 517 in 2d710a9:

if chkEnd > sgmBytes.Len() {
	return nil, errors.Errorf("segment doesn't include enough bytes to read the chunk - required:%v, available:%v", chkEnd, sgmBytes.Len())
}

and by the +maxChunkLengthFieldSize check here, prometheus/tsdb/chunks/chunks.go, lines 499 to 501 in 2d710a9:

if sgmChunkStart+maxChunkLengthFieldSize > sgmBytes.Len() {
	return nil, errors.Errorf("segment doesn't include enough bytes to read the chunk size data field - required:%v, available:%v", sgmChunkStart+maxChunkLengthFieldSize, sgmBytes.Len())
}

right?
PS: I haven't checked the changes in the tests yet
Correct.
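For readers following along, here is a minimal, self-contained sketch of the bounds-check pattern discussed above. The readChunk helper and the constant's value are illustrative assumptions, not the actual chunks.go code; only the error messages and the names echoed from the excerpts above come from the PR, and fmt.Errorf stands in for errors.Errorf to keep the example dependency-free.

package main

import "fmt"

// maxChunkLengthFieldSize mirrors the constant referenced above; the value
// here (5 bytes, enough for a varint-encoded uint32) is an assumption.
const maxChunkLengthFieldSize = 5

// readChunk is a hypothetical helper showing the two checks: it returns an
// error for a truncated or corrupted segment instead of panicking on an
// out-of-range slice access.
func readChunk(segment []byte, chunkStart, chunkLen int) ([]byte, error) {
	if chunkStart+maxChunkLengthFieldSize > len(segment) {
		return nil, fmt.Errorf("segment doesn't include enough bytes to read the chunk size data field - required:%v, available:%v",
			chunkStart+maxChunkLengthFieldSize, len(segment))
	}
	chunkEnd := chunkStart + chunkLen
	if chunkEnd > len(segment) {
		return nil, fmt.Errorf("segment doesn't include enough bytes to read the chunk - required:%v, available:%v",
			chunkEnd, len(segment))
	}
	return segment[chunkStart:chunkEnd], nil
}

func main() {
	segment := make([]byte, 10) // pretend this is a truncated chunk segment
	if _, err := readChunk(segment, 8, 100); err != nil {
		fmt.Println("corrupted chunk:", err) // error instead of a panic
	}
}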
…idioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
I tried to simplify it a bit and added a test to ensure the correct behavior.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
ping @codesome
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
@codesome @zhulongcheng would appreciate one final review before merging this.
Taking a look at this today
One possible enhancement, LGTM otherwise.
Thanks, appreciated.
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
@codesome just resolved the conflict with master, so it is ready for a review when you find the time.
ping @zhulongcheng @codesome
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
LGTM 👍
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
LGTM, just some suggestions 👍
All comments addressed, thanks!
@codesome if you see any other problems, please ping me and I will open another PR.
Was planning to review today, but the changes looked fine earlier and the above approvals should be enough :)
It is not too late :) If you see anything, let me know and I will open another PR.
Here you go! Hopefully we can make this small change before the next release.
	batchID++
	batchSize = chkSize
}
batches[batchID] = chks[batchStart : i+1]
This should ideally be done only when (1) we cut a new batch, or (2) it's the last chunk. Otherwise it is going to create a new slice header for every chunk.
Additionally (not a blocker), we could write the chunks as soon as we hit this case instead of collecting multiple batches, what say?
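For illustration, a minimal standalone sketch of the suggested batching, where the sub-slice is assigned only when a new batch is cut or at the last element. The function name, the []int element type, and the size limit are assumptions made for the example; this is not the PR's actual code.

package main

import "fmt"

// batchBySize sketches the suggestion above: the sub-slice for a batch is
// taken only when a new batch is cut or when the last element is reached,
// rather than on every loop iteration.
func batchBySize(sizes []int, maxBatchSize int) [][]int {
	var batches [][]int
	batchStart, batchSize := 0, 0

	for i, size := range sizes {
		if batchSize+size > maxBatchSize && i > batchStart {
			// Cutting a new batch: close out the previous one.
			batches = append(batches, sizes[batchStart:i])
			batchStart, batchSize = i, 0
		}
		batchSize += size

		if i == len(sizes)-1 {
			// Last element: close out the final batch.
			batches = append(batches, sizes[batchStart:])
		}
	}
	return batches
}

func main() {
	fmt.Println(batchBySize([]int{3, 4, 5, 2, 6}, 8)) // [[3 4] [5 2] [6]]
}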
> This should ideally be done only when (1) we cut a new batch, or (2) it's the last chunk. Otherwise it is going to create a new slice header for every chunk.
Hm, I am not sure I understand the idea. Can you give an example, or maybe even try it, and if it passes the tests just open a PR and I will review it quickly.
> Additionally (not a blocker), we could write the chunks as soon as we hit this case instead of collecting multiple batches, what say?
I think I tried this, but there was some other problem there; I can't remember exactly. Maybe try it again, and if it passes the tests then I will review it quickly.
I thought getting a sub-slice in every iteration would cause extra allocations, but running BenchmarkCompaction shows me that apparently it does not. Additionally, I see some regression in performance in ns/op of nearly 7-9%, but I cannot say if it is this PR itself, so that would need some pprof action I suppose.
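As an aside on the allocation question: re-slicing in Go only copies the slice header (pointer, length, capacity), not the underlying array, which is why the per-iteration sub-slice does not show up as extra allocations. A tiny standalone check (not part of the PR or its benchmarks):

package main

import (
	"fmt"
	"testing"
)

func main() {
	chks := make([]int, 1000)
	var batch []int

	// AllocsPerRun reports the average number of heap allocations per call;
	// taking a sub-slice only copies the slice header, so this reports 0.
	allocs := testing.AllocsPerRun(100, func() {
		batch = chks[10:20]
	})
	fmt.Println(allocs, len(batch)) // 0 10
}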
Thanks for running the bench test. I did it this way as it was easier to follow, and I also did a prombench test and didn't see any difference in the performance. The regression might be due to the extra checksum checking.
check that the chunk segment has enough data to read all chunk pieces.
fixes: #5991
fixes: thanos-io/thanos#1467
@zhulongcheng - while reviewing your PR in #5991 there were many things I didn't understand, so I had to refactor and rename a few variables to make the workflow clearer. Feel free to copy and paste the code from here or just review this PR.
Again, sorry for hijacking the PR; I made so many changes while trying to understand the code properly that it only made sense to open a separate PR.