return err instead of panic for corrupted chunk #6040
Conversation
check that the chunk segment has enough data to read all chunk pieces. fixes: prometheus#5991 Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Nice, and thanks for helping with this. 👍 (I just added some comments; sorry if they disturb this PR.)
@zhulongcheng updated, thanks for the review!
Force-pushed from 238bac4 to f9875b6
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Force-pushed from f9875b6 to 2d710a9
If I understand correctly, the panic is avoided by

prometheus/tsdb/chunks/chunks.go, lines 515 to 517 in 2d710a9:

if chkEnd > sgmBytes.Len() {
	return nil, errors.Errorf("segment doesn't include enough bytes to read the chunk - required:%v, available:%v", chkEnd, sgmBytes.Len())
}

and by the +maxChunkLengthFieldSize check here, prometheus/tsdb/chunks/chunks.go, lines 499 to 501 in 2d710a9:

if sgmChunkStart+maxChunkLengthFieldSize > sgmBytes.Len() {
	return nil, errors.Errorf("segment doesn't include enough bytes to read the chunk size data field - required:%v, available:%v", sgmChunkStart+maxChunkLengthFieldSize, sgmBytes.Len())
}

right?
PS: I haven't checked the changes in the tests yet
Correct.
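For readers following along, here is a minimal, self-contained sketch of the bounds-check pattern discussed above. The readChunk helper and the constant's value are illustrative assumptions, not the actual chunks.go code; only the error messages and the names echoed from the excerpts above come from the PR, and fmt.Errorf stands in for errors.Errorf to keep the example dependency-free.

package main

import "fmt"

// maxChunkLengthFieldSize mirrors the constant referenced above; the value
// here (5 bytes, enough for a varint-encoded uint32) is an assumption.
const maxChunkLengthFieldSize = 5

// readChunk is a hypothetical helper showing the two checks: it returns an
// error for a truncated or corrupted segment instead of panicking on an
// out-of-range slice access.
func readChunk(segment []byte, chunkStart, chunkLen int) ([]byte, error) {
	if chunkStart+maxChunkLengthFieldSize > len(segment) {
		return nil, fmt.Errorf("segment doesn't include enough bytes to read the chunk size data field - required:%v, available:%v",
			chunkStart+maxChunkLengthFieldSize, len(segment))
	}
	chunkEnd := chunkStart + chunkLen
	if chunkEnd > len(segment) {
		return nil, fmt.Errorf("segment doesn't include enough bytes to read the chunk - required:%v, available:%v",
			chunkEnd, len(segment))
	}
	return segment[chunkStart:chunkEnd], nil
}

func main() {
	segment := make([]byte, 10) // pretend this is a truncated chunk segment
	if _, err := readChunk(segment, 8, 100); err != nil {
		fmt.Println("corrupted chunk:", err) // error instead of a panic
	}
}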
…idioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
I tried to simplify it a bit and added a test to ensure the correct behavior.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
ping @codesome
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
@codesome @zhulongcheng would appreciate one final review before merging this.
Taking a look at this today
One possible enhancement, LGTM otherwise.
Thanks, appreciated.
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
@codesome just resolved the conflict with master, so it is ready for a review when you find the time.
ping @zhulongcheng @codesome
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
LGTM 👍
…ioms-comments Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
LGTM, just some suggestions 👍
All comments addressed, thanks!
@codesome if you see any other problems, please ping me and I will open another PR.
Was planning to review today, but the changes looked fine earlier and the above approvals should be enough :)
It is not too late :) If you see anything, let me know and I will open another PR.
Here you go! Hopefully we can make this small change before the next release.
	batchID++
	batchSize = chkSize
}
batches[batchID] = chks[batchStart : i+1]
This should ideally be done only when (1) we cut a new batch, or (2) it's the last chunk. Otherwise it is going to create a new slice header for every chunk.
Additionally (not a blocker), we could write the chunks as soon as we hit this case instead of collecting multiple batches, what say?
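For illustration, a minimal standalone sketch of the suggested batching, where the sub-slice is assigned only when a new batch is cut or at the last element. The function name, the []int element type, and the size limit are assumptions made for the example; this is not the PR's actual code.

package main

import "fmt"

// batchBySize sketches the suggestion above: the sub-slice for a batch is
// taken only when a new batch is cut or when the last element is reached,
// rather than on every loop iteration.
func batchBySize(sizes []int, maxBatchSize int) [][]int {
	var batches [][]int
	batchStart, batchSize := 0, 0

	for i, size := range sizes {
		if batchSize+size > maxBatchSize && i > batchStart {
			// Cutting a new batch: close out the previous one.
			batches = append(batches, sizes[batchStart:i])
			batchStart, batchSize = i, 0
		}
		batchSize += size

		if i == len(sizes)-1 {
			// Last element: close out the final batch.
			batches = append(batches, sizes[batchStart:])
		}
	}
	return batches
}

func main() {
	fmt.Println(batchBySize([]int{3, 4, 5, 2, 6}, 8)) // [[3 4] [5 2] [6]]
}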
> This should ideally be done only when (1) we cut a new batch, or (2) it's the last chunk. Otherwise it is going to create a new slice header for every chunk.
Hm, I am not sure I understand the idea. Can you give an example, or maybe even try it, and if it passes the tests just open a PR and I will review it quickly.
> Additionally (not a blocker), we could write the chunks as soon as we hit this case instead of collecting multiple batches, what say?
I think I tried this, but there was some other problem there; I can't remember exactly. Maybe try it again, and if it passes the tests then I will review it quickly.
I thought getting a sub-slice in every iteration would cause extra allocations, but running BenchmarkCompaction shows me that apparently it does not. Additionally, I see some regression in performance in ns/op of nearly 7-9%, but I cannot say if it is this PR itself, so that would need some pprof action I suppose.
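As an aside on the allocation question: re-slicing in Go only copies the slice header (pointer, length, capacity), not the underlying array, which is why the per-iteration sub-slice does not show up as extra allocations. A tiny standalone check (not part of the PR or its benchmarks):

package main

import (
	"fmt"
	"testing"
)

func main() {
	chks := make([]int, 1000)
	var batch []int

	// AllocsPerRun reports the average number of heap allocations per call;
	// taking a sub-slice only copies the slice header, so this reports 0.
	allocs := testing.AllocsPerRun(100, func() {
		batch = chks[10:20]
	})
	fmt.Println(allocs, len(batch)) // 0 10
}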
Thanks for running the bench test. I did it this way as it was easier to follow, and I also did a prombench test and didn't see any difference in the performance. The regression might be due to the extra checksum checking.
check that the chunk segment has enough data to read all chunk pieces.
fixes: #5991
fixes: thanos-io/thanos#1467
@zhulongcheng - while reviewing your PR in #5991 there were many things I didn't understand, so I had to refactor and rename a few variables to make the workflow clearer. Feel free to copy and paste the code from here or just review this PR.
Again, sorry for hijacking the PR; I made so many changes while trying to understand the code properly that it only made sense to open a separate PR.