Improve retention mark files. #3706

cyriltovena · 2021-05-10T11:37:29Z

This PR is twofold:

1) Rotate mark files once they reach 100k marks. This ensures we fill each mark file without creating files that may be too big.
2) Instead of inserting marks using the chunk ID as the key, insert them using a natural sequence. This has two benefits:
- Keys are ordered, so insertions are faster.
- It allows us to set the boltdb fill percent to 100%, which means boltdb won't over-allocate to leave room for keys inserted between unordered data.

Why?

I realized that inserting a huge number of marks can take hours, since boltdb has to re-allocate pages over and over.
This is mainly because chunk IDs arrive in no specific order.
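To illustrate why a natural sequence yields ordered keys, here is a minimal sketch that uses only the standard library, not boltdb itself. It assumes, as in the diff under review, that sequence IDs are encoded with `binary.BigEndian`; the helper names `sequenceKey` and `keysAscending` are hypothetical. Big-endian encoding keeps byte-wise key order identical to numeric order, so every insert lands at the end of the key space. A little-endian encoding is included only for contrast.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// sequenceKey encodes a sequence id as an 8-byte big-endian key.
// (Hypothetical helper, not taken from the PR.)
func sequenceKey(id uint64) []byte {
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, id)
	return key
}

// keysAscending reports whether each key sorts strictly after the previous
// one when compared byte by byte.
func keysAscending(keys [][]byte) bool {
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i-1], keys[i]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	var bigEndian, littleEndian [][]byte
	for id := uint64(1); id <= 300; id++ {
		bigEndian = append(bigEndian, sequenceKey(id))

		// Little-endian keys, for contrast only.
		le := make([]byte, 8)
		binary.LittleEndian.PutUint64(le, id)
		littleEndian = append(littleEndian, le)
	}
	fmt.Println("big-endian ascending:", keysAscending(bigEndian))
	fmt.Println("little-endian ascending:", keysAscending(littleEndian))
}
```

The little-endian keys stop being ascending at the step from 255 to 256, which is why the choice of byte order matters when relying on append-only inserts.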

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

sandeepsukhani · 2021-05-10T11:59:35Z

> This is mainly because chunkid arrive without specific order.

I might be wrong, but since boltdb iterates over keys in order, we should be getting chunk ids in order.

cyriltovena · 2021-05-10T15:18:36Z

That's correct, although I didn't want to rely on it. Technically, only the rotation and the fill percent were required for the improvement, but I went ahead and made that change anyway so that we have this guarantee whatever the input.
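As a rough illustration of the rotation part, the sketch below models mark files in memory and starts a new one once the current file holds the maximum number of marks. It is not the PR's implementation: `rotatingMarker`, its fields, and `put` are hypothetical names, and each inner slice merely stands in for one boltdb mark file.

```go
package main

import "fmt"

// rotatingMarker is a hypothetical in-memory model of mark-file rotation.
// Each inner slice stands in for one mark file on disk.
type rotatingMarker struct {
	maxMarks int        // marks allowed per file before rotating
	files    [][]string // files[i] holds the chunk ids written to file i
}

// put appends a mark, opening a new "file" first if there is none yet or
// the current one is already full.
func (m *rotatingMarker) put(chunkID string) {
	if len(m.files) == 0 || len(m.files[len(m.files)-1]) >= m.maxMarks {
		m.files = append(m.files, nil)
	}
	last := len(m.files) - 1
	m.files[last] = append(m.files[last], chunkID)
}

func main() {
	m := &rotatingMarker{maxMarks: 100000}
	for i := 0; i < 250000; i++ {
		m.put(fmt.Sprintf("chunk-%d", i))
	}
	for i, f := range m.files {
		fmt.Printf("file %d: %d marks\n", i, len(f))
	}
}
```

With a limit of 100k, writing 250k marks produces two full files and one partial file, so no single file grows without bound.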

cyriltovena · 2021-05-17T07:34:23Z

Ping @sandeepsukhani: should we revert the mark file system, or are you OK with the approach?

sandeepsukhani

I just added a suggestion and a comment for a possible issue.

sandeepsukhani · 2021-05-17T10:09:20Z

pkg/storage/stores/shipper/compactor/retention/marker.go

+		return err
+	}
+	binary.BigEndian.PutUint64(m.buf, id) // insert in order using sequence id.
+	if err := m.bucket.Put(m.buf, chunkID); err != nil {


I think reusing a buf for keys might cause a problem because the comment here says:
Supplied value must remain valid for the life of the transaction.
What do you think?

The key is cloned, but I'm thinking about the chunkId.

Yeah, you're right. I'm going to keep the fill percent but revert this change.
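The concern in this thread can be reproduced without boltdb. The sketch below assumes a store that, per the contract quoted above, keeps a reference to each supplied value until the transaction ends; `retainingTx`, `putAll`, and `stored` are hypothetical stand-ins, not boltdb APIs. Reusing one scratch buffer for values makes every retained entry alias the same memory, while copying the value before each put avoids that.

```go
package main

import "fmt"

// retainingTx is a hypothetical stand-in for a write transaction that holds
// on to each value slice, without copying it, until it is committed.
type retainingTx struct {
	values [][]byte
}

// put stores the slice header only; the bytes are not copied.
func (t *retainingTx) put(value []byte) {
	t.values = append(t.values, value)
}

// putAll writes every chunk id through one shared scratch buffer.
// With copyValue=false, all stored slices alias the scratch buffer.
func putAll(tx *retainingTx, chunkIDs []string, copyValue bool) {
	scratch := make([]byte, 0, 64)
	for _, id := range chunkIDs {
		scratch = append(scratch[:0], id...) // overwrite the shared buffer
		value := scratch
		if copyValue {
			value = append([]byte(nil), scratch...) // private copy per put
		}
		tx.put(value)
	}
}

// stored returns what the transaction would see at commit time.
func stored(tx *retainingTx) []string {
	out := make([]string, 0, len(tx.values))
	for _, v := range tx.values {
		out = append(out, string(v))
	}
	return out
}

func main() {
	ids := []string{"chunk-a", "chunk-b"}

	shared := &retainingTx{}
	putAll(shared, ids, false)
	fmt.Println("shared buffer:", stored(shared))

	copied := &retainingTx{}
	putAll(copied, ids, true)
	fmt.Println("copied values:", stored(copied))
}
```

In the shared-buffer run both entries read back as the last chunk ID written, which is the kind of corruption the reviewer was worried about; the copying run keeps each value intact at the cost of one allocation per mark.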

sandeepsukhani

LGTM!

pull-request-size bot added the size/L label May 10, 2021

sandeepsukhani reviewed May 17, 2021

cyriltovena and others added 4 commits May 18, 2021 09:27

Improve key buf sizing.

c3bce43

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

lint

46d713d

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

Update pkg/storage/stores/shipper/compactor/retention/marker.go

fe6e422

Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>

cyriltovena force-pushed the better-marker branch from 5f5ad9c to fe6e422 Compare May 18, 2021 07:28

cyriltovena added 2 commits May 18, 2021 09:44

Copy the value so that it still available for the whole boltdb tx.

20368f9

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

typo

0ab3e97

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

sandeepsukhani approved these changes May 18, 2021

cyriltovena merged commit e1a3ab8 into grafana:main May 18, 2021
