-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Introduce block cache in badger #1066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ A review job has been created and sent to the PullRequest network.
@jarifibrahim you can click here to see the review status or cancel the code review job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided some follow-up to my initial review focusing on readability.
Reviewed with ❤️ by PullRequest
table/table_test.go
Outdated
| rand.Seed(time.Now().Unix()) | ||
| opts := Options{Compression: options.ZSTD, BlockSize: 4 * 1024, BloomFalsePositive: 0.01} | ||
| if cache == nil { | ||
| var err error | ||
| cache, err = ristretto.NewCache(&ristretto.Config{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely classified as a nit; however I think it has readability implications. ristretto.NewCache accepts a pointer value for its Config. There is limited (readability) value initializing the struct then immediately taking its pointer value. I think it is a two step process and should be written with that in mind for enhanced readability.
table/table.go
Outdated
| @@ -348,7 +351,14 @@ func (t *Table) block(idx int) (*block, error) { | |||
| if idx >= len(t.blockIndex) { | |||
| return nil, errors.New("block out of index") | |||
| } | |||
|
|
|||
| var blkKey string | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you mean block key, but I think it may also be interpreted as black key as blk is also used as short hand for the color, I do not see any harm in blockKey as an identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes looks good. Got minor comments.
Reviewable status: 0 of 3 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)
options.go, line 93 at r2 (raw file):
maxBatchSize int64 // max batch size in bytes cache *ristretto.Cache
should we call this as blockCache as it is caching blocks of tables. For now we are just caching table blocks in future we can cache some other things.
table/table_test.go, line 758 at r2 (raw file):
n := int(5 * 1e6) var cache, _ = ristretto.NewCache(&ristretto.Config{
This change makes cache compulsory in benchmarks, we should have both with cache and without cache benchmarks.
table/table_test.go, line 792 at r2 (raw file):
var tables []*Table var cache, err = ristretto.NewCache(&ristretto.Config{
same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])
options.go, line 93 at r2 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
should we call this as
blockCacheas it is caching blocks of tables. For now we are just caching table blocks in future we can cache some other things.
Sure. Renamed it to blockCache.
table/table.go, line 354 at r1 (raw file):
Previously, pullrequest[bot] wrote…
I know you mean block key, but I think it may also be interpreted as black key as
blkis also used as short hand for the color, I do not see any harm inblockKeyas an identifier.
BlockKey could be ambiguous since we have keys in a block and the first key could be called blockKey. I've renamed it to cacheKey
table/table_test.go, line 891 at r1 (raw file):
Previously, pullrequest[bot] wrote…
This is likely classified as a nit; however I think it has readability implications.
ristretto.NewCacheaccepts a pointer value for its Config. There is limited (readability) value initializing the struct then immediately taking its pointer value. I think it is a two step process and should be written with that in mind for enhanced readability.
Yeah. Fixed this by moving the config to a global variable. We need the same config at multiple places.
table/table_test.go, line 758 at r2 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
This change makes cache compulsory in benchmarks, we should have both with cache and without cache benchmarks.
Discussed this with @ashish-goswami and we decided to the benchmarks as it is.
table/table_test.go, line 792 at r2 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
same here.
Discussed this with @ashish-goswami and we decided to the benchmarks as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Got few comments. Defer to Shekar and Ashish.
Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @pullrequest[bot])
table/table.go, line 357 at r3 (raw file):
if t.opt.Cache != nil { cacheKey = t.blockCacheKey(idx) cachedBlk, ok := t.opt.Cache.Get(cacheKey)
blk, ok :=
table/table.go, line 421 at r3 (raw file):
} if t.opt.Cache != nil { t.opt.Cache.Set(cacheKey, blk, 1)
Allow users to specify how many bytes to use for cache. And then use the cap of the block byte slice to determine the cost.
table/table.go, line 427 at r3 (raw file):
func (t *Table) blockCacheKey(idx int) string { return fmt.Sprintf("%d_%d", t.ID(), idx)
fmt.Sprintf considered harmful.
Make an uint64 or something cheap. Do not use strings. Also avoid the cost of hash.
(uint64(uint32(t.ID()))) << 32 | (uint64(idx) & 0xffffffff)
Add a check that t.ID() or block id does not exceed uint32 = 4GB - 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 7 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])
table/table.go, line 357 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
blk, ok :=
Done.
table/table.go, line 421 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Allow users to specify how many bytes to use for cache. And then use the
capof the block byte slice to determine the cost.
Done.
table/table.go, line 427 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
fmt.Sprintf considered harmful.
Make an uint64 or something cheap. Do not use strings. Also avoid the cost of hash.
(uint64(uint32(t.ID()))) << 32 | (uint64(idx) & 0xffffffff)Add a check that t.ID() or block id does not exceed uint32 = 4GB - 1.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 6 files reviewed, 5 unresolved discussions (waiting on @manishrjain and @pullrequest[bot])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 6 files reviewed, 7 unresolved discussions (waiting on @manishrjain and @pullrequest[bot])
db.go, line 283 at r5 (raw file):
config := ristretto.Config{ NumCounters: 5 * opt.MaxCacheSize,
Any value greater than 1 GB for maxCacheSize and numCounters greater than 5 * maxCacheSize causes OOM.
options.go, line 122 at r5 (raw file):
VerifyValueChecksum: false, Compression: options.ZSTD, MaxCacheSize: 1 << 30, // 1 GB
Any value greater than 1 GB for maxCacheSize and numCounters greater than 5 * maxCacheSize causes OOM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 10 files reviewed, 6 unresolved discussions (waiting on @jarifibrahim, @manishrjain, and @pullrequest[bot])
table/table_test.go, line 51 at r6 (raw file):
Compression: options.ZSTD, LoadingMode: options.LoadToRAM, BlockSize: 4 * 1024,
Should we set the block size to be the same as the block size of the underlying file system? Would that give any performance advantage in reading/writing from fs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes as I mentioned. Rest is good to go.
Reviewed 2 of 5 files at r4, 1 of 3 files at r5, 6 of 6 files at r6.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @jarifibrahim, @manishrjain, and @pullrequest[bot])
db.go, line 282 at r6 (raw file):
} config := ristretto.Config{
Please add a comment about changing block size.
db.go, line 283 at r6 (raw file):
config := ristretto.Config{ NumCounters: 10 * (opt.MaxCacheSize / int64(opt.BlockSize)),
opt.MaxCacheSize * 0.05 * 2
5% of the memory is being used for NumCounters. We can store 2 counters per byte.
db.go, line 284 at r6 (raw file):
config := ristretto.Config{ NumCounters: 10 * (opt.MaxCacheSize / int64(opt.BlockSize)), MaxCost: opt.MaxCacheSize,
Perhaps *0.95
db.go, line 286 at r6 (raw file):
MaxCost: opt.MaxCacheSize, BufferItems: 64, Metrics: false,
true. And add a way to get the metrics back.
table/table.go, line 158 at r6 (raw file):
func (b *block) size() int64 { return int64(3*int(unsafe.Sizeof(int(0))) /* Size of offset, entriesIndexStart and chkLen */ +
For unsafe.Sizeof(int(0))
So, make it a variable, so it can be calculated during runtime once. And use that for all the calls to size().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @manishrjain and @pullrequest[bot] from 3 discussions.
Reviewable status: 2 of 12 files reviewed, 6 unresolved discussions (waiting on @manishrjain and @shekarm)
db.go, line 282 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Please add a comment about changing block size.
Added a comment in options.go file
db.go, line 283 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
opt.MaxCacheSize * 0.05 * 25% of the memory is being used for NumCounters. We can store 2 counters per byte.
Done.
db.go, line 284 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Perhaps *0.95
Done.
db.go, line 286 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
true. And add a way to get the metrics back.
Enabling metrics caused race condition see hypermodeinc/ristretto#92 . I'll enable this once the issue with ristretto is resolved.
table/table.go, line 158 at r6 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
For
unsafe.Sizeof(int(0))So, make it a variable, so it can be calculated during runtime once. And use that for all the calls to size().
Added it to constants at the top. The unsafe.SizeOf(..) will be evaluated at compile time.
table/table_test.go, line 51 at r6 (raw file):
Previously, shekarm (Shekar Mantha) wrote…
Should we set the block size to be the same as the block size of the underlying file system? Would that give any performance advantage in reading/writing from fs?
Yes. as discussed @shekarm will figure out how to determine the block size at run time. This will be changed in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @manishrjain and @shekarm from 6 discussions.
Reviewable status: 2 of 12 files reviewed, all discussions resolved (waiting on @manishrjain)
Addressed all comments
This commit adds ristretto to badger. All the reads are done through the block cache. The block cache holds blocks in a decompressed and unencrypted state. The memory usage of cache can be changed by `MaxCacheSize` option. The default size of the cache is 1 GB.
This commit adds ristretto to badger. All the reads are done through the block cache. The block cache holds blocks in a decompressed and unencrypted state. The memory usage of cache can be changed by `MaxCacheSize` option. The default size of the cache is 1 GB.
This commit adds ristretto to badger. All the reads are done through the block cache. The block cache holds blocks in a decompressed and unencrypted state. The memory usage of cache can be changed by `MaxCacheSize` option. The default size of the cache is 1 GB.
This commit adds ristretto to badger. All the reads are done through the block cache. The block cache holds blocks in a decompressed and unencrypted state. The memory usage of cache can be changed by `MaxCacheSize` option. The default size of the cache is 1 GB.
This commit adds ristretto to badger. All the reads are done through the block cache. The block cache holds blocks in a decompressed and unencrypted state. The memory usage of cache can be changed by `MaxCacheSize` option. The default size of the cache is 1 GB. Co-authored-by: Ibrahim Jarif <jarifibrahim@gmail.com>
This PR adds a block-level cache to badger.
This change is