
Revert "Add support for caching bloomfilters (#1204)" #1256


Closed
jarifibrahim wants to merge 1 commit from ibrahim/revert-bloomfilter-cache


jarifibrahim
Contributor

@jarifibrahim jarifibrahim commented Mar 12, 2020

This reverts commit 4676ca9.

Reading bloom filters from the cache leads to severe performance degradation.
Results of the read bench tool with a key size of 32 bytes and a value size of 128 bytes (note: the data is neither compressed nor encrypted):

1. With bloom filters in cache
	Average read speed		: 97.37 KB/s
	Total bytes read in 1 minute	: 3.7 MB

2. With bloom filters stored in memory (no cache)
	Average read speed		: 41.3833 MB/s
	Total bytes read in 1 minute	: 2.4 GB

The performance degradation is a result of the way ristretto handles item eviction from the cache. For instance, the following was the state of the cache after opening 65 tables:

badger 2020/03/12 18:01:41 INFO: All 65 tables opened in 1.034s

hit: 0              keys-added: 194     keys-evicted: 172              
miss: 130           keys-updated: 1     cost-added: 7911996649                
gets-kept: 0        gets-dropped: 0     cost-evicted: 6942514434 
sets-dropped: 0     sets-rejected: 0    gets-total: 130
hit-ratio: 0.00

The important metrics here are keys-added: 194 and keys-evicted: 172. Their difference means only 22 items remain in the cache, and those 22 include both bloom filters and SST blocks. As a result, most lookups have to read the bloom filter from the SST file, which is very expensive.
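The arithmetic behind that observation can be checked with a small sketch. The `cacheStats` struct and its method names below are hypothetical and are not ristretto's API; only the counter values are taken from the log above. It assumes that resident items can be approximated as keys added minus keys evicted, which is the reasoning used in this description.

```go
package main

import "fmt"

// cacheStats holds the counters quoted in the log above.
// The type and field names are illustrative only, not ristretto's API.
type cacheStats struct {
	hits, misses           uint64
	keysAdded, keysEvicted uint64
	costAdded, costEvicted uint64
}

// residentKeys approximates how many keys are still in the cache.
func (s cacheStats) residentKeys() uint64 { return s.keysAdded - s.keysEvicted }

// residentCost approximates the total cost still held by the cache.
func (s cacheStats) residentCost() uint64 { return s.costAdded - s.costEvicted }

// hitRatio returns hits / (hits + misses), or 0 when there were no lookups.
func (s cacheStats) hitRatio() float64 {
	total := s.hits + s.misses
	if total == 0 {
		return 0
	}
	return float64(s.hits) / float64(total)
}

func main() {
	s := cacheStats{
		hits: 0, misses: 130,
		keysAdded: 194, keysEvicted: 172,
		costAdded: 7911996649, costEvicted: 6942514434,
	}
	fmt.Println("resident keys:", s.residentKeys()) // 22
	fmt.Println("resident cost:", s.residentCost()) // 969482215
	fmt.Printf("hit ratio: %.2f\n", s.hitRatio())   // 0.00
}
```

With 65 tables open and only 22 resident entries shared between bloom filters and blocks, at most a fraction of the tables can have their bloom filter cached at any moment.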

Fixes #1255, #1254 and #1248



@jarifibrahim jarifibrahim requested a review from a team March 12, 2020 12:44
jarifibrahim pushed a commit that referenced this pull request Mar 16, 2020
#1256 showed that a single
cache might not be enough to store both the data blocks and the bloom
filters.
This commit adds a separate cache for the bloom filters. It
also adds a new flag, `LoadBloomsOnOpen`, which determines
whether the bloom filters are loaded when a table is opened.

The default value of `MaxBfCacheSize` is zero, and
`LoadBloomsOnOpen` defaults to true.

This change significantly improves read speeds:
with a single cache, bloom filters would be evicted and then
have to be read back from disk.
@jarifibrahim
Contributor Author

This PR is no longer necessary. The issue has been fixed by #1260.

@jarifibrahim jarifibrahim deleted the ibrahim/revert-bloomfilter-cache branch March 17, 2020 11:42
jarifibrahim pushed a commit that referenced this pull request Mar 24, 2020
(cherry picked from commit eaf64c0)
manishrjain pushed a commit to outcaste-io/outserv that referenced this pull request Jul 6, 2022
Successfully merging this pull request may close these issues.

v2 much slower then v1.6.0