-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evict cache items more efficiently #5172
Conversation
When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in git_oidmap_iterate trying to find items to evict. Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in git_oidmap_iterate. Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in odb_read_1 and its children, and 44.3% of the time in git_oidmap_iterate. With this change, the same operation now takes 14 minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total time, whereas git_oidmap_iterate consists of 6.2%. Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by git_delta_create_from_index (33.7%), which is what we would expect.
Before profile chart: pprof1.pdf Note that |
@@ -115,9 +115,12 @@ void git_cache_dispose(git_cache *cache) | |||
/* Called with lock */ | |||
static void cache_evict_entries(git_cache *cache) | |||
{ | |||
size_t evict_count = 8, i; | |||
size_t evict_count = git_cache_size(cache) / 2048, i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes a lot of sense to me. Did you experiment with different divisors here? Just asking out of curiosity as I wonder how you arrived at 2048
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I first started off with 16384 and then went down to 8192. With 8192, we were still taking 20% of pack time in git_oidmap_iterate
. With 2048, we're spending 6.2% there, which is much lower.
I didn't want to go down much more because each eviction can start to get expensive with a 256 MB cache, which means that if we use a medium-sized repository where we only end up evicting once or twice, eviction costs may be much more significant. While I want to improve performance for repositories that are Linux-sized or larger, I also don't want to regress it for smaller repositories.
We also have to consider that as we evict more entries, we do more re-reads from the actual object database, since some of the items we had cached are now gone. 2048 is still an improvement in this area, since total time in odb_read_1
and its children are down. It's possible that 1024 would also work, but I didn't test it.
I picked a power of two because it's cheaper to compute in a hot path and the exact number doesn't matter too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks a lot for explaining!
When our object cache is full, we pick eight items (or the whole cache, if there are fewer) and evict them. For small cache sizes, this is fine, but when we're dealing with a large number of objects, we can repeatedly exhaust the cache and spend a large amount of time in
git_oidmap_iterate
trying to find items to evict.Instead, let's assume that if the cache gets full, we have a large number of objects that we're handling, and be more aggressive about evicting items. Let's remove one item for every 2048 items, but not less than 8. This causes us to scale our evictions in proportion to the size of the cache and significantly reduces the time we spend in
git_oidmap_iterate
.Before this change, a full pack of all the non-blob objects in the Linux repository took in excess of 30 minutes and spent 62.3% of total runtime in
odb_read_1
and its children, and 44.3% of the time ingit_oidmap_iterate
. With this change, the same operation now takes 14 minutes and 44 seconds, andodb_read_1
accounts for only 35.9% of total time, whereasgit_oidmap_iterate
consists of 6.2%.Note that we do spend a little more time inflating objects and a decent amount more time in memcmp. However, overall, the time taken is significantly improved, and time in pack building is now dominated by
git_delta_create_from_index
(33.7%), which is what we would expect.