Reduce amount of locking in caches #53
Comments
If by this you mean that each cache entry would also be duplicated, I don't think that's a good idea. It would pollute the shared CPU cache with a bunch of repeated data. This may not show up in a microbenchmark, but it certainly matters for real workloads.

I imagine the best architecture would be a multi-level cache system based on the master + proxy cache system you described. A worker would first check the local proxy cache for an item; if it didn't find it, it would then check the master cache. If it found the item there, it would pull it into the local cache. Otherwise, it would perform the request and write the result into both the master and local caches. This means there wouldn't have to be any complicated mechanism for notifying workers of a new cache entry, and workers wouldn't do extra work adding unused entries.

The master hash table could also be replaced by something that allows multiple readers (and maybe writers). Something like this table allows multiple wait-free readers but would require writer locking. A structure like a hash trie would allow multiple readers and writers with very good scaling and latency properties, but would have slower reads than the single-reader hash table and isn't technically O(1) on reads (although I believe reads are still wait-free).

I'll look into that this week. Using the same locking global cache for a multi-level architecture shouldn't be too hard, and I'll do some benchmarking to see whether a lock-free global cache would have a serious benefit.
Ah, no, I meant a proxy struct per item per thread that requested that item:
Anyway, I appreciate this work. Please keep me posted! (Funny you mention a "hash trie", as I'm working on a data structure with that name, although it seems different from the linked article. It's not for concurrent use, as I intend to use it for URL routing purposes.) |
Investigate how feasible it would be to reduce the amount of locking in the caching infrastructure.
One possible way of doing this is, in addition to the current cache infrastructure, to have a hash table per CPU, each containing proxy entries for the master hash table. Each proxy entry holds one reference to the master entry (updated with atomic operations) and has its own reference count (which does not require atomic operations to update). Lookups in these hash tables also require no locking. Items could be removed from these tables as soon as the last proxy reference is dropped (at which time the reference to the master hash table entry is also dropped).
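One possible shape for these proxy entries is sketched below, using C11 atomics. The names are hypothetical, not the project's actual entry layout; the point is that the master entry's reference count is shared across CPUs and so needs atomic updates, while the proxy's own count is touched by only one CPU and can be a plain integer:

```c
/* Hypothetical proxy-entry sketch for the per-CPU tables described
 * above. Only the master refcount is atomic; the proxy refcount is
 * local to one CPU/thread, so plain arithmetic suffices. */
#include <stdatomic.h>
#include <stdlib.h>

struct master_entry {
    atomic_int refcount;         /* shared across CPUs: atomic updates */
    void *value;
};

struct proxy_entry {
    int refcount;                /* per-CPU: no atomics needed */
    struct master_entry *master; /* holds one reference on the master */
};

/* Creating a proxy takes exactly one atomic reference on the master. */
static struct proxy_entry *proxy_create(struct master_entry *m)
{
    struct proxy_entry *p = malloc(sizeof(*p));

    atomic_fetch_add(&m->refcount, 1);
    p->master = m;
    p->refcount = 1;
    return p;
}

/* Local lookups just bump the cheap, non-atomic count. */
static void proxy_ref(struct proxy_entry *p)
{
    p->refcount++;
}

/* When the last proxy reference drops, the single master reference is
 * released too, at which point the proxy entry can be removed from the
 * per-CPU table. */
static void proxy_unref(struct proxy_entry *p)
{
    if (--p->refcount == 0) {
        atomic_fetch_sub(&p->master->refcount, 1);
        free(p);
    }
}
```

However many times a CPU looks an item up, it costs the master entry only one atomic increment (at proxy creation) and one atomic decrement (when the last local reference goes away).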
It should be possible to implement this without changing the API.
This sacrifices some memory but potentially beats replicating the whole cache per CPU.