Add a 'strong cas' feature, developed from the 'lease' mechanism mentioned in Facebook's paper #65

Closed

Conversation

@zwChan (Contributor) commented Apr 16, 2014

I'm working on a trading system, and stale data is unacceptable most of the time. However, the high throughput makes it impossible to fetch all data directly from MySQL, so I want to make memcached more reliable when it is used as a cache. Facebook's paper "Scaling Memcache at Facebook" mentions two mechanisms, 'lease' and 'mcsqueal', but mcsqueal is difficult for my case, because it is hard to derive the cache key on the MySQL side.
Adding the 'strong cas' feature is meant to solve the following typical problem: clients A and B both want to update the same key, and A (set key=>1) updates the database before B (set key=>2):
key not in cache: (A get-miss) -> (B get-miss) -> (B set key=2) -> (A set key=1);
or key in cache: (A delete key) -> (B delete key) -> (B set key=2) -> (A set key=1);
Something is wrong! The database has key=2 but the cache has key=1.

This can happen in a highly concurrent system, and I haven't found a way to solve it with the current cas command. So I added two commands, 'getss' and 'deletess': they create a lease and return a cas-unique value, or tell the client that a lease already exists on the server. The client can then do something to prevent stale data, such as wait, or invalidate the previous lease.
I also think of the lease as a kind of 'dirty lock': anybody who tries to update the item replaces its expiration with the lease's expiration (which should be very short), so in the worst case (low probability) the stale data only exists in the cache for a short time. That is acceptable for most applications in my case.
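
A rough client-side sketch of how the intended flow might look (the helper names `getss`/`cas` and the return shape are illustrative only; the exact command syntax is described in doc/strongcas.txt):

```python
import time

def read_through(client, key, load_from_db, retries=3):
    """Hypothetical read-through flow using the proposed 'getss' command.

    Assumes client.getss(key) returns (value, cas_token, lease_granted):
    on a miss it creates a short-lived lease and hands back a cas token;
    if another client already holds the lease, lease_granted is False.
    """
    for _ in range(retries):
        value, cas_token, lease_granted = client.getss(key)
        if value is not None:
            return value                       # cache hit
        if lease_granted:
            value = load_from_db(key)          # we own the lease: refill
            client.cas(key, value, cas_token)  # only wins if the lease is still valid
            return value
        time.sleep(0.05)                       # someone else is refilling; wait
    return load_from_db(key)                   # give up on the cache this round
```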

For more details, please read doc/strongcas.txt. Hoping for your suggestions!


new file:   doc/strongcas.txt
modified:   items.c
modified:   memcached.c
modified:   memcached.h
modified:   slabs.c
modified:   t/binary.t
modified:   t/lib/MemcachedTest.pm
modified:   t/stats.t
new file:   t/strongcas.t
modified:   thread.c

zwChan and others added 2 commits April 16, 2014 17:34
Strong cas:
---------------------
    It is intended to prevent stale data from being written to the cache in a
multi-client environment, and also to prevent the thundering-herd phenomenon.
----------------------
  For more details, please read doc/strongcas.txt. ~_~

	new file:   doc/strongcas.txt
	modified:   items.c
	modified:   memcached.c
	modified:   memcached.h
	modified:   slabs.c
	modified:   t/binary.t
	modified:   t/lib/MemcachedTest.pm
	modified:   t/stats.t
	new file:   t/strongcas.t
	modified:   thread.c
Conflicts:
	memcached.c
	memcached.h
	t/binary.t
	t/stats.t

	modified:   Makefile.am
	modified:   NEWS
	modified:   assoc.c
	modified:   configure.ac
	modified:   doc/CONTRIBUTORS
	modified:   doc/memcached.1
	modified:   doc/protocol.txt
	modified:   hash.c
	modified:   hash.h
	modified:   items.c
	modified:   items.h
	renamed:    hash.c -> jenkins_hash.c
	new file:   jenkins_hash.h
	modified:   memcached.c
	modified:   memcached.h
	modified:   memcached.spec.in
	new file:   murmur3_hash.c
	new file:   murmur3_hash.h
	modified:   slabs.c
	modified:   stats.c
	modified:   t/binary.t
	new file:   t/issue_260.t
	new file:   t/lru-crawler.t
	modified:   t/stats.t
	modified:   thread.c
@jstasiak commented Nov 4, 2015

As it happens, my company could also use this feature. I'm surprised that, more than a year after this change was proposed, there are no comments (whether positive or not).

I can't speak for the code but the description sounds sensible to me.

Pinging @dormando (sorry if you're the wrong person; I see you're the most active committer right now). What's needed to get this merged (apart from rebasing the changes, of course)?

@jstasiak commented Nov 4, 2015

Alternatively: @zwChan, maybe you found a memcached alternative that works for your application; if so, I'd be happy to hear about it.

@dormando (Member) commented

It's clear that some kind of lease support will be necessary, but getting this merged is a lot of work. It's doing a lot of fiddling with resources and I have some better ideas of how to get leases into the protocol if we're adding new commands anyway.

Sorry for not commenting on it earlier. I believe it was a mailing list thread which referred to the PR.

@deepub commented Jan 24, 2017

Hi @dormando ,

It seems the good folks at Facebook have a custom fork that provides leases. Is there any prioritization around this feature? At the end of the day, we basically need a way to provide CAS operations for data consistency in a distributed memcached world; plain CAS only works for point-to-point operations.

Would it be possible to leverage Facebook's implementation for leases?

@dormando (Member) commented

The good folks at Facebook had no interest in upstreaming for a long time, and there wasn't much demand for this, so it was a lower priority.

That said, there's a chance it'll get implemented in the main tree relatively soon.

I'm not sure what you mean by leases vs point to point CAS though. Are you using binary protocol CAS?

In the binary protocol (unless time has wiped enough of my memory), CAS is fully integrated into the protocol. This means you call "add keythatgotexpired" with a TTL of 60s, and the response contains the CAS value of that item. That is effectively the lease you then use to SET again. Binprot can SET with a CAS value and will error if it doesn't match.
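
A minimal sketch of that binprot pattern, assuming a hypothetical client that exposes the CAS value returned by ADD (real client libraries differ in whether they surface it):

```python
# Hypothetical client helpers: get_with_cas / add_with_cas / set_with_cas.

PLACEHOLDER = b""      # marker stored by the "leasing" ADD
LEASE_TTL = 60         # seconds, matching the example above

def get_or_refill(client, key, recompute):
    value, cas = client.get_with_cas(key)
    if value is not None and value != PLACEHOLDER:
        return value                                   # plain hit

    # Miss: claim the "lease" by ADDing a placeholder; in binprot the
    # response to ADD carries the CAS of the item it just stored.
    stored, cas = client.add_with_cas(key, PLACEHOLDER, expire=LEASE_TTL)
    if not stored:
        # Someone else ADDed first; use (or wait for) their value instead.
        value, _ = client.get_with_cas(key)
        return value

    value = recompute(key)
    # SET with the CAS from the ADD: errors out if anything else touched
    # the key in between, which gives the exclusive-SET behaviour described.
    client.set_with_cas(key, value, cas)
    return value
```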

So far as I understand, Facebook's leases (according to the paper) are pretty similar. There are a few differences, I think:

  1. It collapses a roundtrip: "get and get a lease token back" vs "get, miss, then run an ADD to get a CAS token to use for a later exclusive SET"
  2. It factors in deletes and stale values. This is a loosening of consistency, which isn't strictly what you're asking about.

It's a little sad because we came pretty close with binprot. The INCR/DECR commands can seed a value if it doesn't exist, which removes the roundtrip cycle of running an INCR, missing, running an ADD, and (if that errors) running INCR again. If there were a flag telling GET to ADD a default value and return the CAS on a first miss, that would be a lot closer (or, maybe more simply, an ADD with an option to return the key's data on a hit).
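
For reference, a sketch of that roundtrip cycle as it looks without seeding, assuming a pymemcache-style client (with seeded INCR the whole fallback branch below disappears):

```python
from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))

def bump_counter(key, ttl=60):
    """The incr / miss / add / (retry incr) dance described above."""
    new_value = client.incr(key, 1, noreply=False)
    if new_value is not None:
        return new_value                         # counter existed: one roundtrip

    # Miss: try to seed the counter ourselves.
    if client.add(key, "1", expire=ttl, noreply=False):
        return 1                                 # our ADD won

    # Another client's ADD won the race, so INCR again.
    return client.incr(key, 1, noreply=False)
```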

Still wouldn't handle stale values though :)

Anyway, TL;DR: I'm looking into it soon.

@deepub commented Jan 25, 2017

I should've given you more context initially; sorry about that. I am using mcrouter with multiple memcached servers in a sharded, distributed, replicated pool setup. As a result, we are restricted to the memcached ASCII protocol rather than the binary protocol (an mcrouter restriction).

When I mentioned CAS operations restricting us to point-to-point operations, I meant that CAS works great as long as the call goes from a client to a single memcached server. Consider a scenario where we have a memcached server A and a second replica memcached server B. A client makes a gets() call, is routed to server A, and gets a CAS token, T1. A subsequent cas() call with token T1 then gets routed to server B (server A could have gone down, or a flappy network switch).

At this point, the CAS token may be different on server B, resulting in a failure.

Thanks a lot for your detailed explanation.

@dormando (Member) commented

That protocol thing is unfortunate.

The lease tokens are still created on the server itself; I think handling the failover case properly would be an mcrouter feature. It should still (theoretically) be able to do it with CAS in the replicated case.

But, again, moot due to protocol. I'll have to look to see how it handles leases during replication/failover.

@adsr commented Sep 6, 2017

Gentle bump. We'd find this feature very useful as well.

@slawomir-pryczek commented

Interesting...

  1. I'm thinking that the stampede/herding issue for a key K could be mitigated just by issuing an atomic ADD of PREFIX+K once half of K's TTL has passed. If adding this "can re-generate" key succeeds, we can proceed to regenerate, so only one process at a time regenerates the value. Easy to do with current memcached (see the sketch after this list).

  2. For the concurrent-insert issue, it can easily be handled with CAS plus versioning. You can store the item's version at the beginning of the key's value, then use CAS to swap the item out ONLY if its version is <= the new item's version (also sketched after this list).
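
A rough sketch of both workarounds on stock memcached, assuming a pymemcache-style client (the key prefix, TTLs, and the "<version>|<payload>" encoding are made up for illustration):

```python
from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))

# (1) Stampede control: only the process whose atomic ADD succeeds is
# allowed to regenerate `key`; everyone else keeps serving the old value.
def try_acquire_regen_lock(key, lock_ttl=5):
    return client.add("regen:" + key, "1", expire=lock_ttl, noreply=False)

# (2) Versioned CAS: the version is stored at the front of the value
# ("<version>|<payload>") and the item is only swapped out if the new
# version is >= the cached one.
def set_if_newer(key, new_version, new_value):
    """new_value is bytes; returns True if the cache was updated."""
    payload = b"%d|%s" % (new_version, new_value)
    current, cas_token = client.gets(key)
    if current is None:
        # Nothing cached yet; ADD keeps this atomic against other writers.
        return client.add(key, payload, noreply=False)
    cached_version = int(current.split(b"|", 1)[0])
    if cached_version > new_version:
        return False                    # cache already holds something newer
    return bool(client.cas(key, payload, cas_token, noreply=False))
```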

I think FB's lease is mainly about limiting network traffic (it can already be done, you just need data to be sent back and forth). I implemented both of these (versions and getting the TTL) on top of normal memcached by appending them to the keys' values, and it worked great at close to 1 billion requests daily.

I also wrote a cache server (inspired by memcached) that supports a mechanism for mitigating stampedes...

E.g. we have key X=abc with TTL=10, and multiple threads reading X's value. If a thread sees that TTL < 5:

  • add an internal key "regenerating-X" with TTL 5
  • if the add succeeded, immediately increase X's TTL back to 10 and return additional data to the client (including the CAS), indicating it is responsible for generating the new value.

Status/Op: [H]itHHHH <- HALF OF TTL -> [M]issHHH[S]etHHH <- HALF OF TTL -> MHHH....

Each item has a CAS which is updated after a change, so it's also ensured that if the item fails to be regenerated it will expire normally instead of being "revived" indefinitely.

For implementing (2) on the caching server, it would just require 2 atomic functions (sketched below):

  • get_version($key)
  • set_new_version($key, $value, $version) ... which sets the key only if the given version is higher than the current one
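
Roughly the intended semantics of that second call, as a sketch only (the real server would do the check-and-set atomically under the item lock):

```python
def get_version(store, key):
    item = store.get(key)
    return item["version"] if item is not None else None

def set_new_version(store, key, value, version):
    """Replace the item only if the offered version is newer."""
    item = store.get(key)
    if item is not None and item["version"] >= version:
        return False                    # cached copy is as new or newer
    store[key] = {"value": value, "version": version}
    return True
```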

Anyway, I could use some more testing and feedback on my caching server (it has sharding and built-in replication). If someone wants to try it, send me a message, and if you think you can use it in your project I can add version support ;) Currently the driver is PHP-only and the protocol is very easy to implement, but it is not compatible with memcached.

@dormando (Member) commented

Finally closing this, as #484 is going in. It's not exactly the same as this, but it covers a lot more use cases.

Better late than never, maybe?

@dormando dormando closed this Sep 30, 2019