If all hash values of five tail items are zero on the specified slab class
Details of the Problem
Version: 1.4.19 to 1.4.25
Steps to Reproduce:
This is pretty fun. May I ask if you actually ran into this in production, or if it was something you found in an audit?
Not sure if this patch is enough, since cur_hv can be 0 in the same way as the other items: this could reintroduce a corruption issue if the item pulls itself from the tail (in the case of do_item_replace and friends).
Or am I missing something?
Thank you for your reply.
A slab OOM suddenly happened in our production environment after running memcached for a month. Then I investigated it with gdb.
This patch itself doesn't cause corruption because
Before that, I'd like to confirm the reason of
First, seems like your fix won't... fix it:
The line in question:
item_trylock() uses the same value that's compared against cur_hv, except it lops off most of the bits, since the trylock table is pretty small. It's much more likely you'd have 5 items in the tail (and then a current item) that evaluate to the same lock entry... which would make it unable to evict. Maybe that doesn't happen as often or something.
Second, the introduction of that line wasn't in f7bf26c; that was a re-introduction after I'd accidentally dropped it. It actually got added in fd39b19, which was part of a fix started in 59bd02c, with a test in cb120c0.
The full details of that bug are here: https://code.google.com/archive/p/memcached/issues/260 and here: #67 (in the comments; ignore the patch).
It's pretty complicated, but refreshing from this: it looks like there are (or were?) some cases where an item's lock is released but references are still held, and a call to item_alloc() happens. That means the lock test will succeed, but the item currently being worked on could get pulled from the tail, "evicted", and handed back, which then causes corruption as the original item gets freed.
Way before that it looks like the cur_hv != hv was added as an optimization... but I feel stupid now, because fd39b19 just looks like it makes the logic more clear to read, but the lines are identical? Or am I unable to parse code today?
Thanks for the information about
Let me clarify the cases which lead to memory corruption and the eviction getting stuck.
Without this patch:
With this patch:
I believe that's correct, but in all of my extensive notes on the bug I never expressly listed the code paths which can do that.
I've piled in enough small fixes into the 'next' tree for a release, so I'm going to try to wrap up the feature I was working on to go along with them. Please feel free to keep inspecting this problem and I'll keep checking/responding, but I won't be directly digging into it again for a little bit.
I'm still in the process of inspecting the problem, and found that commit fd39b19 is essential to fixing 'issue 260'. These lines are not identical, and are the key to avoiding
'Issue 260' is caused by the race of
To see the race, just incr after
```diff
diff --git a/memcached.h b/memcached.h
index 8a51352..4401ca6 100644
--- a/memcached.h
+++ b/memcached.h
@@ -76,7 +76,7 @@
 /** How long an object can reasonably be assumed to be locked before
     harvesting it on a low memory condition. */
-#define TAIL_REPAIR_TIME (3 * 3600)
+#define TAIL_REPAIR_TIME 3
 
 /* warning: don't use these macros with a function, as it evals its arg twice */
 #define ITEM_get_cas(i) (((i)->it_flags & ITEM_CAS) ? \
```
```sh
printf "set abc 0 0 1\r\n1\r\n" | nc localhost 11211
# wait TAIL_REPAIR_TIME
sleep 5
printf "incr abc 2\r\n" | nc localhost 11211
echo "stats items" | nc localhost 11211
echo "stats cachedump 1 5" | nc localhost 11211
```
I think my patch will not cause the race, because it always tries to lock before pulling itself in
After inspecting the code and past fixes around item refcounts, I've come to think that the best fix for the problem is to just remove
```c
if ((hold_lock = item_trylock(hv)) == NULL)
```
If there were some cases where an item's lock is released but references are still held, and a call to
Sorry for the delay (1.4.26 went out!).
Just re-read things again: The cur_hv is only ever passed in from do_store_item() with an append/prepend command, or in do_add_delta(). In all other cases it's zero, as thread.c's item_alloc() wrapper passes in zero there.
The problem I was probably trying to avoid was a deadlock: the thread that owns a lock trying to lock it again... but nothing actually prevents that from happening already. This would be super dangerous if something holding an item lock called another item_lock(), as it would just deadlock. The codebase isn't very clear about this being a problem.
Thankfully since it's just a trylock() there, it'll return EBUSY.
So, I think that check can just be removed. It could be fixed by changing it to a pointer that's passed in, then:
I still think, given the original bug report, that if 5 items in the tail share the same item_lock() value (which masks off most of the full hv), you'd occasionally hit this. Holding an item lock while calling item_alloc() only seems to happen in append/prepend and do_add_delta, as I noted above, though.
So we can start by removing the check... feel free to push a new change into this PR (replace the old one please), or I'll get to it at some point.
You'd still be stuck because only item_alloc() can call lru_pull_tail() with do_evict == true. The background thread can't evict items... but it will expire them, if they're sitting in the bottom and expired.
Please don't worry about it. I like the live logging feature, awesome work!
Yes, the range of masked hv used in
OK, I'll soon re-push the commit.
If all hash values of the five tail items are zero on the specified slab class, the expire check is unintentionally skipped and the items stay without being evicted. Consequently, new item allocations consume memory every time an item is set, which leads to slab OOM errors.
Thanks! This won't quite make it into 1.4.27, but it'll go into the next one. The release after .27 will require burn-in testing, so it's a good fit for that one.
Tests would end up being flaky anytime the hash value changes, etc. I'll put it under the burn-in test to be absolutely sure there wasn't something I forgot.