Commits on Aug 8, 2015
Commits on Apr 25, 2015
  1. stop clang from whining about asserts

    dormando committed Apr 25, 2015
    We now use clsid values up to exactly 255, the maximum value of an unsigned
    byte, so the assertion can't fail.
  2. relax timing glitch in the lru maintainer test

    dormando committed Apr 25, 2015
    This test requires that the juggler thread run at least once before the
    stats check happens. I've tried running this on an rPi1 and can't reproduce
    the race, but for some reason Solaris amd64 does. This is likely due to
    usleep not working as expected.
    
    Unfortunately I don't have direct access to a Solaris host, so this is the
    best I can do for now. The juggler does eventually wake up, so I'm unconcerned.
Commits on Apr 24, 2015
  1. fix major off by one issue

    dormando committed Apr 24, 2015
    none of my machines could repro a crash, but it's definitely wrong :/ Very
    sad.
Commits on Apr 20, 2015
  1. don't overwrite stack during slab_automove

    dormando committed Apr 20, 2015
    every time slab_automove would run it would segfault immediately, since the
    call out into items.c would overwrite its stack.
  2. fix off-by-one with slab management

    dormando committed Apr 20, 2015
    Data that landed in the highest slab class was unallocated. Thanks to pyry
    for the repro case:
    
    perl -e 'use Cache::Memcached;$memd = new Cache::Memcached {
    servers=>["127.0.0.1:11212"]};for(20..1000){print "$_\n";$memd->set("fo2$_",
    "a"x1024)};'
    (run in a loop)
    against a server started with:
    ./memcached -v -m 32 -p 11212 -f 1.012
    
    This serves as a note to turn this into a test.
Commits on Feb 13, 2015
  1. Make LRU crawler work from maint thread.

    dormando committed Feb 13, 2015
    Wasn't sending the condition signal after a refactor :(
    
    Also adds some stats to inspect how much work the LRU crawler is doing, and
    removes some printf noise for the LRU maintainer.
Commits on Feb 7, 2015
  1. basic lock around hash_items counter

    dormando committed Feb 7, 2015
    could/should be an atomic. Previously all write mutations were wrapped with
    cache_lock, but that's not the case anymore. Just enforce consistency around
    the hash_items counter, which is used for hash table expansion.
  2. fix crawler/maintainer threads starting with -d

    dormando committed Feb 7, 2015
    The fork is racy, and the LRU crawler or maintainer threads end up not
    running after daemonization. So we now start them post-fork.
    
    Thanks pyry for the report!
Commits on Jan 10, 2015
  1. spinlocks never seem to help in benchmarks

    dormando committed Jan 10, 2015
    If a thread is allowed to go to sleep, it can be woken up early as soon as the
    lock is freed. If we spinlock, the scheduler can't help us and threads will
    randomly run out their timeslice until the thread actually holding the lock
    finishes its work.
    
    In my benchmarks killing the spinlock only makes things better.
  2. small crawler refactor

    dormando committed Jan 10, 2015
    Separates the start function from what was the string parsing, and allows
    passing in the 'remaining' value as an argument.
    
    Also adds a (not yet configurable) setting for how many crawls to run per
    sleep, to raise the default aggressiveness of the crawler.
  3. update some comments

    dormando committed Jan 10, 2015
    started to drift from reality over the patch series.
Commits on Jan 9, 2015
  1. fix refhang test.

    dormando committed Jan 9, 2015
    The new code is a lot more efficient at unblocking LRUs, as it's able to
    unlink refcounted items. However, it's less aggressive in these cases. You'll
    get one OOM per stuck item, and then it'll be gone in most cases.
    
    Removed the bottom half of the test since it's too flaky, and the above case
    now looks for both OOM's and STORED's plus relevant counters.
  2. add `-o expirezero_does_not_evict` feature

    dormando committed Jan 9, 2015
    When enabled, items with an expiration time of 0 are placed into a separate
    LRU and are not subject to evictions. This allows a mixed-mode instance where
    you can have a stronger "guarantee" (not a real guarantee) that items aren't
    removed from the cache due to low memory.
    
    This is a dangerous option, as mixing unevictable items has obvious
    repercussions.
  3. make HOT/WARM ratios start-time tunable.

    dormando committed Jan 9, 2015
    Making them runtime tunable is difficult and may require either atomics or an
    extra array in items.c. Adjusting the value would mean rolling through and
    locking each LRU before changing the value.
Commits on Jan 8, 2015
  1. basic LRU maintainer tests.

    dormando committed Jan 8, 2015
    This did actually discover the bug in the previous commit.
  2. fix bitshifting transposition

    dormando committed Jan 8, 2015
    Fuck me this is embarrassing. I got it right once, then flipped them
    everywhere else. This is why you use defines for everything. :(
  3. cap aggressiveness of LRU maintainer

    dormando committed Jan 8, 2015
    We can revisit this, but I don't know how many use cases have typical set
    loads above 1M items/sec.
  4. another lock fix for slab mover

    dormando committed Jan 8, 2015
    Wasn't holding LRU locks while unlinking an item. The options were either to
    never hold the slabs lock underneath the LRU locks, which is doable but
    annoying, or to drop the slabs lock for the unlink step. It's not very clear,
    but I think it's safe.
Commits on Jan 7, 2015
  1. compat mode.

    dormando committed Jan 7, 2015
    Enabling the new LRU routine requires starting with `-o lru_maintainer`.
    
    This makes almost all of the tests pass, except refhang.t. Will need new tests
    for the LRU maintainer.
    
    So far as I can tell it's still handling the refhang scenario, but in a more
    natural way. Instead of flipping the items back to the top of the list, it's
    unlinking them from the hash table and LRU. This completely removes them from
    the problem, but it doesn't retry as many times to get them out of the way.
    
    A system with many stuck items next to each other could hit a handful of OOMs
    before clearing the backlog, but it won't keep running into them. The test
    appears flaky even in 1.4.22; running with -vv causes it to fail in a funny
    way.
  2. make slab mover lock safe again.

    dormando committed Jan 7, 2015
    Given mutex_locks act as memory barriers this should work.
    
    This does not yet fix being able to eject hot items from the fetch path.
  3. LRU maintainer thread now fires LRU crawler

    dormando committed Jan 7, 2015
    ... if available. Very simple starter heuristic for how often to run the
    crawler.
    
    At this point, this patch series should have a significant impact on hit
    ratio.
Commits on Jan 5, 2015
  1. simple fix for LRU crawler

    dormando committed Jan 5, 2015
    It ends up crawling the three sub-LRUs in parallel, but that's fine.
  2. fix a few bugs and add more stats

    dormando committed Jan 5, 2015
    Wasn't passing total_chunks into the background thread anymore, which caused
    all items to flow to COLD.
    
    Also re-adds the ability to see HOT/WARM/COLD counts. NOEXP is missing until
    that's implemented.
  3. fix itemstats to be combination of sub LRUs

    dormando committed Jan 5, 2015
    Easier to reason about, and more tests pass.
  4. direct reclaim mode for evictions

    dormando committed Jan 5, 2015
    The only way to make the eviction case fast enough is to inline it, sadly.
    This finally deletes the old item_alloc code, now that I no longer intend to
    reuse it.
    
    Also removes the condition wakeup for the background thread. Instead, it runs
    on a timer and meters its aggressiveness by how much shuffling is going on.
    
    Also fixes a segfault in lru_pull_tail(), which was unlinking `it` instead of
    `search`.
  5. reorg juggle routine, replace prints with stats

    dormando committed Jan 5, 2015
    code is clearer, and able to react a bit faster to required evictions.
Commits on Jan 4, 2015
  1. first pass at LRU maintainer thread

    dormando committed Jan 4, 2015
    The basics work, but tests still do not pass.
    
    A background thread wakes up once per second, or when signaled. It is signaled
    if a slab class gets an allocation request and has fewer than N chunks free.
    
    The background thread shuffles LRU's: HOT, WARM, COLD. HOT is where new items
    exist. HOT and WARM flow into COLD. Active items in COLD flow back to WARM.
    Evictions are pulled from COLD.
    
    item_update calls no longer do anything (and need to be fixed to tick
    it->time). Items are reshuffled within or around LRUs as they reach the bottom.
    
    Ratios of HOT/WARM memory are hardcoded, as are the low/high watermarks.
    The thread is not fast enough right now; sets cannot block on it.
Commits on Jan 3, 2015
  1. Beginning work for LRU rework

    dormando committed Jan 3, 2015
    Primarily splits cache_lock into a per-LRU lock, and makes the
    it->slab_clsid lookup indirect. cache_lock is now more or less gone.
    
    Stats are still wrong. They need to internally summarize over each
    sub-class.
Commits on Jan 2, 2015
  1. small fix for flush_all test

    dormando committed Jan 2, 2015
  2. leave comment about stats cachedump locks

    dormando committed Jan 2, 2015
    It's safe based on a technicality, which may not stay true for long.
    
    The same was true for stats sizes.
  3. flush_all was not thread safe.

    dormando committed Jan 2, 2015
    Unfortunately if you disable CAS, all items set in the same second as a
    flush_all will immediately expire. This is the old (2006ish) behavior.
    
    However, if CAS is enabled (as is the default), it will still be more or less
    exact.
    
    The locking issue is that if the LRU lock is held, you may not be able to
    modify an item if the item lock is also held. This means that some items may
    not be flushed if locking is done correctly.
    
    In the current code, it could lead to corruption as an item could be locked
    and in use while the expunging is happening.
Commits on Jan 1, 2015
  1. cache_lock refactoring

    dormando committed Dec 29, 2014
    item_lock() now protects accesses to item structures. cache_lock is just for
    LRU and LRU stats. This patch removes cache_lock from a number of places it's
    no longer needed.
    
    Some pre-existing bugs became obvious: flush_all, cachedump, and slab
    reassignment's do_item_get short-circuit all need repairs.