moving server stuff down into its own directory

git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@257 b0b603af-a30f-0410-a34e-baf09ae79d0b
commit 1438d16b3a7ee4a327d83dffec2d55a6f6eed2c9 0 parents
bradfitz authored
2  AUTHORS
@@ -0,0 +1,2 @@
+Anatoly Vorobey <mellon@pobox.com>
+Brad Fitzpatrick <brad@danga.com>
37 BUILD
@@ -0,0 +1,37 @@
+Ideally, you want to make a static binary, otherwise the dynamic
+linker pollutes your address space with shared libs right in the
+middle. (NOTE: actually, this shouldn't matter so much anymore, now
+that we only allocate huge, fixed-size slabs)
+
+Make sure your libevent has epoll (Linux) or kqueue (BSD) support.
+Using poll or select only is slow, and works for testing, but
+shouldn't be used for high-traffic memcache installations.
+
+To build libevent with epoll on Linux, you need two things. First,
+you need /usr/include/sys/epoll.h . To get it, you can install the
+userspace epoll library, epoll-lib. The link to the latest version
+is buried inside
+http://www.xmailserver.org/linux-patches/nio-improve.html ; currently
+it's http://www.xmailserver.org/linux-patches/epoll-lib-0.9.tar.gz .
+If you're having any trouble building/installing it, you can just copy
+epoll.h from that tarball to /usr/include/sys as that's the only thing
+from there that libevent really needs.
+
+Secondly, you need to declare syscall numbers of epoll syscalls, so
+libevent can use them. Put these declarations somewhere
+inside <sys/epoll.h>:
+
+#define __NR_epoll_create 254
+#define __NR_epoll_ctl 255
+#define __NR_epoll_wait 256
+
+After this you should be able to build libevent with epoll support.
+Once you build/install libevent, you don't need <sys/epoll.h> to
+compile memcache or link it against libevent. Don't forget that for epoll
+support to actually work at runtime you need to use a kernel with epoll
+support patch applied, as explained in the README file.
+
+BSD users are luckier, and will get kqueue support by default.
+
+
+
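The BUILD notes say epoll support only works at runtime on a kernel that actually provides it. One quick way to confirm that on a given machine is to call `epoll_create()` and check for `ENOSYS`. This is a minimal sketch, assuming a Linux system whose C library ships `<sys/epoll.h>`. The helper name `kernel_has_epoll` is hypothetical and is not part of memcached or libevent.

```c
#include <errno.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/epoll.h>
#endif

/* Returns 1 if the running kernel services epoll_create(), 0 if the call
 * fails with ENOSYS (kernel built without epoll), and -1 on any other
 * failure or on non-Linux hosts, where epoll does not exist. */
static int kernel_has_epoll(void) {
#ifdef __linux__
    int fd = epoll_create(1);   /* size is only a hint, but must be > 0 */
    if (fd >= 0) {
        close(fd);              /* we only wanted to know the call works */
        return 1;
    }
    return errno == ENOSYS ? 0 : -1;
#else
    return -1;
#endif
}
```

If this returns 0, libevent would have to fall back to poll or select, which the notes above warn against for high-traffic installations.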
48 CONTRIBUTORS
@@ -0,0 +1,48 @@
+Brad Fitzpatrick <brad@danga.com>
+ -- design/protocol
+ -- Perl client
+ -- prototype Perl server
+ -- memory allocator design
+ -- small enhancements/changes to C server
+ -- website
+
+Anatoly Vorobey <mellon@pobox.com>
+ -- C server
+ -- memory allocator design
+ -- revised setuid code
+
+Evan Martin <martine@danga.com>
+ -- automake/autoconf support
+ -- Python client
+ -- portability work to build on OS X
+
+Ryan <hotrodder@rocketmail.com>
+ -- PHP client
+
+Jamie McCarthy <jamie@mccarthy.vg>
+ -- Perl client fixes: Makefile.PL, stats, doc updates
+
+Lisa Marie Seelye <lisa@gentoo.org>
+ -- packaging for Gentoo Linux
+ -- initial setuid code
+
+Sean Chittenden <seanc@FreeBSD.org>
+ -- packaging for FreeBSD
+
+Stuart Herbert <stuart@gentoo.org>
+ -- fix for: memcached's php client can run in an infinite loop
+ http://bugs.gentoo.org/show_bug.cgi?id=25385
+
+Brion Vibber <brion@pobox.com>
+ -- debugging abstraction in PHP client
+ -- debugging the failure of daemon mode on FreeBSD
+
+Brad Whitaker <whitaker@danga.com>
+ -- compression support for the Perl API
+
+Richard Russo <russor@msoe.edu>
+ -- Java API
+
+Ryan T. Dean <rtdean@cytherianage.net>
+ -- Second PHP client with correct parsing (based on Perl client)
+ -- autoconf fixes for mallinfo.arena on BSD (don't just check malloc.h)
30 COPYING
@@ -0,0 +1,30 @@
+Copyright (c) 2003, Danga Interactive, Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+ * Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+
+ * Neither the name of the Danga Interactive nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
197 ChangeLog
@@ -0,0 +1,197 @@
+2006-03-04
+ * avva: bucket/generation patch (old, but Brad's just finally
+ committing it)
+
+2006-01-01
+ * Brad Fitzpatrick <brad@danga.com>: allocate 1 slab per class
+ on start-up, to avoid confusing users with out-of-memory errors
+ later. this is 18 MB of allocation on start, unless max memory
+ allowed with -m is lower, in which case only the smaller slab
+ classes are allocated.
+
+2005-08-09
+ * Elizabeth Mattijsen <liz@dijkmat.nl>: needed a way to flush all
+ memcached backend servers, but not at exactly the same time (to
+ reduce load peaks), I've added some simple functionality to the
+ memcached protocol in the "flush_all" command that allows you to
+ specify a time at which the flush will actually occur (instead of
+ always at the moment the "flush_all" command is received).
+
+2005-05-25
+ * patch from Peter van Dijk <peter@nextgear.nl> to make
+ stderr unbuffered, for running under daemontools
+
+2005-04-04
+ * patch from Don MacAskill <don@smugmug.com> 'flush_all' doesn't
+ seem to work properly. Basically, if you try to add a key which
+ is present, but expired, the store fails but the old key is no
+ longer expired.
+
+ * release 1.1.12
+
+2005-01-14
+ * Date: Thu, 18 Nov 2004 15:25:59 -0600
+ From: David Phillips <electrum@gmail.com>
+ Here is a patch to configure.ac and Makefile.am to put the man page in
+ the correct location. Trying to install the man page from a
+ subdirectory results in the subdirectory being used in the install
+ path (it tries to install to doc/memcached.1). This is the correct
+ thing to do:
+
+ - create a Makefile.am in the doc directory that installs the man page
+ with man_MANS
+ - modify Makefile.am in the base directory to reference the doc
+ directory using SUBDIRS
+ - modify the AC_CONFIG_FILES macro in configure.ac to output the
+ Makefile in doc
+
+
+2005-01-14
+ * pidfile saving support from Lisa Seelye <lisa@gentoo.org>, sent
+ Jan 13, 2005
+
+2005-01-14
+ * don't delete libevent events that haven't been added (the deltimer)
+ patch from Ted Schundler <tschundler@gmail.com>
+
+2004-12-10
+ * document -M and -r in manpage (Doug Porter <dsp@dsp.name>)
+
+2004-07-22
+ * fix buffer overflow in items.c with 250 byte keys along with
+ other info on the same line going into a 256 byte char[].
+ thanks to Andrei Nigmatulin <anight@monamour.ru>
+
+2004-06-15
+ * immediate deletes weren't being unlinked for a few seconds,
+ preventing "add" commands to the same key in that time period.
+ thanks to Michael Alan Dorman <mdorman@debian.org> for the
+ bug report and demo script.
+
+2004-04-30
+ * released 1.1.11
+
+2004-04-24
+ * Avva: Add a new command line option: -r , to maximize core file
+ limit.
+
+2004-03-31
+ * Avva: Use getrlimit and setrlimit to set limits for number of
+ simultaneously open file descriptors. Get the current limits and
+ try to raise them if they're not enough for the specified (or the
+ default) setting of max connections.
+
+2004-02-24
+ * Adds a '-M' flag to turn off tossing items from the cache.
+ (Jason Titus <jtitus@postini.com>)
+
+2004-02-19 (Evan)
+ * Install manpage on "make install", etc.
+
+2003-12-30 (Brad)
+ * remove static build stuff. interferes with PAM setuid stuff
+ and was only included as a possible fix with the old memory
+ allocator. really shouldn't make a difference.
+ * add Jay Bonci's Debian scripts and manpage
+ * release version 1.1.10
+
+2003-12-01 (Avva)
+ * New command: flush_all, causes all existing items to
+ be invalidated immediately (without deleting them from
+ memory, merely causing memcached to no longer return them).
+2003-10-23
+ * Shift init code around to fix daemon mode on FreeBSD,
+ and drop root only after creating the server socket (to
+ allow the use of privileged ports)
+ * version 1.1.10pre
+
+2003-10-09
+ * BSD compile fixes from Ryan T. Dean
+ * version 1.1.9
+
+2003-09-29
+ * ignore SIGPIPE at start instead of crashing in rare cases it
+ comes up. no other code had to be modified, since everything
+ else is already dead-connection-aware. (avva)
+
+2003-09-09 (Avva, Lisa Marie Seelye <lisa@gentoo.org>)
+ * setuid support
+
+2003-09-05 (Avva)
+ * accept all new connections in the same event (so we work with ET epoll)
+ * mark all items as clsid=0 after slab page reassignment to please future
+ asserts (on the road to making slab page reassignment work fully)
+
+2003-08-12 (Brad Fitzpatrick)
+ * use TCP_CORK on Linux or TCP_NOPUSH on BSD
+ * only use TCP_NODELAY when we don't have alternatives
+
+2003-08-10
+ * disable Nagle's Algorithm (TCP_NODELAY) for better performance (avva)
+
+2003-08-10
+ * support multiple levels of verbosity (-vv)
+
+2003-08-10 (Evan Martin)
+ * Makefile.am: debug, optimization, and static flags are controlled
+ by the configure script.
+ * configure.ac:
+ - allow specifying libevent directory with --with-libevent=DIR
+ - check for malloc.h (unavailable on BSDs)
+ - check for socklen_t (unavailable on OSX)
+ * assoc.c, items.c, slabs.c: Remove some unused headers.
+ * memcached.c: allow for nonexistence of malloc.h; #define a POSIX
+ macro to import mlockall flags.
+
+2003-07-29
+ * version 1.1.7
+ * big bug fix: item exptime 0 meant expire immediately, not never
+ * version 1.1.8
+
+2003-07-22
+ * make 'delete' take second arg, of time to refuse new add/replace
+ * set/add/replace/delete can all take abs or delta time (delta can't
+ be larger than a month)
+
+2003-07-21
+ * added doc/protocol.txt
+
+2003-07-01
+ * report CPU usage in stats
+
+2003-06-30
+ * version 1.1.6
+ * fix a number of obscure bugs
+ * more stats reporting
+
+2003-06-10
+ * removing use of Judy; use a hash. (judy caused memory fragmentation)
+ * shrink some structures
+ * security improvements
+ * version 1.1.0
+
+2003-06-18
+ * changing maxsize back to an unsigned int
+
+2003-06-16
+ * adding PHP support
+ * added CONTRIBUTORS file
+ * version 1.0.4
+
+2003-06-15
+ * forgot to distribute website/api (still learning auto*)
+ * version 1.0.3
+
+2003-06-15
+ * update to version 1.0.2
+ * autoconf/automake fixes for older versions
+ * make stats report version number
+ * change license from GPL to BSD
+
+Fri, 13 Jun 2003 10:05:51 -0700 Evan Martin <martine@danga.com>
+
+ * configure.ac, autogen.sh, Makefile.am: Use autotools.
+ * items.c, memcached.c: #include <time.h> for time(),
+ printf time_t as %lu (is this correct?),
+ minor warnings fixes.
+
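Two ChangeLog entries describe how expiration times are interpreted. The 2003-07-22 entry says a time may be absolute or a delta of at most a month. The 2003-07-29 entry says an exptime of 0 means "never expire". Below is a minimal sketch of that conversion. It assumes "a month" means 30 days and that any larger value is read as an absolute Unix timestamp. The names `MAX_DELTA_SECONDS`, `absolute_exptime` and `is_expired` are hypothetical and are not memcached's own.

```c
#include <time.h>

/* Assumption: "a month" is taken to be 30 days. */
#define MAX_DELTA_SECONDS (60 * 60 * 24 * 30)

/* Convert a client-supplied expiration time to an absolute Unix time:
 *   0                      -> 0, meaning "never expires"
 *   up to 30 days          -> a delta relative to now
 *   larger than 30 days    -> already an absolute Unix timestamp */
static time_t absolute_exptime(time_t exptime, time_t now) {
    if (exptime == 0)
        return 0;
    if (exptime > MAX_DELTA_SECONDS)
        return exptime;
    return now + exptime;
}

/* An item with a nonzero absolute expiry is dead once that time is reached. */
static int is_expired(time_t abs_exptime, time_t now) {
    return abs_exptime != 0 && abs_exptime <= now;
}
```

The cap on deltas is what makes the two forms distinguishable. Any real absolute timestamp is far larger than 30 days' worth of seconds, so a single integer field can carry either meaning.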
30 LICENSE
@@ -0,0 +1,30 @@
+Copyright (c) 2003, Danga Interactive, Inc.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+ * Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+
+ * Neither the name of the Danga Interactive nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
10 Makefile.am
@@ -0,0 +1,10 @@
+bin_PROGRAMS = memcached
+
+memcached_SOURCES = memcached.c slabs.c items.c memcached.h assoc.c
+
+SUBDIRS = doc
+DIST_DIRS = scripts
+EXTRA_DIST = doc scripts TODO
+
+AM_CFLAGS=-DNDEBUG
+
1  NEWS
@@ -0,0 +1 @@
+http://www.danga.com/memcached/news.bml
22 README
@@ -0,0 +1,22 @@
+Dependencies:
+
+ -- libevent, http://www.monkey.org/~provos/libevent/ (libevent-dev)
+
+If using Linux, you need a kernel with epoll. Sure, libevent will
+work with normal select, but it sucks.
+
+epoll isn't in Linux 2.4 yet, but there's a backport at:
+
+ http://www.xmailserver.org/linux-patches/nio-improve.html
+
+You want the epoll-lt patch (level-triggered).
+
+Also, be warned that the -k (mlockall) option to memcached might be
+dangerous when using a large cache. Just make sure the memcached machines
+don't swap. memcached does non-blocking network I/O, but not disk. (it
+should never go to disk, or you've lost the whole point of it)
+
+The memcached website is at:
+
+ http://www.danga.com/memcached/
+
8 TODO
@@ -0,0 +1,8 @@
+* slab class reassignment still buggy and can crash. once that's
+ stable, server should re-assign pages every 60 seconds or so
+ to keep all classes roughly equal. [Update: fixed now?, but
+ not heavily tested. Future: make slab classes, with per-class
+ cleaner functions.]
+
+* calendar queue for early expirations of items, so they don't push
+ out other objects with infinite expirations.
186 assoc.c
@@ -0,0 +1,186 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * Hash table
+ *
+ * The hash function used here is by Bob Jenkins, 1996:
+ * <http://burtleburtle.net/bob/hash/doobs.html>
+ * "By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net.
+ * You may use this code any way you wish, private, educational,
+ * or commercial. It's free."
+ *
+ * The rest of the file is licensed under the BSD license. See LICENSE.
+ *
+ * $Id$
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/signal.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <netinet/in.h>
+#include <errno.h>
+#include <event.h>
+#include <assert.h>
+
+#include "memcached.h"
+
+typedef unsigned long int ub4; /* unsigned 4-byte quantities */
+typedef unsigned char ub1; /* unsigned 1-byte quantities */
+
+/* hard-code one million buckets, for now (2**20 == 4MB hash) */
+#define HASHPOWER 20
+
+#define hashsize(n) ((ub4)1<<(n))
+#define hashmask(n) (hashsize(n)-1)
+
+#define mix(a,b,c) \
+{ \
+ a -= b; a -= c; a ^= (c>>13); \
+ b -= c; b -= a; b ^= (a<<8); \
+ c -= a; c -= b; c ^= (b>>13); \
+ a -= b; a -= c; a ^= (c>>12); \
+ b -= c; b -= a; b ^= (a<<16); \
+ c -= a; c -= b; c ^= (b>>5); \
+ a -= b; a -= c; a ^= (c>>3); \
+ b -= c; b -= a; b ^= (a<<10); \
+ c -= a; c -= b; c ^= (b>>15); \
+}
+
+/*
+--------------------------------------------------------------------
+hash() -- hash a variable-length key into a 32-bit value
+ k : the key (the unaligned variable-length array of bytes)
+ len : the length of the key, counting by bytes
+ initval : can be any 4-byte value
+Returns a 32-bit value. Every bit of the key affects every bit of
+the return value. Every 1-bit and 2-bit delta achieves avalanche.
+About 6*len+35 instructions.
+
+The best hash table sizes are powers of 2. There is no need to do
+mod a prime (mod is sooo slow!). If you need less than 32 bits,
+use a bitmask. For example, if you need only 10 bits, do
+ h = (h & hashmask(10));
+In which case, the hash table should have hashsize(10) elements.
+
+If you are hashing n strings (ub1 **)k, do it like this:
+ for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
+
+By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use this
+code any way you wish, private, educational, or commercial. It's free.
+
+See http://burtleburtle.net/bob/hash/evahash.html
+Use for hash table lookup, or anything where one collision in 2^^32 is
+acceptable. Do NOT use for cryptographic purposes.
+--------------------------------------------------------------------
+*/
+
+ub4 hash( k, length, initval)
+ register ub1 *k; /* the key */
+ register ub4 length; /* the length of the key */
+ register ub4 initval; /* the previous hash, or an arbitrary value */
+{
+ register ub4 a,b,c,len;
+
+ /* Set up the internal state */
+ len = length;
+ a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */
+ c = initval; /* the previous hash value */
+
+ /*---------------------------------------- handle most of the key */
+ while (len >= 12)
+ {
+ a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) +((ub4)k[3]<<24));
+ b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) +((ub4)k[7]<<24));
+ c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16)+((ub4)k[11]<<24));
+ mix(a,b,c);
+ k += 12; len -= 12;
+ }
+
+ /*------------------------------------- handle the last 11 bytes */
+ c += length;
+ switch(len) /* all the case statements fall through */
+ {
+ case 11: c+=((ub4)k[10]<<24);
+ case 10: c+=((ub4)k[9]<<16);
+ case 9 : c+=((ub4)k[8]<<8);
+ /* the first byte of c is reserved for the length */
+ case 8 : b+=((ub4)k[7]<<24);
+ case 7 : b+=((ub4)k[6]<<16);
+ case 6 : b+=((ub4)k[5]<<8);
+ case 5 : b+=k[4];
+ case 4 : a+=((ub4)k[3]<<24);
+ case 3 : a+=((ub4)k[2]<<16);
+ case 2 : a+=((ub4)k[1]<<8);
+ case 1 : a+=k[0];
+ /* case 0: nothing left to add */
+ }
+ mix(a,b,c);
+ /*-------------------------------------------- report the result */
+ return c;
+}
+
+static item** hashtable = 0;
+
+void assoc_init(void) {
+ unsigned int hash_size = hashsize(HASHPOWER) * sizeof(void*);
+ hashtable = malloc(hash_size);
+ if (! hashtable) {
+ fprintf(stderr, "Failed to init hashtable.\n");
+ exit(1);
+ }
+ memset(hashtable, 0, hash_size);
+}
+
+item *assoc_find(char *key) {
+ ub4 hv = hash(key, strlen(key), 0) & hashmask(HASHPOWER);
+ item *it = hashtable[hv];
+
+ while (it) {
+ if (strcmp(key, ITEM_key(it)) == 0)
+ return it;
+ it = it->h_next;
+ }
+ return 0;
+}
+
+/* returns the address of the item pointer before the key. if *item == 0,
+ the item wasn't found */
+
+static item** _hashitem_before (char *key) {
+ ub4 hv = hash(key, strlen(key), 0) & hashmask(HASHPOWER);
+ item **pos = &hashtable[hv];
+
+ while (*pos && strcmp(key, ITEM_key(*pos))) {
+ pos = &(*pos)->h_next;
+ }
+ return pos;
+}
+
+/* Note: this isn't an assoc_update. The key must not already exist to call this */
+int assoc_insert(char *key, item *it) {
+ ub4 hv = hash(key, strlen(key), 0) & hashmask(HASHPOWER);
+ it->h_next = hashtable[hv];
+ hashtable[hv] = it;
+ return 1;
+}
+
+void assoc_delete(char *key) {
+ item **before = _hashitem_before(key);
+ if (*before) {
+ item *nxt = (*before)->h_next;
+ (*before)->h_next = 0; /* probably pointless, but whatever. */
+ *before = nxt;
+ return;
+ }
+ /* Note: we never actually get here. the callers don't delete things
+ they can't find. */
+ assert(*before != 0);
+}
+
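assoc.c pairs a power-of-two table with separate chaining. Because the table size is a power of two, a bucket is picked with a mask rather than a modulo, as the hash() comment recommends. `_hashitem_before()` returns a pointer to the link that points at the match. That lets deletion unlink the head of a chain and a middle element with the same code. The sketch below shows the same structure in self-contained form. It swaps in the FNV-1a hash for Bob Jenkins' hash() purely for brevity, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASHPOWER 4                                  /* 16 buckets, for illustration */
#define SKETCH_HASHSIZE ((uint32_t)1 << SKETCH_HASHPOWER)
#define SKETCH_HASHMASK (SKETCH_HASHSIZE - 1)

/* Hypothetical item: just a key and the chain link. */
typedef struct sketch_item {
    const char *key;
    struct sketch_item *h_next;
} sketch_item;

static sketch_item *table[SKETCH_HASHSIZE];

/* FNV-1a, used here instead of Jenkins' hash() only to keep the sketch short. */
static uint32_t fnv1a(const char *key) {
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (unsigned char)*key;
        h *= 16777619u;
    }
    return h;
}

/* Power-of-two table size: select the bucket with a mask, no modulo. */
static uint32_t bucket_of(const char *key) {
    return fnv1a(key) & SKETCH_HASHMASK;
}

/* Prepend to the chain; like assoc_insert, the key must not already exist. */
static void sketch_insert(sketch_item *it) {
    uint32_t b = bucket_of(it->key);
    it->h_next = table[b];
    table[b] = it;
}

static sketch_item *sketch_find(const char *key) {
    sketch_item *it = table[bucket_of(key)];
    while (it && strcmp(key, it->key) != 0)
        it = it->h_next;
    return it;
}

/* Walk a pointer to the link that points at the match, so the chain head
 * needs no special case when unlinking. */
static void sketch_delete(const char *key) {
    sketch_item **pos = &table[bucket_of(key)];
    while (*pos && strcmp(key, (*pos)->key) != 0)
        pos = &(*pos)->h_next;
    assert(*pos != NULL);   /* as in assoc_delete: callers only delete existing keys */
    *pos = (*pos)->h_next;
}
```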
22 autogen.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+#
+# This is hacky, because there are so many damn versions
+# of autoconf/automake. It works with Debian woody, at least.
+#
+
+echo "aclocal..."
+ACLOCAL=${ACLOCAL:-aclocal-1.7}
+$ACLOCAL || aclocal-1.5 || aclocal || exit 1
+
+echo "autoheader..."
+AUTOHEADER=${AUTOHEADER:-autoheader}
+$AUTOHEADER || exit 1
+
+echo "automake..."
+AUTOMAKE=${AUTOMAKE:-automake-1.7}
+$AUTOMAKE --foreign --add-missing || automake --gnu --add-missing || exit 1
+
+echo "autoconf..."
+AUTOCONF=${AUTOCONF:-autoconf}
+$AUTOCONF || exit 1
+
56 configure.ac
@@ -0,0 +1,56 @@
+AC_PREREQ(2.52)
+AC_INIT(memcached, 1.1.13-cvs, brad@danga.com)
+AC_CANONICAL_SYSTEM
+AC_CONFIG_SRCDIR(memcached.c)
+AM_INIT_AUTOMAKE(AC_PACKAGE_NAME, AC_PACKAGE_VERSION)
+AM_CONFIG_HEADER(config.h)
+
+AC_PROG_CC
+AC_PROG_INSTALL
+
+AC_ARG_WITH(libevent,
+ AC_HELP_STRING([--with-libevent=DIRECTORY],[base directory for libevent]))
+if test "x$with_libevent" != "x" && test "$with_libevent" != "no"; then
+ CFLAGS="$CFLAGS -I$with_libevent/include"
+ LDFLAGS="$LDFLAGS -L$with_libevent/lib"
+fi
+
+LIBEVENT_URL=http://www.monkey.org/~provos/libevent/
+AC_CHECK_LIB(event, event_set, ,
+ [AC_MSG_ERROR(libevent is required. You can get it from $LIBEVENT_URL)])
+
+AC_CHECK_HEADER(malloc.h, AC_DEFINE(HAVE_MALLOC_H,,[do we have malloc.h?]))
+AC_CHECK_MEMBER([struct mallinfo.arena], [
+ AC_DEFINE(HAVE_STRUCT_MALLINFO,,[do we have struct mallinfo?])
+ ], ,[
+# include <malloc.h>
+ ]
+)
+
+dnl From licq: Copyright (c) 2000 Dirk Mueller
+dnl Check if the type socklen_t is defined anywhere
+AC_DEFUN(AC_C_SOCKLEN_T,
+[AC_CACHE_CHECK(for socklen_t, ac_cv_c_socklen_t,
+[
+ AC_TRY_COMPILE([
+ #include <sys/types.h>
+ #include <sys/socket.h>
+ ],[
+ socklen_t foo;
+ ],[
+ ac_cv_c_socklen_t=yes
+ ],[
+ ac_cv_c_socklen_t=no
+ ])
+])
+if test $ac_cv_c_socklen_t = no; then
+ AC_DEFINE(socklen_t, int, [define to int if socklen_t not available])
+fi
+])
+
+AC_C_SOCKLEN_T
+
+AC_CHECK_FUNCS(mlockall)
+
+AC_CONFIG_FILES(Makefile doc/Makefile)
+AC_OUTPUT
3  doc/Makefile.am
@@ -0,0 +1,3 @@
+man_MANS = memcached.1
+
+EXTRA_DIST = *.txt
28 doc/OSX.txt
@@ -0,0 +1,28 @@
+memcached is slow on OSX because:
+
+ -- OSX's kqueue is broken
+ -- OSX's TCP_NOPUSH stuff is different/broken
+
+There are reports that the following makes memcached fast on OS X:
+
+ Two simple changes:
+
+ First, in memcached.c (in the memcached source directory) add
+ (anywhere above line 105, which reads #ifdef TCP_NOPUSH) the line:
+
+ #undef TCP_NOPUSH
+
+ I just added it on the line above the #ifdef line.
+
+ Rebuild memcached (just do a make && sudo make install; you don't need
+ to re-run configure if you've already done it)
+
+ then, set the environment variable EVENT_NOKQUEUE to 1
+
+ in csh and derivatives: setenv EVENT_NOKQUEUE 1
+
+ in sh and derivatives (like bash): export EVENT_NOKQUEUE=1
+
+ then start memcached, and it should be fast (it certainly made a
+ difference here)
+
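The 2003-08-12 ChangeLog entry and this OS X note both come down to one compile-time choice. Use TCP_CORK where it exists (Linux), use TCP_NOPUSH on BSD-derived systems, and fall back to TCP_NODELAY only when neither is available. Below is a minimal sketch of that selection. `tcp_batch` is a hypothetical helper, not memcached's code. With this structure, the `#undef TCP_NOPUSH` workaround above would route OS X to the TCP_NODELAY branch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Turn output batching on (on = 1) or off (on = 0) for a TCP socket.
 * Returns setsockopt()'s result: 0 on success, -1 on error. */
static int tcp_batch(int fd, int on) {
#if defined(TCP_CORK)
    /* Linux: hold back partial frames until the socket is uncorked. */
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
#elif defined(TCP_NOPUSH)
    /* BSD counterpart of TCP_CORK. */
    return setsockopt(fd, IPPROTO_TCP, TCP_NOPUSH, &on, sizeof(on));
#else
    /* No batching option available: at least disable Nagle's algorithm. */
    int one = 1;
    (void)on;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
#endif
}
```

A caller would cork before writing a response's pieces and uncork afterwards, so the reply leaves in as few packets as possible. In the TCP_NODELAY fallback the uncork call has no effect.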
86 doc/memcached.1
@@ -0,0 +1,86 @@
+.TH MEMCACHED 1 "April 11, 2005"
+.SH NAME
+memcached \- high-performance memory object caching system
+.SH SYNOPSIS
+.B memcached
+.RI [ options ]
+.br
+.SH DESCRIPTION
+This manual page documents briefly the
+.B memcached
+memory object caching daemon.
+.PP
+.B memcached
+is a flexible memory object caching daemon designed to alleviate database load
+in dynamic web applications by storing objects in memory. It's based on
+libevent to scale to any size needed, and is specifically optimized to avoid
+swapping and always use non-blocking I/O.
+.br
+.SH OPTIONS
+This program follows the usual GNU command line syntax. A summary of options
+is included below.
+.TP
+.B \-l <ip_addr>
+Listen on <ip_addr>; defaults to INADDR_ANY. This is an important option to
+consider as there is no other way to secure the installation. Binding to an
+internal or firewalled network interface is suggested.
+.TP
+.B \-d
+Run memcached as a daemon.
+.TP
+.B \-u <username>
+Assume the identity of <username> (only when run as root).
+.TP
+.B \-m <num>
+Use at most <num> MB of memory for object storage; the default is 64 megabytes.
+.TP
+.B \-c <num>
+Accept at most <num> simultaneous connections; the default is 1024.
+.TP
+.B \-k
+Lock down all paged memory. This is a somewhat dangerous option with large
+caches, so consult the README and memcached homepage for configuration
+suggestions.
+.TP
+.B \-p <num>
+Listen on port <num>, the default is port 11211.
+.TP
+.B \-M
+Disable automatic removal of items from the cache when out of memory.
+Additions will not be possible until adequate space is freed up.
+.TP
+.B \-r
+Raise the core file size limit to the maximum allowable.
+.TP
+.B \-h
+Show the version of memcached and a summary of options.
+.TP
+.B \-v
+Be verbose during the event loop; print out errors and warnings.
+.TP
+.B \-vv
+Be even more verbose; same as \-v but also print client commands and
+responses.
+.TP
+.B \-i
+Print memcached and libevent licenses.
+.TP
+.B \-P <filename>
+Save the process ID to <filename>; only used with the \-d option.
+.br
+.SH LICENSE
+The memcached daemon is copyright Danga Interactive and is distributed under
+the BSD license. Note that daemon clients are licensed separately.
+.br
+.SH SEE ALSO
+The README file that comes with memcached
+.br
+.B http://www.danga.com/memcached
+.SH AUTHOR
+The memcached daemon was written by Anatoly Vorobey
+.B <mellon@pobox.com>
+and Brad Fitzpatrick
+.B <brad@danga.com>
+and the rest of the crew of Danga Interactive
+.B http://www.danga.com
+.br
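The \-c and \-r options correspond to the 2004-03-31 and 2004-04-24 ChangeLog entries, which adjust resource limits with getrlimit() and setrlimit(). Below is a sketch of both adjustments using those POSIX calls. The helper names and the 10-descriptor headroom are assumptions for illustration, not values taken from memcached.c.

```c
#include <sys/resource.h>

/* -r: raise the soft core-file limit to the hard limit. An unprivileged
 * process is always allowed to raise its soft limit up to the hard limit. */
static int raise_core_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* -c: make sure the soft descriptor limit covers max_conns plus some
 * headroom for the listening socket and stdio (10 is an assumed figure).
 * Without privileges the soft limit is capped at the hard limit, so the
 * result can still fall short; returns 0 on success, -1 on error. */
static int ensure_fd_limit(rlim_t max_conns) {
    struct rlimit rl;
    rlim_t want = max_conns + 10;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur >= want)
        return 0;                               /* already large enough */
    rl.rlim_cur = (rl.rlim_max < want) ? rl.rlim_max : want;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Raising the hard limit itself, for example past the system default, requires root. This is one reason a daemon would adjust limits before dropping privileges with \-u.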
83 doc/memory_management.txt
@@ -0,0 +1,83 @@
+Date: Fri, 5 Sep 2003 20:31:03 +0300
+From: Anatoly Vorobey <mellon@pobox.com>
+To: memcached@lists.danga.com
+Subject: Re: Memory Management...
+
+On Fri, Sep 05, 2003 at 12:07:48PM -0400, Kyle R. Burton wrote:
+> prefixing keys with a container identifier). We have just begun to
+> look at the implementation of the memory management sub-system with
+> regards to its allocation, de-allocation and compaction approaches.
+> Is there any documentation or discussion of how this subsystem
+> operates? (slabs.c?)
+
+There's no documentation yet, and it's worth mentioning that this
+subsystem is the most active area of memcached under development at the
+moment (however, all the changes to it won't modify the way memcached
+presents itself towards clients, they're primarily directed at making
+memcached use memory more efficiently).
+
+Here's a quick recap of what it does now and what is being worked
+on.
+
+The primary goal of the slabs subsystem in memcached was to eliminate
+memory fragmentation issues totally by using fixed-size memory chunks
+coming from a few predetermined size classes (early versions of
+memcached relied on malloc()'s handling of fragmentation which proved
+woefully inadequate for our purposes). For instance, suppose
+we decide at the outset that the list of possible sizes is: 64 bytes,
+128 bytes, 256 bytes, etc. - doubling all the way up to 1MB. For each
+size class in this list (each possible size) we maintain a list of free
+chunks of this size. Whenever a request comes for a particular size,
+it is rounded up to the closest size class and a free chunk is taken
+from that size class. In the above example, if you request from the
+slabs subsystem 100 bytes of memory, you'll actually get a chunk 128
+bytes worth, from the 128-bytes size class. If there are no free chunks
+of the needed size at the moment, there are two ways to get one: 1) free
+an existing chunk in the same size class, using LRU queues to free the
+least needed objects; 2) get more memory from the system, which we
+currently always do in _slabs_ of 1MB each; we malloc() a slab, divide
+it into chunks of the needed size, and use them.
+
+The tradeoff is between memory fragmentation and memory utilisation. In
+the scheme we're now using, we have zero fragmentation, but a relatively
+high percentage of memory is wasted. The most efficient way to reduce
+the waste is to use a list of size classes that closely matches (if
+that's at all possible) common sizes of objects that the clients
+of this particular installation of memcached are likely to store.
+For example, if your installation is going to store hundreds of thousands of objects of exactly 120 bytes, you'd be much better
+off changing, in the "naive" list of sizes outlined above, the class
+of 128 bytes to something a bit higher (because the overhead of
+storing an item, while not large, will push those 120-byte objects over
+128 bytes of storage internally, and will require using 256 bytes for
+each of them in the naive scheme, forcing you to waste almost 50% of
+memory). Such tinkering with the list of size classes is not currently
+possible with memcached, but enabling it is one of the immediate goals.
+
+Ideally, the slabs subsystem would analyze at runtime the common sizes
+of objects that are being requested, and would be able to modify the
+list of sizes dynamically to improve memory utilisation. This is not
+planned for the immediate future, however. What is planned is the
+ability to reassign slabs to different classes. Here's what this means.
+Currently, the total amount of memory allocated for each size class is
+determined by how clients interact with memcached during the initial
+phase of its execution, when it keeps malloc()'ing more slabs and
+dividing them into chunks, until it hits the specified memory limit
+(say, 2GB, or whatever else was specified). Once it hits the limit, to
+allocate a new chunk it'll always delete an existing chunk of the same
+size (using LRU queues), and will never malloc() or free() any memory
+from/to the system. So if, for example, during those initial few hours
+of memcached's execution your clients mainly wanted to store very small
+items, the bulk of memory allocated will be divided into small-sized
+chunks, and the large size classes will get less memory, therefore the
+life-cycle of large objects you'll store in memcached will henceforth
+always be much shorter, with this instance of memcached (their LRU
+queues will be shorter and they'll be pushed out much more often). In
+general, if your system starts producing a different pattern of common
+object sizes, the memcached servers will become less efficient, unless
+you restart them. Slabs reassignment, which is the next feature being
+worked on, will ensure the server's ability to reclaim a slab (1Mb of
+memory) from one size class and put it into another class size, where
+it's needed more.
+
+--
+avva
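The naive scheme in this message can be written out in a few lines: classes of 64, 128 and 256 bytes and so on, doubling up to 1MB, carved out of 1MB slabs. This is a sketch of the arithmetic only, using the example numbers from the text. It is not slabs.c, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Example numbers from the message above, not necessarily any release. */
#define SMALLEST_CHUNK 64
#define SLAB_SIZE (1024 * 1024)

/* Round a request up to its size class; returns 0 if it exceeds 1MB. */
static size_t size_class_for(size_t request) {
    size_t chunk = SMALLEST_CHUNK;
    if (request > SLAB_SIZE)
        return 0;
    while (chunk < request)
        chunk *= 2;
    return chunk;
}

/* How many chunks of a given class one 1MB slab is divided into. */
static size_t chunks_per_slab(size_t chunk) {
    assert(chunk > 0 && chunk <= SLAB_SIZE);
    return SLAB_SIZE / chunk;
}

/* Bytes wasted when a request of this size is stored in its class. */
static size_t wasted_bytes(size_t request) {
    size_t chunk = size_class_for(request);
    assert(chunk != 0);
    return chunk - request;
}
```

Consider the text's 120-byte example with a made-up 16 bytes of per-item overhead, so each item needs 136 bytes of storage. Then size_class_for(136) is 256 and wasted_bytes(136) is 120, which is close to half of every chunk.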
390 doc/protocol.txt
@@ -0,0 +1,390 @@
+Protocol
+--------
+
+Clients of memcached communicate with the server through TCP
+connections. A given running memcached server listens on some
+(configurable) port; clients connect to that port, send commands to
+the server, read responses, and eventually close the connection.
+
+There is no need to send any command to end the session. A client may
+just close the connection at any moment it no longer needs it. Note,
+however, that clients are encouraged to cache their connections rather
+than reopen them every time they need to store or retrieve data. This
+is because memcached is especially designed to work very efficiently
+with a very large number (many hundreds, more than a thousand if
+necessary) of open connections. Caching connections will eliminate the
+overhead associated with establishing a TCP connection (the overhead
+of preparing for a new connection on the server side is insignificant
+compared to this).
+
+There are two kinds of data sent in the memcache protocol: text lines
+and unstructured data. Text lines are used for commands from clients
+and responses from servers. Unstructured data is sent when a client
+wants to store or retrieve data. The server will transmit back
+unstructured data in exactly the same way it received it, as a byte
+stream. The server doesn't care about byte order issues in
+unstructured data and isn't aware of them. There are no limitations on
+characters that may appear in unstructured data; however, the reader
+of such data (either a client or a server) will always know, from a
+preceding text line, the exact length of the data block being
+transmitted.
+
+Text lines are always terminated by \r\n. Unstructured data is _also_
+terminated by \r\n, even though \r, \n or any other 8-bit characters
+may also appear inside the data. Therefore, when a client retrieves
+data from a server, it must use the length of the data block (which it
+will be provided with) to determine where the data block ends, and not
+the fact that \r\n follows the end of the data block, even though it
+does.
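The length-based framing rule can be illustrated with a small C sketch. skip_data_block() is a hypothetical helper, not part of memcached: given a buffer that starts at a data block and the length announced by the preceding text line, it checks that the block is followed by \r\n and returns a pointer just past it. It never searches for \r\n inside the block, so a value such as "a\r\nb" is handled correctly, whereas a naive scan for \r\n would stop after the first byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: `buf` points at the first byte of a data block whose
 * length `nbytes` came from the preceding text line. Returns a pointer just
 * past the block's trailing "\r\n", or NULL if the buffer is too short or the
 * terminator is missing. The block's contents are never inspected.
 */
const char *skip_data_block(const char *buf, size_t buflen, size_t nbytes) {
    if (nbytes > buflen || buflen - nbytes < 2)
        return NULL;                  /* block (or its "\r\n") not fully received */
    if (buf[nbytes] != '\r' || buf[nbytes + 1] != '\n')
        return NULL;                  /* malformed: terminator not where expected */
    return buf + nbytes + 2;
}
```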
+
+Keys
+----
+
+Data stored by memcached is identified with the help of a key. A key
+is a text string which should uniquely identify the data for clients
+that are interested in storing and retrieving it. Currently the
+length limit of a key is set at 250 characters (of course, normally
+clients wouldn't need to use such long keys); the key must not include
+control characters or whitespace.
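A minimal sketch of a client-side key check, assuming the rules above are applied byte by byte (the text says "characters"; treating them as single bytes is an assumption here). key_is_valid() and KEY_MAX_LENGTH are hypothetical names. Empty keys are rejected as well, matching the strlen(key)==0 check in process_command() later in this commit.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define KEY_MAX_LENGTH 250  /* limit stated in the protocol text */

/* Returns 1 if `key` is 1..250 bytes long with no whitespace or control bytes. */
int key_is_valid(const char *key) {
    size_t len = strlen(key);
    if (len == 0 || len > KEY_MAX_LENGTH)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)key[i];
        if (iscntrl(ch) || isspace(ch))
            return 0;
    }
    return 1;
}
```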
+
+Commands
+--------
+
+There are three types of commands.
+
+Storage commands (there are three: "set", "add" and "replace") ask the
+server to store some data identified by a key. The client sends a
+command line, and then a data block; after that the client expects one
+line of response, which will indicate success or failure.
+
+Retrieval commands (there is only one: "get") ask the server to
+retrieve data corresponding to a set of keys (one or more keys in one
+request). The client sends a command line, which includes all the
+requested keys; after that for each item the server finds it sends to
+the client one response line with information about the item, and one
+data block with the item's data; this continues until the server
+finishes with the "END" response line.
+
+All other commands don't involve unstructured data. In all of them,
+the client sends one command line, and expects (depending on the
+command) either one line of response, or several lines of response
+ending with "END" on the last line.
+
+A command line always starts with the name of the command, followed by
+parameters (if any) delimited by whitespace. Command names are
+lower-case and are case-sensitive.
+
+Expiration times
+----------------
+
+Some commands involve a client sending some kind of expiration time
+(relative to an item or to an operation requested by the client) to
+the server. In all such cases, the actual value sent may either be
+Unix time (number of seconds since January 1, 1970, as a 32-bit
+value), or a number of seconds starting from current time. In the
+latter case, this number of seconds may not exceed 60*60*24*30 (number
+of seconds in 30 days); if the number sent by a client is larger than
+that, the server will consider it to be real Unix time value rather
+than an offset from current time.
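The rule above reduces to a few comparisons. The sketch below is a hypothetical restatement, absolute_exptime(), with the current time passed in as a parameter so it can be tested; realtime() in memcached.c later in this commit applies the same 30-day threshold using time(0).

```c
#include <time.h>

#define MAX_RELATIVE_EXPTIME (60 * 60 * 24 * 30)  /* 30 days, in seconds */

/*
 * Convert a client-supplied exptime into an absolute Unix time:
 * 0 means "never expires", values up to 30 days are offsets from `now`,
 * and anything larger is taken to be an absolute Unix time already.
 */
time_t absolute_exptime(time_t exptime, time_t now) {
    if (exptime == 0)
        return 0;
    if (exptime > MAX_RELATIVE_EXPTIME)
        return exptime;
    return now + exptime;
}
```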
+
+
+Error strings
+-------------
+
+Each command sent by a client may be answered with an error string
+from the server. These error strings come in three types:
+
+- "ERROR\r\n"
+
+ means the client sent a nonexistent command name.
+
+- "CLIENT_ERROR <error>\r\n"
+
+ means some sort of client error in the input line, i.e. the input
+ doesn't conform to the protocol in some way. <error> is a
+ human-readable error string.
+
+- "SERVER_ERROR <error>\r\n"
+
+ means some sort of server error prevents the server from carrying
+ out the command. <error> is a human-readable error string. In cases
+ of severe server errors, which make it impossible to continue
+ serving the client (this shouldn't normally happen), the server will
+ close the connection after sending the error line. This is the only
+ case in which the server closes a connection to a client.
+
+
+In the descriptions of individual commands below, these error lines
+are not again specifically mentioned, but clients must allow for their
+possibility.
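Since any command may be answered with one of these three lines, a client typically checks for them before interpreting the normal reply. A minimal sketch, assuming the line has already been read and its trailing \r\n stripped; classify_reply() and enum reply_kind are hypothetical names.

```c
#include <string.h>

enum reply_kind { REPLY_OK, REPLY_ERROR, REPLY_CLIENT_ERROR, REPLY_SERVER_ERROR };

/*
 * Classify a response line (without "\r\n"). For the *_ERROR kinds,
 * *detail is pointed at the human-readable message; otherwise it is "".
 */
enum reply_kind classify_reply(const char *line, const char **detail) {
    *detail = "";
    if (strcmp(line, "ERROR") == 0)
        return REPLY_ERROR;
    if (strncmp(line, "CLIENT_ERROR", 12) == 0) {
        *detail = line[12] == ' ' ? line + 13 : line + 12;
        return REPLY_CLIENT_ERROR;
    }
    if (strncmp(line, "SERVER_ERROR", 12) == 0) {
        *detail = line[12] == ' ' ? line + 13 : line + 12;
        return REPLY_SERVER_ERROR;
    }
    return REPLY_OK;  /* not an error line; interpret per command */
}
```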
+
+
+Storage commands
+----------------
+
+First, the client sends a command line which looks like this:
+
+<command name> <key> <flags> <exptime> <bytes>\r\n
+
+- <command name> is "set", "add" or "replace"
+
+ "set" means "store this data".
+
+ "add" means "store this data, but only if the server *doesn't* already
+ hold data for this key".
+
+ "replace" means "store this data, but only if the server *does*
+ already hold data for this key".
+
+- <key> is the key under which the client asks to store the data
+
+- <flags> is an arbitrary 16-bit unsigned integer (written out in
+ decimal) that the server stores along with the data and sends back
+ when the item is retrieved. Clients may use this as a bit field to
+ store data-specific information; this field is opaque to the server.
+
+- <exptime> is expiration time. If it's 0, the item never expires
+ (although it may be deleted from the cache to make place for other
+ items). If it's non-zero (either Unix time or offset in seconds from
+ current time), it is guaranteed that clients will not be able to
+ retrieve this item after the expiration time arrives (measured by
+ server time).
+
+- <bytes> is the number of bytes in the data block to follow, *not*
+ including the delimiting \r\n. <bytes> may be zero (in which case
+ it's followed by an empty data block).
+
+After this line, the client sends the data block:
+
+<data block>\r\n
+
+- <data block> is a chunk of arbitrary 8-bit data of length <bytes>
+ from the previous line.
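Putting the command line and the data block together, a client might format a storage request as in the sketch below. format_storage_cmd() is a hypothetical helper; it copies the data with memcpy rather than string functions because the block may contain any 8-bit value, including \r, \n and NUL.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<cmd> <key> <flags> <exptime> <bytes>\r\n<data>\r\n" into `out`.
 * Returns the number of bytes to send, or -1 if `out` is too small.
 */
int format_storage_cmd(char *out, size_t outlen, const char *cmd, const char *key,
                       unsigned int flags, long exptime,
                       const char *data, size_t nbytes) {
    int hdr = snprintf(out, outlen, "%s %s %u %ld %zu\r\n",
                       cmd, key, flags, exptime, nbytes);
    if (hdr < 0 || (size_t)hdr + nbytes + 2 >= outlen)
        return -1;                            /* no room for line + data + "\r\n" + NUL */
    memcpy(out + hdr, data, nbytes);          /* data may contain "\r\n" or NULs */
    memcpy(out + hdr + nbytes, "\r\n", 2);    /* data block terminator */
    out[hdr + nbytes + 2] = '\0';             /* convenience only; not sent */
    return hdr + (int)nbytes + 2;
}
```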
+
+After sending the command line and the data block, the client awaits
+the reply, which may be:
+
+- "STORED\r\n", to indicate success.
+
+- "NOT_STORED\r\n" to indicate the data was not stored, but not
+because of an error. This normally means either that the
+condition for an "add" or a "replace" command wasn't met, or that the
+item is in a delete queue (see the "delete" command below).
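The conditions that produce "NOT_STORED" can be summarised as a small decision function. This sketch follows the order of the checks in complete_nread() in memcached.c later in this commit; should_store() and enum store_cmd are hypothetical names, and "exists" is assumed to be true for an item that is sitting in the delete queue.

```c
/* The three storage commands described above. */
enum store_cmd { CMD_SET, CMD_ADD, CMD_REPLACE };

/*
 * Returns 1 for "STORED", 0 for "NOT_STORED", given whether a live item
 * exists for the key and whether that item is in the delete queue.
 */
int should_store(enum store_cmd cmd, int exists, int in_delete_queue) {
    if (cmd == CMD_ADD && exists)
        return 0;                     /* add: key must not be present */
    if (cmd == CMD_REPLACE && !exists)
        return 0;                     /* replace: key must be present */
    if (exists && in_delete_queue && cmd != CMD_SET)
        return 0;                     /* delete queue blocks add/replace only */
    return 1;                         /* set always stores */
}
```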
+
+
+Retrieval command
+-----------------
+
+The retrieval command looks like this:
+
+get <key>*\r\n
+
+- <key>* means one or more key strings separated by whitespace.
+
+After this command, the client expects zero or more items, each of
+which is received as a text line followed by a data block. After all
+the items have been transmitted, the server sends the string
+
+"END\r\n"
+
+to indicate the end of response.
+
+Each item sent by the server looks like this:
+
+VALUE <key> <flags> <bytes>\r\n
+<data block>\r\n
+
+- <key> is the key for the item being sent
+
+- <flags> is the flags value set by the storage command
+
+- <bytes> is the length of the data block to follow, *not* including
+ its delimiting \r\n
+
+- <data block> is the data for this item.
+
+If some of the keys appearing in a retrieval request are not sent back
+by the server in the item list, this means that the server does not
+hold items with such keys (because they were never stored, or stored
+but deleted to make space for more items, or expired, or explicitly
+deleted by a client).
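A client reading a "get" response has to combine both framing rules: text lines end at \r\n, but each data block is skipped by its announced length. The sketch below, with the hypothetical name count_get_items(), walks a complete response held in memory and counts the items; it assumes keys of at most 250 bytes, per the Keys section, and returns -1 for anything malformed or truncated.

```c
#include <stdio.h>
#include <string.h>

/* Count the items in a complete "get" response of `len` bytes, or return -1. */
int count_get_items(const char *buf, size_t len) {
    const char *p = buf;
    const char *end = buf + len;
    int count = 0;

    while (p < end) {
        /* Text lines are found by their "\r\n" terminator... */
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL || eol == p || eol[-1] != '\r')
            return -1;

        char line[300];
        size_t linelen = (size_t)(eol - p) - 1;   /* drop "\r\n" */
        if (linelen >= sizeof line)
            return -1;
        memcpy(line, p, linelen);
        line[linelen] = '\0';
        p = eol + 1;

        if (strcmp(line, "END") == 0)
            return p == end ? count : -1;

        char key[251];
        unsigned int flags;
        unsigned long nbytes;
        if (sscanf(line, "VALUE %250s %u %lu", key, &flags, &nbytes) != 3)
            return -1;

        /* ...but data blocks are skipped by length, never scanned. */
        size_t avail = (size_t)(end - p);
        if (nbytes > avail || avail - nbytes < 2 ||
            p[nbytes] != '\r' || p[nbytes + 1] != '\n')
            return -1;
        p += nbytes + 2;
        count++;
    }
    return -1;  /* ran out of input before "END" */
}
```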
+
+
+
+Deletion
+--------
+
+The command "delete" allows for explicit deletion of items:
+
+delete <key> <time>\r\n
+
+- <key> is the key of the item the client wishes the server to delete
+
+- <time> is the amount of time in seconds (or Unix time until which)
+ the client wishes the server to refuse "add" and "replace" commands
+ with this key. For this amount of time, the item is put into a
+ delete queue, which means that it won't be possible to retrieve it by
+ the "get" command, but "add" and "replace" commands with this key
+ will also fail (the "set" command will succeed, however). After the
+ time passes, the item is finally deleted from server memory.
+
+ The parameter <time> is optional, and, if absent, defaults to 0
+ (which means that the item will be deleted immediately and further
+ storage commands with this key will succeed).
+
+The response line to this command can be one of:
+
+- "DELETED\r\n" to indicate success
+
+- "NOT_FOUND\r\n" to indicate that the item with this key was not
+ found.
+
+See the "flush_all" command below for immediate invalidation
+of all existing items.
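A sketch of both sides of the exchange, with hypothetical helper names: format_delete_cmd() leaves out <time> when it is zero (relying on the default described above), and parse_delete_reply() maps the two documented replies, treating anything else as one of the error strings.

```c
#include <stdio.h>
#include <string.h>

/* Format "delete <key> [<time>]\r\n"; returns what snprintf returns. */
int format_delete_cmd(char *out, size_t outlen, const char *key, long delay) {
    if (delay == 0)
        return snprintf(out, outlen, "delete %s\r\n", key);   /* <time> defaults to 0 */
    return snprintf(out, outlen, "delete %s %ld\r\n", key, delay);
}

/* 1 = "DELETED", 0 = "NOT_FOUND", -1 = anything else (e.g. an error line). */
int parse_delete_reply(const char *line) {
    if (strcmp(line, "DELETED") == 0)
        return 1;
    if (strcmp(line, "NOT_FOUND") == 0)
        return 0;
    return -1;
}
```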
+
+
+Increment/Decrement
+-------------------
+
+Commands "incr" and "decr" are used to change data for some item
+in-place, incrementing or decrementing it. The data for the item is
+treated as the decimal representation of a 32-bit unsigned integer. If the
+current data value does not conform to such a representation, the
+commands behave as if the value were 0. Also, the item must already
+exist for incr/decr to work; these commands won't pretend that a
+non-existent key exists with value 0; instead, they will fail.
+
+The client sends the command line:
+
+incr <key> <value>\r\n
+
+or
+
+decr <key> <value>\r\n
+
+- <key> is the key of the item the client wishes to change
+
+- <value> is the amount by which the client wants to increase/decrease
+the item. It is a decimal representation of a 32-bit unsigned integer.
+
+The response will be one of:
+
+- "NOT_FOUND\r\n" to indicate the item with this key was not found
+
+- <value>\r\n , where <value> is the new value of the item's data,
+ after the increment/decrement operation was carried out.
+
+Note that underflow in the "decr" command is caught: if a client tries
+to decrease the value below 0, the new value will be 0. Overflow in
+the "incr" command is not checked.
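The arithmetic rules above can be sketched as follows. apply_delta() and parse_u32_or_zero() are hypothetical and restate the documented behaviour; they are not the server's implementation. Treating any text that is not purely decimal digits (or that exceeds 32 bits) as 0 is an assumption about what "does not conform" means.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse `s` as a decimal 32-bit unsigned integer; anything else counts as 0. */
static uint32_t parse_u32_or_zero(const char *s) {
    char *end;
    unsigned long v;

    if (!isdigit((unsigned char)s[0]))
        return 0;                          /* empty, sign, space or non-digit */
    errno = 0;
    v = strtoul(s, &end, 10);
    if (*end != '\0' || errno == ERANGE || v > UINT32_MAX)
        return 0;                          /* trailing junk or out of range */
    return (uint32_t)v;
}

/* New value after "incr" (incr != 0) or "decr" (incr == 0) by `delta`. */
uint32_t apply_delta(const char *data, uint32_t delta, int incr) {
    uint32_t value = parse_u32_or_zero(data);
    if (incr)
        return value + delta;                  /* overflow unchecked: wraps mod 2^32 */
    return delta > value ? 0 : value - delta;  /* underflow clamps to 0 */
}
```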
+
+
+Statistics
+----------
+
+The command "stats" is used to query the server about statistics it
+maintains and other internal data. It has two forms. Without
+arguments:
+
+stats\r\n
+
+it causes the server to output general-purpose statistics and
+settings, documented below. In the other form it has some arguments:
+
+stats <args>\r\n
+
+Depending on <args>, various internal data is sent by the server. The
+kinds of arguments and the data sent are not documented in this version
+of the protocol, and are subject to change for the convenience of
+memcache developers.
+
+
+General-purpose statistics
+--------------------------
+
+Upon receiving the "stats" command without arguments, the server sends
+a number of lines which look like this:
+
+STAT <name> <value>\r\n
+
+The server terminates this list with the line
+
+END\r\n
+
+In each line of statistics, <name> is the name of this statistic, and
+<value> is the data. The following is the list of all names sent in
+response to the "stats" command, together with the type of the value
+sent for this name, and the meaning of the value.
+
+In the type column below, "32u" means a 32-bit unsigned integer, "64u"
+means a 64-bit unsigned integer. '32u:32u' means two 32-bit unsigned
+integers separated by a colon.
+
+
+Name                  Type     Meaning
+----------------------------------
+pid                   32u      Process id of this server process
+uptime                32u      Number of seconds this server has been running
+time                  32u      current UNIX time according to the server
+version               string   Version string of this server
+rusage_user           32u:32u  Accumulated user time for this process
+                               (seconds:microseconds)
+rusage_system         32u:32u  Accumulated system time for this process
+                               (seconds:microseconds)
+curr_items            32u      Current number of items stored by the server
+total_items           32u      Total number of items stored by this server
+                               ever since it started
+bytes                 64u      Current number of bytes used by this server
+                               to store items
+curr_connections      32u      Number of open connections
+total_connections     32u      Total number of connections opened since
+                               the server started running
+connection_structures 32u      Number of connection structures allocated
+                               by the server
+cmd_get               32u      Cumulative number of retrieval requests
+cmd_set               32u      Cumulative number of storage requests
+get_hits              32u      Number of keys that have been requested and
+                               found present
+get_misses            32u      Number of items that have been requested
+                               and not found
+bytes_read            64u      Total number of bytes read by this server
+                               from network
+bytes_written         64u      Total number of bytes sent by this server to
+                               network
+limit_maxbytes        32u      Number of bytes this server is allowed to
+                               use for storage.
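Because the value types in the table differ (including the "32u:32u" pairs and the version string), a client can keep values as text and convert them per statistic. find_stat() below is a hypothetical helper that searches a complete "stats" response, already read into a NUL-terminated string, for one name.

```c
#include <string.h>

/*
 * Copy the value of statistic `name` from `resp` ("STAT <name> <value>\r\n"
 * lines ending with "END\r\n") into `value`, truncating if needed.
 * Returns 1 if found, 0 if absent or the response is malformed.
 */
int find_stat(const char *resp, const char *name, char *value, size_t valuelen) {
    const char *line = resp;
    size_t namelen = strlen(name);

    if (valuelen == 0)
        return 0;
    while (strncmp(line, "END\r\n", 5) != 0) {
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL || strncmp(line, "STAT ", 5) != 0)
            return 0;                            /* malformed response */
        const char *n = line + 5;
        if (strncmp(n, name, namelen) == 0 && n[namelen] == ' ') {
            const char *v = n + namelen + 1;     /* value runs up to "\r\n" */
            size_t len = (size_t)(eol - v);
            if (len >= valuelen)
                len = valuelen - 1;
            memcpy(value, v, len);
            value[len] = '\0';
            return 1;
        }
        line = eol + 2;
    }
    return 0;
}
```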
+
+
+
+Other commands
+--------------
+
+"flush_all" is a command with an optional numeric argument. It always
+succeeds, and the server sends "OK\r\n" in response. Its effect is to
+invalidate all existing items immediately (by default) or after the
+expiration specified. After invalidation none of the items will be returned
+in response to a retrieval command (unless it's stored again under the
+same key *after* flush_all has invalidated the items). flush_all doesn't
+actually free all the memory taken up by existing items; that will
+happen gradually as new items are stored. The most precise definition
+of what flush_all does is the following: it causes all items whose
+update time is earlier than the time at which flush_all was set to be
+executed to be ignored for retrieval purposes.
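The "most precise definition" above can be written as a predicate. This is a hypothetical model of the documented behaviour, not the server's flush_all code: flush_time stands for the moment the flush takes effect (0 if never issued), and an item counts as invalidated when its last update happened at or before that moment. The "at or before" comparison is an assumption that matches the old_it->time <= settings.oldest_live check in complete_nread() further down.

```c
#include <time.h>

/*
 * Returns 1 if an item last updated at `item_update_time` is still
 * retrievable at `now`, given a flush scheduled for `flush_time`.
 */
int visible_after_flush(time_t item_update_time, time_t flush_time, time_t now) {
    if (flush_time == 0 || flush_time > now)
        return 1;                          /* no flush yet, or delayed flush pending */
    return item_update_time > flush_time;  /* stored again after the flush */
}
```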
+
+"version" is a command with no arguments:
+
+version\r\n
+
+In response, the server sends
+
+"VERSION <version>\r\n", where <version> is the version string for the
+server.
+
+
+"quit" is a command with no arguments:
+
+quit\r\n
+
+Upon receiving this command, the server closes the
+connection. However, the client may also simply close the connection
+when it no longer needs it, without issuing this command.
299 items.c
@@ -0,0 +1,299 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/* $Id$ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/signal.h>
+#include <sys/resource.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <netinet/in.h>
+#include <errno.h>
+#include <time.h>
+#include <event.h>
+#include <assert.h>
+
+#include "memcached.h"
+
+
+/*
+ * NOTE: we assume here for simplicity that slab ids are < LARGEST_ID (32). That's true in
+ * the powers-of-2 implementation, but if that changes this should be changed too
+ */
+
+#define LARGEST_ID 32
+static item *heads[LARGEST_ID];
+static item *tails[LARGEST_ID];
+unsigned int sizes[LARGEST_ID];
+
+void item_init(void) {
+ int i;
+ for(i=0; i<LARGEST_ID; i++) {
+ heads[i]=0;
+ tails[i]=0;
+ sizes[i]=0;
+ }
+}
+
+
+item *item_alloc(char *key, int flags, time_t exptime, int nbytes) {
+ int ntotal, len;
+ item *it;
+ unsigned int id;
+
+ len = strlen(key) + 1; if(len % 4) len += 4 - (len % 4);
+ ntotal = sizeof(item) + len + nbytes;
+
+ id = slabs_clsid(ntotal);
+ if (id == 0)
+ return 0;
+
+ it = slabs_alloc(ntotal);
+ if (it == 0) {
+ int tries = 50;
+ item *search;
+
+ /* If requested to not push old items out of cache when memory runs out,
+ * we're out of luck at this point...
+ */
+
+ if (!settings.evict_to_free) return 0;
+
+ /*
+ * try to get one off the right LRU
+ * don't necessarily unlink the tail because it may be locked: refcount>0
+ * search up from tail an item with refcount==0 and unlink it; give up after 50
+ * tries
+ */
+
+ if (id >= LARGEST_ID) return 0;
+ if (tails[id]==0) return 0;
+
+ for (search = tails[id]; tries>0 && search; tries--, search=search->prev) {
+ if (search->refcount==0) {
+ item_unlink(search);
+ break;
+ }
+ }
+ it = slabs_alloc(ntotal);
+ if (it==0) return 0;
+ }
+
+ assert(it->slabs_clsid == 0);
+
+ it->slabs_clsid = id;
+
+ assert(it != heads[it->slabs_clsid]);
+
+ it->next = it->prev = it->h_next = 0;
+ it->refcount = 0;
+ it->it_flags = 0;
+ it->nkey = len;
+ it->nbytes = nbytes;
+ strcpy(ITEM_key(it), key);
+ it->exptime = exptime;
+ it->flags = flags;
+ return it;
+}
+
+void item_free(item *it) {
+ unsigned int ntotal = ITEM_ntotal(it);
+ assert((it->it_flags & ITEM_LINKED) == 0);
+ assert(it != heads[it->slabs_clsid]);
+ assert(it != tails[it->slabs_clsid]);
+ assert(it->refcount == 0);
+
+ /* so slab size changer can tell later if item is already free or not */
+ it->slabs_clsid = 0;
+ it->it_flags |= ITEM_SLABBED;
+ slabs_free(it, ntotal);
+}
+
+void item_link_q(item *it) { /* item is the new head */
+ item **head, **tail;
+ assert(it->slabs_clsid < LARGEST_ID);
+ assert((it->it_flags & ITEM_SLABBED) == 0);
+
+ head = &heads[it->slabs_clsid];
+ tail = &tails[it->slabs_clsid];
+ assert(it != *head);
+ assert((*head && *tail) || (*head == 0 && *tail == 0));
+ it->prev = 0;
+ it->next = *head;
+ if (it->next) it->next->prev = it;
+ *head = it;
+ if (*tail == 0) *tail = it;
+ sizes[it->slabs_clsid]++;
+ return;
+}
+
+void item_unlink_q(item *it) {
+ item **head, **tail;
+ assert(it->slabs_clsid < LARGEST_ID);
+ head = &heads[it->slabs_clsid];
+ tail = &tails[it->slabs_clsid];
+
+ if (*head == it) {
+ assert(it->prev == 0);
+ *head = it->next;
+ }
+ if (*tail == it) {
+ assert(it->next == 0);
+ *tail = it->prev;
+ }
+ assert(it->next != it);
+ assert(it->prev != it);
+
+ if (it->next) it->next->prev = it->prev;
+ if (it->prev) it->prev->next = it->next;
+ sizes[it->slabs_clsid]--;
+ return;
+}
+
+int item_link(item *it) {
+ assert((it->it_flags & (ITEM_LINKED|ITEM_SLABBED)) == 0);
+ assert(it->nbytes < 1048576);
+ it->it_flags |= ITEM_LINKED;
+ it->time = time(0);
+ assoc_insert(ITEM_key(it), it);
+
+ stats.curr_bytes += ITEM_ntotal(it);
+ stats.curr_items += 1;
+ stats.total_items += 1;
+
+ item_link_q(it);
+
+ return 1;
+}
+
+void item_unlink(item *it) {
+ if (it->it_flags & ITEM_LINKED) {
+ it->it_flags &= ~ITEM_LINKED;
+ stats.curr_bytes -= ITEM_ntotal(it);
+ stats.curr_items -= 1;
+ assoc_delete(ITEM_key(it));
+ item_unlink_q(it);
+ }
+ if (it->refcount == 0) item_free(it);
+}
+
+void item_remove(item *it) {
+ assert((it->it_flags & ITEM_SLABBED) == 0);
+ if (it->refcount) it->refcount--;
+ assert((it->it_flags & ITEM_DELETED) == 0 || it->refcount);
+ if (it->refcount == 0 && (it->it_flags & ITEM_LINKED) == 0) {
+ item_free(it);
+ }
+}
+
+void item_update(item *it) {
+ assert((it->it_flags & ITEM_SLABBED) == 0);
+
+ item_unlink_q(it);
+ it->time = time(0);
+ item_link_q(it);
+}
+
+int item_replace(item *it, item *new_it) {
+ assert((it->it_flags & ITEM_SLABBED) == 0);
+
+ item_unlink(it);
+ return item_link(new_it);
+}
+
+char *item_cachedump(unsigned int slabs_clsid, unsigned int limit, unsigned int *bytes) {
+
+ int memlimit = 2*1024*1024;
+ char *buffer;
+ int bufcurr;
+ item *it;
+ int len;
+ int shown = 0;
+ char temp[512];
+
+ if (slabs_clsid >= LARGEST_ID) return 0;
+ it = heads[slabs_clsid];
+
+ buffer = malloc(memlimit);
+ if (buffer == 0) return 0;
+ bufcurr = 0;
+
+ while (it && (!limit || shown < limit)) {
+ len = sprintf(temp, "ITEM %s [%u b; %lu s]\r\n", ITEM_key(it), it->nbytes - 2, (unsigned long) it->time);
+ if (bufcurr + len + 6 > memlimit) /* 6 is END\r\n\0 */
+ break;
+ strcpy(buffer + bufcurr, temp);
+ bufcurr+=len;
+ shown++;
+ it = it->next;
+ }
+
+ strcpy(buffer+bufcurr, "END\r\n");
+ bufcurr+=5;
+
+ *bytes = bufcurr;
+ return buffer;
+}
+
+void item_stats(char *buffer, int buflen) {
+ int i;
+ char *bufcurr = buffer;
+ time_t now = time(0);
+
+ if (buflen < 4096) {
+ strcpy(buffer, "SERVER_ERROR out of memory");
+ return;
+ }
+
+ for (i=0; i<LARGEST_ID; i++) {
+ if (tails[i])
+ bufcurr += sprintf(bufcurr, "STAT items:%u:number %u\r\nSTAT items:%u:age %lu\r\n",
+ i, sizes[i], i, (unsigned long) (now - tails[i]->time));
+ }
+ strcpy(bufcurr, "END");
+ return;
+}
+
+/* dumps out a list of objects of each size, with granularity of 32 bytes */
+char* item_stats_sizes(int *bytes) {
+ int num_buckets = 32768; /* max 1MB object, divided into 32 bytes size buckets */
+ unsigned int *histogram = (unsigned int *) malloc(num_buckets * sizeof(unsigned int));
+ char *buf = (char*) malloc(1024*1024*2*sizeof(char));
+ int i;
+
+ if (histogram == 0 || buf == 0) {
+ if (histogram) free(histogram);
+ if (buf) free(buf);
+ return 0;
+ }
+
+ /* build the histogram */
+ memset(histogram, 0, num_buckets * sizeof(int));
+ for (i=0; i<LARGEST_ID; i++) {
+ item *iter = heads[i];
+ while (iter) {
+ int ntotal = ITEM_ntotal(iter);
+ int bucket = ntotal / 32;
+ if (ntotal % 32) bucket++;
+ if (bucket < num_buckets) histogram[bucket]++;
+ iter = iter->next;
+ }
+ }
+
+ /* write the buffer */
+ *bytes = 0;
+ for (i=0; i<num_buckets; i++) {
+ if (histogram[i]) {
+ *bytes += sprintf(&buf[*bytes], "%u %u\r\n", i*32, histogram[i]);
+ }
+ }
+ *bytes += sprintf(&buf[*bytes], "END\r\n");
+
+ free(histogram);
+ return buf;
+}
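The eviction path in item_alloc() is the least obvious part of this file: when slabs_alloc() fails, the server scans the size class's LRU list from the tail toward the head for an item with refcount 0, unlinks it, and retries the allocation. The standalone sketch below isolates just the search; struct lru_item and find_evictable() are hypothetical simplifications of the real item type, keeping only the list pointers and the refcount.

```c
#include <stddef.h>

/* Minimal stand-in for the server's item: only what the scan needs. */
struct lru_item {
    struct lru_item *prev, *next;   /* prev points toward the head (newer items) */
    int refcount;                   /* > 0 while a connection is using the item */
};

/*
 * Walk from the LRU tail (least recently used) toward the head and return
 * the first unreferenced item, giving up after `max_tries` items (the server
 * uses 50) so a long run of busy items cannot stall the allocation.
 */
struct lru_item *find_evictable(struct lru_item *tail, int max_tries) {
    struct lru_item *search;
    for (search = tail; max_tries > 0 && search; max_tries--, search = search->prev) {
        if (search->refcount == 0)
            return search;
    }
    return NULL;
}
```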
1,713 memcached.c
@@ -0,0 +1,1713 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * memcached - memory caching daemon
+ *
+ * http://www.danga.com/memcached/
+ *
+ * Copyright 2003 Danga Interactive, Inc. All rights reserved.
+ *
+ * Use and distribution licensed under the BSD license. See
+ * the LICENSE file for full text.
+ *
+ * Authors:
+ * Anatoly Vorobey <mellon@pobox.com>
+ * Brad Fitzpatrick <brad@danga.com>
+ *
+ * $Id$
+ */
+
+#include "config.h"
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/signal.h>
+#include <sys/resource.h>
+/* some POSIX systems need the following definition
+ * to get mlockall flags out of sys/mman.h. */
+#ifndef _P1003_1B_VISIBLE
+#define _P1003_1B_VISIBLE
+#endif
+#include <pwd.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <netinet/in.h>
+#include <netinet/tcp.h>
+#include <arpa/inet.h>
+#include <errno.h>
+#include <time.h>
+#include <event.h>
+#include <assert.h>
+
+#ifdef HAVE_MALLOC_H
+#include <malloc.h>
+#endif
+
+#include "memcached.h"
+
+struct stats stats;
+struct settings settings;
+
+static item **todelete = 0;
+static int delcurr;
+static int deltotal;
+
+int *buckets = 0; /* bucket->generation array for a managed instance */
+
+time_t realtime(time_t exptime) {
+ time_t now;
+
+ /* no. of seconds in 30 days - largest possible delta exptime */
+ #define REALTIME_MAXDELTA (60*60*24*30)
+
+ if (exptime == 0) return 0; /* 0 means never expire */
+
+ if (exptime > REALTIME_MAXDELTA)
+ return exptime;
+ else {
+ now = time(0);
+ return exptime + now;
+ }
+}
+
+void stats_init(void) {
+ stats.curr_items = stats.total_items = stats.curr_conns = stats.total_conns = stats.conn_structs = 0;
+ stats.get_cmds = stats.set_cmds = stats.get_hits = stats.get_misses = 0;
+ stats.curr_bytes = stats.bytes_read = stats.bytes_written = 0;
+ stats.started = time(0);
+}
+
+void stats_reset(void) {
+ stats.total_items = stats.total_conns = 0;
+ stats.get_cmds = stats.set_cmds = stats.get_hits = stats.get_misses = 0;
+ stats.bytes_read = stats.bytes_written = 0;
+}
+
+void settings_init(void) {
+ settings.port = 11211;
+ settings.interface.s_addr = htonl(INADDR_ANY);
+ settings.maxbytes = 64*1024*1024; /* default is 64MB */
+ settings.maxconns = 1024; /* to limit connections-related memory to about 5MB */
+ settings.verbose = 0;
+ settings.oldest_live = 0;
+ settings.evict_to_free = 1; /* push old items out of cache when memory runs out */
+ settings.managed = 0;
+}
+
+conn **freeconns;
+int freetotal;
+int freecurr;
+
+void set_cork (conn *c, int val) {
+ if (c->is_corked == val) return;
+ c->is_corked = val;
+#ifdef TCP_NOPUSH
+ setsockopt(c->sfd, IPPROTO_TCP, TCP_NOPUSH, &val, sizeof(val));
+#endif
+}
+
+void conn_init(void) {
+ freetotal = 200;
+ freecurr = 0;
+ freeconns = (conn **)malloc(sizeof (conn *)*freetotal);
+ return;
+}
+
+conn *conn_new(int sfd, int init_state, int event_flags) {
+ conn *c;
+
+ /* do we have a free conn structure from a previous close? */
+ if (freecurr > 0) {
+ c = freeconns[--freecurr];
+ } else { /* allocate a new one */
+ if (!(c = (conn *)malloc(sizeof(conn)))) {
+ perror("malloc()");
+ return 0;
+ }
+ c->rbuf = c->wbuf = 0;
+ c->ilist = 0;
+
+ c->rbuf = (char *) malloc(DATA_BUFFER_SIZE);
+ c->wbuf = (char *) malloc(DATA_BUFFER_SIZE);
+ c->ilist = (item **) malloc(sizeof(item *)*200);
+
+ if (c->rbuf == 0 || c->wbuf == 0 || c->ilist == 0) {
+ if (c->rbuf != 0) free(c->rbuf);
+ if (c->wbuf != 0) free(c->wbuf);
+ if (c->ilist !=0) free(c->ilist);
+ free(c);
+ perror("malloc()");
+ return 0;
+ }
+ c->rsize = c->wsize = DATA_BUFFER_SIZE;
+ c->rcurr = c->rbuf;
+ c->isize = 200;
+ stats.conn_structs++;
+ }
+
+ if (settings.verbose > 1) {
+ if (init_state == conn_listening)
+ fprintf(stderr, "<%d server listening\n", sfd);
+ else
+ fprintf(stderr, "<%d new client connection\n", sfd);
+ }
+
+ c->sfd = sfd;
+ c->state = init_state;
+ c->rlbytes = 0;
+ c->rbytes = c->wbytes = 0;
+ c->wcurr = c->wbuf;
+ c->ritem = 0;
+ c->icurr = c->ilist;
+ c->ileft = 0;
+ c->iptr = c->ibuf;
+ c->ibytes = 0;
+
+ c->write_and_go = conn_read;
+ c->write_and_free = 0;
+ c->item = 0;
+ c->bucket = -1;
+ c->gen = 0;
+
+ c->is_corked = 0;
+
+ event_set(&c->event, sfd, event_flags, event_handler, (void *)c);
+ c->ev_flags = event_flags;
+
+ if (event_add(&c->event, 0) == -1) {
+ if (freecurr < freetotal) {
+ freeconns[freecurr++] = c;
+ } else {
+ free (c->rbuf);
+ free (c->wbuf);
+ free (c->ilist);
+ free (c);
+ }
+ return 0;
+ }
+
+ stats.curr_conns++;
+ stats.total_conns++;
+
+ return c;
+}
+
+void conn_close(conn *c) {
+ /* delete the event, the socket and the conn */
+ event_del(&c->event);
+
+ if (settings.verbose > 1)
+ fprintf(stderr, "<%d connection closed.\n", c->sfd);
+
+ close(c->sfd);
+
+ if (c->item) {
+ item_free(c->item);
+ }
+
+ if (c->ileft) {
+ for (; c->ileft > 0; c->ileft--,c->icurr++) {
+ item_remove(*(c->icurr));
+ }
+ }
+
+ if (c->write_and_free) {
+ free(c->write_and_free);
+ }
+
+ /* if we have enough space in the free connections array, put the structure there */
+ if (freecurr < freetotal) {
+ freeconns[freecurr++] = c;
+ } else {
+ /* try to enlarge free connections array */
+ conn **new_freeconns = realloc(freeconns, sizeof(conn *)*freetotal*2);
+ if (new_freeconns) {
+ freetotal *= 2;
+ freeconns = new_freeconns;
+ freeconns[freecurr++] = c;
+ } else {
+ free(c->rbuf);
+ free(c->wbuf);
+ free(c->ilist);
+ free(c);
+ }
+ }
+
+ stats.curr_conns--;
+
+ return;
+}
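conn_new() and conn_close() avoid repeatedly allocating connection structures by keeping closed ones in the freeconns array, which conn_close() tries to double with realloc() when it is full. The sketch below isolates that pattern with a hypothetical struct pooled_conn and pool_* functions; unlike the server, it allocates only the structure itself, not the read/write buffers or item list.

```c
#include <stdlib.h>

struct pooled_conn { int sfd; };           /* hypothetical stand-in for conn */

struct conn_pool {
    struct pooled_conn **items;
    int total;                             /* capacity of items[] */
    int curr;                              /* number of parked structures */
};

int pool_init(struct conn_pool *p, int initial) {
    p->items = malloc(sizeof(*p->items) * (size_t)initial);
    p->total = p->items ? initial : 0;
    p->curr = 0;
    return p->items != NULL;
}

/* Reuse a parked structure if there is one, otherwise allocate a new one. */
struct pooled_conn *pool_get(struct conn_pool *p) {
    if (p->curr > 0)
        return p->items[--p->curr];
    return malloc(sizeof(struct pooled_conn));
}

/* Park `c` for reuse, doubling the array when full; free `c` if that fails. */
void pool_put(struct conn_pool *p, struct pooled_conn *c) {
    if (p->curr == p->total) {
        int newtotal = p->total ? p->total * 2 : 1;
        struct pooled_conn **grown = realloc(p->items, sizeof(*grown) * (size_t)newtotal);
        if (grown == NULL) {
            free(c);
            return;
        }
        p->items = grown;
        p->total = newtotal;
    }
    p->items[p->curr++] = c;
}
```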
+
+void out_string(conn *c, char *str) {
+ int len;
+
+ if (settings.verbose > 1)
+ fprintf(stderr, ">%d %s\n", c->sfd, str);
+
+ len = strlen(str);
+ if (len + 3 > c->wsize) { /* room for "\r\n" and the terminating NUL */
+ /* ought to be always enough. just fail for simplicity */
+ str = "SERVER_ERROR output line too long";
+ len = strlen(str);
+ }
+
+ strcpy(c->wbuf, str);
+ strcat(c->wbuf, "\r\n");
+ c->wbytes = len + 2;
+ c->wcurr = c->wbuf;
+
+ c->state = conn_write;
+ c->write_and_go = conn_read;
+ return;
+}
+
+/*
+ * we get here after reading the value in set/add/replace commands. The command
+ * has been stored in c->item_comm, and the item is ready in c->item.
+ */
+
+void complete_nread(conn *c) {
+ item *it = c->item;
+ int comm = c->item_comm;
+ item *old_it;
+ time_t now = time(0);
+
+ stats.set_cmds++;
+
+ while(1) {
+ if (strncmp(ITEM_data(it) + it->nbytes - 2, "\r\n", 2) != 0) {
+ out_string(c, "CLIENT_ERROR bad data chunk");
+ break;
+ }
+
+ old_it = assoc_find(ITEM_key(it));
+
+ if (old_it && settings.oldest_live &&
+ old_it->time <= settings.oldest_live) {
+ item_unlink(old_it);
+ old_it = 0;
+ }
+
+ if (old_it && old_it->exptime && old_it->exptime < now) {
+ item_unlink(old_it);
+ old_it = 0;
+ }
+
+ if (old_it && comm==NREAD_ADD) {
+ item_update(old_it);
+ out_string(c, "NOT_STORED");
+ break;
+ }
+
+ if (!old_it && comm == NREAD_REPLACE) {
+ out_string(c, "NOT_STORED");
+ break;
+ }
+
+ if (old_it && (old_it->it_flags & ITEM_DELETED) && (comm == NREAD_REPLACE || comm == NREAD_ADD)) {
+ out_string(c, "NOT_STORED");
+ break;
+ }
+
+ if (old_it) {
+ item_replace(old_it, it);
+ } else item_link(it);
+
+ c->item = 0;
+ out_string(c, "STORED");
+ return;
+ }
+
+ item_free(it);
+ c->item = 0;
+ return;
+}
+
+void process_stat(conn *c, char *command) {
+ time_t now = time(0);
+
+ if (strcmp(command, "stats") == 0) {
+ char temp[1024];
+ pid_t pid = getpid();
+ char *pos = temp;
+ struct rusage usage;
+
+ getrusage(RUSAGE_SELF, &usage);
+
+ pos += sprintf(pos, "STAT pid %u\r\n", pid);
+ pos += sprintf(pos, "STAT uptime %lu\r\n", now - stats.started);
+ pos += sprintf(pos, "STAT time %ld\r\n", now);
+ pos += sprintf(pos, "STAT version " VERSION "\r\n");
+ pos += sprintf(pos, "STAT rusage_user %ld.%06ld\r\n", usage.ru_utime.tv_sec, usage.ru_utime.tv_usec);
+ pos += sprintf(pos, "STAT rusage_system %ld.%06ld\r\n", usage.ru_stime.tv_sec, usage.ru_stime.tv_usec);
+ pos += sprintf(pos, "STAT curr_items %u\r\n", stats.curr_items);
+ pos += sprintf(pos, "STAT total_items %u\r\n", stats.total_items);
+ pos += sprintf(pos, "STAT bytes %llu\r\n", stats.curr_bytes);
+ pos += sprintf(pos, "STAT curr_connections %u\r\n", stats.curr_conns - 1); /* ignore listening conn */
+ pos += sprintf(pos, "STAT total_connections %u\r\n", stats.total_conns);
+ pos += sprintf(pos, "STAT connection_structures %u\r\n", stats.conn_structs);
+ pos += sprintf(pos, "STAT cmd_get %u\r\n", stats.get_cmds);
+ pos += sprintf(pos, "STAT cmd_set %u\r\n", stats.set_cmds);
+ pos += sprintf(pos, "STAT get_hits %u\r\n", stats.get_hits);
+ pos += sprintf(pos, "STAT get_misses %u\r\n", stats.get_misses);
+ pos += sprintf(pos, "STAT bytes_read %llu\r\n", stats.bytes_read);
+ pos += sprintf(pos, "STAT bytes_written %llu\r\n", stats.bytes_written);
+ pos += sprintf(pos, "STAT limit_maxbytes %u\r\n", settings.maxbytes);
+ pos += sprintf(pos, "END");
+ out_string(c, temp);
+ return;
+ }
+
+ if (strcmp(command, "stats reset") == 0) {
+ stats_reset();
+ out_string(c, "RESET");
+ return;
+ }
+
+#ifdef HAVE_MALLOC_H
+#ifdef HAVE_STRUCT_MALLINFO
+ if (strcmp(command, "stats malloc") == 0) {
+ char temp[512];
+ struct mallinfo info;
+ char *pos = temp;
+
+ info = mallinfo();
+ pos += sprintf(pos, "STAT arena_size %d\r\n", info.arena);
+ pos += sprintf(pos, "STAT free_chunks %d\r\n", info.ordblks);
+ pos += sprintf(pos, "STAT fastbin_blocks %d\r\n", info.smblks);
+ pos += sprintf(pos, "STAT mmapped_regions %d\r\n", info.hblks);
+ pos += sprintf(pos, "STAT mmapped_space %d\r\n", info.hblkhd);
+ pos += sprintf(pos, "STAT max_total_alloc %d\r\n", info.usmblks);
+ pos += sprintf(pos, "STAT fastbin_space %d\r\n", info.fsmblks);
+ pos += sprintf(pos, "STAT total_alloc %d\r\n", info.uordblks);
+ pos += sprintf(pos, "STAT total_free %d\r\n", info.fordblks);
+ pos += sprintf(pos, "STAT releasable_space %d\r\nEND", info.keepcost);
+ out_string(c, temp);
+ return;
+ }
+#endif /* HAVE_STRUCT_MALLINFO */
+#endif /* HAVE_MALLOC_H */
+
+ if (strcmp(command, "stats maps") == 0) {
+ char *wbuf;
+ int wsize = 8192; /* should be enough */
+ int fd;
+ int res;
+
+ wbuf = (char *)malloc(wsize);
+ if (wbuf == 0) {
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+
+ fd = open("/proc/self/maps", O_RDONLY);
+ if (fd == -1) {
+ out_string(c, "SERVER_ERROR cannot open the maps file");
+ free(wbuf);
+ return;
+ }
+
+ res = read(fd, wbuf, wsize - 6); /* 6 = END\r\n\0 */
+ if (res == wsize - 6) {
+ out_string(c, "SERVER_ERROR buffer overflow");
+ free(wbuf); close(fd);
+ return;
+ }
+ if (res == 0 || res == -1) {
+ out_string(c, "SERVER_ERROR can't read the maps file");
+ free(wbuf); close(fd);
+ return;
+ }
+ strcpy(wbuf + res, "END\r\n");
+ c->write_and_free=wbuf;
+ c->wcurr=wbuf;
+ c->wbytes = res + 5; /* don't send the terminating NUL */
+ c->state = conn_write;
+ c->write_and_go = conn_read;
+ close(fd);
+ return;
+ }
+
+ if (strncmp(command, "stats cachedump", 15) == 0) {
+ char *buf;
+ unsigned int bytes, id, limit = 0;
+ char *start = command + 15;
+ if (sscanf(start, "%u %u\r\n", &id, &limit) < 1) {
+ out_string(c, "CLIENT_ERROR bad command line");
+ return;
+ }
+
+ buf = item_cachedump(id, limit, &bytes);
+ if (buf == 0) {
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+
+ c->write_and_free = buf;
+ c->wcurr = buf;
+ c->wbytes = bytes;
+ c->state = conn_write;
+ c->write_and_go = conn_read;
+ return;
+ }
+
+ if (strcmp(command, "stats slabs")==0) {
+ int bytes = 0;
+ char *buf = slabs_stats(&bytes);
+ if (!buf) {
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+ c->write_and_free = buf;
+ c->wcurr = buf;
+ c->wbytes = bytes;
+ c->state = conn_write;
+ c->write_and_go = conn_read;
+ return;
+ }
+
+ if (strcmp(command, "stats items")==0) {
+ char buffer[4096];
+ item_stats(buffer, 4096);
+ out_string(c, buffer);
+ return;
+ }
+
+ if (strcmp(command, "stats sizes")==0) {
+ int bytes = 0;
+ char *buf = item_stats_sizes(&bytes);
+ if (! buf) {
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+
+ c->write_and_free = buf;
+ c->wcurr = buf;
+ c->wbytes = bytes;
+ c->state = conn_write;
+ c->write_and_go = conn_read;
+ return;
+ }
+
+ out_string(c, "ERROR");
+}
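The `stats maps` branch above reserves 6 bytes at the end of its buffer for the `END\r\n` terminator plus the NUL that `strcpy` writes, but only the 5 protocol bytes belong on the wire. The sketch below isolates that framing step; `frame_with_end` is a hypothetical helper, not a memcached function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of memcached): append the protocol
 * terminator "END\r\n" after `len` bytes of payload already in `buf`.
 * `cap` is the total size of `buf`. Returns the number of bytes that
 * should be written to the socket, or 0 if the terminator plus its
 * NUL does not fit. */
static size_t frame_with_end(char *buf, size_t cap, size_t len) {
    static const char term[] = "END\r\n";   /* 5 chars + NUL = 6 bytes */

    if (len + sizeof(term) > cap)
        return 0;
    memcpy(buf + len, term, sizeof(term));  /* copies the NUL as well */
    return len + sizeof(term) - 1;          /* byte count excludes the NUL */
}
```

Keeping the NUL in the buffer lets the reply still be treated as a C string for logging, while the returned count is what a `wbytes`-style field would hold.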
+
+void process_command(conn *c, char *command) {
+
+ int comm = 0;
+ int incr = 0;
+
+ /*
+ * for commands set/add/replace, we build an item and read the data
+ * directly into it, then continue in nread_complete().
+ */
+
+ if (settings.verbose > 1)
+ fprintf(stderr, "<%d %s\n", c->sfd, command);
+
+ /* All incoming commands will require a response, so we cork at the beginning,
+ and uncork at the very end (usually by means of out_string) */
+ set_cork(c, 1);
+
+ if ((strncmp(command, "add ", 4) == 0 && (comm = NREAD_ADD)) ||
+ (strncmp(command, "set ", 4) == 0 && (comm = NREAD_SET)) ||
+ (strncmp(command, "replace ", 8) == 0 && (comm = NREAD_REPLACE))) {
+
+ char key[251];
+ int flags;
+ time_t expire;
+ int len, res;
+ item *it;
+
+ res = sscanf(command, "%*s %250s %u %ld %d\n", key, &flags, &expire, &len);
+ if (res!=4 || strlen(key)==0 ) {
+ out_string(c, "CLIENT_ERROR bad command line format");
+ return;
+ }
+
+ if (settings.managed) {
+ int bucket = c->bucket;
+ if (bucket == -1) {
+ out_string(c, "CLIENT_ERROR no BG data in managed mode");
+ return;
+ }
+ c->bucket = -1;
+ if (buckets[bucket] != c->gen) {
+ out_string(c, "ERROR_NOT_OWNER");
+ return;
+ }
+ }
+
+ expire = realtime(expire);
+ it = item_alloc(key, flags, expire, len+2);
+ if (it == 0) {
+ out_string(c, "SERVER_ERROR out of memory");
+ /* swallow the data line */
+ c->write_and_go = conn_swallow;
+ c->sbytes = len+2;
+ return;
+ }
+
+ c->item_comm = comm;
+ c->item = it;
+ c->ritem = ITEM_data(it);
+ c->rlbytes = it->nbytes;
+ c->state = conn_nread;
+ return;
+ }
+
+ if ((strncmp(command, "incr ", 5) == 0 && (incr = 1)) ||
+ (strncmp(command, "decr ", 5) == 0)) {
+ char temp[32];
+ unsigned int value;
+ item *it;
+ unsigned int delta;
+ char key[251];
+ int res;
+ char *ptr;
+ time_t now = time(0);
+
+ res = sscanf(command, "%*s %250s %u\n", key, &delta);
+ if (res!=2 || strlen(key)==0 ) {
+ out_string(c, "CLIENT_ERROR bad command line format");
+ return;
+ }
+
+ if (settings.managed) {
+ int bucket = c->bucket;
+ if (bucket == -1) {
+ out_string(c, "CLIENT_ERROR no BG data in managed mode");
+ return;
+ }
+ c->bucket = -1;
+ if (buckets[bucket] != c->gen) {
+ out_string(c, "ERROR_NOT_OWNER");
+ return;
+ }
+ }
+
+ it = assoc_find(key);
+ if (it && (it->it_flags & ITEM_DELETED)) {
+ it = 0;
+ }
+ if (it && it->exptime && it->exptime < now) {
+ item_unlink(it);
+ it = 0;
+ }
+
+ if (!it) {
+ out_string(c, "NOT_FOUND");
+ return;
+ }
+
+ ptr = ITEM_data(it);
+ while (*ptr && (*ptr<'0' || *ptr>'9')) ptr++; /* skip non-digits */
+
+ value = atoi(ptr);
+
+ if (incr)
+ value+=delta;
+ else {
+ if (delta >= value) value = 0;
+ else value-=delta;
+ }
+
+ sprintf(temp, "%u", value);
+ res = strlen(temp);
+ if (res + 2 > it->nbytes) { /* need to realloc */
+ item *new_it;
+ new_it = item_alloc(ITEM_key(it), it->flags, it->exptime, res + 2 );
+ if (new_it == 0) {
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+ memcpy(ITEM_data(new_it), temp, res);
+ memcpy(ITEM_data(new_it) + res, "\r\n", 2);
+ item_replace(it, new_it);
+ } else { /* replace in-place */
+ memcpy(ITEM_data(it), temp, res);
+ memset(ITEM_data(it) + res, ' ', it->nbytes-res-2);
+ }
+ out_string(c, temp);
+ return;
+ }
+
+ if (strncmp(command, "get ", 4) == 0) {
+
+ char *start = command + 4;
+ char key[251];
+ int next;
+ int i = 0;
+ item *it;
+ time_t now = time(0);
+
+ if (settings.managed) {
+ int bucket = c->bucket;
+ if (bucket == -1) {
+ out_string(c, "CLIENT_ERROR no BG data in managed mode");
+ return;
+ }
+ c->bucket = -1;
+ if (buckets[bucket] != c->gen) {
+ out_string(c, "ERROR_NOT_OWNER");
+ return;
+ }
+ }
+
+ while(sscanf(start, " %250s%n", key, &next) >= 1) {
+ start+=next;
+ stats.get_cmds++;
+ it = assoc_find(key);
+ if (it && (it->it_flags & ITEM_DELETED)) {
+ it = 0;
+ }
+ if (settings.oldest_live && settings.oldest_live <= now &&
+ it && it->time <= settings.oldest_live) {
+ item_unlink(it);
+ it = 0;
+ }
+ if (it && it->exptime && it->exptime < now) {
+ item_unlink(it);
+ it = 0;
+ }
+
+ if (it) {
+ if (i >= c->isize) {
+ item **new_list = realloc(c->ilist, sizeof(item *)*c->isize*2);
+ if (new_list) {
+ c->isize *= 2;
+ c->ilist = new_list;
+ } else break;
+ }
+ stats.get_hits++;
+ it->refcount++;
+ item_update(it);
+ *(c->ilist + i) = it;
+ i++;
+ } else stats.get_misses++;
+ }
+ c->icurr = c->ilist;
+ c->ileft = i;
+ if (c->ileft) {
+ c->ipart = 0;
+ c->state = conn_mwrite;
+ c->ibytes = 0;
+ return;
+ } else {
+ out_string(c, "END");
+ return;
+ }
+ }
+
+ if (strncmp(command, "delete ", 7) == 0) {
+ char key[251];
+ item *it;
+ int res;
+ time_t exptime = 0;
+
+ if (settings.managed) {
+ int bucket = c->bucket;
+ if (bucket == -1) {
+ out_string(c, "CLIENT_ERROR no BG data in managed mode");
+ return;
+ }
+ c->bucket = -1;
+ if (buckets[bucket] != c->gen) {
+ out_string(c, "ERROR_NOT_OWNER");
+ return;
+ }
+ }
+
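+ res = sscanf(command, "%*s %250s %ld", key, &exptime);
+ if (res < 1 || strlen(key) == 0) {
+ out_string(c, "CLIENT_ERROR bad command line format");
+ return;
+ }
+ it = assoc_find(key);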
+ if (!it) {
+ out_string(c, "NOT_FOUND");
+ return;
+ }
+
+ if (exptime == 0) {
+ item_unlink(it);
+ out_string(c, "DELETED");
+ return;
+ }
+
+ if (delcurr >= deltotal) {
+ item **new_delete = realloc(todelete, sizeof(item *) * deltotal * 2);
+ if (new_delete) {
+ todelete = new_delete;
+ deltotal *= 2;
+ } else {
+ /*
+ * can't delete it immediately, user wants a delay,
+ * but we ran out of memory for the delete queue
+ */
+ out_string(c, "SERVER_ERROR out of memory");
+ return;
+ }
+ }
+
+ exptime = realtime(exptime);
+
+ it->refcount++;
+ /* use its expiration time as its deletion time now */
+ it->exptime = exptime;
+ it->it_flags |= ITEM_DELETED;
+ todelete[delcurr++] = it;
+ out_string(c, "DELETED");
+ return;
+ }
+
+ if (strncmp(command, "own ", 4) == 0) {
+ int bucket, gen;
+ char *start = command+4;
+ if (!settings.managed) {
+ out_string(c, "CLIENT_ERROR not a managed instance");
+ return;
+ }
+ if (sscanf(start, "%d:%d\r\n", &bucket,&gen) == 2) {
+ if ((bucket < 0) || (bucket >= MAX_BUCKETS)) {
+ out_string(c, "CLIENT_ERROR bucket number out of range");
+ return;
+ }
+ buckets[bucket] = gen;
+ out_string(c, "OWNED");
+ return;
+ } else {
+ out_string(c, "CLIENT_ERROR bad format");
+ return;
+ }
+ }
+
+ if (strncmp(command, "disown ", 7) == 0) {
+ int bucket;
+ char *start = command+7;
+ if (!settings.managed) {
+ out_string(c, "CLIENT_ERROR not a managed instance");
+ return;
+ }
+ if (sscanf(start, "%d\r\n", &bucket) == 1) {
+ if ((bucket < 0) || (bucket >= MAX_BUCKETS)) {
+ out_string(c, "CLIENT_ERROR bucket number out of range");
+ return;
+ }
+ buckets[bucket] = 0;
+ out_string(c, "DISOWNED");
+ return;
+ } else {
+ out_string(c, "CLIENT_ERROR bad format");
+ return;
+ }
+ }
+
+ if (strncmp(command, "bg ", 3) == 0) {
+ int bucket, gen;
+ char *start = command+3;
+ if (!settings.managed) {
+ out_string(c, "CLIENT_ERROR not a managed instance");
+ return;
+ }
+ if (sscanf(start, "%d:%d\r\n", &bucket,&gen) == 2) {
+ /* we never write anything back, even if input's wrong */
+ if ((bucket < 0) || (bucket >= MAX_BUCKETS) || (gen<=0)) {
+ /* do nothing, bad input */
+ } else {
+ c->bucket = bucket;
+ c->gen = gen;
+ }
+ c->state = conn_read;
+ /* normally conn_write uncorks the connection, but this
+ is the only time we accept a command w/o writing anything */
+ set_cork(c,0);
+ return;
+ } else {
+ out_string(c, "CLIENT_ERROR bad format");
+ return;
+ }
+ }
+
+ if (strncmp(command, "stats", 5) == 0) {
+ process_stat(c, command);
+ return;
+ }
+
+ if (strncmp(command, "flush_all", 9) == 0) {
+ time_t exptime = 0;
+ int res;
+
+ if (strcmp(command, "flush_all") == 0) {
+ settings.oldest_live = time(0);
+ out_string(c, "OK");
+ return;
+ }
+
+ res = sscanf(command, "%*s %ld", &exptime);
+ if (res != 1) {
+ out_string(c, "ERROR");
+ return;
+ }
+
+ settings.oldest_live = realtime(exptime);
+ out_string(c, "OK");
+ return;
+ }
+
+ if (strcmp(command, "version") == 0) {
+ out_string(c, "VERSION " VERSION);
+ return;
+ }
+
+ if (strcmp(command, "quit") == 0) {
+ c->state = conn_closing;
+ return;
+ }
+
+ if (strncmp(command, "slabs reassign ", 15) == 0) {
+ int src, dst;
+ char *start = command+15;
+ if (sscanf(start, "%d %d\r\n", &src, &dst) == 2) {
+ int rv = slabs_reassign(src, dst);
+ if (rv == 1) {
+ out_string(c, "DONE");
+ return;
+ }
+ if (rv == 0) {
+ out_string(c, "CANT");
+ return;
+ }
+ if (rv == -1) {
+ out_string(c, "BUSY");
+ return;
+ }
+ }
+ out_string(c, "CLIENT_ERROR bogus command");
+ return;
+ }
+
+ out_string(c, "ERROR");
+ return;
+}
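The `incr`/`decr` branch skips any leading non-digit bytes in the stored value, parses it, then adds or subtracts the delta, clamping `decr` at zero rather than wrapping. A minimal sketch of just that arithmetic follows; `apply_delta` is a hypothetical name, and it uses `strtoul` instead of the original `atoi` (an assumption on our part) so stored values above `INT_MAX` parse as unsigned.

```c
#include <stdlib.h>

/* Hypothetical sketch of the incr/decr arithmetic: skip leading
 * non-digit bytes, parse the decimal value, then apply `delta`.
 * Decrement clamps at 0; increment wraps on overflow, as unsigned
 * arithmetic does in C. */
static unsigned int apply_delta(const char *data, unsigned int delta, int incr) {
    const char *ptr = data;
    unsigned int value;

    while (*ptr && (*ptr < '0' || *ptr > '9'))  /* note ||, not && */
        ptr++;
    value = (unsigned int)strtoul(ptr, NULL, 10); /* stops at '\r' or ' ' */

    if (incr)
        return value + delta;
    return delta >= value ? 0 : value - delta;  /* never go below zero */
}
```

For example, a stored value of `"10\r\n"` with `decr 15` yields 0, matching the clamping in the server code. The trailing-space padding written by the in-place rewrite is harmless here because parsing stops at the first non-digit.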
+
+/*
+ * if we have a complete line in the buffer, process it.
+ */
+int try_read_command(conn *c) {
+ char *el, *cont;
+
+ if (!c->rbytes)
+ return 0;
+ el = memchr(c->rcurr, '\n', c->rbytes);
+ if (!el)
+ return 0;
+ cont = el + 1;
+ if (el - c->rcurr > 1 && *(el - 1) == '\r') {
+ el--;
+ }
+ *el = '\0';
+
+ process_command(c, c->rcurr);
+
+ c->rbytes -= (cont - c->rcurr);
+ c->rcurr = cont;
+
+ return 1;
+}
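`try_read_command()` treats the input as newline-delimited: it looks for `\n`, drops an immediately preceding `\r`, NUL-terminates the line in place, and advances past the newline. The sketch below reproduces that framing in isolation; `take_line` and its `consumed` out-parameter are hypothetical, not part of memcached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring try_read_command(): if `buf` holds a
 * complete line, NUL-terminate it in place (dropping an optional '\r'
 * before the '\n'), store in *consumed how many bytes the line used
 * including the newline, and return the start of the line. Returns
 * NULL when no '\n' has arrived yet. */
static char *take_line(char *buf, size_t len, size_t *consumed) {
    char *el = memchr(buf, '\n', len);
    char *cont;

    if (!el)
        return NULL;                    /* incomplete line: wait for more */
    cont = el + 1;                      /* next command starts here */
    if (el - buf > 1 && el[-1] == '\r') /* same test as the server */
        el--;
    *el = '\0';
    *consumed = (size_t)(cont - buf);
    return buf;
}
```

Because the terminator is written in place, no copy is made; the caller only adjusts its cursor and remaining-byte count by `*consumed`, just as the server does with `rcurr` and `rbytes`.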
+
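+/*
+ * Read from the network as much as we can, growing the buffer as needed
+ * and handling connection close.
+ * Before reading, move the remaining incomplete fragment of a command
+ * (if any) to the beginning of the buffer.
+ * Returns 1 if any data was read or the connection state changed (closed,
+ * or out of memory); returns 0 if nothing was available or read() failed
+ * with an error other than EAGAIN/EWOULDBLOCK.
+ */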
+int try_read_network(conn *c) {
+ int gotdata = 0;
+ int res;
+
+ if (c->rcurr != c->rbuf) {
+ if (c->rbytes != 0) /* otherwise there's nothing to copy */
+ memmove(c->rbuf, c->rcurr, c->rbytes);
+ c->rcurr = c->rbuf;
+ }
+
+ while (1) {
+ if (c->rbytes >= c->rsize) {
+ char *new_rbuf = realloc(c->rbuf, c->rsize*2);
+ if (!new_rbuf) {
+ if (settings.verbose > 0)
+ fprintf(stderr, "Couldn't realloc input buffer\n");
+ c->rbytes = 0; /* ignore what we read */
+ out_string(c, "SERVER_ERROR out of memory");
+ c->write_and_go = conn_closing;
+ return 1;
+ }
+ c->rbuf = new_rbuf; c->rsize *= 2;
+ }
+ res = read(c->sfd, c->rbuf + c->rbytes, c->rsize - c->rbytes);
+ if (res > 0) {
+ stats.bytes_read += res;
+ gotdata = 1;
+ c->rbytes += res;
+ continue;
+ }
+ if (res == 0) {
+ /* connection closed */
+ c->state = conn_closing;
+ return 1;
+ }
+ if (res == -1) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) break;
+ else return 0;
+ }
+ }
+ return gotdata;
+}
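`try_read_network()` manages its input buffer in two steps: it first slides any unprocessed fragment to the front with `memmove` (the regions may overlap), then doubles the allocation whenever the buffer is full. The sketch below separates those two steps; the `rbuf_t` type and both function names are hypothetical stand-ins for the `rbuf`/`rcurr`/`rbytes`/`rsize` fields on the connection.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-buffer type modelled on the rbuf/rcurr/rbytes/rsize
 * fields used by try_read_network(); not memcached's actual struct. */
typedef struct {
    char  *buf;    /* start of allocation */
    char  *curr;   /* first unprocessed byte */
    size_t bytes;  /* unprocessed bytes starting at curr */
    size_t size;   /* allocation size */
} rbuf_t;

/* Move any leftover partial command to the front of the buffer. */
static void rbuf_compact(rbuf_t *r) {
    if (r->curr != r->buf) {
        if (r->bytes != 0)
            memmove(r->buf, r->curr, r->bytes);  /* regions may overlap */
        r->curr = r->buf;
    }
}

/* Double the buffer when it is full. Assumes rbuf_compact() already ran,
 * so curr == buf. Returns 0 on allocation failure, leaving the old
 * buffer intact (realloc does not free it on failure). */
static int rbuf_grow_if_full(rbuf_t *r) {
    char *nb;

    if (r->bytes < r->size)
        return 1;                      /* still room, nothing to do */
    nb = realloc(r->buf, r->size * 2);
    if (!nb)
        return 0;
    r->buf = r->curr = nb;
    r->size *= 2;
    return 1;
}
```

Doubling keeps the number of reallocations logarithmic in the longest command line seen, which is why the server can start each connection with a small buffer.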
+
+int update_event(conn *c, int new_flags) {
+ if (c->ev_flags == new_flags)
+ return 1;
+ if (