Skip to content
This repository

Dec 22, 2011

  1. ehohenstein

    fixing issue with re-connect logic

    There was an issue with the re-connect logic such that it would sometimes only partially apply a transaction. The redis_check_eof() function in library.c was automatically and invisibly re-connecting to the redis server when php_stream_eof() returned non-zero. If this happened to happen after a transaction had been partially stored, the server was rolling back the transaction but the phpredis library was continuing as if it was still applying the transaction so the remaining commands were being applied immediately. The result was that the portion of the transaction after the re-connect was applied (although not atomically) but the portion before the reconnect was thrown away by the server.
    
    This change causes the phpredis library to fail (by throwing a RedisException) instead of reconnecting if the client session is in multi mode or it is watching any keys. Either of these conditions indicate that the server has lost the state of the session and the client needs to rebuild it. Because transactions can fail due to watched keys having been changed, clients that use transactions should already be coded to handle transaction failures using retry logic. With this change, clients that use transactions when talking to redis servers that use connection timeout, they will have to add a try...catch within the transaction retry loop to handle this type of failure. Clients that do not use transactions and clients that use transactions with redis servers that do not use connection timeout will be unaffected.
    
    Perhaps not coincidentally, this change also fixes a well known but previously not well understood bug that causes RedisExceptions to be thrown with a message like  "protocol error, got '*' as reply type byte". My company began integrating redis into our service that handles 500,000 daily active users in April of this year. Prior to this fix, when we had our redis server connection timeout set to 5 seconds we would get hundreds of those error messages showing up in our php error logs per day. This wasn't a huge problem but it was concerning enough to raise the connection timeout to 30 seconds which reduced the rate of the errors but didn't eliminate them. At one point we had a process that was failing repeatedly with this error with a 30 second timeout so we had to raise the timeout all the way up to 300 seconds. A long timeout is not idea for a high throughput website like ours since it can cause failures to cascade from one system to another due to resource limits and requests holding on to connections for a long time. After introducing this fix roughly a month ago we have not had a single instance of RedisException show up in our php error logs (except for legitimate errors related to the server being down) even after lowering the server timeout back down to 5 seconds.
    
    To reproduce the problem without this fix, set the redis server timeout configuration to 3 seconds and use the following test php script:
    
    <?php
    
    define('MAX_FAILURES', 1);
    
    function get_redis() {
        $r = new Redis();
        $r->connect('localhost');
        return $r;
    }
    
    $r = get_redis();
    
    $r->set('foo', '123');
    $r->set('bar', 'abc');
    
    if (isset($_GET['trans']) && $_GET['trans']) {
        $completed = false;
        $failures = 0;
        while (!$completed && ($failures < MAX_FAILURES)) {
            try {
                $trans = $r->multi();
                $trans->set('foo', $_GET['foo']);
                if (isset($_GET['sleep']) && $_GET['sleep']) {
                    sleep($_GET['sleep']);
                }
                $trans->set('bar', $_GET['bar']);
                var_export($trans->exec());
                echo '<br/>';
                $completed = true;
            } catch (RedisException $e) {
                echo 'transaction failed<br/>';
                $failures++;
                $r = get_redis();
            }
        }
    } else {
        $r->set('foo', $_GET['foo']);
        if (isset($_GET['sleep']) && $_GET['sleep']) {
            sleep($_GET['sleep']);
        }
        $r->set('bar', $_GET['bar']);
    }
    
    echo $r->get('foo');
    echo '<br/>';
    echo $r->get('bar');
    
    ?>
    
    ****************************
    *** Results without this fix
    ****************************
    
    foo=bar&bar=baz&trans=0&sleep=0
    bar
    baz
    
    foo=bar&bar=baz&trans=1&sleep=0
    array ( 0 => true, 1 => true, )
    bar
    baz
    
    foo=bar&bar=baz&trans=0&sleep=30
    bar
    baz
    
    foo=bar&bar=baz&trans=1&sleep=30
    NULL
    123
    baz
    
    Notice in this last example the call to exec() did not return anything and the value of the key 'bar' was modified by the transaction but the value of the key 'foo' was not even though the calls to set() on both keys were made between a call to multi() and a call to exec().
    
    *************************
    *** Results with this fix
    *************************
    
    foo=bar&bar=baz&trans=0&sleep=0
    bar
    baz
    
    foo=bar&bar=baz&trans=1&sleep=0
    array ( 0 => true, 1 => true, )
    bar
    baz
    
    foo=bar&bar=baz&trans=0&sleep=30
    bar
    baz
    
    foo=bar&bar=baz&trans=1&sleep=30
    transaction failed
    123
    abc
    
    Notice in the last example where the transaction failed message is printed, it is necessary to explicitly reconnect to the redis server. Trying to reuse the same redis object after it has failed to reconnect will result in a segmentation fault. I believe this was an existing problem with the phpredis library and it is not addressed by this change.
    authored December 22, 2011
Something went wrong with that request. Please try again.