Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read/write out-of-order bug? #20

Closed
ithinkihaveacat opened this issue Mar 3, 2013 · 7 comments
Closed

Read/write out-of-order bug? #20

ithinkihaveacat opened this issue Mar 3, 2013 · 7 comments

Comments

@ithinkihaveacat
Copy link

There seems to be a problem with memjs not receiving responses from a (local) memcached server in the order in which they are sent. This can mean that the response to a GET can be confused with the response to a SET, and vice versa.

The read-write-ordering-bug branch of https://github.com/ithinkihaveacat/memjs/tree/read-write-ordering-bug illustrates this problem.

$ node example.js 
writing  <Buffer 80 00 00 03 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 41 41 41>
writing  <Buffer 80 01 00 03 08 00 00 00 00 00 00 0e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 42 42 42 5a 5a 5a>
response =  { header: 
   { magic: 129,
     opcode: 1,
     keyLength: 0,
     extrasLength: 0,
     dataType: 0,
     status: 0,
     totalBodyLength: 0,
     opaque: 0,
     cas: <Buffer 00 00 00 00 00 00 00 03> },
  key: <Buffer >,
  extras: <Buffer >,
  val: <Buffer > }
response =  { header: 
   { magic: 129,
     opcode: 0,
     keyLength: 0,
     extrasLength: 0,
     dataType: 0,
     status: 1,
     totalBodyLength: 9,
     opaque: 0,
     cas: <Buffer 00 00 00 00 00 00 00 00> },
  key: <Buffer >,
  extras: <Buffer >,
  val: <Buffer 4e 6f 74 20 66 6f 75 6e 64> }
MemJS SET: Key not found

The output shows:

  1. The GET request sent to the memcached server.
  2. The SET request sent to the memcached server.
  3. The response to the SET. (The opcode is 1--see section 3.3 of the spec.)
  4. The response to the GET.
  5. memjs incorrectly assuming that the response to GET ("Key not found") is actually the response to the SET.

I've reproduced this bug on both OS X (node 0.8.18, memcached 1.4.5) and Linux (node 0.8.18, memcached 1.4.7).

I'm a bit surprised no-one else has noticed this! Is everyone else using MemCachier, which doesn't respond out-of-order?

@alevy
Copy link
Member

alevy commented Mar 3, 2013

Interesting. OK. This is solvable by using the opaque field in the header. I'll get to it.

@alevy
Copy link
Member

alevy commented Mar 8, 2013

Aha! The problem is not with memcached responding out of order, but rather with memjs requesting out of order.

What's going on is that since the there is no connection yet (it's established lazily), the first operation (after enqueing the response and error) blocks on the connection to be established. Meanwhile, example.js continues on it's merry way and performs the second operation. Now the connection has been set, so it immediately proceeds to write the request to the socket. Only now do we switch back to the first operation. So the requests are sent out of order.

Nesting both operations within a higher level operation solves this:

client.get('dummy', function() {
  client.get('AAA');
  client.set('BBB', 'ZZZ');
  client.close(); // close really needs to be nested to ensure ordering, but at least on localhost this happens to work
});

In principal, opaque would solve this, but it turns out to be kind of a messy solution that I'd like to avoid if we can rely on ordering. I think this is more easily fixable by mucking with the connection code a bit (specifically sticking requests in a common buffer that's flushed upon the first request that sees a connected socket).

Thanks for finding this! It's a kind of subtle one, but very bad and important to fix.

@ithinkihaveacat
Copy link
Author

Ah yes, I guess memcached physically can't respond out of order, so it can't be a problem with out of order responses. And whilst memjs might use multiple memcached servers, each of them will have a separate queue, so any queue-based fix should work for them all. Thanks for looking at this. Look forward to the fix!

@ithinkihaveacat
Copy link
Author

@alevy Is this being worked on? Do you have any sort of idea how long it might take?

@alevy
Copy link
Member

alevy commented Apr 10, 2013

@ithinkihaveacat thanks for the nudge. My original (to actually write to the socket lazily) turned out to be complicated, but I did something dumber, but that I'm pretty sure works and is fairly clean. Check out the lazyconn branch or pull request #22.

If you can give this a test before i merge that would be awesome!

@ithinkihaveacat
Copy link
Author

Thanks! I had a bit of a quick test tonight, and it seems okay so far.

@ithinkihaveacat
Copy link
Author

Looks good to me! Thanks!

@alevy alevy closed this as completed May 22, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants