Multi-key GET Behavior #40

Closed
swashbucklin opened this Issue Apr 15, 2015 · 4 comments


@swashbucklin

I noticed that mcrouter, when given a multi-key get request (e.g. get key1 key2 key3\r\n), splits it into single-key get requests (get key1\r\n, get key2\r\n, get key3\r\n), even when all of the keys share the same route and hash to the same server in a pool.

example config:

{"pools":{"A":{"servers":["172.17.0.3:11211", "172.17.0.4:11211"]}},"route":"PoolRoute|A"}

If the keys abc and def exist on 172.17.0.3:11211, a get abc def\r\n against mcrouter results in:

<28 get abc
>28 sending key abc
>28 END
<28 get def
>28 sending key def
>28 END

when arguably it could have been optimized to:

<28 get abc def
>28 sending key abc
>28 sending key def
>28 END

In summary: for a multi-key get request, why not (1) parse the keys, (2) run the appropriate hashing function against each key, (3) combine keys that hash to the same server into a multi-key get request, (4) send the computed multi-key and single-key request(s), and finally (5) consolidate the responses to return to the caller?
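For illustration, here's a minimal Python sketch of that batching idea. The server list comes from the config above, hash_to_server is a hypothetical stand-in for whatever hash function the route actually uses, and response consolidation (step 5) is omitted:

```python
# Sketch of steps (1)-(4): group keys by the server they hash to, then build
# one multi-key request line per server. hash_to_server() is a hypothetical
# stand-in for the pool's real consistent-hash function.
import hashlib
from collections import defaultdict

SERVERS = ["172.17.0.3:11211", "172.17.0.4:11211"]

def hash_to_server(key: str) -> str:
    # Stand-in hash; the real route would use its configured hash function.
    digest = hashlib.md5(key.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

def batch_multiget(keys):
    # Steps (1)-(3): parse the keys and combine those that hash to the same server.
    by_server = defaultdict(list)
    for key in keys:
        by_server[hash_to_server(key)].append(key)
    # Step (4): one multi-key request per server. Step (5), merging the
    # VALUE/END responses back into a single reply, is omitted here.
    return {server: "get " + " ".join(ks) + "\r\n"
            for server, ks in by_server.items()}

print(batch_multiget(["key1", "key2", "key3"]))
```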

Is there a philosophical reason why this was not done? Or would this be a reasonable enhancement request?

Thanks!

@alikhtarov
Contributor

No reason except for simplicity. There's very little difference from the server's point of view between 'get a b\r\n' and 'get a\r\nget b\r\n', except for a few extra bytes to send each 'get' and 'value/end' separately, as long as we batch the requests into the same network packet, which we try to do.
In our use case we also very rarely see multigets that all end up going to the same server (we typically have a lot of fan-out), so there was no practical reason for this optimization either.
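For concreteness, the request-side difference for two keys can be counted directly (the response side similarly repeats the VALUE/END framing once per get):

```python
# Request-side byte counts for batched vs. split gets of two keys.
batched = b"get a b\r\n"
split = b"get a\r\nget b\r\n"
print(len(batched), len(split))  # 9 vs. 14 bytes
```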

@swashbucklin

To provide more color:

Our use case: a user has a set of objects, with the number of objects per user ranging anywhere from dozens to 100K. Some of our pages require fetching dozens of a user's objects, whereas other pages only need to fetch one.

Our original thought was to write a custom hashing function for the PoolRoute to guarantee that all of a user's objects would land on the same server in a pool, so that a multi-key GET to mcrouter for a set of the user's objects would result in only a single request to a single box. This has a few nice properties, one being that it avoids the incast congestion problem you folks mentioned in your whitepaper.

Were we boiling the ocean here (aka overthinking it)? Would we be better off just using the default consistent hashing and letting the multi-key get requests be split up?

@alikhtarov
Contributor

There's a very simple way to achieve storing a bunch of related keys on one server with mcrouter. If you structure a key like prefix|#|suffix, mcrouter will only use the prefix part for consistent hashing. So you can do stuff like user_id:123|#|data_id:456, and all of the different data keys for the same user will end up on one box.
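A minimal sketch of how that routing behaves, assuming a stand-in hash rather than mcrouter's actual one:

```python
# Hash-stop sketch: only the part of the key before '|#|' feeds the hash,
# so all suffixes under one prefix route to the same server.
import hashlib

SERVERS = ["172.17.0.3:11211", "172.17.0.4:11211"]
HASH_STOP = "|#|"

def route(key: str) -> str:
    routing_part = key.split(HASH_STOP, 1)[0]  # ignore everything after the hash stop
    digest = hashlib.md5(routing_part.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]

# Same prefix, different suffixes -> same box:
print(route("user_id:123|#|data_id:456"))
print(route("user_id:123|#|data_id:789"))
```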

This is a separate issue from using multigets. In this case mcrouter will still send individual gets to the server, but they will be batched into the minimal number of network packets possible, so it's not a big issue. It's literally the difference between sending get a b c\r\n vs. get a\r\nget b\r\nget c\r\n.

@swashbucklin

Really appreciate the quick replies, @alikhtarov!
