Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagged response fastpath #687

Merged
merged 16 commits into from Jun 6, 2016

Conversation

Projects
None yet
3 participants
@garlick
Copy link
Member

garlick commented Jun 3, 2016

This PR replaces "matchtag blocks" with "matchtag groups" as discussed in #680. Basically the 32 bit matchtag is divided into the high order (12 bit) group part and low order (20 bit) part. When a flux_rpc_multi() call is made, instead of allocating a block of matchtags, it allocates a single "group matchtag", which has one or more high order bits set and low order bits zeroed.

The dispatcher and flux_recv() then simply match on the group tag, ignoring the lower order bits, rather than matching in a block (range).

Besides some cleanup/simplification, this allows the low order 20 bits to be "user defined" when a group matchtag is used. For example, flux_rpc_multi() stores the nodeid in there and then doesn't have to use a lookup table to map matchtag to nodeid.

The bsize field is dropped from struct flux_match.

The public matchtag API functions now look like this:

/* Flags for flux_matchtag_alloc()
 */
enum {
    FLUX_MATCHTAG_GROUP = 1,
};

/* Alloc/free matchtag for matched request/response.
 * This is mainly used internally by the rpc code.
 */
uint32_t flux_matchtag_alloc (flux_t h, int flags);
void flux_matchtag_free (flux_t h, uint32_t matchtag);
uint32_t flux_matchtag_avail (flux_t h, int flags);

Finally, a fastpath is implemented for tagged responses in the dispatcher, which should directly address high load in the broker when flushing the content cache to the backing store when there is high content store traffic.

Test were updated where necessary.

garlick added some commits Jun 2, 2016

libflux/message: flux_msg_cmp() matchtag groups
Drop the 'bsize' element from struct flux_match.

Change flux_msg_cmp_matchtag() to
- not match if route stack is non-empty (foreign matchtag domain)
- match on group bits only (ignore non-group bits) if group is nonzero
  Non-group bits are user-defined in this case.
- match on non-group bits if group bits are zero

Change flux_msg_cmp() to use flux_msg_cmp_matchtag() instead
of duplicating code.

Define for convenience:
  FLUX_MATCHTAG_GROUP_SHIFT
  FLUX_MATCHTAG_GROUP_MASK
libflux/tagpool: replace matchtag block with group
Replace "matchtag blocks" with matchtag groups.
There are now two Veb trees, one for regular tags, and
a smaller one for group tags.  The group tags are shifted <<20.

Replace tagpool_avail() with a getattr call, and change attribute
names.
libflux/handle: change matchtag API
flux_matchtag_alloc() now takes a 'flags' argument instead of a size.
The flags determine if a regular or group matchtag is allocated.

flux_matchtag_free() lost its 'size' argument.

flux_matchtag_avail() now accepts a 'flags' argument to select
the group pool if desired.

New flag:  FLUX_MATCHTAG_GROUP
libflux/rpc: rework for matchtag groups
Use matchtag groups instead of blocks.

When groups are used, the unused non-group bits contain the
destination nodeid, which allows us to dispense with the
nodemap array.

The above change limits nodeids used as targets of flux_rpc_multi()
to 1<<20.  If that becomes a problem in the future, we can use sequence
numbers and a mapping array to allow individual nodeids to exceed 1<<20,
within a nodeset size limit of 1<<20.

flux_rpc_multi() now honors the FLUX_RPC_NORESPONSE flag.
bindings/lua: adopt new matchtag API
Also, check return value of flux_matchtag_alloc() in l_flux_send().
bindings/python: adapt to struct flux_match change
struct flux_match has lost its bsize element.
libflux/dispatch: add fastpath for tagged responses
When a message watcher is registered for a tagged response,
use O(1) array lookup to map the tag in the response back to the
message handler rather the O(N) list traversal.

Start with 32 entries for single RPCs and 32 for group RPCs.
Expand that as needed.  Since the tagpool is LIFO, the array
size is a function of the maximum number of concurrent requests
in flight.

This should solve #680 where a large backlog of requests,
such as flush requests from content cache to backing store,
can cause the dispatcher to spend a *lot* of time walking that list.

This fastpath is not used in coproc mode.  Coproc mode is less
commonly used and it is more challenging to implement the fastpath
in that context.

@garlick garlick added the review label Jun 3, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 3, 2016

Coverage Status

Coverage decreased (-0.03%) to 74.723% when pulling 7c45eb5 on garlick:tagged_response_fastpath into 92c508c on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 3, 2016

This looks great to me! Is there some systematic way to test that the fastpath improvements here had an effect on large request backlog, and ensure we don't regress in the future? I know perf testing is near impossible in Travis, but I just thought I'd ask.

cmd/flux-content: add spam subcommand
This test uses the content store as a "sink" for requests and
allows response matching scalability to be explored, by fixing N
and varying M.
@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 3, 2016

Here's a little test, just to build confidence (not tackling travis though):

$ ./flux content spam
Usage: flux content spam N [M]
Store N random entries, keeping M requests in flight (default 1)
  -h, --help             Display this message.

This test uses the content store as a "sink" for requests. By fixing N and varying M in this test, we can probe response matching scalability in the client (flux-content). To cause the thrashing problem, we use a big M. Here are some numbers for master:

$ time flux content spam 65536 128

real    0m4.910s
user    0m1.384s
sys 0m0.360s
$ time flux content spam 65536 256

real    0m5.398s
user    0m2.168s
sys 0m0.316s
$ time flux content spam 65536 512

real    0m5.156s
user    0m3.580s
sys 0m0.220s
$ time flux content spam 65536 1024

real    0m7.078s
user    0m6.484s
sys 0m0.168s
$ time flux content spam 65536 2048

real    0m13.813s
user    0m13.040s
sys 0m0.208s
$ time flux content spam 65536 4096

real    0m26.982s
user    0m26.008s
sys 0m0.308s
$ time flux content spam 65536 8192

real    0m51.898s
user    0m51.016s
sys 0m0.328s

Same test on this branch:

$ time flux content spam 65536 128

real    0m4.964s
user    0m0.564s
sys 0m0.452s
$ time flux content spam 65536 256

real    0m4.866s
user    0m0.576s
sys 0m0.360s
$ time flux content spam 65536 512

real    0m4.814s
user    0m0.604s
sys 0m0.364s
$ time flux content spam 65536 1024

real    0m4.791s
user    0m0.628s
sys 0m0.312s
$ time flux content spam 65536 2048

real    0m4.821s
user    0m0.580s
sys 0m0.348s
$ time flux content spam 65536 4096

real    0m5.053s
user    0m0.588s
sys 0m0.280s
$ time flux content spam 65536 8192

real    0m5.172s
user    0m0.624s
sys 0m0.268s
$ time flux content spam 65536 16384

real    0m5.094s
user    0m0.624s
sys 0m0.248s
$ time flux content spam 65536 32768

real    0m4.930s
user    0m0.612s
sys 0m0.272s
$ time flux content spam 65536 65535

real    0m4.896s
user    0m0.596s
sys 0m0.320s

Essentially flat.

I did have to subtract one from M on that last test because 65536 is the size of the matchtag pool (I would get EBUSY there).

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 4, 2016

Coverage Status

Coverage decreased (-0.09%) to 74.658% when pulling 4a80ba5 on garlick:tagged_response_fastpath into 92c508c on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 4, 2016

Oh, that is really cool!!
On Jun 3, 2016 5:20 PM, "Coveralls" notifications@github.com wrote:

[image: Coverage Status] https://coveralls.io/builds/6460390

Coverage decreased (-0.09%) to 74.658% when pulling 4a80ba5
4a80ba5
on garlick:tagged_response_fastpath
into 92c508c
92c508c
on flux-framework:master
.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#687 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAtSUujVgXE1VKO4_-mFBLn6ODk8nnnBks5qIMTGgaJpZM4It2wR
.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 4, 2016

Thanks - at least that one case is helped here. Was hoping for more earlier!

Don't merge yet - I think I may as well drive that test from sharness at a smaller scale as it does test reallocation of the tag map in the handle (might bump up coverage a bit even if it's not very methodical)

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 4, 2016

Coverage Status

Coverage increased (+0.02%) to 74.768% when pulling f9c830f on garlick:tagged_response_fastpath into 92c508c on flux-framework:master.

test/content-cache: add spam test
Store blobs using async rpc, from rank 0, and from all ranks.

In addition to exercising the content store, this uses
the dispatcher's response matching fastpath, and will exceed
the initial tag map size and force it to be reallocated
several times.

@garlick garlick force-pushed the garlick:tagged_response_fastpath branch from f9c830f to ace0e33 Jun 5, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 5, 2016

Coverage Status

Coverage increased (+0.009%) to 74.76% when pulling ace0e33 on garlick:tagged_response_fastpath into 92c508c on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 6, 2016

Is this one ready to merge?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 6, 2016

OK, I guess it's ready. I'll separately propose an update to RFC 6 to cover this change.

@grondo grondo merged commit 71123f5 into flux-framework:master Jun 6, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.009%) to 74.76%
Details

@grondo grondo removed the review label Jun 6, 2016

@garlick garlick deleted the garlick:tagged_response_fastpath branch Jun 6, 2016

This was referenced Aug 2, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.