broker/content-cache: throughput drop after conversion to zhashx #3591
What are the details of the test environment? I also wonder if it is reproducible with #3590 applied, as I didn't see anything like this in my testing. (However, given that the issue went away when you reverted #3589, you have a strong case that something is going on there.) To get to the bottom of this you may have to devise a way to capture perf tracing on the 1st and 10th runs and look at the difference in the case where there is a performance hit. I'll see if this is reproducible on my system.
I ran on catalyst, so not the most perfect environment, but over many runs a pattern was emerging. I was going to try to profile later today, basically using the same techniques I used in #3583.
Yeah, the pattern is concerning. Are you running on a login node or allocating a node for testing? If you have a node to yourself for testing, it is also handy to use
I've just been doing it on the login node, accepting the occasional performance blip. But when I saw the throughput drop into the teens, I knew something was wrong.
I think there is probably something racy, perhaps with an initialization or something like that. I do not get the horrible throughput drop every time I run.

Anyways, the perf diff output.
Looking into the profiling that was recorded:
Not clear what's going on. Perhaps cache entries are not getting cleared? So things are sticking around longer, thus
Adding some content cache stats to my runs. Short story: nothing looks extraordinarily different.

slow throughput case
normal throughput case (before the
Potential theory: at the top of the zhashx implementation

Most "traditional" hash tables I know of only increase in size when they are X% full. Skimming the code,

Could the performance randomness be related to this? The

It's hard to test because we can't get internals from the

Just a guess at the moment.
We should have the ideal case for the hash function, since sha1 keys should be very evenly distributed. Since the resize occurs inside

From your previous perf trace, I thought maybe it would be worth seeing if we could employ an LRU instead of the scanning done by the heartbeat-driven cache purge. That repeated scanning (even though it is limited with some heuristics) seems like a probable source of overhead. I have a PR started for this, though I have to pause it for today while I travel.
Eek! Just to make sure about the hash sizes, I modified czmq with a

using
using
even though the number of entries is always < 1M, we have a

Interestingly enough, even before the first 4096 jobs are done, the

Why this is happening is still a mystery to me, or if it's just dumb luck.

Edit: my czmq hack

```diff
diff --git a/src/zhash.c b/src/zhash.c
index 625ae20..86752b8 100644
--- a/src/zhash.c
+++ b/src/zhash.c
@@ -167,6 +167,9 @@ zhash_insert (zhash_t *self, const char *key, void *value)
     if (!new_items)
         return -1;
+    if (new_limit > 100000) {
+        fprintf (stderr, "new limit zhash: %ju\n", (uintmax_t)new_limit);
+    }
     // Move all items to the new hash table, rehashing to
     // take into account new hash table limit
     uint index;
diff --git a/src/zhashx.c b/src/zhashx.c
index 74485a4..37a9ddf 100644
--- a/src/zhashx.c
+++ b/src/zhashx.c
@@ -215,6 +215,10 @@ s_zhashx_rehash (zhashx_t *self, uint new_prime_index)
     if (!new_items)
         return -1;
+    if (new_limit > 100000) {
+        fprintf (stderr, "new limit: %ju\n", (uintmax_t)new_limit);
+    }
+
     // Move all items to the new hash table, rehashing to
     // take into account new hash table limit
     size_t index;
```
Awesome job zeroing in on that! I guess my next question is: is the growth caused by
Maybe it would be worth adding a printf at the two call sites of

Edit: curious, how did you isolate the content-cache hash from the other places where zhashx is being used in broker modules?
I'm zeroing in, but I actually think this is
Good point, I guess I didn't. I just assumed if the

I think your LRU idea will solve this issue regardless of sticking with

Edit: oops, didn't answer your other questions
Yup, already did that :-) The rehash was being called from the "chain limit" path and not the "load" one.
It does seem like they only increase the chain limit on the "load factor" path, not the "chain limit" one. Hmm.
Yup, this seems to fix things as well
I'll open up an issue in czmq to discuss possible solutions. I think the above, or simply making the default chain limit > 1, would be sufficient. Or maybe there's some other way too.
Excellent!!! Seems like this may help our memory footprint in other areas as well! |
Wow, amazing debugging Al!
Posted an issue zeromq/czmq#2173 and PR zeromq/czmq#2174.

But of course this fix isn't in any

Using the example above, ~500M vs ~1M hash buckets on the heap. The

Edit: perhaps worth mentioning, I had this run a bit ago. I think I hit enough unlucky collisions that basically things stopped working.
The 11th iteration (of 13) ran for 26 minutes before I ctrl+C'ed it. So there is a "worst case with bad luck" kind of case in this issue.
This led me to an obvious question. Is this

Other places don't worry me quite as much, but I don't know the code everywhere that well. Or perhaps it adds up? Having a

Going back to the
LD_PRELOADing my fixed-up

Granted, it's only one run and on a node I don't exclusively own, but it does seem the fixes help performance outside of the content-cache a smidge? Either way it's not huge, suggesting the issue doesn't affect other parts of flux-core too much.
Thoughts on pulling in just

This bug feels serious to me. Not sure if we can wait for it to filter down to the distros.

Edit: maybe we could grab just
Pulling in

If we copy & paste in the files, I think we're good? But we probably have to rename functions to ensure no symbol conflicts? So we have to go down the modified-file path?

Edit: ok, if my reading is correct, I think if we copy in files and only edit in those files, we're fine. We just can't copy any LGPL code into the files. Is that you guys' reading?
Yes, that's how I read it - the file needs to remain MPL2, but it can be included (and the source can be modified) in an (L)GPL project.
On renaming - I don't think we'll have symbol conflicts, since symbols in a statically linked internal library would take precedence over ones in an external dynamically linked library. We might run into trouble with include files though - or would we, if we don't change any function prototypes?

Edit: I may have forgotten yet again how that stuff works, so please don't take this comment too seriously.
I dunno, I feel like we're just askin for trouble :-) e.g. portability on systems we're not so familiar with. My preference would be to slightly adjust the function names. Perhaps just prefix all with an f? So like fzhashx_create()?
Started my

```c
/* This is a copy of the czmq zhashx library.
 *
 * Due to czmq issue #2173
 * (https://github.com/zeromq/czmq/issues/2173) performance in
 * flux-core was extremely poor at times and flux-core could not
 * wait for the OS distros to pick up the bug fix.
 *
 * This library is a verbatim copy with the following changes:
 *
 * - rename all "zhashx" to "fzhashx" to avoid symbol collisions.
 * - replace all calls to zmalloc() with calloc()
 * - add fzhashx_t typedef
 * - add #define for freen()
 * - remove zhashx_test() and all associated testing code
 * - remove functions using zframe_t (zhashx_unpack(), zhashx_pack(),
 *   fzhashx_unpack_own(), fzhashx_pack_own())
 * - remove all declarations of CZMQ_EXPORT
 * - adjust headers
 */
```

More changes than I would have liked, but not soooo bad.

https://github.com/chu11/flux-core/tree/issue3591_internal_zhashx
You might be able to do symbol renaming transparently in the copied header
with some clever preprocessor macros, instead of making changes by hand.
I had considered this, but thought it bug-prone. Like the next time we hit an issue in

But ... perhaps I was being overly conservative and it's probably fine. Let me give it a shot and see if it turns out ok.
So I tried to go with the technique suggested above: my copied-in library names everything

in files to do the mapping and limit all of the code changes. And when
Oh crud, I just realized something. Do we need to export
Good point - I opened flux-framework/flux-sched#821. I think it would probably be more appropriate for them to also pull in
Problem: To avoid bugs in most OS distros' versions of czmq, we want internal flux callers to use the internal "fzhashx" library instead of the czmq "zhashx" library whenever a "zhashx" is used. Solution: Include the file "zhashx_int.h" wherever zhashx data structures are used. Update Makefile.am files to add libczmqcontainers.la when appropriate. Fixes flux-framework#3591
Problem: To avoid bugs in most OS distros' versions of czmq, we want internal flux callers to use the internal "fzhashx" library instead of the czmq "zhashx" library whenever a "zhashx" is used. Solution: Include the file "czmq_containers.h" wherever zhashx data structures are used. Update Makefile.am files to add libczmqcontainers.la when appropriate. Fixes flux-framework#3591
While continuing to work on #3583, I got some strange performance results. After a lot of trial and error, it appears that PR #3589 commit a608e51 introduced something that at times (but not all the time) slows down throughput.
Here's an example; I ran the following 10 times:

```shell
src/cmd/flux start ./throughput_test_simulated.sh 4096 10
```
The script simply runs a number of jobs (4096) a number of times (10) and greps for just the final throughput result.
Here are the results of the run:
As you can see, the final throughput varies from as low as ~12/s to as high as ~65/s. I have gotten "good" runs as well, but didn't seem to see one amongst the 10 above.
When running against commit 3d068c7 (just a few commits earlier, before the converted use of zhashx), everything ran as expected (post PR #3585), with the final throughput always > 80/s.