GC improvement #3165
Conversation
I will review and run benchmarks on this today in the evening.
Here are my benchmark results, comparing three different workloads between master (OLD) and this PR (NEW). The first workload currently completely trashes the GC implementation on master, while the other two are more lightweight.
In the tables below, "disabled" means GC disabled at runtime, so roots are still collected.
**Very, very, very many objects**

| GC       | OLD    | NEW   |
|----------|--------|-------|
| disabled | 1.32s  | 1.50s |
| enabled  | 12.75s | 2.32s |

**Very many objects**

| GC       | OLD   | NEW   |
|----------|-------|-------|
| disabled | 0.87s | 0.87s |
| enabled  | 1.48s | 0.94s |

**Less many objects**

| GC       | OLD   | NEW   |
|----------|-------|-------|
| disabled | 1.65s | 1.62s |
| enabled  | 1.75s | 1.62s |
The most interesting is probably the first result:
- With GC enabled, this is still 1.5x slower than with GC disabled (previously it was nearly 10x slower). This is because the (linear) GC threshold backoff happens too slowly for this case (see the sketch after this list). I think this is fine to start with.
- With GC (runtime) disabled, we have a slowdown from the old to the new implementation (1.32s to 1.50s). perf shows that 10% of the time is spent inside `gc_remove_compressed`. So this workload happens to use enough objects that the compression scheme becomes a problem...
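For illustration, a minimal, self-contained sketch of a linear threshold backoff; the constant names and values below are made up and do not come from this PR:

```c
#include <stdint.h>

/* Illustrative only: raise the collection threshold by a fixed step
 * whenever a run frees little garbage, and step back towards the default
 * once runs become productive again. A linear step like this adapts
 * slowly for workloads that keep creating huge numbers of live objects. */
#define GC_THRESHOLD_DEFAULT 10000
#define GC_THRESHOLD_STEP    10000
#define GC_THRESHOLD_MAX     1000000000

static void adjust_threshold(uint32_t *threshold, uint32_t freed, uint32_t scanned)
{
	if (freed < scanned / 8) {
		/* Almost everything was still alive: back off linearly. */
		if (*threshold + GC_THRESHOLD_STEP <= GC_THRESHOLD_MAX) {
			*threshold += GC_THRESHOLD_STEP;
		}
	} else if (*threshold > GC_THRESHOLD_DEFAULT) {
		/* Collections pay off again: shrink towards the default. */
		*threshold -= GC_THRESHOLD_STEP;
	}
}
```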
For the compression, I wonder if the current scheme that separates compressed & uncompressed addresses is really worthwhile. I would suggest always masking the address (something like this: https://gist.github.com/nikic/515dc2dfdc4912cee5c6e1fb17a4d276). This means that we always have to do a `root->ref != ref` check when removing a root, but on the other hand it saves a bunch of checks in other places and also resolves the `gc_remove_compressed` performance issue mentioned above. Looking at perf, always doing this check does not seem problematic.
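For illustration, a rough sketch of the always-masked scheme (helper names here are made up; the gist above is the actual proposal): the stored address is always `idx % GC_MAX_UNCOMPRESSED`, and lookup probes upward in `GC_MAX_UNCOMPRESSED` steps until the slot whose `ref` matches:

```c
/* Sketch only, not the code from the gist. The stored address may alias
 * several buffer slots, so removal has to verify root->ref (with the
 * bit-stealing tags masked off) before using a slot. */
static zend_always_inline uint32_t gc_compress_addr_sketch(uint32_t idx)
{
	return idx % GC_MAX_UNCOMPRESSED;
}

static zend_always_inline gc_root_buffer *gc_find_root_sketch(zend_refcounted *ref, uint32_t idx)
{
	gc_root_buffer *root = GC_G(buf) + idx;

	/* This is the root->ref != ref check mentioned above. */
	while (GC_GET_PTR(root->ref) != ref) {
		idx += GC_MAX_UNCOMPRESSED;
		ZEND_ASSERT(idx < GC_G(first_unused));
		root = GC_G(buf) + idx;
	}
	return root;
}
```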
In any case, I really like this implementation, dropping the linked lists makes this a lot nicer. It's a great improvement, let's land it!
Zend/zend_gc.c (Outdated)

```c
/* bit stealing tags for gc_root_buffer.ref */
#define GC_BITS 0x3

#define GC_ROOT 0x0 /* poissible root of circular garbage */
```
nit: poissible -> possible
Zend/zend_gc.c (Outdated)

```c
	}
}
if (GC_G(buf_size) < GC_BUF_GROW_STEP) {
	new_size = GC_G(buf_size) *= 2;
```
nit: Can be just `*` instead of `*=`, as `GC_G(buf_size)` is assigned below.
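Spelled out (a reconstruction of the suggested change, not the committed code):

```c
/* Current: the *= also writes GC_G(buf_size), which is redundant because
 * GC_G(buf_size) is assigned again below (presumably from new_size). */
new_size = GC_G(buf_size) *= 2;

/* Suggested: plain multiplication, no side effect. */
new_size = GC_G(buf_size) * 2;
```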
Zend/zend_gc.c (Outdated)

```c

static void gc_adjust_threshold(int count)
{
uint32_t new_threshold;
```
nit: Indentation
Zend/zend_gc.c (Outdated)

```c
	addr = GC_G(unused);
	root = GC_G(buf) + addr;
	ZEND_ASSERT(GC_IS_UNUSED(root->ref));
	GC_G(unused) = (uint32_t)(uintptr_t)GC_GET_PTR(root->ref) / sizeof(void*);
```
The `GC_GET_PTR` here can be dropped. At least for me the mask is not optimized away.
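A sketch of the suggested simplification (assuming unused slots store a plain offset in `root->ref`):

```c
/* Sketch: the unused-slot chain stores a raw offset, so no tag mask needed. */
GC_G(unused) = (uint32_t)(uintptr_t)root->ref / sizeof(void*);
```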
Zend/zend_gc.c (Outdated)

```c
	return addr;
}

static zend_always_inline void gc_ref_set_info(zend_refcounted *ref, uint32_t info)
```
I think this function is no longer used.
```diff
 	gc_root_buffer *newRoot;
 
-	if (UNEXPECTED(CG(unclean_shutdown)) || UNEXPECTED(GC_G(gc_active))) {
+	if (UNEXPECTED(GC_G(gc_protected)) || UNEXPECTED(CG(unclean_shutdown))) {
```
We could explicitly set `gc_protected` on unclean shutdown and save one check here.
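A rough sketch of that idea; where exactly the flag would be set is an assumption:

```c
/* Somewhere in the unclean-shutdown path: protect the GC once... */
if (UNEXPECTED(CG(unclean_shutdown))) {
	GC_G(gc_protected) = 1;
}

/* ...so the per-root hot path needs only a single check: */
if (UNEXPECTED(GC_G(gc_protected))) {
	return;
}
```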
Right, I already thought about this. Let's do this after the merge.
Zend/zend_gc.c (Outdated)

```c
	gc_remove_nested_data_from_buffer(current->ref, current);
	n = GC_FIRST_REAL_ROOT;
	current = GC_G(buf) + GC_FIRST_REAL_ROOT;
	last = GC_G(buf) + GC_G(first_unused);
```
`last` looks unused here.
@nikic thank you for the review and benchmarks. Really impressive :) You may play with the `GC_THRESHOLD` adaptation after the merge. I'll think about the compression once again. I hope I'll merge this by tomorrow evening.
Zend/zend_gc.c (Outdated)

```c
	while (idx < GC_G(first_unused)) {
		gc_root_buffer *root = GC_G(buf) + idx;

		if (root->ref == ref) {
```
You're right, I missed the `GC_GET_PTR()`. I guess we need it in this line as well, it's just less likely to hit here.
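So the comparison would presumably become something like:

```c
/* Sketch: mask off the bit-stealing tags before comparing. */
if (GC_GET_PTR(root->ref) == ref) {
	/* ... */
}
```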
I've found a compromise that keeps good performance and adds only a small overhead for apps with big data sets - `(GC_G(first_unused) >= GC_MAX_UNCOMPRESSED)`.
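Presumably along these lines (a sketch, not the committed change):

```c
/* Sketch of the compromise: while the whole buffer still fits below
 * GC_MAX_UNCOMPRESSED, stored addresses are exact and lookup is direct;
 * only big data sets pay for the masked-address probing. */
if (EXPECTED(GC_G(first_unused) < GC_MAX_UNCOMPRESSED)) {
	root = GC_G(buf) + idx;                                      /* exact address */
} else {
	root = gc_find_root_sketch(ref, idx % GC_MAX_UNCOMPRESSED);  /* probe as sketched earlier */
}
```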
Amazing work!
These recent GC changes cause segfaults, see https://bugs.php.net/bug.php?id=76050.
Thanks for catching this.