GC improvement #3165

Merged
merged 8 commits into php:master on Mar 2, 2018

Conversation

5 participants
@dstogov
Contributor

dstogov commented Mar 1, 2018

No description provided.

dstogov added some commits Mar 1, 2018

@nikic

Member

nikic commented Mar 1, 2018

I will review and run benchmarks on this today in the evening.

@nikic

Here are my benchmark results, comparing three different workloads between master (OLD) and this PR (NEW). The first workload currently completely trashes the GC implementation on master; the other two are more lightweight.

In these tables, the GC "disabled" rows mean GC was disabled at runtime, so roots are still collected.

// Very, very, very many objects
GC       |    OLD |   NEW
disabled |  1.32s | 1.50s
enabled  | 12.75s | 2.32s

// Very many objects
GC       |    OLD |   NEW
disabled |  0.87s | 0.87s
enabled  |  1.48s | 0.94s

// Less many objects
GC       |    OLD |   NEW
disabled |  1.65s | 1.62s
enabled  |  1.75s | 1.62s

The most interesting is probably the first result:

  • With GC enabled, this is still 1.5x slower than with GC disabled (previously it was nearly 10x slower). This is because the (linear) GC threshold backoff happens too slowly for this case. I think this is fine to start with.
  • With GC (runtime) disabled, we have a slowdown from the old to the new implementation (1.32s to 1.50s). perf shows that 10% of the time is spent inside gc_remove_compressed. So this workload happens to use enough objects that the compression scheme becomes a problem...

For the compression, I wonder if the current scheme that separates compressed & uncompressed addresses is really worthwhile. I would suggest always masking the address (something like this: https://gist.github.com/nikic/515dc2dfdc4912cee5c6e1fb17a4d276). This means that we always have to do a root->ref != ref check when removing a root, but on the other hand it saves a bunch of checks in other places and also resolves the gc_remove_compressed performance issue mentioned above. Looking at perf, always doing this check does not seem problematic.
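
For illustration, here is a minimal sketch of the always-mask idea described above. It is not the gist or the PR's code; GC_COMPRESSION_LIMIT, gc_buf and gc_slot_clear are placeholder names. An object stores a (possibly compressed) index into the root buffer, and removal always masks the slot's tagged pointer and probes forward until the back pointer matches.

#include <stdint.h>

typedef struct _zend_refcounted zend_refcounted;   /* opaque for this sketch */

typedef struct _gc_root_buffer {
    zend_refcounted *ref;            /* low GC_BITS are used as tag bits */
} gc_root_buffer;

#define GC_BITS              0x3
#define GC_GET_PTR(ptr)      ((zend_refcounted*)((uintptr_t)(ptr) & ~(uintptr_t)GC_BITS))
#define GC_COMPRESSION_LIMIT 0x10000   /* placeholder: largest index an object can store */

static gc_root_buffer *gc_buf;       /* stands in for GC_G(buf) */

static void gc_slot_clear(uint32_t idx) { gc_buf[idx].ref = 0; }   /* placeholder unlink */

/* 'stored' is the (possibly compressed) index kept in the object's GC info.
 * Several buffer slots can map to the same stored index, so the removal path
 * always compares the masked slot pointer against the object. */
static void gc_remove_root_masked(zend_refcounted *ref, uint32_t stored)
{
    uint32_t idx = stored;
    while (GC_GET_PTR(gc_buf[idx].ref) != ref) {
        idx += GC_COMPRESSION_LIMIT; /* probe the next slot that compresses to 'stored' */
    }
    gc_slot_clear(idx);
}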

In any case, I really like this implementation; dropping the linked lists makes it a lot nicer. It's a great improvement, let's land it!

/* bit stealing tags for gc_root_buffer.ref */
#define GC_BITS 0x3
#define GC_ROOT 0x0 /* poissible root of circular garbage */

@nikic

nikic Mar 1, 2018

Member

nit: poissible -> possible

}
}
if (GC_G(buf_size) < GC_BUF_GROW_STEP) {
new_size = GC_G(buf_size) *= 2;

@nikic

nikic Mar 1, 2018

Member

nit: Can be just * instead of *=, as GC_G(buf_size) is assigned below.
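
As a side note, here is a hedged sketch of the growth policy this hunk implies; the else branch and the constant value are assumptions, not quoted from the patch. The idea is to double the buffer while it is small, then grow by a fixed step.

#include <stdint.h>

#define GC_BUF_GROW_STEP (128 * 1024)     /* placeholder value */

static uint32_t gc_next_buf_size(uint32_t buf_size)
{
    if (buf_size < GC_BUF_GROW_STEP) {
        return buf_size * 2;              /* per the nit: plain '*' is enough here */
    }
    return buf_size + GC_BUF_GROW_STEP;   /* assumed linear growth for large buffers */
}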

static void gc_adjust_threshold(int count)
{
uint32_t new_threshold;

@nikic

nikic Mar 1, 2018

Member

nit: Indentation
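
For context on the earlier remark that the linear threshold back-off adapts too slowly, here is a minimal illustration with placeholder constants; it is not the PR's gc_adjust_threshold. If a collection frees too little, a linear policy raises the threshold by one fixed step per run, while a multiplicative policy backs off in far fewer unproductive runs.

#include <stdint.h>

#define GC_THRESHOLD_DEFAULT 10001       /* placeholder values */
#define GC_THRESHOLD_STEP    10001
#define GC_THRESHOLD_MAX     1000000000

static uint32_t gc_threshold = GC_THRESHOLD_DEFAULT;

static void adjust_threshold_linear(uint32_t collected)
{
    /* Unproductive collection: raise the trigger point by a single step. */
    if (collected < GC_THRESHOLD_STEP
     && gc_threshold <= GC_THRESHOLD_MAX - GC_THRESHOLD_STEP) {
        gc_threshold += GC_THRESHOLD_STEP;
    }
}

static void adjust_threshold_multiplicative(uint32_t collected)
{
    /* Alternative back-off: doubling reaches a workload with millions of
     * roots after O(log n) unproductive collections instead of O(n). */
    if (collected < GC_THRESHOLD_STEP && gc_threshold <= GC_THRESHOLD_MAX / 2) {
        gc_threshold *= 2;
    }
}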

addr = GC_G(unused);
root = GC_G(buf) + addr;
ZEND_ASSERT(GC_IS_UNUSED(root->ref));
GC_G(unused) = (uint32_t)(uintptr_t)GC_GET_PTR(root->ref) / sizeof(void*);

@nikic

nikic Mar 1, 2018

Member

The GC_GET_PTR here can be dropped. At least for me the mask is not optimized away.
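
For readers following the data structure: the hunk above pops a slot from a free list that is threaded through the root buffer itself. Below is a hedged reconstruction of that encoding; the tag value and helper names are assumptions. An unused slot's ref field holds the next unused index scaled by sizeof(void*) plus a tag in the low bits, so the division already discards those bits, which is presumably why the extra GC_GET_PTR() mask is redundant.

#include <assert.h>
#include <stdint.h>

typedef struct _zend_refcounted zend_refcounted;

typedef struct _gc_root_buffer {
    zend_refcounted *ref;
} gc_root_buffer;

#define GC_UNUSED_TAG 0x2                 /* placeholder tag for an unused slot */

static gc_root_buffer buf[1024];          /* stands in for GC_G(buf) */
static uint32_t unused;                   /* stands in for GC_G(unused) */

/* Return slot 'idx' to the free list by encoding the current head into it. */
static void gc_slot_release(uint32_t idx)
{
    buf[idx].ref = (zend_refcounted*)((uintptr_t)unused * sizeof(void*) | GC_UNUSED_TAG);
    unused = idx;
}

/* Take a slot from the free list; mirrors the hunk above. */
static uint32_t gc_slot_take(void)
{
    uint32_t addr = unused;
    uintptr_t enc = (uintptr_t)buf[addr].ref;
    assert(enc & GC_UNUSED_TAG);          /* GC_IS_UNUSED() in the real code */
    /* Dividing by sizeof(void*) drops the low tag bits on its own, so the
     * additional pointer mask in the original hunk is not needed. */
    unused = (uint32_t)(enc / sizeof(void*));
    return addr;
}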

return addr;
}
static zend_always_inline void gc_ref_set_info(zend_refcounted *ref, uint32_t info)

@nikic

nikic Mar 1, 2018

Member

I think this function is no longer used.

gc_root_buffer *newRoot;
if (UNEXPECTED(CG(unclean_shutdown)) || UNEXPECTED(GC_G(gc_active))) {
if (UNEXPECTED(GC_G(gc_protected)) || UNEXPECTED(CG(unclean_shutdown))) {

@nikic

nikic Mar 1, 2018

Member

We could explicitly set gc_protected on unclean shutdown and save one check here.

@dstogov

dstogov Mar 1, 2018

Contributor

Right, I already thought about this. Let's do this after the merge.
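
A minimal sketch of the simplification being agreed on here; the flag and function names are placeholders for the real globals. The idea is to set the protected flag once during an unclean shutdown so the hot path only tests one condition.

#include <stdbool.h>

/* Stand-ins for CG(unclean_shutdown) and GC_G(gc_protected). */
static bool unclean_shutdown;
static bool gc_protected;

static void on_unclean_shutdown(void)
{
    unclean_shutdown = true;
    gc_protected = true;       /* set once here ... */
}

static bool gc_may_add_root(void)
{
    return !gc_protected;      /* ... so the hot path needs a single check */
}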

gc_remove_nested_data_from_buffer(current->ref, current);
n = GC_FIRST_REAL_ROOT;
current = GC_G(buf) + GC_FIRST_REAL_ROOT;
last = GC_G(buf) + GC_G(first_unused);

@nikic

nikic Mar 1, 2018

Member

last looks unused here.

dstogov added some commits Mar 1, 2018

@dstogov

Contributor

dstogov commented Mar 1, 2018

@nikic, thank you for the review and benchmarks. Really impressive :)

You may play with the GC_THRESHOLD adjustment after the merge.

I'll think about compression once again.
In general, I like your patch.
It didn't work out of the box (you forgot about GC_BITS).
We have to perform the (GC_GET_PTR(root->ref) == ref) check.
For me, your implementation is a bit "slower" (according to callgrind).

I hope I'll merge this by tomorrow evening.

while (idx < GC_G(first_unused)) {
gc_root_buffer *root = GC_G(buf) + idx;
if (root->ref == ref) {

@nikic

nikic Mar 2, 2018

Member

You're right, I missed the GC_GET_PTR(). I guess we need it on this line as well; it's just less likely to hit here.

@dstogov

dstogov Mar 2, 2018

Contributor

I've found a compromise that keeps good performance and adds a small overhead only for apps with big data sets: (GC_G(first_unused) >= GC_MAX_UNCOMPRESSED).
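
A hedged reading of this compromise, in contrast to the unconditional check sketched earlier; the constants and helpers below are placeholders, not the merged code. As long as GC_G(first_unused) stays below GC_MAX_UNCOMPRESSED, no index is ever compressed, so the removal path can skip the masked back-pointer probe entirely and only applications with very large root sets pay for it.

#include <stdint.h>

typedef struct _zend_refcounted zend_refcounted;

typedef struct _gc_root_buffer {
    zend_refcounted *ref;
} gc_root_buffer;

#define GC_BITS             0x3
#define GC_GET_PTR(ptr)     ((zend_refcounted*)((uintptr_t)(ptr) & ~(uintptr_t)GC_BITS))
#define GC_MAX_UNCOMPRESSED 0x10000      /* placeholder value */

static gc_root_buffer *buf;              /* stands in for GC_G(buf) */
static uint32_t first_unused;            /* stands in for GC_G(first_unused) */

static void gc_slot_clear(uint32_t idx) { buf[idx].ref = 0; }   /* placeholder unlink */

static void gc_remove_from_buffer_sketch(zend_refcounted *ref, uint32_t addr)
{
    if (first_unused >= GC_MAX_UNCOMPRESSED) {
        /* Only buffers that grew past the limit may hold compressed indexes,
         * so only they pay for the masked comparison and the probe loop. */
        while (GC_GET_PTR(buf[addr].ref) != ref) {
            addr += GC_MAX_UNCOMPRESSED;
        }
    }
    gc_slot_clear(addr);
}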

@php-pulls merged commit 06c6c63 into php:master on Mar 2, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr: The Travis CI build is in progress
@frederikbosch

Contributor

frederikbosch commented Mar 2, 2018

Amazing work!

@kelunik

Contributor

kelunik commented Mar 5, 2018

These recent GC changes cause segfaults, see https://bugs.php.net/bug.php?id=76050.

@dstogov

Contributor

dstogov commented Mar 5, 2018

Thanks for catching this.
I've reproduced it and understand the reason.
@nikic, could you also take a look at the bug report?
