
GC improvement #3165

Merged (8 commits, Mar 2, 2018)

Conversation

@dstogov (Member) commented Mar 1, 2018

No description provided.

@nikic (Member) commented Mar 1, 2018

I will review this and run benchmarks on it this evening.

@nikic (Member) left a comment

Here are my benchmark results, comparing three different workloads between master (OLD) and this PR (NEW). The first workload currently completely trashes the GC implementation on master; the other two are more lightweight.

The GC "disabled" rows mean disabled at runtime, so roots are still collected.

// Very, very, very many objects
GC       |    OLD |   NEW
disabled |  1.32s | 1.50s
enabled  | 12.75s | 2.32s

// Very many objects
GC       |    OLD |   NEW
disabled |  0.87s | 0.87s
enabled  |  1.48s | 0.94s

// Less many objects
GC       |    OLD |   NEW
disabled |  1.65s | 1.62s
enabled  |  1.75s | 1.62s

The most interesting is probably the first result:

  • With GC enabled, this is still 1.5x slower than with GC disabled (previously it was nearly 10x slower). This is because the (linear) GC threshold backoff happens too slowly for this case (a sketch of the backoff follows this list). I think this is fine to start with.
  • With GC (runtime) disabled, we have a slowdown from the old to the new implementation (1.32s to 1.50s). perf shows that 10% of the time is spent inside gc_remove_compressed. So this workload happens to use enough objects that the compression scheme becomes a problem...
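
For illustration, here is a minimal sketch of the linear backoff idea (the constant names follow the PR's gc_adjust_threshold, but the logic is simplified and assumed here, not the actual patch):

    /* Sketch only: when a run collects little garbage, raise the
     * trigger threshold by a fixed step.  Linear growth means huge
     * workloads reduce the GC frequency only slowly, which explains
     * the remaining 1.5x gap above. */
    #include <stdint.h>

    #define GC_THRESHOLD_DEFAULT 10000
    #define GC_THRESHOLD_STEP    10000
    #define GC_THRESHOLD_MAX     1000000000

    static uint32_t gc_threshold = GC_THRESHOLD_DEFAULT;

    static void gc_adjust_threshold_sketch(int collected)
    {
        if (collected < 100) {
            /* unproductive run: back off linearly */
            if (gc_threshold <= GC_THRESHOLD_MAX - GC_THRESHOLD_STEP) {
                gc_threshold += GC_THRESHOLD_STEP;
            }
        } else if (gc_threshold > GC_THRESHOLD_DEFAULT) {
            /* productive run: move back toward the default */
            gc_threshold -= GC_THRESHOLD_STEP;
        }
    }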

For the compression, I wonder if the current scheme that separates compressed and uncompressed addresses is really worthwhile. I would suggest always masking the address (something like this: https://gist.github.com/nikic/515dc2dfdc4912cee5c6e1fb17a4d276). This means we always have to do a root->ref != ref check when removing a root, but on the other hand it saves a bunch of checks in other places and also resolves the gc_remove_compressed performance issue mentioned above. Looking at perf, always doing this check does not seem problematic.
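
A rough sketch of that always-mask removal path (simplified and assumed, not the gist's actual code; gc_remove_from_roots is a hypothetical helper):

    /* Sketch only: with addresses always stored modulo
     * GC_MAX_UNCOMPRESSED, removal probes every slot with the same
     * compressed value; the back-pointer comparison (masked through
     * GC_GET_PTR because of the tag bits) disambiguates collisions. */
    static void gc_remove_masked_sketch(zend_refcounted *ref, uint32_t addr)
    {
        uint32_t idx = addr % GC_MAX_UNCOMPRESSED;

        while (idx < GC_G(first_unused)) {
            gc_root_buffer *root = GC_G(buf) + idx;
            if (GC_GET_PTR(root->ref) == ref) {
                gc_remove_from_roots(root); /* hypothetical helper */
                return;
            }
            idx += GC_MAX_UNCOMPRESSED; /* next candidate slot */
        }
    }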

In any case, I really like this implementation; dropping the linked lists makes it a lot nicer. It's a great improvement, let's land it!

Zend/zend_gc.c Outdated
/* bit stealing tags for gc_root_buffer.ref */
#define GC_BITS 0x3

#define GC_ROOT 0x0 /* poissible root of circular garbage */
@nikic (Member):

nit: poissible -> possible

Zend/zend_gc.c Outdated
		}
	}
	if (GC_G(buf_size) < GC_BUF_GROW_STEP) {
		new_size = GC_G(buf_size) *= 2;
@nikic (Member):

nit: Can be just * instead of *=, as GC_G(buf_size) is assigned below.
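
That is, the suggested form (sketch):

    new_size = GC_G(buf_size) * 2; /* GC_G(buf_size) is assigned below anyway */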

Zend/zend_gc.c Outdated

static void gc_adjust_threshold(int count)
{
uint32_t new_threshold;
@nikic (Member):

nit: Indentation

Zend/zend_gc.c Outdated
addr = GC_G(unused);
root = GC_G(buf) + addr;
ZEND_ASSERT(GC_IS_UNUSED(root->ref));
GC_G(unused) = (uint32_t)(uintptr_t)GC_GET_PTR(root->ref) / sizeof(void*);
@nikic (Member):

The GC_GET_PTR here can be dropped. At least for me the mask is not optimized away.
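
Sketched, and assuming the unused list stores the next index as idx * sizeof(void*) with the tags in the low bits, the division already discards those bits, so the suggested form would be:

    /* Sketch: dividing by sizeof(void*) shifts away the low GC_BITS
     * tag bits anyway, so the explicit mask is redundant. */
    GC_G(unused) = (uint32_t)(uintptr_t)root->ref / sizeof(void*);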

Zend/zend_gc.c Outdated
	return addr;
}

static zend_always_inline void gc_ref_set_info(zend_refcounted *ref, uint32_t info)
@nikic (Member):

I think this function is no longer used.

	gc_root_buffer *newRoot;

-	if (UNEXPECTED(CG(unclean_shutdown)) || UNEXPECTED(GC_G(gc_active))) {
+	if (UNEXPECTED(GC_G(gc_protected)) || UNEXPECTED(CG(unclean_shutdown))) {
@nikic (Member):

We could explicitly set gc_protected on unclean shutdown and save one check here.

@dstogov (Member, Author):

Right, I already thought about this. Let's do this after the merge.
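
A sketch of what that follow-up might look like (illustrative names, not the merged change):

    /* Sketch only: if shutdown handling sets gc_protected itself,
     * the hot path needs a single flag test. */
    void gc_protect_on_unclean_shutdown_sketch(void)
    {
        GC_G(gc_protected) = 1;
    }

    /* ...after which the check quoted above reduces to: */
    /* if (UNEXPECTED(GC_G(gc_protected))) { return; } */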

Zend/zend_gc.c Outdated
gc_remove_nested_data_from_buffer(current->ref, current);
n = GC_FIRST_REAL_ROOT;
current = GC_G(buf) + GC_FIRST_REAL_ROOT;
last = GC_G(buf) + GC_G(first_unused);
@nikic (Member):

last looks unused here.

@dstogov (Member, Author) commented Mar 1, 2018

@nikic thank you for the review and benchmarks. Really impressive :)

You may play with GC_THRESHOLD adaptation after the merge.

I'll think about compression once again.
In general, I like your patch.
It didn't work out of the box (you forgot about GC_BITS).
We have to perform the (GC_GET_PTR(root->ref) == ref) check.
For me, your implementation is a bit "slower" (according to callgrind).

I hope to merge this by tomorrow evening.

Zend/zend_gc.c Outdated
while (idx < GC_G(first_unused)) {
	gc_root_buffer *root = GC_G(buf) + idx;

	if (root->ref == ref) {
@nikic (Member) commented Mar 2, 2018

You're right, I missed the GC_GET_PTR(). I guess we need it in this line as well, it's just less likely to hit here.
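
Sketched, the masked comparison would be:

    if (GC_GET_PTR(root->ref) == ref) { /* mask off the GC_BITS tags before comparing */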

@dstogov (Member, Author):

I've found a compromise that keeps good performance and adds a small overhead only for apps with big data sets: (GC_G(first_unused) >= GC_MAX_UNCOMPRESSED).
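
A sketch of how such a guard might look (illustrative, not the committed code):

    /* Sketch only: addresses can only be compressed once the buffer
     * outgrows the uncompressed range, so gate the probing slow path
     * on buffer size. */
    if (UNEXPECTED(GC_G(first_unused) >= GC_MAX_UNCOMPRESSED)) {
        /* slow path: the stored address may be compressed; probe
         * candidate slots and compare masked pointers */
    } else {
        /* fast path: the stored address is exact */
    }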

@frederikbosch (Contributor):

Amazing work!

@kelunik (Member) commented Mar 5, 2018

These recent GC changes cause segfaults, see https://bugs.php.net/bug.php?id=76050.

@dstogov (Member, Author) commented Mar 5, 2018

Thanks for catching this.
I've reproduced it and understood the reason.
@nikic, could you also take a look at the bug report?
