Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non recursive cyclic garbage collector #3889

Closed

Conversation

drealecs
Copy link

@drealecs drealecs commented Feb 27, 2019

This fixes https://bugs.php.net/bug.php?id=77345

Replace recursive calls in gc_mark_grey, gc_scan, gc_scan_black and gc_collect_white logic with using a simple stack.
Target version is 7.2.

So far it seems to work pretty well.
I'll wait for review.

I'll investigate what tests I can write. If you have something in mind different than the bug case (or a similar one), please share it.

Because with PHP 7.3 there are some changes in garbage collector, I maintained also a merged branch for it so that it will be easier when merging PHP-7.2 into PHP-7.3 and also to make sure it works: PHP-7.3...drealecs:non_recursive_cyclic_garbage_collector-7.3

@drealecs
Copy link
Author

Also, I thought about removing tail call optimization that were of higher importance when functions were recursive. This would simplify the code.
But in the end I thought that even if the optimization is small, it might worth more than the maintenance effort. Let me know what you think.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. From the implementation side I think the main thing that needs to be improved is a different vector-backed stack implementation, as the current one is going to hurt GC performance a lot, due to the large amount of allocation it requires.

I'm also a bit concerned that this is going to add the same ref multiple times onto the stack, though I'm not sure what the best way to avoid this is. One possibility (taking black marking as an example) would be to mark the node black before adding it to the stack and only performing the examination of its children afterwards.

Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
@nikic
Copy link
Member

nikic commented Feb 27, 2019

I think it would make sense to focus on the PHP 7.3 version of this, as I doubt we're going to land this on 7.2. Introducing major changes to GC this late in the release cycle would be quite adventurous.

@petk petk added the Bug label Mar 1, 2019
…e that many changes on return types of the changed functions

improved the stack altering functions so that we can migrate to a different type of stack with minimal changes
@drealecs
Copy link
Author

drealecs commented Mar 1, 2019

The only thing to be done is to use an array-based stack.
I'll try to find time for this over the weekend.

I think it would make sense to focus on the PHP 7.3 version of this, as I doubt we're going to land this on 7.2. Introducing major changes to GC this late in the release cycle would be quite adventurous.

When I started to investigate on the fix I had the same question whether the change is big and if I should do it for PHP-7.4 or PHP-7.2.
But than I though about it as being just another bugfix and, IMHO, I don't think PHP-7.2 and PHP-7.3 should be treated differently.

As it's not a big effort for me in terms of merges, I'll keep the changes on PHP-7.2 branch.
This way we can change our minds in the future.

Zend/zend_gc.c Outdated Show resolved Hide resolved
@drealecs
Copy link
Author

drealecs commented Mar 2, 2019

@nikic, I think I implemented all the problems and suggestions you mentioned except the one about the concern that a ref could be added multiple times to the stack.
Please review the changes again, when you have time.

To not add the same ref more than once to the stack, we could have a hashset that we can use to remember if the ref is in the stack, so we have something like a unique_stack :).
However, from what I saw so far, we don't need one as each of the functions that now uses a stack has a way of marking the ref.
I'll investigate more on this and how we can change it. It looks it can be done. I'm not sure what the performance improvement will be or if we should be happy with how performance is now.
I'm not really sure in real life this scenario is probable. Let me know what you think.

Because I'm used at my job to always have two code review approvals for any change, I'm going to ping also @dstogov for reviewing this PR as he is the top contributor for this file.

@drealecs
Copy link
Author

drealecs commented Mar 3, 2019

I managed to implement not adding the same ref multiple times:
7f309a8?w=1

What I changed was that the color changed before calling the function first, before tail recursion and before adding it to stack and not during processing.

Unfortunately, there are quite several white-space changes as some if branches needed to disappear. If you're ok with that I can push it here already.

@dstogov
Copy link
Member

dstogov commented Mar 4, 2019

@drealecs thanks for taking look into the problem.
For now, I took only a quick look over the patch.
I don't very like the idea of allocation (and reallocation) of additional memory during GC, but it's clear that GC depends on stack anyway.
I'll find a half of a day to play with the patch on this or next week.

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already looks much better from a perf perspective :) Your mechanism to prevent duplicate ref additions also makes sense to me.

Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
Zend/zend_gc.c Outdated Show resolved Hide resolved
@nikic
Copy link
Member

nikic commented Mar 4, 2019

When I started to investigate on the fix I had the same question whether the change is big and if I should do it for PHP-7.4 or PHP-7.2.
But than I though about it as being just another bugfix and, IMHO, I don't think PHP-7.2 and PHP-7.3 should be treated differently.

Right. I think due to the size and potential risk of the change, it should only go into PHP-7.4. I mainly suggested PHP-7.3 because the GC implementation changed a lot relative to PHP-7.2 and it's become more likely to run into these kind of problems (simply because GC gets to deal with a lot more objects now).

@drealecs
Copy link
Author

drealecs commented Mar 4, 2019

…rk_grey, gc_scan and gc_collect_white

also stack_black used for gc_scan_black is defined on stack instead of heap
gc_refcounted_stack_init will not create a stack but just initialize internal array
gc_refcounted_stack_destroy will not destroy a stack but just destroy internal array
@drealecs
Copy link
Author

drealecs commented Mar 5, 2019

As @nikic had a look at it, I also pushed here the change that avoids adding same ref to a stack multiple times.

As I mentioned, I made all changes on this PR still pointing PHP-7.2 branch while maintaining 3 more branches for PHP-7.3, PHP-7.4 and master:
I tried to check if I missed anything between versions or something and apparently I'm still good with merging things. Always use 3-way merge!
I don't know if you do diffs between git diffs to test that merges were fine but I sometimes do this for an extra check. Let me show you.
First we extract the diff each branch would introduce

git diff -U0 -b $(git merge-base PHP-7.2 non_recursive_cyclic_garbage_collector-7.2) non_recursive_cyclic_garbage_collector-7.2 Zend/zend_gc.c > diff-7.2.diff 
git diff -U0 -b $(git merge-base PHP-7.3 non_recursive_cyclic_garbage_collector-7.3) non_recursive_cyclic_garbage_collector-7.3 Zend/zend_gc.c > diff-7.3.diff
git diff -U0 -b $(git merge-base PHP-7.4 non_recursive_cyclic_garbage_collector-7.4) non_recursive_cyclic_garbage_collector-7.4 Zend/zend_gc.c > diff-7.4.diff
git diff -U0 -b $(git merge-base master non_recursive_cyclic_garbage_collector) non_recursive_cyclic_garbage_collector Zend/zend_gc.c > diff-8.0.diff

and then we compare the diffs in a graphic editor.
Or run diff between each diff, considering only the + and - lines.
7.4 to 8.0:

diff -b <(grep ^[\+-].* diff-7.4.diff) <(grep ^[\+-].* diff-8.0.diff)

no change
7.3 to 7.4:

diff -b <(grep ^[\+-].* diff-7.3.diff) <(grep ^[\+-].* diff-7.4.diff)
< - * gc_scan checkes the colors of possible members.
< + * gc_scan checks the colors of possible members.

change was already fixed in commit da3316f, present in PHP7.4
and finally 7.2 to 7.3:

diff -b <(grep ^[\+-].* diff-7.2.diff) <(grep ^[\+-].* diff-7.3.diff)

We have here many changes, almost all of them related to GC_REF_GET_COLOR usages changing into GC_REF_CHECK_COLOR.
Other than that, it's an extra variable declaration in 7.2 that was already present in 7.3
and another change that was done also in sync. regarding a change between 7.2 and 7.3 ref to current->ref
And this is how I check that merges were done right. Though I could share it here if anyone is interested.

Also, the PR changes are not very complex so I still feel confident with this bug targeting PHP-7.2. Maybe we can merge the PHP-7.4 branch and after few weeks in, we can consider merging it for PHP-7.3 and PHP-7.2 if that feels safer for you. It's your call anyway :)

@drealecs
Copy link
Author

drealecs commented Mar 5, 2019

Also, another topic of higher interest maybe:
Even after @nikic helped with removing all performance issues (thank you!), I was worried that there might be some performance problems. Now we are using a stack on the heap instead of the usual execution stack that was probably optimized.
So I went on trying to come up with some scripts that uses a lot of objects, measure it on new and old version and surprisingly what I found was that the gc_collect_cycles() function now runs faster with about 30% for all 4 PHP versions.

@nikic, could you make some benchmark tests as well, if you still have the scripts you used here?: #3165 (review)

Zend/zend_gc.c Show resolved Hide resolved
…n the stack, it can be skipped

useful for white scanning after marking when the item might have been marked as black in the meantime
and also to reduce whitespaces changes
@drealecs
Copy link
Author

drealecs commented Mar 6, 2019

For reference, PR for 7.3: #3923

@drealecs
Copy link
Author

drealecs commented Mar 7, 2019

As this was merged in 7.3, I'm going to close this PR for 7.2

@drealecs drealecs closed this Mar 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants