[Question] epoch + 3 or epoch + 2? #597

Open
ming535 opened this issue Mar 11, 2022 · 5 comments

ming535 commented Mar 11, 2022

Hi, I was studying the lecture on epoch-based garbage collection. The lecture proves that when an object is retired at epoch E, it is safe to free it at E + 3 because of the two "happens before" relations.

[Screenshot of the lecture slide: Screen Shot 2022-03-11 at 15 48 14]
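For readers following along, the rule in question can be written as a small predicate. This is only a sketch of the check as stated in the question, not crossbeam-epoch's actual code; the names `GRACE` and `can_free` are made up for illustration, and epoch wraparound is ignored.

```rust
/// Hypothetical constant: how far the global epoch must move past the
/// retirement epoch before freeing (3 in the lecture's scheme).
const GRACE: u64 = 3;

/// Returns true if garbage retired at epoch `retired_at` may be freed
/// once the global epoch has reached `global`.
/// Assumption: epochs are plain counters that never wrap around.
fn can_free(retired_at: u64, global: u64) -> bool {
    global.saturating_sub(retired_at) >= GRACE
}
```

With `GRACE = 3`, an object retired at epoch 5 is kept while the global epoch is 7 and becomes freeable at 8. Changing the constant to 2 gives the E + 2 variant of the same check.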

I was also looking into the code of crossbeam-epoch. It seems that crossbeam-epoch once used E + 3 and then reverted to E + 2:

I am not sure if this is the right place to ask, but I am confused about why crossbeam-epoch reverted to E + 2.

@ming535 ming535 added the question Further information is requested label Mar 11, 2022
@tomtomjhj tomtomjhj assigned tomtomjhj and unassigned kyeongmincho Mar 11, 2022

ming535 commented Mar 15, 2022

After looking carefully into this RFC https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-07-23-relaxed-memory.md and the code of the PR, I think the essential difference is the removal of the SC fence in unlink/push_bag.


tomtomjhj commented Mar 16, 2022

Hi, sorry for the late reply.

The essential difference between E+2 and E+3 is that the epoch consensus rule (concurrent epochs may differ by at most 1) doesn't hold in E+2.

Note that in pin, there can be some delay between loading the global epoch and storing the local epoch. (Loading the global epoch is essentially an optimization over checking all the other threads' local epochs.) During this interval, the global epoch can advance multiple times without taking the thread currently being pinned into account, resulting in 'local epoch < global epoch - 1'. Therefore, if retire tags the garbage with the local epoch, the garbage might be considered expired immediately. E+2 fixes this issue by tagging garbage with the global epoch (and an SC fence).
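The interleaving above can be replayed by hand on a single thread. The sketch below is not crossbeam-epoch code: `Epochs`, `expired`, and `replay_stale_pin` are hypothetical names, the starting epoch 5 is arbitrary, and the "other threads" are simulated by two plain increments.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-ins for the global epoch and one thread's local epoch.
struct Epochs {
    global: AtomicU64,
    local: AtomicU64,
}

/// E+2-style expiry check: garbage tagged `tag` is considered expired
/// once the global epoch is at least `tag + 2` (wraparound ignored).
fn expired(tag: u64, global: u64) -> bool {
    global.saturating_sub(tag) >= 2
}

/// Replays the problematic schedule and returns
/// (local epoch, global epoch) as they stand right after pin finishes.
fn replay_stale_pin() -> (u64, u64) {
    let e = Epochs {
        global: AtomicU64::new(5),
        local: AtomicU64::new(0),
    };

    // Step 1: the pinning thread loads the global epoch.
    let seen = e.global.load(Ordering::Relaxed);

    // Step 2: before the pinning thread publishes anything, the global
    // epoch advances twice. Nothing stops this, because the thread does
    // not yet look pinned to anyone else.
    e.global.fetch_add(1, Ordering::Relaxed);
    e.global.fetch_add(1, Ordering::Relaxed);

    // Step 3: the pinning thread stores the now-stale value.
    e.local.store(seen, Ordering::Relaxed);

    (
        e.local.load(Ordering::Relaxed),
        e.global.load(Ordering::Relaxed),
    )
}
```

Here `replay_stale_pin()` yields a local epoch of 5 against a global epoch of 7. Garbage tagged with the local epoch satisfies `expired(5, 7)` the moment it is retired, while garbage tagged with the global epoch does not (`expired(7, 7)` is false), which is the reason for tagging with the global epoch in E+2.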

On the other hand, pin in E+3 checks that the stored local epoch is not stale (note that this is quite similar to the validation loop in hazard pointers). This enforces the epoch consensus rule. So retire can tag the garbage with the local epoch instead of the global epoch, and no additional synchronization is needed.
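A validation loop of this kind might look roughly like the following. This is a minimal sketch under my own assumptions, not the code from the reverted PR: `Global`, `Local`, and `pin_validated` are hypothetical names, and the memory orderings are chosen for illustration only.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// Hypothetical global epoch counter.
struct Global {
    epoch: AtomicU64,
}

/// Hypothetical per-thread record holding the published local epoch.
struct Local {
    epoch: AtomicU64,
}

/// Publishes a local epoch, then re-reads the global epoch to confirm
/// the published value is not stale. Retries until both reads agree and
/// returns the epoch the thread ended up pinned at.
fn pin_validated(global: &Global, local: &Local) -> u64 {
    let mut observed = global.epoch.load(Ordering::Relaxed);
    loop {
        // Publish the epoch we believe is current.
        local.epoch.store(observed, Ordering::Relaxed);
        // Order the store above before the re-read below.
        fence(Ordering::SeqCst);
        let current = global.epoch.load(Ordering::Relaxed);
        if current == observed {
            // The global epoch did not move past us: the pin is valid.
            return observed;
        }
        // The global epoch advanced in between; try again with the new value.
        observed = current;
    }
}
```

On return, the stored local epoch equals a value of the global epoch read after the store, so the thread cannot start out two or more epochs behind. The loop has no bound on the number of retries if other threads keep advancing the epoch, which is why pin stops being wait-free.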

The advantage of E+3 is simplicity. As you can see from the slide, its correctness is very intuitive. The correctness proof for E+2, on the other hand, needs somewhat more involved reasoning, as described in Jeehoon's RFC. However, E+3's simplicity comes at the cost of making pin no longer wait-free due to the validation loop.

Then why revert E+3? It caused random segfaults in CI which IIRC weren't reproducible on our machines, and we couldn't figure out why.


tomtomjhj commented Mar 16, 2022

Oh, actually it's reproducible on my laptop (Intel). It seems that it fails to reproduce only on AMD machines.


tomtomjhj commented Mar 16, 2022

tomtomjhj/crossbeam@4522ab0 seems to fix the issue.

jeehoonkang commented

@tomtomjhj would you please upstream the change?
