Skip to content

Confirm runtime events ring is still active after callback#13522

Merged
sadiqj merged 1 commit intoocaml:trunkfrom
kayceesrk:recheck_ring_status
Oct 5, 2024
Merged

Confirm runtime events ring is still active after callback#13522
sadiqj merged 1 commit intoocaml:trunkfrom
kayceesrk:recheck_ring_status

Conversation

@kayceesrk
Copy link
Contributor

@kayceesrk kayceesrk commented Oct 4, 2024

We need to check whether the ring is still active after a callback. It may be that the ring was destroyed during the callback execution. If this were the case, the subsequent call to write_to_ring would crash.

@kayceesrk kayceesrk changed the title Check ring status after callback. Check ring status after callback Oct 4, 2024
@kayceesrk kayceesrk added the run-thread-sanitizer Makes the CI run the testsuite with TSAN enabled label Oct 4, 2024
@sadiqj sadiqj self-assigned this Oct 4, 2024
Copy link
Contributor

@sadiqj sadiqj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find and the fix seems correct, thanks!

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. Thanks for looking into this.

@kayceesrk kayceesrk changed the title Check ring status after callback Confirm runtime events ring is still active after callback Oct 4, 2024
@kayceesrk kayceesrk force-pushed the recheck_ring_status branch from 3cf68f3 to 370b052 Compare October 4, 2024 09:45
@gasche
Copy link
Member

gasche commented Oct 4, 2024

My understanding:

  1. It's a bit weird to silently stop at this point, we called a user-provided feedback and we silently ignore its result. But there is no clearly better choice. (In particular, failing with an exception would not be very nice either, because it would be an asynchronous exception that is not due to the emitted event itself.)
  2. An alternative would be to reference-count the number of people planning to write to the ring, and wait until that count drops to 0 before actually closing the ring. But this is more work to implement, and it makes the programming model weird -- when someone asks to close/shutdown the runtime-events machinery, they get a return but the ring may remain written to for an arbitrarily long time, possibly forever if a writing thread is deadlocked.

@kayceesrk
Copy link
Contributor Author

kayceesrk commented Oct 4, 2024

Agree with your assessment. I would expect the guard added in this PR to only trigger rarely in racy programs where some domain is writing to the ring while another domain is actively terminating the program. Given that the program is terminating, it is better to take the simpler route of dropping the write. Tools like perfetto can already deal with "missing" events, such as having a span open event with a corresponding span close event.

@kayceesrk
Copy link
Contributor Author

The data race in test_dropped_events is fixed in #13529.

@kayceesrk
Copy link
Contributor Author

@gasche do you have any objections to merging this PR?

Need to check whether the ring is still active after a callback. It may
be possible that ring has been destroyed during the execution of the
callback.
@kayceesrk kayceesrk force-pushed the recheck_ring_status branch from 370b052 to c298125 Compare October 5, 2024 10:28
@gasche
Copy link
Member

gasche commented Oct 5, 2024

None. (I haven't looked at this enough to have formed an opinion, and I'm happy to trust the process :-)

@sadiqj sadiqj merged commit bf63f88 into ocaml:trunk Oct 5, 2024
@sadiqj
Copy link
Contributor

sadiqj commented Oct 5, 2024

Great. Merging! Thanks @kayceesrk

dra27 pushed a commit that referenced this pull request Oct 14, 2024
Need to check whether the ring is still active after a callback. It may
be possible that ring has been destroyed during the execution of the
callback.

(cherry picked from commit bf63f88)
@dra27
Copy link
Member

dra27 commented Oct 14, 2024

Cherry-picked to 5.3 in ebc3c13. Is this wanted for 5.2.1 as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-thread-sanitizer Makes the CI run the testsuite with TSAN enabled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants