Missing GC root registrations in runtime/io.c #12445
I think the changes are good, but I have a couple of nits to pick :-) The comment
would be more accurate. Also, after rooting through several English dictionaries, I still don't root for your use of "to root value arguments" to mean "to register value arguments as memory roots for the GC". Perhaps that's because the practice of verbing nouns is not deeply rooted in my native language. But the only meaning I know so far for "to root something" is to install a rootkit on something or more generally to elevate one's privileges to the "root" user level on something.
Thanks for the nitpick! I had to search a bit for a comment I liked myself, so I am happy to discuss this further.
Yes, that was my intent.
Counter-nitpick: in OCaml 5 some parts of the GC may always be running concurrently in other domains. I think that a fully precise comment would be as follows:
"Giving control" to the GC can happen by a direct call or indirectly by releasing the runtime lock. I thought that "call" was an acceptable substitute for "giving control" (or "transferring control") in this broad sense. What would you think of the following?
On a different note: I was surprised to find so many subtle bugs in one file of the runtime -- I started auditing other parts of the runtime and didn't find any issue; what is different in the production context of this one file that explains the somewhat high defect rate? I found the answer: the functions that are buggy here are all functions in which locking operations were added just before the 5.0 release, namely in #11171. There probably aren't rampant root-registration issues in the runtime libraries, but maybe @Octachron and @OlivierNicole need a crash course on being paranoid about the GC.
After thinking more about this, I decided to remove the comments/justifications on why certain functions do not register their value arguments as roots. I suspect that it would take a lot more work to converge towards a common vocabulary for what is (un)safe. I am happy to send the comments as a separate PR if there is interest, but for now I reduced the current PR to contain only the extra root registrations.
I pushed a Changes entry for the 5.1 section because I think that the fixes are safe and worth having in 5.1 as well. For example, those issues could show up in automated property-based testing. They may also be the cause of some of the flaky CI failures we have been observing: @damiendoligez mentioned that an intext_par.ml failure was fixed by #12436. cc @Octachron
Looks good to me, modulo one change that may not be necessary. I'll let @Octachron decide how and when to merge this.
runtime/io.c
CAMLparam1 (vchannel);
struct channel * channel = Channel(vchannel);
int num = caml_num_rows_fd(channel->fd);
CAMLreturn (Val_int(num));
I believe the original code is safe and this change is not necessary. "But it doesn't hurt!" you're saying? I prefer bug-fix commits that contain only the code strictly necessary to fix the bug.
Ok, I removed the root-registration here.
(caml_num_rows_fd does not currently call the GC in either its Unix or its Windows version. This is not documented and might change in the future, so I thought that it was more robust to root anyway.)
(half-written comment deleted. The changes are clearly correct.)
As @gasche points out below: I misread the code; apologies!
I believe that the fact that I had missed the adding a I agree that it is better to merge those fixes in 5.1 to decrease the number of runtime bugs in 5.1.0.
Apologies for introducing the bugs, I hadn’t realized at the time that the
I think that what would really help is a discipline to distinguish functions that may transfer control to the GC from those that do not. Currently one has to transitively unfold definitions to tell whether a function is in one category or the other; this should be a property that can be checked more easily. I have an idea for a discipline that could work, which is to systematically use
Each added root corresponds to a bug in the previous code, except for `caml_terminfo_rows` which was safe for an arguably-fragile reason.
Thanks @xavierleroy and @yallop for the reviews. I will go ahead and merge (also in 5.1) once the CI is happy again.
@yallop I don't see the issue with
Missing GC root registrations in runtime/io.c (cherry picked from commit f272e37)
#12436 led me to realize that most of the code of io.c is subtly wrong because it does not root its value arguments when it should. (This was already broken for OCaml 4.x.)
In this PR I decided to systematically review all functions, root arguments if necessary, and add an explicit justification comment for any function with unrooted value arguments or temporaries.
(cc @damiendoligez @yallop)
(This PR supersedes #12436.)
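For readers less familiar with why an unrooted value argument is a bug, here is a minimal sketch in plain C. It is a toy model, not the OCaml runtime: every name in it (`toy_alloc`, `toy_register_root`, `toy_unregister_root`, `toy_gc`, `toy_may_gc`, `read_unrooted`, `read_rooted`, `POISON`) is hypothetical and invented for illustration. It assumes a copying collector that moves live cells and updates only the pointers registered as roots, which is the situation the root registrations in this PR guard against.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT the OCaml runtime: a "value" is a pointer to one
   int cell living in one of two semispaces. */
typedef int *value;

#define HEAP_CELLS 16
#define MAX_ROOTS 8
#define POISON (-1)   /* marks cells of the space the collector vacated */

static int space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static int *from_space = space_a, *to_space = space_b;
static size_t next_free = 0;

/* Registered roots: addresses of variables that hold values. */
static value *roots[MAX_ROOTS];
static size_t n_roots = 0;

static value toy_alloc(int payload) {
  assert(next_free < HEAP_CELLS);
  from_space[next_free] = payload;
  return &from_space[next_free++];
}

static void toy_register_root(value *slot) {
  assert(n_roots < MAX_ROOTS);
  roots[n_roots++] = slot;
}

static void toy_unregister_root(void) {
  assert(n_roots > 0);
  n_roots--;
}

/* Copying collection: copy every cell reachable from a root into the
   other space, update the root to the new address, poison the old
   space, then swap the two spaces. */
static void toy_gc(void) {
  size_t live = 0;
  for (size_t i = 0; i < n_roots; i++) {
    to_space[live] = **roots[i];
    *roots[i] = &to_space[live];
    live++;
  }
  for (size_t i = 0; i < HEAP_CELLS; i++) from_space[i] = POISON;
  int *tmp = from_space;
  from_space = to_space;
  to_space = tmp;
  next_free = live;
}

/* Stand-in for any callee that may give control to the GC, for example
   by allocating or by taking a lock. */
static void toy_may_gc(void) { toy_gc(); }

/* Buggy pattern: v is not registered, so it still points into the
   vacated space after the collection. */
static int read_unrooted(value v) {
  toy_may_gc();
  return *v;
}

/* Fixed pattern: v is registered before control may reach the GC, so
   the collector updates it when the cell moves. */
static int read_rooted(value v) {
  toy_register_root(&v);
  toy_may_gc();
  int result = *v;
  toy_unregister_root();
  return result;
}
```

In this model, allocating a cell holding 42 and passing it to `read_rooted` yields 42, while passing a freshly allocated cell to `read_unrooted` yields `POISON`. In the real runtime the stale read would be undefined behaviour rather than a tidy sentinel, which is part of why such bugs surface only as flaky failures.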