Join GitHub today
API changes for multicore #1003
The multicore branch requires several changes to OCaml's C API, affecting those functions which expose raw pointers into mutable parts the OCaml heap (which is not a safe thing to do on multicore).
While multicore can't support the current OCaml API, it's quite easy for trunk OCaml to support the multicore API. This patch adds support for the multicore API (while maintaining support for the existing one), so that we can begin porting libraries that use C stubs to work on multicore without breaking compatibility with trunk.
The most invasive changes are to field access, since multicore does not support the
The other API changes are to global roots and values registered for callbacks. The new API for global roots uses an opaque type
The second commit in this patch gives an example of the changes necessary in C bindings. I picked
By default, both the current API and the multicore API are exposed, since there are no conflicting names between them. To check for compatibility with multicore and avoid use of deprecated features, this patch introduces the macro
Only two levels of
Projects with C bindings can define
Eventually, I'd like to see the OCaml headers print a warning if
This patch shouldn't be merged until I've updated the "Interfacing with C" section of the manual, but I'd like to get feedback about the proposed design beforehand.
referenced this pull request
Jan 11, 2017
@stedolan, have you benchmarked the cost of TLS variable access? The multicore branch makes heavy use of repeated TLS access which can have non-negligible costs especially for dynamic linking (see here), and I wonder if it might be worthwhile changing the API even further to have a context TLS pointer as the first argument of C functions instead.
I think that LTS access is a discussion more relevant to the multicore runtime than to the multicore API, which is what is most appropriate to discuss here. The proposed changes do not introduced any difference in the runtime that would be performance-sensitive, or at least nothing as radical as changing where globals are stored. (We may wonder whether moving from macros to functions in some case affects performance, and that question is fully relevant to this PR, but I would expect the answer to be that it does not matter under
Here's where it pertains to the API: if Thread-Local Storage access is slow - and I'm not sure it is, though it certainly is on OSX where TLS doesn't exist for dynamic linking, and I've found articles claiming that it's slow in general - then it might make sense to pass a pointer to the context of the thread for functions that use thread-local stuff (basically all GC functionality), much as object method calls do, and for user C code to keep that thread context around.
This discussion is probably better started from the multicore runtime branch, but I've had little luck getting any feedback there, and since I saw the API changes proposal, I thought I'd bring it up, and anyone with more empirical knowledge than me about TLS performance can weigh in.
@bluddy It requires a function call per access from a function in a shared library that uses TLS. So it would be much cheaper for user C code to keep the context around. This could be avoided if dynamic linkers patched each library with the address of the TLS, but they don't.
However, not all C functions need this context. Only the ones that interact with the runtime do. There are many cases where one just wants to call a library function with. Having to write stubs is just annoying. Yes, it can be automated (and
I propose that we make a Foreign Function Interface (that doesn't require the user to write stubs) an official part of OCaml – it, by its very nature, is 100% immune to changes like this. That will require that #724 or equivalent be merged. This FFI could be based on ctypes (not a bad choice in my opinion!), but will hopefully benefit from additional optimizations in the compiler, such as avoiding the stub for any function that doesn't take a struct by value.
@bluddy @DemiMarie Thanks for the thoughts on TLS. The multicore runtime is already actively moving towards a scheme where we do not depend on thread-local variables. Instead, we use a domain-local storage located at the end of minor heap area. This domain-local storage is obtained by suitably bit twiddling the allocation pointer (which is available in a register). Many runtime functions now explicitly accept the domain state as an argument. Eventually all uses of thread-local variables will be removed from the multicore runtime. For continuing the discussion, I've created an issue in multicore: ocamllabs/ocaml-multicore#84. Please continue the discussion there.
Let us keep the discussion in this thread focussed on the C API change.
On Linux, TLS accesses are about as fast as plain globals in executables and statically linked libraries (PIC or non-PIC code), but about 2x slower than plain globals in shared libraries. Note that most OCaml programs statically link against the OCaml runtime.
On OS X, I believe pthread_getspecific is implemented the same way as Linux's executable TLS, so should be just as fast (On Linux, pthread_getspecific is slightly slower than TLS).
Incidentally, this won't affect the speed of OCaml code: the multicore runtime does not use platform TLS to access domain state while running OCaml code. This is only relevant for C bindings (even then, I can't imagine many C bindings spending a significant fraction of their time looking up globals in the runtime).
It might be better if OCaml had always passed around a context pointer, but there's no compelling performance case to add one now, and it would break a lot of code.
referenced this pull request
Jan 18, 2017
added a commit
this pull request
Jan 20, 2017
In principle I support the change, and I think that we should get it in a released version as soon as possible, to give as much time as possible for possible to update their code (and get as much backward-compatibility as possible when they do). I looked at the patch, but I am not qualified to review it fully, so I'd rather leave that to @mshinwell which has been doing an excellent job at it so far. The point on reading class and oid fields would deserve elaboration, I think.
I think two things need doing to this:
That might be everything?
@damiendoligez Are you happy with this in its present form? It would be nice to get this merged soon.
referenced this pull request
Jun 8, 2017
I've added a Changes entry, but ran into a problem with the Int_val change. The
So, a change would have to be guarded by
After a brief discussion with @mshinwell, we noticed another problem. This piece of code is wrong:
because on Win64,
(This is getting trickier than I'd thought, and might be best discussed in its own pull request).
Over at a Discuss thread, @rwmjones made the sensible suggestion that the part of this patch that defines
Would someone be motivated propose a separate PR, for example with just the levels