Contributor

@lefessan lefessan commented Jul 6, 2016

Add the ability to link C plugins into the runtime, together with a virtualization layer on file-system calls so that plugins can monitor or intercept them. Add an option -fPIC to ./configure to build the default runtime with -fPIC.

@mshinwell
Contributor

Building the runtime with -fPIC by default probably incurs a performance penalty (as does building OCaml code with PIC by default on x86-64). I tend to think we shouldn't do it in the longer term, instead favouring a proper cross-compilation solution as is starting to evolve in #620 and #634. In the short term maybe it would be reasonable to make PIC the default, but the penalty should be measured.

#define CAML_CPLUGINS_CHDIR 6
#define CAML_CPLUGINS_GETENV 7
#define CAML_CPLUGINS_SYSTEM 8
#define CAML_CPLUGINS_READ_DIRECTORY 9
Member

Wouldn't it be wise to have some sort of CAML_CPLUGINS_LAST_PRIM definition giving the highest primitive number? This would allow users to distinguish nonexistent primitives from primitives that they don't support but that exist in the current OCaml version.
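
A minimal sketch of the distinction suggested here. `CAML_CPLUGINS_LAST_PRIM` is only the name proposed in this comment, not a macro confirmed by the patch, and the `prim_status` enum and `classify_prim` helper are hypothetical; the four primitive numbers are copied from the diff context above.

```c
/* Primitive numbers copied from the diff context above. */
#define CAML_CPLUGINS_CHDIR 6
#define CAML_CPLUGINS_GETENV 7
#define CAML_CPLUGINS_SYSTEM 8
#define CAML_CPLUGINS_READ_DIRECTORY 9

/* Hypothetical: highest primitive number known to this OCaml version. */
#define CAML_CPLUGINS_LAST_PRIM CAML_CPLUGINS_READ_DIRECTORY

enum prim_status { PRIM_HANDLED, PRIM_UNSUPPORTED, PRIM_UNKNOWN };

/* Classify a primitive number from the point of view of a plugin that
   only chooses to handle getenv. */
enum prim_status classify_prim(int prim)
{
  if (prim < 0 || prim > CAML_CPLUGINS_LAST_PRIM)
    return PRIM_UNKNOWN;      /* does not exist in this OCaml version */
  if (prim == CAML_CPLUGINS_GETENV)
    return PRIM_HANDLED;      /* the plugin intercepts this one */
  return PRIM_UNSUPPORTED;    /* exists, but the plugin ignores it */
}
```

With such a bound, a plugin could log or reject `PRIM_UNKNOWN` numbers while silently passing `PRIM_UNSUPPORTED` ones through to the default implementation.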

Contributor Author

Good idea!

Contributor Author

Well, it should be a value, not a macro, for a plugin to be able to access it at runtime. Or maybe passed as a parameter to the plugin init function (I am thinking about passing a record to the plugin with a variable number of fields depending on the version of OCaml).
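
One way to picture the "record with a variable number of fields" idea is a struct whose leading fields tell the plugin how much of it is valid. This is only a sketch: `struct cplugin_context`, its fields, and `plugin_init` are hypothetical names and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical record handed by the runtime to a plugin's init function.
   Newer runtimes may append fields; older plugins read only what they know. */
struct cplugin_context {
  int api_version;   /* version of this record layout */
  int last_prim;     /* highest primitive number known to the runtime */
  /* fields added by later versions would follow here */
};

/* Hypothetical plugin entry point: returns 1 if the plugin can work with
   this runtime, 0 otherwise. This plugin needs primitive number 7
   (getenv in the diff context above). */
int plugin_init(const struct cplugin_context *ctx)
{
  if (ctx == NULL || ctx->api_version < 1)
    return 0;
  return ctx->last_prim >= 7;
}
```

Because the highest primitive number arrives as a runtime value, the same compiled plugin could be loaded by several OCaml versions and decide at startup what it can intercept.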

@gasche
Member

gasche commented Jul 6, 2016

I did a quick review (not in depth) of the patch and the code seems ok.

I'm a bit surprised by the idea of passing plugins by an environment variable. It would seem natural to use a command-line parameter instead -- wasn't the OCAMLRUNPARAM hack introduced for exactly the same reason that makes you use an environment variable here?

@lefessan
Contributor Author

lefessan commented Jul 6, 2016

@gasche: The idea here is to use C plugins on a set of programs (for example, to watch a complete build without modifying it). For that, the env. variable is the best. I don't think using OCAMLRUNPARAM for that is a good idea, as it will make the use of C plugins more complex.

@mshinwell
Contributor

@lefessan Can you comment as to why LD_PRELOAD does not suffice for the interception of C library calls?

@lefessan
Contributor Author

lefessan commented Jul 6, 2016

@mshinwell Indeed, using -fPIC probably decreases performance, but the slowdown is normally rather small: most people won't care about it, and people who do care can easily disable cplugins to remove the default -fPIC. I think the overall benefit of having the full power of dynamic linking is greater for most people (it allows them to bundle OCaml programs as dynamically linked libraries into foreign applications).

@lefessan
Contributor Author

lefessan commented Jul 6, 2016

@mshinwell LD_PRELOAD is not portable. Moreover, the patch only intercepts calls done from OCaml (i.e. Sys.* functions), not the other ones done by the runtime for other reasons.

@alainfrisch
Contributor

Since http://caml.inria.fr/mantis/view.php?id=6693 , an fPIC variant of the native runtime is built. Is it easy enough to use it in practice?

It would be good to actually measure the overhead of fPIC in the runtime system. If it is really small (and my guess is that it is), it would simplify the life of users and speed up the compilation of OCaml itself to stop supporting the non-fPIC runtime. (One could also make fPIC the default and let people disable it at configure time, or build a non-fPIC variant.)

@lefessan
Contributor Author

lefessan commented Jul 6, 2016

@alainfrisch
In the configure script of this PR:

  • --no-cplugins will disable both C plugins and the use of -fPIC
  • --fPIC will re-add -fPIC when C plugins have been disabled

If -fPIC should always be the default, then I can remove --fPIC and replace it by --no-fPIC

@mshinwell
Contributor

@alainfrisch Even if it's 1 or 2%, I don't think we should be deprecating the support for non-PIC. (Bear in mind that even with all the work on flambda, we're still only getting 10% improvement for some software at the moment, so 2% is fairly significant.) In my view the right answer is to support various combinations properly as per #620.

@lefessan Do forgive me, but your previous answer was sphinx-like. Can you go into more detail about the motivation behind this patch? What are the "other reasons"? It isn't clear to me exactly which calls need to go through this mechanism and which do not, and surely that needs to be pinned down if the division is to be sensibly maintained into the future.

@alainfrisch
Contributor

I'm worried by this idea of building all possible variants upfront. Just for the runtime system, we'll have in a few months: debugging, frame-pointers, fPIC, spacetime, afl, multicore. So already 64 versions of asmrun/ to compile/install for every build of OCaml?

At some point, there is a tradeoff between the runtime performance and its cost in terms of complexity/performance in the code base, build system, and user experience. (If only because making the developers' experience smoother gives them more time to profile/optimize their code.)

I believe that 2% is insignificant for the vast majority of users. I don't see the point in comparing this with the gains obtained by flambda: making flambda better or worse does not make these 2% more or less significant.

Anyway, doing some actual benchmarks would be useful to drive the discussion (0.5% is not the same as 4%, for instance).

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

Again, the solution is probably in OPAM and having different switches; the only problem is to find a nice way to pass options to the configure script of a new OPAM switch.

@mshinwell I mean that, for example, getenv can be called using Sys.getenv, but is also used by the runtime (OCAMLRUNPARAM, etc.). It's the same for opening a file, which the runtime might do for its own reasons (saving runtime stats, etc.). The idea is to monitor what the OCaml program (i.e. the OCaml source) is doing, not what the runtime is doing. Monitoring/virtualizing the full runtime might also be done in the future (we could add a flag for that), but right now, I have no usage for it, so I didn't implement it.
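
The split described here (program-level calls are hooked, the runtime's own calls are not) can be sketched with a function pointer. All names below (`getenv_hook`, `user_getenv_hook`, `program_getenv`, `runtime_getenv`, `fake_example_var`, `CPLUGIN_EXAMPLE_VAR`) are hypothetical and do not come from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook type: a plugin may install a replacement for getenv. */
typedef char *(*getenv_hook)(const char *name);

static getenv_hook user_getenv_hook = NULL;  /* NULL means "no plugin" */

/* Path taken on behalf of the OCaml program (e.g. behind Sys.getenv):
   goes through the hook when a plugin has installed one. */
char *program_getenv(const char *name)
{
  if (user_getenv_hook != NULL)
    return user_getenv_hook(name);
  return getenv(name);
}

/* Path taken by the runtime for its own needs (e.g. reading
   OCAMLRUNPARAM): always calls the C library directly. */
char *runtime_getenv(const char *name)
{
  return getenv(name);
}

/* Example plugin hook: virtualizes one variable and forwards every other
   lookup to the C library. */
char *fake_example_var(const char *name)
{
  static char sandbox[] = "/sandbox";
  if (strcmp(name, "CPLUGIN_EXAMPLE_VAR") == 0)
    return sandbox;
  return getenv(name);
}
```

Setting `user_getenv_hook = fake_example_var` changes what the program sees for that variable, while `runtime_getenv` keeps returning whatever the real environment holds.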

@mshinwell
Contributor

mshinwell commented Jul 7, 2016

@alainfrisch You don't necessarily need to build all (or even any) combinations. The aim is to provide a proper framework that enables people to build what they need, whilst at the same time eliminating special cases (as we have at the moment for PIC, gprof, etc). Ideally it would work in some modular way, so if a user finds a requirement for a different set of flags later, just that portion can be built without having to rebuild the whole OCaml system again.

It's worth noting that the proposed functionality is pretty close to what GCC has provided for many years in the form of multilib support.

@mshinwell
Contributor

@lefessan If you're specifically concerned with the OCaml code, have you considered instrumenting the "external" calls themselves? For example, some kind of attribute could be added that indicates the call should be redirected via a wrapper if it is present. It's maybe slightly less fine-grained, but perhaps that doesn't matter, and it might be more straightforward overall.

Also, what happens if the OCaml program includes its own C bindings? Should they be instrumented too? (I presume that would need LD_PRELOAD or similar.)

@shindere
Contributor

shindere commented Jul 7, 2016 via email

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

@mshinwell Yes, I considered intercepting externals, but I had the feeling that it would require more work: for example, stat might be used by multiple OCaml functions (to check if a file is a directory, to check its size, etc.). So, I would have to write wrappers for all those OCaml externals, whereas with the current solution, I can just have a wrapper for stat. Moreover, at the external level, I would have to work with OCaml values, i.e. deal with the GC, whereas at the system call level, it's much simpler.

I had also considered virtualizing the file-system at the OCaml level directly: for example, have a record with all file-system related externals, that would be used by all Pervasives and other modules' functions. Then, the user would be able to take the current record, and replace it by their own record. However, again, there are problems with some externals, for example output_value does the writes in C, not in OCaml, so the record would contain many more functions than it should...

@mshinwell
Contributor

@shindere Not for the default I wouldn't have thought. A less frequent check of all of them would seem fine.

@mshinwell
Contributor

@lefessan Is recording which externals get called sufficient, though (i.e. record Sys.foo rather than the C library calls that "foo" uses)? Again I'm not exactly sure of the target application.

I missed a point earlier: an OPAM solution involving configure options is not sufficient. I think I covered that on the other GPR.

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

@mshinwell No, it's not just about recording the call (and we don't really care about having the backtrace or just the caller), it's also about intercepting it. For example, we might want to implement a "replay" plugin: in "record" mode, it will monitor what an OCaml program is accessing, saving every file that is opened by the program somewhere, so that in "replay" mode, it will provide the previous files to the open, even if the real files have been modified since then. Such a plugin can be done easily by intercepting open at the C level, whereas it cannot be done easily at the external level.
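
The "replay" mode described above could look roughly like the sketch below, assuming the files were already saved during a "record" run. The snapshot directory, the flattening scheme, and the names `snapshot_path` and `replay_open` are all hypothetical; how a plugin would register `replay_open` with the runtime is not shown because the thread does not spell out that interface.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plugin state: directory holding the files saved in
   "record" mode. */
static const char *snapshot_dir = "/tmp/replay-snapshot";

/* Map an original path to the location of its saved copy, flattening
   '/' into '_' so that every copy lives directly in snapshot_dir.
   Returns 0 on success, -1 if the result does not fit in out. */
int snapshot_path(const char *path, char *out, size_t out_len)
{
  int n = snprintf(out, out_len, "%s/%s", snapshot_dir, path);
  if (n < 0 || (size_t)n >= out_len)
    return -1;
  for (char *p = out + strlen(snapshot_dir) + 1; *p != '\0'; p++)
    if (*p == '/') *p = '_';
  return 0;
}

/* Hypothetical replacement for open in "replay" mode: the program asks
   for path but receives the copy saved during the recorded run. Intended
   for read-only flags, since no creation mode is passed to open. */
int replay_open(const char *path, int flags)
{
  char saved[4096];
  if (snapshot_path(path, saved, sizeof saved) != 0)
    return -1;
  return open(saved, flags);
}
```

Because the redirection happens on a plain C string and a file descriptor, the hook never touches OCaml values or the GC, which is the simplicity argument made for intercepting at the C level.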

@mshinwell
Contributor

I see. I still don't really understand how it's supposed to work with libraries outside the stdlib though. Isn't it the case that many programs that do interesting I/O things will be doing them via some external library that isn't instrumented in the way you propose?

@avsm
Member

avsm commented Jul 7, 2016

Isn't it the case that many programs that do interesting I/O things will be doing them via some external library that isn't instrumented in the way you propose?

I'm also trying to understand how this might work (since the overall functionality being proposed is interesting, but the patches seem to be coming in piecemeal). As a concrete example, moby/vpnkit#69 replaces most of the I/O in Docker for Mac/Win with a libuv based implementation. Could these C calls also be intercepted with the proposed patch?

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

For now, my own use cases only need the primitives in the patch, but indeed, it might become interesting in the future to provide a way for libraries to extend the current mechanism for their own stubs.

We will always find cases that the proposed mechanism cannot handle, but the idea for me is to provide a simple way to handle the majority of simple cases, and then let more complex cases be handled by more complex solutions, such as LD_PRELOAD, user-fs or Linux namespaces, etc.

@lefessan
Contributor Author

lefessan commented Jul 7, 2016

@avsm I am not sure if it would catch all the I/O that vpnkit does. It will only intercept standard I/O functions from the stdlib, not the ones done by the runtime (other getenv calls or reads of a bytecode file) nor the ones done by other libraries (it seems that vpnkit uses Lwt, whose I/O is not intercepted by this patch). So, probably, you will still need some work around Lwt at least.

@damiendoligez
Member

@lefessan About PIC, you need to do the benchmarking that both @mshinwell and @alainfrisch have requested.

@lefessan
Contributor Author

lefessan commented Jul 8, 2016

@damiendoligez I don't really understand why benchmarking is important here. This patch only changes the default; it does not prevent people for whom performance is important from changing their own settings to no-fpic, recovering the same performance as before.
The question is more (and it's related to a discussion on the caml-list): what is important for us? To make OCaml as powerful as possible by default (in which case we should turn all user-friendly settings on by default), or as fast as possible (in which case the current settings should not be changed)? My opinion is that the default should be "as powerful/user-friendly as possible": the debugging flag (-g) should be on by default and extensibility should be maximized (through dynamic linking and -fPIC). Moreover, do we really care about 5% or 10% speed decrease for the standard developer, when computers have been getting faster and faster over the last twenty years, so that the edit-compile-run loop has never been so fast?

@alainfrisch
Contributor

Moreover, do we really care about 5% or 10% speed decrease for the standard developer

I'm confused. This is about runtime of compiled programs, not compilation time.

Each user will have a different sensitivity to runtime performance. In the next release, assuming -fPIC becomes the default, one will need to tell people that they can disable it "if they want". This will be much more user friendly if this information comes together with some indication of the slowdown to be expected if they don't.

But if the slowdown is really 5% or 10% (which I doubt), the discussion about making it the default becomes a bit different from the one if it is below 1%.

@lefessan
Contributor Author

Running operf-micro on the two branches (left one is this PR, right one is standard trunk), I got the following results (on a remote server, with nobody else connected):

                                 ocaml-fp ocaml-st 
almabench.20                     1.00     1.00     
bdd.10                           1.00     1.05     
bigarray_rev.0                   1.00     1.00     
fft.4                            1.00     0.98     
fibonnaci.0                      1.00     0.93     
format.complicated.direct        1.00     0.91     
format.complicated.noop          1.00     1.12     
format.complicated.str           1.00     0.99     
format.complicated_empty.direct  1.00     0.97     
format.complicated_empty.noop    1.00     1.12     
format.complicated_empty.str     1.00     0.98     
format.simple.format             1.00     1.00     
format.simple.format_pp          1.00     1.00     
format.simple.format_pp_cont     1.00     1.00     
format.simple.printf             1.00     0.98     
format.simple.printf_cont        1.00     1.02     
format.simple_ignore.format      1.00     1.02     
format.simple_ignore.format_pp   1.00     1.03     
format.simple_ignore.format_pp_c 1.00     1.17     
format.simple_ignore.printf      1.00     1.03     
format.simple_ignore.printf_cont 1.00     1.13     
format.simple_ignore_ref.format  1.00     1.01     
format.simple_ignore_ref.format_ 1.00     1.03     
format.simple_ignore_ref.format_ 1.00     1.17     
format.simple_ignore_ref.printf  1.00     1.02     
format.simple_ignore_ref.printf_ 1.00     1.12     
format.simple_ref.format         1.00     0.99     
format.simple_ref.format_pp      1.00     1.00     
format.simple_ref.format_pp_cont 1.00     1.00     
format.simple_ref.printf         1.00     0.98     
format.simple_ref.printf_cont    1.00     1.02     
hamming                          1.00     1.01     
hamming.10                       1.00     1.00     
kahan_sum.kahan_sum.array_fold.0 1.00     1.00     
kahan_sum.kahan_sum.baseline.0   1.00     1.00     
lens.rect_area.baseline.-6140    1.00     1.03     
lens.rect_area.lens.-6140        1.00     0.99     
list.fold_left add.tail_rec.0    1.00     0.94     
list.fold_left add.while.0       1.00     1.00     
list.fold_left add.while_exn.0   1.00     0.99     
list.fold_left add_float.tail_re 1.00     0.94     
list.fold_left add_float.while.0 1.00     1.00     
list.fold_left add_float.while_e 1.00     0.99     
list.interval.direct.0           1.00     1.07     
list.interval.tail_rec.0         1.00     1.04     
list.interval.tail_rec_with_clos 1.00     1.07     
list.map succ.closure.0          1.00     1.03     
list.map succ.direct.0           1.00     0.99     
list.map succ.tail_rec.0         1.00     1.02     
list.rev.rec.0                   1.00     1.00     
list.rev.rev_while.0             1.00     1.00     
list.rev_map succ.rev_map_tail_r 1.00     1.00     
list.rev_map succ.rev_map_while  1.00     1.06     
nucleic                          1.00     0.99     
nullable_array.sum 0.01.nullable 1.00     1.01     
nullable_array.sum 0.01.nullable 1.00     1.01     
nullable_array.sum 0.01.option_a 1.00     1.00     
nullable_array.sum 0.30.nullable 1.00     1.17     
nullable_array.sum 0.30.nullable 1.00     1.04     
nullable_array.sum 0.30.option_a 1.00     0.97     
nullable_array.sum 1.00.nullable 1.00     1.08     
nullable_array.sum 1.00.nullable 1.00     1.11     
nullable_array.sum 1.00.option_a 1.00     1.00     
nullable_array.walk 0.01.nullabl 1.00     1.00     
nullable_array.walk 0.01.nullabl 1.00     0.99     
nullable_array.walk 0.01.nullabl 1.00     1.02     
nullable_array.walk 0.01.nullabl 1.00     0.98     
nullable_array.walk 0.01.option_ 1.00     0.98     
nullable_array.walk 0.01.option_ 1.00     0.98     
nullable_array.walk 0.30.nullabl 1.00     1.05     
nullable_array.walk 0.30.nullabl 1.00     0.98     
nullable_array.walk 0.30.nullabl 1.00     1.06     
nullable_array.walk 0.30.nullabl 1.00     0.99     
nullable_array.walk 0.30.option_ 1.00     1.00     
nullable_array.walk 0.30.option_ 1.00     0.99     
nullable_array.walk 1.00.nullabl 1.00     1.07     
nullable_array.walk 1.00.nullabl 1.00     0.99     
nullable_array.walk 1.00.nullabl 1.00     1.07     
nullable_array.walk 1.00.nullabl 1.00     0.99     
nullable_array.walk 1.00.option_ 1.00     1.01     
nullable_array.walk 1.00.option_ 1.00     0.99     
sequence.flat_map_fold.baseline. 1.00     0.99     
sequence.flat_map_fold.sequence. 1.00     0.96     
sequence.map_fold.baseline.10    1.00     1.09     
sequence.map_fold.sequence.10    1.00     0.96     
sieve                            1.00     0.98     
sieve.10                         1.00     0.99     
vector_functor.-6140             1.00     0.93     
vector_functor.vec2 record dot p 1.00     0.97     
vector_functor.vec2 record dot p 1.00     0.98     

Surprisingly, -fPIC does not always degrade performance (usually by less than 2%, worst is 7%), and sometimes it even improves performance (17% on one of them). These are of course micro-benchmarks, since running macro-benchmarks would require much more time (most benchmarks are not working on trunk...).
Testing on ocamlopt.opt, I found a 4% slowdown when running with -fPIC.
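
For readers of the table, here is a small sketch of how its second column maps to the percentages quoted above. It assumes both columns are normalized running times (lower is better), which matches the reading given in this comment; the helper names `trunk_delta_percent` and `ratio_range` are hypothetical.

```c
#include <stddef.h>

/* Difference between trunk and the PR build, in percent of the PR's
   value (the PR column is normalized to 1.00). Negative: trunk is
   faster, i.e. -fPIC costs time. Positive: the -fPIC build is faster. */
double trunk_delta_percent(double trunk_ratio)
{
  return (trunk_ratio - 1.0) * 100.0;
}

/* Smallest and largest ratio in a column; requires n >= 1. */
void ratio_range(const double *ratios, size_t n, double *min, double *max)
{
  *min = *max = ratios[0];
  for (size_t i = 1; i < n; i++) {
    if (ratios[i] < *min) *min = ratios[i];
    if (ratios[i] > *max) *max = ratios[i];
  }
}
```

For instance, a subset of four rows from the table (fft.4 at 0.98, fibonnaci.0 at 0.93, bdd.10 at 1.05, format.simple_ignore.format_pp_c at 1.17) spans roughly a 7% cost to a 17% gain for the -fPIC build.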

@bluddy
Contributor

bluddy commented Jul 12, 2016

@lefessan:

Moreover, do we really care about 5% or 10% speed decrease for the standard developer, when computers have been getting faster and faster over the last twenty years, so that the edit-compile-run loop has never been so fast?

I just want to point out that we are mostly at the end of that era, barring a very surprising new technological development. The technology that allowed this speed gain over the last 20 years has matured, and it can no longer give us serious speed improvements. Thinking about speed is now becoming more important.

@lefessan
Contributor Author

lefessan commented Jul 12, 2016

@bluddy I was more discussing the "default" settings of OCaml, i.e. what trade-off we should provide by default, for the newcomer, and I think the performance is good enough now, that we could degrade it a little for the benefit of other features, such as the extensibility of the system. This PR does not degrade the performance of OCaml when compiled without -fPIC.

I was also discussing yesterday with a time-traveler, who told me something new was coming, and that we shouldn't worry too much for speed improvements in the next 13 years. But I have to keep my mouth shut on it !

@bluddy
Contributor

bluddy commented Jul 12, 2016

Yay!

I generally agree that PIC is the way to go, btw. It was a nitPIC.

@lefessan lefessan force-pushed the 2016-07-06-cplugins-and-fPIC branch from 63719ed to 006bc0c Compare July 12, 2016 17:10
@xavierleroy
Contributor

@lefessan: of course there is a lot of noise in these measurements, because using the PIC runtime system changes code placement, impacting performance in a random manner.

My own quick benchmarking (on KB and my other favorite small benchmarks) is similarly noisy, but suggests that the PIC runtime degrades performance by about 2% on average.

This is for x86 64 bits. I'd expect more degradation for x86 32 bits and for PowerPC 32 bits, which lack hardware support for PC-relative addressing.

@xavierleroy
Contributor

There is too much rhetoric in this discussion. Trying to stick with facts:

  • There is a small but non-zero performance penalty with the PIC runtime.
  • It is very easy for end-users to select the PIC runtime at link-time: ocamlopt -runtime-variant _pic .... No recompilation of OCaml code and libraries is needed.
  • I don't see what the "working OCaml user" is gaining from having PIC runtime as the default. AFAIK a non-PIC runtime can still dynamically load plugins -- the ocamlrun bytecode interpreter does that all the time.

I'd suggest that @lefessan continues his experiments with plugins and virtualization using -runtime-variant _pic if needed, then comes back with a stronger user story.

@lefessan lefessan force-pushed the 2016-07-06-cplugins-and-fPIC branch from 006bc0c to 6346455 Compare July 13, 2016 11:46
@lefessan lefessan changed the title Add cplugins and compile with -fPIC by default Add cplugins and add a configure option -fPIC Jul 13, 2016
@lefessan lefessan force-pushed the 2016-07-06-cplugins-and-fPIC branch from 6346455 to 6a83bdd Compare July 13, 2016 11:56
@lefessan
Contributor Author

OK, I removed the -fPIC by default. I think I replied to all the requests; are there still any issues pending?

CAMLextern int caml_read_directory(char * dirname, struct ext_table * contents);


#ifdef CAML_INTERNALS
Contributor

Is it on purpose that definitions just above (caml_ext_table) are no longer protected by CAML_INTERNALS?

Contributor Author

Yes, there is a comment line 218 explaining that, because we intercept and must be able to call caml_read_directory.

@lefessan
Contributor Author

The current proposal has been reviewed, and there is no longer an objection about -fPIC since it is not activated by default anymore, so merging.

@lefessan lefessan merged commit fe96ec6 into ocaml:trunk Jul 17, 2016
@gasche
Member

gasche commented Jul 17, 2016

I'm curious: has there been a discussion somewhere about whether we want this form of plugin interception, or for example another interface to extend the compiler capabilities? There has not been in this PR thread, but maybe in a developer meeting?

@lefessan
Contributor Author

@gasche I am not sure I understand your question, but Mark asked if the same result could be achieved using annotations on externals, and I also suggested the use of an OCaml record to intercept calls to the file-system. However, both approaches would be much more complex to implement, since some C externals are actually performing the work of other externals (for example, output_value does both marshaling and writing to a file).

@avsm
Member

avsm commented Jul 18, 2016

has there been a discussion somewhere about whether we want this form of plugin interception, or for example another interface to extend the compiler capabilities?

This plugin functionality doesn't seem particularly useful outside of a very narrow use case involving code that exclusively uses the standard library, as far as I can tell from the answer above to my and @mshinwell's query. Is the interface at least experimental, or are these plugins expected to be supported forever with their current interface?

camlspotter pushed a commit to camlspotter/ocaml that referenced this pull request Oct 17, 2017
Add cplugins and add a configure option `-fPIC`
@lefessan lefessan deleted the 2016-07-06-cplugins-and-fPIC branch January 24, 2021 16:11
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022