Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make (nat)dynlink sound (also fixes MPR#4208, MPR#4229, MPR#4839, MPR#6462, MPR#6957, MPR#6950) #1063

Merged
merged 7 commits into from Nov 15, 2018

Conversation

mshinwell
Copy link
Contributor

@mshinwell mshinwell commented Feb 24, 2017

This patch forbids dynlinking of compilation units that have already been loaded (either as part of the main program or via a previous dynlink). This appears to be sufficient to solve the problems described in the five Mantis issues in the title of this pull request.

In bytecode, it was previously possible to break type abstraction, since there is no implementation CRC for bytecode. In native, it was also possible to produce segmentation faults, via various means.

It has previously been suggested that platform-dependent solutions such as RTLD_DEEPBIND on glibc systems might be sufficient to solve these problems. @chambart and I discussed this and it appears not to be the case: for example suppose there is some module N that generates fresh names. A depends on N in the main program. We dynlink B which also depends on N. Further, B passes values made in N to A. Whilst under RTLD_DEEPBIND the code of B successfully references its own copy of N, it passes values to A that could break invariants of A's copy of N if they were to flow into there---and they might. It seems safer to just be conservative.

@chambart and I discussed whether this new restriction would be problematic for authors of plugin systems that might end up loading modules more than once. However we believe these problems can be solved by such systems using their own caching layer that can catch the new error value returned (Dynlink.Error(Module_already_loaded ...)).

A further improvement would be able to effectively incorporate such caching into the dynlink system itself such that, for example, if a module had already been dynlinked then some handle to that module could be returned. Such a handle would naturally be a corresponding first class module. Some of this work is already in GPR #100, although that does not include the protections introduced by this pull request. @chambart will follow up about that.

It is possible that there remains one problematic situation: a module that tries to load itself whilst it is being initialised. @chambart is checking this now.

@lpw25
Copy link
Contributor

lpw25 commented Feb 24, 2017

I think that RTLD_DEEPBIND could still be useful for the loadfile_private case. Your problematic example relies on there being a module A in the non-private linked modules which has N in its interface dependencies. This can be avoided with the rules:

  • A module can only be linked publically if there is no other module with that name already linked publically

  • A module can only be linked privately if there is no other module with that name linked publically, or mentioned in the interface dependencies of a module linked publically.

This would be a useful rule to support (on platforms with RTLD_DEEPBIND or equivalent) since it allows you to load plugins without having to worry if they conflict with each other -- only whether they conflict with the main program.

@mshinwell
Copy link
Contributor Author

After looking at MPR#4208 it looks like the dynlink.ml code needs a bit more work, e.g. due to prohibit. It's also not immediately obvious whether loadfile_private is going to be safe at present even after MPR#4208 is fixed. The easiest solution seems like a separate table of all modules ever loaded, which I'll investigate in due course.

@mshinwell
Copy link
Contributor Author

@lpw25 Yes, perhaps the rules could be refined to take into account the private case. However it isn't clear to me whether it's worth spending the time to implement and test that, especially if it only works on a subset of platforms. At the least I think we should probably get the basic fix in first before relaxing it, given how long some of these issues have been open. Other opinions on the benefit of a more precise treatment would be welcome.

@xavierleroy
Copy link
Contributor

Thanks for getting the ball rolling on this long-standing issue. Maybe @maximedenes and the Coq dev team wish to follow the discussion, as Coq is a heavy users of plugins that don't respect any naming discipline :-) and will be broken by the proposed solution.

@maximedenes
Copy link

@xavierleroy thanks for drawing my attention to this patch. I don't expect so much trouble because nowadays, Coq plugins are packed (as in -pack). Not sure what happens when we play the same Require multiple times, though. I'll do a quick test and give feedback.

@mshinwell
Copy link
Contributor Author

@maximedenes Do you use natdynlink or the bytecode version? If testing, it's probably best to use the native version with this patch at the moment, since the bytecode version may need a few tweaks (though it should approximately work).

@maximedenes
Copy link

What is the easiest way to get camlp4/camlp5 compiled for ocaml with this patch? It is required for testing Coq plugins.

@ghost
Copy link

ghost commented Mar 6, 2017

@maximedenes, for camlp4 I think the easiest way it is build it by hand from the git repository

@mshinwell
Copy link
Contributor Author

@maximedenes Has there been any progress on testing this patch?

@maximedenes
Copy link

Oh! I thought I had sent a partial report, sorry.

I could only devote a few minutes to it, but I can already say that this patch breaks the compilation of Coq's stdlib. It is surprising, but could be explained if the behavior of some static initializers (code like let _ = register... at toplevel in dynamically loaded units) has changed in some way.

I can investigate more and report next week.

@mshinwell
Copy link
Contributor Author

@maximedenes Was that when running bytecode or native code?

@maximedenes
Copy link

I tested a bit more, and the incompatibility I was mentioning is in fact already present in the base commit of this PR (sorry, I should have checked earlier). So not directly related to this PR, but it prevents me from testing properly. I'll try with 4.05 beta 3 and a recent 4.06 to see if rebasing this PR could help.

@maximedenes
Copy link

@mshinwell I was running native code.

The failure is in fact unrelated, so I'll try again with a rebased version of this PR.

So far, testing did reveal that we link twice a camlp5 module, but it is easy to fix on Coq's side.

@maximedenes
Copy link

Ok, so after rebasing, I confirm that modulo a small fix, Coq compiles fine with this PR, and plugins are usable. I tested only native code.

@mshinwell
Copy link
Contributor Author

OK, that's good news, thanks. I need to do some more careful work on the bytecode part of this PR, but will try to do that soon, and then it can be considered further.

@chambart
Copy link
Contributor

chambart commented May 2, 2017

I did check various self loading situations and couldn't come up with any failing one. Not that there isn't any possibility, but this would certainly not appear by surprise.

chambart
chambart previously approved these changes May 2, 2017
Copy link
Contributor

@chambart chambart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively checks the presence of previously loaded modules. A test exercising it might be useful. I can provide it.

@mshinwell
Copy link
Contributor Author

I actually noticed a bug in this and the bytecode version needed some more work (as per previous conversation with @chambart ). Whilst looking at that the large amount of duplicated code made me think that we can probably make this more robust with a bit more refactoring and sharing of code, which I'm working on now. I will post here again once it's ready.

@mshinwell
Copy link
Contributor Author

@alainfrisch In the existing natdynlink.ml file there is a conditional which circumvents various checks if the name of a compilation unit being loaded happens to match the name of a predefined exception. Leo and I experimented with this and couldn't find why this restriction, which appears to introduce an unsoundness, would be required. Could you explain? (As far as I can tell from git this is your code.)

@mshinwell mshinwell changed the title Forbid dynlinking of previously-loaded units (MPR#4229, MPR#4839, MPR#6462, MPR#6957, MPR#6950) Make (nat)dynlink sound (also fixes MPR#4229, MPR#4839, MPR#6462, MPR#6957, MPR#6950) May 2, 2017
@mshinwell
Copy link
Contributor Author

@chambart Would you be able to take a look at this version? This now passes the test suite although I need to do a pass of review myself. Adding the test case you suggest would be appreciated.

This pull request refactors the dynlink library by introducing a new Dynlink_common module. The implementation of this module is based on the old natdynlink implementation; in particular it uses maps rather than hashtables, which should solve MPR#4208. It has checks which should prevent a module being dynlinked if it is already loaded, which should solve the various other Mantis issues listed above. It does not contain the checks in the old natdynlink.ml concerning the names of exceptions, since they appear to be unsound.

This patch also removes the deprecated functions in the interface and the digest_interface internal function (I grepped OPAM for real uses of the latter and didn't find any). I haven't surveyed usage of the deprecated functions, but given that some of them appear to have been deprecated I think it was 15 years ago and others deprecated about 9 years ago, they seem ripe for removal.

I hope that overall this produces a significant increase in robustness and maintainability for this library.

I had a long struggle with the various makefiles and the requirement that programs using dynlink.cma don't need compilerlibs too. Care is thus taken to ensure that dynlink_common.ml only depends on the stdlib. The dynlink code is built no fewer than three times (in otherlibs/dynlink/, in driver/ and in debugger/)!

I would like to tidy the formatting of these files, which is currently rather inconsistent; this can be done as a separate changeset after the main review is done.

The diff for this PR is probably best read using patdiff.

@mshinwell mshinwell changed the title Make (nat)dynlink sound (also fixes MPR#4229, MPR#4839, MPR#6462, MPR#6957, MPR#6950) Make (nat)dynlink sound (also fixes MPR#4208, MPR#4229, MPR#4839, MPR#6462, MPR#6957, MPR#6950) May 5, 2017
@alainfrisch
Copy link
Contributor

In the existing natdynlink.ml file there is a conditional which circumvents various checks if the name of a compilation unit being loaded happens to match the name of a predefined exception.

Probably some left over of a work-around required at that time, where builtin exception global identifiers were recorded in ui_imports_cmx.

FTR, this was introduced in:

http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/branches/natdynlink/otherlibs/dynlink/natdynlink.ml?r1=7893&r2=7894&pathrev=8477&

apparently to make Camlp4 compile. This can probably go away now.

(On a related note: https://caml.inria.fr/mantis/view.php?id=3829 )

@mshinwell
Copy link
Contributor Author

MPR#3829 appears fixed, so I've closed it. We also experimented the other day with various scenarios involving compilation units with the same names as predefined exceptions and couldn't get anything to fail.

@alainfrisch
Copy link
Contributor

MPR#3829 is actually not fixed, I've reopened and added a slightly modified example.

@stedolan
Copy link
Contributor

It has previously been suggested that platform-dependent solutions such as RTLD_DEEPBIND on glibc systems might be sufficient to solve these problems. @chambart and I discussed this and it appears not to be the case: for example suppose there is some module N that generates fresh names. A depends on N in the main program. We dynlink B which also depends on N. Further, B passes values made in N to A. Whilst under RTLD_DEEPBIND the code of B successfully references its own copy of N, it passes values to A that could break invariants of A's copy of N if they were to flow into there---and they might. It seems safer to just be conservative.

This explains why it is wrong for a dynlinked plugin to contain a copy of a module that is part of the main program. What about the case when two dynlinked plugins both contain a copy of the same module, which the main program does not contain? I would expect this to work when I load the plugins via Dynlink.loadfile_private - that seems to be what this function is for.

Does this patch support such usage?

@mshinwell
Copy link
Contributor Author

I've noticed that an important piece of this patch appears to be missing; I will fix that and then think further about @stedolan 's comment. (I think the answer is that we could support such uses so long as the platform supports RTLD_DEEPBIND.)

@stedolan
Copy link
Contributor

I think the answer is that we could support such uses so long as the platform supports RTLD_DEEPBIND.

I don't understand why RTLD_DEEPBIND is needed for the usage I described. Why isn't the standard flag RTLD_LOCAL (which is the default behaviour of dlopen, and currently used by loadfile_private) sufficient?

@mshinwell
Copy link
Contributor Author

@dra27 I've fixed .gitignore, good thinking. Hopefully this will now pass CI.

@nojb nojb merged commit fc23887 into ocaml:trunk Nov 15, 2018
@nojb
Copy link
Contributor

nojb commented Nov 15, 2018

Merged, thanks everyone!

@ejgallego
Copy link

Dear Ocaml devs, see https://caml.inria.fr/mantis/view.php?id=7876 which could be related to this PR.

@damiendoligez
Copy link
Member

Looks like this PR broke the cygwin64 CI (which currently does not support dynlink). See https://ci.inria.fr/ocaml/job/main/flambda=false,label=ocaml-cygwin-64/833/console

You can reproduce the bug on other architectures by configuring with -no-shared-libs.

AFAICT it's simply that the nodynlink.ml file is out of date wrt the rest of the patch.

cc @mshinwell @lpw25 @nojb @chambart

@mshinwell
Copy link
Contributor Author

Ack

@mshinwell
Copy link
Contributor Author

@damiendoligez Please see #2197

@garrigue
Copy link
Contributor

This indeed did break old versions of Coq, including 8.9.1: #8868.

@gadmm
Copy link
Contributor

gadmm commented Oct 18, 2019

I would like to offer a small nitpick and advice about backwards-compatibility.

This commit removed the deprecated function [Dynlink.init]. It was commented as deprecated since 2008, however there had been no warning at compilation. The result is unexplained breakage of legacy code with no documented evolution path (see ppedrot/ocaml-melt#1). The adverse effect is that although the solution is trivial (remove the call), this results in disproportionate time spent by the non-author user looking up for a solution, reading git commits and github PRs, to find what is the appropriate action. Moreover, it seems that the issue was properly caught by the test lib-dynlink-csharp, but not understood as such.

As a suggested alternative, it would have been possible to have deprecation warnings in place in 2008 with suggested evolution path, or, even simpler for everybody, keep providing init with the current PR with a dummy implementation. Indeed, AFAIU there was no imperative to break legacy code.

(cc @tabareau)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet