-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
-safe-string should be a global property #7113
Comments
Comment author: @alainfrisch This would basically amount to having a configure-time switch, since the way the stdlib is compiled would define whether strings are mutable or not. This is different from rectypes, where the constraint is only from a library to its clients. Here, we need the constraint in both directions. We should really push for actually dropping mutable strings altogether. Has anyone looked at how many OPAM packages still rely on mutable strings? |
Comment author: @gasche To clarify: the -rectypes suggestion is to store the safe-string status inside compiled interfaces (.cmi files). When you compile A which depends on B, you check that the current setting is the same as the one stored in b.cmi. However, in this we could consider a weaker semantics:
In particular, we would accept to compile a user module demanding -safe-string against a library module that did not use -safe-string. If we go this way, I don't think there is any risk of "cascade from stdlib", as long as we are careful to not store "safe-string" in the stdlib's cmi files. This semantics allows library authors to decide (locally for their library) to force their users to also use safe-string. Of course this is not as safe as Jeremy's suggestion: when library authors provide higher-order functions of the kind (string -> ...), they don't know whether user modules will respect their immutability expectations. Note that we could adopt this weaker semantics in a future OCaml version, and later strengthen it to the bidirectional implication (if user demands -safe-string, library must use -safe-string). Both steps of the transition will break code and require people to update either user code or library authors build scripts. In any case, I don't think it is reasonable to aim to perform this change in time for 4.03. Would 4.04+dev be a reasonable target? |
Comment author: @alainfrisch
I don't think this is the problem with the unidirectional constraint that you suggest, rather the opposite: a user module assuming safe-string for its internal operations would be at risk if it passes its strings to libraries compiled without safe-string. |
Comment author: @damiendoligez I think the plan was to make -safe-string the default in 4.04, with -unsafe-string an explicit option, then remove the -unsafe-string option in some later version (maybe 4.06). But maybe that's going too fast. As for this:
Or, if you force each library author to choose one side, we risk having everyone choose the -unsafe-string option (for backward compatibility or simply because it's the default) and then nobody ever moves to -safe-string. Another possibility would be a link-time option (-force-safe-string?) to ensure that all the modules that get linked in were compiled with -safe-string. Then you'd have your check without having to force it on everyone. |
Comment author: @yallop In fact, linking together a module that's compiled with -safe-string and a module that's compiled without -safe-string breaks type safety. Facts that are true in one module are false in the other module, and sending a proof of some fact into a module that relies on there not being such a proof ends in disaster. Here's an example. We'll start off with a module where $ cat a.ml let y : string t = Y and a module where $ cat b.ml let () = f A.y Here's what happens when the modules are linked together and the result is run: $ ocamlc -o crash a.cmo b.cmo Making -safe-string a global property would fix this, of course. But perhaps the problem can also be fixed by making bytes and string compatible in the -safe-string case. |
Comment author: @alainfrisch
I don't see how this could be done. Would one decide at configure-time if the stdlib is "safe-string" or not (and then all libraries would inherit from that, excluding library which are not "safe-string" ready when in this mode)?
It's probably a required step in the short term to restore type-safety. Have people investigated the "state of the migration to -safe-string"? How many OPAM packages have already been switched, how much effort is typically required to switch one package, etc? Perhaps one should put more pressure on the community to move to -safe-string? (e.g. by creating a dashboard of non-ready OPAM packages and sending regular reminders to their authors and on caml-list) |
Comment author: @gasche If you make bytes and string compatible, that hampers future ability to write GADTs handling them as distinct types. Or maybe we could revert that change by the time we either remove the -unsafe-string option or make -safe-string a global property. |
Comment author: @yallop
There are various options, none of which is absolutely ideal. Some kind of global aspect seems unavoidable, though. As long as the compiler treats string and bytes as equal in one module and incompatible in another it'll be possible to construct programs like the one above which crash at runtime. As far as I can see, the realistic options are
There are other possibilities, e.g. marking modules that don't rely on string/bytes compatibility or equality in some way, but they seem to complex to be worth considering. On another note, should we aim to fix this issue in 4.03? The fact that it's a soundness issue that can result in a fatal crash makes it seem fairly pressing. |
Comment author: @alainfrisch
Making them compatible does not prevent from making incompatible later on (when mutable strings are effectively dead). Could that break any code written under the assumption that they are compatible? |
Comment author: @yallop Yes: pattern match cases that equate incompatible types are rejected. So the following program is accepted if string and bytes are compatible, but rejected if they're incompatible: type _ ty = String : string ty | Bytes : bytes ty let f : string ty -> unit = function |
Comment author: @garrigue The GADT unsoundness is fixed in 4.03 and trunk, in commits c989c82 and df23448, Indeed, removing the compatibility if we disable completely -unsafe-string could in theory break some programs, but the impact of removing -unsafe-string seems some much bigger that this is not a real problem in comparison. Note that this resolves only the unsoundness. |
Comment author: @yallop Thanks for the bug-fix, Jacques. I'd like to keep this PR open, since the original issue is not yet addressed. Currently safe-string has the same issues as C's const: it allows me to ensure that my code doesn't modify a particular string, but doesn't give me any guarantees about the what other people's code can do to the same value. As a result, it doesn't really help with reasoning about code, so there's little incentive to switch.
A clearly-communicated path would be a big help here. Once we have a definite plan (e.g. making safe-string the default in 4.04 or 4.05) then setting up some OPAM-based incentives to switch is a good idea. As a first step, it'd be good to have a warning for code that will break once safe-string becomes the default. |
Comment author: @xavierleroy Now that #687 is merged, can we close this PR? |
Comment author: @yallop Alain's patch is a large step in the direction of a solution to this issue. With his patch, if you configure OCaml with And Jacques' patch fixes the type soundness problem, by making string and bytes incompatible types. The default configuration still allows modules with -safe-string and -unsafe-string to be linked, so the original issue is not solved for the moment. Could we plan to make Alain's configure-time -safe-string the default in some version of OCaml (perhaps 4.06)? That'd solve this problem once and for all. |
Comment author: @alainfrisch
I'm not sure about the schedule, but I fully agree that we should start sending very strong messages that -safe-string will become the default at some point. It should be considered a top priority by the community to migrate existing packages. I assume that doing the switch now would break a very large number of OPAM packages. Publishing a "wall of shame" might help here. |
Comment author: @damiendoligez Maybe it's a good time now to set the compiler's command line default to -safe-string ? This would mean that -safe-string would be the default in 4.06, and would be a big push toward safe strings while still providing a workaround for libraries that haven't been ported yet. |
Comment author: @gasche For 4.06, why not. "Right now", I would rather wait after 4.05 is released: I suspect that the people that will invest efforts in discovering the impact of the change and getting code fixed are the same that are currently working on the 4.05 release preparation. |
Closing this because |
Original bug ID: 7113
Reporter: @yallop
Status: confirmed (set by @yallop on 2016-03-21T12:19:00Z)
Resolution: fixed
Priority: normal
Severity: feature
Fixed in version: 4.03.0+dev / +beta1
Category: typing
Monitored by: @diml
Bug description
In OCaml 4.02.3, modules compiled with -safe-string and modules compiled without -safe-string can be linked into the same program. This makes -safe-string less useful than it would otherwise be.
Libraries can often be implemented more efficiently if strings are known to be immutable. Examples include libraries for immutable ropes, hash consing, and hash tables whose keys involve strings. Unfortunately, it's not currently safe to take advantage of immutable strings in this way: using -safe-string ensures that my library doesn't mutate the strings it uses, but those strings can still be mutated by other modules in the program.
It would be helpful to have a -safe-string option that behaved like -rectypes with respect to module compatibility, i.e. a guarantee that either all modules or no modules in a program were compiled with -safe-string.
Besides the extra guarantees for library authors, making -safe-string a global property would also enable a number of useful optimizations, such as sharing of string literals.
The text was updated successfully, but these errors were encountered: