Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-safe-string should be a global property #7113

Closed
vicuna opened this issue Jan 4, 2016 · 19 comments

Comments

Projects
None yet
2 participants
@vicuna
Copy link

commented Jan 4, 2016

Original bug ID: 7113
Reporter: @yallop
Status: confirmed (set by @yallop on 2016-03-21T12:19:00Z)
Resolution: fixed
Priority: normal
Severity: feature
Fixed in version: 4.03.0+dev / +beta1
Category: typing
Monitored by: @diml

Bug description

In OCaml 4.02.3, modules compiled with -safe-string and modules compiled without -safe-string can be linked into the same program. This makes -safe-string less useful than it would otherwise be.

Libraries can often be implemented more efficiently if strings are known to be immutable. Examples include libraries for immutable ropes, hash consing, and hash tables whose keys involve strings. Unfortunately, it's not currently safe to take advantage of immutable strings in this way: using -safe-string ensures that my library doesn't mutate the strings it uses, but those strings can still be mutated by other modules in the program.

It would be helpful to have a -safe-string option that behaved like -rectypes with respect to module compatibility, i.e. a guarantee that either all modules or no modules in a program were compiled with -safe-string.

Besides the extra guarantees for library authors, making -safe-string a global property would also enable a number of useful optimizations, such as sharing of string literals.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 4, 2016

Comment author: @alainfrisch

This would basically amount to having a configure-time switch, since the way the stdlib is compiled would define whether strings are mutable or not. This is different from rectypes, where the constraint is only from a library to its clients. Here, we need the constraint in both directions.

We should really push for actually dropping mutable strings altogether. Has anyone looked at how many OPAM packages still rely on mutable strings?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 4, 2016

Comment author: @gasche

To clarify: the -rectypes suggestion is to store the safe-string status inside compiled interfaces (.cmi files). When you compile A which depends on B, you check that the current setting is the same as the one stored in b.cmi.

However, in this we could consider a weaker semantics:

  • if safe-string is stored in the .cmi, then safe-string must be enabled by user modules
  • if safe-string is not stored in the .cmi, then user modules are free to enable -safe-string or not

In particular, we would accept to compile a user module demanding -safe-string against a library module that did not use -safe-string.

If we go this way, I don't think there is any risk of "cascade from stdlib", as long as we are careful to not store "safe-string" in the stdlib's cmi files.

This semantics allows library authors to decide (locally for their library) to force their users to also use safe-string.

Of course this is not as safe as Jeremy's suggestion: when library authors provide higher-order functions of the kind (string -> ...), they don't know whether user modules will respect their immutability expectations.

Note that we could adopt this weaker semantics in a future OCaml version, and later strengthen it to the bidirectional implication (if user demands -safe-string, library must use -safe-string). Both steps of the transition will break code and require people to update either user code or library authors build scripts.

In any case, I don't think it is reasonable to aim to perform this change in time for 4.03. Would 4.04+dev be a reasonable target?

@vicuna

This comment has been minimized.

Copy link
Author

commented Jan 4, 2016

Comment author: @alainfrisch

Of course this is not as safe as Jeremy's suggestion: when library authors provide higher-order functions of the kind (string -> ...), they don't know whether user modules will respect their immutability expectations.

I don't think this is the problem with the unidirectional constraint that you suggest, rather the opposite: a user module assuming safe-string for its internal operations would be at risk if it passes its strings to libraries compiled without safe-string.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 4, 2016

Comment author: @damiendoligez

I think the plan was to make -safe-string the default in 4.04, with -unsafe-string an explicit option, then remove the -unsafe-string option in some later version (maybe 4.06). But maybe that's going too fast.

As for this:

a guarantee that either all modules or no modules in a program were compiled with -safe-string.
it would be a dependency nightmare, OPAM would have to have every library in both versions.

Or, if you force each library author to choose one side, we risk having everyone choose the -unsafe-string option (for backward compatibility or simply because it's the default) and then nobody ever moves to -safe-string.

Another possibility would be a link-time option (-force-safe-string?) to ensure that all the modules that get linked in were compiled with -safe-string. Then you'd have your check without having to force it on everyone.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

Comment author: @yallop

In fact, linking together a module that's compiled with -safe-string and a module that's compiled without -safe-string breaks type safety. Facts that are true in one module are false in the other module, and sending a proof of some fact into a module that relies on there not being such a proof ends in disaster.

Here's an example. We'll start off with a module where bytes is known to be equal to string:

$ cat a.ml
type _ t =
X of string
| Y : bytes t

let y : string t = Y
$ ocamlc -c a.ml

and a module where bytes is known to be unequal to string:

$ cat b.ml
let f : string A.t -> unit = function
A.X s -> print_endline s

let () = f A.y
$ ocamlc -c -safe-string b.ml

Here's what happens when the modules are linked together and the result is run:

$ ocamlc -o crash a.cmo b.cmo
$ ./crash
Segmentation fault

Making -safe-string a global property would fix this, of course. But perhaps the problem can also be fixed by making bytes and string compatible in the -safe-string case.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

Comment author: @alainfrisch

Making -safe-string a global property would fix this, of course.

I don't see how this could be done. Would one decide at configure-time if the stdlib is "safe-string" or not (and then all libraries would inherit from that, excluding library which are not "safe-string" ready when in this mode)?

But perhaps the problem can also be fixed by making bytes and string compatible in the -safe-string case.

It's probably a required step in the short term to restore type-safety.

Have people investigated the "state of the migration to -safe-string"? How many OPAM packages have already been switched, how much effort is typically required to switch one package, etc?

Perhaps one should put more pressure on the community to move to -safe-string? (e.g. by creating a dashboard of non-ready OPAM packages and sending regular reminders to their authors and on caml-list)

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 26, 2016

Comment author: @gasche

If you make bytes and string compatible, that hampers future ability to write GADTs handling them as distinct types. Or maybe we could revert that change by the time we either remove the -unsafe-string option or make -safe-string a global property.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 29, 2016

Comment author: @yallop

I don't see how this could be done. Would one decide at configure-time if the stdlib is "safe-string" or not (and then all libraries would inherit from that, excluding library which are not "safe-string" ready when in this mode)?

There are various options, none of which is absolutely ideal. Some kind of global aspect seems unavoidable, though. As long as the compiler treats string and bytes as equal in one module and incompatible in another it'll be possible to construct programs like the one above which crash at runtime.

As far as I can see, the realistic options are

  • make it impossible to link modules that were compiled with -safe-string with modules that weren't

    As Alain points out, this amounts to having a configure-time switch, since the standard library will be affected. That's not necessarily a problem, though: OPAM should make the situation a lot more manageable. We can add constraints to packages which won't compile with safe-string, which should also serve as an incentive to transition.

  • make string and bytes compatible now

    As Gabriel notes, this might make it difficult to transition to a world where they're incompatible. Treating string and bytes as completely distinct types seems like rather a useful facility, so I think we should take this concern seriously.

  • drop support for mutable strings altogether

    This is coming sooner or later, in any case. It's probably too drastic a step for 4.03.

There are other possibilities, e.g. marking modules that don't rely on string/bytes compatibility or equality in some way, but they seem to complex to be worth considering.

On another note, should we aim to fix this issue in 4.03? The fact that it's a soundness issue that can result in a fatal crash makes it seem fairly pressing.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 29, 2016

Comment author: @alainfrisch

If you make bytes and string compatible, that hampers future ability to write GADTs handling them as distinct types.

Making them compatible does not prevent from making incompatible later on (when mutable strings are effectively dead). Could that break any code written under the assumption that they are compatible?

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 29, 2016

Comment author: @yallop

Yes: pattern match cases that equate incompatible types are rejected. So the following program is accepted if string and bytes are compatible, but rejected if they're incompatible:

type _ ty = String : string ty | Bytes : bytes ty

let f : string ty -> unit = function
String -> ()
| Bytes -> ()

@vicuna

This comment has been minimized.

Copy link
Author

commented Mar 1, 2016

Comment author: @garrigue

The GADT unsoundness is fixed in 4.03 and trunk, in commits c989c82 and df23448,
by making string and bytes compatible types.

Indeed, removing the compatibility if we disable completely -unsafe-string could in theory break some programs, but the impact of removing -unsafe-string seems some much bigger that this is not a real problem in comparison.

Note that this resolves only the unsoundness.
The non-modularity problem is not resolved, but I do not see any solution for that, that is without breaking the whole ecosystem.

@vicuna

This comment has been minimized.

Copy link
Author

commented Mar 1, 2016

Comment author: @yallop

Thanks for the bug-fix, Jacques.

I'd like to keep this PR open, since the original issue is not yet addressed. Currently safe-string has the same issues as C's const: it allows me to ensure that my code doesn't modify a particular string, but doesn't give me any guarantees about the what other people's code can do to the same value. As a result, it doesn't really help with reasoning about code, so there's little incentive to switch.

Perhaps one should put more pressure on the community to move to -safe-string? (e.g. by creating a dashboard of non-ready OPAM packages and sending regular reminders to their authors and on caml-list)

A clearly-communicated path would be a big help here. Once we have a definite plan (e.g. making safe-string the default in 4.04 or 4.05) then setting up some OPAM-based incentives to switch is a good idea.

As a first step, it'd be good to have a warning for code that will break once safe-string becomes the default.

@vicuna

This comment has been minimized.

Copy link
Author

commented Jul 13, 2016

Comment author: @yallop

There's a GitHub PR open that addresses this issue:

#687

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 16, 2017

Comment author: @xavierleroy

Now that #687 is merged, can we close this PR?

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 17, 2017

Comment author: @yallop

Alain's patch is a large step in the direction of a solution to this issue. With his patch, if you configure OCaml with -safe-string, then all strings are immutable, since there's no way to compile code that uses mutable strings.

And Jacques' patch fixes the type soundness problem, by making string and bytes incompatible types.

The default configuration still allows modules with -safe-string and -unsafe-string to be linked, so the original issue is not solved for the moment.

Could we plan to make Alain's configure-time -safe-string the default in some version of OCaml (perhaps 4.06)? That'd solve this problem once and for all.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 17, 2017

Comment author: @alainfrisch

Could we plan to make Alain's configure-time -safe-string the default in some version of OCaml (perhaps 4.06)?

I'm not sure about the schedule, but I fully agree that we should start sending very strong messages that -safe-string will become the default at some point. It should be considered a top priority by the community to migrate existing packages. I assume that doing the switch now would break a very large number of OPAM packages. Publishing a "wall of shame" might help here.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 28, 2017

Comment author: @damiendoligez

Maybe it's a good time now to set the compiler's command line default to -safe-string ?

This would mean that -safe-string would be the default in 4.06, and would be a big push toward safe strings while still providing a workaround for libraries that haven't been ported yet.

@vicuna

This comment has been minimized.

Copy link
Author

commented Feb 28, 2017

Comment author: @gasche

For 4.06, why not. "Right now", I would rather wait after 4.05 is released: I suspect that the people that will invest efforts in discovering the impact of the change and getting code fixed are the same that are currently working on the 4.05 release preparation.

@vicuna vicuna added the typing label Mar 14, 2019

@stedolan

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

Closing this because -safe-string has been the default since 4.06, which seems to resolve the issue here. Reopen if you disagree!

@stedolan stedolan closed this Mar 19, 2019

@vicuna vicuna added the feature-wish label Mar 20, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.