-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Byte)string contracts unsound in the presence of mutable values #2179
Comments
I wonder how terrible it would be to have the contracts change to accept
only immutable strings and byte strings.
Robby
…On Sat, Jul 21, 2018 at 10:20 AM Philip McGrath ***@***.***> wrote:
Per the docs <http://docs.racket-lang.org/reference/contracts.html>,
strings and byte strings can be used as contracts "that recognize
themselves using equal?", and regular expressions "are treated as
contracts that recognize byte strings and strings that match the regular
expression." Such contracts seem to be unsound when applied to mutable
(byte)strings. The problem also affects predicates that inspect the
contents of (byte)strings and are frequently used as contracts, such as
path-string? and string-no-nuls?.
For example, in the following program:
#lang racket
(module server racket
(provide (contract-out
[struct a-string-box
([string (and/c string? #rx"a")])]
))
(struct a-string-box (string)
#:transparent))
(require 'server)
(define my-string
(string #\c #\a #\r))
(define boxed
(a-string-box my-string))
(string-set! my-string 1 #\d)
(a-string-box-string boxed)
the use of a-string-box-string raises an exception:
a-string-box-string: broke its own contract
promised: "car"
produced: "cdr"
in: an and/c case of
the range of
(-> a-string-box? (and/c string? "car"))
contract from: (anonymous-module server)
blaming: (anonymous-module server)
(assuming the contract is correct)
at: unsaved-editor:10.10
The problem is exacerbated because Racket doesn't implement
chaperone-string or chaperone-bytes. A module that wants to enforce an
invariant like the one server tries to put on a-string-box must either
insist that its clients supply only immutable? (byte)strings, implement
an impersonator contract that coerces mutable values to immutable ones, or
resort to manual checking.
I suspect this problem is rare in practice, given that Racketeers avoid
mutating (byte)strings, but it does seem worth addressing, especially given
the great effort racket/contract makes to provide sound contracts for
other mutable values. At a minimum, I think it would be worth noting in the
docs, along the lines of the explicit warnings given for other parts of the
contract library that can be used to create sound contracts.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#2179>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAYWsHIL71HKkb2wEIrvgxtT0WmriNQqks5uIo_qgaJpZM4VZc-8>
.
|
I'm not sure. I would love for (byte) strings and regular expressions used as contracts to be sound by default, if the backwards-compatibility issues are surmountable. I suspect (without any data) that just rejecting mutable values would cause problems. I often get (byte) strings from IO and treat them as immutable without actually coercing them with For me personally, I think it would be nicer for the contracts themselves to do the coercion when needed with If In any case, dealing with (byte) strings and regular expressions used as contracts still leaves predicates like As a matter of principle, I think sound-by-default is better, but I don't feel like I have a good sense of just how daunting the backwards-compatibility would be in practice or what potential change would break the least amount of currently-working code. |
It does seem hard to assess the backwards compatibility implications for
making the contracts copy the bytes and strings and I think you're right
that rejecting mutable ones is unlikely to work. I think that making
chaperones for bytes might be doable (but lots of work) but chaperones for
strings might be off the table because of the performance implications in
the runtime. I'm not sure, tho.
Bummer. I wish I had thought of this when I first added these contracts. :(
Robby
|
I think we should admit a design mistake here and forget about backwards compatibility.
Protect immutable strings. Leave it to programmers to protect structures of mutable strings.
|
I'm happy to admit it was a mistake, but I'm concerned about compounding the mistake. Considering how strings and regular expressions are turned into contracts, we have some choices:
Are there other choices? I feel like the second one is worse because it may introduce extremely subtle bugs in programs and anyway copying mutable data seems unfriendly. The third one seems like it will introduce some slowdowns and it seems like a lot of work. But it also seems the best from the backwards compatibility perspective. I think we can probably see how bad option 1 is by making the change and then running a pkg test and see what changes. (I don't feel confident about that for option 2). |
I had envisioned something slightly different, though I don't know that it's better: a contract that would accept any kind of string, but copy mutable strings to immutable ones with I am also concerned about the possibility for subtle bugs. With option 1 in particular, a complication is that string literals are immutable, so writing tests using mutable strings requires special effort, which I, at least, generally have not done. For a concrete example, I have some functions that use I wouldn't hate that breaking, since from a certain perspective it amounts to the contract system protecting me from a category of bug I hadn't even thought about (one of the advantages in general of contracts over manual tests). I am in fact adding |
On Jul 22, 2018, at 2:06 AM, Robby Findler ***@***.***> wrote:
I think we can probably see how bad option 1 is by making the change and then running a pkg test and see what changes.
I like option 1 and I think this is the correct way to figure out whether it breaks much. (Conjecture: it won’t break anything.)
|
I agree that testing the change would be a better idea than blindly guessing as to its impact, but I’d bet you’re wrong. Sadly, in Racket, mutable strings are really common. String literals produce immutable strings, but almost everything else produces mutable ones. Even |
That brings up another potential possibility, though a particularly radical one: make strings and byte strings immutable, as was done with pairs, presumably adding new Obviously the backwards-compatibility issues would be very significant, but arguably breaking things noisily could be better than breaking them subtly. There would be a clear path to identifying what fails to compile if This would be a far-reaching change to Racket, well beyond addressing this issue with contracts, and I'm certainly not able to assess all of the implications, especially for the runtime. If it weren't for the precedent with pairs, I wouldn't even float it as a possibility. But it seems like it would be a particularly principled response, and it might have advantages beyond this issue, in addition to the disadvantages. If I'm right that Racketeers almost always reason about strings in particular as immutable, having the language reflect that would seem to have similar advantages to making pairs immutable. Right now, as @lexi-lambda points out, enforcing immutability for strings is especially awkward. |
In general, though, I agree that guessing about what would cause the most problems ultimately must yield to someone picking an approach that seems promising, trying to implement it, and seeing what happens. Also, at least from my perspective, coming up with a solution that gets the long-term considerations right seems much more important than doing something quickly. I don't know of a real-world program that's currently having problems because of this issue: I see it as a matter of correctness more than an urgent bug. |
I was thinking about a similar problem, because string.rkt had also a problem with mutable strings. Some functions have to calculate an internal regexp of the arguments, and to gain some speed they are cached. The problem was that it was using the argument that was a (mutable) string as the key in a hash. My solution was to the leave the immutable string alone and use I'm still not 100% happy with the current implementation. I was thinking about changing Anyway, about the current issue: I think that the option 1 is a good solution. Probably most of the times the strings in the arguments are not mutated. I'm more worried about the contract in the result of the function. What about using There were some ideas to make the use of immutable strings easier, but they were never merged. See #1219 and #1341 . I like immutability by default, but it's a big not backward compatible change. Some intermediate step like a #:immutable? keyword may be better for now. |
This issue has been mentioned on Racket Discussions. There might be relevant details there: https://racket.discourse.group/t/advice-on-implementing-a-contract-system/832/1 |
Per the docs, strings and byte strings can be used as contracts "that recognize themselves using
equal?
", and regular expressions "are treated as contracts that recognize byte strings and strings that match the regular expression." Such contracts seem to be unsound when applied to mutable (byte)strings. The problem also affects predicates that inspect the contents of (byte)strings and are frequently used as contracts, such aspath-string?
andstring-no-nuls?
.For example, in the following program:
the use of
a-string-box-string
raises an exception:The problem is exacerbated because Racket doesn't implement
chaperone-string
orchaperone-bytes
. A module that wants to enforce an invariant like the oneserver
tries to put ona-string-box
must either insist that its clients supply onlyimmutable?
(byte)strings, implement an impersonator contract that coerces mutable values to immutable ones, or resort to manual checking.I suspect this problem is rare in practice, given that Racketeers avoid mutating (byte)strings, but it does seem worth addressing, especially given the great effort
racket/contract
makes to provide sound contracts for other mutable values. At a minimum, I think it would be worth noting in the docs, along the lines of the explicit warnings given for other parts of the contract library that can be used to create unsound contracts.The text was updated successfully, but these errors were encountered: