std/core library functions with preconditions are a pain to use #1805

Closed
marijnh opened this Issue Feb 10, 2012 · 24 comments

9 participants

@marijnh
Contributor
marijnh commented Feb 10, 2012

I find myself doing myvec[vec::len(myvec)-1u] because the alternatives are check vec::is_not_empty(myvec); vec::last_total(myvec); and alt vec::last(myvec) { some(e) { e } _ { fail "..."; } }, both of which are way, way too kludgy for such a simple operation.

Let's make the default, short-name library functions just do the right thing unsafely, and provide other versions for those who are exhaustively tracking typestate on their values (which is currently no one, as far as I am aware).

@kud1ing
kud1ing commented Feb 10, 2012

I think we will end up with wide use of those unsafe shortcuts, effectively replacing compile time checks with runtime errors.

This causes great pain in the Haskell world.
See http://www.reddit.com/r/haskell/comments/lf71l/deprecate_preludehead_and_partial_functions/

There was some related discussion in #1105

@marijnh
Contributor
marijnh commented Feb 10, 2012

Rust is not Haskell, though. Our index [x] operator, for example, is also not guaranteed to succeed.

I'd be very happy if we had a way to make things like this safe without imposing a lot of extra load on the programmer, but we don't at the moment, and having preconditions on stdlib functions will just make people work around those in ways that are no safer than the unsafe version of the function would be. (All of the three snippets I gave above still cause a run-time failure when the vector is empty. No safety is gained.)

@kud1ing
kud1ing commented Feb 10, 2012

You are right, the operator [x] should be made safe using Typestate .

I agree, Typestate needs to be more developer-friendly.
Making it easier to create runtime errors is a step backwards imho.

@catamorphism
Contributor

I agree with @Lenny222 that the [] operator should be made safe (though we don't have good infrastructure to do that right now).

The reason that typestate guarantees are useful is that it gives the programmer the tools to guarantee statically that certain errors won't occur at runtime. It's true that if you just insert a check before every call, it could fail, and you're just moving the error to a different place (though even in that case, I'd argue that it's useful to have to think about error conditions, which are easy to ignore when the error handling is in the callee). But that would be un-Rust-like, just like using unsafe pointers all over the place is un-Rust-like. If functions are annotated properly with their preconditions and (once implemented) postconditions, then checks should be sparse.

There are lots of improvements we could make to typestate, but I'd prefer we not just give up on it because it's not perfect right now. In my opinion, we need to be using typestate within rustc so that we can work out the bugs, not take shortcuts around it.

Rather than simply making it easier to be unsafe, let's talk about syntax / semantics that we could introduce to make typestate less cumbersome in the specific cases you've brought up. That's my preference, anyway. For example, no one should be calling last on a vector that could be empty, and we can use functions like may and maybe in the option module to make it easier to handle the empty case. Explicit alts should never be necessary for that.

@brson
Contributor
brson commented Feb 10, 2012

My opinion is that we should either make a big push to make typestate useful or remove it from all core/std functions.

@catamorphism
Contributor

I'd be happy to work on making it useful, but I would need feedback from other people about how to do that, and since I am working on classes right now, we would want to discuss what our priorities are.

@killerswan
Contributor

Hmm, I need to play more with handling failures...

@vlasovskikh

If myvec is a return value of some fn f(...), then Issue #586 might help here by moving the problem to a potential library code. f(...) should narrow the set of possible return values using post-conditions:

fn f<T>(...) -> [T]: is_not_empty(*) { ... }

let myvec = f(...);
vec::last_total(myvec);

If myvec is assigned a local (vector) literal expression (maybe even a reasonably simple expression that requires evaluation), then it is possible to evaluate the pure fn predicate statically and implicity (assuming it doesn't contain unchecked blocks):

let myvec = [1, 2, 3];
...
vec::last_total(myvec);

If myvec is a fn parameter, then there is no problem (typestate predicates for fn params).

Otherwise there is a problem, and we have to use an explicit typestate or an option check or an unsafe [] operator.

@catamorphism
Contributor

@vlasovskikh The problem with "evaluate the pure fn predicate statically and implicity" is that it violates one of our guiding principles behind typestate, which is that the compiler doesn't take advantage of any information about the semantics of the predicates -- predicates are opaque. We could consider special-casing some knowledge about vector literals (and integer literals, and...) but this would be a new iteration on the design. The question has certainly arisen before.

@vlasovskikh

@catamorphism The compiler doesn't have to have any special knowledge about predicate semantics here. It should only evaluate (some of) them as regular Rust code, but at compile-time.

@catamorphism
Contributor

@vlasovskikh Evaluating code is pretty much the epitome of knowing what its semantics are -- if you can evaluate it, you know its semantics completely :-)

@catamorphism
Contributor

In general, it would be dodgy to evaluate any code at compile-time without some special-case hints or even tighter restrictions than we already have. pure fn predicates are (currently) not required to be terminating, so evaluating calls to them could always potentially make the typechecker non-terminating.

@vlasovskikh

@catamorphism Yes, I meant that no special-case hints were needed. Why pure fn's without unchecked blocks can be non-terminating?

@catamorphism
Contributor

Why not? The point of pure fn is to disallow non-referentially-transparent behavior. A nonterminating function (that doesn't have observable effects) is perfectly referentially transparent.

@vlasovskikh

I've read in the docs that self-recusive calls, higher-order params and non-pure calls are not allowed in a pure fn. I erroneously concluded, that it was impossible to use a general recursion (Y-combinator) inside pure fn and that pure fns always halted. But I forgot about mutual recursion and while loops :)

@catamorphism
Contributor

The documentation about that is likely to be full of lies right now (because I wrote it) -- the rules for what's allows in a pure fn are kind of in flux, and have been for a while. It's not our intent to disallow non-terminating pure fns, though.

@marijnh
Contributor
marijnh commented Feb 11, 2012

Even when typestate becomes more powerful, annotating one's programs with the right pre- and post-conditions is bound to be a lot of work. Some programs will want to do this to gain extra safety, but a lot of programs won't need to pay this price -- they are either prototypes, or something that is just not very critical. I think forcing every Rust programmer into the typestate straitjacket by integrating typestate heavily into the stdlib is not a good idea.

@catamorphism
Contributor

@marijnh What's your threshold for easiness? That is, how would you characterize how easy it would have to be to annotate programs with preconditions and postconditions before you would want to allow typestate in the stdlib?

@catamorphism
Contributor

Also, independently of that question, I was under the impression that one of the things claim was for was to help with prototyping -- you can always get out of having to do a check by using a claim instead. It still requires the programmer to write down claim statements explicitly, but I had been under the impression that this was one of the core principles of Rust: you can play fast and loose, but you have to declare explicitly that you intend to play fast and loose. Rust is for writing reliable software, and there are many other languages for prototyping or writing "not very crtitical" code; in my opinion, it's useful to encourage programmers to get it right the first time.

But a lof of this is just my impression, and I'd like to hear comments from other people.

@killerswan
Contributor

I need some more examples. Otherwise, umm, what would Rust's recommended prototyping language be?

@graydon
Contributor
graydon commented Feb 13, 2012

A few things:

  • I don't want rustc to get much further into the game of partial evaluation or abstract interpretation than typestate and the lurching-into-existence constant folder. That's a bottomless timesink and I'd prefer we spend our time on other matters.
  • While it was certainly my original intention to be bureaucratic about "always declaring explicitly when you intend to break rules", I am no longer the sole audience. We are serving multiple audiences, and first impressions count.
  • I agree with marijn that typestate is unusual enough as a language feature that casual users should not have to even know it exists at all in order to use the standard library.
  • I also feel that in cases where typestate is playing a role in differentiating base cases from inductive cases in data structures, say, or establishing a datasort refinement, requiring the use of check statements turns out to be a pretty high notational burden. We need to find some ways to lower that burden if we're to attract users to the feature. It's already underused; chastising people into using something they don't feel is paying for itself is not likely to work.

That said, I really want to encourage use of typestate, not just because "it's a cool word / concept" but rather because it's really assert in slightly more formal attire, and I love assert in my code. It saves me huge debugging energy, in every program I write that leans on it. To me, typestate predicates are assertions with 3 positive additional properties:

  1. They have names! Helpful for debugging and surfacing in API docs.
  2. The compiler can tell you where you're missing a check.
  3. The compiler can tell when a check is redundant, doesn't have to run.

Currently I think we're really only chafing under a very strict interpretation of the second of these properties -- compilation stops when you miss a check -- and only because it's mandatory now rather than optional. I think it's reasonable to consider the following change:

  • When a callee foo() has predicate preconditions, we produce two wrapper functions foo_dyn and foo_all. The foo_dyn function takes 1 or more extra parameter(s) -- uints -- that have a bit chosen to represent every incoming predicate. The foo_dyn function checks each bit, runs a check for any bit set to 0, and then calls through to the foo() callee. The foo_all function has the same signature as foo -- no extra parameters -- but calls foo_dyn with 0 for each extra param in foo_dyn.
  • When a caller makes a call with predicates absent in the typestate, the caller calls foo_dyn with 1 or more uints indicating the static typestate at the call site. This will wind up as 1 or more (usually 1) extra constant(s) in the instruction stream, not so bad.
  • When a caller takes a first class reference to foo, they take a pointer to foo_all (unless we support first class function types with predicates, in which case they might get a reference to foo_dyn, but we can leave that as a future extension).
  • We have a lint / error-checking flag / mode, available on a caller-scope by caller-scope basis, that re-enables the current behavior: that of requiring explicit checks on all paths to the call site. This amounts to insisting that the scope always calls foo directly, never foo_dyn.
  • Possibly make this mode take a list of the predicates the user wants to require explicit checks for.

This would move typestate a fair bit closer still to a fancy DBC system: it'd be a DBC system that has a formal model of check redundancy and elimination (thereby: a means to safely and mindlessly reduce the execution penalty of over-checking by adding check statements as far out as you can). But it would also make it much more likely for people to use predicates to constrain function inputs from the get-go. This is a much more compelling stales pitch: "they're just like assertions in the callee, but the caller has a way to inhibit them, to make the callee go faster".

Keep in mind, assert is already compelling enough to see naturally-occurring use. There are a dozen uses in production-code pathways of core::vec already (i.e. outside the tests). They're not even expensive enough to warrant static elimination, but they would (in a few cases) be worth surfacing in API docs. So I think if we can focus on keeping the experience "like assert but better", we'll be ok.

Thoughts?

@pcwalton
Contributor

My main concern is that refinement types have ended up being used in very few places, primarily because they only work on immutable data and with pure predicates. So, honestly, I'm not sure whether their existence is justified…

Maybe cooking dynamically-checked contracts into the language in some form is really what we want. This could be a solution to points (1) and (2) that Graydon brought up. Refinements don't seem to have solved (3) that often in practice, unfortunately.

@dherman
Contributor
dherman commented Feb 16, 2012

I've been concerned about refinement types, too. I totally agree that having assert is great. But when it's tied into the type system, I think there are two main problems:

(a) As Patrick said, it can only do sound static reasoning about immutable data. Earlier on in Rust's history, this was looking like a very large set of data in Rust, because all message-passing was going to be based on immutable data. Now that that's no longer the case, predicates don't seem to be able to account for as much of the data in the language.

(b) The modularity problem, as Marijn points out in the bug. A library wants to state its invariants, but when those invariants are required by the type system, the programmer has to prove the invariant not only to herself but also to the type system. Since you can't really construct data that upholds a predicate a priori, you're forced to insert a check every time you create data that needs to be sent to a function with a refined input type.

There's this tension between wanting to express your invariants, especially when you have a system that can enforce some conservative approximation of some of them, but being limited by the expressiveness of the invariant system as well as the serious inconvenience (and/or runtime cost) it imposes on your clients.

Graydon: IIUC, the variant you outline above could roughly be described as gradual typing for refinement types. This is the direction recent research in blended type systems has gone: a fully static version that imposes no runtime cost, a dynamic version with dynamic checks, and the ability for the two to interact without mandating explicit casts by inserting them automatically. I do still share Patrick's concern that it doesn't cover enough of the language (you can't put predicates on stateful types, or higher-order types like functions or objects) to warrant its status in the type system.

If we had a less high-powered refinement type system, we could still have assert. I think putting it in the language would continue to have many benefits: a language-blessed way to spot-check invariants, a way to generate better assertion error information (e.g., having baked-in syntax means you can report good source location information), and possibly a hook to do control-flow reasoning like we do now but for optimizations.

Just my current thoughts, anyway.

@graydon
Contributor
graydon commented Feb 16, 2012

@dherman Yeah, what I was describing is a fair bit like gradual typing.

I mean, to put this in most-blunt terms: even if we never managed to find a formulation of typestate that was smooth-feeling enough to get extensive static use, I'd sorta like a DBC system to be present, if only to hoist function pre/post conditions up to API-documentation level rather than "asserts buried in the body of the function".

As far as whether refinement types will work out in the face of more-pervasive mutability, keep in mind that we didn't just blindly give up leaning on immutability in message passing; we shifted to leaning on unique types. I think the same is fair game here: a unique mutable type is perfectly valid for constraining, you just lose the constraint (or are obliged to re-check it) when you mutate the body.

In any case, we still don't have function postconditions working. Or constants in constraints. Or declared-constrained types. Or ... lots of parts. The fact that we're not leaning on this not-fully-mature subsystem heavily yet doesn't mean we ought to be angling for its obsolescence. Let it grow some more and see what we can do with it. We're not using resources or tasks extensively yet either. And hardly any macros! A lot of parts of rust have taken a number of passes to refine to the point of tolerable usability; they were just more often things so firmly in the critical path that we were forced to dogfood through them earlier :)

@marijnh marijnh added a commit that closed this issue Feb 22, 2012
@marijnh marijnh Remove preconditions from libraries
Closes #1805
ad03761
@marijnh marijnh closed this in ad03761 Feb 22, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment