New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ghost erasure and tracked variables #347
Conversation
…hanges the way Verus erases ghost code for compilation and the way Verus checks tracked variable lifetimes in ghost code. In the long run, this should allow us to completely replace the old ghost code erasure with the proposed ghost block extension to rustc. In the short run, this should allow us to replace the old ghost code erasure with a macro-based erasure that would remove the last major dependency on our fork of rustc, hopefully allowing us to use a standard rustc build. It also simplifies the way programmers use `tracked` variables (at least compared to the current rules for `tracked`). Note: lifetime_generate.rs is still incomplete in this commit.
I'm going to mark this ready for review. There's still work to do, but I think the pull request can be merged before the remaining tasks are completed. The remaining tasks are:
By default, the new ghost code erasure is disabled. The command line options |
By the way, here's some updated test code (the "synthetic code generation" description above is slightly out of date). For this code:
the generated synthetic code is:
For this code:
the generated synthetic code is:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks good to me.
It would be good to preserve your write-ups from the PR into the code documentation. I'd also like to see a high-level write-up explaining exactly what's getting preserved in the translation versus what gets simplified out.
The cell, cell_old_style, atomic, atomic_ghost, and ptr_old_style libraries need to be ported to the verus macro syntax. Currently, these libraries are disabled via cfg when using the new ghost erasure (they are still enabled when using the old erasure, so the tests that rely on them still work via the old erasure)
We can probably remove the 'old_style' libraries now that the artifact is squared away.
The 'atomic' file is a little annoying because it has a really painful start-up cost due to the macro (moreso than any other pervasive file). Even though it's ugly, I think it'd be best to keep it in the "expanded" style with attributes, at least for now.
Exec closures are almost supported, but they currently rely on old-style requires/ensures functions, which aren't supported. We might just want to introduce new syntax for requires/ensures on exec closures.
I agree, I always meant to do this at some point anyway.
source/rust_verify/src/lifetime.rs
Outdated
) -> State { | ||
let gen_state = | ||
crate::lifetime_generate::gen_check_tracked_lifetimes(tcx, krate, erasure_hints); | ||
emit_state.writeln("#![feature(box_patterns)]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust allows multi-line string literals
} | ||
state.newline(); | ||
state.write("open_invariant_end(guard, "); | ||
emit_pattern(state, &un_mut_pattern(inner_pat)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this get un_mut'ed?
span: Span, | ||
mk_typ: impl FnOnce(&mut State) -> Typ, | ||
exps: Vec<Option<Exp>>, | ||
force_some: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document what 'force' means?
} | ||
} | ||
|
||
fn erase_abstract_datatype<'tcx>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's an abstract datatype?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tjhance for going over things first. I don't have further notes.
@Chris-Hawblitzel we may want to consider the Creusot-inspired MIR-based strategy to hide spec code from the borrow-checker we discussed, but we should see what the impact of ghost!
in rustc would be first.
(There's a bunch of merge commits that I don't understand, I imagine it's to keep the branch up-to-date. It'd be great if you could squash them, but it shouldn't be a merge blocker.)
* (Initial, incomplete commit for draft pull request) This completely changes the way Verus erases ghost code for compilation and the way Verus checks tracked variable lifetimes in ghost code. In the long run, this should allow us to completely replace the old ghost code erasure with the proposed ghost block extension to rustc. In the short run, this should allow us to replace the old ghost code erasure with a macro-based erasure that would remove the last major dependency on our fork of rustc, hopefully allowing us to use a standard rustc build. It also simplifies the way programmers use `tracked` variables (at least compared to the current rules for `tracked`). Note: lifetime_generate.rs is still incomplete in this commit. * cargo fmt * Continued work on lifetime_generate.rs * Continued work on lifetime_generate.rs * In lifetime_generate, index on HirId rather than Span * Continued work on lifetime_generate.rs * Continued work on lifetime_generate.rs * Continued work on lifetime_generate.rs * For now, disable pervasive cache for macro_erasure * Continued work on lifetime_generate.rs * Continued work on lifetime_generate.rs * Continued work on lifetime_generate.rs * Add more comments * Add more comments * Port more pervasive libraries to new_erasure_tracked * In lifetime_generate, use MIR generics only, not HIR
(Initial, incomplete commit for draft pull request) This completely changes the way Verus erases ghost code for compilation and the way Verus checks tracked variable lifetimes in ghost code. In the long run, this should allow us to completely replace the old ghost code erasure with the proposed ghost block extension to rustc. In the short run, this should allow us to replace the old ghost code erasure with a macro-based erasure that would remove the last major dependency on our fork of rustc, hopefully allowing us to use a standard rustc build. It also simplifies the way programmers use
tracked
variables (at least compared to the current rules fortracked
).I'll first review the old erasure mechanism, then discuss the long-term proposal, then the short-term proposal.
Note: lifetime_generate.rs is still incomplete in this draft commit.
Old Erasure Mechanism
The old mechanism was based on the observation that only exec code gets compiled, and only exec and proof code get borrow checked:
Based on this, the old mechanism erased spec code after type checking and then erased proof code after lifetime checking, leaving only exec code at the end to compile. However, while this erasure was simple in principle, it was difficult to erase code in Rust's HIR/THIR/MIR formats without making major changes to rustc. Therefore, the actual implementation ran rustc three times, erasing different amounts of code near the rustc front end (in Rust's AST format) in each of the three runs. This had many drawbacks:
Ghost blocks
There has been a recent proposal to add ghost code as a Rust feature. This proposal is syntactic: ghost code is marked explicitly in the source code by writing the ghost code in "ghost blocks". In addition, ghost variables may be represented using new types (e.g.
x: Ghost<t>
instead ofghost x: t
). This proposal would have the advantage of requiring only one run of rustc, rather than three.To prepare for this proposal, Verus has been switching over the last year to a more coarse-grained, syntactic style of expressing ghost code. For example, calls from exec code to proof lemmas are now written inside of explicit
proof
blocks. In fact, the syntactic distinction between exec and non-exec code may help make the code more readable. However, we've also moved to a style that syntactically distinguishes between proof code and spec code, and this seems to be just annoying. In particular, proof (tracked) arguments to functions have to be marked astracked
, as inlet z = f(tracked x, tracked y);
rather than justlet z = f(x, y);
. The mode checker can easily infer thesetracked
annotations, but unfortunately, the mode checking runs after the syntactic splitting into exec/proof/spec code, when it's too late for the mode inference to help:Long-term proposal
To make programming in Verus more pleasant, this pull request reduces the reliance on syntax to distinguish between the different code modes. Syntax is used to distinguish between exec and non-exec code (e.g. by using
proof
blocks), but not to distinguish between proof and spec code inside aproof
block. This allows programmers to writelet z = f(x, y);
rather thanlet z = f(tracked x, tracked y);
.This pull request assumes the simplest possible treatment of lifetime checking for non-exec code in the new ghost code Rust proposal: it assumes that the proposal skips lifetime checking of non-exec code entirely. This means that Verus needs to have a separate way to check lifetimes of proof code. This separate checking is the main purpose of this pull request. Roughly, this separate checking looks like the following:
In this design, rustc performs type checking of all code, Verus performs mode checking of all code, rustc performs lifetime checking and compilation of exec code, and Verus performs lifetime checking of proof code. (Verus will only need to check proof code that manipulates tracked data; proofs that work only on spec data trivially satisfy lifetime constraints.)
This pull request implements Verus's lifetime checking for proof code. This checking no longer uses erase.rs and erase_rewrite.rs; these will eventually be deleted entirely. Instead, the checking works by constructing synthetic Rust code to abstractly represent the various actions that the proof code performs, such as variable declarations, function calls, and pattern matching. It then sends this synthetic code to a separate invocation of rustc. (We could also consider using some other checker, such as Polonius, although generating synthetic Rust code is fairly easy.) Since this synthetic code generation occurs after mode inference, it can use the infered modes to guide the synthetic code generation.
This approach should solve the drawbacks listed above:
Short-term proposal
The long-term proposal relies on the ghost code proposal being implemented and adopted. Before that happens, we can still use this pull request's proof-mode lifetime checking in a slightly hackier way to get some of the benefits of the long-term proposal. In particular, the short-term proposal should be able to run with standard rustc, without relying on a separate fork of rustc, and the short-term proposal will allow programmers to write
let z = f(x, y);
rather thanlet z = f(tracked x, tracked y);
. The short-term proposal still requires running rustc multiple times on the original source code, although only twice instead of three times:To avoid needing a fork of rustc, the short-term proposal uses the syntax macro to erase non-exec code, rather than relying on erase.rs and erase_rewrite.rs.
This pull request implements the short-term proposal in its entirety (when using the new command-line option
--erasure macro
), except that there are still a few leftover unrelated dependencies of Verus on our rustc fork, so it doesn't completely get us off the fork. But the remaining dependencies should be relatively easy to remove.Possible alternative (even longer-term): MIR rewriting
There may be other ways to implement erasure and lifetime checking for Verus. For example, if rustc exposes an official interface to read and transform "stable" MIR code, we might be able to use MIR transformation to erase spec code and proof code. However, this support is likely to be further in the future. It would also mean we'd have to perform MIR transformations based on HIR mode checking, although we might be able to rerun the mode checking at the MIR level. I think we should continue to think about alternatives like this, but my hope is that this pull request can improve our lives in the short term regardless of future alternatives.
Ghost types
Right now, our ghost types (
Ghost<T>
andTracked<T>
) are somewhat annoying because programmers have to explicitly dereference them to retrieve the underlyingT
ghost value. This is an orthogonal issue to this pull request, and this pull request does not address it. However, as part of the long-term solution above, but separately from this pull request, we could try to integrate some automatic dereferencing of ghost types into the ghost code proposal, so that the Rust type checker could automatically add the necessary dereferencing. As far as I recall, something like this was part of the original discussion for ghost code, although I don't think the details were ever settled.Implementation details
Order of operations: long-term proposal
As discussed above, in the long-term proposal, rustc would only run once on the original source code. Here's what would happen during that one run:
Order of operations: short-term proposal
The order of operations in the short-term proposal is a little messier because we have to run rustc twice on the original source code, once erasing ghost code and once keeping ghost code. This causes some tension in the order of operations: it would be cleanest to run one rustc invocation entirely, and then the other entirely, but this would be slow. For example, if we ran the ghost-erase rustc and then the ghost-keep rustc, then the user would receive no feedback about verification until parsing and macro expansion for both rustc invocations had completed. On the other hand, if we ran the ghost-keep rustc and then the ghost-erase rustc, the user would receive no feedback about lifetime/borrow checking errors in exec functions until verification had completed. So for better latency, the implementation interleaves the two rustc invocations (marked GHOST for keeping ghost and EXEC for erasing ghost):
To avoid having to run the EXEC lifetime/borrow checking before verification, which would add latency before verification, the short-term implementation takes advantage of the synthetic-code lifetime/borrow checking to perform a quick early test for lifetime/borrow checking on all code (even pure exec code). If this detects any lifetime/borrow errors, the implementation skips verification and gives the EXEC rustc a chance to run, on the theory that the EXEC rustc will generate better error messages for any exec lifetime/borrow errors. The GHOST lifetime/borrow errors are printed only if the EXEC rustc finds no errors. Note that the long-term proposal avoids this complex interleaving; rustc only runs once on the original source code, and it naturally prints the regular exec lifetime/borrow checking errors first without so much effort. The long-term proposal also doesn't require the synthetic code generation for all functions; it only requires it for proof code.
Synthetic code generation
The
--log-all
or--log-lifetime
options will print the generated synthetic code for lifetime/borrow checking to a file in the.verus-log
directory. For the following example source code:the following
crate-lifetime.rs
file is generated when logging is enabled:Note that the synthetic code erases all spec code, keeping only exec and proof code. Also note that the synthetic code is simpler than the original source code in various ways, keeping only what's necessary for lifetime/borrow checking. For example, all function calls are redirected to the same function
f
, which takes the function arguments as a tuple.Error reporting
As discussed in the order of operations above, the implementation lets rustc print any exec-mode lifetime/borrow errors first. In the example above, this results in the following output:
If we change the example by commenting out the exec-mode error lines, then we will see the ghost-mode lifetime/borrow errors from the synthetic code:
The error messages here aren't quite as detailed, but hopefully they should be adequate. The ghost-mode messages are generated by running rustc on the synthetic code, with rustc configured to generate errors in JSON format, capturing the JSON errors and parsing them to retrieve the line/column information and error messages, converting the synthetic line/column information back into spans for the original source code, and then sending the error messages and spans to the rustc diagnostics for the original source code. It's possible that if we used, say, Polonius, instead of rustc, to do the lifetime/borrow checking, we wouldn't have to jump through so many hoops to manage the line/column/span information in the errors, but I haven't looked into this.