New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC for Function and Method Stubbing #1723
RFC for Function and Method Stubbing #1723
Conversation
57059b4
to
44b9290
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I haven't reviewed everything, but I left a few comments. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks great overall! Just have a couple more comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make any change to the results report to highlight that a stub was used?
|
||
Under this substitution, Kani has a single check, which proves that the assertion can fail. Verification time is 0.02 seconds. | ||
|
||
### Mocking deserialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool.
|
||
To teach this feature, we will update the documentation with a section on function and method stubbing, including simple examples showing how stubbing can help Kani handle code that currently cannot be verified. | ||
|
||
## Detailed design |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will likely need some changes to the metadata so we can tell kani-driver
which files were generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense. I will add this.
|
||
## Rationale and alternatives: stubbing mechanism | ||
|
||
Our approach is based on a MIR-to-MIR transformation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would highlight that another benefit is that we can extend our compiler to integrate --concrete-playback
with the abstractions. But that this is out of the scope for this work.
fn my_harness() { ... } | ||
``` | ||
|
||
A similar mechanism can be used to aggregate stub sets: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Makes sense to explicitly write stub_by(...)
because of this.
Requiring the replacement's type to be a subtype of the original type is likely stronger than what we want. | ||
For example, if the original function is polymorphic but monomorphized to only a single type, then it seems okay to replace it with a function that matches the monomorphized type. | ||
- How can we allow a user to stub a method and access private fields of `self`? | ||
This should be possible using `std::mem::transmute`, but can we hide this from the user and make it less brittle? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's good for all cases, but one possible approach here is instrumentation.
3. Users could use stubbing to perform compositional reasoning: prove the behavior of a function/method `f`, and then in other proofs---that call `f` indirectly---use a stub of `f` that mocks that behavior but is less complex. | ||
|
||
In all cases, stubbing would enable users to verify code that cannot currently be verified by Kani (or at least not within a reasonable resource bound). | ||
Even without stubbing types, the ability to stub functions/methods can help provide verification-friendly abstractions for standard data structures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even without stubbing types
purely as a readability / flow nit: this starts talking about a case before you've really defined what these different types of stubbing are.
As I read on, I think this RFC is missing a paragraph or section discussing exactly this. We mention "stubbing types" here and "stubbing function/methods" later, but what else might there be?
One pattern I like in design docs like this is to have a section clearly articulating what's in-scope and out-of-scope. This might be a way to do this. For example, in-scope are function/methods and stub sets, out-of-scope are replacing types.
And while I'm thinking about this, I wonder if (as part of this work) we should develop a library of safe stubs for standard library functions that users can opt-into. e.g. #[kani::stubs(kani::std_stubs)]
or something. Another idea besides Vec::with_capacity
might be how HashMap::new
works (e.g. removing/mitigating the randomness and thread local?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will clarify this.
In scope:
- Replacing function bodies
- Replacing method bodies (which means that the new method body will be executed, whether the method is invoked directly or through a vtable)
Out of scope:
- Replacing type definitions
- Replacing macro definitions
- Mocking traits
|
||
## Detailed design | ||
|
||
We expect that this feature will require changes primarily to `kani-compiler`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be understood as a change to kani-driver
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I'll clarify that.
``` | ||
|
||
The benefit is that stubs are specified per harness, and (using modules) it might be possible to group stubs together. | ||
The downside is that multiple annotations are required and the stub mappings themselves are remote from the harness. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the downside. We don't lose the ability to still "cherry-pick" things from a module right? Stub-sets are just as "remote" as a module like this, no?
It seems like this might be a "nicer" and more Rust-friendly way of writing stub-sets and not actually have any real downsides that I can see...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some stronger downsides with this approach:
- How do you mock multiple functions with the same stub? (Especially if, say, harness A wants to use
stub1
to mockfoo
, and harness B wants to usestub1
to mockbar
.) - How do you combine stub sets defined via modules? (This is probably possible, but would need to be fleshed out.)
- Is it a regular module or not? Modules can contain other things than functions - how should we interpret them?
|
||
The advantage with this approach is, like source transformations, it would be very flexible. | ||
The downside is that it would require modifying `rustc` (as far as we know, there is not an API for plugging in a new AST/HIR pass), and would also require performing the transformations at a very syntactic level: although the AST/HIR would likely be easier to work with than source code directly, it is still very close to the source code and not very abstract. | ||
Furthermore, it would require that we have access to the AST/HIR for all external code, and---provided we supported stubbing across crate boundaries---we would need to figure out how to inject the AST/HIR from one crate into another (the AST/HIR is usually just constructed for the crate currently being compiled). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would need to figure out how to inject the AST/HIR from one crate into another
I'm not sure this is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC is focused on function stubbing. Can we add examples of method stubbing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedinski We don't know until we compile later crates that earlier crates would need to have their HIR modified. Relatedly, a local abstraction would not be available while compiling a dependency.
@zhassan-aws We still need to iterate on exactly what the user experience for stubbing methods will be like. However, we can add an example of one approach to the Open Questions section.
Furthermore, it would require that we have access to the AST/HIR for all external code, and---provided we supported stubbing across crate boundaries---we would need to figure out how to inject the AST/HIR from one crate into another (the AST/HIR is usually just constructed for the crate currently being compiled). | ||
|
||
## Open questions | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can the user verify that the stub is an abstraction of the original function/method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this - a good open question.
- How can we allow a user to stub a method and access private fields of `self`? | ||
This should be possible using `std::mem::transmute`, but can we hide this from the user and make it less brittle? | ||
|
||
## Future possibilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a limitations section? E.g.:
- This won't work with
--concrete-playback
for now. - We are only able to apply abstractions to dependencies if user enables MIR Linker.
A good example of this is compositional reasoning: in some harnesses, we want to prove properties of a particular function (and so want to use its actual implementation), and in other harnesses we want to assume that that function has those properties. | ||
|
||
Users will specify stubs by attaching the `#[kani::stub_by(<original>, <replacement>)]` attribute to each harness function. | ||
The arguments `original` and `replacement` give the names of functions (as string literals), relative to the crate of the harness (*not* relative to the module of the harness). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, something like this would not be supported:
Say I had both the rand
crate dependency and a local module:
mod rand {
pub fn random() ...
}
#[kani::stub_by("rand::random", "mock_random")] // resolves to `self::rand::random`, not `::rand::random`
In the same vein, we won't handle imports either?
use path::to::rand_thing as rand;
#[kani::stub_by("rand::random", "mock_random")] // resolves to `path::to::rand_thing::random`, not `::rand::random`
To me, using the existing Rust scoping rules would be the most natural way to reference things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's a good point. I don't know why we have this restriction here. @aaronbembenek-aws can please clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The restriction was there so that the provided names would match the identifiers as they appear in the MIR. However, I agree that it would be a better UX to use the Rust scoping rules. So, I think we should target that approach, and try to resolve names on our end.
|
||
## User experience | ||
|
||
This feature is currently limited to stubbing functions and methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning on enforcing visibility rules? In other words, are we going to allow stubbing a private function?
mod thing {
pub fn foo() {
bar();
}
fn bar() {
...
}
}
#[kani::stub_by("thing::bar", "my_bar")]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you will be able to stub a private function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a tradeoff of flexibility and maintainability. On the one hand, stubbing private functions makes it really easy for writing a new proof.
However, private functions don't have any guarantees on stability or behavior. Having to maintain a proof that stubs private functions could become really tedious every time you want to refactor or upgrade a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good points, and I've added them to the RFC. We are still planning to support stubbing private functions/methods (we think users will ultimately demand it), but in the documentation we will discourage users from doing so unless absolutely necessary.
|
||
```rust | ||
kani::stub_set! { | ||
my_io_stubs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these global symbols or resolved through the module path in which they're defined? I would prefer the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the making this point - I agree that the latter is preferable, and that's what we'll target.
|
||
1. a specified `replacement` stub does not exist; | ||
2. the user specifies conflicting stubs for the same harness (i.e., if the same `original` function is mapped to multiple `replacement` functions); or | ||
3. the signature of the `replacement` function is not compatible with the signature of the `original` function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it's in the open questions section, but we definitely need to know what "compatible" means. Arity is an obvious aspect. But things start getting tricky when we're talking about generic constraints. For example are these compatible?
pub fn random<T>() -> T
where Standard: Distribution<T>
pub fn random<T>() -> T
where T: kani::Arbitrary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to iterate on this and see what is technically feasible. From my perspective, the optimal outcome would be that a stub is compatible if the monomorphized program is well typed. Our current prototype supports the example you give.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very-well written! Great work, @aaronbembenek-aws!
|
||
#[cfg(kani)] | ||
#[kani::proof] | ||
#[kani::stub_by(rand::random, mock_random)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we shorten it to stub
? I think it's customary to drop the preposition from function names (e.g. copy
as opposed to copy_to
or swap
as opposed to swap_with
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think that's a good idea.
|
||
### Mocking deserialization | ||
|
||
In this example, we mock a [serde_json](https://crates.io/crates/serde_json) (96M downloads) deserialization method so that we can prove a property about the following [Firecracker function](https://github.com/firecracker-microvm/firecracker/blob/01eba51ded2f5439da91a2d73280f579651b067c/src/api_server/src/request/vsock.rs#L11) that parses a configuration from some raw data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice example, but it's a bit involved, and I don't think it's adding much value to the RFC. IMHO, the previous, example illustrates the idea well, and is simple enough to understand. This example would be nice for the documentation though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've decided to move this to the end of the RFC, as an appendix.
They will be resolved using Rust's standard name resolution rules; this includes supporting imports like `use foo::bar as baz`. | ||
The attribute may be specified multiple times per harness, so that multiple (non-conflicting) stub pairings are supported. | ||
|
||
For example, this code specifies that the function `mock_random` should be used in place of the function `rand::random` and the function `my_mod::foo` should be used in place of the function `my_mod::bar` for the harness `my_mod::my_harness`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code below replaces foo
with bar
, but the description says the opposite: foo
used in place of bar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
``` | ||
|
||
We will support the stubbing of private functions and methods. | ||
While this provides flexibility that we believe will be necessary in practice, it can also lead to brittle proofs: private functions/methods can change or disappear in even minor version upgrades (thanks to refactoring), and so proofs that depend on them might have a high maintenance burden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this turns out to be an issue, the user can always stub the public methods that are calling private functions/methods replacing those calls with calls to another function/method.
A similar mechanism can be used to aggregate stub sets: | ||
|
||
```rust | ||
kani::stub_set!() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro call has parentheses, whereas the previous one doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good catch! We don't need parentheses here.
We will also need to update the metadata that `kani-compiler` generates, so that it maps each harness to the generated code that has the right stub mapping for that harness (since there will be multiple versions of generated code). | ||
The metadata will also list the stubs applied in each harness. | ||
`kani-driver` will need to be updated to process this new type of metadata and invoke the correct generated code for each harness. | ||
We can also update the results report to include the stubs that were used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be very helpful!
### Comparison to function contracts | ||
|
||
- The [currently proposed function contract mechanism](https://github.com/model-checking/kani/tree/features/function-contracts) does not provide a way to specify contracts on external functions. | ||
This is one of the key motivations for stubbing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say this is the key motivation because conceptually, it should be possible to allow contracts to be specified on external functions (which AFAIK, CBMC allows), but it hasn't been prioritized in the current implementation. I would say the key benefit of stubbing is the simplicity of replacing a function's implementation with another that is written in Rust. The replacement can be really be anything: an over-approximation, an under-approximation, or neither. On the other hand, the disadvantage of stubbing is that there's no easy mechanism to verify that a proof is sound with the specified stubs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are good points; I've rephrased them this way:
- Function contracts could in principle be applied to external functions. Doing so would require some additional UX design decisions ("How do users specify this?"); with stubbing, we do not have to distinguish between local and external functions.
- Function contracts sometimes come with a mechanism for verifying that a function satisfies its contract. We do not provide this functionality; however, it is possible to emulate it by writing proof harnesses comparing the behavior of the original code and the stub. Our approach provides more flexibility: we do not enforce a stub to be an overapproximation, or to have the same behavior in all harnesses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice doc!
As a convenience, we will provide a macro `kani::stub_set` that allows users to specify sets of stubs that can be applied to multiple harnesses: | ||
|
||
```rust | ||
kani::stub_set! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/bike-shed: I think this syntax isn't ideal. How much control over it do you have? If I'm not mistaken you're suggesting that the first item in the list is the name of the stub set, right?
Can we do something like kani::stub_set!(my_io_stubs) { ... the contents ... }
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, the first item would be the name of the stub set. How about kani::stub_set!(my_io_stubs, { ... the contents ... });
? We should be able to support that. I believe there needs to be a single outer set of parentheses or braces enclosing everything (the name, the contents, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: kani::stub_set! { my_io_stubs: ... }
? This would avoid having two types of delimiters ({}
and ()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fzaiser! I think this is a pretty good solution. I'm going to go with it for now; we can always iterate on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not a big fan of using :
as a separator. It looks like a typo for me. I would prefer regular rust separators, ,
, or an aggregator: {}
or ()
. E.g.: like:
kani::stub_set!{my_io_stubs {
... the contents ...
}};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we use something that rustfmt will accept too 🙂. It basically needs to look like an existing construct for it to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@camshaft That's a good point about rustfmt. Apparently it doesn't format code in macro invocations between braces. This syntax (based on @celinval's) works well with rustfmt:
kani::stub_set!(my_io_stubs(
stub(std::fs::read, my_read),
stub(std::fs::write, my_write),
));
- It would probably make sense to provide a library of common stubs for users, since many applications might want to stub the same functions and mock the same behaviors (e.g., `rand::random` can be replaced with a function returning `kani::any`). | ||
- How can we provide a good user experience for accessing private fields of `self` in methods? | ||
It is possible to do so using `std::mem::transmute` (see below); this is clunky and error-prone, and it would be good to provide better support for users. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should eventually add 3 types of automatic stub generation.
- unreachable: Assert the function is unreachable.
- havoc_locals: Return non-det values and assign non-det to all mutable argument.
- havoc: Similar to havoc_locals but also assign non-det to all mutable global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the RFC overall looks good. I'm OK iterating over some of the details (such as the stub set syntax) as we move along as long.
73f16f3
to
9d69b62
Compare
Co-authored-by: Celina G. Val <celinval@amazon.com>
Co-authored-by: Zyad Hassan <88045115+zhassan-aws@users.noreply.github.com>
9d69b62
to
31888a7
Compare
Description of changes:
Create an RFC for the UX and backend implementation of function and method stubbing. This would allow users to specify that, during verification, some functions/methods should be replaced with other functions/methods. These substitutions would help Kani handle code that has unsupported features (like inline assembly) or leads to bad verification performance.
I have created a proof-of-concept of the backend MIR-to-MIR transformation; the stub mapping is read from a file.
You can also find a rendered version of this document here.
Related issues:
Resolves #1695
Related to #1809
Call-outs:
Testing:
How is this change tested?
Is this a refactor change?
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.