Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for Function and Method Stubbing #1723

Merged
merged 37 commits into from Oct 25, 2022

Conversation

aaronbembenek-aws
Copy link
Contributor

@aaronbembenek-aws aaronbembenek-aws commented Sep 27, 2022

Description of changes:

Create an RFC for the UX and backend implementation of function and method stubbing. This would allow users to specify that, during verification, some functions/methods should be replaced with other functions/methods. These substitutions would help Kani handle code that has unsupported features (like inline assembly) or leads to bad verification performance.

I have created a proof-of-concept of the backend MIR-to-MIR transformation; the stub mapping is read from a file.

You can also find a rendered version of this document here.

Related issues:

Resolves #1695

Related to #1809

Call-outs:

Testing:

  • How is this change tested?

  • Is this a refactor change?

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I haven't reviewed everything, but I left a few comments. Thanks

rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Show resolved Hide resolved
Copy link
Contributor

@adpaco-aws adpaco-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great overall! Just have a couple more comments.

rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
@adpaco-aws adpaco-aws changed the title Create RFC for Function and Method Stubbing RFC for Function and Method Stubbing Sep 30, 2022
@adpaco-aws adpaco-aws added the T-RFC Label RFC PRs and Issues label Sep 30, 2022
@aaronbembenek-aws aaronbembenek-aws marked this pull request as ready for review October 3, 2022 20:40
@aaronbembenek-aws aaronbembenek-aws requested a review from a team as a code owner October 3, 2022 20:40
Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to make any change to the results report to highlight that a stub was used?


Under this substitution, Kani has a single check, which proves that the assertion can fail. Verification time is 0.02 seconds.

### Mocking deserialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool.


To teach this feature, we will update the documentation with a section on function and method stubbing, including simple examples showing how stubbing can help Kani handle code that currently cannot be verified.

## Detailed design
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will likely need some changes to the metadata so we can tell kani-driver which files were generated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I will add this.


## Rationale and alternatives: stubbing mechanism

Our approach is based on a MIR-to-MIR transformation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highlight that another benefit is that we can extend our compiler to integrate --concrete-playback with the abstractions. But that this is out of the scope for this work.

fn my_harness() { ... }
```

A similar mechanism can be used to aggregate stub sets:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Makes sense to explicitly write stub_by(...) because of this.

Requiring the replacement's type to be a subtype of the original type is likely stronger than what we want.
For example, if the original function is polymorphic but monomorphized to only a single type, then it seems okay to replace it with a function that matches the monomorphized type.
- How can we allow a user to stub a method and access private fields of `self`?
This should be possible using `std::mem::transmute`, but can we hide this from the user and make it less brittle?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's good for all cases, but one possible approach here is instrumentation.

3. Users could use stubbing to perform compositional reasoning: prove the behavior of a function/method `f`, and then in other proofs---that call `f` indirectly---use a stub of `f` that mocks that behavior but is less complex.

In all cases, stubbing would enable users to verify code that cannot currently be verified by Kani (or at least not within a reasonable resource bound).
Even without stubbing types, the ability to stub functions/methods can help provide verification-friendly abstractions for standard data structures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without stubbing types

purely as a readability / flow nit: this starts talking about a case before you've really defined what these different types of stubbing are.

As I read on, I think this RFC is missing a paragraph or section discussing exactly this. We mention "stubbing types" here and "stubbing function/methods" later, but what else might there be?

One pattern I like in design docs like this is to have a section clearly articulating what's in-scope and out-of-scope. This might be a way to do this. For example, in-scope are function/methods and stub sets, out-of-scope are replacing types.

And while I'm thinking about this, I wonder if (as part of this work) we should develop a library of safe stubs for standard library functions that users can opt-into. e.g. #[kani::stubs(kani::std_stubs)] or something. Another idea besides Vec::with_capacity might be how HashMap::new works (e.g. removing/mitigating the randomness and thread local?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will clarify this.

In scope:

  • Replacing function bodies
  • Replacing method bodies (which means that the new method body will be executed, whether the method is invoked directly or through a vtable)

Out of scope:

  • Replacing type definitions
  • Replacing macro definitions
  • Mocking traits


## Detailed design

We expect that this feature will require changes primarily to `kani-compiler`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be understood as a change to kani-driver as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I'll clarify that.

```

The benefit is that stubs are specified per harness, and (using modules) it might be possible to group stubs together.
The downside is that multiple annotations are required and the stub mappings themselves are remote from the harness.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the downside. We don't lose the ability to still "cherry-pick" things from a module right? Stub-sets are just as "remote" as a module like this, no?

It seems like this might be a "nicer" and more Rust-friendly way of writing stub-sets and not actually have any real downsides that I can see...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stronger downsides with this approach:

  1. How do you mock multiple functions with the same stub? (Especially if, say, harness A wants to use stub1 to mock foo, and harness B wants to use stub1 to mock bar.)
  2. How do you combine stub sets defined via modules? (This is probably possible, but would need to be fleshed out.)
  3. Is it a regular module or not? Modules can contain other things than functions - how should we interpret them?


The advantage with this approach is, like source transformations, it would be very flexible.
The downside is that it would require modifying `rustc` (as far as we know, there is not an API for plugging in a new AST/HIR pass), and would also require performing the transformations at a very syntactic level: although the AST/HIR would likely be easier to work with than source code directly, it is still very close to the source code and not very abstract.
Furthermore, it would require that we have access to the AST/HIR for all external code, and---provided we supported stubbing across crate boundaries---we would need to figure out how to inject the AST/HIR from one crate into another (the AST/HIR is usually just constructed for the crate currently being compiled).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would need to figure out how to inject the AST/HIR from one crate into another

I'm not sure this is true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC is focused on function stubbing. Can we add examples of method stubbing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tedinski We don't know until we compile later crates that earlier crates would need to have their HIR modified. Relatedly, a local abstraction would not be available while compiling a dependency.

@zhassan-aws We still need to iterate on exactly what the user experience for stubbing methods will be like. However, we can add an example of one approach to the Open Questions section.

Furthermore, it would require that we have access to the AST/HIR for all external code, and---provided we supported stubbing across crate boundaries---we would need to figure out how to inject the AST/HIR from one crate into another (the AST/HIR is usually just constructed for the crate currently being compiled).

## Open questions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the user verify that the stub is an abstraction of the original function/method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add this - a good open question.

- How can we allow a user to stub a method and access private fields of `self`?
This should be possible using `std::mem::transmute`, but can we hide this from the user and make it less brittle?

## Future possibilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a limitations section? E.g.:

  • This won't work with --concrete-playback for now.
  • We are only able to apply abstractions to dependencies if user enables MIR Linker.

A good example of this is compositional reasoning: in some harnesses, we want to prove properties of a particular function (and so want to use its actual implementation), and in other harnesses we want to assume that that function has those properties.

Users will specify stubs by attaching the `#[kani::stub_by(<original>, <replacement>)]` attribute to each harness function.
The arguments `original` and `replacement` give the names of functions (as string literals), relative to the crate of the harness (*not* relative to the module of the harness).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, something like this would not be supported:

Say I had both the rand crate dependency and a local module:

mod rand {
    pub fn random() ...
}

#[kani::stub_by("rand::random", "mock_random")] // resolves to `self::rand::random`, not `::rand::random`

In the same vein, we won't handle imports either?

use path::to::rand_thing as rand;

#[kani::stub_by("rand::random", "mock_random")] // resolves to `path::to::rand_thing::random`, not `::rand::random`

To me, using the existing Rust scoping rules would be the most natural way to reference things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's a good point. I don't know why we have this restriction here. @aaronbembenek-aws can please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restriction was there so that the provided names would match the identifiers as they appear in the MIR. However, I agree that it would be a better UX to use the Rust scoping rules. So, I think we should target that approach, and try to resolve names on our end.


## User experience

This feature is currently limited to stubbing functions and methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning on enforcing visibility rules? In other words, are we going to allow stubbing a private function?

mod thing {
    pub fn foo() {
        bar();
    }

    fn bar() {
        ...
    }
}

#[kani::stub_by("thing::bar", "my_bar")]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you will be able to stub a private function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a tradeoff of flexibility and maintainability. On the one hand, stubbing private functions makes it really easy for writing a new proof.

However, private functions don't have any guarantees on stability or behavior. Having to maintain a proof that stubs private functions could become really tedious every time you want to refactor or upgrade a dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points, and I've added them to the RFC. We are still planning to support stubbing private functions/methods (we think users will ultimately demand it), but in the documentation we will discourage users from doing so unless absolutely necessary.


```rust
kani::stub_set! {
my_io_stubs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these global symbols or resolved through the module path in which they're defined? I would prefer the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the making this point - I agree that the latter is preferable, and that's what we'll target.


1. a specified `replacement` stub does not exist;
2. the user specifies conflicting stubs for the same harness (i.e., if the same `original` function is mapped to multiple `replacement` functions); or
3. the signature of the `replacement` function is not compatible with the signature of the `original` function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's in the open questions section, but we definitely need to know what "compatible" means. Arity is an obvious aspect. But things start getting tricky when we're talking about generic constraints. For example are these compatible?

pub fn random<T>() -> T
    where Standard: Distribution<T>
pub fn random<T>() -> T
    where T: kani::Arbitrary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to iterate on this and see what is technically feasible. From my perspective, the optimal outcome would be that a stub is compatible if the monomorphized program is well typed. Our current prototype supports the example you give.

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very-well written! Great work, @aaronbembenek-aws!


#[cfg(kani)]
#[kani::proof]
#[kani::stub_by(rand::random, mock_random)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we shorten it to stub? I think it's customary to drop the preposition from function names (e.g. copy as opposed to copy_to or swap as opposed to swap_with).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think that's a good idea.


### Mocking deserialization

In this example, we mock a [serde_json](https://crates.io/crates/serde_json) (96M downloads) deserialization method so that we can prove a property about the following [Firecracker function](https://github.com/firecracker-microvm/firecracker/blob/01eba51ded2f5439da91a2d73280f579651b067c/src/api_server/src/request/vsock.rs#L11) that parses a configuration from some raw data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice example, but it's a bit involved, and I don't think it's adding much value to the RFC. IMHO, the previous, example illustrates the idea well, and is simple enough to understand. This example would be nice for the documentation though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've decided to move this to the end of the RFC, as an appendix.

rfc/src/rfcs/0002-function-stubbing.md Outdated Show resolved Hide resolved
They will be resolved using Rust's standard name resolution rules; this includes supporting imports like `use foo::bar as baz`.
The attribute may be specified multiple times per harness, so that multiple (non-conflicting) stub pairings are supported.

For example, this code specifies that the function `mock_random` should be used in place of the function `rand::random` and the function `my_mod::foo` should be used in place of the function `my_mod::bar` for the harness `my_mod::my_harness`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below replaces foo with bar, but the description says the opposite: foo used in place of bar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

```

We will support the stubbing of private functions and methods.
While this provides flexibility that we believe will be necessary in practice, it can also lead to brittle proofs: private functions/methods can change or disappear in even minor version upgrades (thanks to refactoring), and so proofs that depend on them might have a high maintenance burden.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this turns out to be an issue, the user can always stub the public methods that are calling private functions/methods replacing those calls with calls to another function/method.

A similar mechanism can be used to aggregate stub sets:

```rust
kani::stub_set!() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This macro call has parentheses, whereas the previous one doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch! We don't need parentheses here.

We will also need to update the metadata that `kani-compiler` generates, so that it maps each harness to the generated code that has the right stub mapping for that harness (since there will be multiple versions of generated code).
The metadata will also list the stubs applied in each harness.
`kani-driver` will need to be updated to process this new type of metadata and invoke the correct generated code for each harness.
We can also update the results report to include the stubs that were used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be very helpful!

### Comparison to function contracts

- The [currently proposed function contract mechanism](https://github.com/model-checking/kani/tree/features/function-contracts) does not provide a way to specify contracts on external functions.
This is one of the key motivations for stubbing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say this is the key motivation because conceptually, it should be possible to allow contracts to be specified on external functions (which AFAIK, CBMC allows), but it hasn't been prioritized in the current implementation. I would say the key benefit of stubbing is the simplicity of replacing a function's implementation with another that is written in Rust. The replacement can be really be anything: an over-approximation, an under-approximation, or neither. On the other hand, the disadvantage of stubbing is that there's no easy mechanism to verify that a proof is sound with the specified stubs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are good points; I've rephrased them this way:

  • Function contracts could in principle be applied to external functions. Doing so would require some additional UX design decisions ("How do users specify this?"); with stubbing, we do not have to distinguish between local and external functions.
  • Function contracts sometimes come with a mechanism for verifying that a function satisfies its contract. We do not provide this functionality; however, it is possible to emulate it by writing proof harnesses comparing the behavior of the original code and the stub. Our approach provides more flexibility: we do not enforce a stub to be an overapproximation, or to have the same behavior in all harnesses.

Copy link
Contributor

@tedinski tedinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice doc!

As a convenience, we will provide a macro `kani::stub_set` that allows users to specify sets of stubs that can be applied to multiple harnesses:

```rust
kani::stub_set! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/bike-shed: I think this syntax isn't ideal. How much control over it do you have? If I'm not mistaken you're suggesting that the first item in the list is the name of the stub set, right?

Can we do something like kani::stub_set!(my_io_stubs) { ... the contents ... } or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, the first item would be the name of the stub set. How about kani::stub_set!(my_io_stubs, { ... the contents ... });? We should be able to support that. I believe there needs to be a single outer set of parentheses or braces enclosing everything (the name, the contents, etc.).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about: kani::stub_set! { my_io_stubs: ... }? This would avoid having two types of delimiters ({} and ()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fzaiser! I think this is a pretty good solution. I'm going to go with it for now; we can always iterate on it.

Copy link
Contributor

@celinval celinval Oct 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not a big fan of using : as a separator. It looks like a typo for me. I would prefer regular rust separators, ,, or an aggregator: {} or (). E.g.: like:

kani::stub_set!{my_io_stubs {
 ... the contents ... 
}};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we use something that rustfmt will accept too 🙂. It basically needs to look like an existing construct for it to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camshaft That's a good point about rustfmt. Apparently it doesn't format code in macro invocations between braces. This syntax (based on @celinval's) works well with rustfmt:

kani::stub_set!(my_io_stubs(
    stub(std::fs::read, my_read),
    stub(std::fs::write, my_write),
));

- It would probably make sense to provide a library of common stubs for users, since many applications might want to stub the same functions and mock the same behaviors (e.g., `rand::random` can be replaced with a function returning `kani::any`).
- How can we provide a good user experience for accessing private fields of `self` in methods?
It is possible to do so using `std::mem::transmute` (see below); this is clunky and error-prone, and it would be good to provide better support for users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eventually add 3 types of automatic stub generation.

  • unreachable: Assert the function is unreachable.
  • havoc_locals: Return non-det values and assign non-det to all mutable argument.
  • havoc: Similar to havoc_locals but also assign non-det to all mutable global.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another one is uninterpreted: treat function as an uninterpreted function (suggested by @weaversa here).

Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the RFC overall looks good. I'm OK iterating over some of the details (such as the stub set syntax) as we move along as long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-RFC Label RFC PRs and Issues
Projects
No open projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.

RFC: Function stubbing
7 participants