Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutagen architecure proposal #142

Closed
samuelpilz opened this issue Feb 18, 2019 · 21 comments
Closed

Mutagen architecure proposal #142

samuelpilz opened this issue Feb 18, 2019 · 21 comments

Comments

@samuelpilz
Copy link
Contributor

samuelpilz commented Feb 18, 2019

Hi, I've been working on re-structuring your project mutagen to be more production ready.
https://github.com/power-fungus/mutagen-preview

I spent the last weeks reading into the current project and re-writing parts of mutagen using the same interface, but with a different architecture. It's not completed yet but should give a rough outline of what the finished project will look like: the backbone and glue code already exists and will probably not change too widely.

@llogiq I would like to know if you are interested in my approach and if you would consider merging my proposal in your project. If so, I will finish this preview into production quality in the next months. Also I am open to any discussion about interface and design decisions.

I would like to have this rewrite published (once finished) under the "brand" of mutagen:

  • The interface is the same
    • mutators are introduced using the attribute #[mutate]
    • mutations are selected at run-time using the environment variable MUTATION_ID (renamed for clarity)
    • The target code is compiled only once
  • the rewrite is true to the original goals of mutagen
  • the used technologies is the same to the code in the proc-macro-attribute branch
    • new procedural macro attribute
    • uses syn and quote

For details about the architecture, see: https://github.com/power-fungus/mutagen-preview/tree/master/docs
Main Benefits are:

  • Debugging of the code transformation will be easier due to modular transformation
  • The behavior of the mutations can be tested a standard test suite by faking the global MUTATION_ID variable.
  • it will be possible to add customization via arguments to the #[mutate] attribute
  • community collaboration will be easier since mutators and transformers are small bits of code.
  • mutators can be documented independent from each other.

@llogiq I would like to read from you soon! :)

@llogiq
Copy link
Owner

llogiq commented Feb 18, 2019

The code can be tested using what?

I'm a bit short on time right now, but I'll review your code during the next week, I promise!

@samuelpilz
Copy link
Contributor Author

Next week is fine 👍
I updated the post above to complete the previously incomplete sentence about testing

@samuelpilz
Copy link
Contributor Author

misclick :(

@llogiq
Copy link
Owner

llogiq commented Feb 28, 2019

So if I understand your proposal right, you want to decouple various mutation-embedding parts of the plugin so that each implements its own Fold.

What I like:

  • The code looks pretty simple
  • The transformation abstraction affords us a structured way to switch transformations on or off (on the other hand, we also could extend our current restrictions system for that)
  • Future transformations should be easy enough to extend

What I dislike:

  • The transformers run one after the other. This means our current way of collecting restrictions to avoid duplicate mutations will become much more complex (essentially requiring us to store a mapping of AST nodes to restrictions) and be transformer order dependent
  • As the transformers make multiple passes over the AST, this variant will likely be slower than the current one-pass solution. I haven't measured though, it could be completely OK.

What I'm unsure about:

  • Splitting the plugin into transformers may require us to put more stuff into the shared global info, requiring more mutex accesses. I don't think that rustc currently parallelizes proc macro expansion, but based on discussions with matklad, other frontends likely will, so contention might become a problem
  • Similarly how do we share code between transformers?

@samuelpilz
Copy link
Contributor Author

Thank you for your review. For the most part, we come to the same conclusions about my proposal. Some opinions:

  • I also dislike that transformers run after each other without sharing information about what has been transformed already. I have not yet figured out a good way of dealing with this yet. For the time, I am happy with fixing the order of transformers since this gives the most repeatable results. I am open to comments on how to fix this, but think that this is not an issue for most transformers.
  • Modular code for the transformers is necessary. Having a single function body for all mutations is not sustainable for 20-50 transformers.
  • Having this simple structure for transformers gives a higher chance of community contributions for innovative transformers. When the interface of transformers becomes established, I want to write documentation "how to contribute a mutation".
  • Run-time is not an issue. The performance is dominated by its compilation and the compilation of syn. I do not worry about too much mutex contention or multiple AST passes.

How should we go from here? I would like to resolve any issues with you before finishing my implementation.

@llogiq
Copy link
Owner

llogiq commented Mar 7, 2019

I think we can have the best of both worlds. We would need to create our own trait that deviates from Fold. One problem with Fold is that we can only fold one expr variant into the same variant, where we usually want to return a different one. Changing those functions to always return Expr and doing the variant dispatch in a default function should simplify the code quite a bit. Then we need our own folder that successively calls all mutators (unless they are blocked by the restrictions, which it can also check to keep the mutators simple) on an expr or function body block.

In general, we have three kinds of mutators: Statement mutators (not yet implemented), Expr mutators (see above) and function body block mutators. The Mutator trait should reflect this.

@samuelpilz
Copy link
Contributor Author

I do not think the restriction to only convert each variant of the AST to one of the same type is a big issue - semantically, this has to happen anyway for the generated code to be correct.
I can think of impl TryFrom<Expr> for LitInt to shorten the quite large pattern necessary for MutagenTransformerLitInt, but I think this is not an architecure-question and can be addressed once the more important shortcomings have been addressed.

I am currently looking out for a good restriction-system. I did not really understand what you did in the current version. Could you explain that system to me or point to the relevant parts in the code?
However, I am in favor of fixing the order of mutators and currently no restrictions are necessary. It may be a good idea to defer the design of a restriction system to when one is really needed - so that we know exactly what features are required from such a system.

I am in favor of distinguishing different kinds of mutators and have tried to do so, but have not yet found a satisfying design. I have implemented a statement mutator, but the generated code is not like the other (expr-) mutators. It would be great if we could find something that works the same as the following snippet. (This is the current implementation of MutagenTransformerStmt)

if /* check if mutation is active */ {
  // run statement
} // else: do not run statement

PS: I am currently working on mutation of the + operator within the proposed system. I "stole" your idea for specialization for this task and it seems to work better than expected - will do some tests soon.

@llogiq
Copy link
Owner

llogiq commented Mar 9, 2019

The idea behind the restrictions system is that we may sometimes have two mutations that lead to the same outcome. For example, if we mutate a && b to false, it's useless to mutate a or b to false separately. So we want to remove those equal mutations and the restrictions system does exactly that.

Statement mutations are really hard, because you have to keep track of so many things to keep the build from failing. I have a draft blog post with all kinds of corner cases you have to care about, perhaps I should post it.

@samuelpilz
Copy link
Contributor Author

Personally, I have no problems with duplicate mutations and have decided not to include logic to prevent this. I think this will not affect the overall mutation coverage too much and will not be a large problem when reviewing the survived mutants.
Other mutation testing frameworks I know also do not care about duplicate mutaitons.

Since almost everything in rust is a expression, expressions mutators will get quite far and I will focus on implementing more of them next.

@llogiq
Copy link
Owner

llogiq commented Mar 9, 2019

There was a paper from google that introduced the duplicate avoidance logic. This has multiple benefits: Duplicate mutations cost time to run, skew the coverage results and are actually quite easy to avoid (as in the above example; a simple bitset goes a long way). There's a mutation testing implementer's slack that I frequent who eagerly discuss those results, and I know of 3 frameworks that either already have implemented something like this or have it on the roadmap.

Yes, expression mutations already go a long way, but I'd love to see statement mutations. Gotta get that blog post out first.

@samuelpilz
Copy link
Contributor Author

Interesting. If you could point me to some source (academic or not), I will habe a look.

Would you think it makes sense to continue my effort with my current approach?

@samuelpilz
Copy link
Contributor Author

Maybe I should clarify my preferred approach:

I am not opposed to duplication-detection, but do not see this as an immediate issue that needs to be solved within this proposal. It can be an item on the roadmap.
I am positive that the propsed system for transformers makes it possible to integrate such a system.

@llogiq
Copy link
Owner

llogiq commented Mar 11, 2019

Again, I'd like to change the approach so that the expressions are walked in post-order (first all subexpressions are mutated, then all mutators for the current expressions are called). Also pulling the dispatch between expression types into the mutator calling logic will make the mutator code simpler, which is a win in my book. Otherwise I'm completely fine with this proposal.

By the way, here is the blog post I promised.

@samuelpilz
Copy link
Contributor Author

I think I understand your suggestion and have implemented the approach how I understand it. Some documentation is still required. Could you have a look at the different implementation in transformer.rs?

For now, I removed the statement-mutator until we agree on a comprehensive plan how to go forward.

@llogiq
Copy link
Owner

llogiq commented Mar 13, 2019

fold_expr still relies on the Fold trait. This is problematic because Fold is inherently recursive, and we want to pull the recursion up into the caller. I'd rather have a trait Tansformer with fn transform_expr_box(e: ExprBox) -> Expr { ExprBox(e) (and so on) and one fn transform_expr(e: Expr) -> Expr that contains a big match to call the trait functions. Then we can impl Fold for the TransformerBundle to call the transformers in the specified order. I'm unsure if we should add &mut self to the API (like with Fold) or rely on lazy_statics to thread information between transformer calls, but lean towards the former.

With those nits out of he way, I'll gladly accept a PR to change mutagen over to your architecture.

@samuelpilz
Copy link
Contributor Author

I am pretty sure I have already implemented your suggestions. The transformers do not implement Fold any more and only operate on one Expr value. They are not supposed to recurse on the contents of this value, although this cannot be enforced by the signature. The transformers are applied in post-order of the expr-nodes in the AST

I have:

/// trait that is implemented by all transformers.
///
/// each transformer should not inspect the expression recursively since recursion is performed by the `MutagenTransformerBundle`
pub trait MutagenExprTransformer {
    fn map_expr(&mut self, expr: Expr) -> Expr;
}

impl Fold for MutagenTransformerBundle {
    fn fold_expr(&mut self, e: Expr) -> Expr {
        // transform content of the expression first
        let mut result = match e {
            // recurse on ALL patterns
        };

        // call all transformers on this expression
        for t in &mut self.expr_transformers {
            result = t.map_expr(result);
        }
        result
    }
}

@llogiq
Copy link
Owner

llogiq commented Mar 15, 2019

Ah, I misread the code. Ok then. I still one problem: The lit_int transformer only deals with the literal, it cannot know that the literal is part of an ExprUnary(UnOp::Neg, _), which it still must take into account. So we need to extend our interface a bit. We could for example allow transformers to see the parent Expr if any. That would allow the lit_int transformer to not transform ExprLit within ExprUnarys and instead transform the latter wholesale.

@samuelpilz
Copy link
Contributor Author

samuelpilz commented Mar 15, 2019

Regarding the transformer lit_int: I actually did this on purpose. Every mutation on -x. can be replaced by a mutation on x, possibly adding an additional minus.

Seeing the parent Expr is one solution for this. Maybe an additional transformer for negative literals and some restriction system could have the effect you think of. However, this will not be the next priority for me

@llogiq
Copy link
Owner

llogiq commented Mar 25, 2019

Ok then. I think we should move forward with this – I can add extensions I feel are warranted later. Can you set up a pull request so we can get this merged?

@samuelpilz
Copy link
Contributor Author

I will try to get a pull-request ready in the next few days.

@samuelpilz
Copy link
Contributor Author

Closing this issue after merge of pr #149 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants