Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure mutated code is always valid #301

Merged
merged 2 commits into from Apr 12, 2018
Merged

Ensure mutated code is always valid #301

merged 2 commits into from Apr 12, 2018

Conversation

maks-rafalko
Copy link
Member

@maks-rafalko maks-rafalko commented Apr 9, 2018

This PR adds a check in unit tests that mutated code is always valid.

Extracted from #262 and this comment in particular

For reference:

Created Mutant with invalid syntax must be considered as a bug in Infection.

Concerns about this PR:

  • It slows down Mutators tests (from 5s to 15s)
  • I'm not sure it's the best way of PHP linting, but this is the simplest one that I was able to find

With this new assertion we are sure that at least tested cases produce valid code for each Mutator.

Thoughts?

@@ -27,13 +27,19 @@ public function provideMutationCases(): \Generator
<<<'PHP'
<?php

(yield $a => $b);
function test()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, this mutator was failing with PHP Fatal error: The "yield" expression can only be used inside a function in /tmp/XXX during linting

@sanmai
Copy link
Member

sanmai commented Apr 10, 2018

Because of the slowdowns IMO we should not do this in general and by default. By request and during CI - that's sure.

Another way to lint is to use fork - eval($code) - waitpid cycle. And then look at the exit status pcntl_wexitstatus to see if there was an error or now. Since this is our code, we can guarantee lack ofside effects, and with no side effects we can expect no infinite cycles. This will be as expensive as a fork call. I believe exec is more costly. Please disregard my previous comment.

Scheme with fork could be even faster if we won't wait for a single process to finish, but rather collect all exit statuses later at some point.

@theofidry
Copy link
Member

theofidry commented Apr 10, 2018

I'm not a big fan :/

While I do agree it would be better to keep generating valid code, there is two things I'm not fond of in that PR:

  • It is slower. The point of avoiding to generate redundant or invalid mutants would be to be faster because evaluating them would be more expansive than checking if they should be created in the first place; It doesn't look like to be the case here as highlighted by @sanmai and your perf results
  • It relies on the file-system. This is less of a concern right now but in the future it would be nice if we could do some monkey-patching via the autoloading rather than dumping the file and be dependent on the FS which is harder to parallelise properly and slower. That said we're not there yet so it's not the main point here

Maybe an alternative would be to try to be smarter in the way we generate the mutants? e.g. not removing a return statement if a return value is expected and given in the signature.

@sanmai
Copy link
Member

sanmai commented Apr 10, 2018

Here's a PoC of a forking linter:

function eval_fork($code)
{
    $pid = pcntl_fork();

    if ($pid == 0) {
        // Child
        set_time_limit(1);
        error_reporting(0);
        eval($code);
        exit();
    }

    if ($pid == -1) {
        // fork() failed
        return false;
    }

    // we are the parent
    pcntl_wait($status);

    return 0 == $status;
}

var_dump(eval_fork('$a = 1;')); // true

var_dump(eval_fork('$a = 1 /* ; */')); // false

$code = '<?php if (false) {class A extends B {}}';
var_dump(eval_fork('?>' . $code)); // true

One call above takes approximately 11 ms.


private function assertSyntaxIsValid(string $realMutatedCode)
{
exec(sprintf('echo %s | php -l', escapeshellarg($realMutatedCode)), $output, $returnCode);
Copy link
Member

@sanmai sanmai Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc_open seems like a good fit here; there will be more LOC but less syscalls

I'd try with eval_fork first, though.

@theofidry
Copy link
Member

@sanmai should we start to investigate solutions like amphp? I used them for Box to process stuff in parallel and so far it was pretty good. Much more performant that whatever was done with the Symfony Process which has a design flaw for parallel processing

@maks-rafalko
Copy link
Member Author

maks-rafalko commented Apr 10, 2018

@theofidry, unfortunately, I didn't get your comment at all.

It is slower. The point of avoiding to generate redundant or invalid mutants would be to be faster because evaluating them would be more expansive than checking if they should be created in the first place;

I would like to highlight that this PR checks PHP code validity only in the PHPUnit Test Suite. Nothing has been changed in the infection itself. We don't lint Mutant code in the normal usage.

This PR just double checks that Mutator generate the valid syntax in each Test Case in @dataProvider.

It relies on the file-system.

What do you mean here? exec(sprintf('echo %s | php -l', escapeshellarg($realMutatedCode)), $output, $returnCode); uses STDIN as far as I can say. No code is persisted to the FS in these tests.

Maybe an alternative would be to try to be smarter in the way we generate the mutants? e.g. not removing a return statement if a return value is expected and given in the signature.

That's right. And adding a test, that confirms mutated code is valid, is a great addition IMO


@sanmai thanks for the suggestions, I will check it when I have a chance. From what I can see now, exec() takes 100ms on my machine, so seems like your solution is much faster.

The only one concern here is that the code is evaluated (with eval()). I'm pretty sure there can be some difficulties with it, e.g. with autoloading. But it requires testing for sure

@sanmai
Copy link
Member

sanmai commented Apr 11, 2018

I'm pretty sure there can be some difficulties with it, e.g. with autoloading.

We can no-op the code with if (false) { ... code ... } and/or with function() { ... code ... };, or like so.

var_dump(eval_fork('class A extends B {}')); // false
var_dump(eval_fork('if (false) {class A extends B {}}')); // true
var_dump(eval_fork('if (false) {yield true;}')); // false
var_dump(eval_fork('if (false) {function () {yield true;};}')); // true

Doing include $tmpfile; instead of eval() won't help much. You would have to apply these workarounds this way or another.

@sanmai
Copy link
Member

sanmai commented Apr 11, 2018

function proc_open_linter($code) {
    // We have to override stdout and stderr here,
    // else they'll get connected to *our* stdin/stderr
    // flooding them with errors
    $process = proc_open('php -l', [
        ['pipe', 'r'],
        ['pipe', 'w'],
        ['pipe', 'r'],
    ], $pipes);

    if (!is_resource($process)) {
        return false;
    }

    // Pass our code
    fwrite($pipes[0], $code);
    fclose($pipes[0]);

    // PHP won't write anything of interest into stderr,
    // no point in looking there at:
    // "Errors parsing Standard input code"

    return proc_close($process) == 0;
}

var_dump(proc_open_linter('<?php return 1;')); // bool(true)
var_dump(proc_open_linter('<?php yield true;')); // bool(false)
var_dump(proc_open_linter('<?php var_dump(true); var_dump(true)')); // bool(false)

I get about 40-50 ms per invocation

@theofidry
Copy link
Member

@borNfreee nevermind my comment; I should have checked the diff :)

@maks-rafalko
Copy link
Member Author

maks-rafalko commented Apr 11, 2018

@sanmai with proc_open_linter I have the same 100ms as with exec(), it has the same performance.

I would stick to exec() then, because:

  • it's a one-liner
  • it takes the same time
  • it works even on windows
  • it displayer error (Fatal error: The "yield" expression can only be used inside a function in) automatically while proc_open_linter does not

Regarding pcntl_fork(): it requires additional extension to be installed. Neither my local PHP setup nor our Docker images have it (and I pretty much sure many of the developers as well). So this is an additional dependency that we can avoid

@maks-rafalko maks-rafalko added this to the 0.9.0 milestone Apr 11, 2018
@maks-rafalko maks-rafalko removed the RFC label Apr 11, 2018
@maks-rafalko
Copy link
Member Author

Thank you @sanmai for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants