[RFC] Improve CLI report #222

theofidry · 2018-03-14T22:46:13Z

After playing a bit for a demo I found a few things that were either bugging me or that I think are a bit confusing.

This discussion was on Slack initially but I think putting it here is more appropriate. So right now here's the current CLI report:

Current report

You are running Infection with xdebug enabled.
    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

Running initial test suite...

PHPUnit version: 7.0.2

    8 [============================] < 1 sec

Generate mutants...

Processing source code files: 2/2
Creating mutated files and processes: 6/6
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out

M.....                                               (6 / 6)

6 mutations were generated:
       5 mutants were killed
       0 mutants were not covered by tests
       1 covered mutants were not detected
       0 errors were encountered
       0 time outs were encountered

Metrics:
         Mutation Score Indicator (MSI): 83%
         Mutation Code Coverage: 100%
         Covered Code MSI: 83%

Please note that some mutants will inevitably be harmless (i.e. false positives).

So here's the list of my nitpicks from a potential a first-time user:

What is used to run the initial test suite? How long does it take? There is nothing to help to debug here
I left the command run for some time and looked away. When I came back I had the result but no idea how long it took
Where is the log file generated?
I only know mutant killed and escaped: what is not covered by tests, not detected, errors and time outs?
I know Mutation Score, why do I have 3 of them?

Also although I think it would be cool at some point to be able to say "execute all the tests" or specific tests for non-covered mutants, I think it's too early for this extension point and for now way too expansive. As such I would just suggest to drop the notion of "not covered" and completely ignore non-covered code for now.

But because complaining is easy, here's a suggestion for improvements:

Suggested report

# You are running Infection with xdebug enabled.: removed as displayed in the initial run block
    ____      ____          __  _
   /  _/___  / __/__  _____/ /_(_)___  ____
   / // __ \/ /_/ _ \/ ___/ __/ / __ \/ __ \
 _/ // / / / __/  __/ /__/ /_/ / /_/ / / / /
/___/_/ /_/_/  \___/\___/\__/_/\____/_/ /_/

Running initial test suite...
    Executing "/path/to/phpunit -cconfig_file ...other options"

# Show full info; just make sure the progress mode is used 
PHPUnit 7.0.2 by Sebastian Bergmann and contributors.

Runtime:       PHP 7.2.2 with Xdebug 2.6.0
Configuration: /Users/tfidry/Project/mutation-testing-demo/phpunit.xml.dist

..                                                                  2 / 2 (100%)

Time: 36 ms, Memory: 4.00MB
OK (2 tests, 2 assertions)

Generate mutants...

Processing source code files: 2/2
Creating mutated files and processes: 6/6
# Additional blank line here
.: killed, M: escaped, S: uncovered, E: fatal error, T: timed out

M.....                                               (6 / 6)

// Memory usage: 12.19MB (peak: 12.31MB), total time: 1.15s  # Log this bit of info

Generating the report "/path/to/infection-log.txt".

6 mutations were generated:
       5 mutants were killed: #renamed + group the killed mutants
             - 5 tests failures
             - 0 PHP errors
             - 0 time outs
       1 mutants escaped #renamed

Mutation Score Indicator (MSI): 83% # Simplify the metric provided

Please note that some mutants will inevitably be harmless (i.e. false positives).

WDYT?

The text was updated successfully, but these errors were encountered:

BackEndTea · 2018-03-15T08:16:21Z

As mentioned in slack, i'm a fan of making --only-covered the default, and add a --non-covered option.

And if we are generating an extensive CLI report, i'd like to use some kind of verbosity flag

theofidry · 2018-03-15T09:59:36Z

What would you use the --non-covered option for though? For now we don't do anything about non-covered mutants so we actually waste time parsing files we are not going to run the tests for

BackEndTea · 2018-03-15T10:00:49Z

Completeness is one thing.

It also helps to see what mutations would be created on not yet tested code.

maks-rafalko · 2018-03-15T20:29:54Z

First of all, thanks as always for your ideas guys 👍

I would like to move each point to a separate issue because it's too hard to follow the discussion, too many things are being discussed. Here are my 2 cents:

What is used to run the initial test suite? How long does it take? There is nothing to help to debug here

Could you please clarify what does user want to debug here? I agree that currently it's not clear how long does it take to run initial tests suite with this progress bar

53 [------------------------->--] 4 secs.

But I'm not sure we need to display the whole PHPUnit's output as it can be too long (imagine 2000 tests, or imagine --tap format for PHPSpec that displays each test case on a separate line)

I would propose here something like progress bar with percentage:

55/293 [▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░] 18%

Again, we can use Verbosity level to display percentage by default, or full output with more verbose level.

I left the command run for some time and looked away. When I came back I had the result but no idea how long it took

This is what I've been dreaming about last days ;) Just one minor suggestion, instead of

Memory usage: 12.19MB (peak: 12.31MB), total time: 1.15s

display

Total time: 1.15s, memory usage: 12.19MB (peak: 12.31MB).

as I think time - is what bothers people a lot with MT, not the memory usage.

Where is the log file generated?

Nice one. One addition from me here is to display all registered loggers, not just text one. For example:

Generating reports:
    - text: "/path/to/infection-log.txt".
    - summary: "/path/to/summary.txt".
    ...

Last two points (about killed/escaped/timeouts and different kinds of MSI) are much more complicated.

For now we don't do anything about non-covered mutants so we actually waste time parsing files we are not going to run the tests for

Yes, we are wasting time parsing them, but I can't say this is useless metric. Let me past a quote from our docs:

[...Code Coverage is 65% ...] MSI is 47%. This means that 47% of all generated mutations were detected (i.e. kills, timeouts or fatal errors). Given the code coverage of 65%, there is a 18% difference so Code Coverage was a terrible quality measurement in this example.

I like this example, because it shows that the test coverage metric (65%) of a particular project does not cover all branches/edge cases. Mutation testing shows here that there is a room for improvements.

On the other side, when you have 5% of Test Coverage, you can have 100% of Covered Code MSI. Personally, I like this mutation score much more than just MSI, because it doesn't force you to write tests for everything, but only force you to write maybe few, but good tests.

What I don't quite understand is Mutation Code Coverage.

theofidry · 2018-03-15T21:19:55Z

What is used to run the initial test suite? How long does it take? There is nothing to help to debug here

Could you please clarify what does user want to debug here? I agree that currently it's not clear how long does it take to run initial tests suite with this progress bar

I think we can skip the why for now as it should be answered in the documentation rather than here I think.

What I'm missing from the initial test run here is:

What is being executed exactly? Is my config alright? So I think showing the exact comment would be important here
Regarding the progress of the tests, I like the progress bar rather than the . progress, as you say it's less verbose. If we can manage that or alternatively allow the full output to be able to see the progress but remove it once the tests are good. Anyway I think we can discuss this specific point separately.
How long it took: mostly to know which time was taken out of the total time. Turns out that in some cases this is the slowest part for me :)

I like this example, because it shows that the test coverage metric (65%) of a particular project does not cover all branches/edge cases. Mutation testing shows here that there is a room for improvements.

I actually totally agree with this. However I think there is some confusion here. So imagine you have the following files with their respective coverage:

src/
  Foo.php - 100% 
  Bar.php - 100%
  Baz.php - 0%

So the total code coverage is 66%. Right now (or at least from my understanding) Infection takes those 3 files as an input. For the sake of the example let's say 2 mutants are generated for the first two and 1 for the last, the result would be:

5 mutants generated
  2 killed
  2 escaped (as 1 mutant in Foo and another in Bar did not cause the tests to fail)
  1 not covered (there is not test for Baz so they are not covered)

MSI = (2 / 5) * 100 = 40%
MCC = ((5 - 1) / 5) * 100 = 80%
CCMSI = (2 / (5 - 1)) * 100 = 50%

My point here is that mutating a file that is not covered is useless (at least right now). The reasoning being that the code coverage tells you what is not tested. If it is not tested, it cannot be killed, so not point measuring. So in the case above you would instead have:

5 mutants generated
  2 killed
  2 escaped (as 1 mutant in Foo and another in Bar did not cause the tests to fail)

CCMSI = (2 / 4) * 100 = 50%

And I would actually argue that I didn't encounter this notion of "not covered" mutant. So to me the MSI is nbr of mutant killed / nbr of mutant generated which is actually the CCMSI above (except the above is in percentage).

In other words, I think we agree on the purpose: MSI is more accurate than code coverage. In the example above it shows that even though the code coverage is of 66%, actually only 50% of it is actually tested (and maybe that's what the CCMSI should be instead: CC * MSI / 100 so here 33%).

If you disagree I think we should go ahead and discuss about this in another issue :)

So unless anyone has a different opinion, I think there's a couple of things we can already implement. The remaining points to discuss are:

The initial run
The metrics summary

which I think would be best discussed in dedicated issue.

sanmai · 2018-03-19T13:47:42Z

Mutating a file that is not covered isn't as useless as it seems. Think of it as a measure of potential defects. Every "not covered" mutant is a potential problem lurking out where, a problem for which one should write a test.

(So far all not covered mutant I found were harmless, but that's different question.)

theofidry · 2018-03-19T14:04:49Z

@sanmai I don't really see the point still tbh. You don't need mutation testing for that: path code coverage is enough and besides being cheaper (it's way faster to get), it is mandatory for Infection right now

sanmai · 2018-03-19T14:10:05Z

Well, with what I agree with you is that actually mutating a non-covered file is really pointless. It's uncovered so by definition no tests will show a thing. Only counting possible mutations should be enough. Am I still not any closer to your thinking?

theofidry · 2018-03-19T15:21:29Z

Only counting possible mutations should be enough. Am I still not any closer to your thinking?

Yes, except I'm not sure I really see the benefit of that :)

It does have a cool factor as you can see that there would be X more mutants generated if you covered those files, but let's keep in mind that this is quite expansive: we are parsing each of those files with PHP-Parser. So I don't find that the performance tradeoff we are doing here is worth the little (IMO) value this information provides

maks-rafalko added the RFC label Mar 15, 2018

maks-rafalko mentioned this issue Mar 26, 2018

Syntax errors are not accounted for, not counted #262

Closed

maks-rafalko mentioned this issue Jun 26, 2018

Show time and memory info to the console output #390

Merged

3 tasks

maks-rafalko mentioned this issue Mar 16, 2022

Show where the log file is located after execution (if there is one) #1675

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Improve CLI report #222

[RFC] Improve CLI report #222

theofidry commented Mar 14, 2018 •

edited

BackEndTea commented Mar 15, 2018

theofidry commented Mar 15, 2018

BackEndTea commented Mar 15, 2018

maks-rafalko commented Mar 15, 2018 •

edited

theofidry commented Mar 15, 2018

sanmai commented Mar 19, 2018

theofidry commented Mar 19, 2018 •

edited

sanmai commented Mar 19, 2018

theofidry commented Mar 19, 2018 •

edited

[RFC] Improve CLI report #222

[RFC] Improve CLI report #222

Comments

theofidry commented Mar 14, 2018 • edited

BackEndTea commented Mar 15, 2018

theofidry commented Mar 15, 2018

BackEndTea commented Mar 15, 2018

maks-rafalko commented Mar 15, 2018 • edited

theofidry commented Mar 15, 2018

sanmai commented Mar 19, 2018

theofidry commented Mar 19, 2018 • edited

sanmai commented Mar 19, 2018

theofidry commented Mar 19, 2018 • edited

theofidry commented Mar 14, 2018 •

edited

maks-rafalko commented Mar 15, 2018 •

edited

theofidry commented Mar 19, 2018 •

edited

theofidry commented Mar 19, 2018 •

edited