
Add coverage analysis to offer improved performance #388

Closed
rouke-broersma opened this issue Mar 5, 2019 · 28 comments
Labels
🚀 Feature request New feature or request

Comments

@rouke-broersma
Member

This issue follows from #183

1. Generate mutant coverage (i.e. which mutants are actually executed by any test). This could provide useful metrics on its own.
2. Offer to skip test runs for non-covered mutants (which are then assumed to be survivors).
==> Improves performance for projects with low test coverage.
3. Identify tests that do not cover any mutant, and offer to skip them on each run.
==> Could improve performance for projects with many tests.
4. Optimise which tests are executed according to mutant coverage (i.e. run only the tests that cover the 'current' mutant).
==> Should dramatically improve performance.

Step 1 requires being able to access coverage results from Stryker.
Step 2 looks simple.
Step 3 requires control over which tests are run.
Step 4 requires analysis of the coverage data to build a strategy.
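The selection logic behind these steps can be sketched in a few lines. This is a minimal illustration in Python, not Stryker.NET's actual C# implementation; `plan_test_runs` and its inputs are hypothetical names standing in for the mutant-to-tests coverage data:

```python
# Illustrative sketch only (Python, not Stryker.NET's actual C# code).
# 'coverage' maps each mutant id to the set of tests that execute it.

def plan_test_runs(coverage, all_tests):
    """Apply steps 2-4: skip uncovered mutants, drop useless tests,
    and run only the covering tests for each remaining mutant."""
    # Step 2: mutants covered by no test are assumed survivors and skipped.
    uncovered = {m for m, tests in coverage.items() if not tests}
    # Step 3: tests that cover no mutant never need to run.
    useful_tests = set().union(*coverage.values()) if coverage else set()
    skippable_tests = all_tests - useful_tests
    # Step 4: for each covered mutant, run only its covering tests.
    runs = {m: tests for m, tests in coverage.items() if tests}
    return uncovered, skippable_tests, runs

coverage = {1: {"t1", "t2"}, 2: set(), 3: {"t2"}}
uncovered, skippable, runs = plan_test_runs(coverage, {"t1", "t2", "t3"})
# uncovered == {2}, skippable == {"t3"}, runs == {1: {"t1", "t2"}, 3: {"t2"}}
```

The hard part, as the thread below shows, is not this selection logic but obtaining the coverage map reliably from the test process in the first place.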

@rouke-broersma rouke-broersma added 🚀 Feature request New feature or request to do labels Mar 5, 2019
@rouke-broersma
Member Author

From @dupdob:

Some update: I have named pipes running smoothly and stably, after a few fights.
I have reached maturity level 2: (my) Stryker now assumes non-covered mutants are survivors and hence does not test them.

Code available here:
https://github.com/dupdob/stryker-net/tree/mutant_coverage

@dupdob
Member

dupdob commented Mar 5, 2019

I have a branch ready, just waiting to finish with my current PR.
(The branch currently addresses use cases 1 and 2.)

@rouke-broersma
Member Author

@dupdob Yes I've seen it. The named pipe seems like a good way to implement this. Step 3 can be achieved with the vstest integration.

To achieve step 4 I was thinking we could create a named pipe from the mutated assembly to a vstest data collector (out of proc as well), and then a named pipe from the data collector to Stryker. The data collector can execute for every single test being run, and in the middle we can receive from the mutated assembly which mutants have been hit for that test run. We can then send that information, including which test was run, to Stryker through the other named pipe, and use vstest to run only those unit tests that are relevant to the mutant.
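The message flow of this double-pipe idea can be pictured roughly as follows. This is a hypothetical Python illustration (the real code would be C# inside a vstest data collector), with in-process queues standing in for the two named pipes:

```python
# Hypothetical illustration of the double-pipe message flow (Python;
# not the real vstest integration). Queues stand in for the two named pipes.
from queue import Queue

class DataCollectorRelay:
    def __init__(self):
        self.from_assembly = Queue()  # pipe 1: mutant ids hit while a test runs
        self.to_stryker = Queue()     # pipe 2: (test name, hit mutants) records

    def run_test(self, test_name, execute):
        execute()  # the mutated assembly reports hit mutant ids on pipe 1
        hits = set()
        while not self.from_assembly.empty():
            hits.add(self.from_assembly.get())
        # forward the per-test coverage record to Stryker on pipe 2
        self.to_stryker.put((test_name, sorted(hits)))

relay = DataCollectorRelay()
relay.run_test("MyTest", lambda: [relay.from_assembly.put(m) for m in (3, 1, 3)])
print(relay.to_stryker.get())  # ('MyTest', [1, 3])
```

The key design point is the middle position of the collector: because it sees test boundaries, it can attribute the stream of mutant hits to individual tests, which a single pipe straight to Stryker could not do.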

@dupdob
Member

dupdob commented Mar 15, 2019

@Mobrockers
Thanks for the proposal. I have made an MVP using pipe communication between the assembly and Stryker, the objective being to reach a proof of concept before considering improvement strategies.
Here is what I have achieved and learned so far. Spoiler alert: it is a bit disappointing (at least for the netcore environment; it may be better with classic .NET).

  1. Overall, the VsTest architecture is complex, brittle and prone to failure. I often end up with vstest.console.exe processes eating a core in some endless loop, logging nothing.
  2. I have yet to achieve a complete run with NFluent due to the above issues. At some point, Stryker runs out of available vstest runners.
  3. Performance is abysmal. To get coverage info per test, I launch each test in isolation. It is really slow.
  4. ~1% of runs failed to provide coverage data. This is probably due to the fact that I use the appdomain tear-down event to send the coverage data through the named pipe. As of now, this is not the most urgent issue; I assume those tests cover ALL mutants and have to run each time.

The biggest disappointment for me was the fact that vstest provided little more than a facade over dotnet test...

I have various leads and ideas on how to fix those issues, but the fact is it turned out more difficult than anticipated...
Coming back to the list:

  1. Stability: no idea, but I assume my code, and maybe the vstest Stryker runner itself, still requires some clean-up and bug fixing. Once the code has been hardened and refactored, I hope those issues will disappear.
  2. Performance: there is an interesting option (AreTestCaseLevelEventsRequired) that may allow me to capture coverage in a single run (though I have yet to read the doc, if any, about it). There is also the option KeepAlive, which looks promising, but this one is documented as not implemented/for future use.
  3. Missing coverage: I hope I will be able to reduce the failure rate, plus I can still devise alternate communication strategies, including real-time async, which would at least provide partial coverage in case of failure at tear-down.

@dupdob
Member

dupdob commented Mar 15, 2019

OK, I went through the code; AreTestCaseLevelEventsRequired does not serve the purpose I hoped for (sequential notification).

@rouke-broersma
Member Author

The biggest disappointment for me was the fact that vstest provided little more than a facade over dotnet test...

How do you mean? Dotnet test internally uses vstest, not the other way around

@rouke-broersma
Member Author

  1. Performance is abysmal. To get coverage info per test, I launch each test in isolation. It is really slow.
  2. ~1% of runs failed to provide coverage data. This is probably due to the fact that I use the appdomain tear-down event to send the coverage data through the named pipe. As of now, this is not the most urgent issue; I assume those tests cover ALL mutants and have to run each time.
  3. Performance: there is an interesting option (AreTestCaseLevelEventsRequired) that may allow me to capture coverage in a single run (though I have yet to read the doc, if any, about it). There is also the option KeepAlive, which looks promising, but this one is documented as not implemented/for future use.

I hope using a data collector can help a lot with this specific problem, because of:

https://github.com/Microsoft/vstest-docs/blob/master/docs/extensions/datacollector.md#datacollectionevents

Example:
https://github.com/Microsoft/vstest/blob/master/test/TestAssets/SimpleDataCollector/Class1.cs

I understand that it's only an MVP of course, so I get that you're not immediately going for a double named pipe with all its added complexity.

@dupdob
Member

dupdob commented Mar 15, 2019

Well, I see vstest.console.exe launching dotnet.exe. I guess dotnet integrates some VsTest glue and logic, but from an executable perspective, that's how it works
[screenshot of the process tree]

Command line is:

"C:\Program Files\dotnet\dotnet.exe" exec --runtimeconfig "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\bin\Debug\netcoreapp2.1\NFluent.NetStandard.20.Tests.runtimeconfig.json" --depsfile "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\bin\Debug\netcoreapp2.1\NFluent.NetStandard.20.Tests.deps.json" "C:\Users\dupdo\.nuget\packages\microsoft.testplatform.testhost/15.9.0\lib/netstandard1.5/testhost.dll" --port 55100 --endpoint 127.0.0.1:055100 --role client --parentprocessid 12420 --diag "C:\Users\dupdo\source\repos\NFluent\tests\NFluent.NetStandard.20.Tests\StrykerOutput\2019-03-15.14-48-13\vstest\vstest-log.host.19-03-15_15-29-25_91599_16.txt" --tracelevel 3 --telemetryoptedin false

Ok, I get it now: it launches a dotnet host to execute the test.
Thanks for challenging my comment.

@rouke-broersma
Member Author

You are right that vstest is internally a big can of worms :) It also spawns new processes for dotnet.exe, testhost.exe, and sometimes datacollector.exe and other collectors, depending on the configuration 💃

@dupdob
Member

dupdob commented Mar 16, 2019

Quick summary regarding various steps:

  1. Is pretty solid so far. It still fails sometimes, and I fixed a bunch of edge cases/timing issues in the pipe mechanism. There may still be some, but I also assume the AppDomain unloading event is unreliable.
  2. Is working well, albeit providing only a marginal gain from a performance perspective. But it also provides clear insight into non-covered code.
  3. This is where I am pretty stuck. Launching tests one by one is so slow: ~1 hour for NFluent's 1660 tests. I had problems with coverage sometimes failing, so I implemented a retry mechanism; I should get results now.
  4. Logic has been implemented: I have mutant-to-tests mapping information, and I am able to run selected tests only. I need to stabilise step 3 before working on step 4.

@Mobrockers: I missed your comment regarding DataCollector; I will dig into this. Indeed, it looks like what I need to address the issues I face with step 2!

@dupdob
Member

dupdob commented Mar 16, 2019

I just had an idea I find interesting: I am fascinated by the mutant->tests mapping information, and I had a hunch there was value in it. I may have just found it.
We could use coverage information in reports to provide hints, e.g.:

  • if a mutant is not covered, the suggestion should be around adding a test or refactoring the code;
  • otherwise, the suggestion could be to consider reviewing some of the tests that cover the survivor;
  • and we could even cross-reference information on survivors to identify quick wins, i.e. cases where improving a single test could kill 5-6 mutants.
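The quick-win hint in the last bullet falls out of the survivor coverage map directly. A minimal Python sketch (hypothetical names, not actual Stryker.NET code): group surviving mutants by the tests that already cover them, and flag tests whose improvement could kill several survivors at once:

```python
# Sketch of the 'quick win' hint (illustrative only): group surviving
# mutants by covering test, then report tests covering several survivors.
from collections import defaultdict

def quick_wins(survivor_coverage, threshold=2):
    """Return tests whose improvement might kill several survivors at once."""
    by_test = defaultdict(set)
    for mutant, tests in survivor_coverage.items():
        # uncovered survivors get a different hint: add a test or refactor
        for test in tests:
            by_test[test].add(mutant)
    return {t: ms for t, ms in by_test.items() if len(ms) >= threshold}

survivors = {10: {"t1"}, 11: {"t1"}, 12: {"t1", "t2"}, 13: set()}
print(quick_wins(survivors))  # {'t1': {10, 11, 12}}
```

Here improving test `t1` could potentially kill three surviving mutants, so it would be flagged in the report as the highest-leverage test to review.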

I am not sure I am being clear enough or that this makes sense to you, but I add it as an idea to be considered at a later time.

@rouke-broersma
Member Author

rouke-broersma commented Mar 16, 2019 via email

@dupdob
Member

dupdob commented Mar 25, 2019

Monday update: I had the opportunity to explore, and I made a cool discovery: I can use the VsTest data collection architecture to forward coverage results to Stryker, which should help avoid the 'double pipe' strategy proposed by @Mobrockers.
Still far from a finished feature, though.

@dupdob
Member

dupdob commented Mar 26, 2019

Quick update: the vstest runner now uses environment variables to transfer coverage data between the test assembly and the host. This means we no longer have to run tests in isolation to capture coverage.
Getting closer to a working version!
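The environment-variable handoff can be pictured like this. A rough Python sketch: the variable name and the comma-separated format are invented here for illustration, and the real implementation is C# inside the vstest runner:

```python
# Rough sketch of the environment-variable handoff. The variable name
# 'STRYKER_COVERAGE' and the comma-separated format are hypothetical.
import os

def record_hit(mutant_id):
    """Called from the instrumented assembly whenever a mutant is executed."""
    seen = os.environ.get("STRYKER_COVERAGE", "")
    ids = set(filter(None, seen.split(",")))
    ids.add(str(mutant_id))
    os.environ["STRYKER_COVERAGE"] = ",".join(sorted(ids))

record_hit(3)
record_hit(1)
record_hit(3)  # duplicates collapse into one entry
print(os.environ["STRYKER_COVERAGE"])  # 1,3
```

Note this only works when the test assembly and the reader share a process (as with an in-process collector), and environment blocks are small; those limits line up with the performance and space concerns raised later in this thread, which led back to pipes.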

@richardwerkman
Member

Cool! Sounds awesome. Keep up the good work!

@rouke-broersma
Member Author

Sounds good!

@dupdob
Member

dupdob commented Mar 26, 2019

I made a run with NFluent: the time is 55 minutes, versus 135 minutes without the optimisation.
I just pushed the associated code. I would welcome feedback from runs on other projects.

@dupdob
Member

dupdob commented Apr 1, 2019

Monday morning status update: still working on this. As environment variables do not seem like a viable approach to me (performance and available-space concerns), I have migrated back to pipes, but it is a constant struggle to make them work properly. Tests deadlock without identifiable cause, debugging is difficult, and I can't find a way to log from the InProcDataCollector.

@rouke-broersma
Member Author

Can you not log to file?

@dupdob
Member

dupdob commented Apr 1, 2019 via email

@rouke-broersma
Member Author

Sorry for the confusion, I meant in relation to this part

Tests deadlock without identifiable cause, debugging is difficult and I can't find a way to log from the InProcDataCollector

You could debug-log to a file from the data collector, no?

@dupdob
Member

dupdob commented Apr 1, 2019

Good news: I finally spotted the deadlock issue, a bug in my pipe-reading class when receiving an empty message. I can keep on improving my code.
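For illustration, a length-prefixed framing scheme shows how an empty message can trip up a naive pipe reader: with a zero-length payload, the reader must return an empty message immediately rather than wait for data that will never arrive. A Python sketch over an in-memory stream (the actual code is C# over a named pipe; only the framing idea is shown):

```python
# Illustration of length-prefixed framing where an empty message is legal
# (Python over an in-memory stream, not the actual C# pipe code).
import io
import struct

def write_message(stream, payload):
    stream.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length
    stream.write(payload)                          # zero bytes when empty

def read_message(stream):
    (length,) = struct.unpack(">I", stream.read(4))
    # A zero length must yield b"" immediately; blocking here is the deadlock.
    return stream.read(length) if length else b""

buf = io.BytesIO()
write_message(buf, b"")          # the empty-message case that caused the hang
write_message(buf, b"mutant:3")
buf.seek(0)
print(read_message(buf))  # b''
print(read_message(buf))  # b'mutant:3'
```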

@rouke-broersma
Member Author

rouke-broersma commented Apr 1, 2019 via email

@dupdob
Member

dupdob commented Apr 2, 2019

The deadlock is gone, but I still have what look like race conditions when capturing coverage.
I think it may relate to this issue with NUnit.
If I can confirm it is related to NUnit, I will probably add an option to work around it until it is fixed on the NUnitAdapter side.

@rouke-broersma
Member Author

rouke-broersma commented Apr 2, 2019 via email

@dupdob
Member

dupdob commented Apr 3, 2019

I used it. (Kinda) interestingly, it triggered new deadlock issues. I guess I still have bugs. No worries, I love brainteasers and I am happy to find bugs BEFORE opening a PR.

Note that sequential execution slows test execution.

@dupdob
Member

dupdob commented Jun 6, 2019

see #506

@rouke-broersma
Member Author

Implemented by #506
