Experiment: random mutant sampling #2584
I have a large codebase with many tests that can't yet run in parallel (because of their use of a test database). My entire test suite takes about two minutes to run, and Stryker wants to try something like 27000 mutations on my app. Random sampling is the only way I can think of to get Stryker to work on an app like this. Of course the tests can be made to run in parallel, and further optimized, but I'd like to produce a mutation score NOW that shows me the effectiveness of my tests, and even running small samples would give me that. Edit: Random sampling would MAKE them compatible, though, at least in part.
I understand the need for this feature in some use cases. I'm in the same spot with one of my day-job projects as well. However, I don't like the "random" here. I would like this to be reproducible, so you can test it locally and get the same result as you do on your CI pipeline. I do think the mutants should be visible in the report, but with an "ignored" state; that way you at least know which mutants weren't tested, so you don't get a false sense of security.
Indeed, we're focused on unit testing, but Stryker works for integration tests as well. You pointed out the challenges quite well. Do you think having some way for each worker process to use a different database would help you? I've been thinking about ways to support that. For example, we could add a …
I haven't read all the way through the paper, so I can't be authoritative here, but I'd expect the randomness to be an important part of producing a valid score. Would some kind of seed help address your concern? If random sampling was used, Stryker could output a hash, and if you supplied that hash as an argument you'd get the same set of mutations. I personally don't know much about how seeding for "reproducible randomness" actually works, but I've played enough video games to know it's at least possible :)
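Seeded "reproducible randomness" is simpler than it sounds: a pseudo-random number generator initialised with the same seed always produces the same sequence, so the same seed always selects the same mutants. A minimal TypeScript sketch, assuming a `Mutant` shape and a `sampleMutants` helper that are purely illustrative (not Stryker's actual API):

```typescript
// Illustrative sketch only; `Mutant` and `sampleMutants` are not Stryker APIs.
interface Mutant {
  id: number;
  mutatorName: string;
}

// mulberry32: a tiny seeded PRNG. The same seed yields the same number sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher–Yates shuffle driven by the seeded PRNG, then take the first `rate` fraction.
function sampleMutants(mutants: Mutant[], rate: number, seed: number): Mutant[] {
  const rng = mulberry32(seed);
  const pool = [...mutants];
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, Math.ceil(pool.length * rate));
}
```

Calling `sampleMutants` twice with the same seed returns the identical sample, so a CI run and a local run can be compared mutant-for-mutant.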
I think that's a great idea! I was reading about how ava works with databases, and I was going to experiment with setting up a blank database for each test at run time. I expect that this would increase overall test time for serial runs, of course, but it would enable significant parallelism. The question is: how soon does the db server become the bottleneck? Your suggestion (one database per process) is a strong middle ground between "one database" and "one database per test", though!
This is a super interesting approach! The conclusion of the paper is that you can, fairly safely, drop the sample rate all the way down to 5% and still get a >99% accurate overall mutation score! With that, I would gladly drop the sample rate in CI. I'd bump it up to 10% (rather than the paper's 5%) to account for real-world ugliness.
They've indeed used some fancy methods for selecting the mutations. It looks like it doesn't matter a lot which method you choose, though; the numbers in the paper are really close for every method. In the paper, random selection with 5% sampling is still 99.44% accurate, and the highest score is 99.52% for a 5% sampling rate. I would start with a seeded random sampling of all mutations: seeded to make it predictable, as Nico said, and random sampling on all mutants because it's easiest to implement. If the implementation is somewhat stable, I'm willing to generate some measurements using it. :)
So I've been thinking about this some more during the weekend. I think the ideal situation in CI for me would be something like this:
This will provide us with meaningful metrics over time and it prevents us from neglecting to write proper tests for new stuff. In other words, it gives us Stryker's main CI advantages. At the same time, we're not punished for old code; we're not ignoring old code either, which keeps test quality visible. This focus on testing new/changed code also allows us to start using Stryker with existing code bases. And, last but not least, this will reduce CI runtime significantly, especially for large codebases. For large codebases, you can probably get away with lower sampling rates too, reducing execution times even further. I think it would be the best balance between using Stryker to write great tests and CI duration. It's easy to think of this stuff, though implementing it will be an entirely different beast. 🤭
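One hypothetical way to feed the "focus on new/changed code" idea into a Stryker-like tool is to turn a `git diff` file list into a set of files to mutate. A sketch of that, where `toMutateGlobs` is an illustrative helper and nothing here is an existing Stryker feature:

```typescript
// Illustrative helper: turn `git diff --name-only` output into a list of
// source files to mutate, skipping non-source files and test files.
// In CI you might obtain the input with something like:
//   execSync("git diff --name-only origin/main...HEAD", { encoding: "utf8" })
function toMutateGlobs(diffOutput: string): string[] {
  return diffOutput
    .split("\n")
    .map((f) => f.trim())
    .filter((f) => /\.(ts|js)$/.test(f) && !/\.spec\./.test(f));
}
```

The resulting list could then be passed as the set of files to mutate, combined with a low sampling rate for the rest of the codebase.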
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I've read this paper, which suggests that random mutant sampling on top of selective mutation might lead to a very efficient mutation process while still providing a relatively accurate mutation score. That's why I tried to incorporate a random sampling approach into Stryker, too. In the paper they state that complete randomness performs worse than the more sophisticated approach they also applied, but nonetheless it should still lead to good results and is easier to implement. :)
I did not want to make a PR directly because I plan on doing some experiments with my implementation first. But I thought it might be a good idea to already leave a small note here in case anyone is interested in the implementation. It's available in the mutant-sampling branch in my fork. I've added 2 parameters for controlling the sampling: `sampling <true/false>` and `samplingRate <percentage>`. Note that it's still a little rough around the edges (e.g. the progress bar is not adjusted).
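Assuming the two parameters described above, a configuration using them might look like this. These options come from the experimental fork, not from any released Stryker version:

```javascript
// stryker.conf.js — hypothetical options from the mutant-sampling fork,
// not part of a released Stryker version.
module.exports = {
  mutate: ["src/**/*.js"],
  sampling: true, // enable random mutant sampling
  samplingRate: 10, // test roughly 10% of all generated mutants
};
```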