Incremental runs #174

cl3joly · 2023-11-30T16:31:25Z

This is a work in progress PR to add incremental runs (#53). @jared-norris is working with me on this as a part of a hackathon at our company.

TODO

Add tests
Write the logic in the new incremental module
Write user facing doc

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.
The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.
The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

cl3joly · 2023-11-30T16:34:30Z

src/output.rs

+
+/// Caught and unviable scenario outcomes
+#[derive(Serialize, Deserialize, Debug, Clone, Eq, PartialEq)]
+pub struct PositiveOutcomes {


This structure is probably not that helpful, we will probably remove it and just pass around a Vec of PositiveOutcome.

cl3joly · 2023-11-30T16:34:56Z

src/output.rs

+    }
+}
+
+impl TryFrom<&ScenarioOutcome> for PositiveOutcome {


This bit is currently broken, we are working to fix it.

sourcefrog · 2023-12-01T00:37:46Z

That sounds like a reasonable high level plan.

This seems like a feature where it would be good to write the user facing doc in book/src fairly early, starting from your description here, so we can see if it's easy to explain.

cl3joly · 2023-12-01T15:44:14Z

src/mutate.rs

+    pub fn calculate_hash(&self) -> MutantHash {
+        // We need the hashes to be stable across restarts, so that they can be serialized
+        // consistently.
+        let mut s = SipHasher::new();


This is deprecated and the warning says we should use DefaultHasher, but the doc says:

The internal algorithm is not specified, and so it and its hashes should not be relied upon over releases.
https://doc.rust-lang.org/std/collections/hash_map/struct.DefaultHasher.html

and in this case, we need the hashes to be stable since they are persisted on disk.

I wonder if we could just avoid the serialized format depending on a hash algorithm: could you store them in the originally generated order or even just sorted by location and name?

cl3joly · 2023-12-01T16:10:24Z

Actually, it makes more sense to just remove the incremental file once we have every mutant resulting in a positive outcome.

We call positive outcomes scenarios where mutants were caught or unviable. These positive outcomes are now stored for use in incremental runs. Co-authored-by: Jared Norris <jared-norris@users.noreply.github.com>

sourcefrog · 2023-12-04T15:52:29Z

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.

The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.

The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

jared-norris · 2023-12-06T16:38:50Z

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.

The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.

The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

We had a similar debate when choosing the name; I think it can be improved upon.

Maybe --uncaught-only?

sourcefrog · 2023-12-06T18:22:29Z

Maybe --iterate-uncaught to emphasize that it does this special behaviour of selecting from the previous output?

ASuciuX · 2023-12-14T22:57:31Z

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

sourcefrog · 2023-12-15T10:11:41Z

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

Can you say more about why you'd want that? I think jsons are probably a better thing for computers to read: we can add more fields in future, etc.

sourcefrog · 2023-12-16T19:29:35Z

Let me know if/when you want a code review

ASuciuX · 2023-12-17T02:59:24Z

Our existing workflow consists of two phases: Logging and Tracking.

Tracking Phase
In the Tracking phase, the process unfolds as follows:

Upon the opening of a pull request or the introduction of new commits on the source branch, we execute cargo mutants on the changes between the latest commit and the current code.
This approach focuses on testing mutants exclusively on newly added or modified functions.
The CI workflow fails if there are any uncaught mutants, giving an output from the respective .txt files about which functions need to be changed or tested.

Logging Phase
The Logging phase starts with an initial output, consisting solely of .txt files generated by running mutants on the latest commit in the source code. The subsequent steps are run on merges on certain branches, and they include:

Uploading the initial output files for each package from the codebase to a GitHub Actions cache, along with the hash of the commit.
Restoring the cache and executing cargo mutants solely on the differences between the head and the hash from the restored cache.
Extracting output from the mutants.out folder and appending it to the existing cache content using operations like grep, sed, or awk.
Re-uploading the refined output to both cache and artifacts.

This logging approach ensures that mutant outputs remain human-readable and easily accessible. Simultaneously, it optimizes cargo mutants runs for efficient utilization of CI capabilities.

References
PR Tracking

Logging

The current struggle is related only to the stackslib package. Currently, we have been running mutants for 460 minutes on 2 VMs in a certain folder from stackslib package containing 1028 mutants - out of 11,448 from the whole package. There are 1634 minutes remaining with the -j 30 flag, and 1992 with -j 24.
We are taking into consideration how much it would take to build and test small changes with 1-2 jobs on the CI on the differences - given the workflow doesn't fail to begin with.
We use shell scripts for parsing diff files and appending lines to output .txt files. Thank you for taking the time to look through this, and if you have any suggestions, please let us know what could be improved.

cl3joly · 2024-01-03T20:28:31Z

Hey @sourcefrog , sorry for the long silence, we have been quite busy lately.

Some of the TODO points are still unimplemented, but feel free to share any feedback you may have on the current state.

sourcefrog · 2024-01-11T05:18:23Z

Ack, I'll come back and have a look at this

sourcefrog · 2024-01-20T03:31:01Z

Hi @cl3joly, thanks for starting on this, it's very nice and encouraging to see the energy about Mutants and about this feature.

I think the high level description in your original post is just right: once people have some missed mutants, they should be able to repeatedly run with --iterate and it will re-test only the mutants that were not know to be caught on previous runs. (Eventually, when you think you've fixed everything, maybe in CI, you'll probably want to retest everything to make sure no new gaps were introduced.)

I realized that, in doing these iterations it's pretty likely that the user might interrupt one cargo-mutants run and it's important that, when that happens, we don't forget about any known positive results.

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in #174 (comment) to say maybe we can do this just with the existing text files in mutants.out.

(In fact, maybe a good initial stand-alone step would be to add an option that runs only mutants explicitly named in some file, or excludes mutants named in some other files, optionally disregarding line/col. That's basically #204. That would support the use case someone mentioned of wanting to do some external filtering and processing of which ones to run. I'm inclined to split that off as a standalone feature, as a kind of step beyond the regex based filters.)

One complication, maybe handled in your PR, I'm not sure, is that even if we're interrupted we shouldn't forget that some things were already caught or unviable...

cl3joly · 2024-01-31T14:41:06Z

@sourcefrog Thanks for the review!

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

That’s a great point.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in #174 (comment) to say maybe we can do this just with the existing text files in mutants.out.

Right, that makes sense.

The above point requires quite an extensive rework of this PR. That’s a fair thing from you to ask, as a maintainer. However, to be transparent, I don’t think I’ll have time to work on it in the next few weeks. If anyone wants to pick this up, please feel free. If not, when I have some time, I’ll come back to this PR to improve it based on your feedback. I hope that works for you @sourcefrog

sourcefrog · 2024-01-31T14:48:40Z

Yep, understandable. If I have free time I may pick this up, based on the ideas developed in your work. If I start on it I'll comment here to avoid duplication, and you can do that too if you come back to it.

sourcefrog · 2024-06-11T14:23:56Z

I'll close this and continue in #354. Thanks again for getting it started.

cl3joly commented Nov 30, 2023

View reviewed changes

cl3joly marked this pull request as draft November 30, 2023 16:35

cl3joly mentioned this pull request Nov 30, 2023

"Iterate failures" mode that skips mutations that were caught last time #53

Closed

6 tasks

cl3joly force-pushed the incremental-runs branch from 0f8c89c to 0a6ba34 Compare December 1, 2023 15:41

cl3joly commented Dec 1, 2023

View reviewed changes

Save positive outcomes

23c83dd

We call positive outcomes scenarios where mutants were caught or unviable. These positive outcomes are now stored for use in incremental runs. Co-authored-by: Jared Norris <jared-norris@users.noreply.github.com>

cl3joly force-pushed the incremental-runs branch from 0a6ba34 to 23c83dd Compare December 1, 2023 16:15

sourcefrog linked an issue Dec 6, 2023 that may be closed by this pull request

"Iterate failures" mode that skips mutations that were caught last time #53

Closed

6 tasks

sourcefrog mentioned this pull request Dec 23, 2023

Take a list of named mutants to run or to exclude #204

Open

sourcefrog self-assigned this Feb 22, 2024

sourcefrog closed this Jun 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental runs #174

Incremental runs #174

cl3joly commented Nov 30, 2023 •

edited

Loading

cl3joly Nov 30, 2023

cl3joly Dec 1, 2023

cl3joly Nov 30, 2023

cl3joly Dec 1, 2023

sourcefrog commented Dec 1, 2023

cl3joly Dec 1, 2023

sourcefrog Dec 3, 2023

cl3joly commented Dec 1, 2023

sourcefrog commented Dec 4, 2023

High level description

jared-norris commented Dec 6, 2023

High level description

sourcefrog commented Dec 6, 2023

ASuciuX commented Dec 14, 2023

sourcefrog commented Dec 15, 2023

sourcefrog commented Dec 16, 2023

ASuciuX commented Dec 17, 2023

cl3joly commented Jan 3, 2024

sourcefrog commented Jan 11, 2024

sourcefrog commented Jan 20, 2024 •

edited

Loading

cl3joly commented Jan 31, 2024

sourcefrog commented Jan 31, 2024

sourcefrog commented Jun 11, 2024

Incremental runs #174

Incremental runs #174

Conversation

cl3joly commented Nov 30, 2023 • edited Loading

TODO

High level description

cl3joly Nov 30, 2023

Choose a reason for hiding this comment

cl3joly Dec 1, 2023

Choose a reason for hiding this comment

cl3joly Nov 30, 2023

Choose a reason for hiding this comment

cl3joly Dec 1, 2023

Choose a reason for hiding this comment

sourcefrog commented Dec 1, 2023

cl3joly Dec 1, 2023

Choose a reason for hiding this comment

sourcefrog Dec 3, 2023

Choose a reason for hiding this comment

cl3joly commented Dec 1, 2023

sourcefrog commented Dec 4, 2023

High level description

jared-norris commented Dec 6, 2023

High level description

sourcefrog commented Dec 6, 2023

ASuciuX commented Dec 14, 2023

sourcefrog commented Dec 15, 2023

sourcefrog commented Dec 16, 2023

ASuciuX commented Dec 17, 2023

cl3joly commented Jan 3, 2024

sourcefrog commented Jan 11, 2024

sourcefrog commented Jan 20, 2024 • edited Loading

cl3joly commented Jan 31, 2024

sourcefrog commented Jan 31, 2024

sourcefrog commented Jun 11, 2024

cl3joly commented Nov 30, 2023 •

edited

Loading

sourcefrog commented Jan 20, 2024 •

edited

Loading