Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental runs #174

Closed
wants to merge 1 commit into from
Closed

Conversation

cl3joly
Copy link

@cl3joly cl3joly commented Nov 30, 2023

This is a work in progress PR to add incremental runs (#53). @jared-norris is working with me on this as a part of a hackathon at our company.

TODO

  • Add tests
  • Write the logic in the new incremental module
  • Write user facing doc

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

We are adding a flag to run cargo mutants incrementally.

  1. The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.
  2. The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.
  3. The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

src/output.rs Outdated

/// Caught and unviable scenario outcomes
#[derive(Serialize, Deserialize, Debug, Clone, Eq, PartialEq)]
pub struct PositiveOutcomes {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure is probably not that helpful, we will probably remove it and just pass around a Vec of PositiveOutcome.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it

}
}

impl TryFrom<&ScenarioOutcome> for PositiveOutcome {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is currently broken, we are working to fix it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now

@sourcefrog
Copy link
Owner

That sounds like a reasonable high level plan.

This seems like a feature where it would be good to write the user facing doc in book/src fairly early, starting from your description here, so we can see if it's easy to explain.

pub fn calculate_hash(&self) -> MutantHash {
// We need the hashes to be stable across restarts, so that they can be serialized
// consistently.
let mut s = SipHasher::new();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deprecated and the warning says we should use DefaultHasher, but the doc says:

The internal algorithm is not specified, and so it and its hashes should not be relied upon over releases.
https://doc.rust-lang.org/std/collections/hash_map/struct.DefaultHasher.html

and in this case, we need the hashes to be stable since they are persisted on disk.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could just avoid the serialized format depending on a hash algorithm: could you store them in the originally generated order or even just sorted by location and name?

@cl3joly
Copy link
Author

cl3joly commented Dec 1, 2023

Actually, it makes more sense to just remove the incremental file once we have every mutant resulting in a positive outcome.

We call positive outcomes scenarios where mutants were caught or unviable.

These positive outcomes are now stored for use in incremental runs.

Co-authored-by: Jared Norris <jared-norris@users.noreply.github.com>
@sourcefrog
Copy link
Owner

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

  1. The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.
  2. The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.
  3. The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

@jared-norris
Copy link

High level description

Our vision of the use case for incremental builds is that you will add some code, try to cover it and you will have, say, 10 unviable mutants, 7 caught mutants and 3 uncaught mutants. You will then add tests, trying to catch those 3 mutants and you will run cargo mutants multiple time, catching more and more, until all mutants are either unviable or caught. That last part, where you keep adding tests, is where there is an opportunity to do incremental builds, where only the 3 uncaught mutants will be tried, ignoring the 10 unviable and the 7 caught mutants.

This sounds just great.

I wonder if the "incremental" name is going to be confusing with the --in-diff feature of looking for mutants in code added in a PR.

I can't immediately think of a better name, although maybe it should just be more explicit, something like "--already-failing", though that's still not great.

We are adding a flag to run cargo mutants incrementally.

  1. The first time the tool is called with the incremental flag, we do a normal run and save a list to mutants that were caught by tests or unviable. We call the scenarios associated with those mutants positive outcomes.
  2. The second time onward the tool is called with incremental flag, we load the previously saved list of positive outcomes and use it to exclude all mutants that were previously caught or unviable. If any more mutants are caught, they are added to the list of positive outcomes and all the positive outcomes, both from the current run and from previous runs are saved back.
  3. The next time the tool is called without the incremental flag, the list of positive outcomes is not written and it gets removed as a part of the process that rotates mutants.out to mutants.out.old.

Sounds good.

We had a similar debate when choosing the name; I think it can be improved upon.

Maybe --uncaught-only?

@sourcefrog
Copy link
Owner

Maybe --iterate-uncaught to emphasize that it does this special behaviour of selecting from the previous output?

@sourcefrog sourcefrog linked an issue Dec 6, 2023 that may be closed by this pull request
6 tasks
@ASuciuX
Copy link
Contributor

ASuciuX commented Dec 14, 2023

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

@sourcefrog
Copy link
Owner

Hello, would this be possible on the txt files alone or would it require the jsons? Or would it be another mechanism to track these?

Can you say more about why you'd want that? I think jsons are probably a better thing for computers to read: we can add more fields in future, etc.

@sourcefrog
Copy link
Owner

Let me know if/when you want a code review

@ASuciuX
Copy link
Contributor

ASuciuX commented Dec 17, 2023

Our existing workflow consists of two phases: Logging and Tracking.

Tracking Phase
In the Tracking phase, the process unfolds as follows:

  • Upon the opening of a pull request or the introduction of new commits on the source branch, we execute cargo mutants on the changes between the latest commit and the current code.
  • This approach focuses on testing mutants exclusively on newly added or modified functions.
  • The CI workflow fails if there are any uncaught mutants, giving an output from the respective .txt files about which functions need to be changed or tested.

Logging Phase
The Logging phase starts with an initial output, consisting solely of .txt files generated by running mutants on the latest commit in the source code. The subsequent steps are run on merges on certain branches, and they include:

  • Uploading the initial output files for each package from the codebase to a GitHub Actions cache, along with the hash of the commit.
  • Restoring the cache and executing cargo mutants solely on the differences between the head and the hash from the restored cache.
  • Extracting output from the mutants.out folder and appending it to the existing cache content using operations like grep, sed, or awk.
  • Re-uploading the refined output to both cache and artifacts.

This logging approach ensures that mutant outputs remain human-readable and easily accessible. Simultaneously, it optimizes cargo mutants runs for efficient utilization of CI capabilities.

References
PR Tracking

Logging


The current struggle is related only to the stackslib package. Currently, we have been running mutants for 460 minutes on 2 VMs in a certain folder from stackslib package containing 1028 mutants - out of 11,448 from the whole package. There are 1634 minutes remaining with the -j 30 flag, and 1992 with -j 24.
We are taking into consideration how much it would take to build and test small changes with 1-2 jobs on the CI on the differences - given the workflow doesn't fail to begin with.
We use shell scripts for parsing diff files and appending lines to output .txt files. Thank you for taking the time to look through this, and if you have any suggestions, please let us know what could be improved.

@cl3joly
Copy link
Author

cl3joly commented Jan 3, 2024

Hey @sourcefrog , sorry for the long silence, we have been quite busy lately.

Some of the TODO points are still unimplemented, but feel free to share any feedback you may have on the current state.

image

@sourcefrog
Copy link
Owner

Ack, I'll come back and have a look at this

@sourcefrog
Copy link
Owner

sourcefrog commented Jan 20, 2024

Hi @cl3joly, thanks for starting on this, it's very nice and encouraging to see the energy about Mutants and about this feature.

I think the high level description in your original post is just right: once people have some missed mutants, they should be able to repeatedly run with --iterate and it will re-test only the mutants that were not know to be caught on previous runs. (Eventually, when you think you've fixed everything, maybe in CI, you'll probably want to retest everything to make sure no new gaps were introduced.)

I realized that, in doing these iterations it's pretty likely that the user might interrupt one cargo-mutants run and it's important that, when that happens, we don't forget about any known positive results.

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in #174 (comment) to say maybe we can do this just with the existing text files in mutants.out.

(In fact, maybe a good initial stand-alone step would be to add an option that runs only mutants explicitly named in some file, or excludes mutants named in some other files, optionally disregarding line/col. That's basically #204. That would support the use case someone mentioned of wanting to do some external filtering and processing of which ones to run. I'm inclined to split that off as a standalone feature, as a kind of step beyond the regex based filters.)

One complication, maybe handled in your PR, I'm not sure, is that even if we're interrupted we shouldn't forget that some things were already caught or unviable...

@cl3joly
Copy link
Author

cl3joly commented Jan 31, 2024

@sourcefrog Thanks for the review!

The other thing I thought of looking at this PR thread is that as you edit the code to fix gaps, it's fairly likely that some mutant line numbers will change, so we probably don't want to match on them in saying which mutants are already covered: probably just matching on the type/function name is enough.

That’s a great point.

I think, looking at the implementation, it is maybe getting a bit little complex than is necessary, although that is totally understandable for someone new to the codebase. I think @ASuciuX was on the right track in #174 (comment) to say maybe we can do this just with the existing text files in mutants.out.

Right, that makes sense.


The above point requires quite an extensive rework of this PR. That’s a fair thing from you to ask, as a maintainer. However, to be transparent, I don’t think I’ll have time to work on it in the next few weeks. If anyone wants to pick this up, please feel free. If not, when I have some time, I’ll come back to this PR to improve it based on your feedback. I hope that works for you @sourcefrog

@sourcefrog
Copy link
Owner

Yep, understandable. If I have free time I may pick this up, based on the ideas developed in your work. If I start on it I'll comment here to avoid duplication, and you can do that too if you come back to it.

@sourcefrog sourcefrog self-assigned this Feb 22, 2024
@sourcefrog
Copy link
Owner

I'll close this and continue in #354. Thanks again for getting it started.

@sourcefrog sourcefrog closed this Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Iterate failures" mode that skips mutations that were caught last time
4 participants