[cache] Cache unchanged files on --dry-run #3834
@yguedidi I have updated the implementation as you suggested. One end-to-end test, e2e/timeout-file-not-cached, is failing, but I am not sure this is a problem in this particular case; maybe it's the test that should be changed. The test case is about a parallel process timing out, which happens on the first run but, thanks to the caching change in this PR, does not happen on the second run (with cache), because the parallel process somehow still manages to save the file to the cache after the timeout. The following run, now with cache, is therefore able to finish within the 1 second timeout and produces a different output (since no timeout happens). What do you think about this? Also, there was one other failing test
The order test expectation is fixed in the latest main branch, so you need to rebase. The current e2e test should keep working with the expected crash as currently shown, since it demonstrates that the double error keeps showing on multiple runs.
That means that the timeout has to happen even when the cache is in effect. We could perhaps force that in the e2e test scenario by making the timeout 0.01s, or by adding a larger volume of files so that the timeout happens even with cache, but that does not seem right. In a real-world scenario we have no control over volume, CPU speed, timeout duration, etc., so it is impossible to guarantee that the timeout happens with cache just as it does without cache. To do so would be to artificially force runs with cache to be just as slow as those without it. Does that argument make sense? Based on the above, I propose excluding the timeout exception from the "same output with or without cache" rule.
The e2e test demonstrates a crash on timeout. It is configured with 1 second on purpose, and the crash needs to be shown again when the test is run multiple times.
@samsonasik with cache it easily finishes in under 1 second. I don't think that is wrong, which is probably where we disagree. The "same output with or without cache" rule probably makes sense for all cases except the ones where the only difference is processing time.
@dorrogeray thanks for the work on this! That test uses a big ruleset, so it shouldn't take less than 1 second. In that case the file shouldn't be marked as cacheable and so should not be cached, so the second run fails too.
Yes, I think this is roughly what happens:
I assume that the child worker is the one that decides whether to mark the file as cacheable? If the child worker keeps running a little after the termination signal, it might have no reason not to cache it. Or something along those lines.
@dorrogeray if I remember correctly, it's the job worker that fails on the memory limit, as the memory limit is passed to the command.
c311017 to 7d71fe5
I have done some further investigation, and have updated the PR with a change which would make the main process clear the files cached by the worker process if that process times out. Here is PR which would be necessary in The findings:
I still think that this is an edge case, and the easiest solution would be to simply allow different outputs when caching of unchanged files makes the second run finish faster, but I would be repeating my previous argument. Also, I do not know what the proper way is of sending PRs to
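The cleanup described above (the main process clearing whatever a timed-out worker cached) could be sketched roughly as follows. This is only an illustration of the idea, not the code from the PR: `WorkerFileLedger`, `clearCacheForTimedOutWorker` and the `$invalidateFile` callback are hypothetical names, and the sketch assumes the main process knows which files it handed to each worker.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical bookkeeping kept by the main process:
 * which files were handed to which worker.
 */
final class WorkerFileLedger
{
    /** @var array<string, list<string>> worker id => assigned file paths */
    private array $filesByWorker = [];

    public function assign(string $workerId, string $file): void
    {
        $this->filesByWorker[$workerId][] = $file;
    }

    /**
     * Forget the worker and return the files it was responsible for.
     *
     * @return list<string>
     */
    public function release(string $workerId): array
    {
        $files = $this->filesByWorker[$workerId] ?? [];
        unset($this->filesByWorker[$workerId]);

        return $files;
    }
}

/**
 * On timeout, drop the cache entries of every file the worker touched,
 * so the next run processes them again and reports the same error.
 *
 * @param callable(string): void $invalidateFile hypothetical cache hook
 * @return int number of invalidated files
 */
function clearCacheForTimedOutWorker(
    WorkerFileLedger $ledger,
    string $workerId,
    callable $invalidateFile
): int {
    $files = $ledger->release($workerId);

    foreach ($files as $file) {
        $invalidateFile($file);
    }

    return count($files);
}
```

Keeping the ledger in the main process means the cleanup does not depend on the worker exiting cleanly, which is the situation a timeout creates.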
@@ -97,7 +97,7 @@ public function run(Encoder $encoder, Decoder $decoder, Configuration $configura

     if ($errorAndFileDiffs[Bridge::SYSTEM_ERRORS] !== []) {
         $this->invalidateFile($file);
-    } elseif (! $configuration->isDryRun()) {
+    } else {
This requires an e2e test for:
- after multiple --dry-run runs that keep showing the diff, a run that applies changes still applies the change.
For note: I think that if you want the current e2e timeout test to be deleted, it needs to be replaced with a "real crash" use case. E.g. you can create a custom rule in e2e that must crash, such as echo dump($If_->else) on a node where else is still null, so it will always crash.
@samsonasik I have opened a new PR, Cache unchanged files on dry run v2, and it seems that all the tests are passing now. It looks like this rework of the timeout test fixed the issue. Additionally, I have updated e2e_consecutive_changes.yaml to include two dry runs, covering the test case you mentioned. Please let me know if this is sufficient for the PR to get merged, thanks!
// This sleep has to be here, because even though we have called $this->processPool->quitAll(),
// it takes some time for the child processes to actually die, and if we deleted the offending cache
// files right away, the child processes could still write them "back" before they die
sleep(1);
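For comparison, a fixed `sleep(1)` could in principle be replaced by polling until the children have exited, with the timeout as an upper bound. The sketch below is a different technique from the one in the PR and rests on an assumption: that the caller can supply some probe telling whether any child is still alive. `waitForChildrenToExit` and `$isAnyChildRunning` are hypothetical names.

```php
<?php

declare(strict_types=1);

/**
 * Poll until no child process is running, or until the timeout elapses.
 *
 * @param callable(): bool $isAnyChildRunning hypothetical probe supplied by the caller
 * @return bool true if all children exited in time, false on timeout
 */
function waitForChildrenToExit(
    callable $isAnyChildRunning,
    float $timeoutSeconds = 1.0,
    int $pollMicroseconds = 10000
): bool {
    $deadline = microtime(true) + $timeoutSeconds;

    while ($isAnyChildRunning()) {
        if (microtime(true) >= $deadline) {
            // Children are still alive; deleting cache files now may race with them.
            return false;
        }

        usleep($pollMicroseconds);
    }

    return true;
}
```

The cache files would then be deleted only after this returns `true`; on `false` the caller still has to decide whether to wait longer or delete anyway.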
I am interested in this sleep(1); part. I have run into a use case in the past where, after CTRL+C, another process sometimes kept running in the background and the file kept changing, especially when running from an IDE.
Could you cherry-pick this part into a separate PR? Thank you.
Cherrypicked into #4280
Fixed in #4281
PR for issue rectorphp/rector#7932
The idea is to cache unchanged files on --dry-run, so that subsequent --dry-run runs can be faster while always producing the same results.
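A minimal sketch of that idea, under stated assumptions: the cache is keyed by a hash of the file contents, a file with system errors is always invalidated, and a file that still has a pending diff on a dry run is not cached, so the diff keeps showing on every dry run. `UnchangedFileCache` and `updateCache` are hypothetical names for illustration, not Rector's actual classes.

```php
<?php

declare(strict_types=1);

/** Hypothetical in-memory cache of files known to need no changes. */
final class UnchangedFileCache
{
    /** @var array<string, string> file path => content hash at caching time */
    private array $hashes = [];

    public function shouldSkip(string $path, string $contents): bool
    {
        return ($this->hashes[$path] ?? null) === sha1($contents);
    }

    public function remember(string $path, string $contents): void
    {
        $this->hashes[$path] = sha1($contents);
    }

    public function invalidate(string $path): void
    {
        unset($this->hashes[$path]);
    }
}

/**
 * Update the cache after one file has been processed.
 * $contents is the file content on disk after processing.
 */
function updateCache(
    UnchangedFileCache $cache,
    string $path,
    string $contents,
    bool $hasSystemErrors,
    bool $hasDiff,
    bool $isDryRun
): void {
    if ($hasSystemErrors) {
        // A crashed or timed-out file must be processed again next time.
        $cache->invalidate($path);
        return;
    }

    if ($isDryRun && $hasDiff) {
        // Assumption: the change was not applied, so the next run must still report it.
        $cache->invalidate($path);
        return;
    }

    // Unchanged file (dry run or not), or a file whose change was applied.
    $cache->remember($path, $contents);
}
```

With this shape, a second dry run can skip every file for which `shouldSkip()` returns `true`, while files with diffs or errors are reprocessed and reported again.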