New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry for filesystem changes more quickly, indefinitely, and with logging #10139
Retry for filesystem changes more quickly, indefinitely, and with logging #10139
Conversation
The commits are useful to review independently, but the first two overlap a little bit. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
src/rust/engine/graph/src/tests.rs
Outdated
|
||
let delay_for_root = Duration::from_millis(2000); | ||
let delay_for_mid = Duration::from_millis(2000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The middle of three.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, maybe say middle
. I spent a solid minute trying to figure out what mid
could be an acronym for.
@@ -22,6 +22,7 @@ | |||
def launch_file_toucher(f): | |||
"""Launch a loop to touch the given file, and return a function to call to stop and join it.""" | |||
if not os.path.isfile(f): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this was intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess which file I was smoke testing with, heh.
inner.unsafe_entry_for_id(entry_id).node().clone() | ||
}; | ||
info!( | ||
"Filesystem changed during run: retrying `{}` in {:?}...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the risk of truncation:
"Filesystem changed during run: retrying `{}` in {:?}...", | |
"Filesystem changed: retrying `{}` in {:?}...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it would be worth rewording so that the time appears before the node description also. I'm concerned some nodes may be really long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to mention "during run", because as you said, this is independent from --loop
... also, I want to send the signal that this is tied specifically to the thing that is re-running (it is: if a file unrelated to the thing that was running is changed, you won't see this).
Regarding truncation: these messages are not truncated. Perhaps they should be, but.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, fair. Thanks
NodeKey::MultiPlatformExecuteProcess(mp_epr) => mp_epr.0.user_facing_name(), | ||
NodeKey::DigestFile(DigestFile(File { path, .. })) => { | ||
Some(format!("Fingerprinting: {}", path.display())) | ||
} | ||
NodeKey::DownloadedFile(..) => None, | ||
NodeKey::DownloadedFile(ref d) => Some(format!("Downloading: {}", d.0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving these. They look great.
@@ -1040,14 +1040,16 @@ impl NodeKey { | |||
fn user_facing_name(&self) -> Option<String> { | |||
match self { | |||
NodeKey::Task(ref task) => task.task.display_info.desc.as_ref().map(|s| s.to_owned()), | |||
NodeKey::Snapshot(_) => Some(format!("{}", self)), | |||
NodeKey::Snapshot(ref s) => Some(format!("Snapshotting: {}", s.0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snapshotting
might not mean much to non-Pants devs. I can't think of a better name though. More something to consider medium term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content of the PathGlobs
object should help... it reads as Snapshotting: PathGlobs(globs=['thing.py'], ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of which, the PathGlobs
help is very verbose. In a followup, I think it'd be great if it only showed PathGlobs(['foo', '!ignore'])
and not the other fields.
948257e
to
495299c
Compare
Note: unfortunately, rendering just the |
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
[ci skip-jvm-tests]
…ying. # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
[ci skip-jvm-tests]
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests]
495299c
to
8fe1bae
Compare
…ging (pantsbuild#10139) As described in pantsbuild#10081, there are a few different concerns around retrying nodes when the filesystem changes: 1. whether we have retried enough times 2. whether we've made it clear to the user that we're retrying 3. whether we retry immediately upon change, or only after something has completed To address each of the above points, we retry: 1. indefinitely 2. with logging 3. immediately upon invalidation of nodes by: * removing the `Running::dirty` flag, and moving a `Running` node to `NotStarted` to trigger invalidation * "aborting"/dropping ongoing work for a canceled attempt using `futures::Abortable` Additionally, improve the `Display` and `user_facing_name` implementations for `Node` to allow for better log messages from the `Graph`. A message like: ``` 12:39:52.93 [INFO] Completed: Building requirements.pex 15:21:58.95 [INFO] Filesystem changed during run: retrying `Test` in 500ms... 15:21:58.96 [INFO] Filesystem changed during run: retrying `@rule coordinator_of_tests((OptionsBootstrapper(args=['--no-v1', '--v2', '--no-process-execution-use-local-cache', 'test', 'tests/python/pants_test/util:'], env={'PANTS_DEV': '1'}, config=ChainedConfig(['/Users/stuhood/src/pants/pants.toml', '/Users/stuhood/.pants.rc'])), WrappedTestFieldSet(field_set=PythonTestFieldSet(address=Address(tests/python/pants_test/util, argutil), origin=SiblingAddresses(directory='tests/python/pants_test/util'), sources=<class 'pants.backend.python.target_types.PythonTestsSources'>(alias='sources', sanitized_raw_value=('test_argutil.py',), default=('test_*.py', '*_test.py', 'tests.py', 'conftest.py')), timeout=pants.backend.python.target_types.PythonTestsTimeout(alias='timeout', value=None, default=None), coverage=pants.backend.python.target_types.PythonCoverage(alias='coverage', value=('pants.util.argutil',), default=None)))))` in 500ms... 12:39:57.17 [INFO] Completed: Building requirements.pex with 1 requirement: dataclasses==0.6 ``` ...is rendered immediately when a file that a test depends on is invalidated. Fixes pantsbuild#10081.
Problem
As described in #10081, there are a few different concerns around retrying nodes when the filesystem changes:
Solution
To address each of the above points, we retry:
Running::dirty
flag, and moving aRunning
node toNotStarted
to trigger invalidationfutures::Abortable
Additionally, improve the
Display
anduser_facing_name
implementations forNode
to allow for better log messages from theGraph
.Result
A message like:
...is rendered immediately when a file that a test depends on is invalidated.
Fixes #10081.