New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore paths more deeply to avoid graph impact when ignored files are added/removed #6752

Merged
merged 2 commits into from Nov 9, 2018

Conversation

Projects
None yet
5 participants
@stuhood
Member

stuhood commented Nov 9, 2018

Problem

The path ignoring that is controlled by the --pants-ignore global option currently ignores paths during glob expansion. But glob expansion consumes cached filesystem operations, meaning that those cached filesystem operations are not pre-filtered for ignored files. The effect is that the cached DirectoryListing for a directory still changes when ignored files are added and removed.

Solution

Push path filtering down to scandir_sync, so that it is below the Scandir/DirectoryListing caching. In a separate commit, log both the number of cleared and dirtied nodes.

Result

Node dirtying should be effective in more cases, because creating/removing a file at an ignored path will only clear a single node, while all other nodes can be cleaned without re-running.

Example for creating a file ignore_this_file after running ./pants --enable-pantsd --pants-ignore='+["ignore_this_file"]' list :: :

# Before
I1109 12:14:50.272815 86270 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
I1109 12:14:50.306777 86270 scheduler.py:282] invalidated 10433 nodes for: set([u'', u'ignore_this_file'])
I1109 12:14:54.746129 86270 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
W1109 12:14:54.918494 86270 native.py:413] Globs did not match. Excludes were: []. Unmatched globs were: ["testprojects/src/java/org/pantsbuild/testproject/bundle/file1.aaaa", "testprojects/src/java/org/pantsbuild/testproject/bundle/file2.aaaa"].
<snip more running globs>
I1109 12:14:56.162152 86270 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`

# After
I1109 12:11:46.683013 83652 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
W1109 12:11:46.719578 83652 native.py:413] invalidation: cleared 1 and dirtied 10432 nodes for: {"", "ignore_this_file"}
I1109 12:11:51.063949 83652 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
I1109 12:11:51.748687 83652 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`

@stuhood stuhood force-pushed the twitter:stuhood/ignore-in-scandir branch from 8c2b866 to 059f795 Nov 9, 2018

@jsirois

jsirois approved these changes Nov 9, 2018

A test would be great to have.

@ity

ity approved these changes Nov 9, 2018 edited

+1 to adding a test

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 9, 2018

Creating a non-fragile test is going to take some refactoring in this case, because we're missing the framework we'd need to test how many nodes needed to re-run in the context of a full pants run. There are solid tests of invalidation and precise node runs in https://github.com/pantsbuild/pants/blob/master/src/rust/engine/graph/src/lib.rs#L787 , but having tests that can inspect which @rules needed to run in the context of a complete pants run is a gap right now.

I could create a pantsd test that greps logs for the patterns seen above, but I think I'd prefer to open a followup ticket with a suggestion of how to do this holistically.

@cosmicexplorer

I added #6753 to cover the holes in our test infrastructure which make it difficult to add tests for cases like these. The solution to that ticket will likely arrive in multiple parts and should be scheduled alongside upcoming feature work or bug fixing in node dirtying or @rule execution.

@jsirois

This comment has been minimized.

Member

jsirois commented Nov 9, 2018

Thanks guys. I'm fine with a defer to #6753

@stuhood

This comment has been minimized.

Member

stuhood commented Nov 9, 2018

One flaky test. Landing.

@stuhood stuhood added this to the 1.11.x milestone Nov 9, 2018

@stuhood stuhood merged commit ca509a7 into pantsbuild:master Nov 9, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@stuhood stuhood deleted the twitter:stuhood/ignore-in-scandir branch Nov 9, 2018

stuhood added a commit that referenced this pull request Nov 9, 2018

Ignore paths more deeply to avoid graph impact when ignored files are…
… added/removed (#6752)

The path ignoring that is controlled by the `--pants-ignore` global option currently ignores paths during glob expansion. But glob expansion consumes cached filesystem operations, meaning that those cached filesystem operations are not pre-filtered for ignored files. The effect is that the cached `DirectoryListing` for a directory still changes when ignored files are added and removed.

Push path filtering down to `scandir_sync`, so that it is below the `Scandir`/`DirectoryListing` caching. In a separate commit, log both the number of `cleared` and `dirtied` nodes.

Node dirtying should be effective in more cases, because creating/removing a file at an ignored path will only clear a single node, while all other nodes can be cleaned without re-running.

Example for creating a file `ignore_this_file` after running `./pants --enable-pantsd --pants-ignore='+["ignore_this_file"]' list ::` :
```
I1109 12:14:50.272815 86270 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
I1109 12:14:50.306777 86270 scheduler.py:282] invalidated 10433 nodes for: set([u'', u'ignore_this_file'])
I1109 12:14:54.746129 86270 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
W1109 12:14:54.918494 86270 native.py:413] Globs did not match. Excludes were: []. Unmatched globs were: ["testprojects/src/java/org/pantsbuild/testproject/bundle/file1.aaaa", "testprojects/src/java/org/pantsbuild/testproject/bundle/file2.aaaa"].
<snip more running globs>
I1109 12:14:56.162152 86270 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`

I1109 12:11:46.683013 83652 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
W1109 12:11:46.719578 83652 native.py:413] invalidation: cleared 1 and dirtied 10432 nodes for: {"", "ignore_this_file"}
I1109 12:11:51.063949 83652 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
I1109 12:11:51.748687 83652 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
```

wisechengyi added a commit to wisechengyi/pants that referenced this pull request Nov 12, 2018

Ignore paths more deeply to avoid graph impact when ignored files are…
… added/removed (pantsbuild#6752)

### Problem

The path ignoring that is controlled by the `--pants-ignore` global option currently ignores paths during glob expansion. But glob expansion consumes cached filesystem operations, meaning that those cached filesystem operations are not pre-filtered for ignored files. The effect is that the cached `DirectoryListing` for a directory still changes when ignored files are added and removed.

### Solution

Push path filtering down to `scandir_sync`, so that it is below the `Scandir`/`DirectoryListing` caching. In a separate commit, log both the number of `cleared` and `dirtied` nodes.

### Result

Node dirtying should be effective in more cases, because creating/removing a file at an ignored path will only clear a single node, while all other nodes can be cleaned without re-running.

Example for creating a file `ignore_this_file` after running `./pants --enable-pantsd --pants-ignore='+["ignore_this_file"]' list ::` :
```
# Before
I1109 12:14:50.272815 86270 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
I1109 12:14:50.306777 86270 scheduler.py:282] invalidated 10433 nodes for: set([u'', u'ignore_this_file'])
I1109 12:14:54.746129 86270 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
W1109 12:14:54.918494 86270 native.py:413] Globs did not match. Excludes were: []. Unmatched globs were: ["testprojects/src/java/org/pantsbuild/testproject/bundle/file1.aaaa", "testprojects/src/java/org/pantsbuild/testproject/bundle/file2.aaaa"].
<snip more running globs>
I1109 12:14:56.162152 86270 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`

# After
I1109 12:11:46.683013 83652 scheduler_service.py:88] enqueuing 1 changes for subscription all_files
W1109 12:11:46.719578 83652 native.py:413] invalidation: cleared 1 and dirtied 10432 nodes for: {"", "ignore_this_file"}
I1109 12:11:51.063949 83652 pailgun_server.py:72] handling pailgun request: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
I1109 12:11:51.748687 83652 pailgun_server.py:81] pailgun request completed: `./pants --enable-pantsd --pants-ignore=+["ignore_this_file"] list ::`
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment