-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching for fine-grained incremental mode #4483
Conversation
9a453cf
to
61c38dd
Compare
d7c3d1e
to
5a7add8
Compare
This now has a test suite and supports doing a fine-grained incremental build to initialize. There is one pretty dodgy hack to disable reading from the cache after the initial build that might need rethinking, and a few test cases have been disabled, but I think this is reviewable. |
(Be sure to remove the [WIP] from the title before submitting.)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, this should improve the performance of cold runs significantly when cache data is available. Most comments are about adding comments.
It would be nice to somehow verify that caching will improve performance. One idea is to verify which modules are fully refreshed in each update. We could ensure that unrelated modules don't get processed.
Note that I wasn't able to test this for real -- I'll ask about this offline.
mypy/build.py
Outdated
@@ -2383,6 +2388,13 @@ def process_graph(graph: Graph, manager: BuildManager) -> None: | |||
manager.log("Processing SCC of size %d (%s) as %s" % (size, scc_str, fresh_msg)) | |||
process_stale_scc(graph, scc, manager) | |||
|
|||
# If we are running in fine-grained incremental mode with caching, | |||
# we need to always process fresh SCCs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also describe why we need to do this in the comment.
mypy/server/update.py
Outdated
@@ -737,7 +743,13 @@ def propagate_changes_using_dependencies( | |||
# TODO: Preserve order (set is not optimal) | |||
for id, nodes in sorted(todo.items(), key=lambda x: x[0]): | |||
assert id not in up_to_date_modules | |||
triggered |= reprocess_nodes(manager, graph, id, nodes, deps) | |||
# TODO: Is there a better way to detect that the file isn't loaded? | |||
if not manager.modules[id].defs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could move this check to a helper method within BuildManager
, which would be a bit cleaner.
mypy/build.py
Outdated
@@ -1131,6 +1131,11 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str], | |||
if not stat.S_ISREG(st.st_mode): | |||
manager.log('Metadata abandoned for {}: file {} does not exist'.format(id, path)) | |||
return None | |||
|
|||
if manager.options.use_fine_grained_cache: | |||
manager.log('Using potentially stale metadata for {}'.format(id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment here describing why we do this.
mypy/dmypy_server.py
Outdated
@@ -262,12 +265,31 @@ def initialize_fine_grained(self, sources: List[mypy.build.BuildSource]) -> Dict | |||
messages = result.errors | |||
manager = result.manager | |||
graph = result.graph | |||
manager.options.cache_dir = os.devnull # XXX: HACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment describing a potential better way to do this?
mypy/server/update.py
Outdated
@@ -318,6 +319,9 @@ def mark_all_meta_as_memory_only(graph: Dict[str, State], | |||
def get_all_dependencies(manager: BuildManager, graph: Dict[str, State], | |||
options: Options) -> Dict[str, Set[str]]: | |||
"""Return the fine-grained dependency map for an entire build.""" | |||
if not options.use_fine_grained_cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment about the dependency map being included in the cache if cache is enabled.
mypy/server/update.py
Outdated
triggered |= reprocess_nodes(manager, graph, id, nodes, deps) | ||
# TODO: Is there a better way to detect that the file isn't loaded? | ||
if not manager.modules[id].defs: | ||
# We haven't actually loaded this file! Add it to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment to mention that we've only loaded the cache file for this, so there's no AST yet.
mypy/test/testfinegrained.py
Outdated
@@ -43,15 +43,30 @@ class FineGrainedSuite(DataSuite): | |||
optional_out = True | |||
|
|||
def run_case(self, testcase: DataDrivenTestCase) -> None: | |||
self.run_case_inner(testcase, cache=False) | |||
|
|||
# Reset the test case and run it again with caching on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of running two variants of a test case within each test case, I'd prefer a separate suite that has caching on. A reasonable way to do would be to subclass FineGrainedSuite
for caching test cases. The subclass would be trivial and only set a flag that turns caching on. This way it would be easy to run only the caching or non-caching variant of a test case, making it easier to debug failures.
mypy/test/testfinegrained.py
Outdated
main_src = '\n'.join(testcase.input) | ||
sources_override = self.parse_sources(main_src) | ||
messages, manager, graph = self.build(main_src, testcase, sources_override) | ||
|
||
print("Testing with cache: ", cache) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you have two test suites for non-caching and caching variants of test cases, this print statement can be removed. We prefer to produce no output when running passing test cases.
mypy/test/testfinegrained.py
Outdated
if fine_grained_manager is None: | ||
messages, manager, graph = self.build(main_src, testcase, sources_override, | ||
build_cache=False, enable_cache=cache) | ||
manager.options.cache_dir = os.devnull # XXX: HACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the purpose the hack and potentially a better way to do this.
@@ -237,7 +237,9 @@ a.py:1: error: invalid syntax | |||
b.py:3: error: Too many arguments for "f" | |||
a.py:3: error: Too many arguments for "g" | |||
|
|||
[case testDeleteFileWithBlockingError] | |||
[case testDeleteFileWithBlockingError-skip] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about only skipping this only in caching test cases? It would give us better test coverage. For example, add another test case name suffix that causes these test cases to be skipped in caching tests only. You could this mechanism to have two variants of each test case with different outputs as well (caching / non-caching) by having two suffixes.
Also, please create issue(s) about fixing the remaining skipped test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now, feel free to merge after having looked at my comments (only minor things). I didn't test this yet. It'll be easier to test this and iterate on this once this has been merged.
mypy/server/update.py
Outdated
"""Transitively rechecks targets based on triggers and the dependency map. | ||
|
||
Returns a list (module id, path) tuples representing modules that contain | ||
a target that needs to be reprocessed but that has not been parsed yet.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: move """ to a separate line.
mypy/test/testfinegrained.py
Outdated
# as skipped, not just elided. | ||
def should_skip(self, testcase: DataDrivenTestCase) -> bool: | ||
if self.use_cache: | ||
if testcase.name.endswith("-skip-cache"): return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: move return True
to a separate line.
mypy/test/testfinegrained.py
Outdated
if testcase.name.endswith("-skip-cache"): return True | ||
# TODO: In caching mode we currently don't well support | ||
# starting from cached states with errors in them. | ||
if testcase.output and testcase.output[0] != '==': return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above.
mypy/test/testfinegrained.py
Outdated
# starting from cached states with errors in them. | ||
if testcase.output and testcase.output[0] != '==': return True | ||
else: | ||
if testcase.name.endswith("-skip-nocache"): return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above.
mypy/test/testfinegrained.py
Outdated
@@ -90,7 +100,7 @@ def run_case_inner(self, testcase: DataDrivenTestCase, cache: bool) -> None: | |||
# cache, now we need to set it up | |||
if fine_grained_manager is None: | |||
messages, manager, graph = self.build(main_src, testcase, sources_override, | |||
build_cache=False, enable_cache=cache) | |||
build_cache=False, enable_cache=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the change to a True
argument on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This path only executes when the cache is on.
* master: (32 commits) Fix some fine-grained cache/fswatcher problems (python#4560) Sync typeshed (python#4559) Add _cached suffix to test cases in fine-grained tests with cache (python#4558) Add back support for simplified fine-grained logging (python#4557) Type checking of class decorators (python#4544) Sync typeshed (python#4556) When loading from a fine-grained cache, use the real path, not the cached (python#4555) Switch all of the fine-grained debug logging to use manager.log (python#4550) Caching for fine-grained incremental mode (python#4483) Fix --warn-return-any for NotImplemented (python#4545) Remove myunit (python#4369) Store line numbers of imports in the cache metadata (python#4533) README.md: Fix a typo (python#4529) Enable generation and caching of fine-grained dependencies from normal runs (python#4526) Move argument parsing for the fine-grained flag into the main arg parsing code (python#4524) Don't warn about unrecognized options starting with 'x_' (python#4522) stubgen: don't append star arg when args list already has varargs appended (python#4518) Handle TypedDict in diff and deps (python#4510) Fix Options.__repr__ to not infinite recurse (python#4514) Fix some fine-grained incremental bugs with newly imported files (python#4502) ...
Work in progress for cache loading for fine-grained incremental mode.
I've tested it some manually and it seems to work all right. This needs a test suite.
A good improvement would be to load from caches without doing a full build.