Caching for fine-grained incremental mode #4483

Merged: 8 commits merged into master from fg-cache on Feb 7, 2018

Conversation

msullivan (Collaborator)

Work in progress for cache loading for fine-grained incremental mode.

I've done some manual testing and it seems to work all right. This needs a test suite.

A good improvement would be to load from caches without doing a full build.

msullivan force-pushed the fg-cache branch 2 times, most recently from 9a453cf to 61c38dd on January 18, 2018 21:25
msullivan force-pushed the fg-cache branch 2 times, most recently from d7c3d1e to 5a7add8 on February 1, 2018 01:57
msullivan changed the title from "[WIP] Caching for fine-grained incremental mode" to "Caching for fine-grained incremental mode" on Feb 6, 2018
msullivan requested a review from JukkaL on February 6, 2018 01:51
msullivan (Collaborator, Author)

This now has a test suite and supports doing a fine-grained incremental build to initialize.

There is one pretty dodgy hack to disable reading from the cache after the initial build that might need rethinking, and a few test cases have been disabled, but I think this is reviewable.

gvanrossum (Member)

gvanrossum commented Feb 6, 2018 via email

JukkaL (Collaborator) left a comment

Great, this should improve the performance of cold runs significantly when cache data is available. Most of my review comments are about adding code comments.

It would be nice to somehow verify that caching will improve performance. One idea is to verify which modules are fully refreshed in each update. We could ensure that unrelated modules don't get processed.

Note that I wasn't able to test this for real -- I'll ask about this offline.
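
On the idea of checking which modules get refreshed: a minimal sketch of such an assertion helper, assuming the test suite exposes some hook that reports the module ids rebuilt by an update. The names `run_update` and `assert_only_reprocessed` are hypothetical and not part of this PR.

from typing import Callable, Set


def assert_only_reprocessed(run_update: Callable[[], Set[str]],
                            allowed: Set[str]) -> None:
    """Run one fine-grained update and fail if any module outside `allowed` was refreshed."""
    reprocessed = run_update()  # hypothetical hook returning the module ids that were rebuilt
    unexpected = reprocessed - allowed
    assert not unexpected, 'unrelated modules were reprocessed: %s' % sorted(unexpected)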

mypy/build.py Outdated
@@ -2383,6 +2388,13 @@ def process_graph(graph: Graph, manager: BuildManager) -> None:
manager.log("Processing SCC of size %d (%s) as %s" % (size, scc_str, fresh_msg))
process_stale_scc(graph, scc, manager)

# If we are running in fine-grained incremental mode with caching,
# we need to always process fresh SCCs.

Also describe why we need to do this in the comment.
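
A sketch of what such an expanded comment could say; the stated rationale is an assumption about the design, not wording taken from the PR.

# If we are running in fine-grained incremental mode with caching,
# we need to always process fresh SCCs.  (Assumed rationale: the
# fine-grained machinery needs per-module state, at minimum the
# fine-grained dependency information, for everything in the graph,
# and that state is only set up when an SCC is processed rather than
# skipped, even if its cache data is up to date.)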

@@ -737,7 +743,13 @@ def propagate_changes_using_dependencies(
# TODO: Preserve order (set is not optimal)
for id, nodes in sorted(todo.items(), key=lambda x: x[0]):
assert id not in up_to_date_modules
triggered |= reprocess_nodes(manager, graph, id, nodes, deps)
# TODO: Is there a better way to detect that the file isn't loaded?
if not manager.modules[id].defs:

You could move this check to a helper method within BuildManager, which would be a bit cleaner.
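
For illustration, a hypothetical helper of the kind suggested, sketched as a fragment of BuildManager; the method name and placement are made up here, only the check itself comes from the diff.

class BuildManager:
    ...

    def module_not_loaded(self, id: str) -> bool:
        """Return True if we only have cache data for `id` (no parsed AST yet).

        An empty list of top-level definitions is used as a proxy for
        "the tree has not been parsed or deserialized", which is what the
        TODO in the diff is about.
        """
        return not self.modules[id].defs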

mypy/build.py Outdated
@@ -1131,6 +1131,11 @@ def validate_meta(meta: Optional[CacheMeta], id: str, path: Optional[str],
if not stat.S_ISREG(st.st_mode):
manager.log('Metadata abandoned for {}: file {} does not exist'.format(id, path))
return None

if manager.options.use_fine_grained_cache:
manager.log('Using potentially stale metadata for {}'.format(id))

Add comment here describing why we do this.
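
A sketch of the kind of comment being requested; the rationale below is my reading of how the fine-grained cache is meant to be used, not wording from the PR.

if manager.options.use_fine_grained_cache:
    # With a fine-grained cache we deliberately accept metadata that may be
    # out of date: the initial build only needs to load *some* state for
    # every module, and the fine-grained update that runs right afterwards
    # will recheck whatever actually changed since the cache was written.
    manager.log('Using potentially stale metadata for {}'.format(id))
    return meta  # assumed: treat the cached metadata as still valid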

@@ -262,12 +265,31 @@ def initialize_fine_grained(self, sources: List[mypy.build.BuildSource]) -> Dict
messages = result.errors
manager = result.manager
graph = result.graph
manager.options.cache_dir = os.devnull # XXX: HACK

Add comment describing a potential better way to do this?
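
One possible cleaner alternative, purely as a sketch: an explicit option that disables further cache reads instead of pointing cache_dir at os.devnull. The disable_cache_reads flag below is invented for illustration and is not part of this PR.

# Hypothetical: record the intent explicitly instead of repurposing cache_dir.
manager.options.disable_cache_reads = True

# ...and wherever cache metadata would otherwise be read (e.g. validate_meta):
# if manager.options.disable_cache_reads:
#     return None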

@@ -318,6 +319,9 @@ def mark_all_meta_as_memory_only(graph: Dict[str, State],
def get_all_dependencies(manager: BuildManager, graph: Dict[str, State],
options: Options) -> Dict[str, Set[str]]:
"""Return the fine-grained dependency map for an entire build."""
if not options.use_fine_grained_cache:

Add comment about the dependency map being included in the cache if cache is enabled.
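
The comment could say something along these lines (my wording, based on the check shown in the hunk above):

# When the fine-grained cache is in use, the dependency map was written out
# as part of the cache data on a previous run, so it does not need to be
# recomputed here; only the non-cached path builds it from scratch.
if not options.use_fine_grained_cache:
    ...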

triggered |= reprocess_nodes(manager, graph, id, nodes, deps)
# TODO: Is there a better way to detect that the file isn't loaded?
if not manager.modules[id].defs:
# We haven't actually loaded this file! Add it to the

Update comment to mention that we've only loaded the cache file for this, so there's no AST yet.

@@ -43,15 +43,30 @@ class FineGrainedSuite(DataSuite):
optional_out = True

def run_case(self, testcase: DataDrivenTestCase) -> None:
self.run_case_inner(testcase, cache=False)

# Reset the test case and run it again with caching on

Instead of running two variants of a test case within each test case, I'd prefer a separate suite that has caching on. A reasonable way to do this would be to subclass FineGrainedSuite for caching test cases. The subclass would be trivial and only set a flag that turns caching on. This way it would be easy to run only the caching or non-caching variant of a test case, making it easier to debug failures.
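
A sketch of what that trivial subclass might look like; the class name is illustrative, while the use_cache flag is the one consulted by should_skip() further down.

class FineGrainedCacheSuite(FineGrainedSuite):
    """Run the fine-grained test cases again, with caching enabled."""
    use_cache = True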

main_src = '\n'.join(testcase.input)
sources_override = self.parse_sources(main_src)
messages, manager, graph = self.build(main_src, testcase, sources_override)

print("Testing with cache: ", cache)

Once you have two test suites for non-caching and caching variants of test cases, this print statement can be removed. We prefer to produce no output when running passing test cases.

if fine_grained_manager is None:
messages, manager, graph = self.build(main_src, testcase, sources_override,
build_cache=False, enable_cache=cache)
manager.options.cache_dir = os.devnull # XXX: HACK

Document the purpose of the hack and potentially a better way to do it.

@@ -237,7 +237,9 @@ a.py:1: error: invalid syntax
b.py:3: error: Too many arguments for "f"
a.py:3: error: Too many arguments for "g"

[case testDeleteFileWithBlockingError]
[case testDeleteFileWithBlockingError-skip]

What about skipping this only in caching test cases? It would give us better test coverage. For example, add another test case name suffix that causes these test cases to be skipped in caching tests only. You could use this mechanism to have two variants of each test case with different outputs as well (caching / non-caching) by having two suffixes.

Also, please create issue(s) about fixing the remaining skipped test cases.

JukkaL (Collaborator) left a comment

Looks good now, feel free to merge after you've looked at my comments (only minor things). I didn't test this yet; it'll be easier to test and iterate on this once it has been merged.

"""Transitively rechecks targets based on triggers and the dependency map.

Returns a list (module id, path) tuples representing modules that contain
a target that needs to be reprocessed but that has not been parsed yet."""

Style nit: move """ to a separate line.
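
Applied to the docstring quoted above, that would look like this (with a small grammar fix added):

"""Transitively rechecks targets based on triggers and the dependency map.

Returns a list of (module id, path) tuples representing modules that contain
a target that needs to be reprocessed but that has not been parsed yet.
"""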

# as skipped, not just elided.
def should_skip(self, testcase: DataDrivenTestCase) -> bool:
if self.use_cache:
if testcase.name.endswith("-skip-cache"): return True

Style nit: move return True to a separate line.

if testcase.name.endswith("-skip-cache"): return True
# TODO: In caching mode we currently don't well support
# starting from cached states with errors in them.
if testcase.output and testcase.output[0] != '==': return True

Similar to above.

# starting from cached states with errors in them.
if testcase.output and testcase.output[0] != '==': return True
else:
if testcase.name.endswith("-skip-nocache"): return True

Similar to above.
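
For illustration, the same checks with each return on its own line; the logic is unchanged, and the trailing return False is implied by the hunks but not shown in them. The method is shown here as a standalone fragment of the suite class.

from mypy.test.data import DataDrivenTestCase


def should_skip(self, testcase: DataDrivenTestCase) -> bool:
    if self.use_cache:
        if testcase.name.endswith("-skip-cache"):
            return True
        # TODO: In caching mode we currently don't support starting from
        # cached states with errors in them very well.
        if testcase.output and testcase.output[0] != '==':
            return True
    else:
        if testcase.name.endswith("-skip-nocache"):
            return True
    return False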

@@ -90,7 +100,7 @@ def run_case_inner(self, testcase: DataDrivenTestCase, cache: bool) -> None:
# cache, now we need to set it up
if fine_grained_manager is None:
messages, manager, graph = self.build(main_src, testcase, sources_override,
build_cache=False, enable_cache=cache)
build_cache=False, enable_cache=True)

Is the change to a True argument on purpose?

msullivan (Collaborator, Author)

Yeah. This path only executes when the cache is on.

msullivan merged commit 4352a44 into master on Feb 7, 2018
msullivan deleted the fg-cache branch on February 7, 2018 19:02
carljm added a commit to carljm/mypy that referenced this pull request Feb 14, 2018
* master: (32 commits)
  Fix some fine-grained cache/fswatcher problems (python#4560)
  Sync typeshed (python#4559)
  Add _cached suffix to test cases in fine-grained tests with cache (python#4558)
  Add back support for simplified fine-grained logging (python#4557)
  Type checking of class decorators (python#4544)
  Sync typeshed (python#4556)
  When loading from a fine-grained cache, use the real path, not the cached (python#4555)
  Switch all of the fine-grained debug logging to use manager.log (python#4550)
  Caching for fine-grained incremental mode (python#4483)
  Fix --warn-return-any for NotImplemented (python#4545)
  Remove myunit (python#4369)
  Store line numbers of imports in the cache metadata (python#4533)
  README.md: Fix a typo (python#4529)
  Enable generation and caching of fine-grained dependencies from normal runs (python#4526)
  Move argument parsing for the fine-grained flag into the main arg parsing code (python#4524)
  Don't warn about unrecognized options starting with 'x_' (python#4522)
  stubgen: don't append star arg when args list already has varargs appended (python#4518)
  Handle TypedDict in diff and deps (python#4510)
  Fix Options.__repr__ to not infinite recurse (python#4514)
  Fix some fine-grained incremental bugs with newly imported files (python#4502)
  ...