Refine cache for typing the buffer #1637

pitag-ha · 2023-06-26T12:36:56Z

The buffer cache splits the parsetree into "last typing result can be re-used" and "needs to be typed". That splitting is currently only done on the level of top-level structure/signature items. Let's refine it on two levels:

module (type) definitions/declarations
function definitions

pitag-ha · 2023-10-23T16:20:04Z

Let me give a bit more detail on this:

Merlin needs the typedtree of the current buffer for most of its queries. The typedtree is produced in two steps (up to details): First, the buffer's source code is parsed into the parsetree. Then, the parsetree is typed, producing the typedtree. Since the typing is very time-intense, that typedtree is cached.

So, if the next query is run on the same buffer, Merlin pulls as much as it can from the cached typedtree. If the source code wasn't modified, it can pull the whole typedtree from cache. If it was modified, it can only pull a part of the typedtree, concretely the part before the modification. Everything from the modification on needs to be retyped.

This issue is about the granularity with which Merlin splits the tree into "before modification" and "after modification". The current workflow is as follows: The parsetree is a list, concretely a list of structure items for ml-files (aka structure) or a list of signature items for mli-files (aka signature). Merlin goes through the new parsetree and the old parsetree simultaneously and looks for the first item in the list that's different between the two. It does so via a polymorphic compare between the two items. So the splitting is done on top-level structure/signature items.

This issue suggests to refine the splitting. Let us only focus on refining it to structures or signatures inside of modules or module types (i.e. the first point above). For that, I suggest the following steps:

Write tests
- Integrity test, e.g. queries on nodes right after a modification on a typed buffer whose reply depends on whether the node was retyped (expected behavior) or not (we've messed something up).
- Cache efficiency test: Query something on typed buffer that was modified inside a module definition; print Merlin's log statement of how many items were reused from the buffer cache.
Turn the polymorphic compare of structure/signature items into an explicit compare. You can use the Ast_mapper module for that.
Refine the newly created explicit compare so that it captures the situation "the two items are module definitions, both module expressions contain a structure, and the two structures coincide up to item X" (and similarly for signatures). This also means that the prefix of the tree that can be re-used, which currently is a list of structure/signature items, as well as the re-typing need to gain complexity.

3Rafal · 2023-10-26T14:21:30Z

I've investigated this issue with @voodoos and the problem seems to be
quite challenging. Suppose that we have this code

let a = 0

module M = struct
  let x = 1
  let y = 2
  let z = 3
end

which results in this parsetree:

  structure_item (*buffer*[1,0+0]..[1,0+9])
    Pstr_value Nonrec
    [
      <def>
        pattern (*buffer*[1,0+4]..[1,0+5])
          Ppat_var \"a\" (*buffer*[1,0+4]..[1,0+5])
        expression (*buffer*[1,0+8]..[1,0+9])
          Pexp_constant PConst_int (0,None)
    ]
  structure_item (*buffer*[3,11+0]..[7,65+3])
    Pstr_module
    \"M\" (*buffer*[3,11+7]..[3,11+8])
      module_expr (*buffer*[3,11+11]..[7,65+3])
        Pmod_structure
        [
          structure_item (*buffer*[4,29+2]..[4,29+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[4,29+6]..[4,29+7])
                  Ppat_var \"x\" (*buffer*[4,29+6]..[4,29+7])
                expression (*buffer*[4,29+10]..[4,29+11])
                  Pexp_constant PConst_int (1,None)
            ]
          structure_item (*buffer*[5,41+2]..[5,41+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[5,41+6]..[5,41+7])
                  Ppat_var \"y\" (*buffer*[5,41+6]..[5,41+7])
                expression (*buffer*[5,41+10]..[5,41+11])
                  Pexp_constant PConst_int (2,None)
            ]
          structure_item (*buffer*[6,53+2]..[6,53+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[6,53+6]..[6,53+7])
                  Ppat_var \"z\" (*buffer*[6,53+6]..[6,53+7])
                expression (*buffer*[6,53+10]..[6,53+11])
                  Pexp_constant PConst_int (3,None)
            ]
        ]
]

We can see that module contains a list of structure_items. The part that I find less challenging is to create a new comparison, that would traverse the inner list of module items. This still isn't easy though.

However, the biggest challenge is how to type modules partially. We would need to be able to type a module with part of it's items, and then have a way of simply appending it to already typed module. This task seems to require thorough research, as I don't see a clear way how it should be done. I also can't estimate a time cost of this task and even it's doability.

This was referenced Oct 24, 2023

Improve the granularity of the typer cache to maximise reuse. #1694

Closed

[WIP] Cache granularity #1696

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refine cache for typing the buffer #1637

Refine cache for typing the buffer #1637

pitag-ha commented Jun 26, 2023

pitag-ha commented Oct 23, 2023

3Rafal commented Oct 26, 2023

Refine cache for typing the buffer #1637

Refine cache for typing the buffer #1637

Comments

pitag-ha commented Jun 26, 2023

pitag-ha commented Oct 23, 2023

3Rafal commented Oct 26, 2023