Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine cache for typing the buffer #1637

Open
2 tasks
pitag-ha opened this issue Jun 26, 2023 · 2 comments
Open
2 tasks

Refine cache for typing the buffer #1637

pitag-ha opened this issue Jun 26, 2023 · 2 comments

Comments

@pitag-ha
Copy link
Member

The buffer cache splits the parsetree into "last typing result can be re-used" and "needs to be typed". That splitting is currently only done on the level of top-level structure/signature items. Let's refine it on two levels:

  • module (type) definitions/declarations
  • function definitions
@pitag-ha
Copy link
Member Author

Let me give a bit more detail on this:

Merlin needs the typedtree of the current buffer for most of its queries. The typedtree is produced in two steps (up to details): First, the buffer's source code is parsed into the parsetree. Then, the parsetree is typed, producing the typedtree. Since the typing is very time-intense, that typedtree is cached.

So, if the next query is run on the same buffer, Merlin pulls as much as it can from the cached typedtree. If the source code wasn't modified, it can pull the whole typedtree from cache. If it was modified, it can only pull a part of the typedtree, concretely the part before the modification. Everything from the modification on needs to be retyped.

This issue is about the granularity with which Merlin splits the tree into "before modification" and "after modification". The current workflow is as follows: The parsetree is a list, concretely a list of structure items for ml-files (aka structure) or a list of signature items for mli-files (aka signature). Merlin goes through the new parsetree and the old parsetree simultaneously and looks for the first item in the list that's different between the two. It does so via a polymorphic compare between the two items. So the splitting is done on top-level structure/signature items.

This issue suggests to refine the splitting. Let us only focus on refining it to structures or signatures inside of modules or module types (i.e. the first point above). For that, I suggest the following steps:

  • Write tests
    • Integrity test, e.g. queries on nodes right after a modification on a typed buffer whose reply depends on whether the node was retyped (expected behavior) or not (we've messed something up).
    • Cache efficiency test: Query something on typed buffer that was modified inside a module definition; print Merlin's log statement of how many items were reused from the buffer cache.
  • Turn the polymorphic compare of structure/signature items into an explicit compare. You can use the Ast_mapper module for that.
  • Refine the newly created explicit compare so that it captures the situation "the two items are module definitions, both module expressions contain a structure, and the two structures coincide up to item X" (and similarly for signatures). This also means that the prefix of the tree that can be re-used, which currently is a list of structure/signature items, as well as the re-typing need to gain complexity.

@3Rafal
Copy link
Collaborator

3Rafal commented Oct 26, 2023

I've investigated this issue with @voodoos and the problem seems to be
quite challenging. Suppose that we have this code

let a = 0

module M = struct
  let x = 1
  let y = 2
  let z = 3
end

which results in this parsetree:

  structure_item (*buffer*[1,0+0]..[1,0+9])
    Pstr_value Nonrec
    [
      <def>
        pattern (*buffer*[1,0+4]..[1,0+5])
          Ppat_var \"a\" (*buffer*[1,0+4]..[1,0+5])
        expression (*buffer*[1,0+8]..[1,0+9])
          Pexp_constant PConst_int (0,None)
    ]
  structure_item (*buffer*[3,11+0]..[7,65+3])
    Pstr_module
    \"M\" (*buffer*[3,11+7]..[3,11+8])
      module_expr (*buffer*[3,11+11]..[7,65+3])
        Pmod_structure
        [
          structure_item (*buffer*[4,29+2]..[4,29+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[4,29+6]..[4,29+7])
                  Ppat_var \"x\" (*buffer*[4,29+6]..[4,29+7])
                expression (*buffer*[4,29+10]..[4,29+11])
                  Pexp_constant PConst_int (1,None)
            ]
          structure_item (*buffer*[5,41+2]..[5,41+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[5,41+6]..[5,41+7])
                  Ppat_var \"y\" (*buffer*[5,41+6]..[5,41+7])
                expression (*buffer*[5,41+10]..[5,41+11])
                  Pexp_constant PConst_int (2,None)
            ]
          structure_item (*buffer*[6,53+2]..[6,53+11])
            Pstr_value Nonrec
            [
              <def>
                pattern (*buffer*[6,53+6]..[6,53+7])
                  Ppat_var \"z\" (*buffer*[6,53+6]..[6,53+7])
                expression (*buffer*[6,53+10]..[6,53+11])
                  Pexp_constant PConst_int (3,None)
            ]
        ]
]

We can see that module contains a list of structure_items. The part that I find less challenging is to create a new comparison, that would traverse the inner list of module items. This still isn't easy though.

However, the biggest challenge is how to type modules partially. We would need to be able to type a module with part of it's items, and then have a way of simply appending it to already typed module. This task seems to require thorough research, as I don't see a clear way how it should be done. I also can't estimate a time cost of this task and even it's doability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants