Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't run InteractiveDriver.run if the content didn't "change" #4225

Merged
merged 4 commits into from
Aug 19, 2022

Conversation

tanishiking
Copy link
Member

@tanishiking tanishiking commented Aug 5, 2022

If the content didn't change since the last driver.run, do not compile
because the compilation unit should have the up to date typed tree.
Skip compilation saves up much of CPU usage when users are reading
source code (and hover on symbols) without modifying the source code.


What happens if we change in a file affect another file's type
information?

For example, when we have the following source files, if we hover on
B.foo in A.scala, Metals shows the symbol's type is Int.

// A.scala
def main = println(B.foo)

// B.scala
object B:
  def foo: Int = ???

When we modify B.scala to def foo: String = ???, what happens to
A.scala?
You may think that, Metals keep using the old compilation
unit that shows B.foo as Int instead of String because Metals
invalidates the compilation cache based on its content hash.

That's not true, alreadyCompiled("A.scala", <content hash>) will be
false, and Metals will run driver.run on A.scala and show String
for B.foo in A.scala.

Why? Because Metals will recreate a new MetalsDriver and lastCompiled
cache will be cleared up.

This is basically the same way to cache compilation as Scala2
See my explanation on how Scala2 + Metals compilation cache works
https://contributors.scala-lang.org/t/how-does-scala2-compiler-cache-invalidate-the-typechecking-result-for-the-given-tree/5858/4?u=tanishiking

@tanishiking tanishiking force-pushed the driver-run-less-2 branch 2 times, most recently from f63f22c to d4b80cf Compare August 6, 2022 10:31
override val settings: List[String]
) extends InteractiveDriver(settings):

private val lastCompiled: TrieMap[URI, String] = TrieMap.empty
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which is better

  • Storing content as Array[Char] and compare them by Array.equals(...) vs
  • Hash content by md5 (or something) and compare
    • md5 consumes much less memory, but it requires encryption each time
    • but lastCompiled will be short-lived ones (as it will be expired when users modify a file), and memory footprint may not matter.

Copy link
Contributor

@tgodzik tgodzik Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rely on didChange/didOpen notifications and force recompilation if the method gets invoked? We wouldn't need to compute anything in that case. so should be faster.

We could have a set with compiled file paths, remove paths when onChange and onClose notifications are run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rely on didChange/didOpen notifications and force recompilation if the method gets invoked?

Yeah, I think we can do that!
However, I prefer not to rely on those didChange/didOpen events because

  • hash comparison (or Array.equals) is already pretty fast
    • I commented about MD5 vs Array.equals, but it's about micro-optimization. TBH, either is practically fine (I was just curious) 😅
  • Cache invalidation by those events makes the source code more complicated, and even harder to troubleshoot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure, I was just worried that maybe the hash comparison was problematic 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanishiking you could use TimerProvider to measure how much time computing hash takes (but maybe log in on debug channel?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better at least to attach didClose notifications to cleanup cache.
Hypothetically, if you have a project in broken state for a long time it might took some memory ) all units will be in memory until compiler reload, which happens only if bsp compilation pass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End up comparing content Array[Char] by ju.Arrays.equals as Scala2 does. Now MetalsDriver compare with the previous one by

  private def alreadyCompiled(uri: URI, content: Array[Char]): Boolean =
    compilationUnits.get(uri) match
      case Some(unit) if ju.Arrays.equals(unit.source.content(), content) =>
        true
      case _ => false

I think this is better because, now we can force re-compile if the driver somehow deleted the compilation unit. (The previous implementation may break the consistency between cache (in MetalsDriver and compilationUnits).


@dos65

happens only if bsp compilation pass

I don't think so, presentation compiler instance will be recreated even if BSP compilation fails

if (report.getErrors > 0) {
buildTargetPCFromCache(report.getTarget).foreach(_.restart())
} else {

@tanishiking tanishiking changed the title [WIP] Don't run InteractiveDriver.run if the content didn't "change" Don't run InteractiveDriver.run if the content didn't "change" Aug 6, 2022
@tanishiking tanishiking marked this pull request as ready for review August 6, 2022 11:55
Comment on lines 27 to 33
override def run(uri: URI, source: SourceFile): List[Diagnostic] =
val content = source.content.mkString
val contentHash = MD5.compute(content)
if alreadyCompiled(uri, contentHash) then Nil
else
lastCompiled.put(uri, contentHash)
super.run(uri, source)
Copy link
Member

@dos65 dos65 Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also override def currentCtx?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the parent's currentCtx implementation (so I think we don't need to override currentCtx), what do you think about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will work incorrectly in the following case:

  • call pc for A.scala (compilation happens) - driver.currentCtx refers to the ctx from A.scala
  • call pc for B.scala (compilation happens) - driver.currentCtx refers to the ctx from B.scala
  • switch back on A.scala without changes, call pc - driver.currentCtx refers to the ctx from B.scala

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah, that should be a problem 👍 (maybe run should update the currentCtx anyway even MetalsDriver doesn't run compile).

I tried to produce some problems, but couldn't. Did you find any problem with the lack of currentCtx implementation?
I realized driver.currentCtx.source always returns <no file> and driver.currentCtx.source.tpdTree returns EmptyTree. Also, I skimmed through the InteractiveDriver and Run, but it looks like no one assigns the given source to the currentCtx. Maybe currentCtx doesn't refer to any specific files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being used in InferredTypeProvider maybe you can check inserting type if it works properly?

Copy link
Member Author

@tanishiking tanishiking Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to override currentCtx so that it returns the context connected to the last runed compilation unit

+  private val contextCache: TrieMap[URI, Context] = TrieMap.empty
+  private var lastCompiledURI: Option[URI] = None
+
   private def alreadyCompiled(uri: URI, content: Array[Char]): Boolean =
     compilationUnits.get(uri) match
       case Some(unit) if ju.Arrays.equals(unit.source.content(), content) =>
         true
       case _ => false
 
+  override def currentCtx: Context =
+    (for
+      uri <-
+        // lastCompiledURI is initialized as null (!?), it might be a bug in scala3
+        if lastCompiledURI != null then lastCompiledURI
+        else None
+      ctx <- contextCache.get(uri)
+    yield ctx) match
+      case Some(ctx) => ctx
+      case None => super.currentCtx
+
   override def run(uri: URI, source: SourceFile): List[Diagnostic] =
+    lastCompiledURI = Some(uri)
     if alreadyCompiled(uri, source.content) then Nil
-    else super.run(uri, source)
+    else
+      val diags = super.run(uri, source)
+      contextCache.put(uri, currentCtx)
+      diags
 
   override def run(uri: URI, sourceCode: String): List[Diagnostic] =
-    val contentHash = MD5.compute(sourceCode)
+    lastCompiledURI = Some(uri)
     if alreadyCompiled(uri, sourceCode.toCharArray()) then Nil
-    else super.run(uri, sourceCode)
+    else
+      val diags = super.run(uri, sourceCode)
+      contextCache.put(uri, currentCtx)
+      diags

This implementation solved the wrong context issue, but something funky started happening, for example when I run code completion. It looks like, under the cached context (whose runId = 3) the symbol will be invalid in the new run (where runId = 4).

SEVERE: assertion failed: denotation module class C$ invalid in run 3. ValidFor: Period(1..5, run = 4)
java.lang.AssertionError: assertion failed: denotation module class C$ invalid in run 3. ValidFor: Period(1..5, run = 4)
        at scala.runtime.Scala3RunTime$.assertFailed(Scala3RunTime.scala:8)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.updateValidity(Denotations.scala:722)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.bringForward(Denotations.scala:747)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.toNewRun$1(Denotations.scala:806)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.current(Denotations.scala:880)
        at dotty.tools.dotc.core.Symbols$Symbol.recomputeDenot(Symbols.scala:120)
        at dotty.tools.dotc.core.Symbols$Symbol.computeDenot(Symbols.scala:114)
        at dotty.tools.dotc.core.Symbols$Symbol.denot(Symbols.scala:107)
        at dotty.tools.dotc.core.Symbols$.toDenot(Symbols.scala:497)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.updateValidity(Denotations.scala:721)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.bringForward(Denotations.scala:747)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.toNewRun$1(Denotations.scala:806)
        at dotty.tools.dotc.core.Denotations$SingleDenotation.current(Denotations.scala:880)
        at dotty.tools.dotc.core.Types$NamedType.computeDenot(Types.scala:2256)
        at dotty.tools.dotc.core.Types$NamedType.denot(Types.scala:2216)
        at dotty.tools.dotc.ast.Trees$DenotingTree.denot(Trees.scala:256)
        at dotty.tools.dotc.ast.Trees$Tree.symbol(Trees.scala:145)
        at scala.meta.internal.pc.MetalsInteractive$.contextOfPath(MetalsInteractive.scala:56)
        at scala.meta.internal.pc.MetalsInteractive$.contextOfPath(MetalsInteractive.scala:34)
        at scala.meta.internal.pc.MetalsInteractive$.contextOfPath(MetalsInteractive.scala:34)
        at scala.meta.internal.pc.completions.CompletionsProvider.completions(CompletionsProvider.scala:59)

I'm considering, maybe we should override currentCtx like

override def currentCtx =
  val ctx0 = super.currentCtx
  ctx0.setCompilationUnit(<last compiled compilation unit for the file>)

Anyway, thank you very much for finding this issue! @dos65

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering, maybe we should override currentCtx

Realized it doesn't work, we have to set compilation units to the ctx.run but it's not accessible outside of Run.
Hmmm, maybe we should do this in compiler as Scala2 does 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that there might be some issues with periods. Unfortunately, I don't know how to avoid them. I'm not fully understand this thing.

As an alternative approach (at least for a while) we can make a cache only for the last source.
Then if there will be a sequential hover calls for the same source without changes - don't run the driver and in this case currentCtx will be valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there will be a sequential hover calls for the same source without changes

Ohhh, that's a good idea 👍

Copy link
Member Author

@tanishiking tanishiking Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dos65 I implemented like

if there will be a sequential hover calls for the same source without changes

and looks like the problem has gone 🎉

If the content didn't change since the last `driver.run`, do not compile
because the compilation unit should have the up to date typed tree.
Skip compilation save up much of CPU usage when users are reading
source code (and hover on symbols) without modifying the source code.

What happens if we change in a file affect another file's type
information?

For example, when we have the following source files, if we hover on
`B.foo` in `A.scala`, Metals shows the symbol's type is `Int`.

```scala
// A.scala
def main = println(B.foo)

// B.scala
object B:
  def foo: Int = ???
```

When we modify `B.scala` to `def foo: String = ???`, what happens to
`A.scala`?
You may think that, Metals keep using the old compilation
unit that shows `B.foo` as `Int` instead of `String` because Metals
invalidates the compilation cache based on its content hash.

That's not true, `alreadyCompiled("A.scala", <content hash>)` will be
`false`, and Metals will run `driver.run` on `A.scala` and show `String`
for `B.foo` in `A.scala`.

Why? Because Metals will recreate a new `MetalsDriver` and `lastCompiled`
cache will be cleared up.

- When a file has changed, Metals call [Compilers.restart](https://github.com/scalameta/metals/blob/c6e0f30febeff67e970e463d79862e27e45ada3f/metals/src/main/scala/scala/meta/internal/metals/Compilers.scala#L247).
- The restart method calls `shutdownCurrentCompiler` that [set `_compiler` to `null`.](https://github.com/scalameta/metals/blob/c6e0f30febeff67e970e463d79862e27e45ada3f/mtags/src/main/scala/scala/meta/internal/pc/CompilerAccess.scala#L57-L74).

This is basically the same way to cache compilation as Scala2
See my explanation on how Scala2 + Metals compilation cache works
https://contributors.scala-lang.org/t/how-does-scala2-compiler-cache-invalidate-the-typechecking-result-for-the-given-tree/5858/4?u=tanishiking
…us run

Now, MetalsDriver skips running compilation if
- the target URI of `run` is the same as the previous target URI
- the content didn't change since the last compilation.

This compilation cache enables Metals to skip compilation and re-use
the typed tree under the situation like developers
sequentially hover on the symbols in the same file without any changes.

Note: we decided to cache only if the target URI only if the same as the previous run
because of `InteractiveDriver.currentCtx` that should return the context that
refers to the last compiled source file.
It would be ideal if we could update currentCtx even when we skip the compilation,
but we struggled to do that. See the discussion
scalameta#4225 (comment)

To avoid the complexity related to currentCtx,
we decided to cache only when the target URI only if the same as the previous run.
@tanishiking
Copy link
Member Author

I think it's now ready for review 👍

Copy link
Member

@dos65 dos65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tanishiking tanishiking merged commit cb753a2 into scalameta:main Aug 19, 2022
@tanishiking tanishiking deleted the driver-run-less-2 branch August 19, 2022 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants