Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache dependencies #337

Merged
merged 27 commits into from
Aug 13, 2023
Merged

Cache dependencies #337

merged 27 commits into from
Aug 13, 2023

Conversation

daplf
Copy link
Contributor

@daplf daplf commented Mar 17, 2022

Related to #328

Introduces a mechanism to cache the project's dependencies.

This improves startup performance a bit, since we no longer need to run maven/gradle tasks every time we open a project.

Every time we open a project, we now check to see if the build file has changed since the last time it was opened (i.e., the last time we updated the cache). If no changes were done, we use the cache. Otherwise, we just fetch the dependencies again. We can obviously make this more robust, but that can come later.

The caching mechanism can later be used for other things in the server as well (such as caching the IR and the symbol index).

The caching is optional at the moment. It requires sending a storagePath in the initializationOptions on the initialize request, like so:

{
   ...
   "initializationOptions": {
      "storagePath": "/some/path"
   }
   ...
}

I created a PR to update the vscode extension so it sends this as well: fwcd/vscode-kotlin#86

I used kotlinx.serialization to serialize the data. I used JSON since it's the only stable format currently supported (all the others are experimental). JSON isn't ideal, so we should change this at some point, but for dependencies it's certainly fine (it's not much data). After some discussion in the PR, we decided to go for a DB based approach.

This is the first step towards improving startup performance. In the future, we should also cache other things as mentioned above. The symbol index is especially relevant, since it takes quite a while to build.

@daplf daplf changed the title Dependency caching Cache dependencies Mar 17, 2022
@daplf daplf marked this pull request as ready for review March 17, 2022 00:53
@fwcd fwcd added enhancement New feature or request dependency resolution Related to the project dependency/standard library resolver labels Mar 17, 2022
Copy link
Owner

@fwcd fwcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this, pretty cool! I'll have to take a more detailed look at this PR (and the VSCode side), just a few comments after glancing over it.

Comment on lines 12 to 28
fun parseServerConfiguration(params: InitializeParams): ServerConfiguration? {
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create()

var storage: Storage? = null

if (params.initializationOptions != null) {
val options = gson.fromJson(params.initializationOptions as JsonElement, InitializationOptions::class.java)

if (options.storagePath != null) {
if (Files.exists(options.storagePath) && Files.isDirectory(options.storagePath)) {
storage = Storage(options.storagePath)
}
}
}

return ServerConfiguration(storage)
}
Copy link
Owner

@fwcd fwcd Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand, does LSP4J pull in the Gson dependency? If that's the case, perhaps we could just use Gson everywhere for consistency? Or is kotlinx.serialization more elegant because its API is designed for Kotlin (I'd be fine both ways, just curious)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, LSP4J uses Gson for some things.

I chose kotlinx.serialization because it's an official library, so I'd expect it to be quite stable and designed for Kotlin, as you said. I was also experimenting with some of the non stable serialization formats at first and we could potentially move to those later on with little code changes I think. But it does look a bit overkill right now, since we're using json. I can change it if you prefer.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using kotlinx.serialization sounds good (I'm always a bit careful about introducing new dependencies, but this seems useful enough to have it around in any case).

private val projectDirectory: Path get() = path.getParent()
private val projectDirectory: Path get() = path.parent

private var cachedClassPath: Set<ClassPathEntry>? = storage?.getObject("gradleClassPath")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, not sure how feasible that would be, could we have a uniform class path file that caches the entire class path (independent of whether Maven or Gradle was originally used to resolve it) and implement that as a new ClassPathResolver that wraps the existing ones (e.g. something along the lines of CachingClassPathResolver)? That way we wouldn't have to include caching in every resolver implementation separately. Feel free to correct me if I'm missing anything though. Perhaps detecting changes would become quite a bit harder that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice idea. I'll give it a try 👍

shared/src/main/kotlin/org/javacs/kt/storage/Storage.kt Outdated Show resolved Hide resolved
shared/build.gradle Outdated Show resolved Hide resolved
server/src/main/kotlin/org/javacs/kt/config/Config.kt Outdated Show resolved Hide resolved
Comment on lines 12 to 28
fun parseServerConfiguration(params: InitializeParams): ServerConfiguration? {
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create()

var storage: Storage? = null

if (params.initializationOptions != null) {
val options = gson.fromJson(params.initializationOptions as JsonElement, InitializationOptions::class.java)

if (options.storagePath != null) {
if (Files.exists(options.storagePath) && Files.isDirectory(options.storagePath)) {
storage = Storage(options.storagePath)
}
}
}

return ServerConfiguration(storage)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using kotlinx.serialization sounds good (I'm always a bit careful about introducing new dependencies, but this seems useful enough to have it around in any case).

daplf and others added 5 commits April 3, 2022 17:12
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
…er.kt

Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>

data class InitializationOptions(val storagePath: Path?)

data class ServerConfiguration(val storage: Storage?)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we merge this configuration with org.javacs.kt.Configuration or perhaps just place them in the same file (under the .config subpackage is fine)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about that class. I moved everything there now.

server/src/test/kotlin/org/javacs/kt/ClassPathTest.kt Outdated Show resolved Hide resolved
Comment on lines 31 to 53
inline fun <reified T> getObject(objectPath: String, deserializer: KSerializer<T> = Json.serializersModule.serializer()): T? {
val fullObjectPath = Paths.get(path.toString(), objectPath)

if (Files.exists(fullObjectPath)) {
val content = fullObjectPath.toFile().readText()
try {
return Json.decodeFromString(deserializer, content)
} catch (ex: Exception) {
// We catch any exception just in case something unexpected happens. We return null in that case and delete the file.
LOG.printStackTrace(ex)
Files.deleteIfExists(fullObjectPath)
}
}

return null
}

inline fun <reified T> setObject(objectPath: String, value: T, serializer: KSerializer<T> = Json.serializersModule.serializer()) {
val fullObjectPath = Paths.get(path.toString(), objectPath)

if (!Files.exists(fullObjectPath)) {
Files.createFile(fullObjectPath)
}
Copy link
Owner

@fwcd fwcd Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, could we use Kotlin's subscript overloading here, i.e. make getObject and setObject operator funs (see the docs)? Not sure whether that works with the optional serializer parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would be really cool, but the optional serializer parameters kind of ruin it unfortunately. We also have the storage as a nullable property everywhere, so I don't think we'd get that much out of it even then (as far as I know, the subscript operator doesn't work well with the elvis operator for example).

Copy link
Owner

@fwcd fwcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a minor note, could you look into fixing the conflicts?

Writing #352 got me thinking, perhaps we could use a single unified on-disk (per-project) database (e.g. an SQLite DB) to hold both cached dependencies and the entire symbol index, instead of serialized JSON documents? This would both reduce the memory footprint of the language server and startup time in existing projects (we would just need to make sure to reindex whenever the index isn't up to date).

shared/build.gradle Outdated Show resolved Hide resolved
daplf and others added 2 commits April 23, 2022 21:52
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
@daplf
Copy link
Contributor Author

daplf commented Apr 23, 2022

@fwcd conflicts fixed. For some reason, one of the CI builds failed, but I can't reproduce it locally (tests are green).

Writing #352 got me thinking, perhaps we could use a single unified on-disk (per-project) database (e.g. an SQLite DB) to hold both cached dependencies and the entire symbol index, instead of serialized JSON documents? This would both reduce the memory footprint of the language server and startup time in existing projects (we would just need to make sure to reindex whenever the index isn't up to date).

This is actually what I had in mind for the symbol index. I was thinking of keeping H2 since it can be stored on disk as well, but any DB will do (and I don't have any strong feelings against any of them tbh). Regarding the other things, I didn't consider using a DB for them, but I suppose it would work as well. I think I went with this approach because it seemed simple enough, but we can consider using a DB. I also thought about caching the symbols of certain dependencies globally (like the kotlin stdlib), since they are used in all projects.

Another aspect that we should probably look into is caching the source files' IR. Right now, when we start the server, the entire project is compiled, which can take a while since invoking the compiler on all files is not that fast. We could store the IR of the files (i.e. the SourcePath.files) to avoid this.

@daplf
Copy link
Contributor Author

daplf commented Apr 9, 2023

Hey @fwcd It's been a while 😅

I finally got back into this and refactored the PR to cache dependencies using a DB.

When a storage path is provided, we now setup a global SQLite DB for the entire server. The DB is passed to the cached class path resolver and the symbol index. Both make use of it to store their stuff. For now, we are only caching dependencies, but the ground work is done for the symbol index to be cached as well (in a future PR).

When a storage path is not provided, the symbol index just creates its own in-memory H2 DB as it did before.

Let me know what you think. I'm hoping this improves things.

Copy link
Owner

@fwcd fwcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, I like the approach and am excited to see this moving forward

@@ -89,6 +90,11 @@ class KotlinLanguageServer : LanguageServer, LanguageClientAware, Closeable {
serverCapabilities.executeCommandProvider = ExecuteCommandOptions(ALL_COMMANDS)
serverCapabilities.documentHighlightProvider = Either.forLeft(true)

val db = setupServerDatabase(params)

sourcePath.index = if (db != null) SymbolIndex(db) else SymbolIndex()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sourcePath.index = if (db != null) SymbolIndex(db) else SymbolIndex()
sourcePath.index = db?.let { SymbolIndex(it) } ?: SymbolIndex()

Alternatively...

Comment on lines 83 to 89
class SymbolIndex(
private val db: Database = Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver")
) {
var progressFactory: Progress.Factory = Progress.Factory.None

init {
transaction(db) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...we could do something along the lines of

Suggested change
class SymbolIndex(
private val db: Database = Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver")
) {
var progressFactory: Progress.Factory = Progress.Factory.None
init {
transaction(db) {
class SymbolIndex(db: Database? = null) {
private val db: Database
var progressFactory: Progress.Factory = Progress.Factory.None
init {
this.db = db ?: Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver")
transaction(db) {

That way we could pass the optional db in directly. Though I don't have a strong opinion on this.

@@ -89,6 +90,11 @@ class KotlinLanguageServer : LanguageServer, LanguageClientAware, Closeable {
serverCapabilities.executeCommandProvider = ExecuteCommandOptions(ALL_COMMANDS)
serverCapabilities.documentHighlightProvider = Either.forLeft(true)

val db = setupServerDatabase(params)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we need the database initialization to happen after the initialize, since the client supplies the storagePath, right? It would be neat if we wouldn't need to wire up the database lazily and could pass it using the usual constructor injection pattern, but I don't really see a better way to do this either...

Comment on lines +104 to +106
SchemaUtils.createMissingTablesAndColumns(
ClassPathMetadataCache, ClassPathCacheEntry, BuildScriptClassPathCacheEntry
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for us to evolve the schema once this ships? Since we only use the db as a cache, perhaps we could just recreate all the tables if something mismatches with our current schema?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more or less what I had in mind. SchemaUtils.createMissingTablesAndColumns is specifically designed to be a lightweight easy-to-use way to update the schema. For things that can't be handled with this, I think recreating everything is the way to go. I thought about using something like Flyway at some point, but that felt overkill for this purpose. Worst case scenario, the symbol index (and any other cached data) will need to be rebuilt when we make a schema change (but I don't see this happening very often, especially since only a few changes would actually require that).

Comment on lines 52 to 67
fun setupServerDatabase(params: InitializeParams): Database? {
val dbName = "kls_database"

params.initializationOptions?.let { initializationOptions ->
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create()
val options = gson.fromJson(initializationOptions as JsonElement, InitializationOptions::class.java)

options.storagePath?.let { storagePath ->
if (Files.isDirectory(storagePath)) {
return Database.connect("jdbc:sqlite:${Path.of(storagePath.toString(), dbName)}.db")
}
}
}

return null
}
Copy link
Owner

@fwcd fwcd Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should be moved somewhere other than Configuration, ideally where the Database connection lives. Thinking about this, perhaps we'd want to create a wrapper class around the Database that deals with this lazy setup internally? That way, we could pass the wrapper directly in the constructor to CompilerClassPath and SourcePath, wouldn't need the lateinit and could also make it db private in CompilerClassPath (fixing #337 (comment)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe elaborate on what the wrapper class would look like? Is something like this what you had in mind?

class DatabaseService {

    var db: Database? = null
        private set

    fun setupDatabase(storagePath: Path?) {
        val dbName = "kls_database"

        db = if (Files.isDirectory(storagePath)) {
            Database.connect("jdbc:sqlite:${Path.of(storagePath.toString(), dbName)}.db")
        } else {
            null
        }
    }
}

And then CompilerClassPath and SourcePath would receive the DatabaseService. And when they need to use the db they would just call databaseService.db.

databaseService.setupDatabase(storagePath) would be called on initialize.

Copy link
Owner

@fwcd fwcd Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's pretty close to what I had in mind. I'd shorten the function name to setup, since the wrapper should already make it clear that we are setting up the database, but otherwise that looks good 👍

daplf and others added 2 commits April 14, 2023 19:46
…hResolver.kt

Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
@daplf daplf requested a review from fwcd April 14, 2023 19:02
Copy link
Collaborator

@themkat themkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🙂

Did not get the memory usage improvement I was expecting though, but that might be some issues local to my machine or setup. (just simple monitoring with VisualVM). Experienced less memory usage with the ondisk cache, just not the extreme improvement my mind expected.... Maybe the win is bigger for even bigger projects? Will have to look further into this 🙂 Not trying to be a negative nancy in any way, this is still cool and good work 😄

One minor suggestion: If we move closer to merging this, could we document the usage somewhere? Just that we can take in a storagePath option in the initializationOptions, and that it expects to be a directory. (I had a brainfart where I tried to use a path to a sqllite file directly). Just think having it mentioned somewhere (readme, other doc etc.) would make it easier for people implementing the support for this new feature into various clients 🙂

val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create()
val options = gson.fromJson(initializationOptions as JsonElement, InitializationOptions::class.java)

return options.storagePath
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimented with Emacs support, and got an error here. If we sent in empty initializationOptions, then options becomes null here and I get a nullpointer exception. Maybe a simple safe-call operator could be handy here? 🙂 options?.storagePath solves all such issues.

The main reason to do this is that some LSP client libraries might have the option to either fill in initialisation options or not. To support the new on disk cache, as well as the old in-memory solution, it might be necessary to send in an empty initializationOptions. I guess we will support switching back and forth for a while anyway.

async0    Internal error: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
	at org.javacs.kt.ConfigurationKt.getStoragePath(Configuration.kt:55)
	at org.javacs.kt.KotlinLanguageServer$initialize$1.invoke(KotlinLanguageServer.kt:95)
	at org.javacs.kt.KotlinLanguageServer$initialize$1.invoke(KotlinLanguageServer.kt:73)
	at org.javacs.kt.util.AsyncExecutor.compute$lambda$2(AsyncExecutor.kt:19)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
	... 3 more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Fixed

@RenFraser
Copy link
Contributor

I'm also very keen for this as my startup time for the server is 20+ seconds :)

@daplf
Copy link
Contributor Author

daplf commented May 3, 2023

One minor suggestion: If we move closer to merging this, could we document the usage somewhere? Just that we can take in a storagePath option in the initializationOptions, and that it expects to be a directory. (I had a brainfart where I tried to use a path to a sqllite file directly). Just think having it mentioned somewhere (readme, other doc etc.) would make it easier for people implementing the support for this new feature into various clients 🙂

Thanks for the suggestion. Updated the README with a reference to the initialization options, similar to what we have for the protocol extensions.

Did not get the memory usage improvement I was expecting though, but that might be some issues local to my machine or setup. (just simple monitoring with VisualVM). Experienced less memory usage with the ondisk cache, just not the extreme improvement my mind expected.... Maybe the win is bigger for even bigger projects?

Tbh, I didn't really look into the impact on memory usage of this, since my focus was caching (especially preparing a foundation for the symbol index to be cached as well, which could be pretty helpful IMO). Memory usage is something we can (and should) definitely look at, but I personally find it out of scope for this PR. But thanks for the feedback!

@weilbith
Copy link

weilbith commented Jun 3, 2023

Sorry, I just want to express my hopes for this PR to get merged. Looking forward to see the improvements. The Kotlin language server is the by far most slow one among all I’m using. Would be great to have this resolved. 😊

@daplf
Copy link
Contributor Author

daplf commented Jun 7, 2023

Hey @fwcd
Any chance you can have another look at this soon? Thanks 🙂

@themkat
Copy link
Collaborator

themkat commented Jun 21, 2023

bump @fwcd 🙂 Any chance to have a second look here? Think it is best that you do a second look before merging. Think trying this out can prove quite useful. Might be our way to make the language server more effective, less memory usage, faster startup times etc. 🙂

@daplf
Copy link
Contributor Author

daplf commented Aug 9, 2023

Hey again @fwcd

Any chance we can move this forward? :)

@fwcd
Copy link
Owner

fwcd commented Aug 13, 2023

Sorry for the long delay here. Given that there's a lot of interest in this PR, I would say that we merge it, so everyone can play with it, and then iron out any smaller bugs to the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependency resolution Related to the project dependency/standard library resolver enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants