-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache dependencies #337
Cache dependencies #337
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this, pretty cool! I'll have to take a more detailed look at this PR (and the VSCode side), just a few comments after glancing over it.
fun parseServerConfiguration(params: InitializeParams): ServerConfiguration? { | ||
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create() | ||
|
||
var storage: Storage? = null | ||
|
||
if (params.initializationOptions != null) { | ||
val options = gson.fromJson(params.initializationOptions as JsonElement, InitializationOptions::class.java) | ||
|
||
if (options.storagePath != null) { | ||
if (Files.exists(options.storagePath) && Files.isDirectory(options.storagePath)) { | ||
storage = Storage(options.storagePath) | ||
} | ||
} | ||
} | ||
|
||
return ServerConfiguration(storage) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to understand, does LSP4J pull in the Gson dependency? If that's the case, perhaps we could just use Gson everywhere for consistency? Or is kotlinx.serialization
more elegant because its API is designed for Kotlin (I'd be fine both ways, just curious)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, LSP4J uses Gson for some things.
I chose kotlinx.serialization
because it's an official library, so I'd expect it to be quite stable and designed for Kotlin, as you said. I was also experimenting with some of the non stable serialization formats at first and we could potentially move to those later on with little code changes I think. But it does look a bit overkill right now, since we're using json. I can change it if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using kotlinx.serialization
sounds good (I'm always a bit careful about introducing new dependencies, but this seems useful enough to have it around in any case).
private val projectDirectory: Path get() = path.getParent() | ||
private val projectDirectory: Path get() = path.parent | ||
|
||
private var cachedClassPath: Set<ClassPathEntry>? = storage?.getObject("gradleClassPath") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, not sure how feasible that would be, could we have a uniform class path file that caches the entire class path (independent of whether Maven or Gradle was originally used to resolve it) and implement that as a new ClassPathResolver
that wraps the existing ones (e.g. something along the lines of CachingClassPathResolver
)? That way we wouldn't have to include caching in every resolver implementation separately. Feel free to correct me if I'm missing anything though. Perhaps detecting changes would become quite a bit harder that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice idea. I'll give it a try 👍
shared/src/main/kotlin/org/javacs/kt/classpath/ClassPathResolver.kt
Outdated
Show resolved
Hide resolved
fun parseServerConfiguration(params: InitializeParams): ServerConfiguration? { | ||
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create() | ||
|
||
var storage: Storage? = null | ||
|
||
if (params.initializationOptions != null) { | ||
val options = gson.fromJson(params.initializationOptions as JsonElement, InitializationOptions::class.java) | ||
|
||
if (options.storagePath != null) { | ||
if (Files.exists(options.storagePath) && Files.isDirectory(options.storagePath)) { | ||
storage = Storage(options.storagePath) | ||
} | ||
} | ||
} | ||
|
||
return ServerConfiguration(storage) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using kotlinx.serialization
sounds good (I'm always a bit careful about introducing new dependencies, but this seems useful enough to have it around in any case).
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
…er.kt Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
|
||
data class InitializationOptions(val storagePath: Path?) | ||
|
||
data class ServerConfiguration(val storage: Storage?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge this configuration with org.javacs.kt.Configuration
or perhaps just place them in the same file (under the .config
subpackage is fine)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about that class. I moved everything there now.
shared/src/main/kotlin/org/javacs/kt/classpath/DefaultClassPathResolver.kt
Outdated
Show resolved
Hide resolved
inline fun <reified T> getObject(objectPath: String, deserializer: KSerializer<T> = Json.serializersModule.serializer()): T? { | ||
val fullObjectPath = Paths.get(path.toString(), objectPath) | ||
|
||
if (Files.exists(fullObjectPath)) { | ||
val content = fullObjectPath.toFile().readText() | ||
try { | ||
return Json.decodeFromString(deserializer, content) | ||
} catch (ex: Exception) { | ||
// We catch any exception just in case something unexpected happens. We return null in that case and delete the file. | ||
LOG.printStackTrace(ex) | ||
Files.deleteIfExists(fullObjectPath) | ||
} | ||
} | ||
|
||
return null | ||
} | ||
|
||
inline fun <reified T> setObject(objectPath: String, value: T, serializer: KSerializer<T> = Json.serializersModule.serializer()) { | ||
val fullObjectPath = Paths.get(path.toString(), objectPath) | ||
|
||
if (!Files.exists(fullObjectPath)) { | ||
Files.createFile(fullObjectPath) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, could we use Kotlin's subscript overloading here, i.e. make getObject
and setObject
operator fun
s (see the docs)? Not sure whether that works with the optional serializer parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this would be really cool, but the optional serializer parameters kind of ruin it unfortunately. We also have the storage as a nullable property everywhere, so I don't think we'd get that much out of it even then (as far as I know, the subscript operator doesn't work well with the elvis operator for example).
…hResolver.kt Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a minor note, could you look into fixing the conflicts?
Writing #352 got me thinking, perhaps we could use a single unified on-disk (per-project) database (e.g. an SQLite DB) to hold both cached dependencies and the entire symbol index, instead of serialized JSON documents? This would both reduce the memory footprint of the language server and startup time in existing projects (we would just need to make sure to reindex whenever the index isn't up to date).
Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
@fwcd conflicts fixed. For some reason, one of the CI builds failed, but I can't reproduce it locally (tests are green).
This is actually what I had in mind for the symbol index. I was thinking of keeping H2 since it can be stored on disk as well, but any DB will do (and I don't have any strong feelings against any of them tbh). Regarding the other things, I didn't consider using a DB for them, but I suppose it would work as well. I think I went with this approach because it seemed simple enough, but we can consider using a DB. I also thought about caching the symbols of certain dependencies globally (like the kotlin stdlib), since they are used in all projects. Another aspect that we should probably look into is caching the source files' IR. Right now, when we start the server, the entire project is compiled, which can take a while since invoking the compiler on all files is not that fast. We could store the IR of the files (i.e. the |
Hey @fwcd It's been a while 😅 I finally got back into this and refactored the PR to cache dependencies using a DB. When a storage path is provided, we now setup a global SQLite DB for the entire server. The DB is passed to the cached class path resolver and the symbol index. Both make use of it to store their stuff. For now, we are only caching dependencies, but the ground work is done for the symbol index to be cached as well (in a future PR). When a storage path is not provided, the symbol index just creates its own in-memory H2 DB as it did before. Let me know what you think. I'm hoping this improves things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, I like the approach and am excited to see this moving forward
shared/src/main/kotlin/org/javacs/kt/classpath/DefaultClassPathResolver.kt
Outdated
Show resolved
Hide resolved
@@ -89,6 +90,11 @@ class KotlinLanguageServer : LanguageServer, LanguageClientAware, Closeable { | |||
serverCapabilities.executeCommandProvider = ExecuteCommandOptions(ALL_COMMANDS) | |||
serverCapabilities.documentHighlightProvider = Either.forLeft(true) | |||
|
|||
val db = setupServerDatabase(params) | |||
|
|||
sourcePath.index = if (db != null) SymbolIndex(db) else SymbolIndex() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sourcePath.index = if (db != null) SymbolIndex(db) else SymbolIndex() | |
sourcePath.index = db?.let { SymbolIndex(it) } ?: SymbolIndex() |
Alternatively...
class SymbolIndex( | ||
private val db: Database = Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver") | ||
) { | ||
var progressFactory: Progress.Factory = Progress.Factory.None | ||
|
||
init { | ||
transaction(db) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...we could do something along the lines of
class SymbolIndex( | |
private val db: Database = Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver") | |
) { | |
var progressFactory: Progress.Factory = Progress.Factory.None | |
init { | |
transaction(db) { | |
class SymbolIndex(db: Database? = null) { | |
private val db: Database | |
var progressFactory: Progress.Factory = Progress.Factory.None | |
init { | |
this.db = db ?: Database.connect("jdbc:h2:mem:symbolindex;DB_CLOSE_DELAY=-1", "org.h2.Driver") | |
transaction(db) { |
That way we could pass the optional db in directly. Though I don't have a strong opinion on this.
@@ -89,6 +90,11 @@ class KotlinLanguageServer : LanguageServer, LanguageClientAware, Closeable { | |||
serverCapabilities.executeCommandProvider = ExecuteCommandOptions(ALL_COMMANDS) | |||
serverCapabilities.documentHighlightProvider = Either.forLeft(true) | |||
|
|||
val db = setupServerDatabase(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we need the database initialization to happen after the initialize
, since the client supplies the storagePath
, right? It would be neat if we wouldn't need to wire up the database lazily and could pass it using the usual constructor injection pattern, but I don't really see a better way to do this either...
SchemaUtils.createMissingTablesAndColumns( | ||
ClassPathMetadataCache, ClassPathCacheEntry, BuildScriptClassPathCacheEntry | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for us to evolve the schema once this ships? Since we only use the db as a cache, perhaps we could just recreate all the tables if something mismatches with our current schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more or less what I had in mind. SchemaUtils.createMissingTablesAndColumns
is specifically designed to be a lightweight easy-to-use way to update the schema. For things that can't be handled with this, I think recreating everything is the way to go. I thought about using something like Flyway at some point, but that felt overkill for this purpose. Worst case scenario, the symbol index (and any other cached data) will need to be rebuilt when we make a schema change (but I don't see this happening very often, especially since only a few changes would actually require that).
fun setupServerDatabase(params: InitializeParams): Database? { | ||
val dbName = "kls_database" | ||
|
||
params.initializationOptions?.let { initializationOptions -> | ||
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create() | ||
val options = gson.fromJson(initializationOptions as JsonElement, InitializationOptions::class.java) | ||
|
||
options.storagePath?.let { storagePath -> | ||
if (Files.isDirectory(storagePath)) { | ||
return Database.connect("jdbc:sqlite:${Path.of(storagePath.toString(), dbName)}.db") | ||
} | ||
} | ||
} | ||
|
||
return null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this should be moved somewhere other than Configuration
, ideally where the Database
connection lives. Thinking about this, perhaps we'd want to create a wrapper class around the Database
that deals with this lazy setup internally? That way, we could pass the wrapper directly in the constructor to CompilerClassPath
and SourcePath
, wouldn't need the lateinit
and could also make it db
private in CompilerClassPath
(fixing #337 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe elaborate on what the wrapper class would look like? Is something like this what you had in mind?
class DatabaseService {
var db: Database? = null
private set
fun setupDatabase(storagePath: Path?) {
val dbName = "kls_database"
db = if (Files.isDirectory(storagePath)) {
Database.connect("jdbc:sqlite:${Path.of(storagePath.toString(), dbName)}.db")
} else {
null
}
}
}
And then CompilerClassPath
and SourcePath
would receive the DatabaseService
. And when they need to use the db they would just call databaseService.db
.
databaseService.setupDatabase(storagePath)
would be called on initialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's pretty close to what I had in mind. I'd shorten the function name to setup
, since the wrapper should already make it clear that we are setting up the database, but otherwise that looks good 👍
…hResolver.kt Co-authored-by: FW <30873659+fwcd@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 🙂
Did not get the memory usage improvement I was expecting though, but that might be some issues local to my machine or setup. (just simple monitoring with VisualVM). Experienced less memory usage with the ondisk cache, just not the extreme improvement my mind expected.... Maybe the win is bigger for even bigger projects? Will have to look further into this 🙂 Not trying to be a negative nancy in any way, this is still cool and good work 😄
One minor suggestion: If we move closer to merging this, could we document the usage somewhere? Just that we can take in a storagePath
option in the initializationOptions
, and that it expects to be a directory. (I had a brainfart where I tried to use a path to a sqllite file directly). Just think having it mentioned somewhere (readme, other doc etc.) would make it easier for people implementing the support for this new feature into various clients 🙂
val gson = GsonBuilder().registerTypeHierarchyAdapter(Path::class.java, GsonPathConverter()).create() | ||
val options = gson.fromJson(initializationOptions as JsonElement, InitializationOptions::class.java) | ||
|
||
return options.storagePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experimented with Emacs support, and got an error here. If we sent in empty initializationOptions, then options
becomes null here and I get a nullpointer exception. Maybe a simple safe-call operator could be handy here? 🙂 options?.storagePath
solves all such issues.
The main reason to do this is that some LSP client libraries might have the option to either fill in initialisation options or not. To support the new on disk cache, as well as the old in-memory solution, it might be necessary to send in an empty initializationOptions. I guess we will support switching back and forth for a while anyway.
async0 Internal error: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
java.util.concurrent.CompletionException: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.NullPointerException: Cannot invoke "org.javacs.kt.InitializationOptions.getStoragePath()" because "options" is null
at org.javacs.kt.ConfigurationKt.getStoragePath(Configuration.kt:55)
at org.javacs.kt.KotlinLanguageServer$initialize$1.invoke(KotlinLanguageServer.kt:95)
at org.javacs.kt.KotlinLanguageServer$initialize$1.invoke(KotlinLanguageServer.kt:73)
at org.javacs.kt.util.AsyncExecutor.compute$lambda$2(AsyncExecutor.kt:19)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
... 3 more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Fixed
I'm also very keen for this as my startup time for the server is 20+ seconds :) |
Thanks for the suggestion. Updated the README with a reference to the initialization options, similar to what we have for the protocol extensions.
Tbh, I didn't really look into the impact on memory usage of this, since my focus was caching (especially preparing a foundation for the symbol index to be cached as well, which could be pretty helpful IMO). Memory usage is something we can (and should) definitely look at, but I personally find it out of scope for this PR. But thanks for the feedback! |
Sorry, I just want to express my hopes for this PR to get merged. Looking forward to see the improvements. The Kotlin language server is the by far most slow one among all I’m using. Would be great to have this resolved. 😊 |
Hey @fwcd |
bump @fwcd 🙂 Any chance to have a second look here? Think it is best that you do a second look before merging. Think trying this out can prove quite useful. Might be our way to make the language server more effective, less memory usage, faster startup times etc. 🙂 |
Hey again @fwcd Any chance we can move this forward? :) |
Sorry for the long delay here. Given that there's a lot of interest in this PR, I would say that we merge it, so everyone can play with it, and then iron out any smaller bugs to the next release. |
Related to #328
Introduces a mechanism to cache the project's dependencies.
This improves startup performance a bit, since we no longer need to run maven/gradle tasks every time we open a project.
Every time we open a project, we now check to see if the build file has changed since the last time it was opened (i.e., the last time we updated the cache). If no changes were done, we use the cache. Otherwise, we just fetch the dependencies again. We can obviously make this more robust, but that can come later.
The caching mechanism can later be used for other things in the server as well (such as caching the IR and the symbol index).
The caching is optional at the moment. It requires sending a
storagePath
in theinitializationOptions
on theinitialize
request, like so:I created a PR to update the vscode extension so it sends this as well: fwcd/vscode-kotlin#86
I usedAfter some discussion in the PR, we decided to go for a DB based approach.kotlinx.serialization
to serialize the data. I used JSON since it's the only stable format currently supported (all the others are experimental). JSON isn't ideal, so we should change this at some point, but for dependencies it's certainly fine (it's not much data).This is the first step towards improving startup performance. In the future, we should also cache other things as mentioned above. The symbol index is especially relevant, since it takes quite a while to build.