New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype of SqliteSymbolIndex #94
Conversation
for { | ||
document <- documentIndex.getDocument(path.toNIO.toUri) | ||
_ = logger.info(s"Found document for $path") | ||
_ <- isFreshSemanticdb(path, document) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd. The naming convension says it should return a Boolean, the doc comment says it returns a bool, yet it returns Option[Unit] and performs side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)
} | ||
|
||
/** | ||
* Returns false this this document is stale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like the correct signature, but it actually returns Option[Unit]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR.
): Option[Unit] = { | ||
val ok = Option(()) | ||
val s = buffers.read(path) | ||
if (s == document.contents) ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an expensive check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here
def findSymbol(path: AbsolutePath, line: Int, column: Int): Option[Symbol] = { | ||
??? | ||
conn match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conn.map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woot
Memory usage is tiny too, big win.
Since we're still experimenting I think it's great to have keep both in-memory and sql indices. It makes it easy to add a third implementation if we get a new idea. Once we find something more stable that we like a lot then we should consider settling on that approach and remove competing implementations.
@@ -11,7 +12,9 @@ object Main extends LazyLogging { | |||
def main(args: Array[String]): Unit = { | |||
// FIXME(gabro): this is vscode specific (at least the name) | |||
val workspace = System.getProperty("vscode.workspace") | |||
val logPath = s"$workspace/target/metaserver.log" | |||
val logDir = new File(s"$workspace/.metaserver") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/java.io/java.nio.file/
for { | ||
document <- documentIndex.getDocument(path.toNIO.toUri) | ||
_ = logger.info(s"Found document for $path") | ||
_ <- isFreshSemanticdb(path, document) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)
): Option[Unit] = { | ||
val ok = Option(()) | ||
val s = buffers.read(path) | ||
if (s == document.contents) ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here
} | ||
|
||
/** | ||
* Returns false this this document is stale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR.
) | ||
assertNoDiff(reconstructedDatabase.syntax, originalDatabase.syntax) | ||
??? | ||
// TODO: No idea how important this test is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be turned into a test against in-memory db.
case (Some(symbolIdStmt), Some(definitionDataStmt)) => | ||
// TODO: Take into account symbol.definitionAlternative. | ||
symbolIdStmt.setString(1, symbol.toString) | ||
val symbolIdRs = symbolIdStmt.executeQuery() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For longer-term maintainability, I think we should consider using a higher-level API like Quill http://getquill.io/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can vouch for quill: extremely pleasant to work with! That said, I would make sure it works reliably with sqlite, which is often a bit "strange" compared to other dbs. I've had problems with slick and sqlite too in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more vote for quill, it has been my favorite db lib to use so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fwbrasil Can you comment on Quill's compatibility with SQLite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quill has the dialect implementation for SQLite but I've seen a complaint on Twitter that it has bugs. I guess not many people use Quill with SQLite since there are no open bugs for it. I can review the dialect later this week if you're planning to use it.
Another alternative is H2. It has better performance, can be used in a client-server setup if necessary, and at least supports standard SQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no personal attachment nor preference for sqlite over h2. It sounds like we should consider using h2, could you check how h2 impacts performance/space @xeno-by ?
@fwbrasil Perhaps unrelated, but maybe you have some insights on a problem we're facing: how do we compose indices from different sources? I imagine this could impact the choice of a db, but I'm quite a noob in this space
- We could store everything in a single database per user, but that would mean concurrent writes since indexing independent libraries can be done in parallel.
- We could store everything in a single database per workspace, but that could mean a lot of wasted disk space for machines with many projects. I personally have 587 projects in my
~/dev
directory, so this is a no-go for me. - We could separate each index by its source (
library1.jar.db
), and for each configuration we open connections to each database per entry in the classpath. I'm not sure how practical this approach is for large classpaths or evolving projects with moving modules.
I'm personally leaning towards option 1 if we can take advantage of concurrent writes. I had previously outruled this option because sqlite doesn't seem to support concurrent writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both H2 and HSQLDB embedded dbs support multiple writers using mvcc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about sqlite's WAL, nice find!
Reading up a bit, it seems we should also consider performance of deletions. We will need to regularly clean up semanticdb indices from stale snapshots of source files. It would be good to investigate difference between h2/sqlite with regards to deletions @xeno-by
def definitionData(symbol: Symbol): Option[SymbolData] | ||
def referencesData(symbol: Symbol): List[SymbolData] | ||
def indexDependencyClasspath(sourceJars: List[AbsolutePath]): Effects.IndexSourcesClasspath | ||
def indexDatabase(document: s.Database): Effects.IndexSemanticdb | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olafurpg What do you think about turning SymbolIndex into a small trait with a curated list of methods? If you're on board with that, I can submit a pull request with just this change right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, that's a great first step
Thank you, everyone, for your insightful comments! We'll be reworking this pull request according to the comments, and in the meanwhile, I've spun off a non-controversial commit into a separate PR: #132. |
This is a prototype implementation of SymbolIndex that works on top of SQLite databases created by https://github.com/xeno-by/scalameta/tree/topic/language-server. The code will need significant work before becoming merge-worthy, but nonetheless I wanted to share it with you guys asap to discuss associated design decisions.
The first couple commits tune the existing infrastructure to work well on huge codebases that don't necessarily conform to the SBT project layout. Nothing much to say here - the diffs should be self-explanatory.
The remaining commits factor out the SymbolIndex contract, move the existing SymbolIndex into InMemorySymbolIndex and establish SqliteSymbolIndex. SqliteSymbolIndex hasn't yet reached feature-parity with InMemorySymbolIndex, but even in its current form it is confirmed to provide instantaneous responses to textDocument/definition and textDocument/references requests with negligible memory requirements even on huge codebases.