New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype of SqliteSymbolIndex #94

Closed
wants to merge 8 commits into
base: master
from

Conversation

5 participants
@xeno-by
Member

xeno-by commented Dec 4, 2017

This is a prototype implementation of SymbolIndex that works on top of SQLite databases created by https://github.com/xeno-by/scalameta/tree/topic/language-server. The code will need significant work before becoming merge-worthy, but nonetheless I wanted to share it with you guys asap to discuss associated design decisions.

The first couple commits tune the existing infrastructure to work well on huge codebases that don't necessarily conform to the SBT project layout. Nothing much to say here - the diffs should be self-explanatory.

The remaining commits factor out the SymbolIndex contract, move the existing SymbolIndex into InMemorySymbolIndex and establish SqliteSymbolIndex. SqliteSymbolIndex hasn't yet reached feature-parity with InMemorySymbolIndex, but even in its current form it is confirmed to provide instantaneous responses to textDocument/definition and textDocument/references requests with negligible memory requirements even on huge codebases.

@xeno-by xeno-by requested review from gabro, laughedelic and olafurpg Dec 4, 2017

xeno-by added a commit to xeno-by/scalameta that referenced this pull request Dec 4, 2017

for {
document <- documentIndex.getDocument(path.toNIO.toUri)
_ = logger.info(s"Found document for $path")
_ <- isFreshSemanticdb(path, document)

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 4, 2017

Collaborator

This is odd. The naming convension says it should return a Boolean, the doc comment says it returns a bool, yet it returns Option[Unit] and performs side effects.

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)

}
/**
* Returns false this this document is stale.

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 4, 2017

Collaborator

This sounds like the correct signature, but it actually returns Option[Unit]

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Unrelated to this PR.

): Option[Unit] = {
val ok = Option(())
val s = buffers.read(path)
if (s == document.contents) ok

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 4, 2017

Collaborator

Is this an expensive check?

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here

def findSymbol(path: AbsolutePath, line: Int, column: Int): Option[Symbol] = {
???
conn match {

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 4, 2017

Collaborator

conn.map

@olafurpg

Woot ⭐️ 💯 Just gave this a swing in slick/slick and references+definition works perfectly 👍 👍 👍

screen shot 2017-12-04 at 07 53 51

Memory usage is tiny too, big win.

Since we're still experimenting I think it's great to have keep both in-memory and sql indices. It makes it easy to add a third implementation if we get a new idea. Once we find something more stable that we like a lot then we should consider settling on that approach and remove competing implementations.

@@ -11,7 +12,9 @@ object Main extends LazyLogging {
def main(args: Array[String]): Unit = {
// FIXME(gabro): this is vscode specific (at least the name)
val workspace = System.getProperty("vscode.workspace")
val logPath = s"$workspace/target/metaserver.log"
val logDir = new File(s"$workspace/.metaserver")

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

s/java.io/java.nio.file/

for {
document <- documentIndex.getDocument(path.toNIO.toUri)
_ = logger.info(s"Found document for $path")
_ <- isFreshSemanticdb(path, document)

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)

): Option[Unit] = {
val ok = Option(())
val s = buffers.read(path)
if (s == document.contents) ok

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here

}
/**
* Returns false this this document is stale.

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

Unrelated to this PR.

)
assertNoDiff(reconstructedDatabase.syntax, originalDatabase.syntax)
???
// TODO: No idea how important this test is.

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

This can be turned into a test against in-memory db.

case (Some(symbolIdStmt), Some(definitionDataStmt)) =>
// TODO: Take into account symbol.definitionAlternative.
symbolIdStmt.setString(1, symbol.toString)
val symbolIdRs = symbolIdStmt.executeQuery()

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

For longer-term maintainability, I think we should consider using a higher-level API like Quill http://getquill.io/

This comment has been minimized.

@gabro

gabro Dec 4, 2017

Collaborator

I can vouch for quill: extremely pleasant to work with! That said, I would make sure it works reliably with sqlite, which is often a bit "strange" compared to other dbs. I've had problems with slick and sqlite too in the past.

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 4, 2017

Collaborator

One more vote for quill, it has been my favorite db lib to use so far.

This comment has been minimized.

@xeno-by

xeno-by Dec 4, 2017

Member

@fwbrasil Can you comment on Quill's compatibility with SQLite?

This comment has been minimized.

@fwbrasil

fwbrasil Dec 4, 2017

Quill has the dialect implementation for SQLite but I've seen a complaint on Twitter that it has bugs. I guess not many people use Quill with SQLite since there are no open bugs for it. I can review the dialect later this week if you're planning to use it.

Another alternative is H2. It has better performance, can be used in a client-server setup if necessary, and at least supports standard SQL.

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

I have no personal attachment nor preference for sqlite over h2. It sounds like we should consider using h2, could you check how h2 impacts performance/space @xeno-by ?

@fwbrasil Perhaps unrelated, but maybe you have some insights on a problem we're facing: how do we compose indices from different sources? I imagine this could impact the choice of a db, but I'm quite a noob in this space 😅. For example, we want to be able to query for all references of a symbol across library1.jar:library2.jar:test-classes/

  • We could store everything in a single database per user, but that would mean concurrent writes since indexing independent libraries can be done in parallel.
  • We could store everything in a single database per workspace, but that could mean a lot of wasted disk space for machines with many projects. I personally have 587 projects in my ~/dev directory, so this is a no-go for me.
  • We could separate each index by its source (library1.jar.db), and for each configuration we open connections to each database per entry in the classpath. I'm not sure how practical this approach is for large classpaths or evolving projects with moving modules.

I'm personally leaning towards option 1 if we can take advantage of concurrent writes. I had previously outruled this option because sqlite doesn't seem to support concurrent writes.

This comment has been minimized.

@ShaneDelmore

ShaneDelmore Dec 5, 2017

Collaborator

Both H2 and HSQLDB embedded dbs support multiple writers using mvcc.

This comment has been minimized.

@olafurpg

olafurpg Dec 5, 2017

Member

I didn't know about sqlite's WAL, nice find!

Reading up a bit, it seems we should also consider performance of deletions. We will need to regularly clean up semanticdb indices from stale snapshots of source files. It would be good to investigate difference between h2/sqlite with regards to deletions @xeno-by

def definitionData(symbol: Symbol): Option[SymbolData]
def referencesData(symbol: Symbol): List[SymbolData]
def indexDependencyClasspath(sourceJars: List[AbsolutePath]): Effects.IndexSourcesClasspath
def indexDatabase(document: s.Database): Effects.IndexSemanticdb
}

This comment has been minimized.

@xeno-by

xeno-by Dec 4, 2017

Member

@olafurpg What do you think about turning SymbolIndex into a small trait with a curated list of methods? If you're on board with that, I can submit a pull request with just this change right away.

This comment has been minimized.

@olafurpg

olafurpg Dec 4, 2017

Member

I like it, that's a great first step 👍

@xeno-by xeno-by referenced this pull request Dec 5, 2017

Closed

add sqlite-contrib #223

xeno-by added a commit to xeno-by/scalameta that referenced this pull request Dec 11, 2017

@xeno-by

This comment has been minimized.

Member

xeno-by commented Dec 15, 2017

Thank you, everyone, for your insightful comments! We'll be reworking this pull request according to the comments, and in the meanwhile, I've spun off a non-controversial commit into a separate PR: #132.

@xeno-by xeno-by closed this Dec 15, 2017

@xeno-by xeno-by deleted the xeno-by:topic/sqlite branch Dec 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment