Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype of SqliteSymbolIndex #94

Closed
wants to merge 8 commits into from

Conversation

xeno-by
Copy link
Member

@xeno-by xeno-by commented Dec 4, 2017

This is a prototype implementation of SymbolIndex that works on top of SQLite databases created by https://github.com/xeno-by/scalameta/tree/topic/language-server. The code will need significant work before becoming merge-worthy, but nonetheless I wanted to share it with you guys asap to discuss associated design decisions.

The first couple commits tune the existing infrastructure to work well on huge codebases that don't necessarily conform to the SBT project layout. Nothing much to say here - the diffs should be self-explanatory.

The remaining commits factor out the SymbolIndex contract, move the existing SymbolIndex into InMemorySymbolIndex and establish SqliteSymbolIndex. SqliteSymbolIndex hasn't yet reached feature-parity with InMemorySymbolIndex, but even in its current form it is confirmed to provide instantaneous responses to textDocument/definition and textDocument/references requests with negligible memory requirements even on huge codebases.

for {
document <- documentIndex.getDocument(path.toNIO.toUri)
_ = logger.info(s"Found document for $path")
_ <- isFreshSemanticdb(path, document)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. The naming convension says it should return a Boolean, the doc comment says it returns a bool, yet it returns Option[Unit] and performs side effects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)

}

/**
* Returns false this this document is stale.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like the correct signature, but it actually returns Option[Unit]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR.

): Option[Unit] = {
val ok = Option(())
val s = buffers.read(path)
if (s == document.contents) ok
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an expensive check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here

def findSymbol(path: AbsolutePath, line: Int, column: Int): Option[Symbol] = {
???
conn match {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conn.map

Copy link
Member

@olafurpg olafurpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woot ⭐️ 💯 Just gave this a swing in slick/slick and references+definition works perfectly 👍 👍 👍

screen shot 2017-12-04 at 07 53 51

Memory usage is tiny too, big win.

Since we're still experimenting I think it's great to have keep both in-memory and sql indices. It makes it easy to add a third implementation if we get a new idea. Once we find something more stable that we like a lot then we should consider settling on that approach and remove competing implementations.

@@ -11,7 +12,9 @@ object Main extends LazyLogging {
def main(args: Array[String]): Unit = {
// FIXME(gabro): this is vscode specific (at least the name)
val workspace = System.getProperty("vscode.workspace")
val logPath = s"$workspace/target/metaserver.log"
val logDir = new File(s"$workspace/.metaserver")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/java.io/java.nio.file/

for {
document <- documentIndex.getDocument(path.toNIO.toUri)
_ = logger.info(s"Found document for $path")
_ <- isFreshSemanticdb(path, document)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this should have a different name but it's unrelated to this PR, this is copy-paste from my code ;)

): Option[Unit] = {
val ok = Option(())
val s = buffers.read(path)
if (s == document.contents) ok
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR. This is not on a critical path so it's not a problem, we should be doing edit distance here

}

/**
* Returns false this this document is stale.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR.

)
assertNoDiff(reconstructedDatabase.syntax, originalDatabase.syntax)
???
// TODO: No idea how important this test is.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be turned into a test against in-memory db.

case (Some(symbolIdStmt), Some(definitionDataStmt)) =>
// TODO: Take into account symbol.definitionAlternative.
symbolIdStmt.setString(1, symbol.toString)
val symbolIdRs = symbolIdStmt.executeQuery()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For longer-term maintainability, I think we should consider using a higher-level API like Quill http://getquill.io/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can vouch for quill: extremely pleasant to work with! That said, I would make sure it works reliably with sqlite, which is often a bit "strange" compared to other dbs. I've had problems with slick and sqlite too in the past.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more vote for quill, it has been my favorite db lib to use so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwbrasil Can you comment on Quill's compatibility with SQLite?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quill has the dialect implementation for SQLite but I've seen a complaint on Twitter that it has bugs. I guess not many people use Quill with SQLite since there are no open bugs for it. I can review the dialect later this week if you're planning to use it.

Another alternative is H2. It has better performance, can be used in a client-server setup if necessary, and at least supports standard SQL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no personal attachment nor preference for sqlite over h2. It sounds like we should consider using h2, could you check how h2 impacts performance/space @xeno-by ?

@fwbrasil Perhaps unrelated, but maybe you have some insights on a problem we're facing: how do we compose indices from different sources? I imagine this could impact the choice of a db, but I'm quite a noob in this space 😅. For example, we want to be able to query for all references of a symbol across library1.jar:library2.jar:test-classes/

  • We could store everything in a single database per user, but that would mean concurrent writes since indexing independent libraries can be done in parallel.
  • We could store everything in a single database per workspace, but that could mean a lot of wasted disk space for machines with many projects. I personally have 587 projects in my ~/dev directory, so this is a no-go for me.
  • We could separate each index by its source (library1.jar.db), and for each configuration we open connections to each database per entry in the classpath. I'm not sure how practical this approach is for large classpaths or evolving projects with moving modules.

I'm personally leaning towards option 1 if we can take advantage of concurrent writes. I had previously outruled this option because sqlite doesn't seem to support concurrent writes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both H2 and HSQLDB embedded dbs support multiple writers using mvcc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about sqlite's WAL, nice find!

Reading up a bit, it seems we should also consider performance of deletions. We will need to regularly clean up semanticdb indices from stale snapshots of source files. It would be good to investigate difference between h2/sqlite with regards to deletions @xeno-by

def definitionData(symbol: Symbol): Option[SymbolData]
def referencesData(symbol: Symbol): List[SymbolData]
def indexDependencyClasspath(sourceJars: List[AbsolutePath]): Effects.IndexSourcesClasspath
def indexDatabase(document: s.Database): Effects.IndexSemanticdb
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olafurpg What do you think about turning SymbolIndex into a small trait with a curated list of methods? If you're on board with that, I can submit a pull request with just this change right away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, that's a great first step 👍

xeno-by added a commit to xeno-by/scalameta that referenced this pull request Dec 11, 2017
@xeno-by
Copy link
Member Author

xeno-by commented Dec 15, 2017

Thank you, everyone, for your insightful comments! We'll be reworking this pull request according to the comments, and in the meanwhile, I've spun off a non-controversial commit into a separate PR: #132.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants