
How to handle packages that are loaded from user code? #11

Closed
davidanthoff opened this issue Nov 28, 2016 · 17 comments
Comments

@davidanthoff
Member

Ideally we would never load any user code into the LS process. This makes it non-trivial to get at documentation, symbols, methods, etc. from packages that user code might load. I'm not entirely sure how best to handle this; here are some rough ideas:

  1. Spawn a new julia process; that process loads the package and returns all the necessary information to the LS process, say in JSON format. This could be done async, so maybe some of the symbols and info from the package would only show up after a while.
  2. Run our static analysis on the package that a user loads. This will probably miss some things...

In general it might also make sense to cache symbol/method/function/doc information about packages on disk or something...
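A minimal sketch of option 1, with a hypothetical helper name (`package_symbols` is not part of the actual codebase): spawn a fresh julia process that imports the package and prints its exported names, so the LS process itself never runs user code. A real implementation would emit richer info (docs, signatures) as JSON rather than plain lines.

```julia
# Hypothetical sketch: extract a package's exported names in a subprocess,
# so the LS process never imports user code itself.
function package_symbols(pkg::AbstractString)
    code = "import $pkg; foreach(println, names($pkg))"
    # Base.julia_cmd() reproduces the current julia invocation.
    out = read(`$(Base.julia_cmd()) --startup-file=no -e $code`, String)
    Symbol.(split(strip(out), '\n'))
end
```

Running this asynchronously, as suggested above, would let the symbols trickle in without blocking the server.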

@davidanthoff davidanthoff added this to the Backlog milestone Nov 28, 2016
@ZacLN
Contributor

ZacLN commented Nov 28, 2016

Option 1 is the way to go, I think; static analysis is going to miss too much loop/eval'd code generation stuff.

Caching it all makes sense, and we can check for changes with whatever compilecache uses.

@davidanthoff
Member Author

I guess one question is: how can we get the list of all methods defined in a given module. In particular, can we somehow get methods that actually extend a function in say base, that were defined in a module?

@ZacLN
Contributor

ZacLN commented Nov 29, 2016

We can use the static analysis stuff and find all import statements, then filter the relevant functions' method tables by their .module
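This filtering step can be sketched in a couple of lines (the helper name `methods_from` is just for illustration): given a function and a module, keep only the methods whose `.module` field matches.

```julia
# Sketch: methods of `f` that were defined in module `m`, found by
# filtering the method table on each Method's .module field.
methods_from(f, m::Module) = filter(meth -> meth.module === m, collect(methods(f)))
```

For example, `methods_from(+, SomeUserModule)` would surface exactly the `+` methods a user package added on top of Base's.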

@davidanthoff
Member Author

Ah, cool! And actually, we might just see what performance looks like if we look at all methods and check their .module; if that is not too slow, we could avoid the whole static analysis, right?
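The "just look at all methods" variant could be sketched like this (again, `extensions_of` is a hypothetical name): walk every binding in a module and collect the methods whose `.module` matches, with no static analysis of source files at all.

```julia
# Sketch: collect every Method defined in module `m` by brute-force
# reflection over all of its bindings.
function extensions_of(m::Module)
    exts = Method[]
    for n in names(m; all = true)
        isdefined(m, n) || continue
        f = getfield(m, n)
        f isa Function || continue          # skip types, constants, etc.
        for meth in methods(f)
            meth.module === m && push!(exts, meth)
        end
    end
    exts
end
```

The open question raised above is simply whether this brute-force walk is fast enough in practice.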

@ZacLN
Contributor

ZacLN commented Dec 2, 2016

I've whipped up a simple script that pulls hovers and goto defs from packages. It only takes ~5 seconds on my computer to pull everything from Base; this will take a bit longer when I add signatures, but I don't think it's anything to worry about. The size for Base is around 3 MB, so storage is no great ordeal.

@davidanthoff
Member Author

Do you mean line 4 takes 5 secs on your system? That line takes 0.1 seconds on my system :) But yes, speed doesn't seem to be a huge issue here.

@ZacLN
Contributor

ZacLN commented Dec 3, 2016

Well that's distressing!

@ZacLN ZacLN closed this as completed Dec 3, 2016
@ZacLN ZacLN reopened this Dec 3, 2016
@ZacLN
Contributor

ZacLN commented Dec 10, 2016

Oh no, I meant to pull all the documents from running cachemodule! I was worried my computer was busted somehow...

Anyway I've fiddled with this and have it working to pull Base/Core and all sub modules and to cache user installed packages.

Storage for Base/Core is ~30 MB and takes a minute or two to run initially. I haven't worked out a storage method yet; I've been burnt in the past trying to use HDF5/JLD to store data, but maybe they've stabilised now?

My idea would be that on installation we pull all docs from Base and store them.

  • On server startup this gets loaded into server.DocStore.
  • Each new Document has its own DocStore::Dict, which by default includes an entry pointing to server.DocStore[:Base].
  • Using/importing any package results in: 1) loading into server.DocStore, 2) saving to HDD, 3) a new entry in document.DocStore pointing to the loaded package.
  • A vscode menu option to reload all DocStores (i.e. following an updated version of Julia / Pkg.update()).
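The layering described in those bullets might look roughly like this (the `DocEntry` type and all contents are hypothetical; only the `server.DocStore` / per-document `DocStore` names come from the proposal above):

```julia
# Hypothetical sketch of the two-level DocStore layout.
struct DocEntry
    doc::String
end

# Server-wide store, keyed by module name.
server_docstore = Dict{Symbol, Dict{Symbol, DocEntry}}(
    :Base => Dict(:println => DocEntry("Print with newline")),
)

# Each new document starts with a view that aliases the shared :Base
# entry, so per-document stores stay cheap and always up to date.
document_docstore = Dict{Symbol, Dict{Symbol, DocEntry}}(:Base => server_docstore[:Base])
```

Because the per-document entry is a reference rather than a copy, reloading `server.DocStore[:Base]` in place would be visible to every open document.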

@davidanthoff
Member Author

I agree, if we can stay away from HDF5/JLD, that would be nice. I think both have a build script, so our whole package installation story would get more complicated... We might even be able to just use Julia's default serializer? That file format is probably not stable, but on the other hand, this is just a cache, so that might not be much of an issue.
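A small sketch of that serializer idea, using only the stdlib Serialization module: since the format isn't stable across Julia versions, the cache can be stamped with `VERSION` and simply regenerated on mismatch.

```julia
using Serialization

# Sketch: persist a symbol cache with Julia's built-in serializer.
# The format is version-specific, so we stamp it and rebuild on mismatch.
cache = Dict{Symbol, Any}(
    :julia_version => string(VERSION),
    :symbols       => names(Base),
)
path = joinpath(mktempdir(), "symbolcache.jls")
open(io -> serialize(io, cache), path, "w")

loaded = open(deserialize, path)
```

No build script, no binary dependency, and invalidation is a one-line version check.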

Could you share the link to the code that you mention? I'd like to get a bit of a feel for what it is doing.

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

I've been playing around with this here and there and the last thing that seemed good in terms of speed, and storage was this.

It writes some AST that holds a module Namespace whose submodules are the relevant modules (i.e. Base, Core, LanguageServer), so it's easy to store (a .jl file), quite quick to construct, and super quick to load, and editing it simply requires parsing the file and selecting the modules to update or add. I haven't got it set up to load submodules yet, but I think that'll be pretty easy.
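The ".jl file as cache" idea can be sketched like so (the `Namespace` module name comes from the description above; its contents here are invented for illustration): the cache is itself valid Julia source, so updating it is just parse-edit-rewrite.

```julia
# Sketch: a cache that is a parseable Julia file holding a Namespace
# module, with one submodule per cached module.
cache_file = joinpath(mktempdir(), "namespace.jl")
write(cache_file, """
module Namespace
module BaseDocs
const docs = Dict(:println => "Print with newline")
end
end
""")

# Editing the cache = parse it, swap out a submodule, write it back.
ex = Meta.parseall(read(cache_file, String))
```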

The framework for what to actually store still needs to be worked out, so at the moment it just keeps methods and documentation (mainly to assess speed).

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

Alternatively there's this approach which seems slower

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

Scratch that, the second approach is better. You can simply serialize the resulting Dict as you proposed and store it. Can you think of some existing code in Base that we could use to quickly search Dict keys by partial matching?
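There is no dedicated prefix-search structure in Base, so the fallback is a plain filter over the keys with `startswith` (the helper name here is hypothetical):

```julia
# Sketch: prefix-match the Symbol keys of a Dict, e.g. for completion.
function prefix_keys(d::AbstractDict, prefix::AbstractString)
    [k for k in keys(d) if startswith(String(k), prefix)]
end
```

For a cache of this size a linear scan is likely fine; a sorted key vector plus `searchsorted` would be the next step if it isn't.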

@ZacLN ZacLN closed this as completed Dec 15, 2016
@ZacLN ZacLN reopened this Dec 15, 2016
@ZacLN
Contributor

ZacLN commented Dec 16, 2016

I've implemented the caching for Base and Core in this branch. It requires the forked JuliaParser (and includes all the other changes that are, I think, now ready for a merge).

It's easy to add new modules to the cache, I just need to work out the logic (as with includes) for which modules are loaded by any given file.

@ZacLN
Contributor

ZacLN commented Dec 18, 2016

The above branch has been updated to asynchronously load and cache modules and to parse the entire workspace on initialisation.

@davidanthoff
Member Author

I've thought a bit more about this, and here is another idea:

We add one more julia process to the mix: a singleton symbol server. When a language server process starts, it will try to connect via a named pipe to this global singleton symbol server process. If that fails (because the symbol server has not been started), the language server process will start the symbol server and then connect. In effect, whenever there is at least one language server process running, there will be one global symbol server that serves all language server processes.
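The connect-or-start handshake could be sketched with the stdlib Sockets module (the pipe name and the `serve_client` handler are hypothetical placeholders, not actual LanguageServer.jl code):

```julia
using Sockets

# Hypothetical pipe name for the singleton symbol server.
pipename = Sys.iswindows() ? "\\\\.\\pipe\\julia-symbol-server" :
                             joinpath(tempdir(), "julia-symbol-server")

# Try to connect to a running symbol server; if none exists, become it.
function connect_or_serve(pipename)
    try
        return connect(pipename)          # an instance is already running
    catch
        server = listen(pipename)         # we are the singleton
        @async begin
            while true
                sock = accept(server)
                @async serve_client(sock) # serve_client: hypothetical handler
            end
        end
        return connect(pipename)          # connect to ourselves
    end
end
```

There is a startup race if two language servers launch simultaneously, but since `listen` fails for the loser, it can simply fall back to `connect` again.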

This symbol server holds the cache of symbols. It also persists this cache on disc. When a language server needs symbol info for a specific package, it queries the global symbol server. If the cache has that info, the symbol server returns it from the cache; otherwise it will use similar logic as we had before to retrieve the symbol/doc info: the symbol server process will spawn a new julia process that loads the module and then returns all the extracted info to the symbol server, where it gets stored in the cache (and also persisted into the cache on disc). The symbol server will have one dedicated task for this, i.e. one task will work off a queue of packages that still need to be cached. That way we can prevent a large number of julia processes from starting that each load one package for extraction of symbol/doc info (which happens right now and can get even my beefy machine to stall).

So this is partly going back to the on disc cache that @ZacLN originally had, but the difference is that the only process that interacts with that on disc cache is the symbol server, so we get around the race condition that I was worried about.

Mainly, I've come to terms with the fact that we need an on-disc cache if we want decent performance and no long waits. Also, @ZacLN, you mentioned that retrieving symbol/doc info takes even longer with triangular dispatch, right? So this would be even more important. This comment suggests that other language server implementations have used a similar pattern before.

Finally, this still allows us to completely isolate the language server and the symbol server from user code, i.e. only the julia processes that are spawned by the symbol server would actually run/import packages. I still think that that is super valuable in terms of robustness of the whole language server.

I have a prototype implementation of the singleton server story that works on Windows. I'll test it on Linux next, and then we could think about incorporating this.

The main question for @ZacLN is whether this would enable a scenario where you don't have to eval imports of packages in the language server process?

@pfitzseb
Member

Just read through this issue, and I noticed that I've been doing something similar in DocSeeker, which supports inspecting all packages in Pkg.dir() and saving symbol information as well as docs to a cache on disk to enable decently fast searches through un-loaded packages.

The relevant part of the code (relevant for LSP, that is) isn't all that fleshed out, and especially doesn't have its own process interacting with the cache as @davidanthoff suggests above, but it seems to work at least okayish for now.

@davidanthoff
Member Author

Closing, as we now load this in a separate process.
