
How to handle packages that are loaded from user code? #11

Closed
davidanthoff opened this issue Nov 28, 2016 · 17 comments
Comments

@davidanthoff
Member

Ideally we would never load any user code into the LS process. This makes it non-trivial to get at documentation, symbols, methods, etc. from packages that user code might load. I'm not entirely sure how best to handle this; here are some rough ideas:

  1. Spawn a new julia process; that process loads the package and returns all the necessary information to the LS process, say in JSON format. This could be done async, so maybe some of the symbols and info from the package would only show up after a while.
  2. Run our static analysis on the package that a user loads. This will probably miss some things...

In general it might also make sense to cache symbol/method/function/doc information about packages on disk or something...
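A minimal sketch of option 1, with a hypothetical helper name (`package_symbols` is not part of the actual codebase): spawn a fresh julia process that imports the package and prints its exported names, so the LS process itself never runs user code. A real implementation would emit richer info (docs, signatures) as JSON rather than plain lines.

```julia
# Hypothetical sketch: extract a package's exported names in a subprocess,
# so the LS process never imports user code itself.
function package_symbols(pkg::AbstractString)
    code = "import $pkg; foreach(println, names($pkg))"
    # Base.julia_cmd() reproduces the current julia invocation.
    out = read(`$(Base.julia_cmd()) --startup-file=no -e $code`, String)
    Symbol.(split(strip(out), '\n'))
end
```

Running this asynchronously, as suggested above, would let the symbols trickle in without blocking the server.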

@davidanthoff davidanthoff added this to the Backlog milestone Nov 28, 2016
@ZacLN
Contributor

ZacLN commented Nov 28, 2016

Option 1 is the way to go, I think; static analysis is going to miss too much loop/eval'd code generation stuff.

Caching it all makes sense, and we can check for changes with whatever compilecache uses.

@davidanthoff
Member Author

I guess one question is: how can we get the list of all methods defined in a given module. In particular, can we somehow get methods that actually extend a function in say base, that were defined in a module?

@ZacLN
Contributor

ZacLN commented Nov 29, 2016

We can use the static analysis stuff and find all import statements, then filter the relevant functions' method tables by their .module
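This filtering step can be sketched in a couple of lines (the helper name `methods_from` is just for illustration): given a function and a module, keep only the methods whose `.module` field matches.

```julia
# Sketch: methods of `f` that were defined in module `m`, found by
# filtering the method table on each Method's .module field.
methods_from(f, m::Module) = filter(meth -> meth.module === m, collect(methods(f)))
```

For example, `methods_from(+, SomeUserModule)` would surface exactly the `+` methods a user package added on top of Base's.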

@davidanthoff
Member Author

Ah, cool! And actually, we might just see what performance looks like if we look at all methods and check their .module; if that is not too slow, we could avoid the whole static analysis, right?
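The "just look at all methods" variant could be sketched like this (again, `extensions_of` is a hypothetical name): walk every binding in a module and collect the methods whose `.module` matches, with no static analysis of source files at all.

```julia
# Sketch: collect every Method defined in module `m` by brute-force
# reflection over all of its bindings.
function extensions_of(m::Module)
    exts = Method[]
    for n in names(m; all = true)
        isdefined(m, n) || continue
        f = getfield(m, n)
        f isa Function || continue          # skip types, constants, etc.
        for meth in methods(f)
            meth.module === m && push!(exts, meth)
        end
    end
    exts
end
```

The open question raised above is simply whether this brute-force walk is fast enough in practice.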

@ZacLN
Contributor

ZacLN commented Dec 2, 2016

I've whipped up a simple script that pulls hovers and goto defs from packages. It only takes ~5 seconds on my computer to pull everything from Base; this will take a bit longer when I add signatures, but I don't think it's anything to worry about. The size for Base is around 3 MB, so storage is no great ordeal.

@davidanthoff
Member Author

Do you mean line 4 takes 5 secs on your system? That line takes 0.1 seconds on my system :) But yes, speed doesn't seem to be a huge issue here.

@ZacLN
Contributor

ZacLN commented Dec 3, 2016

Well that's distressing!

@ZacLN ZacLN closed this as completed Dec 3, 2016
@ZacLN ZacLN reopened this Dec 3, 2016
@ZacLN
Contributor

ZacLN commented Dec 10, 2016

Oh no, I meant to pull all the documents from running cachemodule! I was worried my computer was busted somehow...

Anyway I've fiddled with this and have it working to pull Base/Core and all sub modules and to cache user installed packages.

Storage for Base/Core is ~30 MB and takes a minute or two to run initially. I haven't worked out a storage method yet; I've been burnt in the past trying to use HDF5/JLD to store data, but maybe they've stabilised now?

My idea would be that on installation we pull all docs from Base and store them.

  • On server startup this gets loaded into server.DocStore.
  • Each new Document has its own DocStore::Dict, which by default includes an entry pointing to server.DocStore[:Base].
  • Using/importing any package results in: 1) loading into server.DocStore, 2) saving to HDD, 3) a new entry in document.DocStore pointing to the loaded package.
  • A vscode menu option to reload all DocStores (i.e. following an updated version of Julia / Pkg.update()).
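The layering described in those bullets might look roughly like this (the `DocEntry` type and all contents are hypothetical; only the `server.DocStore` / per-document `DocStore` names come from the proposal above):

```julia
# Hypothetical sketch of the two-level DocStore layout.
struct DocEntry
    doc::String
end

# Server-wide store, keyed by module name.
server_docstore = Dict{Symbol, Dict{Symbol, DocEntry}}(
    :Base => Dict(:println => DocEntry("Print with newline")),
)

# Each new document starts with a view that aliases the shared :Base
# entry, so per-document stores stay cheap and always up to date.
document_docstore = Dict{Symbol, Dict{Symbol, DocEntry}}(:Base => server_docstore[:Base])
```

Because the per-document entry is a reference rather than a copy, reloading `server.DocStore[:Base]` in place would be visible to every open document.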

@davidanthoff
Member Author

I agree, if we can stay away from HDF5/JLD, that would be nice. I think both have a build script, so our whole package installation story would get more complicated... We might even be able to just use Julia's default serializer? That file format is probably not stable, but on the other hand, this is just a cache, so that might not be much of an issue.
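A small sketch of that serializer idea, using only the stdlib Serialization module: since the format isn't stable across Julia versions, the cache can be stamped with `VERSION` and simply regenerated on mismatch.

```julia
using Serialization

# Sketch: persist a symbol cache with Julia's built-in serializer.
# The format is version-specific, so we stamp it and rebuild on mismatch.
cache = Dict{Symbol, Any}(
    :julia_version => string(VERSION),
    :symbols       => names(Base),
)
path = joinpath(mktempdir(), "symbolcache.jls")
open(io -> serialize(io, cache), path, "w")

loaded = open(deserialize, path)
```

No build script, no binary dependency, and invalidation is a one-line version check.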

Could you share the link to the code that you mention? I'd like to get a bit of a feel for what it is doing.

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

I've been playing around with this here and there and the last thing that seemed good in terms of speed, and storage was this.

It writes some AST that holds a module Namespace whose submodules are the relevant modules (i.e. Base, Core, LanguageServer), so it's easy to store (a .jl file), quite quick to construct, and super quick to load, and editing it simply requires parsing the file and selecting the modules to update or add. I haven't got it set up to load submodules yet, but I think that'll be pretty easy.
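The ".jl file as cache" idea can be sketched like so (the `Namespace` module name comes from the description above; its contents here are invented for illustration): the cache is itself valid Julia source, so updating it is just parse-edit-rewrite.

```julia
# Sketch: a cache that is a parseable Julia file holding a Namespace
# module, with one submodule per cached module.
cache_file = joinpath(mktempdir(), "namespace.jl")
write(cache_file, """
module Namespace
module BaseDocs
const docs = Dict(:println => "Print with newline")
end
end
""")

# Editing the cache = parse it, swap out a submodule, write it back.
ex = Meta.parseall(read(cache_file, String))
```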

The framework for what to actually store still needs to be worked out, so at the moment it just keeps methods and documentation (mainly to assess speed).

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

Alternatively there's this approach which seems slower

@ZacLN
Contributor

ZacLN commented Dec 15, 2016

Scratch that, the second approach is better. You can simply serialize the resulting Dict as you proposed and store it. Can you think of some existing code in Base that we could use to quickly search Dict keys by partial matching?
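There is no dedicated prefix-search structure in Base, so the fallback is a plain filter over the keys with `startswith` (the helper name here is hypothetical):

```julia
# Sketch: prefix-match the Symbol keys of a Dict, e.g. for completion.
function prefix_keys(d::AbstractDict, prefix::AbstractString)
    [k for k in keys(d) if startswith(String(k), prefix)]
end
```

For a cache of this size a linear scan is likely fine; a sorted key vector plus `searchsorted` would be the next step if it isn't.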

@ZacLN ZacLN closed this as completed Dec 15, 2016
@ZacLN ZacLN reopened this Dec 15, 2016
@ZacLN
Contributor

ZacLN commented Dec 16, 2016

I've implemented the caching for Base and Core in this branch. It requires the forked JuliaParser (and includes all the other changes that are, I think, now ready for a merge).

It's easy to add new modules to the cache, I just need to work out the logic (as with includes) for which modules are loaded by any given file.

@ZacLN
Contributor

ZacLN commented Dec 18, 2016

The above branch has been updated to asynchronously load and cache modules and to parse the entire workspace on initialisation.

@davidanthoff
Member Author

I've thought a bit more about this, and here is another idea:

We add one more julia process to the mix: a singleton symbol server. When a language server process starts, it will try to connect via a named pipe to this global singleton symbol server process. If that fails (because the symbol server has not been started), the language server process will start the symbol server and then connect. In effect, whenever there is at least one language server process running, there will be one global symbol server that serves all language server processes.
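The connect-or-start handshake could be sketched with the stdlib Sockets module (the pipe name and the `serve_client` handler are hypothetical placeholders, not actual LanguageServer.jl code):

```julia
using Sockets

# Hypothetical pipe name for the singleton symbol server.
pipename = Sys.iswindows() ? "\\\\.\\pipe\\julia-symbol-server" :
                             joinpath(tempdir(), "julia-symbol-server")

# Try to connect to a running symbol server; if none exists, become it.
function connect_or_serve(pipename)
    try
        return connect(pipename)          # an instance is already running
    catch
        server = listen(pipename)         # we are the singleton
        @async begin
            while true
                sock = accept(server)
                @async serve_client(sock) # serve_client: hypothetical handler
            end
        end
        return connect(pipename)          # connect to ourselves
    end
end
```

There is a startup race if two language servers launch simultaneously, but since `listen` fails for the loser, it can simply fall back to `connect` again.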

This symbol server holds the cache of symbols. It also persists this cache on disc. When a language server needs symbol info for a specific package, it queries the global symbol server. If the cache has that info, the symbol server returns it from the cache; otherwise it will use similar logic as we had before to retrieve the symbol/doc info: the symbol server process will spawn a new julia process that loads the module and then returns all the extracted info to the symbol server, where it gets stored in the cache (and also persisted into the cache on disc). The symbol server will have one dedicated task for this, i.e. one task will work off a queue of packages that still need to be cached. That way we can prevent a large number of julia processes from starting that each load one package for extraction of symbol/doc info (which happens right now and can get even my beefy machine to stall).

So this is partly going back to the on disc cache that @ZacLN originally had, but the difference is that the only process that interacts with that on disc cache is the symbol server, so we get around the race condition that I was worried about.

Mainly, I've come to terms with the fact that we need an on-disc cache if we want decent performance and no long waits. Also, @ZacLN, you mentioned that retrieving symbol/doc info takes even longer with triangular dispatch, right? So this would be even more important. This comment suggests that other language server implementations have used a similar pattern before.

Finally, this still allows us to completely isolate the language server and the symbol server from user code, i.e. only the julia processes that are spawned by the symbol server would actually run/import packages. I still think that that is super valuable in terms of robustness of the whole language server.

I have a prototype implementation of the singleton server story that works on Windows. I'll test it on Linux next, and then we could think about incorporating this.

The main question for @ZacLN is whether this would enable a scenario where you don't have to eval imports of packages in the language server process?

@pfitzseb
Member

Just read through this issue, and I noticed that I've been doing something similar in DocSeeker, which supports inspecting all packages in Pkg.dir() and saving symbol information as well as docs to a cache on disk to enable decently fast searches through un-loaded packages.

The relevant part of the code (relevant for LSP, that is) isn't all that fleshed out, and especially doesn't have its own process interacting with the cache as @davidanthoff suggests above, but it seems to work at least okayish for now.

@davidanthoff
Member Author

Closing, as we now load this in a separate process.
