How to handle packages that are loaded from user code? #11
Comments
Option 1 is the way to go, I think; static analysis is going to miss too much loop/eval'd code generation stuff. Caching it all makes sense, and we can check for changes with whatever…
I guess one question is: how can we get the list of all methods defined in a given module? In particular, can we somehow get methods that were defined in a module but actually extend a function in, say, Base?
We can use the static analysis stuff and find all import statements, then filter the relevant functions' method tables by their…
Ah, cool! And actually, we might just see how performance looks if we look at all methods and check their…
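For reference, a minimal sketch of the method-table-filtering idea discussed above (the `methods_defined_in` helper name and the `Demo` module are mine, not from the thread): every `Method` object records the module it was defined in, so one can filter any function's method table by that field to find methods a package added to a Base function.

```julia
# Sketch: find the methods of `f` that were defined in module `m`,
# e.g. methods in a package that extend a function from Base.
# (`methods_defined_in` is a hypothetical helper name.)
methods_defined_in(f, m::Module) = filter(meth -> meth.module === m, collect(methods(f)))

module Demo
struct Thing end
Base.length(::Thing) = 1   # extend Base.length from inside Demo
end

for meth in methods_defined_in(length, Demo)
    println(meth)
end
```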
I've whipped up a simple script that pulls hovers and goto-defs from packages. It only takes ~5 seconds on my computer to pull everything from…
Do you mean line 4 takes 5 secs on your system? That line takes 0.1 seconds on my system :) But yes, speed doesn't seem to be a huge issue here. |
Well that's distressing! |
Oh no, I meant pulling all the documents from running… Anyway, I've fiddled with this and have it working to pull… Storage for Base/Core is ~30 MB and takes a minute or two to run initially. I haven't worked out a storage method yet. I've been burnt in the past trying to use HDF5/JLD to store data, but maybe it's stabilised now? My idea would be that on installation we pull all docs from Base and store them.
I agree, if we can stay away from HDF5/JLD, that would be nice. I think both have a build script, so our whole package installation story would get more complicated... We might even be able to just use julia's default serializer? That file format is probably not stable, but on the other hand, it is just a cache, so that might not be much of an issue. Could you share the link to the code that you mention? I'd like to get a bit of a feel for what it is doing.
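For what it's worth, a minimal sketch of the "just use julia's default serializer" idea (the `load_or_build` helper and the file name are made up for illustration): since the serializer's format is not stable across julia versions, a failed `deserialize` is simply treated as a cache miss and the data is rebuilt.

```julia
using Serialization

# Sketch: an on-disc cache using julia's built-in serializer.
# A deserialize failure (stale/incompatible format) counts as a
# cache miss, and the data is rebuilt and re-written.
function load_or_build(cachefile::AbstractString, build)
    if isfile(cachefile)
        try
            return deserialize(cachefile)
        catch
            # incompatible cache from an older julia: rebuild below
        end
    end
    data = build()
    serialize(cachefile, data)
    return data
end

cache = joinpath(mktempdir(), "symbols.jls")
# First call builds and writes the cache; later calls read it back.
symbols = load_or_build(cache, () -> Dict(:sort => "sort(v; ...)"))
```

This keeps the package free of build-time dependencies: `Serialization` ships with julia, so the installation story stays simple.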
I've been playing around with this here and there, and the last thing that seemed good in terms of speed and storage was this. It writes some AST that holds a module… The framework for what to actually store still needs to be worked out, so at the moment it just keeps methods and documentation (mainly to assess speed).
Alternatively, there's this approach, which seems slower.
Scratch that, the second approach is better. You can simply serialize the resulting…
I've implemented the caching for… It's easy to add new modules to the cache; I just need to work out the logic (as with…
The above branch has been updated to asynchronously load and cache modules and parse the entire workspace on initialisation.
I've thought a bit more about this, and here is another idea: we add one more julia process to the mix, namely a singleton symbol server.

Essentially, when a language server process starts, it will try to connect via a named pipe to this global singleton symbol server process. If that fails (because the symbol server has not been started), the language server process will start the symbol server and then connect. So whenever there is at least one language server process running, there will be one global symbol server that serves all language server processes.

This symbol server holds the cache of symbols. It also persists this cache on disc. When a language server needs symbol info for a specific package, it queries the global symbol server. If the cache has that info, the symbol server returns it from the cache; otherwise it will use similar logic as we had before to retrieve the symbol/doc info: the symbol server process will spawn a new julia process that loads the module and then returns all the extracted info to the symbol server, where it gets stored in the cache (and also persisted to the on-disc cache). The symbol server will have one dedicated task for this, i.e. one task will work off a queue of packages that still need to be cached. That way we can prevent a large number of julia processes from starting that each load one package for extraction of symbol/doc info (which happens right now and can get my beefy machine to stall).

So this is partly going back to the on-disc cache that @ZacLN originally had, but the difference is that the only process that interacts with that on-disc cache is the symbol server, so we get around the race condition that I was worried about. Mainly, I've come to terms with the fact that we need an on-disc cache if we want decent performance and not long waits. Also, @ZacLN, you mentioned that retrieving symbol/doc info takes even longer with triangular dispatch, right? So this would be even more important.
This comment suggests that other language server implementations have used a similar pattern before. Finally, this still allows us to completely isolate the language server and the symbol server from user code, i.e. only the julia processes that are spawned by the symbol server would actually run/import packages. I still think that is super valuable in terms of robustness of the whole language server. I have a prototype implementation of the singleton server story that works on Windows. I'll test it on Linux next, and then we could think about incorporating this. The main question for @ZacLN is whether this would enable a scenario where you don't have to eval imports of packages in the language server process?
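A minimal sketch of the connect-or-start handshake described above (the pipe path and the echo-style handler are placeholders; a real symbol server would answer package queries from its cache instead): the first language server process to come up finds nobody listening on the pipe, so it becomes the singleton and then connects to itself.

```julia
using Sockets

# Placeholder pipe path for this sketch; on Windows this would be a named
# pipe like "\\\\.\\pipe\\julia-symbol-server", on Unix a domain socket.
const PIPE = joinpath(mktempdir(), "symbol-server.sock")

# Connect to the singleton symbol server, starting it if it isn't running.
function connect_or_start(pipe)
    try
        return Sockets.connect(pipe)
    catch
        # Nobody is listening: become the singleton server ourselves.
        server = Sockets.listen(pipe)
        @async while true
            client = accept(server)
            # Placeholder handler: echo each query back with a marker.
            @async for line in eachline(client)
                println(client, line, " -> (answered from cache)")
            end
        end
        return Sockets.connect(pipe)
    end
end

sock = connect_or_start(PIPE)
println(sock, "lookup DataFrames")
```

Every later `connect_or_start` from another language server process would take the `connect` fast path and talk to the same server, so only one process ever touches the on-disc cache.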
Just read through this issue, and I noticed that I've been doing something similar in DocSeeker, which supports inspecting all packages in… The relevant part of the code (relevant for LSP, that is) isn't all that fleshed out, and especially doesn't have its own process interacting with the cache as @davidanthoff suggests above, but it seems to work at least okay-ish for now.
Closing, as we now load this in a separate process. |
Ideally we would not ever load any user code into the LS process. This makes it non-trivial to get at documentation, symbols, methods, etc. from packages that user code might load. I'm not entirely sure how to best handle this; here are some rough ideas:
1. Load packages in a separate, isolated julia process, extract the symbol/doc info there, and send it back to the LS process.
2. Statically analyse package source code without ever running it.
In general it might also make sense to cache symbol/method/function/doc information about packages on file or something...
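As a rough sketch of what the cached information could look like (the `extract_symbols` helper and the toy `Tiny` module are mine, for illustration): a throwaway process could load the package, walk its exported names, and collect docstrings into a plain dictionary that is easy to persist and hand back to the LS process.

```julia
# Sketch: collect exported names and their docstrings from a loaded module.
# A separate, sandboxed julia process would run this and serialize the
# result back, so the LS process itself never imports user code.
function extract_symbols(mod::Module)
    syms = Dict{Symbol,String}()
    for name in names(mod)           # exported bindings only
        isdefined(mod, name) || continue
        syms[name] = string(Base.Docs.doc(Base.Docs.Binding(mod, name)))
    end
    return syms
end

module Tiny
export greet
"Say hello."
greet() = "hello"
end

symbols = extract_symbols(Tiny)
```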