Building a shared global environment #13445

Open
josevalim opened this issue Mar 27, 2024 · 5 comments

@josevalim
Member

Hi @lukaszsamson, @mhanberg, @scohen, and @michalmuskala!

We have had several discussions in the past about improving Elixir for language servers. The discussions usually centered around two topics:

  • The global environment, which contains all modules and functions, alongside their imports, aliases, requires, etc. (note that not all modules are compiled with tracer events, such as the ones from Elixir or Erlang modules, so there is still a need to fall back to runtime reflection APIs)

  • The buffer environment which is obtained by analyzing the currently open file and only makes sense within the current file (variable definition, module attributes)

In order to build the global environment, language servers need to add their own compilation tracers, and customize compilation. In turn, that requires a custom build directory and the need to compile all code at least twice. I would like to discuss the option of having Mix build this shared global environment instead.
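For readers unfamiliar with compilation tracers, here is a minimal sketch of what a language server does today. The module name `TracerSketch` and the ETS table name are hypothetical; the `trace/2` callback, the `:remote_function` event, and the `:tracers` compiler option are the documented tracer API. A real language server records many more event types than this.

```elixir
defmodule TracerSketch do
  # Hypothetical table name, used only by this sketch.
  @table :tracer_sketch_calls

  def start do
    :ets.new(@table, [:named_table, :public, :bag])
    :ok
  end

  # The compiler calls trace/2 for every event; it must return :ok.
  # Here we record only remote calls, keyed by the calling module.
  def trace({:remote_function, meta, module, name, arity}, env) do
    :ets.insert(@table, {env.module, {module, name, arity}, env.file, meta[:line]})
    :ok
  end

  # Ignore every other event (imports, aliases, macros, and so on).
  def trace(_event, _env), do: :ok

  def calls_from(module) do
    for {^module, mfa, _file, _line} <- :ets.lookup(@table, module), do: mfa
  end
end

TracerSketch.start()
# Note: this replaces any tracers that were already configured.
Code.put_compiler_option(:tracers, [TracerSketch])

Code.compile_string("""
defmodule TracerSketchDemo do
  def run, do: String.upcase("x")
end
""")

Code.put_compiler_option(:tracers, [])
IO.inspect(TracerSketch.calls_from(TracerSketchDemo))
```

Because the tracer has to be registered before compilation starts, a language server that wants this data must drive its own compilation, which is where the separate build directory comes from.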

In a nutshell, the idea is that Mix in dev mode will either add a separate .termdb file alongside each .beam file or augment the .beam files with an additional chunk. This way, language servers can rely on the regular code compilation to generate all artifacts they need and then use the file watching mechanism to observe when such files are added or removed. You can then load such files into another database of your preference. Another file will be made available with compilation diagnostics as well, so they can be shown in the UI.
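As an illustration of the "additional chunk" option, the sketch below stores an arbitrary term in a custom chunk of a `.beam` binary and reads it back with `:beam_lib`. The chunk id `LSDB`, the module name `ChunkSketch`, and the shape of the stored term are all assumptions made for the example; nothing in this thread fixes a format.

```elixir
defmodule ChunkSketch do
  # Hypothetical 4-character chunk id; not an existing Elixir convention.
  @chunk_id ~c"LSDB"

  # Returns a new .beam binary with `term` stored in the custom chunk.
  def put(beam_binary, term) do
    {:ok, _module, chunks} = :beam_lib.all_chunks(beam_binary)
    data = :erlang.term_to_binary(term)
    chunks = List.keystore(chunks, @chunk_id, 0, {@chunk_id, data})
    {:ok, new_binary} = :beam_lib.build_module(chunks)
    new_binary
  end

  # Returns {:ok, term}, or :error when the chunk is absent.
  def get(beam_binary) do
    case :beam_lib.chunks(beam_binary, [@chunk_id], [:allow_missing_chunks]) do
      {:ok, {_module, [{@chunk_id, :missing_chunk}]}} -> :error
      {:ok, {_module, [{@chunk_id, data}]}} -> {:ok, :erlang.binary_to_term(data)}
    end
  end
end

[{_module, beam}] =
  Code.compile_string("""
  defmodule ChunkSketchDemo do
    def hi, do: :ok
  end
  """)

augmented = ChunkSketch.put(beam, %{functions: [hi: 0]})
IO.inspect(ChunkSketch.get(augmented))
```

One trade-off of this option is that every consumer has to parse the `.beam` file to reach the data, whereas a separate file can be read without touching the compiled code.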

Here are questions for you:

  1. How does this idea sound? Would it help your language server?
  2. I am assuming language servers can watch for .termdb files being added to _build. Is this true?

Assuming we are happy with this, we should discuss:

  1. How to store this information? .beam files? Separate files? Which format?
  2. What to store in those files. We can probably summarize the needs of all language servers.
  3. Anything else?
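To make the "separate files" option concrete, here is a rough sketch that writes one file per module next to its `.beam` file and loads them all back into a map. It assumes the external term format (`:erlang.term_to_binary/1`) as the encoding and reuses the placeholder `.termdb` extension; `SidecarSketch` and the stored term are hypothetical.

```elixir
defmodule SidecarSketch do
  # Placeholder extension taken from the proposal; not a settled name.
  @ext ".termdb"

  # Writes `term` next to the given .beam path and returns the new path.
  def write(beam_path, term) do
    path = Path.rootname(beam_path, ".beam") <> @ext
    File.write!(path, :erlang.term_to_binary(term))
    path
  end

  # Loads every sidecar file in `ebin_dir` into a %{module => term} map.
  # The files are local build artifacts, so they are decoded as trusted input.
  def load_all(ebin_dir) do
    ebin_dir
    |> Path.join("*" <> @ext)
    |> Path.wildcard()
    |> Map.new(fn path ->
      module = path |> Path.basename(@ext) |> String.to_atom()
      {module, path |> File.read!() |> :erlang.binary_to_term()}
    end)
  end
end
```

A language server could call something like `load_all/1` once on startup and then reload individual files as it learns about changes.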

Thank you for your time.

PS: the language server can also just run mix compile to trigger a new compilation. However, we need to add locking to make sure we don't have concurrent compilations; @lukaszsamson plans to explore this feature too.

@scohen
Contributor

scohen commented Mar 28, 2024

One rather large issue I see with the approach you've highlighted is that if you're using mix like this, it can only compile things when they've been synced to disk. Lexical doesn't require this now, and gives errors when they happen, as opposed to when the user saves. I think changing this would be a step backwards in developer experience. I really have come to enjoy the as-you-type compilation and would like to increase its prevalence rather than eliminate it.

Also, using the LSP file watching mechanism is a bit fraught. Emacs, for example, ships with the ability to monitor only 1000 files (at least on macOS), which gets exhausted quite quickly (this is editor-wide, and editors can run multiple language servers). Worse, in order to increase this, you need to recompile Emacs, and then disable some system protections, which requires rebooting your computer four times. I doubt many Emacs users will do this, so this solution will exclude Emacs users, at least on macOS. If you're thinking "hey, we should only watch certain files to reduce the number of watched files", then Emacs has your back too (at least with lsp-mode). It ignores that request and watches every file in your directory anyway.

I'm an Emacs user, so I'm most familiar with it; I'm not sure how other editors implement or manage file watching. I would hope they do a better job.

Currently, Lexical takes an indexing approach to look at your source code --it doesn't make heavy use of compilation tracers, and actually runs mix tasks to compile and get diagnostics. It just places the artifacts in a different directory. This is slightly annoying and slow when you start the LS in a clean directory since we have to get deps, etc., but it's mostly solved, and does use the standard tooling. ElixirLS behaves similarly.

Another question: If your project and the LS share a compilation directory, then the project will get compiled a lot more than it is at present. Lexical, for example, compiles the current file on every keypress. Will that interfere with tools like Phoenix live reload?

> Another file will be made available with compilation diagnostics as well, so they can be shown in the UI.

This seems nice, though I think I prefer the approach that the Code module introduced in 1.16, where you can get the diagnostics in code. Going from file -> diagnostic seems like it will be much slower than running it in-memory.

I'm not familiar with .termdb files, and what I've found on the net doesn't seem to relate to something useful to a language server in a way that I'm understanding. Can you elucidate?

@josevalim
Member Author

Hi @scohen, thanks for the input!

> One rather large issue I see with the approach you've highlighted is that if you're using mix like this, it can only compile things when they've been synced to disk.

I would think those approaches are orthogonal. We can build the global environment, so you can fetch global information about the project (to create stuff like workspace symbols) and find references, but still use Code.with_diagnostics to get the live-on-typing information.

Exactly as you said here:

> Currently, Lexical takes an indexing approach to look at your source code --it doesn't make heavy use of compilation tracers, and actually runs mix tasks to compile and get diagnostics.

The approach would not change. The idea, however, is that Elixir will provide you with even more information upfront, so you don't have to use any compilation tracer and, hopefully, not even need a separate directory for LS builds. :)
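For reference, the live-on-typing side mentioned above can be sketched roughly as follows. `DiagnosticsSketch` and the sample buffer are hypothetical; `Code.with_diagnostics/1` and `Code.compile_string/2` are the real functions, and this assumes an Elixir version that ships `Code.with_diagnostics`.

```elixir
defmodule DiagnosticsSketch do
  # Compiles `source` in memory and returns {result, diagnostics}.
  # Compile errors raise, so they are rescued and returned as {:error, exception}.
  def check(source, file \\ "nofile") do
    Code.with_diagnostics(fn ->
      try do
        {:ok, Code.compile_string(source, file)}
      rescue
        error -> {:error, error}
      end
    end)
  end
end

# Unsaved buffer contents, as an editor would send them.
buffer = """
defmodule DiagnosticsSketchDemo do
  def run(unused), do: :ok
end
"""

{_result, diagnostics} = DiagnosticsSketch.check(buffer, "demo.ex")

for diagnostic <- diagnostics do
  IO.inspect({diagnostic.severity, diagnostic.position, diagnostic.message})
end
```

Nothing here touches the disk or the build directory, which is why this path can stay independent of whatever Mix writes for the global environment.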

> Also, using the LSP file watching mechanism is a bit fraught.

Ah, that's a shame. So we would also need to figure out a mechanism to communicate between processes which files have changed.

> I'm not familiar with .termdb files, and what I've found on the net doesn't seem to relate to something useful to a language server in a way that I'm understanding. Can you elucidate?

Sorry, that's just a random name I came up with. It doesn't mean anything right now.

@scohen
Contributor

scohen commented Mar 28, 2024

> Sorry, that's just a random name I came up with. It doesn't mean anything right now.

Omg, that's hilarious, I spent about 45 minutes looking up what a termdb file is, and could not, for the life of me, figure out how that format would help here. 🤣 They actually exist, and are related to termcaps, and store information to display different escape sequences.

I'm still having a little difficulty understanding how the language server and the normal execution environment will coexist. Is the idea that the language server shells out to mix compile when a document is saved, then re-reads all the database files and integrates them into its internal databases? Or is the idea that we read these database files on startup, and then use our current processes to figure out what has changed, what needs to be reindexed, and what the errors and warnings are?

I'm also concerned about having to support two ways of doing things. These changes are welcome, but a language server needs to support older versions of Elixir that don't have these features, so we'll need to keep the internal compilers in addition to integrating the Elixir-specific databases.

@josevalim
Member Author

> Omg, that's hilarious, I spent about 45 minutes looking up what a termdb file is, and could not, for the life of me, figure out how that format would help here.

Sorry 😱

> I'm still having a little difficulty understanding how the language server and the normal execution environment will coexist.

I am still trying to figure out the details myself. Let's start with the problem, and we don't even need to talk about language servers, as the main problems exist within Elixir + Phoenix today. If you run mix phx.server and call mix compile, they will clash with each other, potentially leading to failures. Even worse, if you call mix compile manually on the CLI, mix phx.server won't pick up the changed files either. So even with only Elixir + Phoenix, it is clear we need both a lock during compilation and the ability to communicate compilation results across OS processes.

To solve this problem, I would make it so Mix starts a UNIX socket under _build/ that acts as a server. This server will perform both locking and pubsub. If you have more than one instance running Mix, only one of them wins and acts as the server.
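A rough sketch of the "only one of them wins" part, assuming the race is decided by whoever manages to bind the socket path first. `SocketElectionSketch` and its return values are hypothetical and say nothing about how Mix would actually implement this; the `{:local, path}` address family is the standard `:gen_tcp` way to use UNIX domain sockets.

```elixir
defmodule SocketElectionSketch do
  # Tries to become the server by binding a UNIX domain socket at `path`.
  # If the path is already bound, connects to the existing server instead.
  def join(path) do
    case :gen_tcp.listen(0, [:binary, active: false, ifaddr: {:local, path}]) do
      {:ok, listener} ->
        {:server, listener}

      {:error, :eaddrinuse} ->
        # Someone else won the race; become a client of that instance.
        {:ok, socket} = :gen_tcp.connect({:local, path}, 0, [:binary, active: false])
        {:client, socket}
    end
  end
end
```

The sketch ignores stale socket files left behind by a crashed server, in which case the connect would fail and the file would need to be removed before retrying. It also works only on systems with UNIX domain sockets.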

Now, when mix compile runs, it will connect to the server and ask permission to compile code. This is the locking feature. If you have two instances and they race on mix compile, one of them will run first, and then notify the second of which modules it has purged. This allows us to solve the Phoenix case: even if the user runs mix compile on the CLI, Phoenix will know about modules that have changed and update them accordingly.
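The lock-then-notify behaviour described above can be modelled inside a single VM with a GenServer. This is only a toy model of the semantics: the proposal runs across OS processes over the socket, and `CompileLockSketch` with its `acquire/1` and `release/2` functions is invented for illustration.

```elixir
defmodule CompileLockSketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Blocks until the lock is granted. Returns the modules purged by
  # earlier holders while this caller was waiting.
  def acquire(server), do: GenServer.call(server, :acquire, :infinity)

  # Releases the lock and reports which modules this compilation purged.
  def release(server, purged), do: GenServer.call(server, {:release, purged})

  @impl true
  def init(:ok), do: {:ok, %{holder: nil, waiting: []}}

  @impl true
  def handle_call(:acquire, {pid, _}, %{holder: nil} = state) do
    # Lock is free: grant it immediately, nothing was purged meanwhile.
    {:reply, {:ok, []}, %{state | holder: pid}}
  end

  def handle_call(:acquire, from, state) do
    # Lock is taken: queue the caller and reply later.
    {:noreply, %{state | waiting: state.waiting ++ [{from, []}]}}
  end

  def handle_call({:release, purged}, {pid, _}, %{holder: pid} = state) do
    # Every waiter learns about the modules purged while it was queued.
    waiting = Enum.map(state.waiting, fn {from, seen} -> {from, seen ++ purged} end)

    case waiting do
      [] ->
        {:reply, :ok, %{state | holder: nil}}

      [{{next_pid, _} = from, seen} | rest] ->
        GenServer.reply(from, {:ok, seen})
        {:reply, :ok, %{state | holder: next_pid, waiting: rest}}
    end
  end
end
```

The sketch leaves out what a real implementation would need most: monitoring the holder so the lock is released if that process dies, and broadcasting to subscribers that never ask for the lock (the pubsub half).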

What about language servers?

One of the reasons language servers have their own build directory is to work around the problems above. This means every project needs to be compiled twice, once for the language server and once for the user. For projects with 100+ deps or 1000 files, this can be an issue.

If we introduce locking and pubsub, language servers could leverage this too! They will subscribe to pubsub events and know how the project changes. HOWEVER, another reason why language servers have their own _build is because they need to compile code with their own compilation tracers. So if we want to solve this problem, Elixir also needs to build the database (or .termdb files lol) that language servers build during compilation. You would still be able to run Code.with_diagnostics and so on, while you listen to pubsub events to know which modules have been added or removed.

> I'm also concerned about having to support two ways of doing things. These changes are welcome, but a language server needs to support older versions of Elixir that don't have these features, so we'll need to keep the internal compilers in addition to integrating the Elixir-specific databases.

This is definitely a challenge, but we will need to figure out how to continue improving. Given we have 3 language servers, maybe some of them would be fine with a "latest Elixir" policy. Dunno.

@scohen
Contributor

scohen commented Mar 29, 2024

> To solve this problem, I would make it so Mix starts a UNIX socket under _build/ that acts as a server.

I didn't know that was on the table. This is an interesting approach, and definitely worth pursuing.
