Hylo LSP proof-of-concept #1010

Open
koliyo opened this issue Sep 15, 2023 · 15 comments

@koliyo
Contributor

koliyo commented Sep 15, 2023

Hello Hylo team!

I am really excited about the Hylo language effort. Watching the published presentations and listening to the podcasts has convinced me that Hylo has very high potential, and if the project and community gain enough momentum, it could have a large impact on writing safe and performant code.

I wanted to get more familiar with the language itself, as well as design and implementation details of the compiler.

So I thought building an LSP for Hylo was the perfect project to get hands on experience.

I do have a working proof-of-concept implementation of a hylo-lsp server at this point. I have not been able to spend as much time on the actual Hylo compiler API as I originally intended, because I soon realized that the Swift LSP ecosystem was missing some key components to bootstrap such an LSP project.

I found the LanguageServerProtocol project, which does provide some of the building blocks, but had until now only been used for client-side LSP integration.

So I have put quite a bit of work into adding functionality for LSP server development, tracked in this issue: ChimeHQ/LanguageServerProtocol#7

After working on those parts, I could return to the Hylo-specific parts of hylo-lsp, which is written in Swift and integrates with the Hylo compiler API.

The hylo-lsp repo is here: https://github.com/koliyo/hylo-lsp

And the current implementation is in the feature/wip branch.

The project consists of the following:

  • hylo-lsp
    • A library integrating LanguageServerProtocol with the Hylo compiler
  • hylo-lsp-executable
    • Wrapping hylo-lsp as an executable, with command line flags for logging and LSP transport options
  • hylo-lsp-test-client
    • POC client to test the LSP implementation in a single process, where both client and server are written in Swift.
  • hylo-lsp-vs-code
    • VS Code extension with the hylo-lsp-executable as LSP backend.

I also have a fork of hylo with some minor changes to allow the LSP development.

The current POC LSP implementation has the following functionality:

  • textDocument/definition
    • Jump to symbol, e.g. cmd-clicking a symbol
  • textDocument/documentSymbol
    • All symbols in a document, VS Code outline view
  • textDocument/semanticTokens/full
    • Semantic-token-based syntax highlighter (partial)
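For reference, the `textDocument/semanticTokens/full` response does not carry absolute positions: per the LSP specification, each token is sent as five unsigned integers (line delta, start-character delta relative to the previous token on the same line, length, token-type index into the legend, and a modifier bitset). A minimal Swift sketch of that encoding step follows; `AbsoluteToken` and `encodeSemanticTokens` are hypothetical names, not part of hylo-lsp or the LanguageServerProtocol package.

```swift
/// A semantic token in absolute coordinates, as a server might collect it
/// from the AST before encoding. `column` and `length` are measured in the
/// negotiated position encoding (UTF-16 code units by default).
struct AbsoluteToken {
    var line: UInt32
    var column: UInt32
    var length: UInt32
    var type: UInt32       // index into the legend's tokenTypes
    var modifiers: UInt32  // bit set over the legend's tokenModifiers
}

/// Encodes tokens into the flat, relative `data` array of a
/// `textDocument/semanticTokens/full` response: five integers per token.
func encodeSemanticTokens(_ tokens: [AbsoluteToken]) -> [UInt32] {
    // The protocol expects tokens in document order.
    let sorted = tokens.sorted { ($0.line, $0.column) < ($1.line, $1.column) }
    var data: [UInt32] = []
    data.reserveCapacity(sorted.count * 5)
    var previousLine: UInt32 = 0
    var previousColumn: UInt32 = 0
    for token in sorted {
        let deltaLine = token.line - previousLine
        // The start is relative to the previous token only on the same line.
        let deltaStart = deltaLine == 0 ? token.column - previousColumn : token.column
        data += [deltaLine, deltaStart, token.length, token.type, token.modifiers]
        previousLine = token.line
        previousColumn = token.column
    }
    return data
}
```

For example, tokens of lengths 3, 4 and 3 at (line 0, column 0), (0, 4) and (1, 2), with type indices 0, 1 and 0, encode to `[0, 0, 3, 0, 0, 0, 4, 4, 1, 0, 1, 2, 3, 0, 0]`.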

For the semantic token functionality I am missing AST nodes to build more complete support, specifically AST nodes for keywords, e.g. public, fun, etc. I understand these do not need to be stored for normal compilation, but they would be useful for complete syntax highlighting coverage. Maybe they could be enabled by some custom parameter during AST construction?

Also, I have forked the Hylo compiler itself with some minor changes, mainly to build the compiler as a library and to add a public constructor to SourcePosition.

What are your thoughts on LSP development? Have you started any internal effort towards this, and how do you see this work going forward? I would be happy to contribute and continue helping develop parts of the LSP server.

The implementation is currently in a pretty rough WIP state :)

I really hope you continue your effort with Hylo development, and that wider adoption starts spreading. It is great that you have a public roadmap on the website as well.

Let me know your thoughts on this!

@kyouko-taiga
Contributor

First of all, wow! Thanks a lot for this work.

I think I can confidently speak on behalf of all contributors to say that we're very excited to see this work going forward. I would be happy to provide all the help I can.

> For the semantic token functionality I am missing AST nodes to build more complete support.

Almost all declarations have an introducer or introducerSite property that provides the source locations of their introducer keyword. For example, FunctionDecl.introducerSite is the source range of the fun keyword in all function declarations.

Declaration modifiers (e.g., public) and other keywords are represented with a type called SourceRepresentable<T> that's notionally a value of type T (typically the abstract value of the keyword) along with its source locations.

Would you be able to work with these objects alone or do you need proper AST nodes?
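If introducer sites and `SourceRepresentable` modifiers are enough, keyword highlighting can be collected without dedicated keyword nodes. The sketch below uses heavily simplified stand-in types only to show the shape of such a pass: `Span`, `Located`, `FunctionDeclSketch` and `KeywordToken` are hypothetical and are not Hylo's actual declarations.

```swift
/// Simplified stand-in for a source range: a half-open span of UTF-8 offsets.
/// Hypothetical; Hylo's own source-location types carry more information.
struct Span: Hashable {
    var start: Int
    var end: Int
}

/// A value paired with where it was written, in the spirit of
/// `SourceRepresentable<T>` as described above (not its actual definition).
struct Located<T> {
    var value: T
    var site: Span
}

/// Hypothetical, trimmed-down function declaration: its modifiers
/// (e.g. `public`) and the site of its `fun` introducer.
struct FunctionDeclSketch {
    var modifiers: [Located<String>]
    var introducerSite: Span
}

struct KeywordToken: Equatable {
    var text: String
    var site: Span
}

/// Collects keyword tokens from modifier and introducer sites, in source
/// order, without needing keyword nodes in the AST.
func keywordTokens(of decl: FunctionDeclSketch) -> [KeywordToken] {
    var tokens = decl.modifiers.map { KeywordToken(text: $0.value, site: $0.site) }
    tokens.append(KeywordToken(text: "fun", site: decl.introducerSite))
    return tokens.sorted { $0.site.start < $1.site.start }
}
```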

@koliyo
Contributor Author

koliyo commented Sep 15, 2023

OK, nice, I will look into using introducer, introducerSite, and the other references. I do not really need them to be explicitly part of the AST, as long as they are available somewhere. I just needed some pointers on where to look, I guess 🤗

And to be honest, I don't plan on spending a ton of effort on this. But it was a good project to get in depth knowledge of Hylo, and hopefully a nudge in the right direction to get broader adoption of the language.

What I really want is to actually start using Hylo itself. I realize we are still in a very early phase, but as soon as the compiler, language design, and stdlib are mature enough, I want to start building some application or library using Hylo.

And IDE integration specifically is, to me, a really important part of making a language productive in the hands of a developer. As a reference, I do a lot of work in modern, cross-platform .NET environments with C#, and the Roslyn-based LSP is a night-and-day differentiator compared to not having that level of IDE support.

Additionally, not just human developers but also AI developers will make great use of LSP tools for code comprehension and navigation. And Hylo could, imo, be an extremely powerful platform in this context for building next-generation applications.

My thoughts on next steps for this LSP initiative:

  1. Get my forks of the dependencies, mainly the LanguageServerProtocol library, merged back into the upstream repositories.
  2. Make sure the hylo-lsp WIP is cleaned up a bit, and that others can build and test it out locally.
  3. End-user (developer) artifacts. As a first step, make sure there is a build pipeline with a release artifact, e.g. a .vsix extension archive that can be installed in VS Code. At a later point this could also be published to the extension marketplace.
  4. Hylo-organization-based governance for the LSP, e.g. migrating to a hylo-lang/hylo-lsp repository(?)

@koliyo
Contributor Author

koliyo commented Sep 22, 2023

Regarding keywords, I have resolved a lot of tokens since last time. But I have not been able to get an introducer for ProductTypeDecl; is this not available? For VarDecl I was able to use the outer BindingDecl.

Additionally, are ranges for source code comments available?

@kyouko-taiga
Contributor

> But I have not been able to get an introducer for ProductTypeDecl; is this not available?

It seems like we don't have one. Please open an issue. I'll implement it when I find some time.

> For VarDecl I was able to use the outer BindingDecl.

That's probably the best way to get an introducer. Note that a BindingDecl may introduce multiple variables; all will have the same introducer.
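A sketch of that lookup, again with hypothetical simplified types (`TextRange`, `BindingSketch` and `introducerByVariable` are not Hylo's declarations): each variable is mapped to the introducer of its enclosing binding, so a pattern declaring two variables yields two entries pointing at the same keyword.

```swift
struct TextRange: Hashable {
    var start: Int
    var end: Int
}

/// Hypothetical stand-in for a binding declaration: one introducer keyword
/// (e.g. `let` or `var`) and every variable its pattern declares.
struct BindingSketch {
    var introducer: String
    var introducerSite: TextRange
    var variableSites: [TextRange]
}

/// Maps each declared variable to the introducer of its enclosing binding.
/// Every variable of one binding gets the same introducer.
func introducerByVariable(_ bindings: [BindingSketch]) -> [TextRange: String] {
    var result: [TextRange: String] = [:]
    for binding in bindings {
        for site in binding.variableSites {
            result[site] = binding.introducer
        }
    }
    return result
}
```

The keyword itself would then be highlighted once per binding, from `introducerSite`, rather than once per variable.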

> Additionally, are ranges for source code comments available?

No, comments are simply ignored during tokenization. It may be a little cumbersome to add them to the AST.

Perhaps in the long run we may need a different parser to interact with LSP, one that produces a concrete syntax tree rather than an abstract one. IIUC that's what Swift does (see swift-syntax).
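Until comments are part of some syntax tree, one stopgap would be a small separate scan over the buffer that finds comment ranges for highlighting. The sketch below assumes `//` line comments, nestable `/* */` block comments, and double-quoted string literals with backslash escapes; these are assumptions for illustration, not a description of Hylo's lexer, and `commentRanges` is a hypothetical helper.

```swift
/// Returns the UTF-8 offset ranges of comments in `source`, under the
/// lexical assumptions stated above.
func commentRanges(in source: String) -> [Range<Int>] {
    let bytes = Array(source.utf8)
    let slash = UInt8(ascii: "/"), star = UInt8(ascii: "*")
    let quote = UInt8(ascii: "\""), backslash = UInt8(ascii: "\\")
    let newline = UInt8(ascii: "\n")
    func at(_ j: Int) -> UInt8? { j < bytes.count ? bytes[j] : nil }

    var ranges: [Range<Int>] = []
    var i = 0
    while i < bytes.count {
        if bytes[i] == quote {
            // Skip a string literal so that "//" inside it is not a comment.
            i += 1
            while i < bytes.count && bytes[i] != quote {
                i += bytes[i] == backslash ? 2 : 1
            }
            i += 1
        } else if bytes[i] == slash && at(i + 1) == slash {
            // Line comment: runs up to (not including) the newline.
            let start = i
            while i < bytes.count && bytes[i] != newline { i += 1 }
            ranges.append(start..<i)
        } else if bytes[i] == slash && at(i + 1) == star {
            // Block comment: track nesting depth until it is closed.
            let start = i
            var depth = 0
            while i < bytes.count {
                if bytes[i] == slash && at(i + 1) == star {
                    depth += 1; i += 2
                } else if bytes[i] == star && at(i + 1) == slash {
                    depth -= 1; i += 2
                    if depth == 0 { break }
                } else {
                    i += 1
                }
            }
            ranges.append(start..<min(i, bytes.count))  // unterminated: to end of file
        } else {
            i += 1
        }
    }
    return ranges
}
```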

@koliyo
Contributor Author

koliyo commented Sep 29, 2023

I have worked a bit on setting up a release workflow for the VS Code extension. A first test version is available now, if anyone is interested in trying it out.

NOTE: The release build only supports Apple Silicon Macs (M1 & M2) at the moment.

https://github.com/koliyo/hylo-lsp/releases/tag/v0.5.0

Download the .vsix file and install it from the command line:

code --install-extension ~/Downloads/hylo-lang-0.5.0.vsix

It is also possible to build locally with the script build-and-install-vscode-extension.sh in the hylo-lsp repository.

Current functionality

Semantic token support, for quite a lot of different node types at this point
[Screenshot: semantic token highlighting]

Document symbol list/outline
[Screenshot: document symbol outline]

Jump to symbol/definition
[Screenshot: jump to definition]

Error diagnostics for compilation errors
[Screenshot: compilation error diagnostics]

Of course, lots of things are also not working well yet; it is a very early version 😅

@koliyo
Contributor Author

koliyo commented Oct 1, 2023

I pushed a new version; there were some pretty severe document sync issues in the first release, and I am getting a lot fewer errors now. I also moved the release to the hylo-vscode-extension repository:

https://github.com/koliyo/hylo-vscode-extension/releases/tag/v0.5.3

@kyouko-taiga
Contributor

Thanks a lot for this progress report. It looks absolutely awesome!

I'm sorry I didn't have time to test the extension earlier. It seems that my editor is failing to connect to the LSP server, as all requests report a failure. Any idea what step I may have missed?

@koliyo
Copy link
Contributor Author

koliyo commented Oct 27, 2023

Please try again with the updated release; I have done more development and pushed some new versions.
Are you running on a Mac?

@koliyo
Contributor Author

koliyo commented Oct 27, 2023

Here is a summary of feedback on LSP development for Hylo that probably needs to be considered going forward, at least in a longer time perspective. This includes:

  • The compiler must have mechanisms for source code that is not on disk. In the LSP we get an in-memory representation of source code that has not been saved to disk, and this needs some updated handling. I have a local patch that I will open a draft PR with.
  • Some concept of packages, either implicit based on directory structure (e.g. Python) or explicit (e.g. Swift).
    • My current approach is to compile the Hylo stdlib + the currently active file, so multifile programs do not work in the LSP atm.
    • The LSP needs to know what files to send to the compiler for parsing/analysis.
  • Performance overall, and especially type checking.
    • I know Hylo is very early in development, so it is understandable that compilation performance is not a top priority.
    • Type checking is really expensive right now, which limits its usage in the LSP.
      • At first I used the TypedProgram for basically all LSP functions: semantic tokens, symbols, diagnostics, etc.
        • But semantic tokens specifically need to be more responsive than current type checking performance allows, and we are talking about very small programs here.
        • I have migrated semantic token analysis and document symbol listing to use the AST instead of the TypedProgram.
          • This gives a huge speedup, but requires some additional heuristics to recover what I would get "for free" from the TypedProgram.
    • And overall, an LSP is very demanding of the compiler, and this must be considered from the core to get a top-level experience, especially regarding incremental parsing/analysis and low-latency responses. E.g. Roslyn (C#) and Swift do a really good job with this. But I understand this is not a top priority at this stage of the project.
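One way to handle buffers that are not on disk is an overlay that the server consults before falling back to the file system. A minimal sketch, assuming full-text document sync; `DocumentOverlay` and its methods are hypothetical, not hylo-lsp's actual types. Rejecting updates with an older version than the stored one is a simple guard against stale edits.

```swift
import Foundation

/// Editor buffers keyed by document URI. While a document is open, its
/// (possibly unsaved) buffer is what gets analyzed, not the file on disk.
struct DocumentOverlay {
    private var buffers: [String: (version: Int, text: String)] = [:]

    /// didOpen/didChange: store the new text unless it is older than what we have.
    @discardableResult
    mutating func update(uri: String, version: Int, text: String) -> Bool {
        if let current = buffers[uri], current.version >= version { return false }
        buffers[uri] = (version: version, text: text)
        return true
    }

    /// didClose: the file on disk becomes the source of truth again.
    mutating func close(uri: String) {
        buffers[uri] = nil
    }

    /// Source text to hand to the compiler: the open buffer if any, else the file on disk.
    func text(for file: URL) -> String? {
        if let buffer = buffers[file.absoluteString] { return buffer.text }
        return try? String(contentsOf: file, encoding: .utf8)
    }
}
```

Note that this keys buffers by `URL.absoluteString`; a real server would need to normalize the URIs sent by the client before comparing them.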

@koliyo
Copy link
Contributor Author

koliyo commented Dec 8, 2023

@kyouko-taiga Have you had a chance to test running the extension again? I would be glad to help get you up and running!

I have just rebuilt the LSP with the recent changes to Hylo, so it is up to date; the LSP version is now v0.6.10.

Also, I have managed to get my upstream PR for LSP server development merged into the LanguageServerProtocol repository, so it is now much easier to get started with server-side LSP development in Swift. See ChimeHQ/LanguageServerProtocol#14

@dabrahams
Collaborator

I'd very much like to try using the extension with Emacs, since that's my primary development environment.

@kyouko-taiga
Contributor

> Have you had a chance to test running the extension again? I would be glad to help get you up and running!

Haven't had time to try again yet, sorry for the lack of updates. I will try during the weekend and report here.

@koliyo
Contributor Author

koliyo commented Dec 15, 2023

I have zero experience with Emacs, and unfortunately I do not think I will be able to allocate time to setting up the Emacs integration.

The upside of the LSP architecture is that most of the code and logic lives in the portable LSP server, and the IDE integration is a pretty thin interface. The main additional complexity in the VS Code extension is that it dynamically installs and updates the LSP server. This allows it to:

  • Download the OS/architecture-specific build of the LSP. The alternative would be to distribute all OS binaries in the extension package.
  • Update the LSP server without deploying a new extension version.
  • Keep the extension installation minimal, in terms of size and installation time. The LSP server is installed dynamically when the first Hylo buffer is opened in the editor.

Hopefully we can find someone else in the Hylo community who could help with the Emacs integration.

Also, the prototype has most of the functionality I wanted, at least for inspecting and navigating Hylo files more efficiently. There is much left to do in the LSP, but I will probably not be working much on it in the near term. I have a small baby at home and very limited time for side projects atm 😅

Also, it is very much a prototype and may need to be redesigned in some very fundamental ways going forward, e.g. in terms of concurrency. But hopefully it can serve as a starting point!

@dabrahams
Collaborator

> I have zero experience with emacs, and I do not think I will be able to allocate time to setting up the emacs integration unfortunately.

Don't worry; I didn't expect you to. Emacs has LSP support built-in.

@kyouko-taiga
Contributor

kyouko-taiga commented Jan 2, 2024

Sorry for the late, late reply @koliyo 😔 (and happy new year).

I finally took the time to give your extension another try, and let me say that it is so cool! All features worked, and frankly I think the extension can probably already help with writing actual code in its current form. So really well done.

Regarding your remarks:

> Compiler must have mechanisms for when source code is not on disk. In the LSP we get an in-memory representation of source code that is not saved to disk, and this needs some updated handling. I have a local patch I will make a draft PR with.

I saw that PR go by, along with some exchanges with @dabrahams. It seems to me that both of you are currently on top of this issue, so I won't interfere, but let me know if my help is required.

> Some concept of packages, either implicit based on directory structure, e.g. Python, or explicit, e.g. Swift.

We have a notion of module and should be iterating more seriously on its design pretty soon. Stay tuned.

In a nutshell, a module is a unit of code distribution, like a library. I think it's larger than what Java would consider a package, which is more like a directory AFAIU. Changes in a single source file may have far-reaching consequences within the same module, but less influence on other modules (or none if the modification doesn't touch an exported API).
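For an LSP, that could translate into an invalidation rule: an edit always invalidates its own module, and reaches other modules only when it touches the exported API. A sketch, assuming a hypothetical reverse-dependency map from each module to the modules that import it; walking importers transitively is a conservative choice, not something the module design prescribes.

```swift
/// Returns the modules to re-analyze after an edit in `module`.
/// `dependents` maps each module to the modules that import it (hypothetical input).
func modulesToRecheck(
    edited module: String,
    changesExportedAPI: Bool,
    dependents: [String: [String]]
) -> Set<String> {
    // An edit always invalidates the module it belongs to.
    var result: Set<String> = [module]
    // Changes that don't touch the exported API stay inside the module.
    guard changesExportedAPI else { return result }
    // Otherwise, conservatively walk importers transitively, since a
    // re-checked module's own exports might change as a result.
    var worklist = dependents[module] ?? []
    while let next = worklist.popLast() {
        if result.insert(next).inserted {
            worklist += dependents[next] ?? []
        }
    }
    return result
}
```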

> My current approach is to compile Hylo stdlib + the current active file. So multifile programs do not work in the LSP atm.

The standard library will be a module of its own pretty soon, so you probably won't have to compile it anymore. At least you won't have to type check it anymore. I think you could already implement this strategy in your own driver, but it's likely simpler to just wait for the feature to land upstream.

IR lowering and analysis aren't blazingly fast either. Some parts are certainly parallelizable, but since the IR is mutated by these passes, it won't necessarily be trivial.

Note that there should never be further diagnostics after we've applied mandatory IR passes. In the driver that is after we called lower(program: program, reportingDiagnosticsTo: &diagnostics). So for the purpose of live editing support, you can probably stop your compilation pipeline at this stage.
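In driver terms, a live-editing pipeline can therefore have a fixed last stage. The stage names below are hypothetical and only illustrative; the one fact taken from the comment above is the cut-off after lowering and the mandatory IR passes, i.e. once `lower(program:reportingDiagnosticsTo:)` has run.

```swift
/// Compilation stages in pipeline order (illustrative names only).
enum Stage: Int, CaseIterable, Comparable {
    case parse, typeCheck, loweringAndMandatoryPasses, optimization, codeGeneration

    static func < (lhs: Stage, rhs: Stage) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// No diagnostics are reported after the mandatory IR passes, so a
/// live-editing server can stop there and skip the remaining stages.
let lastStageForDiagnostics = Stage.loweringAndMandatoryPasses

/// The stages to run for a request that only needs diagnostics.
func stagesForDiagnostics() -> [Stage] {
    Stage.allCases.filter { $0 <= lastStageForDiagnostics }
}
```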

> The LSP needs to know what files to send to the compiler for parsing/analysis.

My hot take is that it should be fine to just send the file being edited, but perhaps I'm too naive. I'll need your help to understand why my assumptions may be wrong.

> Performance overall, and especially type checking.

Working on it 😅

> At first I used the TypedProgram for basically all functions in the LSP, semantic tokens, symbols, diagnostics, etc. But specifically semantic tokens need to be more responsive than the current type checking performance, and we are talking very small programs here.

I think that is the right strategy for semantic highlighting. Building a typed program will be significantly faster once we land a first implementation of modules because you won't have to type check the standard library again. Most of the time you spend type checking "Hello, World!" today is actually time spent compiling Hylo.Int...

That won't fly in the long run, of course, so we have to start considering incremental compilation. I know almost nothing about this technique so I'll have to learn.
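As a first step toward not re-checking the standard library on every request, a server could memoize per-module results keyed by their sources. A minimal sketch, not Hylo's API: `ModuleCache` is hypothetical, and `Value` stands in for whatever the checker produces (e.g. a typed module). Real incremental compilation typically tracks much finer-grained dependencies than whole modules.

```swift
/// Memoizes an expensive per-module result so it is recomputed only when
/// that module's source text changes.
final class ModuleCache<Value> {
    private var entries: [String: (sources: [String], value: Value)] = [:]
    /// Number of times `compute` actually ran.
    private(set) var misses = 0

    func value(for module: String, sources: [String], compute: ([String]) -> Value) -> Value {
        // Reuse the stored result if the module's sources are unchanged.
        if let entry = entries[module], entry.sources == sources {
            return entry.value
        }
        misses += 1
        let value = compute(sources)
        entries[module] = (sources: sources, value: value)
        return value
    }
}
```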
