Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move parser from compiler to rls #118

Closed
matklad opened this Issue Dec 16, 2016 · 8 comments

Comments

Projects
None yet
3 participants
@matklad
Copy link
Member

matklad commented Dec 16, 2016

Hi!

This is another long termish idea. What about bringing parsing from the compiler into RLS?

As far as I understand, parsing in rustc is a complex affair, because the parser deals with macro
expansion, macro resolution and reading files for non-inline modules.

I propose to split this process into two parts:

fn pure_parse(single_file: &str) -> ConcreteSyntaxTree;

fn parse_crate(
    root_module: Path,
    module_loader: &Fn(Path) -> ConcreteSyntaxTree,
    file_loader: &Fn(Path) -> Vec<u8>,
) -> ast::Krate

pure_parse is a pure function for parsing a single file, without parsing dependent module,
macro expanding or include_str! handling.

parse_crate does the same job as the current parser: walks tree of modules, expands macros,
includes stuff, applies some desugaring.

pure_parse then would be independent of the compiler, so it can be used directly by the RLS and
its results might be cached in the VFS (you can't cache results of current parser because of the
dependencies between files).

Why do I think this is beneficial?

Firstly, it would allow to provide some features faster then the compiler would do. Some of such
features are reporting of syntax errors, outline (tree of modules, traits, methods) of a file,
folding regions, structural selection.

Secondly, having syntax tree in the RLS would permit to implement some code modification facilities.
rustfmt can be rerailed to use this tree, refactorings would need the API for code modifications, and
you even can provide some code assists based solely on the syntax (stuff like moving type parameter
bounds to the where clause and back).

Finally, caching the syntax trees on per file bases should allow the compiler to be faster. I understand
that currently parsing is not a bottleneck at all now. However the incremental compilation is based on
the ids derived from the syntax tree of the crate, so parsing may become a bottlneck for incrementality.
It's difficult to incrementalize macro expansion, but "pure single file parsing" should be pretty easy
to add to the VFS.

cc @jseyfried

@bruno-medeiros

This comment has been minimized.

Copy link
Contributor

bruno-medeiros commented Dec 20, 2016

That pure, top-level parsing, isn't that what syntex_syntax does?

@matklad

This comment has been minimized.

Copy link
Member Author

matklad commented Dec 20, 2016

In my understanding syntex_syntax is basically the same parser which the compiler uses, so looks like it does want to read more than one file: https://github.com/serde-rs/syntex/blob/master/syntex_syntax/src/parse/parser.rs#L5291-L5293

@bruno-medeiros

This comment has been minimized.

Copy link
Contributor

bruno-medeiros commented Dec 20, 2016

You can configure with a dummy file loader. I used it in https://github.com/RustDT/Rainicorn some time ago (before LSP) to do just that: report syntax errors, providing a hierarchical outline.

But you are right with regards to macros, it seems it takes that information away (macro definitions for example) - I hadn't noticed until now. But maybe that can be fixed with just some minor API tweaking?

@matklad

This comment has been minimized.

Copy link
Member Author

matklad commented Dec 20, 2016

But maybe that can be fixed with just some minor API tweaking?

Don't forget that syntex is literally the parser from the rustc, so any API tweaking is actually the modification of the compiler.

And I want to do exactly this: to tease apart compiler specific parts (module loading, error reporting, macro handling, interner) and a pure function for parsing, to wrap the latter one as a library and to make the compiler configurable not only with the FileLoader, but with the AstLoader as well.

@bruno-medeiros

This comment has been minimized.

Copy link
Contributor

bruno-medeiros commented Dec 20, 2016

But you are right with regards to macros, it seems it takes that information away (macro definitions for example) - I hadn't noticed until now. But maybe that can be fixed with just some minor API tweaking?

Actually, upon further inspection, this is incorrect.

syntex_syntax is not evaluating macros. I got confused trying out a macro_rules! source under syntex_syntax because I was getting a macro invocation node. Specifically, you get a Mac node: https://github.com/serde-rs/syntex/blob/master/syntex_syntax/src/ast.rs#L1061 , but as the documentation of _Mac states, "the additional ident for a macro_rules-style macro is actually stored in the enclosing item. Oog."

I didn't know macro_rules! is actually a macro invocation, and not a special macro definition syntax 😲 (I need to finish reading the macros chapters of The Book... )

@bruno-medeiros

This comment has been minimized.

Copy link
Contributor

bruno-medeiros commented Dec 20, 2016

@matklad So it seems syntex_syntax is actually closer to what to you want than it seemed at first.

Don't forget that syntex is literally the parser from the rustc, so any API tweaking is actually the modification of the compiler.

Yeah, I know, but I suspect it would be a bad a idea to try to have two parser code bases, one for compiler, another for IDEs. The modifications required for IDE usage should be minimal, and should be achievable without compromise in performance for both compiler and IDE usage.

@nrc should know the answer to this better, since him and the rest of the Rust team are familiar with that code base. But if the end goal is to have the compiler semantic analysis code be used by IDE tools (RLS), then surely the parser is a good (and 10x much simpler) place to start.

@matklad

This comment has been minimized.

Copy link
Member Author

matklad commented Dec 20, 2016

Yeah, I know, but I suspect it would be a bad a idea to try to have two parser code bases, one for compiler, another for IDEs.

For sure! I must have not been clear enough, I think that rls and rustc should share the same parser code (which is kinda possible with today's syntex_syntax) and the same AST data, at least partially (rls would want raw data which faithfully represents the source code, the compiler might want to process it into a more abstract form (interned identifiers, expanded macros, desugaring)).

@nrc

This comment has been minimized.

Copy link
Member

nrc commented Oct 30, 2017

This is more of a rustc issue than an RLS issue, so closing. I think it would be nice to move libsyntax into a crate, but that has proved annoyingly difficult in the past :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.