Rust Language Server (IDE support) #1317

Merged
merged 6 commits into from Feb 11, 2016
302 changes: 302 additions & 0 deletions text/0000-ide.md
@@ -0,0 +1,302 @@
- Feature Name: n/a
- Start Date: 2015-10-13
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This RFC describes the Rust Language Server (RLS). This is a program designed to
service IDEs and other tools. It offers a new access point to compilation and
APIs for getting information about a program. The RLS can be thought of as an
alternate compiler, but internally will use the existing compiler.

Using the RLS offers very low latency compilation. This allows for an IDE to
present information based on compilation to the user as quickly as possible.


## Requirements

To be concrete about the requirements for the RLS, it should enable the
following actions:

* show compilation errors and warnings, updated as the user types,
* code completion as the user types,
* highlight all references to an item,
* find all references to an item,
* jump to definition.

These requirements will be covered in more detail in later sections.


## History note

This RFC started as a more wide-ranging RFC. Some of the details have been
scaled back to allow for more focused and incremental development.

Parts of the RFC dealing with robust compilation have been removed - work here
is ongoing and mostly doesn't require an RFC.

The RLS was earlier referred to as the oracle.


# Motivation

Modern IDEs are large and complex pieces of software; creating a new one from
scratch for Rust would be impractical. Therefore we need to work with existing
IDEs (such as Eclipse, IntelliJ, and Visual Studio) to provide functionality.
These IDEs provide excellent editor and project management support out of the
box, but know nothing about the Rust language. This information must come from
the compiler.

An important aspect of IDE support is that response times must be extremely
quick. Users expect some feedback as they type. Running normal compilation of an
entire project is far too slow. Furthermore, as the user is typing, the program
will not be a valid, complete Rust program.

We expect that an IDE may have its own lexer and parser. This is necessary for
the IDE to quickly give parse errors as the user types. Editors are free to rely
on the compiler's parsing if they prefer (the compiler will do its own parsing
in any case). Further information (name resolution, type information, etc.) will
be provided by the RLS.

## Requirements

We stated some requirements in the summary; here we'll cover them in more detail,
along with the workflow between IDE and RLS.

The RLS should be safe to use in the face of concurrent actions. For example,
multiple requests for compilation could occur, with later requests occurring
before earlier requests have finished. There could be multiple clients making
requests to the RLS, some of which may mutate its data. The RLS should provide
reliable and consistent responses. However, it is not expected that clients are
totally isolated, e.g., if client 1 updates the program, then client 2 requests
information about the program, client 2's response will reflect the changes made
by client 1, even if these are not otherwise known to client 2.


### Show compilation errors and warnings, updated as the user types

The IDE will request compilation of the in-memory program. The RLS will compile
the program and asynchronously supply the IDE with errors and warnings.

### Code completion as the user types

The IDE will request compilation of the in-memory program and request code-
completion options for the cursor position. The RLS will compile the program. As
soon as it has enough information for code-completion it will return options to
the IDE.

* The RLS should return code-completion options asynchronously to the IDE.
Alternatively, the RLS could block the IDE's request for options.
* The RLS should not filter the code-completion options. For example, if the
user types `foo.ba` where `foo` has available fields `bar` and `qux`, it
should return both these fields, not just `bar`. The IDE can perform its own
filtering since it might want to perform spell checking, etc. Put another way,
the RLS is not a code completion tool, but supplies the low-level data that a
code completion tool uses to provide suggestions.
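
The division of labour in the `foo.ba` example can be sketched as follows. This is only an illustration: the `Candidate` type and both function names are hypothetical, and the RLS side is stubbed out with a fixed list rather than real compiler data.

```rust
/// A completion candidate as the RLS might report it (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub name: String,
    pub kind: &'static str,
}

/// Stand-in for the RLS side: every field of `foo` is returned, unfiltered.
pub fn rls_candidates() -> Vec<Candidate> {
    vec![
        Candidate { name: "bar".to_string(), kind: "field" },
        Candidate { name: "qux".to_string(), kind: "field" },
    ]
}

/// Client-side filtering: keep candidates that start with what the user typed.
/// An IDE could swap this for fuzzy matching or spell checking.
pub fn filter_by_prefix<'a>(all: &'a [Candidate], typed: &str) -> Vec<&'a Candidate> {
    all.iter().filter(|c| c.name.starts_with(typed)).collect()
}
```

With `typed` set to `"ba"`, the client would show only `bar`, while still holding `qux` in case it wants to offer a correction.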

### Highlight all references to an item

The IDE requests all references in the same file based on a position in the
file. The RLS returns a list of spans.

### Find all references to an item

The IDE requests all references based on a position in the file. The RLS returns
a list of spans.

### Jump to definition

The IDE requests the definition of an item based on a position in a file. The RLS
returns a list of spans (a list is necessary since, for example, a dynamically
dispatched trait method could be defined in multiple places).
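
All three queries above return a list of spans. As a rough sketch of what such a span might carry, assuming it holds both byte offsets and row/column data (the field names and the `references_in_file` helper are hypothetical, not part of this RFC):

```rust
/// A source location as a query result might report it (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub file: String,
    /// Byte offsets into the UTF-8 source text.
    pub byte_start: usize,
    pub byte_end: usize,
    /// Row/column of the start of the span.
    pub line: usize,
    pub column: usize,
}

/// "Highlight all references" viewed as the same-file subset of
/// "find all references".
pub fn references_in_file<'a>(all: &'a [Span], file: &str) -> Vec<&'a Span> {
    all.iter().filter(|s| s.file == file).collect()
}
```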


# Detailed design

## Architecture

The basic requirements for the architecture of the RLS are that it should be:

* reusable by different clients (IDEs, tools, ...),
* fast (we must provide semantic information about a program as the user types),
* handle multi-crate programs,
* consistent (it should handle multiple, potentially mutating, concurrent requests).

The RLS will be a long running daemon process. Communication between the RLS and
an IDE will be via IPC calls. Tools (for example, Racer) will also be able to
use the RLS as an in-process library. The RLS will include the compiler as a
library.

I understand this is not a detailed API, but I don't see how this works when there are multiple spans to invalidate. Must all spans be computed based on the initial content of the file? It sounds like it could be a bit painful to compute it from the plugin side. Maybe a simpler API could live alongside this one, that takes the full content of the file and that lets the oracle compute the differences based on its current state.

Member Author

The spans are relative to the last update passed to the oracle. I don't think that should be hard to compute - the plugin just has to keep track of any text deleted or edited since the last update call.

The trouble with diff'ing before and after snapshots is that it is hard to do well, and so we'd end up making mistakes or overestimating the invalidated region. I imagine it is not super-cheap either.

This makes sense. I think you are right and the proposed design is best.

The RLS has three main components - the compiler, a database, and a work queue.

The RLS accepts two kinds of requests - compilation requests and queries. It
will also push data to registered programs (generally triggered by compilation
completing). Essentially, all communication with the RLS is asynchronous (when
used as an in-process library, the client will be able to use synchronous
function calls too).

The work queue is used to sequentialise requests and ensure consistency of
responses. Both compilation requests and queries are stored in the queue. Some
compilation requests can cause earlier compilation requests to be canceled.
Queries blocked on the earlier compilation then become blocked on the new
request.
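
The queue behaviour described above can be sketched roughly as below. The types are hypothetical, and the cancellation policy (a new compile request replaces a pending compile of the same crate) is an assumption made for illustration; the RFC only says that some compilation requests cancel earlier ones. Cancelling a compilation that is already running is not modelled.

```rust
use std::collections::VecDeque;

/// The two kinds of request the RLS accepts (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub enum Request {
    Compile { crate_name: String, revision: u64 },
    Query { description: String },
}

/// A strictly sequential queue of pending requests.
#[derive(Default)]
pub struct WorkQueue {
    pending: VecDeque<Request>,
}

impl WorkQueue {
    pub fn new() -> Self {
        Self::default()
    }

    /// Enqueue a request. A compile request takes the slot of any pending
    /// compile of the same crate, so queries queued behind the cancelled
    /// compile are now blocked on the new one.
    pub fn push(&mut self, req: Request) {
        let stale = match &req {
            Request::Compile { crate_name: new_name, .. } => {
                self.pending.iter().position(|r| {
                    matches!(r, Request::Compile { crate_name, .. } if crate_name == new_name)
                })
            }
            Request::Query { .. } => None,
        };
        match stale {
            Some(i) => self.pending[i] = req,
            None => self.pending.push_back(req),
        }
    }

    /// Take the next request to service, oldest first.
    pub fn pop(&mut self) -> Option<Request> {
        self.pending.pop_front()
    }
}
```

Replacing the stale request in place, rather than appending the new one, is what keeps already-queued queries behind the compilation they depend on.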

In the future, we should move queries ahead of compilation requests where
possible.

When compilation completes, the database is updated (see below for more
details). All queries are answered from the database. The database has data for
the whole project, not just one crate. This also means we don't need to keep the
compiler's data in memory.

Why is the input a span here? I would expect an offset. Would I have to find the span of the entire word I want the definition of, or is a partial span (or even an empty span) a valid input? (Note that the same remark applies to all the subsequent API functions)

Member Author

I'm assuming that the IDE has a tokeniser and therefore already knows the span of the identifier (even simple editors usually have a tokeniser in order to implement syntax highlighting). On the other hand, I suppose that from the oracle's point of view it is as easy to identify the identifier by a single position as it is a span, so it might be as well to take a single position.

The hand written tokenizer on the plugin side will certainly not be as accurate as the oracle's one. I'd rather delegate to the oracle as much as possible.


## Compilation

The RLS is somewhat parametric in its compilation model. Theoretically, it could
run a full compile on the requested crate; however, this would be too slow in
practice.

The output should also tell the "kind" of each reference found. For example, if we want to find the references of a function declared in a trait, the output could be either a "method call" kind, or a "definition in a trait impl" kind, or a "function as a value" kind.

Member Author

The "reference data" includes the kind of definition, see lines 299, 300

The term used in the previous section was "definition data". It is not obvious that "reference data" designates the "kind" of references.

Member Author

I'll try and polish the text here.

The general procedure is that the IDE (or other client) requests that the RLS
compile a crate. It is up to the IDE to interact with Cargo (or some other
build system) in order to produce the correct build command and to ensure that
any dependencies are built.

Initially, the RLS will do a standard incremental compile on the specified
crate. See [RFC PR 1298](https://github.com/rust-lang/rfcs/pull/1298) for more
details on incremental compilation.

The crate being compiled should include any modifications made in the client and
not yet committed to a file (e.g., changes the IDE has in memory). The client
should pass such changes to the RLS along with the compilation request.

I see two ways to improve compilation times: lazy compilation and keeping the
compiler in memory. We might also experiment with having the IDE specify which
parts of the program have changed, rather than having the compiler compute this.

### Lazy compilation

With lazy compilation the IDE requests that a specific item is compiled, rather
than the whole program. The compiler compiles this item, compiling other
items only as necessary to compile the requested item.

Lazy compilation should also be incremental - an item is only compiled if
required *and* if it has changed.
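
As a minimal sketch of the "required *and* changed" rule, assume each item records its direct dependencies and a changed flag. The `Item` type and `items_to_compile` function are hypothetical, and the sketch deliberately ignores the question of whether a change in a dependency should also invalidate its dependents.

```rust
use std::collections::{BTreeSet, HashMap};

/// An item in the crate, with its direct dependencies (hypothetical model).
pub struct Item {
    pub deps: Vec<String>,
    pub changed: bool,
}

/// Walk the dependencies of `requested` and collect the items that are both
/// reachable (required) and changed since the last compilation.
pub fn items_to_compile(items: &HashMap<String, Item>, requested: &str) -> BTreeSet<String> {
    let mut visited = BTreeSet::new();
    let mut to_compile = BTreeSet::new();
    let mut stack = vec![requested.to_string()];
    while let Some(name) = stack.pop() {
        // Skip items we have already looked at (also guards against cycles).
        if !visited.insert(name.clone()) {
            continue;
        }
        if let Some(item) = items.get(&name) {
            if item.changed {
                to_compile.insert(name.clone());
            }
            stack.extend(item.deps.iter().cloned());
        }
    }
    to_compile
}
```

An item that has changed but is not reachable from the requested item is left alone, which is why a full compilation is still needed to catch every error.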

The search for an identifier should also take the scope of the search as an input. QtCreator's Locator lets the user search for a symbol in the currently opened file, and it's one of the Locator functions I use the most.

Member Author

This is a good idea, I'll add it.

Note that the IDE is expected to present an interface over this functionality, and plain text search should be done entirely in the IDE, the only search the oracle helps with is finding uses of a particular identifier (i.e., semantic search). Defining a scope to search over would still be useful though.

Yes, I was indeed talking about symbols, not full text searches. The Locator will give top level items only.

Obviously, we could miss some errors with pure lazy compilation. To address this
the RLS schedules both a lazy and a full (but still incremental) compilation.
The advantage of this approach is that many queries scheduled after compilation
can be performed after the lazy compilation, but before the full compilation.

### Keeping the compiler in memory

There are still overheads with the incremental compilation approach. We must
start up the compiler, initialising its data structures; we must parse the whole
crate; and we must read the incremental compilation data and metadata from disk.

If we can keep the compiler in memory, we avoid these costs.

However, this would require some significant refactoring of the compiler. There
is currently no way to invalidate data the compiler has already computed. It
also becomes difficult to cancel compilation: if we receive two compile requests
in rapid succession, we may wish to cancel the first compilation before it
finishes, since it will be wasted work. This is currently easy - the compilation
process is killed and all data released. However, if we want to keep the
compiler in memory we must invalidate some data and ensure the compiler is in a
consistent state.

I would add another function to the API: a way to search among the documentation. QtCreator's Locator can search in the available documentation, and display the html doc of the matches.

Member Author

Is this a text search of the docs, or does it look up documentation for a particular function (or whatever)?

Not sure about the specifics. With a Qt project, this Locator feature matches either a symbol name, or a word that appears inside a title of the generated html doc (they are structured docs). This is clearly not a critical feature, it may not belong to an initial RFC.


### Compilation output

Once compilation is finished, the RLS's database must be updated. Errors and
warnings produced by the compiler are stored in the database. Information from
name resolution and type checking is stored in the database (exactly which
information will grow with time). The analysis information will be provided by
the save-analysis API.

The compiler will also provide data on which (old) code has been invalidated.
Any information (including errors) in the database concerning this code is
removed before the new data is inserted.
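
The remove-then-insert ordering can be sketched as below. This is an in-memory stand-in only: the `Database` and `Diagnostic` types, and the use of a string identifier per item, are assumptions for illustration and say nothing about the eventual storage format.

```rust
use std::collections::HashMap;

/// An error or warning produced by the compiler (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub message: String,
}

/// Stored compilation results, keyed by a hypothetical item identifier.
#[derive(Default)]
pub struct Database {
    diagnostics: HashMap<String, Vec<Diagnostic>>,
}

impl Database {
    /// Apply the output of one compilation: first drop everything recorded
    /// for invalidated code, then insert the fresh data.
    pub fn apply_compilation(&mut self, invalidated: &[&str], fresh: Vec<(String, Diagnostic)>) {
        for id in invalidated {
            self.diagnostics.remove(*id);
        }
        for (id, diagnostic) in fresh {
            self.diagnostics.entry(id).or_default().push(diagnostic);
        }
    }

    /// Answer a query from the database alone, without touching the compiler.
    pub fn diagnostics_for(&self, id: &str) -> &[Diagnostic] {
        self.diagnostics.get(id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Removing stale entries before inserting means an error that the user has just fixed disappears even when the new compilation reports nothing for that item.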


### Multiple crates

The RLS does not track dependencies, nor much crate information. However, it
will be asked to compile many crates and it will keep track of which crate each
piece of data belongs to. It will also keep track of which crates belong to a single program
and will not share data between programs, even if the same crate is shared. This
helps avoid versioning issues.


## Versioning

The RLS will be released using the same train model as Rust. A version of the
RLS is pinned to a specific version of Rust. If users want to operate with
multiple versions, they will need multiple versions of the RLS (I hope we can
extend multirust/rustup.rs to handle the RLS as well as Rust).


# Drawbacks

It's a lot of work. But better we do it once than each IDE doing it themselves,
or having sub-standard IDE support.


# Alternatives

The big design choice here is using a database rather than the compiler's data
structures. The primary motivation for this is the 'find all references'
requirement. References could be in multiple crates, so we would need to reload
incremental compilation data (which must include the serialised MIR, or
something equivalent) for all crates, then search this data for matching
identifiers. Assuming the serialisation format is not too complex, this should
be possible in a reasonable amount of time. Since identifiers might be in
function bodies, we can't rely on metadata.

This is a reasonable alternative, and may be simpler than the database approach.
However, it is not planned to output this data in the near future (the initial
plan for incremental compilation is to not store information required to re-
check function bodies). This approach might be too slow for very large projects,
we might wish to do searches in the future that cannot be answered without doing
the equivalent of a database join, and the database simplifies questions about
concurrent accesses.

We could only provide the RLS as a library, rather than providing an API via
IPC. An IPC interface allows a single instance of the RLS to service multiple
programs, is language-agnostic, and allows for easy asynchrony between
the RLS and its clients. It also provides isolation - a panic in the RLS will
not cause the IDE to crash, nor can a long-running operation delay the IDE. Most
of these advantages could be captured using threads. However, the cost of
implementing an IPC interface is fairly low and means less effort for clients,
so it seems worthwhile to provide.

Extending this idea, we could do less than the RLS - provide a high-level
library API for the Rust compiler and let other projects do the rest. In
particular, Racer does an excellent job at providing the information the RLS
would provide without much information from the compiler. This is certainly less
work for the compiler team and more flexible for clients. On the other hand, it
means more work for clients and possible fragmentation. Duplicated effort means
that different clients will not benefit from each other's innovations.

The RLS could do more - actually perform some of the processing tasks usually
done by IDEs (such as editing source code) or other tools (refactoring,
reformatting, etc.).


# Unresolved questions

A problem is that Visual Studio uses UTF16 while Rust uses UTF8; there is (I
understand) no efficient way to convert between byte counts in these systems.
I'm not sure how to address this. It might require the RLS to be able to operate
in UTF16 mode. This is only a problem with byte offsets in spans, not with
row/column data (the RLS will supply both). It may be possible for Visual Studio
to just use the row/column data, or convert inefficiently to UTF16. I guess the
question comes down to should this conversion be done in the RLS or the client.
I think we should start assuming the client, and perhaps adjust course later.
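
For reference, a client-side conversion could look roughly like the sketch below. The function name is hypothetical; it uses only standard library calls and scans the text up to the offset, so it is linear in the size of the prefix, which is the inefficiency mentioned above.

```rust
/// Convert a UTF-8 byte offset into `text` to an offset counted in UTF-16
/// code units. Returns `None` if `byte_offset` is past the end of the text or
/// does not fall on a character boundary.
pub fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    let prefix = text.get(..byte_offset)?;
    // Each char contributes one or two UTF-16 code units.
    Some(prefix.chars().map(char::len_utf16).sum())
}
```

A client that converts many offsets in the same file would probably want to cache per-line results rather than rescan from the start each time.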

What kind of IPC protocol to use? HTTP is popular and simple to deal with. It's
platform-independent and used in many similar pieces of software. On the other
hand it is heavyweight and requires pulling in large libraries, and requires
some attention to security issues. Alternatives are some kind of custom
protocol, or using a solution like Thrift. My preference is for HTTP, since it
has been proven in similar situations.