
Rust Language Server (IDE support) #1317

Merged
merged 6 commits on Feb 11, 2016

Conversation

@nrc
Member

nrc commented Oct 13, 2015

This RFC describes how we intend to modify the compiler to support IDEs. The
intention is that support will be as generic as possible. A follow-up internals
post will describe how we intend to focus our energies and deploy Rust support
in actual IDEs.

There are two sets of technical changes proposed in this RFC: changes to how we
compile, and the creation of an 'oracle' tool (name of tool TBC).

Thanks to Phil Dawes, Bruno Medeiros, Vosen, eddyb, Evgeny Kurbatsky, and Dmitry Jemerov for early feedback.

@nrc nrc self-assigned this Oct 13, 2015

@killercup
Member

killercup commented Oct 13, 2015

text/0000-ide.md
A solution to the first problem is replacing invalid names with some magic
identifier, and ignoring errors involving that identifier. @sanxiyn implemented
something like the second feature in a [PR](https://github.com/rust-

Line break → link break

@liigo
Contributor

liigo commented Oct 14, 2015

'oracle' is not a perfect name here, since when you google 'rustlang oracle', Google doesn't know whether you mean a compiler tool or a database API.

text/0000-ide.md
proposal](https://github.com/rust-lang/rfcs/pull/1298) for supporting
incremental compilation involves some lazy compilation as an implementation
detail.

@daniel-vainsencher

daniel-vainsencher Oct 14, 2015

Seems like "incremental" here is used to describe push-driven compilation, in contrast to lazy, pull-driven compilation. Which is more efficient depends strongly on the use case, so supporting both, and combinations of them, is probably useful. If the dependencies, the changes (the pushes), and the interests (the pulls) are managed explicitly, combining the strategies should be feasible. I would instead use the word "incremental" to describe all of these partial compilations.

@nrc

nrc Oct 15, 2015

Member

The two are orthogonal - you can have lazy incremental, eager incremental, lazy non-incremental, and eager non-incremental.

@daniel-vainsencher

daniel-vainsencher Oct 15, 2015

I'm saying that the reference to incremental in line 101 seems to define it as eager, conflicting with the orthogonality set up in lines 110-115. Or maybe that's just the way it reads to me.

@Ericson2314

Ericson2314 Oct 15, 2015

Contributor

@daniel-vainsencher I think the intended distinction is that "incremental" describes what is done (only what hasn't been done before), while "lazy" describes the algorithm for figuring out what to do (start at the end goal and work through the dependency DAG). Laziness without caching is non-incremental, and the "push-driven" approach you mention is a different algorithm for incremental compilation than the lazy one. Does that clarify the orthogonality?
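To make the distinction concrete, here is a minimal sketch (all names invented; nothing here is from rustc or the RFC) where laziness is the demand-driven walk from the goal and incrementality is the cache. Delete the cache and the walk is still lazy, but no longer incremental:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(usize);

struct Dag {
    deps: HashMap<NodeId, Vec<NodeId>>, // edges of the dependency DAG
    cache: HashMap<NodeId, u64>,        // memoized results ("incremental")
}

impl Dag {
    /// Lazy (pull-driven) evaluation: start from the goal and walk
    /// dependencies on demand. With the cache this is also incremental;
    /// without it, the same lazy walk redoes all the work every time.
    fn demand(&mut self, goal: NodeId) -> u64 {
        if let Some(&v) = self.cache.get(&goal) {
            return v; // already computed and still valid: reuse it
        }
        let deps = self.deps.get(&goal).cloned().unwrap_or_default();
        let v = deps.into_iter().map(|d| self.demand(d)).sum::<u64>() + 1;
        self.cache.insert(goal, v);
        v
    }

    /// Push-driven invalidation: a change to `node` evicts it and,
    /// transitively, everything that depends on it.
    fn invalidate(&mut self, node: NodeId) {
        self.cache.remove(&node);
        let dependents: Vec<NodeId> = self
            .deps
            .iter()
            .filter(|(_, ds)| ds.contains(&node))
            .map(|(&n, _)| n)
            .collect();
        for n in dependents {
            self.invalidate(n);
        }
    }
}
```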

text/0000-ide.md
incrementally update its knowledge of the source code. How exactly to do this
when neither names nor ids are stable is an interesting question, but too much
detail for this RFC (especially as the implementation of ids in the compiler is
evolving).

@eddyb

eddyb Oct 14, 2015

Member

Why exactly can't the oracle drive rustc directly?
Keeping compiler sessions in memory seems like the most efficient and accurate method at our disposal.

@nrc

nrc Oct 15, 2015

Member

It could, I list this as an alternative in the alternatives section (I assume you mean the quick-check version of rustc). I think it may be a better approach. The only real downside is that the oracle then has to know about dependencies between crates.

In terms of when to do a full build of a crate, I think you want to avoid this as much as possible, so the IDE is the best driver to choose when it is necessary.

@eddyb
Member

eddyb commented Oct 14, 2015

I forgot to comment this on the pre-RFC thread (sorry @nrc), but I prefer rider as the name of the oracle tool, and keeping racer (with @phildawes' permission, of course) as the one-shot completion (and perhaps quick-check) tool.

The way I see it, racer "races" to give a useful result each time, whereas rider "rides" along with an IDE, for as long as there are Rust projects to be handled by it, and can have a slow start without impacting UX.

@kud1ing
Contributor

kud1ing commented Oct 14, 2015

I don't have a name suggestion (yet), but I expect calling it "Oracle" will lead to confusion and legal trouble.

@killercup
Member

killercup commented Oct 14, 2015

(I like how this RFC concentrates on the name first 😉)

How about rustcage? Either pronounced rust cage or rust sage.

@phildawes

phildawes commented Oct 14, 2015

Personally I'm not entirely sold on the oracle/rider concept yet. I think the fundamental requirement is that we need a stable interface to rustc to support IDEs and plugins. I'm not yet sure whether a long running database process is required/desirable.

Some things that might cause problems:

  • The oracle concept implies a single view of the source code, and I think this jars a bit with the tools that will perform refactoring, reformatting and suggestions. I suspect they will want interactive access to the compiler (compile this pre-processed snippet, what errors occur if I do this?)
  • If the oracle were to support completion plugins directly from its database, it would need to have very low latency update turnaround (i.e. on every keypress, in <100ms). I'm not sure how well this would work with it being separate from rustc and not driven by the plugins.

It may be that we have an oracle in addition to a tools-oriented interface to rustc.

(aside: I noticed that the Go oracle isn't used by the gocode completion tool, but I don't know the reason for this, it could just be historical)

text/0000-ide.md
The returned data is a list of 'defintion' data. That data includes the span for
the item, any documentation for the item, a code snippet for the item,
optionally a type for the item, and one or more kinds of definition (e.g.,
'variable definition', 'field definition', 'function declaration').

@daniel-vainsencher

daniel-vainsencher Oct 14, 2015

This style of API, where the IDE queries for what it needs right now, makes push-driven updates (based on a file changing and being saved) hard, because who knows what exactly the IDE is interested in?

An alternative is to allow IDEs to register interest in some queries (for a typical heavy IDE, "all definitions in this project"; for Racer, only a fast-changing "whatever is under this cursor location"), and then:

  • Use the registrations to notify the IDE only of interesting changes.
  • Use positive registrations to know someone cares about this at all (even before they ask for it). (A sketch of such a registration API follows below.)
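A hypothetical sketch of the registration-style API described above; every name here is invented for illustration, and the RFC specifies nothing like this:

```rust
use std::path::PathBuf;

/// A region of a source file (illustrative).
pub struct Span {
    pub file: PathBuf,
    pub start: usize, // byte offsets
    pub end: usize,
}

/// Long-lived queries an IDE can register interest in.
pub enum Interest {
    /// A heavy IDE: "all definitions in this project".
    AllDefinitions,
    /// A racer-style tool: "whatever is under this cursor location".
    AtCursor(Span),
}

/// Handle returned by `register`, used to cancel the subscription.
pub struct Registration(pub u64);

pub trait Oracle {
    /// The oracle now knows someone cares about this query, so it can
    /// precompute eagerly and push updates only for registered interests.
    fn register(
        &mut self,
        interest: Interest,
        on_change: Box<dyn FnMut(&[Span])>,
    ) -> Registration;

    /// "Until further notice" ends here.
    fn unregister(&mut self, registration: Registration);
}
```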

@nrc

nrc Oct 15, 2015

Member

Could you explain the push-driven update a bit more please? I am assuming the IDE plugin is pretty much stateless with respect to the oracle's data, and whenever the user performs an action (right click, hover, hotkey, etc.) it will query the oracle for the data it needs (thus why the oracle must be quick).

@daniel-vainsencher

daniel-vainsencher Oct 15, 2015

Two things can drive the oracle to do work:

  • Code changed (two major use cases: a drip, as in editing, or massive, as in git pull/automatic refactoring/cargo update). So IIUC, what you've called eager evaluation would be to react to the change in code immediately; I called this push-driven. This can easily be a waste of time when you are recomputing something that nobody cares about. However, if in the API the IDE said "I care about X, Y, Z and any updates on them until further notice", then you can be correctly selective.
  • The IDE asked for something (say, a definition) and not everything has been precomputed in advance. This triggers some lazy (I called this pull-driven) computation. This is never a waste of time, but may have unreasonable latency (whoops, I should have downloaded those new versions of two crates before, huh?) and might also lose opportunities for doing things in parallel.

@nrc

nrc Oct 16, 2015

Member

I think that for the push-based changes, the IDE calls the update functions (for the big changes, that is why we need to invalidate whole files or directories). But I don't think the result of any of that has to be communicated back to the IDE (other than error messages, maybe) - the IDE won't keep any state about the program - that is all kept in the oracle, which will be updated by the compiler (or the two are integrated together).

You should think of the oracle as part of the IDE, really; the IDE shouldn't manage state of its own about the program.

Does that sound right? Or am I missing something about the workflow?

text/0000-ide.md
covered by the span. Can return an error if the span does not cover exactly one
identifier or the oracle has no data for an identifier.
The returned data is a list of 'defintion' data. That data includes the span for

@daniel-vainsencher

daniel-vainsencher Oct 14, 2015

defintion -> definition

@bungcip

bungcip commented Oct 14, 2015

Found a paper related to IDE integration:
http://wasdett.org/2013/submissions/wasdett2013_submission_10.pdf

@daniel-vainsencher

daniel-vainsencher commented Oct 14, 2015

Apologies for linking my own paper. [1] is a (proto-)pattern language describing methods to support multiple analyses over a changing source base (some from Smalltalk designs, some also used in Eclipse at the time) - basically, something like the oracle discussed here. One major decision point is: which objects from the analysis persist when the program text is not completely valid? For example, adding a "{" early in a file can be seen to invalidate every scope after it; do we then not use those function definitions in autocomplete? Smalltalk solves this by giving most definitions (classes, methods) a persistent identity over time. The text edited only corresponds to a single definition, and updates the canonical version only when saved (and syntactically valid). Rust currently seems to encourage large files, which makes this more difficult, though not necessarily impossible. For example, if we have both a recent valid version of the source and the changed span, we can use the valid old definitions except where valid current versions are available.

[1] http://hillside.net/plop/2006/Papers/ACMConferenceProceedings/Intimacy_Gradient/a15-vainsencher.pdf

text/0000-ide.md
The oracle is a long running daemon process. It will keep a database
representation of an entire project's source code and semantic information (as
opposed to the compiler which operates on a crate at a time). It is
incrementally updated by the compiler and provides an IPC API for providing

@olivren

olivren Oct 14, 2015

Preliminary note: I'm commenting on this RFC because I started writing a Rust plugin for the QtCreator IDE. This plugin is in a very preliminary state and does nothing useful for the moment. I don't know much about compilers in general, and I'm discovering the QtCreator API as I write my plugin. So, be suspicious of anything I say!

What is the rationale for proposing a daemon process + an IPC API? I fail to see the advantage compared to the oracle just being a library, with both a Rust and a C API, that I could call directly from my IDE plugin code. It would remove the pain of writing communication code. It would also allow me to instantiate two services at the same time with isolated content.

The only pros I can see for the daemon approach are:

  1. Share information between multiple IDE instances.
    It seems like a rare use case, and I doubt there is much data to share.
  2. Be easier to integrate in languages with awful FFI capabilities.
    The IPC API could be designed on top of an existing library, if the need is real.

When I investigated integrating Racer into my IDE plugin, I came to the conclusion that it would be easier to use Racer as a library rather than invoking it as a process and parsing the output.

@eddyb

eddyb Oct 15, 2015

Member

Having an API means all sorts of stabilization concerns, and also passing structured data through a C API.

@phildawes

phildawes Oct 15, 2015

Would we not have to apply the same stabilization concerns to the IPC interface?
(genuine question: what's the difference between a Rust API and an IPC API wrt stabilization?)

@nrc

nrc Oct 15, 2015

Member

The major reason is parallelism - we want the IDE to carry on doing its thing whilst the compiler (coordinated by the oracle) compiles the non-urgent parts of changed source. Then the oracle needs to update its database with the results. Although the IDE could handle the threading to make all this work, it is simpler (and more reusable) just to do it all in a separate process.

With a decent IPC library, having an IPC API is not a lot more complex than having an FFI API. And in terms of implementation, having a totally separate process is marginally easier.

In terms of stability, I don't think there is much difference between the two. An IPC API might be a little more robust because using an intermediate data structure and having parsing/serialisation adds a little abstraction.

@olivren

olivren Oct 15, 2015

At least for QtCreator, the extension API is already designed with asynchrony in mind. Having to deal with asynchronous IO would be a lot more painful for me than having a simple synchronous API. In fact, if you give me an IPC API, the first thing I'll do will be to wrap the IO and parsing into a simple synchronous API. In the end there will just be an unnecessary indirection layer.

The oracle library will of course deal with its own update work in a dedicated thread, but you would have to have such a thread anyway with a process-based design.

@nrc

nrc Oct 16, 2015

Member

Hmm, interesting, that suggests implementing the oracle as a library might be a better solution than as a separate process. I'm not 100% convinced, but it seems like a reasonable alternative to consider. Let me have a think about it...

@nrc

nrc Oct 16, 2015

Member

Does anyone know how this would work out in the Java world? How does a JNI/FFI interface compare to an IPC interface?

@eddyb

eddyb Oct 16, 2015

Member

I guess this got lost in the bikeshed: the reason I independently came up with the IPC idea is that compartmentalization and serialization are unavoidable.

I have implemented a system where rustc threads are spawned in the background, and there is a duplex request/response channel to control the thread and get data from it.

The messages are owned ADTs, which means rustc-specific data (e.g., an interned string) has to be converted to a general representation (String).

To avoid tying the in-memory format to Rust or C-based representations, an actual serialization format can be used.
Cap'n Proto, for example, has built-in versioning and extensibility.

And, finally, for added reliability and to get rid of the FFI surface, the threads can be moved to a separate process, which is how we end up with the oracle/rider model.
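A minimal sketch of that duplex request/response scheme using std::sync::mpsc; the message types are invented for illustration, since the real system's ADTs are not specified in this thread:

```rust
use std::sync::mpsc;
use std::thread;

/// Requests sent to the background "compiler" thread. Owned data only:
/// no rustc-internal types (e.g. interned strings) cross the channel.
enum Request {
    GetType { file: String, offset: usize },
    Shutdown,
}

enum Response {
    Type(String), // interned strings converted to owned `String`s
}

fn spawn_worker() -> (mpsc::Sender<Request>, mpsc::Receiver<Response>) {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (resp_tx, resp_rx) = mpsc::channel::<Response>();
    thread::spawn(move || {
        // In the real system this thread would own a rustc session.
        for req in req_rx {
            match req {
                Request::GetType { .. } => {
                    let _ = resp_tx.send(Response::Type("i32".to_owned()));
                }
                Request::Shutdown => break,
            }
        }
    });
    (req_tx, resp_rx)
}
```

Swapping the channels for a pipe or socket carrying a serialized format (e.g. Cap'n Proto, as suggested) is what turns this in-process design into the separate-process oracle.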

text/0000-ide.md
the user has saved the file) and a list of spans to invalidate. Where there are
no invalidated spans, the update call adds data (which will cause an error if
there are conflicts). Where there is no input data, update just invalidates.

@olivren

olivren Oct 14, 2015

I understand this is not a detailed API, but I don't see how this works when there are multiple spans to invalidate. Must all spans be computed based on the initial content of the file? It sounds like it could be a bit painful to compute from the plugin side. Maybe a simpler API could live alongside this one, one that takes the full content of the file and lets the oracle compute the differences based on its current state.

@nrc

nrc Oct 15, 2015

Member

The spans are relative to the last update passed to the oracle. I don't think that should be hard to compute - the plugin just has to keep track of any text deleted or edited since the last update call.

The trouble with diffing before and after snapshots is that it is hard to do well, and so we'd end up making mistakes or overestimating the invalidated region. I imagine it is not super-cheap either.

@olivren

olivren Oct 15, 2015

This makes sense. I think you are right and the proposed design is best.
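For concreteness, here is one possible shape of that update call under nrc's "spans relative to the last update" rule; the names and types are invented, and the RFC does not pin down this signature:

```rust
/// Hypothetical shape of the oracle's update call as discussed above.
pub struct Update {
    pub file: String,
    /// Byte ranges, relative to the file contents as of the *previous*
    /// update call, whose old contents are no longer valid.
    pub invalidate: Vec<(usize, usize)>,
    /// Replacement text. Empty when the update only invalidates; when
    /// `invalidate` is empty, this purely adds data (and a conflict
    /// with existing data is an error, per the quoted text).
    pub new_text: String,
}

pub trait Oracle {
    /// The plugin accumulates edits since the last call and flushes
    /// them, e.g. when the user saves the file.
    fn update(&mut self, update: Update) -> Result<(), String>;
}
```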

text/0000-ide.md
Takes a span, returns all 'definitions and declarations' for the identifier
covered by the span. Can return an error if the span does not cover exactly one
identifier or the oracle has no data for an identifier.

@olivren

olivren Oct 14, 2015

Why is the input a span here? I would expect an offset. Would I have to find the span of the entire word I want the definition of, or is a partial span (or even an empty span) a valid input? (Note that the same remark applies to all the subsequent API functions)

@nrc

nrc Oct 15, 2015

Member

I'm assuming that the IDE has a tokeniser and therefore already knows the span of the identifier (even simple editors usually have a tokeniser in order to implement syntax highlighting). On the other hand, I suppose that from the oracle's point of view it is as easy to identify the identifier by a single position as by a span, so it might be as well to take a single position.

@olivren

olivren Oct 15, 2015

The hand-written tokenizer on the plugin side will certainly not be as accurate as the oracle's. I'd rather delegate to the oracle as much as possible.
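Both options from this exchange, sketched as signatures; all type names are invented, and none of this comes from the RFC:

```rust
/// Illustrative types for the 'get definition' query described in the
/// diff above.
pub struct Span { pub file: String, pub start: usize, pub end: usize }

pub enum DefKind { Variable, Field, FunctionDecl }

pub struct DefinitionData {
    pub span: Span,            // where the item is defined
    pub docs: Option<String>,  // any documentation for the item
    pub snippet: String,       // a code snippet for the item
    pub ty: Option<String>,    // optionally, a type for the item
    pub kinds: Vec<DefKind>,   // one or more kinds of definition
}

pub enum QueryError { NotOneIdentifier, NoData }

pub trait Oracle {
    /// Span form: the IDE's tokeniser supplies the identifier's extent.
    fn get_definition(&self, span: Span)
        -> Result<Vec<DefinitionData>, QueryError>;

    /// Position form: the oracle finds the identifier under the offset
    /// itself, delegating tokenisation to the oracle as olivren prefers.
    fn get_definition_at(&self, file: &str, offset: usize)
        -> Result<Vec<DefinitionData>, QueryError>;
}
```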

text/0000-ide.md
Takes a span, returns a list of reference data (or an error). Each datum
consists of the span of the reference and a code snippet.

@olivren

olivren Oct 14, 2015

The output should also tell the "kind" of each reference found. For example, if we want to find the references of a function declared in a trait, the output could be either a "method call" kind, or a "definition in a trait impl" kind, or a "function as a value" kind.

@nrc

nrc Oct 15, 2015

Member

The "reference data" includes the kind of definition, see lines 299, 300

@olivren

olivren Oct 15, 2015

The term used in the previous section was "definition data". It is not obvious that "reference data" designates the "kind" of references.

@nrc

nrc Oct 16, 2015

Member

I'll try and polish the text here.

text/0000-ide.md
Takes a span, returns the same data as *get definition* but limited to type information.
Question: are these useful/necessary? Or should users just call *get definition*?

@olivren

olivren Oct 14, 2015

I think it only depends on whether the oracle is able to give one type of answer more quickly than another. If the oracle can only be "ready to answer any request" or "not ready", then a single get definition function is enough.

@daniel-vainsencher

daniel-vainsencher Oct 15, 2015

I don't think that get type (presumably over any expression, not just identifiers) is the same kind of query as get definition: the user wants the type specialized to the current call's context, including values of any type parameters.

text/0000-ide.md
Takes a search string or an id, and a struct of search parameters including case
sensitivity, and the kind of items to search (e.g., functions, traits, all
items). Returns a list of spans and code snippets.

@olivren

olivren Oct 14, 2015

The search for identifiers should also take as an input the scope of the search. QtCreator's Locator lets the user search for a symbol in the currently opened file, and it's one of the Locator functions I use the most.

@nrc

nrc Oct 15, 2015

Member

This is a good idea, I'll add it.

Note that the IDE is expected to present an interface over this functionality, and plain text search should be done entirely in the IDE; the only search the oracle helps with is finding uses of a particular identifier (i.e., semantic search). Defining a scope to search over would still be useful though.

@olivren

olivren Oct 15, 2015

Yes, I was indeed talking about symbols, not full text searches. The Locator will give top level items only.

text/0000-ide.md
from just using the caret position?). Each suggestion consists of the text for
completion plus the same information as returned for the *get definition* call.

@olivren

olivren Oct 14, 2015

I would add another function to the API: a way to search among the documentation. QtCreator's Locator can search in the available documentation, and display the HTML doc of the matches.

@nrc

nrc Oct 15, 2015

Member

Is this a text search of the docs, or does it look up documentation for a particular function (or whatever)?

@olivren

olivren Oct 15, 2015

Not sure about the specifics. With a Qt project, this Locator feature matches either a symbol name or a word that appears inside a title of the generated HTML doc (they are structured docs). This is clearly not a critical feature; it may not belong in an initial RFC.

text/0000-ide.md
Takes a span (note that this span could be empty, e.g, for `foo.` we would use
the empty span which starts after the `.`; for `foo.b` we would use the span for
`b`), and returns a list of suggestions (is this useful? Is there any difference
from just using the caret position?). Each suggestion consists of the text for

@olivren

olivren Oct 14, 2015

In this case I think a span is OK. If a user highlights a part of the text and invokes autocompletion, I expect the completion to give results that could replace the highlighted selection.
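One possible shape of the completion query from the diff above, with the span semantics spelled out in the comments; the names are invented:

```rust
pub struct Span { pub file: String, pub start: usize, pub end: usize }

pub struct Suggestion {
    /// Text that can replace the queried span (which may be empty:
    /// for `foo.` the span starts and ends just after the `.`;
    /// for `foo.b` it covers `b`).
    pub text: String,
    /// Plus the same information as returned by `get definition`
    /// (span, docs, snippet, type, kind) - elided here.
    pub detail: String,
}

pub enum QueryError { NoData }

pub trait Oracle {
    /// A highlighted selection is passed as-is, so results can replace
    /// the whole selection, as olivren suggests.
    fn complete(&self, span: Span) -> Result<Vec<Suggestion>, QueryError>;
}
```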

text/0000-ide.md
We should support the current text format, JSON (or some other structured
format) for tools to use, and HTML for rich error messages (this is somewhat
orthogonal to this RFC, but has been discussed in the past as a desirable
feature).

@olivren

olivren Oct 14, 2015

Could you give an example of how an HTML error message could be useful? Or give a link to a discussion on the subject?

@nrc

nrc Oct 15, 2015

Member

The idea is that we could use colouring or shading or whatever to indicate things like the scope of borrows or lifetimes and have links to relevant documentation, etc. This is not useful for IDEs so much as a general way to improve our error messages.

text/0000-ide.md
Alternatives are 'Rider', 'Racer Server', or anything you can think of.
How do we handle different versions of Rust and interact with multi-rust?
Upgrades to the next stable version of Rust?

@olivren

olivren Oct 14, 2015

I too was wondering how to deal with multiple versions of Rust. In my IDE plugin, I would like to let the user choose their compiler, and be able to invoke the nightly compiler or the stable one explicitly using different targets. I don't know how I could give this choice to the user for the code model used for navigation/completion/refactoring.

@eddyb
Member

eddyb commented Oct 15, 2015

@phildawes To remove as much latency as possible and to avoid rustc stabilization concerns, I believe that the oracle should use rustc's internal APIs and be behind the stability wall.
That is, the oracle ships with rustc and uses its libraries directly.

I really see no point in having some output format from rustc itself; that's added design complexity and inefficiency for no gain (in the case of the oracle, at least).

@phildawes

phildawes commented Oct 15, 2015

Hi @eddyb!

I agree with that idea: if the oracle is to be a thing, then it makes sense to put the oracle behind the stabilization wall and link it directly to rustc.

However, I'm worried about the whole oracle thing. Definitely we need a stable interface to rustc. My concerns with the oracle being the stable interface to rustc are:

  • 'database of code' is seductive, but imposing this top-down abstraction could easily result in incidental complexity (overengineering)
  • it might turn out that building the various refactorings and functionality requires an interface that doesn't fit well into the database-of-code paradigm.

It already appears to me that completions maybe don't fit naturally into this design, and we've barely got started.

I'd prefer to see us drive the interface out bottom-up from the IDE plugins. Try to build stuff using rustc directly, understand what stable interface is required.

@nrc
Member

nrc commented Oct 15, 2015

'oracle' is not a perfect name here

@liigo oracle is a terrible name! But it does have some precedent from Go, and I can't think of a better one. Suggestions welcome!

@nrc
Member

nrc commented Oct 15, 2015

@phildawes

I think the fundamental requirement is that we need a stable interface to rustc to support IDEs and plugins.

The save-analysis API should evolve into this. But it seems there is a level of computation that must be done on top of that data (i.e., the data the compiler has), and that this computation must be done by any IDE, so it makes sense to me to share that code in the oracle.

I'm not yet sure whether a long running database process is required/desirable.

The database aspect is just an implementation detail. I think it is necessary in order to do the cross-referencing that the compiler does not do (e.g., enumerating all the implementations of a trait). I also think some kind of cache is necessary for perf reasons. That doesn't have to be a database, but there is a lot of info for Rust programs (indexing Rust or Servo for DXR produces several gigs of data) and we'll hit lots of problems with data of that size. Databases have these problems solved already.

The oracle concept implies a single view of the source code, and I think this jars a bit with the tools that will perform refactoring, reformatting and suggestions. I suspect they will want interactive access to the compiler (compile this pre-processed snippet, what errors occur if I do this?)

They will, and they will be able to interact with the compiler directly still. I imagine that something like a reformatting tool will start by querying the oracle, and then use the compiler directly.

If the oracle were to support completion plugins directly from its database, it would need to have very low latency update turnaround (i.e. on every keypress, in <100ms). I'm not sure how well this would work with it being separate from rustc and not driven by the plugins.

This is a concern. I think the only way we can be fast enough is to be totally separate from rustc - a db query should be orders of magnitude faster than compiling even a small section of a program. It is possible that the IPC overhead will be too high and we'll have to provide the oracle as a library for the plugins to use in-process, but that probably means writing it in Java to avoid expensive JNI calls. We should get this perf info from an early prototype, and can revisit then if necessary.

It may be that we have an oracle in addition to a tools-oriented interface to rustc.

Yeah, I envisage this.

@nrc
Member

nrc commented Oct 15, 2015

@eddyb

I believe that the oracle should use rustc's internal APIs and be behind the stability wall.

Yeah, we could definitely put the oracle and quick-check compiler together. The downside of this is purely about development practice - it will be a pain to prototype the oracle if we have to land stuff into the compiler all the time and deal with the compiler internals changing.

I think I would like to start separately for easier development (they are two radically different kinds of software), work out how things will look, and then merge them if necessary. I imagine that using the save-analysis API directly rather than data provided by it should not be too difficult.

@nrc
Member

nrc commented Oct 15, 2015

@phildawes

It already appears to me that completions maybe don't fit naturally into this design, and we've barely got started.

Could you expand on this please? I'm keen to understand the limitations of the database approach.

I'd prefer to see us drive the interface out bottom-up from the IDE plugins. Try to build stuff using rustc directly, understand what stable interface is required.

To deal with rustc directly, it would need to be modified to be a long running process - the cost of reading and processing crate metadata alone would kill the latency. I suspect we could do simple lookups and probably code completion based on querying the compiler. Some more complicated searches are impossible due to the compiler never having the info. For large projects, there would be big overheads in keeping the compiler in memory - it uses a lot, and we'd need to figure out a way to have an instance running for each crate we currently care about. Then there are questions about how quickly we can reload the data for a crate - hopefully the incremental compilation work helps there, but it is a little bit of an unknown.

In comparison, the database approach feels simpler - it is really just a cache of the compiler's data, pre-cross-referenced and stored in an efficient manner. It's also somewhat tried and tested as the implementation for DXR, and is, I believe, how most IDEs for other languages work (with the difference that the database is managed by the IDE rather than a helper oracle).


Member

eddyb commented Oct 15, 2015

For large projects, there would be big overheads in keeping the compiler in memory - it uses a lot, and we'd need to figure out a way to have an instance running for each crate we currently care about.

There is... always a compromise: if we use only flat arrays without pointers, and no pointer-sized types, we can start memory-mapping them out of crate metadata.
I suggested something like this for HIR in one of the discussions, and I'm pretty sure it can work out, albeit at the cost of less ergonomic internal APIs.

Also, most of the memory overhead comes from LLVM, IME.
I was able to keep ~120 rustc instances in RAM, for all the .rs files in rust-by-example, in about 400MB (IIRC, it could've been more).
I had to remove some eager impl loading logic to get there, and there were some other things (eager macro loading) that could also be removed, but it was more work for a smaller win.
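
As a concrete sketch of the flat-array idea (entirely hypothetical - none of these are rustc types): nodes refer to each other by u32 index into a single table, so the table contains no pointer-sized fields and could in principle be memory-mapped straight out of crate metadata.

```rust
// Hypothetical pointer-free storage in the style described above.
// Children are u32 indices into `TypeTable::nodes`, so the table is
// position-independent and could be memory-mapped from metadata.
#[derive(Clone, Copy, Debug)]
enum FlatType {
    Unit,
    Ref { pointee: u32 },
    Pair { first: u32, second: u32 },
}

#[derive(Default)]
struct TypeTable {
    nodes: Vec<FlatType>,
}

impl TypeTable {
    fn push(&mut self, ty: FlatType) -> u32 {
        self.nodes.push(ty);
        (self.nodes.len() - 1) as u32
    }
}

fn main() {
    // Build `&((), ())` out of three flat nodes.
    let mut table = TypeTable::default();
    let unit = table.push(FlatType::Unit);
    let pair = table.push(FlatType::Pair { first: unit, second: unit });
    let reference = table.push(FlatType::Ref { pointee: pair });
    println!("{:?}", table.nodes[reference as usize]);
}
```

The ergonomic cost mentioned above is visible even in this toy: every traversal is an index lookup into the table rather than following a reference.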

phildawes commented Oct 15, 2015

Alternative straw-man design: (this is a starting point + probably full of holes, don't be too hard on me!)

  • There is no project api.
  • get analysis for Item
    pass the filename, file-contents (if not the same as on-disk), and a point in the file
    returns the save-analysis style info for every element in the Item.
    If there are parse or semantic errors/warnings, also returns these, but does its best to complete
    • turnaround time: aim for ~50ms
  • get analysis for File
    pass the filename, file-contents (if not the same as on-disk)
    returns the analysis info for every element in the file
    If there are parse or semantic errors/warnings, also returns these
    • turnaround time: aim for ~250ms
  • get analysis for project
    pass a filename in the project (doesn't matter which file)
    Returns analysis info for every type and function skeleton, but not bodies
    If there are parse or semantic errors/warnings, also returns these
    • turnaround time: aim for ~250ms

The advantage of this approach is that it assumes less about the intentions of the caller and is more flexible.

All the cross referencing is done by the IDE plugins according to their needs. Additional middleware can be written to assist plugins later if required.
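
One way to pin down the straw-man above is as a trait. A minimal sketch, assuming save-analysis-style output; every type and name here is invented for illustration, not an existing rustc API:

```rust
use std::path::PathBuf;

/// Placeholder for save-analysis-style data about one element
/// (spans, types, def paths, ...).
pub struct Analysis;

/// Placeholder for a parse or semantic error/warning.
pub struct Diagnostic;

pub struct AnalysisResult {
    pub elements: Vec<Analysis>,
    pub diagnostics: Vec<Diagnostic>,
}

pub trait QuickCheck {
    /// ~50ms target: analyse just the Item containing `point`.
    /// `contents` overrides the on-disk file for dirty buffers.
    fn analyze_item(&self, file: PathBuf, contents: Option<String>,
                    point: usize) -> AnalysisResult;

    /// ~250ms target: analyse every element in one file.
    fn analyze_file(&self, file: PathBuf, contents: Option<String>)
                    -> AnalysisResult;

    /// ~250ms target: types and function skeletons for the whole
    /// project (no bodies), given any file in it.
    fn analyze_project(&self, file: PathBuf) -> AnalysisResult;
}
```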

phildawes commented Oct 15, 2015

Hi @nrc

To deal with rustc directly, it would need to be modified to be a long-running process - the latency of reading and processing crate metadata alone would kill responsiveness. I suspect we could do simple lookups and probably code completion based on querying the compiler. Some more complicated searches are impossible because the compiler never has the info. For large projects, there would be big overheads in keeping the compiler in memory - it uses a lot, and we'd need to figure out a way to have an instance running for each crate we currently care about. Then there are questions about how quickly we can reload the data for a crate - hopefully the incremental compilation work helps there, but it is a little bit of an unknown.

I think I disagree with this.

  • When profiling rustc, reading the crate metadata takes a very small fraction of the time compared to resolution and analysis.
  • We wouldn't keep a rustc running, we'd link to the libraries and perform the analysis lazily.
    (I think we have to do this if we want to support quick-check-style analysis for completions, even if we do the database thing)

In comparison, the database approach feels simpler - it is really just a cache of the compiler's data, pre-cross-referenced and stored in an efficient manner. It's also somewhat tried and tested as the implementation for DXR, and is, I believe, how most IDEs for other languages work (with the difference that the database is managed by the IDE rather than a helper oracle).

  • I don't think DXR is a good precedent in this case, since DXR works on a static fully-compiled view of the crate. If a static view was all that was required we'd already have good tool support built on top of save-analysis. DXR doesn't have any of the incremental update or unfinished code complexity (cache invalidation being one of the 2 hard things in software engineering! http://martinfowler.com/bliki/TwoHardThings.html)
  • I'm not sure that IDEs work like this. E.g. @dgrunwald hints that SharpDevelop and Microsoft's new 'Roslyn' C# compiler use a lazy approach to semantic analysis rather than caching the entire semantic project. https://www.reddit.com/r/rust/comments/3a4qbj/racer_rustc_update/cs9w3ny?context=3
Contributor

Ericson2314 commented Oct 15, 2015

@eddyb I agree that we should minimize the number of processes. Every process boundary means screwing around with serialization formats and flattening things, which is a waste of code and effort. Furthermore the nature of what is being done lends itself to rich data structures, so this flattening can be extra harmful.


I brought this up in the earlier incremental compilation RFC, but at some point I'd rather have all dependency graphs be explicit, and traverse/cache in some generic manner. My hunch is that building such graphs is far less expensive than storing them. If so, we might as well have all builds work this way, and instead make the decision how much of the graph should be persisted: in memory in the case of a daemon, or on disk in the case of repeated runs. I'd assume once we have GC that building and immediately chucking the graphs would be even more of a non-issue.
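
A toy version of the explicit-graph idea, under the assumption that results are memoised in a cache that could equally live in memory (a daemon) or on disk (repeated runs); the graph and the "work" done per node are stand-ins:

```rust
use std::collections::HashMap;

type Node = &'static str;

struct DepGraph {
    deps: HashMap<Node, Vec<Node>>,
    // Memoised results; whether this map is persisted in memory or on
    // disk is exactly the policy decision described above.
    cache: HashMap<Node, u64>,
}

impl DepGraph {
    fn compute(&mut self, node: Node) -> u64 {
        if let Some(&cached) = self.cache.get(node) {
            return cached; // reuse instead of recomputing
        }
        // Recursively (re)compute dependencies first.
        let dep_sum: u64 = self
            .deps
            .get(node)
            .cloned()
            .unwrap_or_default()
            .into_iter()
            .map(|dep| self.compute(dep))
            .sum();
        let result = dep_sum + node.len() as u64; // stand-in for real work
        self.cache.insert(node, result);
        result
    }
}
```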

phildawes commented Oct 15, 2015

@nrc

.. appears to me that completions maybe don't fit naturally into this design, ..

Could you expand on this please? I'm keen to understand the limitations of the database approach.

My thinking is:

  • the requirement for completions is that we provide suggestions for the code as it is being written with low latency (ideally < 100ms on each keypress).
  • With the database-fed-by-the-compiler approach, the compiler has to be passed the new state by the IDE after each keypress. The compiler generates the analysis data (quick-check), which is then fed in to update the database. Only after the database is updated can the IDE perform the get-suggestions query.
  • A more straightforward approach would be to obtain the analysis data direct from the compiler (in order to get the type of the expression being completed), and then do a separate field+methods search on the (more static) skeleton data.
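
A rough sketch of that more direct flow (Compiler and Skeleton are hypothetical stand-ins, not real rustc or racer APIs); only step 1 needs the compiler, and step 2 is a plain search over the more static skeleton data:

```rust
struct Compiler;
struct Skeleton;

impl Compiler {
    /// Type-check just enough to name the type of the expression
    /// ending at `offset` (the bit before the dot).
    fn type_at(&self, _buffer: &str, _offset: usize) -> String {
        "Vec<u8>".to_string() // placeholder result
    }
}

impl Skeleton {
    /// Static search over pre-parsed type/function skeletons;
    /// no type inference needed here.
    fn members_of(&self, ty: &str) -> Vec<String> {
        match ty {
            "Vec<u8>" => vec!["len".into(), "push".into(), "pop".into()],
            _ => vec![],
        }
    }
}

fn complete(compiler: &Compiler, skeleton: &Skeleton,
            buffer: &str, dot_offset: usize) -> Vec<String> {
    // Step 1: the only part on the <100ms hot path that cannot be
    // pre-cached - inferring the type of the receiver.
    let receiver_ty = compiler.type_at(buffer, dot_offset);
    // Step 2: a field+methods lookup against more static data.
    skeleton.members_of(&receiver_ty)
}
```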
phildawes commented Oct 15, 2015

Hello again!

Reading it all back, I suspect you (@nrc) and I are mostly in agreement about the details, but just have a different focus.

Have I got this correct?:

  • we both agree that there needs to be a stable low level tools interface to rustc for 'quick check'. Only rustc can do quick-check, since it requires the complicated type and lifetime analysis performed by librustc_typeck et al. (by 'rustc' I mean the code that makes up the compiler rather than the binary)
    • the current view is that this interface will deliver content similar to save-analysis, but will support generation of per-Item (or smaller?) chunks.
  • Your focus is on the oracle database-style component to assist ide plugins, with the intention of saving duplicated effort. From this perspective the interaction with the compiler/quick-check is an implementation detail.
  • My concern is the low level quickcheck functionality and api. For me this is the piece blocking everything else (including the oracle!).

I'm not yet sure if the oracle is a necessary piece to get good ide support. Once you take away typeck+resolve I think the rest is mostly graph traversal. But with the low level interface in place it doesn't matter if it isn't a good fit; ide developers can write their own traversal/search code instead (and in java if they like!).

Member

nrc commented Oct 16, 2015

@phildawes yeah, I think we are more in agreement than not. I think in your mental model, the oracle is part of the IDE, and it is perfectly reasonable to view it as a reuse mechanism shared between IDEs layered on top of the compiler, rather than the compiler itself.

I guess where we differ is that for code completion (as opposed to code search), you don't need the oracle layer, you could get the answer straight from the compiler. I guess you want that for efficiency, whereas I was imagining the oracle giving a more uniform interface by providing everything.

It would certainly be possible for Racer to skip the oracle and talk directly to the compiler for code completion. The same goes for IDEs, although not if we take the approach where the oracle drives the compiler, rather than the IDE directly. However, I think the mode of operation in this case could be pretty quick - the oracle queries the compiler, passes the result to the IDE, and then updates its database. As long as the oracle and quick check compiler are merged into one program, this should be as quick as using the compiler directly. If they are separate processes, then we'd need to measure the difference.
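
A sketch of that mode of operation - answer the IDE first at compiler-query cost, fold the fresh data into the database as a side effect. Everything here is a hypothetical stand-in:

```rust
struct Oracle {
    db: Vec<String>, // stand-in for the real pre-cross-referenced store
}

impl Oracle {
    fn completion_request(&mut self, buffer: &str, offset: usize) -> Vec<String> {
        // 1. Query the (ideally in-process) quick-check compiler.
        let answer = query_compiler(buffer, offset);
        // 2. Update the database with the fresh data...
        self.db.extend(answer.iter().cloned());
        // 3. ...and pass the result straight back to the IDE, so the
        //    round trip costs little more than the compiler query itself.
        answer
    }
}

fn query_compiler(_buffer: &str, _offset: usize) -> Vec<String> {
    vec!["len".to_string(), "push".to_string()] // placeholder
}
```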

(More in the next reply).

ArtemGr commented Jan 19, 2016

In any case, my prediction, were this plan of yours to be put in practice, is that at some point, someone will rewrite all of that unnecessary abstraction, reducing code size and eliminating one (or more) non-Rust dependencies.

That might very well be true and I'd like to point out that it is a good thing!

"When designing a new kind of system, a team will design a throw-away system (whether it intends to or not). This system acts as a "pilot plant" that reveals techniques that will subsequently cause a complete redesign of the system." - https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_pilot_system

Relational databases certainly aren't magic and a lot of the complexity they have is around handling arbitrary data in arbitrary ways.
We have specialized data and special needs, therefore a relational database is not only overkill, but potentially harmful.

Relational databases are self-documenting. You have a table, you can always look it up, see what columns, expression indexes, and foreign keys it has, browse the data, make some slight modifications necessary for debugging, etc. It's like always having a UML diagram on hand.

In my experience, and I have worked with both relational and key-value (LevelDB, LMDB, Oracle Berkeley DB Java Edition, Riak, Cassandra, Voldemort) databases, the relational side is much easier to prototype with. And as you've noted yourself, they're interchangeable: what you keep in a relational database you can keep in a key-value store and vice versa, so if the compiler can use a specialized index then it can use a relational database instead. The nice property of relational databases is that you don't have to maintain the index integrity or even the referential integrity by yourself. You don't have to reinvent the necessary abstractions. Want to upgrade your data to a different format? It's a matter of several SQL queries, not of writing a whole program just to make that tiny upgrade possible, or maintaining the code of the old model in order to be able to access that old format during the upgrade.

Eventually both approaches have their gotchas and shortcuts to efficient use. And both can fail miserably. It all depends. That's why I think that picking one over another shouldn't be a part of this RFC but rather an implementation detail left out to implementors and what they're most familiar and comfortable with.
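
For what it's worth, the "several SQL queries" point is easy to demonstrate. A minimal sketch using the rusqlite crate and an invented refs schema (nothing here is part of the RFC's design):

```rust
use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;

    // The kind of cross-reference table DXR-style tooling keeps.
    conn.execute(
        "CREATE TABLE refs (def_id INTEGER, file TEXT, line INTEGER)",
        [],
    )?;
    conn.execute(
        "INSERT INTO refs VALUES (1, 'main.rs', 10), (1, 'lib.rs', 3)",
        [],
    )?;

    // "Upgrading the data to a different format" really can be one query:
    conn.execute("ALTER TABLE refs ADD COLUMN column_no INTEGER", [])?;

    // Find-all-references is a plain indexed lookup.
    let n: i64 = conn.query_row(
        "SELECT count(*) FROM refs WHERE def_id = 1",
        [],
        |row| row.get(0),
    )?;
    println!("{} references to def 1", n);
    Ok(())
}
```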

Member

nrc commented Jan 19, 2016

@bruno-medeiros I agree we don't need to go into too much detail - but whether we use a library or IPC seems important.

Is security the reason why people (@Valloric @jwilm, etc.) were suggesting using HTTP

I believe ease of use is a more important reason. Security has been mentioned as a reason not to use HTTP.

cargo check does nothing on my machine, is this something currently available only on nightly ?

Not exactly sure which version of Cargo is required, but it is fairly recent.

It may be better to use NoSql rather than an SQL db, I'm not an expert in the space. I know DXR uses ElasticSearch, but that is more to support fancy text searching. I've found old school SQL dbs to be quite a good match for PL stuff in the past - the data is very homogeneous, cross-referencing by id fits the relational paradigm nicely, etc.

Member

nrc commented Jan 19, 2016

@eddyb all things being equal I would like to re-use the compiler's data, there are a few reasons not:

  • it's not usable now. Incremental compilation data isn't done, and it's likely to change a fair bit even once it is done.
  • metadata and the planned incremental compilation data do not store any info about function bodies - we would have to serialise all this extra info. It would probably not be used for anything else, so it would exist just for the RLS, and we would want not to emit it in other cases because it would be a large amount of data.
  • none of this data is designed to be used by anyone except the compiler - sure, it can be used, but it will not be trivial. In the compiler, the functions for reading the data are tightly coupled to how the data is used; if we want to use it in other ways we need to implement that.
  • stability - this is all compiler-internal stuff which we might change at any moment.
  • complex searches - in the future we might want to add sophisticated searches like 'find all impls of this trait', and each search like this would mean adding another index over the data and thus more and more complexity.
  • concurrent access - if we're rebuilding (assume non-incremental for whatever reason) we can't query the data.

It still seems reasonable to argue that using the compiler's data is a better approach, but it is definitely not as clear-cut as you make out. It may well be that we choose the db path, and once the incremental compilation stuff has settled and we have function body data, we can get rid of the db - that seems fine to me. But I don't think we can implement this today without the db.

And I haven't really touched the save-analysis stuff: are you seriously suggesting that we dump all the compiler data in bulk into some stringly-typed mush?

No I am not. I think you missed my reply to you above where I make a distinction between the save-analysis data dumps (data in bulk) and the save-analysis API which is more typed, allows access to data on a fine-grained basis, and is basically just a future-proof wrapper around the compiler's data.

Contributor

arielb1 commented Jan 19, 2016

@ArtemGr

Want to upgrade your data to a different format? It's a matter of several SQL queries and not of writing a whole program just to make that tiny upgrade possible,

Rust has a homegrown, basically-relational-database metadata format (we can't use an SQL database there because rustc is way too chatty). Many parts of Rust source fit there nicely. However, one of the most important things - types - is a very poor fit for a relational database because of their tree structure.

I suspect that all queries involving type information require a live compiler anyway. It would be nice, though, if someone (=me?) wrote a metadata-to-sqlite translator.
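
To make the tree-structure point concrete, here is a toy illustration (the Ty enum and row layout are invented for this example): storing a type relationally means one row per tree node plus parent links, so every type lookup becomes a recursive traversal or join.

```rust
// A toy type tree and its relational flattening.
enum Ty {
    Int,
    Ref(Box<Ty>),
    Tuple(Vec<Ty>),
}

/// Emit one (id, parent_id, kind) row per tree node, depth-first.
fn flatten(ty: &Ty, parent: i64, rows: &mut Vec<(i64, i64, &'static str)>) {
    let id = rows.len() as i64;
    let kind = match ty {
        Ty::Int => "int",
        Ty::Ref(_) => "ref",
        Ty::Tuple(_) => "tuple",
    };
    rows.push((id, parent, kind));
    match ty {
        Ty::Int => {}
        Ty::Ref(inner) => flatten(inner, id, rows),
        Ty::Tuple(elems) => for elem in elems { flatten(elem, id, rows) },
    }
}

fn main() {
    // &(int, int) becomes four rows; reassembling the type means
    // recursively chasing parent links (or a recursive SQL join).
    let ty = Ty::Ref(Box::new(Ty::Tuple(vec![Ty::Int, Ty::Int])));
    let mut rows = Vec::new();
    flatten(&ty, -1, &mut rows);
    println!("{:?}", rows);
}
```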

Member

sfackler commented Jan 19, 2016

SQLite virtual tables could potentially be used to provide a SQL interface to compiler metadata without needing to go through the overhead of updating physical tables: https://www.sqlite.org/vtab.html.

The question of whether the metadata is reasonably representable in SQL would still stand, of course.

phildawes commented Jan 20, 2016

@nrc Hi Nick, Thanks for your reply, I'm glad the compiler api is not out of scope - I was under the impression that with this new cut-down proposal the compiler api wasn't deemed necessary because the RLS would provide the public interface.

I would, however, still like to argue that the compiler api should be the focus and priority:

  • I'm skeptical about the big-database-of-code approach, and want the door left open for alternatives
  • I'd also like to see (and have a go at) projects to bring refactoring to rust

I think the api could be as straightforward as a focused save-analysis-style api (e.g. analysis-for-function / analysis-for-item, crate top-level-analysis (no function bodies)).

phildawes commented Jan 20, 2016

To elaborate on the compiler api:

  • racer currently uses syntex-syntax for parsing rust code snippets, and this works surprisingly well. The api changes frequently (because it tracks the compiler libsyntax api), but keeping up with it is pretty ok and allows racer to be a crates.io project that can be built with rust stable. I don't think we need an alternative stable api for parsing at this point.
  • The new analysis api could start by mapping expression spans to type and definition paths. This is along the same lines as the save-analysis output, but I'm advocating that there be a rustc api for generating small pieces of this data (e.g. at the Item (function) level as a start). In order to keep things simple, the input to this analysis api could be the same arguments you would pass to the compiler.

I think (guess!) the combination of these two could be enough to feed the RLS and enable it to be a crates.io project rather than something that must be shipped with the compiler. It also would leave the door open for alternative approaches and other tools that are currently out of scope for RLS.
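
To sketch what such an api might look like (every name here is hypothetical; the point is just the shape: compiler-style inputs in, span-to-type/def-path mappings out):

```rust
use std::ops::Range;

/// One analysed expression: its span, its type, and (if the span
/// names something resolvable) the path of the definition.
pub struct SpanInfo {
    pub span: Range<usize>,       // byte range in the source file
    pub ty: String,               // e.g. "collections::vec::Vec<u8>"
    pub def_path: Option<String>, // e.g. Some("mycrate::foo::bar")
}

/// `compiler_args` is whatever you would pass to rustc on the command
/// line; `item_offset` picks out the Item (e.g. a function) to analyse.
pub fn analysis_for_item(
    compiler_args: &[String],
    file: &str,
    item_offset: usize,
) -> Result<Vec<SpanInfo>, String> {
    // A real implementation would drive librustc through parsing,
    // resolution, and typeck for just the enclosing Item.
    let _ = (compiler_args, file, item_offset);
    Ok(Vec::new())
}
```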

phildawes commented Jan 20, 2016

Hi @bruno-medeiros.

(Re: refactoring)

Hum, yes. Or the IDE developers have to modify RLS themselves to do that. I mean, RLS will be an open-source project, so it will be easy to add patches (or even fork/mod).

The problem is when you want to add something contentious to support some niche tooling, or try something experimental that might not pay off. Forking a project is a socially aggressive action, and it is a shame to have political fallout and hurt over some small technical point or direction. (we all saw it a couple of times on contentious rust PRs)

Also, without a compiler-api the RLS will probably need to be shipped with rustc, making it hard to build an alternative tool.

You seem to not like this scenario, but what alternative could there be?

The alternative I'm advocating is that there be a compiler-analysis api to extract the analysis information. (the analysis information is not dissimilar to the JDT/CDT index-document you described. Rustc has something similar as a way to get information into DXR, although I think it might be partially broken at the moment).

Contributor

nikomatsakis commented Jan 20, 2016

So, I've been sort of half-following this thread, but I wanted to throw in some thoughts. Sorry for not having read every comment in detail, I'm sure I'm repeating things that have been said. What's worse, I'm going to ask some questions at the end that have probably been answered. Humor me. :)

In general, I think that the idea of having an oracle process that serves as a kind of database for Rust metadata makes a lot of sense. The current compiler data structures are not designed to support IDEs or random queries, and I don't feel like I would want that as a constraint on how the compiler is structured internally. In particular, the compiler as it is today was really written to be a simple batch compiler a la gcc. It expects to compile a single crate and generate some output. We're working on changing that model, via incremental compilation and other things, but it's going to be a long road I think.

In the meantime, I think it makes sense to have a second layer that aggregates the output of multiple compilations (e.g., for multiple crates) and keeps a coherent view of things. This would keep that data indexed however it makes sense to have the data be indexed and be able to answer queries like "what was the type of the expression here" or "what traits are implemented for this type" and so forth. Obviously I'm glossing over a ton of details here. This layer is more-or-less what I thought the Oracle is.

To me, it doesn't matter so much if this layer is actually a process that communicates via IPC, or some shared libraries that everybody uses. I sort of prefer processes for various reasons, but whatever. If it's some shared libraries, we'll presumably need some file locking or other mechanisms to cope with the scenario where the compiler is trying to update some shared repository of information and the IDE is trying to read it.

Over time, I could see that this oracle layer and the compiler move closer together, and maybe the compiler eventually becomes a nifty library that can be integrated more closely into IDEs. In that case the "oracle" would just be various APIs, and not really much in the way of code.

Does this overarching thought make sense? @nrc, is this roughly what you were thinking with RFC, or am I totally off base? If it doesn't make sense, what are the major alternatives?

Member

nrc commented Jan 21, 2016

@nikomatsakis spot on.

Member

nrc commented Jan 21, 2016

@phildawes my intention is for the RLS to live outside the Rust repo and use the compiler as a library, like any other tool. Clearly it is going to be unstable for a long time (which is another good reason to put it in its own process instead of linking it).

My current model for a compiler API is that the tool uses libsyntax to parse source code, and then has the AST. You can then use AST nodes (currently by node id; spans have issues with macros) to query the compiler about types and so forth. This works pretty well, however, we might do better, for example by giving a synthetic 'AST' which also includes type information, etc. (the HAIR approach). We should have a discussion about how best to do this, but not here.

I have no idea what functions should be presented by the API; I've been adding stuff on a pretty ad hoc basis to the save-analysis API (note, the API, not the dumps, which are both working fine, btw). My plan for the API is to just implement anything anyone wants (and is possible) until we have a good idea of the use cases, then make an RFC to stabilise some of it. If there are specific things you want, please let me know and/or file issues (or PRs) on the Rust repo.
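
The rough shape of that model, with NodeId, CompilerSession, and the method names all as stand-ins (the real save-analysis API may differ):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(u32);

struct CompilerSession;

impl CompilerSession {
    /// Pretty-printed type of an expression node, if typeck has run
    /// far enough to know it.
    fn type_of(&self, _node: NodeId) -> Option<String> {
        None // placeholder
    }
}

/// The tool parses with libsyntax, picks the AST node under the
/// cursor, and asks the compiler about it by node id.
fn describe(session: &CompilerSession, node: NodeId) -> String {
    match session.type_of(node) {
        Some(ty) => format!("expression has type {}", ty),
        None => "type not yet available (body not checked?)".to_string(),
    }
}
```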

phildawes commented Jan 21, 2016

@nrc, @nikomatsakis

Hi Niko, Hi Nick,

In general, I think that the idea of having an oracle process that serves as a kind of database for Rust metadata makes a lot of sense.

Personally I'm wary of the database-of-code metaphor. I think it hides the two hard problems of rust code completion, and instead encourages thinking about indexing and cache invalidation. As I see it, the two hard technical problems that need solving are:

  1. complete-as-you-type requires rustc to get involved in the hot path.
    • type inference of the expression-being-completed can't be pre-cached. (database is no help here)
    • rustc must be capable of delivering the type (of the bit before the dot) in <100ms
  2. complete-as-you-type requires processing incomplete/broken code
    • the crate isn't semantically complete or compileable (again, database is no help here)

I don't think a code index is a requirement for fast completion. This is counter-intuitive, but racer specifically avoids doing it on grounds of simplicity and is fast, even for big multi-crate projects. (some discussion here).

I also don't think a database is required for find-references, but I have no evidence for this, only intuition.

Member

nrc commented Jan 21, 2016

Hi @phildawes,

  1. I agree - I probably wouldn't use the database at all for code completion info; that is a purely local query, so it can work straight off the compiler. (On a slightly separate note, I see the compiler/RLS only returning info after the .; updating the actual code completion suggestions based on what is being typed is the code completion tool's job.)

  2. I don't think this is an issue. The compiler will process broken code (I've been working on more support for that this week). Code that has changed is invalidated from the DB, the parts of the code that we can type check are added, and the rest is ignored. E.g., if the user types x.f; but there is only a foo field, the DB will have info about x but will ignore the f (we might even record that there is a field access but that the field name is unknown, if that is useful).

Find references is definitely the motivation for using a DB.
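
A toy version of that update rule (the Db here is just a Vec; real storage and invalidation would be more involved): drop rows overlapping the edited span, re-add whatever still type checks, skip the rest.

```rust
use std::ops::Range;

struct Row {
    start: usize,
    end: usize,
    info: String, // stand-in for type/def data about one span
}

struct Db {
    rows: Vec<Row>,
}

impl Db {
    /// `fresh` holds one entry per element the compiler re-analysed:
    /// Ok(row) if it type checked, Err(()) if it could not (e.g. the
    /// `f` in `x.f;` when only a `foo` field exists).
    fn apply_edit(&mut self, changed: Range<usize>, fresh: Vec<Result<Row, ()>>) {
        // 1. Invalidate everything overlapping the edited span.
        self.rows
            .retain(|r| r.end <= changed.start || r.start >= changed.end);
        // 2. Keep only the pieces the compiler could still make sense of.
        for item in fresh {
            if let Ok(row) = item {
                self.rows.push(row);
            }
        }
    }
}
```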

Contributor

nikomatsakis commented Jan 22, 2016

Personally I'm wary of the database-of-code metaphor. I think it hides the two hard problems of rust code completion, and instead encourages thinking about indexing and cache invalidation.

This makes a lot of sense. I certainly agree that adding an oracle doesn't get us fast code completion for free -- that seems to require not just incr. comp., but also a bit more work towards lazy compilation and better error recovery (some of which is ongoing, as @nrc notes). It does seem like an oracle would help for Find All References, and I'm not sure what else.

Still, it seems to me that even in code completion, there is perhaps a role for an oracle. In particular, you talk about wanting to determine the type of the receiver of a method, but once you know that receiver, you still want to figure out things like "what are the set of traits that are implemented for that type?" (and note that those traits may not be imported, so the compiler might not consider them normally). This might be an ideal place for an oracle to come in.

Moreover, some completion scenarios don't necessarily require knowing the full type, e.g., if I write some_fn(, then you can likely resolve some_fn to a specific item (but in fact you really want to be resolving some_fn all along to potential matches, even before I've finished typing it).

Maybe there's another way to say what I'm saying. It seems clear that "ideal IDE integration" is going to be a long time coming, and we're going to be evolving how rustc works for some time. But in the short to medium time frame, I expect you are going to want to do fast, approximate queries across several crates, and it seems like we can get that more easily through some separate oracle (you will also want to be able to ask rustc for stuff and get back a fast answer, of course).

And if in the future it happens that rustc gets fast enough that we don't need a database to answer those queries, we can just recompute the data, that's fine, right? All the old code that expected potentially stale answers should be fine when their answers are 100% up to date?
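
As a sketch of the trait query mentioned above (all names invented): the oracle aggregates impls across every crate it has seen, so completion can offer methods from traits the current module never imported.

```rust
use std::collections::HashMap;

struct Oracle {
    // type name -> traits with an impl for it, aggregated across crates.
    impls: HashMap<String, Vec<String>>,
}

impl Oracle {
    fn traits_implemented_for(&self, ty: &str) -> &[String] {
        self.impls.get(ty).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut impls = HashMap::new();
    impls.insert(
        "MyType".to_string(),
        vec![
            "std::fmt::Debug".to_string(),
            "mycrate::NotImportedTrait".to_string(),
        ],
    );
    let oracle = Oracle { impls };
    // Completion after `value.` can offer NotImportedTrait's methods
    // and suggest the missing `use` alongside them.
    println!("{:?}", oracle.traits_implemented_for("MyType"));
}
```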

bruno-medeiros commented Jan 22, 2016

https://crates.io/crates/syntex_syntax/

Speaking of, who maintains this project? Are the Rust developers involved?

I think changes and improvements to the Rust parser would be a nice incremental step and starting point for improving the toolchain, with regards to how it's used by IDEs and IDE-related tools (like Racer).

I've been working on one such tool, https://github.com/RustDT/Rainicorn (formerly called rust-parse-describe, which I mentioned in a previous comment some weeks ago), also using syntex_syntax, and I can see some avenues for improvement already. For starters, a panic-less parser would be nice; the way the parser currently works makes my code a bit more complicated than would otherwise be necessary.
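
For illustration, this is the kind of wrapper a panicking parser forces on tool code (parse here is a stand-in, not the real syntex_syntax entry point; std::panic::catch_unwind is the escape hatch):

```rust
use std::panic;

/// Stand-in for a parser that bails out by panicking.
fn parse(src: &str) -> String {
    if src.contains("fn") {
        format!("ast for: {}", src)
    } else {
        panic!("unexpected token"); // how a panicking parser reports errors
    }
}

/// The wrapper every caller ends up writing: convert panics into a
/// Result, which is what a panic-less parser would return directly.
fn parse_no_panic(src: &str) -> Result<String, String> {
    panic::catch_unwind(|| parse(src))
        .map_err(|_| "parse error (recovered from panic)".to_string())
}

fn main() {
    assert!(parse_no_panic("fn main() {}").is_ok());
    assert!(parse_no_panic("###").is_err());
}
```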

Forking a project is a socially aggressive action, and it is a shame to have political fallout and hurt over some small technical point or direction.

I didn't mean forking as in a divergent, or organizational fork (duplication of efforts, vastly incompatible code baselines, no communication, development heading in different directions, etc.).

I meant fork as in the Git way: a clone/branch that you create, you make some modifications or additions, but it doesn't diverge that much from the upstream source, and you try to merge updates from the upstream source regularly. And occasionally you might submit patches to upstream as well.

Member

nrc commented Jan 22, 2016

@bruno-medeiros syntex is maintained by @erickt, who is a core member of the community (community and moderation teams, as well as just being generally involved), but not part of the compiler team. It is pretty much just a straight clone of libsyntax from the compiler, so improvements to the compiler's parser show up in syntex quite quickly.

Part of the plan with procedural macros/syntax extensions is to present a stable interface for them to work on, at which point syntex gets a lot less necessary (only useful for tools). In the long term I'd like to stabilise enough of libsyntax that tools don't need it either.

There is work going on to make the parser panic-less, it no longer panics on error. I've also been doing some work on error recovery, for example rust-lang/rust#31065 adds some error correction for missing identifiers. Ideally it should be panic-free under normal use and recover from most errors within the next few months.

@nrc

nrc commented Jan 22, 2016

Member

@nikomatsakis I did not intend the DB to be used for code completion suggestions. It is useful for find-all-references, and also for queries like finding all impls of a given trait, finding all sub-traits, etc.
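A minimal sketch of the kind of query interface such a database might serve; all the type and method names here are hypothetical illustrations, not part of the RFC's proposed API:

```rust
// Hypothetical query set for the oracle's database; names are
// illustrative only.
enum Query {
    /// All spans that reference the definition with this id.
    FindAllRefs { def_id: u32 },
    /// All impls of the trait with this id.
    FindImpls { trait_id: u32 },
    /// All traits declaring this trait as a supertrait.
    FindSubTraits { trait_id: u32 },
}

/// A source span: file plus byte offsets into it.
struct Span {
    file: String,
    start: usize,
    end: usize,
}

fn run_query(q: Query) -> Vec<Span> {
    match q {
        // Each arm would consult the cross-reference tables dumped by
        // the compiler, rather than re-running type checking on demand.
        Query::FindAllRefs { .. }
        | Query::FindImpls { .. }
        | Query::FindSubTraits { .. } => Vec::new(),
    }
}

fn main() {
    let refs = run_query(Query::FindAllRefs { def_id: 42 });
    println!("{} references found", refs.len());
}
```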

@bruno-medeiros

bruno-medeiros commented Jan 22, 2016

Still, it seems to me that even in code completion, there is perhaps a role for an oracle

Just to be clear, the Oracle, as in "the resident process that is responsible for serving requests of various sorts to the IDE", should handle code completion as well. Even if the data structures used to determine the results of code completion are entirely different from the data structures used for, say, find-references, it makes no sense for these to be two different processes or two different tools. At the very minimum, these two operations can share cached AST data, not to mention that eventually (and sooner rather than later) we will want the Oracle to support managing/supplying dirty editor buffers (that is, using a document that is being edited in memory in an IDE but has not yet been persisted to disk). Even the functionality I'm coding in the Rainicorn tool should ideally be integrated into an oracle eventually.

Of course, as an early prototype, it's okay for different operations to be handled by different tools, but the end goal should be to integrate everything in the oracle, all-knowing as it is. 😉 (Gotta hand it to the Go guys, they chose the perfect name - apart from the trademark thing... 😝)

@nrc BTW, I was just looking at the Nim language, and to my surprise found out they already have an "oracle" tool: http://nim-lang.org/docs/idetools.html
By the looks of it, it is quite a bit more advanced than the Go oracle; it actually seems to achieve all the key aspects mentioned before:

  • Handles incorrect/incomplete code well
  • Is a resident process
  • Supports managing dirty buffers
  • Supports all sorts of IDE query operations in a single tool: resolve-definition ("Definitions"), code completion ("Suggestions"), find references ("Symbol usages"), etc. (Something like parse-analysis seems to be missing, though.)
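A minimal sketch of what the "managing dirty buffers" support mentioned above could look like on the oracle side, assuming a simple in-memory overlay over the file system; all names here are hypothetical:

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical overlay: unsaved editor contents shadow the on-disk
/// files, so analyses see what the user sees, not what was last saved.
struct DirtyBuffers {
    overlays: HashMap<PathBuf, String>,
}

impl DirtyBuffers {
    fn new() -> Self {
        DirtyBuffers { overlays: HashMap::new() }
    }

    /// Called by the IDE whenever an unsaved buffer changes.
    fn update(&mut self, path: PathBuf, contents: String) {
        self.overlays.insert(path, contents);
    }

    /// Called when the IDE saves or closes the document.
    fn discard(&mut self, path: &Path) {
        self.overlays.remove(path);
    }

    /// What every analysis should call instead of reading the disk.
    fn read(&self, path: &Path) -> io::Result<String> {
        match self.overlays.get(path) {
            Some(contents) => Ok(contents.clone()),
            None => fs::read_to_string(path),
        }
    }
}

fn main() -> io::Result<()> {
    let mut buffers = DirtyBuffers::new();
    buffers.update(PathBuf::from("src/lib.rs"), "fn edited() {}".into());
    println!("{}", buffers.read(Path::new("src/lib.rs"))?);
    buffers.discard(Path::new("src/lib.rs"));
    Ok(())
}
```
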
@bruno-medeiros

bruno-medeiros commented Jan 22, 2016

Ideally it should be panic-free under normal use and recover from most errors within the next few months.

Sweet 👍

@alexcrichton

alexcrichton commented Jan 29, 2016

Member

🔔 This RFC is now entering its two-week long final comment period 🔔

@lilianmoraru

lilianmoraru commented Jan 30, 2016

Don't know if it is of any help, but I will throw it out there.
QtCreator (mainly a C++ IDE) has a plugin that uses Clang to offer C++ code completion.
Here is the code:
http://code.qt.io/cgit/qt-creator/qt-creator.git/tree/src/libs/clangbackendipc
http://code.qt.io/cgit/qt-creator/qt-creator.git/tree/src/plugins/clangcodemodel

QtCreator also has an older custom C++ code model (being replaced by the Clang one) that was a lot faster; it seems to be accepted in the community that a code model that uses the compiler will be considerably slower, but it offers more information and is more accurate.
I mention the speed because I saw @phildawes mentioning the hard requirement for the compiler to deliver the information in <100ms, and I am not sure we should expect it to be that fast.

@DemiMarie

DemiMarie commented Feb 1, 2016

@lilianmoraru The reason the compiler needs to deliver the information so quickly is that the GUI is waiting on it. The user expects the completion to appear as soon as they press '.', and that requires the compiler to respond fast.

@michaelwoerister

michaelwoerister commented Feb 4, 2016

I'm in favor of accepting this RFC. The approach seems worth exploring, and it does not preclude improving the compiler's amenability to being used as a library. On the contrary, I think the compiler's APIs will benefit from trying to build the RLS on top of them, and it can only help if people on the compiler team are actual clients of their own APIs. The RFC leaves many open questions when it comes to specifics; we just need a prototype implementation, and the experience that comes from building it, in order to decide how to proceed further. Worst case, we'll learn a bunch about what doesn't work :)

@erkinalp

erkinalp commented Feb 8, 2016

1-based line numbers and 0-based column numbers please.

@nrc

nrc commented Feb 9, 2016

Member

@erkinalp why?

@bruno-medeiros

bruno-medeiros commented Feb 9, 2016

@erkinalp why?

I was wondering the same: why use a mixed format (1-based for one, 0-based for the other)?

@erkinalp

erkinalp commented Feb 9, 2016

Emacs uses 0-based column numbers.

@ticki

ticki commented Feb 9, 2016

Contributor

@erkinalp That's a perfect reason for not doing that 😜 .

@nrc

nrc commented Feb 9, 2016

Member

AFAIK, there is no standard for editors to use 1-based line numbers. Having both 0-based seems like the least confusing thing to do; it's easy enough for editors to add one to each line number.

@dgrunwald

dgrunwald commented Feb 9, 2016

Contributor

Compilers (including rustc) tend to use 1-based line and column numbers in their error messages. Following that standard seems like the least confusing thing to do. I certainly wouldn't expect 0-based line numbers.

But the concepts of "line" and "column" are ill-defined anyway:

  • Does a vertical tab (\v) count as a new line? What about form feed? What about all the other exotic Unicode newlines?
  • Does a tab count as N columns (where N is configurable, usually 4 or 8), does it advance to the next multiple of N columns, or does it count as only 1 character?
  • Are full-width characters like 'ｘ' one or two columns wide?
  • Does the editor count each Unicode codepoint as 1 column? Or is a 'column' really a UTF-8 byte offset within the line? Maybe it's measured in UTF-16 code units? Grapheme clusters?

It's difficult to find two editors that agree in their line/column counting for all possible input files.
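A small, self-contained illustration of how much the column for the same caret position can vary under the definitions listed above (UTF-8 byte offset, codepoint count, UTF-16 code units); the line content is made up for the example:

```rust
fn main() {
    // A line containing a multibyte character: 'é' is 2 bytes in
    // UTF-8 but 1 codepoint and 1 UTF-16 code unit. Where does the
    // 'x' start?
    let line = "let é = x;";
    let byte_col = line.find('x').unwrap();
    let char_col = line.chars().take_while(|&c| c != 'x').count();
    let utf16_col: usize = line
        .chars()
        .take_while(|&c| c != 'x')
        .map(char::len_utf16)
        .sum();

    // 0-based: byte offset 9, codepoint 8, UTF-16 unit 8 -- different
    // answers for one caret position, before 1-based vs 0-based even
    // enters the picture.
    println!("byte: {}, codepoint: {}, utf16: {}", byte_col, char_col, utf16_col);
}
```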

@Valloric

Valloric commented Feb 9, 2016

I've integrated 5+ semantic engines in ycmd, and the only thing that makes sense is 1-based line and column numbers. Columns are byte offsets in UTF-8. Done.

it's easy enough for editors to add one to each line number.

But why should they? Line & column numbers coming from your oracle will be shown to the user, and they expect 1-based numbering.

there is no standard for editors to use 1-based line numbers.

And yet they ~all do use 1-based numbers in the user interface. When you put your caret on the first line in the file, the editor doesn't say the line number is 0; it says it's 1. Same for columns.

@erkinalp

erkinalp commented Feb 10, 2016

Vertical tab means skip one line below and continue from the same column offset.
CR, CR+LF, LF, LS and NEL are regular line feeds.
FF and PS count as two lines instead of one.
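A sketch of a line counter implementing the convention described in the comment above; note this is one commenter's proposal, not an agreed standard:

```rust
/// Counts how many lines `text` advances, under the convention that
/// CR, CRLF, LF, NEL (U+0085), and LS (U+2028) each add one line,
/// vertical tab (U+000B) also moves down one line, and FF (U+000C)
/// and PS (U+2029) each count as two lines. Column handling (VT
/// keeping the same column offset) is out of scope here.
fn count_lines(text: &str) -> usize {
    let mut lines = 0;
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // CR+LF is a single line feed, not two.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                lines += 1;
            }
            '\n' | '\u{0085}' | '\u{2028}' | '\u{000B}' => lines += 1,
            '\u{000C}' | '\u{2029}' => lines += 2,
            _ => {}
        }
    }
    lines
}

fn main() {
    assert_eq!(count_lines("a\r\nb\nc"), 2);
    assert_eq!(count_lines("a\u{000C}b"), 2); // form feed: two lines
    println!("ok");
}
```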

@bruno-medeiros

bruno-medeiros commented Feb 10, 2016

I've integrated 5+ semantic engines in ycmd, and the only thing that makes sense is 1-based line and column numbers. Columns are byte offsets in UTF-8. Done.

Why byte offsets and not Unicode character offsets? It's not as if an error or position for a Rust symbol will ever start in the middle of a Unicode character.

there is no standard for editors to use 1-based line numbers.

And yet they ~all do use 1-based numbers in the user interface. When you put your caret on the first line in the file, the editor doesn't say the line number is 0; it says it's 1. Same for columns.

Because the internal API for lines and columns can be 0-based, despite the UI being 1-based. This is certainly the case for Eclipse and for IntelliJ, and probably for most IDEs/editors out there. It would not surprise me if Vim were the odd one out... 😆
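One way to keep that internal/UI split from causing off-by-one bugs is to make the convention part of the type, roughly as below; this is a sketch only, and none of these names come from the RFC:

```rust
/// 0-based position, as stored and exchanged internally.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Position {
    line: u32, // 0-based
    col: u32,  // 0-based
}

/// 1-based position, produced only at the UI boundary.
#[derive(Clone, Copy, Debug)]
struct DisplayPosition {
    line: u32, // 1-based
    col: u32,  // 1-based
}

impl Position {
    /// The only place the +1 ever happens.
    fn to_display(self) -> DisplayPosition {
        DisplayPosition { line: self.line + 1, col: self.col + 1 }
    }
}

fn main() {
    let internal = Position { line: 0, col: 0 }; // start of file
    let shown = internal.to_display();
    println!("line {}, column {}", shown.line, shown.col); // line 1, column 1
}
```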

@adelarsq

adelarsq commented Feb 10, 2016

Keep it simple. Just use 0-based for lines and columns.

@alexcrichton

alexcrichton commented Feb 11, 2016

Member

This RFC was discussed during the tools team triage today, and the decision was to merge. This RFC is still at a somewhat high level, and some minor details can continue to be ironed out in the implementation over time, but there seems to be widespread agreement about the body of the RFC.

Thanks again for the discussion, everybody!

@alexcrichton alexcrichton merged commit 37d72be into rust-lang:master Feb 11, 2016

@chriskrycho chriskrycho referenced this pull request Mar 29, 2017

Closed

Document all features #9
