
Why not use HTTP directly? #86

Closed
ianks opened this issue Oct 6, 2016 · 38 comments
Labels
clarification, help wanted

Comments

@ianks

ianks commented Oct 6, 2016

The base protocol consists of a header and a content part (comparable to HTTP). The header and content part are separated by a '\r\n'.

Why not go the full route and support HTTP directly?

Benefits

  1. Clients will not have to implement their own language-server-protocol parser. HTTP has plenty of robust, performant parsers already; we could leverage these.
  2. Ability to bind to a port and communicate over TCP, so hosts could support multiple clients.
  3. Ease of encryption via HTTPS
  4. Status codes could provide better insight into success/error cases.

Downfalls

  1. HTTP/2 is a binary protocol, which may reduce debuggability. Possible solution: stick to HTTP/1.1.
  2. More bytes: at the very least, we need to send a request line, and potentially a few more headers. I would tend to think this is trivial, but it's worth mentioning.
  3. Would need to figure out how sessions are handled.

More info

http://json-rpc.org/wiki/specification#a2.2JSON-RPCoverHTTP
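
For reference, here is a minimal TypeScript sketch of the base-protocol framing the issue quotes - the part that an HTTP request line and extra headers would be layered on top of. The socket address and the initialize payload are purely illustrative.

```typescript
import { Socket } from "net";

// Frame a JSON-RPC message using the LSP base protocol: a Content-Length
// header, a blank line ("\r\n\r\n"), then the UTF-8 encoded JSON content.
function frameMessage(message: object): Buffer {
  const content = Buffer.from(JSON.stringify(message), "utf8");
  const header = `Content-Length: ${content.length}\r\n\r\n`;
  return Buffer.concat([Buffer.from(header, "ascii"), content]);
}

// Illustrative use: send an initialize request over any byte stream
// (stdout, a socket, a named pipe, ...). Host and port are placeholders.
const socket = new Socket();
socket.connect(8123, "localhost", () => {
  socket.write(frameMessage({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }));
});
```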

@gorkem
Contributor

gorkem commented Oct 7, 2016

A server implementation can still use HTTP; the protocol does not prevent HTTP from being used as the transport. However, explicitly limiting the transport to HTTP would prevent servers from using transports such as sockets or named pipes, which are the most commonly used by implementations today.

@ianks
Author

ianks commented Oct 7, 2016

HTTP could still be used over named pipes and sockets, FWIW

@mickaelistria

+1 for this. When I had the opportunity to ask the VSCode team about it, they said that the main reason for not using HTTP is performance (which is a major concern for the language server). Opening up to HTTP(S) would make it possible to provide language servers as SaaS that clients could connect to. Although not very relevant for some operations such as completion, I see a huge opportunity for remote "build servers" to run builds or other complex analysis and report back diagnostics and remediation after some delay.
Since HTTP allows authentication, it also enables the possibility for a server to become more user-specific and portable: the server can learn a few things for a given user, store them, and if the user connects with another editor on another machine to the same server with the same credentials, the server knows who this user is and can use that to customize its results.

However, I agree with Gorkem that the protocol shouldn't be restricted to one transport and should still allow stdio, sockets and named pipes, as those are the ones that work best for immediate local integration. However, it should also explicitly open the door to HTTP to allow SaaS and user authentication.

@dbaeumer
Member

dbaeumer commented Oct 7, 2016

The language server protocol is basically independent of the transport. So from the protocol point of view there is nothing that limits it to a specific transport.

In reality the json-rpc library we use already has two different kinds of transports (https://github.com/Microsoft/vscode-languageserver-node/blob/master/jsonrpc/src/messageReader.ts#L248). The first one is over streams using the described base protocol. The second is node IPC, which doesn't have any header information. It simply sends and receives JSON objects. Expanding this to HTTP should be straightforward, including the client library.

What needs to be done is to update the documentation. Now that I think of it, we should move the base protocol to https://github.com/Microsoft/vscode-languageserver-node and simply say here that different transports are possible.

@dbaeumer added the help wanted label Oct 7, 2016
@bruno-medeiros

I agree with @dbaeumer here: LSP is independent of the transport - and it wouldn't be good to try to specify the transport at the LSP level, since they are two different concerns, two different layers of abstraction.

If anything, the header part of the LSP ("Content-Length") seems like it could be simplified and removed. Just sending the JSON-RPC JSON objects through the stream should be enough, and from what you say, you're doing this already in an internal case for VSCode.

@ianks
Author

ianks commented Oct 7, 2016

I would also tend to think removing the 'Content-Length' requirement is actually a better solution if that is possible.

In this case, we would be specifying just JSON-RPC with a configurable transport layer, so the whole:

The base protocol consists of a header and a content part (comparable to HTTP). The header and content part are separated by a '\r\n'.

Can be removed?

@felixfbecker

I don't think this makes much sense. HTTP always executes an HTTP method on a resource. What would these be?
HTTP is an application-level protocol over TCP on the transport layer. Just like LSP is an application-level protocol over TCP on the transport layer.

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros If you remove that header, how would you know how much of the stream to consume for the next message? JSON parsing the stream on the fly would be complicated, fault-intolerant, and would prevent concurrent parsing/processing of messages.

The protocol doesn't need all the metadata overhead of HTTP, but it is useful to have a content length header, as is the case with many protocols.
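
To make the framing point concrete, here is a minimal TypeScript sketch of how a reader can use the Content-Length header to cut complete messages out of a byte stream without parsing JSON incrementally. Error handling and the optional Content-Type header are omitted.

```typescript
// Buffer incoming bytes, split off the header block at the first "\r\n\r\n",
// then consume exactly Content-Length bytes as one JSON-RPC message.
let pending = Buffer.alloc(0);

function onData(chunk: Buffer, onMessage: (msg: unknown) => void): void {
  pending = Buffer.concat([pending, chunk]);
  for (;;) {
    const headerEnd = pending.indexOf("\r\n\r\n");
    if (headerEnd < 0) return; // headers not complete yet
    const headers = pending.slice(0, headerEnd).toString("ascii");
    const match = /Content-Length: *(\d+)/i.exec(headers);
    if (!match) throw new Error("missing Content-Length header");
    const length = parseInt(match[1], 10);
    const bodyStart = headerEnd + 4;
    if (pending.length < bodyStart + length) return; // body not complete yet
    const body = pending.slice(bodyStart, bodyStart + length).toString("utf8");
    pending = pending.slice(bodyStart + length);
    onMessage(JSON.parse(body)); // parsing could also be handed to a worker
  }
}
```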

@bruno-medeiros

@bruno-medeiros If you remove that header, how would you know how much of the stream to consume for the next message? JSON parsing the stream on the fly would be complicated, fault-intolerant, and would prevent concurrent parsing/processing of messages.

@masaeedu huh? Any JSON parsing library worth its salt should be able to parse a JSON object from a stream of characters/bytes (and do it again on the same stream). It should not be complicated, but rather trivial - at least for languages that have a built-in concept of a stream of bytes/characters.

As for fault tolerance, so far the protocol assumes the connection is made through a stream of bytes from a connection (either TCP, or stdin/stdout) that guarantees sequential delivery of bytes - until the connection is closed or dropped abnormally. But if that happens the connection is dead anyway, and you can't recover other than by reconnecting - and that makes no difference whatsoever whether you're using a Content-Length header or not.

Also, as for "preventing concurrent parsing/processing of messages" - only the JSON parsing can no longer be concurrent with this change, true that. But the actual method execution can still be concurrent.

In any case, I'm not advocating either way that the Content-Length header be dropped, I just pointed out it could be done.

@felixfbecker

felixfbecker commented Oct 26, 2016

huh? Any JSON parsing library worth its salt should be able to parse a JSON object from a stream of characters/bytes (and do it again on the same stream). It should not be complicated, but rather trivial - at least for languages that have a built-in concept of a stream of bytes/characters.

That is so totally not the case. In what language is it trivial to parse a stream of JSON? Not even in JavaScript. JSON.parse() takes a string, not a stream.

A Content-Length header makes a lot of sense. You cannot use \r\n as a terminator because JSON can include whitespace, and parsing JSON on-the-fly is unneeded complexity. It also allows you to see the progress of the response (think workspace/symbol, for a huge project the response could be millions of symbols!).

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros Any JSON library worth its salt can parse a single JSON object from a stream of bytes, provided an encoding. You can't parse a stream of undelimited JSON objects, especially not when you are expected to ignore malformed data and can have different encodings for each object. It is not impossible to do, but many JSON libraries won't have it out of the box.

But if that happens the connection is dead anyways, you can't recover other than reconnecting - and that makes no difference whatsover if you're using Content Length header or not.

Fault tolerance isn't just about dealing with dropped connections; you need to be able to deal with invalid data without crashing the language server or flushing all the other messages in your input stream. If I send [ 'hello ] as a message, I don't want the next 20 messages to be interpreted as a string.

Also, as for "preventing concurrent parsing/processing of messages" - only the JSON parsing can no longer be concurrent with this change, true that. But the actual method execution can still be concurrent.

Glad we agree re: third point. With the volume of data you can expect in a language server, I think parallelizing deserialization is going to be important for performance.

@bruno-medeiros

bruno-medeiros commented Oct 26, 2016

That is so totally not the case. In what language is it trivial to parse a stream of JSON? Not even in JavaScript. JSON.parse() takes a string, not a stream.

Java, with the GSON library. I'm sure there is something like GSON for Go as well. With Rust you have https://github.com/netvl/xml-rs (streaming API), and I'd bet there is something similar for D. I don't know much about dynamic languages like JavaScript because I don't like dynamic typing, but a quick Google search pointed out this library for JavaScript: http://oboejs.com/ , so there you go.

@felixfbecker

JSON is JavaScript Object Notation and has seen most of its adoption because it is built into JavaScript with JSON.stringify() (which is a fast C++ API). Requiring a third-party streaming library for both clients and servers just to avoid a single header is unneeded overhead. The whole LSP comes from the TypeScript language service and VS Code (written in TypeScript), so this kind of stuff does matter.

@felixfbecker

But it also isn't very useful to use streaming here. You cannot handle the response anyway until you have the whole object.

@bruno-medeiros

@bruno-medeiros Any JSON library worth its salt can parse a single JSON object from a stream of bytes, provided an encoding. You can't parse a stream of undelimited JSON objects, especially not when you are expected to ignore malformed data and can have different encodings for each object. It is not impossible to do, but many JSON libraries won't have it out of the box.

You can't have different encodings for different objects ATM, only UTF-8. See https://github.com/Microsoft/language-server-protocol/blob/master/protocol.md#content-part
"It defaults to 'utf8', which is the only encoding supported right now."
I do agree that if in the future we would want to support multiple encodings, then yeah, we'd need a header. Sending just pure JSON objects on the stream would force choosing only one encoding.

Fault tolerance isn't just about dealing with dropped connections; you need to be able to deal with invalid data without crashing the language server or flushing all the other messages in your input stream. If I send [ 'hello ] as a message, I don't want the next 20 messages to be interpreted as a string.

If your client (or server) can't even send valid JSON, you're in big trouble already. Also, if the endpoint is so buggy it can't even guarantee to send valid JSON, who is to say it will guarantee the header fields will be valid as well? The Content-Length could be invalid as well...

@bruno-medeiros

Requiring a third-party streaming library for both clients and servers just to avoid a single header is unneeded overhead

wuuut...? 😲 @felixfbecker Do you think most LSP clients and servers are going to be using JavaScript at all? Not to be offensive, but that is laughable. Nearly every server is being written in the language it serves (Java LS in Java, Rust LS in Rust, Go LS in Go, etc., etc. - no JavaScript).

As for clients, it's only VS Code that is using JavaScript, as far as I know. Eclipse Che and Eclipse desktop are using a Java JSON library (GSON, actually). IntelliJ is likely to do the same, if it isn't already. Other clients like Vim, Emacs, XCode, Sublime, Visual Studio (not Code) don't use JavaScript AFAIK, but rather C/C++, Objective-C, Swift, LISP/Scheme, Python and whatnot, and are likely to use existing, well-established JSON parsing libraries.

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros

If your client (or server) can't even send valid JSON, you're in big trouble already.

It doesn't matter whether this is the client's fault, the ISP's fault, or the network card's fault. Fault tolerance is about increasing the number of scenarios where you can recover from failure. If you have a header, you can deal with invalid message bodies, which improves fault tolerance.

The Content-Length could be invalid as well...

If the content length header is invalid, you can tell immediately, because you will not be able to parse out the subsequent header in the middle of a message body. If the body is invalid (and there is no header), there is no way to tell, and the parser is going to continue muddling on until hopefully you get a parse error. Until you do encounter a parse error, the server is going to appear to be stuck. The worst possible outcome would be that you don't receive any error, and the server simply does something incorrect and carries on. This is especially likely to be a problem over connectionless transports.

Either way, we're down to only 1 of the 3 reasons I provided that you still object to, so I think there is a strong case for keeping the header.

@felixfbecker

This whole argument is just pointless. There are so many things that speak against removing the Content-Length header, and nothing for it. Where did I say that I think most clients will use JavaScript? All I tried to say was that it is weird to claim every good JSON implementation is able to do this when the mother of all JSON implementations does not do this and I never heard anyone complain. My own LS is implemented in PHP, which provides native encoding/decoding of JSON, but just like JS, for strings, not streams. There is a good reason many protocols have a notion of easy-to-parse headers that include metadata like the body length.

@bruno-medeiros

Either way, we're down to only 1 of the 3 reasons I provided that you still object to, so I think there is a strong case for keeping the header.

I don't object to anything. In an earlier comment I already said: "In any case, I'm not advocating either way that the Content-Length header be dropped, I just pointed out it could be done."
So, it was just a technical hypothetical.

it is weird to claim every good JSON implementation is able to do this when the mother of all JSON implementations does not do this and I never heard anyone complain.

Well, I do agree by now the argument is pointless, because we are getting into the territory of subjectivity and personal preference. Let's just say I don't like dynamic languages, and specifically don't like JSON.stringify and JSON.parse - and leave it at that.

@ianks
Author

ianks commented Oct 27, 2016

IME, the Content-Length header is a responsibility of the transport layer, and not the protocol itself. There will likely be something similar at the transport layer; but it does not need to be directly part of the protocol itself.

If it turns out to be the most efficient way of transporting data, then by all means give it a name (LSTP) and use it; I just vote for not enforcing it at the language server protocol level.

@felixfbecker

No, transport layer is for example TCP. Content-Length header is part of the application layer, just like in HTTP. https://en.wikipedia.org/wiki/OSI_model

@ianks
Author

ianks commented Oct 27, 2016

Sorry about the miswording. Essentially, I think the application layer should not be enforced by Microsoft/language-server-protocol. If one wanted to use HTTP/1.1, for example, they should be able to. In the current documentation, they cannot because the header is strictly specified. I think the focus should be on the RPC protocol; leave the application layer up to the implementer.

@dbaeumer
Member

I would propose the following

  • we have a transport layer. This is for sure not part of the protocol specification.
  • we have a messaging layer. From an implementation aspect this can be node IPC if both client and server are node, the JSON-RPC framing currently defined in the protocol here, or HTTP.
  • we have the actual protocol specification.

Besides specifying the protocol, we should agree on a set of command line switches we can pass to the server to start it with different transport and messaging layers. Which kinds a server supports is up to the server, but they must be documented. However, I would opt for a minimal set of transports and messaging layers.

Transports I see: stdio, socket/pipes, ...
Messaging: LSP-RPC as defined today (JSON-RPC with header), nodeIPC (implies transport), HTTP, ....

The minimal combination a server has to support is [stdio, LSP-RPC]

@aeschli FYI.
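
A hypothetical sketch of what such command line switches could look like for a Node-based server; the flag names (--stdio, --socket=<port>, --pipe=<name>) mirror the ones later documented in the specification, but the dispatch logic here is only an illustration.

```typescript
import * as net from "net";

// Hand the chosen byte streams to the JSON-RPC messaging layer (omitted here).
function serve(input: NodeJS.ReadableStream, output: NodeJS.WritableStream): void {
  // wire up the message reader/writer on top of these streams
}

// Pick a transport based on command line switches.
function main(argv: string[]): void {
  const socketArg = argv.find((a) => a.startsWith("--socket="));
  const pipeArg = argv.find((a) => a.startsWith("--pipe="));
  if (argv.includes("--stdio")) {
    serve(process.stdin, process.stdout);
  } else if (socketArg) {
    const port = Number(socketArg.split("=")[1]);
    net.createServer((conn) => serve(conn, conn)).listen(port);
  } else if (pipeArg) {
    net.createServer((conn) => serve(conn, conn)).listen(pipeArg.split("=")[1]);
  } else {
    throw new Error("no transport specified");
  }
}

main(process.argv.slice(2));
```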

@mbana

mbana commented Oct 31, 2016

To summarise, please do correct me if I am wrong: The protocol is not bound to the transport. One could even implement the protocol using UDP.

Did you intentionally lay out the points in the above comment in that order, @dbaeumer?

I feel this is becoming a bit confusing at the moment. If people are not sure, they should ask questions, e.g., #8 WebSocket is not TCP.

@felixfbecker

One important aspect: On Windows, STDIO is always blocking. For this reason the PHP LS will connect to a TCP server when passed the argument --tcp=host:port, which the extension uses on Windows. A standardized way to pass this would be great.
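
For illustration, this is roughly what that reverse-connection setup looks like in a Node-based server: the client listens on a TCP port and the server dials out to it. The --tcp=host:port flag mirrors the PHP LS behaviour described above and is not a standardized option.

```typescript
import * as net from "net";

// Parse a --tcp=host:port argument (placeholder flag, mirroring the PHP LS)
// and connect out to the client, using the socket for all LSP traffic.
const tcpArg = process.argv.find((a) => a.startsWith("--tcp="));
if (tcpArg) {
  const [host, port] = tcpArg.slice("--tcp=".length).split(":");
  const socket = net.connect(Number(port), host, () => {
    // hand `socket` to the JSON-RPC reader/writer as both input and output
  });
  socket.on("error", (err) => {
    console.error(`failed to reach client at ${host}:${port}:`, err.message);
    process.exit(1);
  });
}
```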

@gorkem
Contributor

gorkem commented Oct 31, 2016

@dbaeumer JDT LS uses environment variables to select the transport; currently it is a choice between sockets and named pipes.

@bruno-medeiros

One important aspect: On Windows, STDIO is always blocking.

Hum, what do you mean by this, exactly?

@felixfbecker

@bruno-medeiros In PHP on Windows, a stream_select with STDIN will always block until there is data available even when the timeout is 0. That sucks of course for single-threaded runtimes like PHP (I think NodeJS works around this with threading under the hood, but PHP doesn't). The best option is to use a socket on Windows, because socket streams can be non-blocking.

See discussion
felixfbecker/php-language-server#31 (comment)
felixfbecker/php-language-server#31 (comment)

@bruno-medeiros

(Ah, single-threading, now I get it. Yeah, it would be a problem in that case.)

@matklad
Contributor

matklad commented Nov 23, 2016

If anything, the header part of the LSP ("Content-Length") seems like it could be simplified and removed.

Yes! The simplest possible transport is to send JSON in JSON object per line format (that is, to require no internal \n whitespace in JSON and to require that \n inside string literals are escaped). I would argue that this should be the default transport.

Pro:

  • The simplest possible solution.

  • It's easy to base framing on a single-character separator. In contrast, framing based on the content length can be tricky (especially if done in a non-blocking fashion) because it effectively mixes a text format (headers run until the first blank line) and a binary format (the length is in bytes).

  • You'd be able to test LSP by just copy-pasting JSON on the command line.

  • This format is already used for similar purposes in the Dart analysis server.

Contra:

  • Can't preallocate a properly sized buffer when reading a message.

  • Different from the status quo.

  • Newlines must always be escaped inside JSON.
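
A minimal TypeScript sketch of the reader side of the newline-delimited framing proposed above. It assumes every message is serialized without raw newlines, which JSON.stringify guarantees by default (newlines inside strings are escaped as \n).

```typescript
import * as readline from "readline";

// Newline-delimited JSON framing: one JSON-RPC message per line.
// Writing is just `output.write(JSON.stringify(msg) + "\n")`.
const rl = readline.createInterface({ input: process.stdin });
rl.on("line", (line) => {
  if (line.trim().length === 0) return; // tolerate blank lines
  const message = JSON.parse(line);
  // dispatch the JSON-RPC message here
  console.error("received method:", message.method);
});
```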

@ssfrr

ssfrr commented Sep 28, 2017

If the stream is UTF-8 encoded, you could also use an out-of-band byte like 0xff (which will never appear in the UTF-8 stream) as your delimiter, so you don't need to worry about escaping anything in your payload.
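
A quick sketch of that idea: the byte 0xFF never occurs in valid UTF-8, so it can delimit UTF-8 encoded JSON messages without any escaping. This framing is hypothetical and not part of the protocol.

```typescript
const DELIMITER = 0xff; // never a valid byte in UTF-8
let buffered = Buffer.alloc(0);

// Accumulate bytes and emit one parsed message per 0xFF-delimited chunk.
function onBytes(chunk: Buffer, onMessage: (msg: unknown) => void): void {
  buffered = Buffer.concat([buffered, chunk]);
  let index: number;
  while ((index = buffered.indexOf(DELIMITER)) >= 0) {
    const body = buffered.slice(0, index).toString("utf8");
    buffered = buffered.slice(index + 1);
    if (body.length > 0) onMessage(JSON.parse(body));
  }
}
```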

@dbaeumer
Member

The open item is to specify which minimal command line flags should be supported. PRs are still welcome.

@dbaeumer added this to the 4.0 milestone Nov 24, 2017
@rdhafidh

If a raw TCP socket is preferable, then why not use something like Google Protobuf or FlatBuffers (lazy source interop language codegen) and then perform simple message-length delimiting with its raw buffer?

@matklad
Contributor

matklad commented May 27, 2018

The simplest possible transport is to send JSON in JSON object per line format

It actually has a specification: http://ndjson.org/ :)

@felixfbecker

Btw, was it ever considered to use WebSockets instead? Those are probably available in every language too, and would even allow language servers on the web.

@kdvolder

I don't think LSP really says anything about what you should/could use to establish a connection that can send bytes back and forth. You can use stdin/stdout, TCP sockets, WebSockets, or whatever else you want. The protocol only talks about what gets sent over the wire once you have already established a bi-directional channel between client and server.

@dbaeumer
Member

We should still specify the minimal command line flags

@dbaeumer modified the milestones: 4.0, Backlog Oct 30, 2019
@dbaeumer
Member

I documented some standard command line arguments for 3.15 and will close the issue. The specification itself still allows running LSP over HTTP.

@vscodebot locked and limited conversation to collaborators Dec 25, 2020
@dbaeumer removed this from the Backlog milestone Nov 2, 2021