
Why not use HTTP directly? #86

Closed
ianks opened this issue Oct 6, 2016 · 38 comments
Labels
clarification, help wanted

Comments

@ianks

ianks commented Oct 6, 2016

The base protocol consists of a header and a content part (comparable to HTTP). The header and content part are separated by a '\r\n'.

Why not go the full route and support HTTP directly?

Benefits

  1. Clients will not have to implement their own language-server-protocol parser. HTTP has plenty of robust, performant parsers already; we could leverage these.
  2. Ability to bind to a port and communicate over TCP, so hosts could support multiple clients.
  3. Ease of encryption via HTTPS
  4. Status codes could provide better insight into success/error cases.

Downfalls

  1. HTTP/2 is a binary protocol, which may reduce debuggability. Possible solution: stick to HTTP/1.1.
  2. More bytes: at the very least, we need to send a request line, and potentially a few more headers. I would tend to think this is trivial, but it's worth mentioning.
  3. Would need to figure out how sessions are handled.

More info

http://json-rpc.org/wiki/specification#a2.2JSON-RPCoverHTTP
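
For reference, here is a minimal TypeScript sketch of the base-protocol framing the issue quotes - the part that an HTTP request line and extra headers would be layered on top of. The socket address and the initialize payload are purely illustrative.

```typescript
import { Socket } from "net";

// Frame a JSON-RPC message using the LSP base protocol: a Content-Length
// header, a blank line ("\r\n\r\n"), then the UTF-8 encoded JSON content.
function frameMessage(message: object): Buffer {
  const content = Buffer.from(JSON.stringify(message), "utf8");
  const header = `Content-Length: ${content.length}\r\n\r\n`;
  return Buffer.concat([Buffer.from(header, "ascii"), content]);
}

// Illustrative use: send an initialize request over any byte stream
// (stdout, a socket, a named pipe, ...). Host and port are placeholders.
const socket = new Socket();
socket.connect(8123, "localhost", () => {
  socket.write(frameMessage({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }));
});
```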

@gorkem
Contributor

gorkem commented Oct 7, 2016

A server implementation can still use HTTP; the protocol does not prevent HTTP from being used as the transport. However, explicitly limiting the transport to HTTP would prevent servers from using transports such as sockets or named pipes, which are the most commonly used by implementations today.

@ianks
Author

ianks commented Oct 7, 2016

HTTP could still be used over named pipes and sockets, FWIW

@mickaelistria

+1 for this. When I had the opportunity to ask the VSCode team about it, they said that the main reason for not using HTTP is performance (which is a major concern for the language server). Opening up to HTTP(S) would make it possible to provide language servers as SaaS that clients could connect to. Although not very relevant for some operations such as completion, I see a huge opportunity for remote "build servers" to run builds or other complex analysis and report back diagnostics and remediation after some delay.
Since HTTP allows authentication, it also enables the possibility for a server to become more user-specific and portable: the server can learn a few things for a given user, store them, and if the user connects with another editor on another machine to the same server with the same credentials, the server knows who this user is and can use that to customize its results.

However, I agree with Gorkem that the protocol shouldn't be restricted to one transport and should still allow stdio, sockets and named pipes, as those are the ones that work best for immediate local integration. However, it should also explicitly open the door to HTTP to allow SaaS and user authentication.

@dbaeumer
Member

dbaeumer commented Oct 7, 2016

The language server protocol is basically independent of the transport. So from the protocol point of view there is nothing that limits it to a specific transport.

In reality the json-rpc library we use already has two different kinds of transports (https://github.com/Microsoft/vscode-languageserver-node/blob/master/jsonrpc/src/messageReader.ts#L248). The first one is over streams using the described base protocol. The second is node IPC, which doesn't have any header information. It simply sends and receives JSON objects. Expanding this to HTTP should be straightforward, including the client library.

What needs to be done is to update the documentation. Now that I think of it, we should move the base protocol to https://github.com/Microsoft/vscode-languageserver-node and simply say here that different transports are possible.

@dbaeumer added the help wanted label Oct 7, 2016
@bruno-medeiros

I agree with @dbaeumer here: LSP is independent of the transport - and it wouldn't be good to try to specify the transport at the LSP level, since they are two different concerns, two different layers of abstraction.

If anything, the header part of the LSP ("Content-Length") seems like it could be simplified and removed. Just sending the JSON-RPC JSON objects through the stream should be enough, and from what you say, you're doing this already in an internal case for VSCode.

@ianks
Author

ianks commented Oct 7, 2016

I would also tend to think removing the 'Content-Length' requirement is actually a better solution if that is possible.

In this case, we would be specifying just JSON-RPC with a configurable transport layer, so the whole:

The base protocol consists of a header and a content part (comparable to HTTP). The header and content part are separated by a '\r\n'.

Can be removed?

@felixfbecker

I don't think this makes much sense. HTTP always executes an HTTP method on a resource. What would these be?
HTTP is an application-level protocol over TCP on the transport layer. Just like LSP is an application-level protocol over TCP on the transport layer.

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros If you remove that header, how would you know how much of the stream to consume for the next message? JSON parsing the stream on the fly would be complicated, fault-intolerant, and would prevent concurrent parsing/processing of messages.

The protocol doesn't need all the metadata overhead of HTTP, but it is useful to have a content length header, as is the case with many protocols.
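
To make the framing point concrete, here is a minimal TypeScript sketch of how a reader can use the Content-Length header to cut complete messages out of a byte stream without parsing JSON incrementally. Error handling and the optional Content-Type header are omitted.

```typescript
// Buffer incoming bytes, split off the header block at the first "\r\n\r\n",
// then consume exactly Content-Length bytes as one JSON-RPC message.
let pending = Buffer.alloc(0);

function onData(chunk: Buffer, onMessage: (msg: unknown) => void): void {
  pending = Buffer.concat([pending, chunk]);
  for (;;) {
    const headerEnd = pending.indexOf("\r\n\r\n");
    if (headerEnd < 0) return; // headers not complete yet
    const headers = pending.slice(0, headerEnd).toString("ascii");
    const match = /Content-Length: *(\d+)/i.exec(headers);
    if (!match) throw new Error("missing Content-Length header");
    const length = parseInt(match[1], 10);
    const bodyStart = headerEnd + 4;
    if (pending.length < bodyStart + length) return; // body not complete yet
    const body = pending.slice(bodyStart, bodyStart + length).toString("utf8");
    pending = pending.slice(bodyStart + length);
    onMessage(JSON.parse(body)); // parsing could also be handed to a worker
  }
}
```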

@bruno-medeiros

@bruno-medeiros If you remove that header, how would you know how much of the stream to consume for the next message? JSON parsing the stream on the fly would be complicated, fault-intolerant, and would prevent concurrent parsing/processing of messages.

@masaeedu huh? Any JSON parsing library worth its salt should be able to parse a JSON object from a stream of characters/bytes (and do it again on the same stream). It should not be complicated, but rather trivial - at least for languages that have a built-in concept of a stream of bytes/characters.

As for fault tolerance, so far the protocol assumes the connection is made through a stream of bytes from a connection (either TCP, or stdin/stdout) that guarantees sequential delivery of bytes - until the connection is closed or dropped abnormally. But if that happens the connection is dead anyway, and you can't recover other than by reconnecting - and that makes no difference whatsoever whether you're using a Content-Length header or not.

Also, as for "preventing concurrent parsing/processing of messages" - only the JSON parsing can no longer be concurrent with this change, true that. But the actual method execution can still be concurrent.

In any case, I'm not advocating either way that the Content-Length header be dropped, I just pointed out it could be done.

@felixfbecker

felixfbecker commented Oct 26, 2016

huh? Any JSON parsing library worth its salt should be able to parse a JSON object from a stream of characters/bytes (and do it again on the same stream). It should not be complicated, but rather trivial - at least for languages that have a built-in concept of a stream of bytes/characters.

That is so totally not the case. In what language is it trivial to parse a stream of JSON? Not even in JavaScript. JSON.parse() takes a string, not a stream.

A Content-Length header makes a lot of sense. You cannot use \r\n as a terminator because JSON can include whitespace, and parsing JSON on-the-fly is unneeded complexity. It also allows you to see the progress of the response (think workspace/symbol, for a huge project the response could be millions of symbols!).

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros Any JSON library worth its salt can parse a single JSON object from a stream of bytes, provided an encoding. You can't parse a stream of undelimited JSON objects, especially not when you are expected to ignore malformed data and can have different encodings for each object. It is not impossible to do, but many JSON libraries won't have it out of the box.

But if that happens the connection is dead anyways, you can't recover other than reconnecting - and that makes no difference whatsover if you're using Content Length header or not.

Fault tolerance isn't just about dealing with dropped connections; you need to be able to deal with invalid data without crashing the language server or flushing all the other messages in your input stream. If I send [ 'hello ] as a message, I don't want the next 20 messages to be interpreted as a string.

Also, as for "preventing concurrent parsing/processing of messages" - only the JSON parsing can no longer be concurrent with this change, true that. But the actual method execution can still be concurrent.

Glad we agree re: third point. With the volume of data you can expect in a language server, I think parallelizing deserialization is going to be important for performance.

@bruno-medeiros

bruno-medeiros commented Oct 26, 2016

That is so totally not the case. In what language is it trivial to parse a stream of JSON? Not even in JavaScript. JSON.parse() takes a string, not a stream.

Java, with the GSON library. I'm sure there is something like GSON for Go as well. With Rust you have https://github.com/netvl/xml-rs (streaming API), and I'd bet there is something similar for D. I don't know much about dynamic languages like JavaScript because I don't like dynamic typing, but a quick Google search pointed out this library for JavaScript: http://oboejs.com/ , so there you go.

@felixfbecker

JSON is JavaScript Object Notation and has seen most of its adoption because it is built into JavaScript with JSON.stringify() (which is a fast C++ API). Requiring a third-party streaming library for both clients and servers just to avoid a single header is unneeded overhead. The whole LSP comes from the TypeScript language service and VS Code (written in TypeScript), so this kind of stuff does matter.

@felixfbecker

But it also isn't very useful to use streaming here. You cannot handle the response anyway until you have the whole object.

@bruno-medeiros

@bruno-medeiros Any JSON library worth its salt can parse a single JSON object from a stream of bytes, provided an encoding. You can't parse a stream of undelimited JSON objects, especially not when you are expected to ignore malformed data and can have different encodings for each object. It is not impossible to do, but many JSON libraries won't have it out of the box.

You can't have different encodings for different objects ATM, only UTF-8. See https://github.com/Microsoft/language-server-protocol/blob/master/protocol.md#content-part
"It defaults to 'utf8', which is the only encoding supported right now."
I do agree that if in the future we would want to support multiple encodings, then yeah, we'd need a header. Sending just pure JSON objects on the stream would force choosing only one encoding.

Fault tolerance isn't just about dealing with dropped connections; you need to be able to deal with invalid data without crashing the language server or flushing all the other messages in your input stream. If I send [ 'hello ] as a message, I don't want the next 20 messages to be interpreted as a string.

If your client (or server) can't even send valid JSON, you're in big trouble already. Also, if the endpoint is so buggy it can't even guarantee to send valid JSON, who is to say it will guarantee the header fields will be valid as well? The Content-Length could be invalid as well...

@bruno-medeiros

Requiring a third-party streaming library for both clients and servers just to avoid a single header is unneeded overhead

wuuut...? 😲 @felixfbecker Do you think most LSP clients and servers are going to be using JavaScript at all? Not to be offensive, but that is laughable. Nearly every server is being written in the language it serves (Java LS in Java, Rust LS in Rust, Go LS in Go, etc., etc. - no JavaScript).

As for clients, it's only VS Code that is using JavaScript, as far as I know. Eclipse Che and Eclipse desktop are using a Java JSON library (GSON, actually). IntelliJ is likely to do the same, if it isn't already. Other clients like Vim, Emacs, XCode, Sublime, Visual Studio (not Code) don't use JavaScript AFAIK, but rather C/C++, Objective-C, Swift, LISP/Scheme, Python and whatnot, and are likely to use existing, well-established JSON parsing libraries.

@masaeedu
Contributor

masaeedu commented Oct 26, 2016

@bruno-medeiros

If your client (or server) can't even send valid JSON, you're in big trouble already.

It doesn't matter whether this is the client's fault, the ISP's fault, or the network card's fault. Fault tolerance is about increasing the number of scenarios where you can recover from failure. If you have a header, you can deal with invalid message bodies, which improves fault tolerance.

The Content-Length could be invalid as well...

If the content length header is invalid, you can tell immediately, because you will not be able to parse out the subsequent header in the middle of a message body. If the body is invalid (and there is no header), there is no way to tell, and the parser is going to continue muddling on until hopefully you get a parse error. Until you do encounter a parse error, the server is going to appear to be stuck. The worst possible outcome would be that you don't receive any error, and the server simply does something incorrect and carries on. This is especially likely to be a problem over connectionless transports.

Either way, we're down to only 1 of the 3 reasons I provided that you still object to, so I think there is a strong case for keeping the header.

@felixfbecker

This whole argument is just pointless. There are so many things that speak against removing the Content-Length header, and nothing for it. Where did I say that I think most clients will use JavaScript? All I tried to say was that it is weird to claim every good JSON implementation is able to do this when the mother of all JSON implementations does not do this and I never heard anyone complain. My own LS is implemented in PHP, which provides native encoding/decoding of JSON, but just like JS, for strings, not streams. There is a good reason many protocols have a notion of easy-to-parse headers that include metadata like the body length.

@bruno-medeiros

Either way, we're down to only 1 of the 3 reasons I provided that you still object to, so I think there is a strong case for keeping the header.

I don't object to anything. In an earlier comment I already said: "In any case, I'm not advocating either way that the Content-Length header be dropped, I just pointed out it could be done."
So, it was just a technical hypothetical.

it is weird to claim every good JSON implementation is able to do this when the mother of all JSON implementations does not do this and I never heard anyone complain.

Well, I do agree by now the argument is pointless, because we are getting into the territory of subjectivity and personal preference. Let's just say I don't like dynamic languages, and specifically don't like JSON.stringify and JSON.parse - and leave it at that.

@ianks
Author

ianks commented Oct 27, 2016

IME, the Content-Length header is a responsibility of the transport layer, and not the protocol itself. There will likely be something similar at the transport layer; but it does not need to be directly part of the protocol itself.

If it turns out to be the most efficient way of transporting data, then by all means give it a name (LSTP) and use it; I just vote for not enforcing it at the language server protocol level.

@felixfbecker

No, transport layer is for example TCP. Content-Length header is part of the application layer, just like in HTTP. https://en.wikipedia.org/wiki/OSI_model

@ianks
Author

ianks commented Oct 27, 2016

Sorry about the miswording. Essentially, I think the application layer should not be enforced by Microsoft/language-server-protocol. If one wanted to use HTTP/1.1, for example, they should be able to. In the current documentation, they cannot because the header is strictly specified. I think the focus should be on the RPC protocol; leave the application layer up to the implementer.

@dbaeumer
Member

I would propose the following

  • we have a transport layer. This is for sure not part of the protocol specification.
  • we have a messaging layer. From an implementation aspect this can be node IPC if both client and server are node, the JSON-RPC framing currently defined in the protocol here, or HTTP.
  • we have the actual protocol specification.

Besides specifying the protocol, we should agree on a set of command line switches we can pass to the server to start it with different transport and messaging layers. Which kinds a server supports is up to the server, but they must be documented. However, I would opt for a minimal set of transports and messaging layers.

Transports I see: stdio, socket/pipes, ...
Messaging: LSP-RPC as defined today (JSON-RPC with header), nodeIPC (implies transport), HTTP, ....

The minimal combination a server has to support is [stdio, LSP-RPC]

@aeschli FYI.
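
A hypothetical sketch of what such command line switches could look like for a Node-based server; the flag names (--stdio, --socket=<port>, --pipe=<name>) mirror the ones later documented in the specification, but the dispatch logic here is only an illustration.

```typescript
import * as net from "net";

// Hand the chosen byte streams to the JSON-RPC messaging layer (omitted here).
function serve(input: NodeJS.ReadableStream, output: NodeJS.WritableStream): void {
  // wire up the message reader/writer on top of these streams
}

// Pick a transport based on command line switches.
function main(argv: string[]): void {
  const socketArg = argv.find((a) => a.startsWith("--socket="));
  const pipeArg = argv.find((a) => a.startsWith("--pipe="));
  if (argv.includes("--stdio")) {
    serve(process.stdin, process.stdout);
  } else if (socketArg) {
    const port = Number(socketArg.split("=")[1]);
    net.createServer((conn) => serve(conn, conn)).listen(port);
  } else if (pipeArg) {
    net.createServer((conn) => serve(conn, conn)).listen(pipeArg.split("=")[1]);
  } else {
    throw new Error("no transport specified");
  }
}

main(process.argv.slice(2));
```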

@mbana

mbana commented Oct 31, 2016

To summarise, please do correct me if I am wrong: The protocol is not bound to the transport. One could even implement the protocol using UDP.

Did you intentionally lay out the points in the above comment in that order, @dbaeumer?

I feel this is becoming a bit confusing at the moment. If people are not sure, they should ask questions, e.g., #8 WebSocket is not TCP.

@felixfbecker

One important aspect: On Windows, STDIO is always blocking. For this reason the PHP LS will connect to a TCP server when passed the argument --tcp=host:port, which the extension uses on Windows. A standardized way to pass this would be great.
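
For illustration, this is roughly what that reverse-connection setup looks like in a Node-based server: the client listens on a TCP port and the server dials out to it. The --tcp=host:port flag mirrors the PHP LS behaviour described above and is not a standardized option.

```typescript
import * as net from "net";

// Parse a --tcp=host:port argument (placeholder flag, mirroring the PHP LS)
// and connect out to the client, using the socket for all LSP traffic.
const tcpArg = process.argv.find((a) => a.startsWith("--tcp="));
if (tcpArg) {
  const [host, port] = tcpArg.slice("--tcp=".length).split(":");
  const socket = net.connect(Number(port), host, () => {
    // hand `socket` to the JSON-RPC reader/writer as both input and output
  });
  socket.on("error", (err) => {
    console.error(`failed to reach client at ${host}:${port}:`, err.message);
    process.exit(1);
  });
}
```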

@gorkem
Contributor

gorkem commented Oct 31, 2016

@dbaeumer JDT LS uses environment variables to select the transport; currently it is a choice between sockets and named pipes.

@bruno-medeiros

One important aspect: On Windows, STDIO is always blocking.

Hum, what do you mean by this, exactly?

@felixfbecker

@bruno-medeiros In PHP on Windows, a stream_select with STDIN will always block until there is data available even when the timeout is 0. That sucks of course for single-threaded runtimes like PHP (I think NodeJS works around this with threading under the hood, but PHP doesn't). The best option is to use a socket on Windows, because socket streams can be non-blocking.

See discussion
felixfbecker/php-language-server#31 (comment)
felixfbecker/php-language-server#31 (comment)

@bruno-medeiros

(Ah, single-threading, now I get it. Yeah, it would be a problem in that case.)

@matklad
Contributor

matklad commented Nov 23, 2016

If anything, the header part of the LSP ("Content-Length") seems like it could be simplified and removed.

Yes! The simplest possible transport is to send JSON in JSON object per line format (that is, to require no internal \n whitespace in JSON and to require that \n inside string literals are escaped). I would argue that this should be the default transport.

Pro:

  • The simplest possible solution.

  • It's easy to base framing on a single-character separator. In contrast, framing based on the content length can be tricky (especially if done in a non-blocking fashion) because it effectively mixes a text format (headers run until the first blank line) and a binary format (the length is in bytes).

  • You'd be able to test LSP by just copy-pasting JSON on the command line.

  • This format is already used for similar purposes in the Dart analysis server.

Contra:

  • Can't preallocate a properly sized buffer when reading a message.

  • Different from the status quo.

  • Newlines must always be escaped inside JSON.
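
A minimal TypeScript sketch of the reader side of the newline-delimited framing proposed above. It assumes every message is serialized without raw newlines, which JSON.stringify guarantees by default (newlines inside strings are escaped as \n).

```typescript
import * as readline from "readline";

// Newline-delimited JSON framing: one JSON-RPC message per line.
// Writing is just `output.write(JSON.stringify(msg) + "\n")`.
const rl = readline.createInterface({ input: process.stdin });
rl.on("line", (line) => {
  if (line.trim().length === 0) return; // tolerate blank lines
  const message = JSON.parse(line);
  // dispatch the JSON-RPC message here
  console.error("received method:", message.method);
});
```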

@ssfrr

ssfrr commented Sep 28, 2017

If the stream is UTF-8 encoded, you could also use an out-of-band byte like 0xff (which will never appear in the UTF-8 stream) as your delimiter, so you don't need to worry about escaping anything in your payload.
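
A quick sketch of that idea: the byte 0xFF never occurs in valid UTF-8, so it can delimit UTF-8 encoded JSON messages without any escaping. This framing is hypothetical and not part of the protocol.

```typescript
const DELIMITER = 0xff; // never a valid byte in UTF-8
let buffered = Buffer.alloc(0);

// Accumulate bytes and emit one parsed message per 0xFF-delimited chunk.
function onBytes(chunk: Buffer, onMessage: (msg: unknown) => void): void {
  buffered = Buffer.concat([buffered, chunk]);
  let index: number;
  while ((index = buffered.indexOf(DELIMITER)) >= 0) {
    const body = buffered.slice(0, index).toString("utf8");
    buffered = buffered.slice(index + 1);
    if (body.length > 0) onMessage(JSON.parse(body));
  }
}
```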

@dbaeumer
Member

The open item is to specify which minimal command line flags should be supported. PRs are still welcome.

@dbaeumer added this to the 4.0 milestone Nov 24, 2017
@rdhafidh

If a raw TCP socket is preferable, then why not use something like Google Protobuf or FlatBuffers (lazy source interop language codegen) and then perform simple message-length delimiting with its raw buffer?

@matklad
Contributor

matklad commented May 27, 2018

The simplest possible transport is to send JSON in JSON object per line format

It actually has a specification: http://ndjson.org/ :)

@felixfbecker

Btw, was it ever considered to use WebSockets instead? Those are probably available in every language too, and would even allow language servers on the web.

@kdvolder

I don't think LSP really says anything about what you should/could use to establish a connection that can send bytes back and forth. You can use stdin/stdout, TCP sockets, WebSockets, or whatever else you want. The protocol only talks about what gets sent over the wire once you have already established a bi-directional channel between client and server.

@dbaeumer
Member

We should still specify the minimal command line flags

@dbaeumer modified the milestones: 4.0, Backlog Oct 30, 2019
@dbaeumer
Member

I documented some standard command line arguments for 3.15 and will close the issue. The specification itself still allows running LSP over HTTP.

@vscodebot locked and limited conversation to collaborators Dec 25, 2020
@dbaeumer removed this from the Backlog milestone Nov 2, 2021