Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate into Message Pack and GZIP support #553

Closed
dbaeumer opened this issue Jan 6, 2020 · 30 comments
Closed

Investigate into Message Pack and GZIP support #553

dbaeumer opened this issue Jan 6, 2020 · 30 comments
Milestone

Comments

@dbaeumer
Copy link
Member

dbaeumer commented Jan 6, 2020

Currently the transport uses a JSON string as a payload. To reduce the payload we could:

  • zip the content of the string
  • use Message Pack instead of JSON

We should investigate both options and measure the saving we can have.

This must be optional and would best be done using header properties to negotiate this as in HTTP

@artas90
Copy link

artas90 commented Feb 26, 2020

CBOR format which was inspired by MessagePack seems to be better choice.
As is it a web standard https://tools.ietf.org/html/rfc7049.

@dbaeumer
Copy link
Member Author

@artas90 thanks for pointing this out.

@AArnott
Copy link
Member

AArnott commented May 1, 2020

We're looking at moving more of our remotable services in VS from JSON to MessagePack. I'm concerned about losing interoperability with node-based processes. I'd love to see vscode-jsonrpc support a pluggable formatter (like StreamJsonRpc has) so that the encoding can be whatever the user wants. JSON can be the default formatter, and we can supply the MessagePack one if it isn't in-box as an option as well.

@AArnott
Copy link
Member

AArnott commented May 1, 2020

@artas90 As I already have a great deal of investment and code based on MessagePack, I'm biased. But I wonder if you want to persuade me as to why CBOR is superior to MessagePack. They are both spec'd standards. One being RFC and the other spec'd elsewhere isn't particularly compelling. JSON itself at least started as a spec on https://www.json.org/ as I understand it.

@AArnott
Copy link
Member

AArnott commented May 2, 2020

This must be optional and would best be done using header properties to negotiate this as in HTTP

FWIW, the format we use tends to be important to maintain across an entire RPC session. While we can change text encodings from message to message, switching between JSON and MessagePack isn't supported in StreamJsonRpc. So while you could declare the encoding to be messagepack in your HTTP headers, at least for our purposes we need to know whether MessagePack or JSON will be used up front when establishing the connection.

@artas90
Copy link

artas90 commented May 3, 2020

@AArnott Honestly I never used neither CBOR nor MessagePack. Just wanted to mention that such standard as CBOR exists. And it might be considered as well

@dbaeumer
Copy link
Member Author

dbaeumer commented May 5, 2020

I rebased the work I started on master today and a first version of this is here: https://github.com/microsoft/vscode-languageserver-node/tree/dbaeumer/encodingSupport2

Implementation wise this works as follows:

  • support pluggable encoders and decoders (gzip, ...)
  • content format is still JSON
  • handshake is done using HTTP header properties (e.g. Accept-Encoding and Content-Encoding). This only support encoding responses (which usually are the big payload)
  • in LSP I will add capabilities to negotiate the encoding in the initialize phase to avoid sending the headers all the time. In addition it will support encode notification and request.

For MessagePack in would propose that we indicate it using a different Content-Type since it is not JSON anymore. The content type I would use is application/msgpack (see the discussion in msgpack/msgpack#194). As with encoding I would also support in LSP to negotiate msgpack during the initialize phase.

@AArnott
Copy link
Member

AArnott commented May 5, 2020

Thanks for the update, @dbaeumer. Your design sounds like you're definitely planning on a mix of messagepack and JSON messages on the same connection, which is exactly what I was hoping we wouldn't do. I'll have to review what kind of state our formatters build up over a connection to see what kind of features will break if we switch formats from one message to another.

@dbaeumer
Copy link
Member Author

dbaeumer commented May 6, 2020

This is mainly driven by the fact that I want to offer an upgrade path from the current behavior. In my current thinking the initialize handshake will always be plain json (like the header is always encoded using ascii). After the first handshake a connection can then be either in a state where encoding and content type is determined using header properties or where client and server agree on a encoding and / or content type for all messages.

Otherwise LSP would need to specify command line arguments on how to define content type and encoding on startup. And a server needs to document the supported types and encodings since they are reused in different clients with different content type and encoding capabilities.

@AArnott
Copy link
Member

AArnott commented May 6, 2020

Does every LSP server already have a client portion that knows how to spawn that particular LSP server? In VS that's the way it is. I thought perhaps it was this way in VS Code as well. As I understood it, LSP doesn't prescribe how the LSP server is spawned, what command line args it takes, or how to establish the pipe between the client and server processes.
If that's all true, why not simply document in the LSP spec that LSP may be encoded as JSON or MessagePack, or any other encoding that the client and server agree on.

Then the spec itself doesn't have to prescribe a handshake. For example, if I wrote an LSP server for F#, I would also have to write the LSP client that spawns the F# LSP server. Since I want a super-fast experience, I choose to encode with MessagePack. My client and server are hard-coded to always use MessagePack, and things just work.

All that would be left is for the vscode-jsonrpc library to support swapping in different formatters.

But I'm guessing I'm missing something. @dbaeumer can you tell me why you feel it's something that needs to be specified at the LSP spec level?

Consider: If we are to include MessagePack provisions in the LSP spec, we could get far more benefit by specifying indexes for each property so that we can encode the data as arrays instead of maps in messagepack. We'd never have to serialize and transmit property names, which can dramatically reduce the payload size and time spent serializing.

@dbaeumer
Copy link
Member Author

dbaeumer commented May 7, 2020

This requires that we spec command line arguments to at least make the way how a content type is picked the same across servers. This simply adds another dimension to the spec which I want to avoid. But more important is the fact that we have requests to support the following:

  • connect to an already running server
  • multi tenant support

Hardwiring the content type and encoding will make it a lot harder to support the two things. I have no problem to specify that the content type and encoding must stay fixed after the initialize (doesn't change on every message, as it would be possible with HTTP).

@AArnott
Copy link
Member

AArnott commented May 7, 2020

I'm imagining that even if LSP is "relayed" across multiple nodes (e.g. LSP Server<->LSP Client on Live Share host<->Live Share host<->Live Share guest<->LSP client on guest) that each individual leg may be encoded with MessagePack or not. So it's not something necessarily even in the LSP spec -- just a decision that the individual JSON-RPC library is using when it sets up an individual leg on the connection.

I'd like MessagePack encoding support in vscode-jsonrpc so we can use MessagePack for (non-LSP related) JSON-RPC messages as part of our 'brokered services' between VS and VS Code.

Beyond that, @CyrusNajmabadi expressed interest in switching their LSP server to messagepack in dotnet/roslyn#42595 (comment), and I think we can do that without any change to the LSP spec (or vscode-jsonrpc for that matter). But if VS Code were to participate in that same perf win, vscode-jsonrpc would need to support MessagePack as well, and the VS Code LSP system would have to support allowing the LSP client that spawns the Roslyn LSP server to set up the MessageConnection with the MessagePack encoder before handing off to VS Code to then use RPC invocations (ignorant of messagepack being underneath) to talk to their LSP server.

@dbaeumer
Copy link
Member Author

dbaeumer commented May 8, 2020

I think there are two use cases: you have all connection under control and want to use message pack. With the changes I did to the jsonrpc library that is possible. You don't have all connection under control (e.g. you connect to a running server) and need to negotiate the encoding. This needs support in LSP.

@AArnott
Copy link
Member

AArnott commented May 8, 2020

Sounds reasonable. But how can you connect to an existing server without controlling that connection?

@dbaeumer
Copy link
Member Author

You control the connect but not the content type / encoding. This could depend on the version of the server or the client connecting.

@AArnott
Copy link
Member

AArnott commented Jun 10, 2020

I'm checking out your dbaeumer/encodingSupport2 branch with the intent to find a way to get vscode-jsonrpc to work with messagepack in a way that is compatible with StreamJsonRpc's existing support for it. I may have a PR to send you.

@AArnott
Copy link
Member

AArnott commented Jun 10, 2020

Oh! You've already merged it into master I found last night. Great.

@dbaeumer
Copy link
Member Author

Yes, got merge. With the difference that it is not header negotiable anymore.

@dbaeumer dbaeumer added this to the On Deck milestone Jun 11, 2020
@dbaeumer dbaeumer modified the milestones: On Deck, 3.16 Oct 8, 2020
@dbaeumer
Copy link
Member Author

I will close the issue. Support has been added to the library but nothing has been added to the spec yet. To make use of this in the protocol we need to start on the LSP level and spec it there.

@vinistock
Copy link
Contributor

@dbaeumer has support for using msgpack instead of JSON been finalized? If yes, is there documentation on enabling it?

If not, what's missing and how can I help to push it forward?

I believe using msgpack will provide significant performance benefits for our LSP implementation.

@AArnott
Copy link
Member

AArnott commented May 6, 2023

FWIW, one idea my team has played with is the LSP server somehow advertising (through a manifest perhaps) that it supports msgpack via being launched by some switch. Then the LSP client spawns the server with that switch and assumes msgpack instead of JSON.

@dbaeumer
Copy link
Member Author

The support has been finalized in JSONRPC lib but there is still no handshake up upgrade a server connection. So you need to do what @AArnott suggests that you spawn your server with a special command line argument.

@vinistock
Copy link
Contributor

Adding the command line argument is reasonable and easy to do. But is there some configuration on the client side? How do I tell the client to use msgpack for requests and responses?

@dbaeumer
Copy link
Member Author

@vinistock good point. I never added that :-( but will be easy to do. But please note that this will only work for transports different than Node IPC.

@vinistock
Copy link
Contributor

Do you have any pointers on how to get started? I can take a stab at making this an option for clients.

About the transports: can you elaborate? Would that impact language servers implemented in languages other than NodeJS?

@AArnott
Copy link
Member

AArnott commented May 27, 2023

But please note that this will only work for transports different than Node IPC.

I'm confused over why a transport mechanism or platform would limit whether you can encode LSP as msgpack. We use msgpack across IPC with a node process for non-LSP purposes already.

@dbaeumer
Copy link
Member Author

@vinistock it was my mistake. The code is actually already there since you can always pass a function as a server (see https://insiders.vscode.dev/github/microsoft/vscode-languageserver-node/blob/main/client/src/node/main.ts#L126) which returns a MessageTransport which you can setup accordingly.

@AArnott Node provides a special IPC mechanism where you can directly send and receive JS object and the Node runtime does the serialization / de-serialization. You could may be use msgpack -> number[] -> Node IPC but I guess this is counter productive. You can of course use msgpack with node using sockets / pipes.

@vinistock
Copy link
Contributor

If you a pass a function as the server, because you only return a MessageTransport, you need to handle the life cycle of the server process (this._serverProcess) yourself, correct?

Is there interest in adding an option and handling this inside the library? Something like

const serverOptions = {
  run: /* ... */,
  debug: /* ... */,
  transport_encoding: TransportEncodings.Msgpack
}

@dbaeumer
Copy link
Member Author

@vinistock yes, that is correct.

What we can think of is supporting something like this in options: https://insiders.vscode.dev/github/microsoft/vscode-languageserver-node/blob/main/jsonrpc/src/node/test/messages.test.ts#L237

I don't want to have any hard reference to external libraries.

@vinistock
Copy link
Contributor

Thank you for considering it, I think it'll be a good quality of life improvement when using different encodings - since we can avoid having to deal with the life cycle of the process which is already implemented in the library.

I put together a draft of what this could look like in #1250. Please let me know if you're okay with the addition and the approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants