Permalink
Switch branches/tags
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
289 lines (230 sloc) 11 KB

WebSockets API

Introduction

This document is a proposal for a WebSockets-based client-server API. It is not intended to replace the REST API, but rather to complement it and provide an alternative interface for certain operations.

The primary goal is to offer a more efficient interface than the REST API: by using a bidirectional protocol such as WebSockets we can avoid the overheads involved in long-polling (SSL negotiation, HTTP headers, etc). In doing so we will reduce the latency between server and client by allowing the server to send events as soon as they arrive, rather than having to wait for a poll from the client.

Note: This proposal got continued in a google document you can find here: https://docs.google.com/document/d/104ClehFBgqLQbf4s-AKX2ijr8sOAxcizfcRs_atsB0g

Handshake

  1. Instead of calling /sync, the client makes a websocket request to /_matrix/client/rN/stream, passing the query parameters access_token and since, and optionally filter - all of which have the same meaning as for /sync.
    • The client sets the Sec-WebSocket-Protocol to m.json. (Servers may offer alternative encodings; at present only the JSON encoding is specified but in future we will specify alternative encodings.)
  2. The server returns the websocket handshake; the socket is then connected.

If the server does not return a valid websocket handshake, this indicates that the server or an intermediate proxy does not support WebSockets. In this case, the client should fall back to polling the /sync REST endpoint.

Example

Client request:

GET /_matrix/client/v2_alpha/stream?access_token=123456&since=s72594_4483_1934 HTTP/1.1
Host: matrix.org
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
Sec-WebSocket-Protocol: m.json
Sec-WebSocket-Version: 13
Origin: https://matrix.org

Server response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
Sec-WebSocket-Protocol: m.json

Update Notifications

Once the socket is connected, the server begins streaming updates over the websocket. The server sends Update notifications about new messages or state changes. To make it easy for clients to parse, Update notifications have the same structure as the response to /sync: an object with the following members:

Key Type Description
next_batch string The batch token to supply in the since param of the next /sync request. This is not required for streaming of events over the WebSocket, but is provided so that clients can reconnect if the socket is disconnected.
presence Presence The updates to the presence status of other users.
rooms Rooms Updates to rooms.

Example

Message from the server:

{
    "next_batch": "s72595_4483_1934",
    "presence": {
        "events": []
    },
    "rooms": {
        "join": {},
        "invite": {},
        "leave": {}
    }
}

Client-initiated operations

The client can perform certain operations by sending a websocket message to the server. Such a "Request" message should be a JSON-encoded object with the following members:

Key Type Description
id string A unique identifier for this request
method string Specifies the name of the operation to be performed; see below for available operations
param object The parameters for the requested operation.

The server responds to a client Request with a Response message. This is a JSON-encoded object with the following members:

Key Type Description
id string The same as the value in the corresponding Request object. The presence of the id field distinguishes a Response message from an Update notification.
result object On success, the results of the request.
error object On error, an object giving the resons for the error. This has the same structure as the "standard error response" for the Matrix API: an object with the fields errcode and error.

Request methods

It is not intended that all operations which are available via the REST API will be available via the WebSockets API, but a few simple, common operations will be exposed. The initial operations will be as follows.

ping

This is a no-op which clients may use to keep their connection alive.

The request params and the response result should be empty.

send

Send a message event to a room. The parameters are as follows:

Parameter Type Description
room_id string Required. The room to send the event to
event_type string Required. The type of event to send.
content object Required. The content of the event.

The result is as follows:

Key Type Description
event_id string A unique identifier for the event.

The id from the Request message is used as the transaction ID by the server.

state

Update the state on a room.

Parameter Type Description
room_id string Required. The room to set the state in
event_type string Required. The type of event to send.
state_key string Required. The state_key for the state to send.
content object Required. The content of the event.

The result is as follows:

Key Type Description
event_id string A unique identifier for the event.

Example

Client request:

{
    "id": "12345",
    "method": "send",
    "params": {
        "room_id": "!d41d8cd:matrix.org",
        "event_type": "m.room.message",
        "content": {
            "msgtype": "m.text",
            "body": "hello"
        }
    }
}

Server response:

{
    "id": "12345",
    "result": {
        "event_id": "$66697273743031:matrix.org"
    }
}

Alternative server response, in case of error:

{
    "id": "12345",
    "error": {
       "errcode": "M_MISSING_PARAM",
       "error": "Missing parameter: event_type"
    }
}

Rationale

Alternatives to WebSockets include HTTP/2, CoAP, and simply rolling our own protocol over raw TCP sockets. However, the need to implement browser-based clients essentially reduces our choice to WebSockets. HTTP/2 streams will probably provide an interesting alternative in the future, but current browsers do not appear to give javascript applications low-level access to the protocol.

Concerning the continued use of the JSON encoding: we prefer to focus on the transition to WebSockets initially. Replacing JSON with a compact representation such as CBOR, MessagePack, or even just compressed JSON will be a likely extension for the future. The support for negotiation of subprotocols within WebSockets should make this a simple transition once time permits.

The number of methods available for client requests is deliberately limited, as each method requires code to be written to map it onto the equivalent REST implementation. Some REST methods - for instance, user registration and login - would be pointless to expose via WebSockets. It is likely, however, that we will increate the number of methods available via the WebSockets API as it becomes clear which would be most useful.

Open questions

Throttling

At least in v2 sync, clients are inherently self-throttling - if they do not poll quickly enough, events will be dropped from the next result. This proposal raises the possibility that events will be produced more quickly than they can be sent to the client; backlogs will build up on the server and/or in the intermediate network, which will not only lead to high latency on events being delivered, but will lead to responses to client requests also being delayed.

We may need to implement some sort of throttling mechanism by which the server can start to drop events. The difficulty is in knowing when to start dropping events. A few ideas:

  • Use websocket pings to measure the RTT; if it starts to increase, start dropping events. But this requires knowledge of the base RTT, and a useful model of what constitutes an excessive increase.
  • Have the client acknowledge each batch of events, and use a window to ensure the number of outstanding batches is limited. This is annoying as it requires the client to have to acknowledge batches - and it's not clear what the right window size is: we want a big window for long fat networks (think of mobile clients), but a small one for one with lower latency.
  • Start dropping events if the server's TCP buffer starts filling up. This has the advantage of delegating the congestion-detection to TCP (which already has a number of algorithms to deal with it, to greater or lesser effectiveness), but relies on homeservers being hosted on OSes which use sensible TCP congestion-avoidance algorithms, and more critically, an ability to read the fill level of the TCP send buffer.