Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2108: Sync over Server Sent Events #2108

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@stalniy
Copy link

commented Jun 11, 2019

Fixes #2024
full proposal

Introduction

Currently, Matrix clients use long polling to get the latest state from the server, it becomes an issue when you have a lot of clients because:

  • homeserver needs to process a lot of requests, which by the way may return nothing
  • it affects network bandwidth, each time client sends request it creates a new HTTP request with headers, cookies and other stuff
  • for mobile clients, it spends their cellular Internet and eats money
  • when homeserver uses SSL over HTTP (what is recommended), clients are doing again and again the most expensive operation, the TLS handshake

So, instead of long polling I propose to implement sync logic over Server Sent Events(SSE)

Proposal

Server Sent Events(SSE) is a way for servers to push events to clients. It was a part of HTML5 standard and now available in all major web and mobile browsers.
It was specifically designed to overcome challenges related to short/long polling. By introducing this technology, we can get the next benefits:

  • only 1 persisted connection per client that is kept open "forever".
  • SSE is built on top of HTTP protocol, so can be used in communication between servers
  • SSE is more compliant with existing IT infrastructure like (Load Balancer, Firewall, etc)
  • web and mobile browsers support automatic reconnection and Last-Event-Id header out of the box
  • Matrix protocol is built over HTTP, so SSE should fit good in protocol specification

@stalniy stalniy force-pushed the stalniy:sse-support branch from 9b268ee to 9cde100 Jun 11, 2019

@turt2live

This comment has been minimized.

Copy link
Member

commented Jun 11, 2019

@stalniy thanks for making this! Just two things need to be done before this is ready for review:

  • Please line wrap around 90 characters for easier review
  • Please Sign Off on the changes so we are able to merge the proposal later.

@turt2live turt2live changed the title Sync over Server Sent Events MSC2108: Sync over Server Sent Events Jun 11, 2019

@turt2live turt2live added the proposal label Jun 11, 2019

@stalniy

This comment has been minimized.

Copy link
Author

commented Jun 11, 2019

Cool! will do in few hours

feat(sync): adds proposal for SSE to get the latest state
Signed-off-by: Sergii <sergiy.stotskiy@gmail.com>

@stalniy stalniy force-pushed the stalniy:sse-support branch from 9cde100 to d35fb83 Jun 11, 2019

@stalniy

This comment has been minimized.

Copy link
Author

commented Jun 11, 2019

Added Signed-off-by: Sergii <sergiy.stotskiy@gmail.com> to commit message.
Also wrapped lines ~ 90 characters in md file.

@Half-Shot
Copy link
Contributor

left a comment

Idea sounds quite neat but more clarifications are needed in the doc.

Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
* instead of using the `since` query parameter, the next batch token will be passed through the `Last-Event-ID` header.
* each event will have the same format as what `/sync` returns. The id of each event will be the `next_batch` token
* the server sends events in exactly the same way that it would send responses to `/sync` calls with the `since`
parameter set to the previous `next_batch`

This comment has been minimized.

Copy link
@Half-Shot

Half-Shot Jun 12, 2019

Contributor

As per above, perhaps you could give a description on what an event is in SSE terms? I assume a SSE event is the body being sent along the wire, but it's not explicit here.

This comment has been minimized.

Copy link
@stalniy

stalniy Jun 18, 2019

Author

Added links to documentation for the shape of matrix sync endpoint and for the format of SSE event.

This comment has been minimized.

Copy link
@turt2live

turt2live Jun 26, 2019

Member

It still doesn't make any sense to me: how is a client meant to resume their sync stream? All they'd have is a Last-Event-ID and a filter.

This comment has been minimized.

Copy link
@uhoreg

uhoreg Jun 26, 2019

Member

If their call to /sync/sse gets killed, then they just make a new call, with the Last-Event-ID header set to the last SSE event ID that it saw.

It may be helpful to give an example of how the requests/responses look, so that people don't have to go digging through the SSE docs.

This comment has been minimized.

Copy link
@turt2live

turt2live Jun 26, 2019

Member

+1 to examples. The SSE docs are a bit too complicated to try and parse in parallel here. Doesn't even need to be valid SSE:

GET /_matrix/client/r0/sync/sse
Last-Event-ID: 12

{"sync": "response_here"}
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
Show resolved Hide resolved proposals/2108-sync-via-server-sent-events.md Outdated
@stalniy

This comment has been minimized.

Copy link
Author

commented Jun 18, 2019

I looked through the changelog of Client-Server communication but cannot find any information of why /events API was deprecated. Can somebody clarify please?

In terms of SSE, it make sense to publish new events to clients as soon as changes are applied by homeserver, so I guess that most of the times almost all fields of /sync response will be empty, except 1 which actually contains payload for a single event that represent a change.

So, right now I have doubts whether it make sense to reuse the payload format of /sync endpoint

@stalniy

This comment has been minimized.

Copy link
Author

commented Jun 25, 2019

Any news?

@turt2live turt2live self-requested a review Jun 25, 2019

@uhoreg

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

I looked through the changelog of Client-Server communication but cannot find any information of why /events API was deprecated. Can somebody clarify please?

I'd guess that it's because /sync can do the same thing, but includes other things too.

In terms of SSE, it make sense to publish new events to clients as soon as changes are applied by homeserver, so I guess that most of the times almost all fields of /sync response will be empty, except 1 which actually contains payload for a single event that represent a change.

So, right now I have doubts whether it make sense to reuse the payload format of /sync endpoint

I don't think it should be too bad, because all of the top-level items (except for next_batch) in the response can be omitted, so if the payload is just a single event, then only the relevant sections will be included. (I don't think synapse omits empty sections in /sync, but that's synapse's fault.)

@stalniy

This comment has been minimized.

Copy link
Author

commented Jun 26, 2019

Is there something else I can do to move this forward? Or may I start playing with implementation?

but instead propose to use a different underlying technology to do this. So:
* lets expose `/sync/sse` URL for SSE in order to be backward compatible with other clients and servers
* this URL returns the same data as continually calling `/sync`
* it accepts the same parameters as `/sync`, except `since` and `timeout`

This comment has been minimized.

Copy link
@turt2live

turt2live Jun 26, 2019

Member

how does set_presence work in this proposal?

This comment has been minimized.

Copy link
@uhoreg

uhoreg Jun 26, 2019

Member

Good question. My guess is that it would be equivalent to continually calling /sync with the same set_presence parameter, for as long as the SSE connection is alive. When the SSE connection dies, then the behaviour would be the same as a client no longer calling /sync. Maybe this should be made more explicit.

but instead propose to use a different underlying technology to do this. So:
* lets expose `/sync/sse` URL for SSE in order to be backward compatible with other clients and servers
* this URL returns the same data as continually calling `/sync`
* it accepts the same parameters as `/sync`, except `since` and `timeout`

This comment has been minimized.

Copy link
@turt2live

turt2live Jun 26, 2019

Member

how does full_state work in this proposal?

This comment has been minimized.

Copy link
@uhoreg

uhoreg Jun 26, 2019

Member

Good question. I think that there are three options:

  1. drop full_state for SSE. If a client wants to do full_state, they can call /sync to start with, and then continue the sync using SSE.
  2. only return the full state for the first SSE event, and incremental state for subsequent SSE events
  3. return the full state for event SSE event. The first SSE event gets sent immediately (as it would when calling /sync with full_state), but subsequent SSE events only get sent when there's actually something new

Personally, I think my preference would be 2, then 1, then 3, with 1 and 2 being fairly close, and 3 being much less preferred.

So, instead of long polling I propose to implement
sync logic over [Server Sent Events][mdn-sse](SSE)

## Proposal

This comment has been minimized.

Copy link
@turt2live

turt2live Jun 26, 2019

Member

There's no mention of how limited syncs work in this proposal. Syncs can be limited due to a filter or due to the server's maximum willingness to serve events.

The existing sync endpoint is built for long polling, which doesn't really make it suitable for SSE. Although the sync format does lend itself to being a nice and backwards compatible data structure, I think it would be best if we used a more stream-oriented structure for SSE.

This comment has been minimized.

Copy link
@uhoreg

uhoreg Jun 26, 2019

Member

My thought on this was that it would work just like calling /sync repeatedly. For example, if you call /sync/sse with the Last-Event-ID set to some value, and the equivalent /sync call would have return a limited sync, then the first SSE event that you get will also be a limited result. Also, if the server is streaming SSE events, and you suddenly get a billion events in one room, then the next SSE event that it sends will be limited. It doesn't quite fit perfectly with a streaming model (for example, calling /sync/sse with the same Last-Event-ID two times won't give you the same result for the first event), but it fits with the way that Matrix currently handles truncating results. And I don't think SSE has a native method of skipping over events.

This comment has been minimized.

Copy link
@uhoreg

uhoreg Jun 26, 2019

Member

Again, adding some examples to the proposal may be useful.

@turt2live

This comment has been minimized.

Copy link
Member

commented Jun 26, 2019

Is there something else I can do to move this forward? Or may I start playing with implementation?

@stalniy: typically review takes a lot longer than we'd like. Implementations help prove that a MSC is needed and serves a purpose, however at this early stage in the proposal process it doesn't make much sense to engineer a perfect implementation. Something which shows the idea as a proposal aide is certainly worthwhile, given the complexity of this proposal (ie: a plain HTML+JS page which dumps JSON bodies into the DOM as a demo). A proper implementation demonstration is required later in the proposal stages.

For the time being, it's best to bring it up every couple months in #matrix-spec:matrix.org to try and solicit review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.