Client-Server v2 Design #4

Merged
merged 62 commits on Jan 21, 2015

5 participants
Owner

ara4n commented Dec 24, 2014

Why did the list of 10 use cases at the beginning of client_server_use_cases.log get deleted?

I'm not entirely sure how these use cases are intended to factor into the CS v2 API discussion, so it's a bit hard to comment.

On the data-flows for forums, I'm not convinced that each thread should be a room. Instead, it's surely analogous to having multiple threads of conversations within a single room, where the room is the forum, just like in an IRC-style room. So I'd suggest we just support threading messages within a room, and possibly the ability to filter on specific thread IDs?

Kegsay added some commits Dec 29, 2014

Add general API design
Several missing sections: VoIP, presence, typing, but many core ordering and event mapping issues are addressed.
Contributor

Kegsay commented Dec 30, 2014

Why did the list of 10 use cases at the beginning of client_server_use_cases.log get deleted?

Restored.

Member

dbkr commented Jan 5, 2015

Client capabilities will need to take into account capabilities of clients that do not have an active stream but are being pushed for events. These clients could still answer VoIP calls from a push and so their caps should advertise VoIP. However, we need to be wary of keeping stale capabilities if I install a fancy client that supports VoIP and push then decide I don't like it and uninstall it (potentially without logging out). Does everyone think I still support VoIP because that client is being pushed (even if the push is going into the void)?

@Kegsay Kegsay changed the title from Client Design to Client-Server v2 Design Jan 5, 2015

Owner

erikjohnston commented Jan 5, 2015

Global /initialSync API

  • For each room the user is joined: Name, topic, # members, last message, room ID, aliases

Why are name and topic events special cased here? Can we not make this generic?

Event Stream API

  • Home servers may receive state events over federation that are superseded by state events previously sent to the client. The home server cannot send these events to the client, else they would end up erroneously clobbering the superseding state event.
  • As a result, the home server reserves the right to omit sending state events which are known to be superseded already.

Why? This will mean that people will see different history depending on the exact timings of their event stream, and if they are looking at it live or if they are paginating back. Just because a topic has been replaced doesn't mean I don't want to know that someone tried to change the topic just before that. The inconsistency here could lead to mass confusion if half the room see a state event and the other half don't.

  • For clients which do not persist scrollback for a room, this is not a problem as they only care about the recent messages.

The definition of "persist" here is ambiguous, as it presumably also includes clients that "persist" events in memory. Perhaps "store" is a better word? e.g.
"For clients which do not store (either on disk or in memory) events received for a room. For example, clients which process events and then immediately throw the events away"

  • If this happens, the home server will send a m.room.redaction for the event in question.
  • If the event was a state event, it will synthesise a new state event to correct the client's room state.

These need to be local server events.

Room Creation

  • Invitee list of user IDs, public/private, name of room, alias of room, topic of room

Joining a room

  • Room ID, Room aliases (plural), Name, topic, member list (for each member: user ID, avatar, presence, display name, power level, whether they are typing), enough messages to fill screen (and whether there are more)

Again, why are we special casing "name" and "topic" events here?

  • We propose that a server-generated event is sent down the event stream to all clients, rather than annotating the join event. The server-generated event works nicely for Application Services where an entity subscribes to a room without a join event.

How does this work in practice? What happens between the client getting a join event and getting the generated event?

Action APIs

Send a message

  • HTTP: When sending a message with a higher seqnum, it will block the request until it receives earlier seqnums. The block will expire after a timeout and reject the message stating that it was missing a seqnum.

What are the failure modes here? The most worrying one I can currently think of is if an HTTP library only allows a maximum number, say 5, of concurrent connections (e.g. a web browser does this). If the first PUT blackholes and 5 other requests get stuck behind it, then the retry of the original PUT gets blocked behind the requests waiting for it. The only way out of this is when one of the other requests times out.

  • For signing: You send the original message to the HS and it will return the full event JSON which will be sent. This full event is then signed and sent to the HS again to send the message

This does not need to happen for E2E, since that only protects contents. The only time this would be useful is if a client wanted to sign the entire event for some paranoid reason.

Sessions

A session is a group of requests sent within a short amount of time by the same client.

The phrase "within a short amount of time" scares me here. What do you mean by that? Surely we want sessions to last as long as the client is being used? For a desktop app this would be hours, if not days/weeks/months.

If the server expires a session and the client uses an old session ID, the server should fail the request with the old session ID and send a new session ID in response for the client to use.

Do we really want to be so brutal? Having to resend a request just because the server has timed us out seems like a bit of a pain.

How does this work with "clients" that only want to send the odd message every now and again, and don't want to have to start sessions?

In reply to (Events)

This differs from the updates key as they do not update the event itself, and are not required in order to display the parent event.

This, to me, implies that you can't display an event unless you have all of its child events. How do you know if you have them all? I think the intention with this sentence was to indicate that whenever a server tells clients about an event it always includes any child events that updated it.

Example using updates and in_reply_to

The example needs some ellipses.

Capabilities

Server

It seems very odd to me to force clients to do two requests to get the current server capabilities, even if it does make it more consistent with client capabilities.

Client

This seems to be all about trying to do a massive optimization. I think there needs to be a discussion about what and how we should be optimizing things. For example, should we be optimizing against bandwidth or the number of round trips to get a user's current capabilities?

  • Should we hash the union of capabilities for the different devices? Or should we give a list of hashes.
  • How do we handle different versions of the same capability (e.g. V1 vs V2 VoIP)?
  • Should persistent/static capabilities be in separate hashes? These are capabilities which don't change when a device goes online/offline.
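
To make the first question above concrete, here is a minimal sketch of hashing the union of capabilities across a user's devices. Everything in it is an assumption for illustration: the capability names, the device IDs, the `capabilities_hash` helper, and the choice of SHA-256 over a sorted JSON list. The draft does not specify any of these.

```python
import hashlib
import json


def capabilities_hash(devices):
    """Hash the union of capability names across all of a user's devices.

    `devices` maps a hypothetical device ID to a set of capability names.
    """
    union = set()
    for caps in devices.values():
        union |= set(caps)
    # Sort so that the hash does not depend on device or set ordering.
    canonical = json.dumps(sorted(union), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical capability names, for illustration only.
devices = {
    "phone": {"m.voip.v1", "m.push"},
    "laptop": {"m.voip.v1", "m.typing"},
}
same_caps_other_order = {
    "laptop": {"m.typing", "m.voip.v1"},
    "phone": {"m.push", "m.voip.v1"},
}
```

With a union hash, the value changes whenever a device holding a unique capability goes away, which is one reason the question about keeping static capabilities in a separate hash matters.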
Owner

erikjohnston commented Jan 5, 2015

Things that I think are missing here:

  • Server level events.
  • A client might receive an event from a room it doesn't know about, either because:
    1. Federation only just found out you were in the room (this is possible if your server lost its history and then got told it was in the room again)
    2. Client did not pull in room in global initial sync (if it paginated).
  • Per device presence.
  • Offline mode, i.e. requests won't cause you to come online.
  • Linking messages to typing notifications, to try and avoid having to constantly do "send message" and then "stop typing". This would also fix the annoyance where receiving users see a message and /then/ see the stop typing notif, which might come down quite a while later.
Owner

erikjohnston commented Jan 5, 2015

How are we doing delivery/read receipts? A client event per event to give feedback on?

Contributor

Kegsay commented Jan 5, 2015

The suggestion is that we could do individual receipts, but I would prefer a "read up to" marker simply for better scaling. What this looks like is not specified.

Sessions time out after a short amount of time without any requests. A Web app would be listening and so be making requests. Can you provide a case where this is insufficient?

Name and topic events are mainly there to say "state events". They were included because the api was derived from use cases which used name and topic. The only case where we may want to limit the kinds of state events is on public room lists which the client hasn't joined. You don't want to expose the entire room state in this case.

Owner

erikjohnston commented Jan 5, 2015

The suggestion is that we could do individual receipts, but I would prefer a "read up to" marker simply for better scaling. What this looks like is not specified.

Maybe we should specify it? :)


Sessions time out after a short amount of time without any requests.

This is not the same as:

A session is a group of requests sent within a short amount of time by the same client.

The first one (to me) means that the session is kept active and times out; the second means that a session is grouping together a bunch of requests that happen to have been sent at roughly the same time.

Contributor

NegativeMjark commented Jan 6, 2015

When specifying the HTTP API we should prefix the URLs with "/_matrix/client/v2" rather than "/_matrix/client/api/v2" to make it consistent with the other API prefixes.

Contributor

NegativeMjark commented Jan 6, 2015

We might need a few different filter options for how "relates_to" is bundled to the client.

  • When does the bundle appear in the stream? For the email use-case we want the thread to appear at the most recent message. For comment-sections on google+ style posts we want the thread to appear at the earliest message.
  • How is the bundle ordered? Do we want the most recent messages or the oldest messages first?
  • For an email use-case we might want to display the most recent message, followed by its parents for context, followed by their parents, etc. This effectively inverts the graph so that the most recent message is at the root.
Owner

ara4n commented Jan 6, 2015

I also continue to really dislike the fact that "relates_to" only lets you specify a single relationship between two events of the same type. Forcing that m.room.message A "relates_to" another m.room.message B means that A is a threaded conversation reply to B is needlessly restrictive. What if A and B relate to each other by some other metric (e.g. they are part of a group rather than a thread? or we distinguish mail-style threading from multithreaded-IM threading for the same set of messages? etc)?

Contributor

NegativeMjark commented Jan 6, 2015

I also continue to really dislike the fact that "relates_to" only lets you specify a single relationship between two events of the same type. Forcing that m.room.message A "relates_to" another m.room.message B means that A is a threaded conversation reply to B is needlessly restrictive. What if A and B relate to each other by some other metric (e.g. they are part of a group rather than a thread? or we distinguish mail-style threading from multithreaded-IM threading for the same set of messages? etc)?

There is a good argument here for extending "relates_to" to include information about how the message relates to the other message, beyond just its event type.

I'd suggest that we extend the "relates_to" to take a list of pairs of event_ids and relationship types.

"relates_to" : [["in_reply_to", "$event_id1"], ["another_type_of_relation", "$event_id2"]]

The filter API would need flags to control which relationships were and were not bundled within an event.
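
As a rough sketch of how such filter flags might behave, the function below selects which relationships to bundle from the list-of-pairs form suggested above. The `bundle_relations` helper and the idea of passing the wanted relationship types as a set are assumptions for illustration, not part of any specified filter API.

```python
def bundle_relations(event, wanted_types):
    """Return the (relation_type, event_id) pairs a filter asked to bundle."""
    return [
        (rel_type, event_id)
        for rel_type, event_id in event.get("relates_to", [])
        if rel_type in wanted_types
    ]


event = {
    "type": "m.room.message",
    "relates_to": [
        ["in_reply_to", "$event_id1"],
        ["another_type_of_relation", "$event_id2"],
    ],
}

# A client that only wants threaded replies bundled.
replies_only = bundle_relations(event, {"in_reply_to"})
```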

If the mechanism was this generic would we still need a separate "updates" key? Or would "updates" become:

"relates_to" : [["updates", "$event_id2"]]
Contributor

Kegsay commented Jan 7, 2015

I like Mark's idea. For what it's worth, the issues brought up with pagination are not unique to relates_to. For example, global initial sync has a section on specifying whether you want the X most recent rooms, but you could also see a client wanting the X oldest rooms. Given there are so many APIs which are using pagination (scrollback (both message events and relates_to), global initial sync, contextual windowing, public room lists) would it be worth having a general pagination token API? For scrollback, which takes two forms of pagination, you would provide 2 tokens e.g. /scrollback?pagin_relates_to=foo&pagin_msg_events=bar. This would keep things pretty clear and provide a consistent way of doing pagination. This API would be able to configure things like the order (most recent vs oldest), the size of a chunk (the limit= param), etc.
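
A minimal sketch of what a client-side call with two pagination tokens could look like. The query parameter names come from the example above; the `scrollback_url` helper and the base path passed to it are hypothetical.

```python
from urllib.parse import urlencode


def scrollback_url(base, relates_to_token, msg_events_token):
    """Build a scrollback URL carrying one pagination token per stream."""
    query = urlencode({
        "pagin_relates_to": relates_to_token,
        "pagin_msg_events": msg_events_token,
    })
    return f"{base}/scrollback?{query}"


# The base path here is a placeholder, not a specified endpoint.
url = scrollback_url("/rooms/ROOM_ID", "foo", "bar")
```

Keeping the tokens opaque would let the server encode ordering and chunk size in them without changing the shape of the request.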

Contributor

Kegsay commented Jan 8, 2015

https://matrix.org/jira/browse/SPEC-14 should be included. We should support something akin to sticky threads / flagging messages which bing. Whether this forms part of the event JSON spec or is a flag you can give when sending a message event is debatable.

Owner

ara4n commented Jan 11, 2015

surely metadata like "this msg is a notice" or "this msg is sticky/flagged" should be on the event itself - especially for mutable flags. and the api then needs to support querying on said metadata for presenting sticky msgs etc

Contributor

Kegsay commented Jan 11, 2015

Yes it needs to be on the event, but where? If it is in the content then that means m.room.message events will need an additional key on the event content to state this, and the home server cannot enforce any rules, and it would force us to make individual keys on event content filterable. If it is a flag on the cs api then the hs could feasibly do some sanity checks and add the key on the top level of the event (like origin_server_ts).

Owner

ara4n commented Jan 11, 2015

Yes it needs to be on the event, but where?

Event type (but means it might be invisible if clients didn't know how to interpret it)

Content type (but it needs to be orthogonal to content type so you can have sticky image messages etc)

If it is in the content then that means m.room.message events will need an additional key on the event content to state this,

Sounds like this is the way to go.

and the home server cannot enforce any rules,

Why not?

and it would force us to make individual keys on event content filterable.

And why would this be a bad thing?

If it is a flag on the cs api then the hs could feasibly do some sanity checks and add the key on the top level of the event (like origin_server_ts).



Contributor

Kegsay commented Jan 11, 2015

Hmm, you can actually enforce rules on it, just like we do for m.room.member events, so ignore me. A key on the content wfm

Owner

erikjohnston commented Jan 11, 2015

(You can't enforce it if the content has been encrypted)

Owner

ara4n commented Jan 11, 2015

(You can't enforce it if the content has been encrypted)

unless it's a new unencrypted field like event type, placed in the layer above content? (and we submit events including the layer above)

Owner

ara4n commented Jan 11, 2015

More questions which came up whilst reviewing today:

  • Is it deliberate that we have separate client-server-v2.rst and general_api.rst docs? A lot of the useful context from client-server-v2.rst is badly needed in general_api.rst (i.e. the whole API Changes Summary... and, well, the whole rest of the document?)
  • The 'excluded' summary of changes in general_api.rst seems very sane... other than "Multiple devices (other than VoIP)". This is confusing, as multiple device support is absolute table stakes baseline (we expect everyone to use matrix on multiple matrix-enabled apps on both web and mobile simultaneously)... and i'm not seeing anywhere in the draft where we're actually designing this out? Is this a typo?
  • In "Rejected events", it says: "A homeserver may find out via federation that it should not have accepted an event (e.g. to send a message/state event in a room)". Under what circumstances does this happen? Is this a race between receiving a message from one federated server and an auth event state change denying it from another? If so, we need to make this clearer.
  • We don't seem to define what "public" and "private" rooms mean anywhere. And to be honest I'm still not sure what it means. Is a public room one that you can join without an invite (or other access credential)? Or is it whether a room's alias has been published in a public directory somewhere?
  • Why is contextual windowing a separate API from handling scrollback?
  • Why has the concept of "streaming token" that is all over client-server-v2.rst seemingly disappeared?
  • How do I keep unread message count in sync across multiple devices?
Contributor

Kegsay commented Jan 12, 2015

I have removed client-server-v2.rst entirely to avoid confusion. general_api.rst replaces it entirely. Cleared up the 'excluded' summary since that's not really what I mean. Clarified "Rejected events" section. Factored out room history API commonality. Clarified the types of tokens that cs v2 proposes.

We don't seem to define what "public" and "private" rooms mean anywhere. And to be honest I'm still not sure what it means. Is a public room one that you can join without an invite (or other access credential)? Or is it whether a room's alias has been published in a public directory somewhere?

Clarified as "published rooms" rather than "public rooms" to avoid confusion with the join_rules of a room, which can be invite or public (anyone is free to join it).

How do I keep unread message count in sync across multiple devices?

You inspect the read-up-to marker that comes down your event stream. There's nothing special about keeping this bit of data in sync vs everything else (e.g. room member state, room name, etc).
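
For illustration, here is one way a device could derive an unread count from such a marker. The marker format is not specified in this thread, so the sketch assumes it is simply the event ID of the last read event, and that the device holds the room timeline as a list of event IDs, oldest first. `unread_count` is a hypothetical helper.

```python
def unread_count(event_ids, read_up_to):
    """Count events after the read-up-to marker in a room's timeline.

    `event_ids` is the locally known timeline, oldest first. If the marker
    is None or not in the local timeline, every locally known event is
    treated as unread (an assumption; a real client might paginate instead).
    """
    if read_up_to in event_ids:
        return len(event_ids) - event_ids.index(read_up_to) - 1
    return len(event_ids)


timeline = ["$a", "$b", "$c", "$d"]
```

Because every device receives the same marker down its event stream, each one arrives at the same count without any device-to-device coordination.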

Owner

erikjohnston commented Jan 12, 2015

Some of the points I made earlier seem to have been missed:

Comments on the general_api.rst

Event Stream API

  • Home servers may receive state events over federation that are superseded by state events previously sent to the client. The home server cannot send these events to the client, else they would end up erroneously clobbering the superseding state event.
  • As a result, the home server reserves the right to omit sending state events which are known to be superseded already.

Why? This will mean that people will see different history depending on the exact timings of their event stream, and if they are looking at it live or if they are paginating back. Just because a topic has been replaced doesn't mean I don't want to know that someone tried to change the topic just before that. The inconsistency here could lead to mass confusion if half the room see a state event and the other half don't.

  • If this happens, the home server will send a m.room.redaction for the event in question.
  • If the event was a state event, it will synthesise a new state event to correct the client's room state.

These need to be local server events.

Joining a room

  • We propose that a server-generated event is sent down the event stream to all clients, rather than annotating the join event. The server-generated event works nicely for Application Services where an entity subscribes to a room without a join event.

How does this work in practice? What happens between the client getting a join event and getting the generated event?

Action APIs

Send a message

  • HTTP: When sending a message with a higher seqnum, it will block the request until it receives earlier seqnums. The block will expire after a timeout and reject the message stating that it was missing a seqnum.

What are the failure modes here? The most worrying one I can currently think of is if an HTTP library only allows a maximum number, say 5, of concurrent connections (e.g. a web browser does this). If the first PUT blackholes and 5 other requests get stuck behind it, then the retry of the original PUT gets blocked behind the requests waiting for it. The only way out of this is when one of the other requests times out.

  • For signing: You send the original message to the HS and it will return the full event JSON which will be sent. This full event is then signed and sent to the HS again to send the message

This does not need to happen for E2E, since that only protects contents. The only time this would be useful is if a client wanted to sign the entire event for some paranoid reason.

Sessions

A session is a group of requests sent within a short amount of time by the same client.

The phrase "within a short amount of time" scares me here. What do you mean by that? Surely we want sessions to last as long as the client is being used? For a desktop app this would be hours, if not days/weeks/months.

If the server expires a session and the client uses an old session ID, the server should fail the request with the old session ID and send a new session ID in response for the client to use.

Do we really want to be so brutal? Having to resend a request just because the server has timed us out seems like a bit of a pain.

How does this work with "clients" that only want to send the odd message every now and again, and don't want to have to start sessions?

In reply to (Events)

This differs from the updates key as they do not update the event itself, and are not required in order to display the parent event.

This, to me, implies that you can't display an event unless you have all of its child events. How do you know if you have them all? I think the intention with this sentence was to indicate that whenever a server tells clients about an event it always includes any child events that updated it.

Things I still think are missing

  • Server level events.
  • A client might receive an event from a room it doesn't know about, either because:
    1. Federation only just found out you were in the room (this is possible if your server lost its history and then got told it was in the room again)
    2. Client did not pull in room in global initial sync (if it paginated).
Contributor

Kegsay commented Jan 12, 2015

These need to be local server events.

This has already been specified. "This will be a local server event (not shared with other servers)."

What are the failure modes here? The most worrying one I can currently think of is if an HTTP library only allows a maximum number, say 5, of concurrent connections (e.g. a web browser does this). If the first PUT blackholes and 5 other requests get stuck behind it, then the retry of the original PUT gets blocked behind the requests waiting for it. The only way out of this is when one of the other requests times out.

This has already been marked as a problem on the send a message API: "[...]Is this even practical, given clients have a limit on the number of concurrent connections? [...]" and Action IDs: "Blocking requests with higher seqnums is troublesome if there is a max # of concurrent connections a client can have open."

The phrase "within a short amount of time" scares me here. What do you mean by that? Surely we want sessions to last as long as the client is being used? For a desktop app this would be hours, if not days/weeks/months.

I responded to this in the comments of this PR, and clarified the meaning of this in a later commit: "[...]Sessions time out after a short amount of time without any requests.[...]"

Do we really want to be so brutal? Having to resend a request just because the server has timed us out seems like a bit of a pain.

I have already added this feedback as a known issue in the Action IDs section: "Session expiry: Do we really have to fonx the request if it was done with an old session ID?"

Server level events.

What specifically is missing? It's mentioned that servers can send events. Some server-generated events are even specified e.g. m.homeserver.scrollback in the Joining API.

A client might receive an event from a room it doesn't know about

This has already been specified in the "Unknown rooms" section of the event stream API.


As for the bits I have not addressed:

This, to me, implies that you can't display an event unless you have all of its child events. How do you know if you have them all? I think the intention with this sentence was to indicate that whenever a server tells clients about an event it always includes any child events that updated it.

I'll tweak the phrasing of this.

How does this work with "clients" that only want to send the odd message every now and again, and don't want to have to start sessions?

What's the use case here? The session ID is just returned in response to the first request made without a session ID, so I don't really see the problem here.

This does not need to happen for E2E, since that only protects contents. The only time this would be useful is if a client wanted to sign the entire event for some paranoid reason.

Is there any reason to have multi-stage signing then, or can I nuke that entirely?

Why? This will mean that people will see different history depending on the exact timings of their event stream, and if they are looking at it live or if they are paginating back. Just because a topic has been replaced doesn't mean I don't want to know that someone tried to change the topic just before that. The inconsistency here could lead to mass confusion if half the room see a state event and the other half don't.

Can you provide an alternative which doesn't place unreasonable requirements on the client? The best alternative I can think of is giving the clobbered event then sending the current state again immediately after it (which still creates an inconsistent state since history is wrong). I think this is a compromise we need to make for eventual consistency.

How does this work in practice? What happens between the client getting a join event and getting the generated event?

This is mainly for the benefit of other devices. If a device performs the join, they will get a chunk of events in response to the join, along with a token so they aren't blocked going into the next screen. For other devices however, they will just begin getting events for a new room they joined, without a chunk token to scrollback if desired. This is the use case we're trying to resolve here (which isn't clear at all in the doc; will clarify). There isn't anything of interest between the join and the server-generated event, it just means other devices won't be able to scrollback until they get it.

Owner

erikjohnston commented Jan 12, 2015

Okay, cool. It looks like I missed a lot of these things because they were addressed in different places than where I noted the problems.

I responded to this in the comments of this PR, and clarified the meaning of this in a later commit: "[...]Sessions time out after a short amount of time without any requests.[...]"

I still think the statement "A session is a group of requests sent within a short amount of time by the same client." is completely misleading/wrong, and the clarification doesn't contradict this.

Server level events.

What specifically is missing?

I guess mostly what they look like and how they behave. Are they just like regular events? How do you tell a normal event from a server local one? Can they be sent? What about EDU style ones? E.g. for server status notifications. Some of this should probably go into use cases I guess.

How does this work with "clients" that only want to send the odd message every now and again, and don't want to have to start sessions?

What's the use case here? The session ID is just returned in response to the first request made without a session ID, so I don't really see the problem here.

The use case is for automatic scripts that want to send a message to a room in the simplest way. E.g. like all the various automatic IRC notices we have. I guess it's probably fine so long as bots don't /have/ to remember session IDs.

Why do clients need to store session ids? Can that not be done purely server side?

Why? This will mean that people will see different history depending on the exact timings of their event stream, and if they are looking at it live or if they are paginating back. Just because a topic has been replaced doesn't mean I don't want to know that someone tried to change the topic just before that. The inconsistency here could lead to mass confusion if half the room see a state event and the other half don't.

Can you provide an alternative which doesn't place unreasonable requirements on the client? The best alternative I can think of is giving the clobbered event then sending the current state again immediately after it (which still creates an inconsistent state since history is wrong). I think this is a compromise we need to make for eventual consistency.

Stick an extra flag on a state event to indicate whether it should clobber existing state or not?

How does this work in practice? What happens between the client getting a join event and getting the generated event?

This is mainly for the benefit of other devices. If a device performs the join, they will get a chunk of events in response to the join, along with a token so they aren't blocked going into the next screen. For other devices however, they will just begin getting events for a new room they joined, without a chunk token to scrollback if desired. This is the use case we're trying to resolve here (which isn't clear at all in the doc; will clarify). There isn't anything of interest between the join and the server-generated event, it just means other devices won't be able to scrollback until they get it.

My worry here is that a client will have to wait between a "join" event and another server generated event. It's always annoying, from a client and a what-can-go-wrong perspective, if we have to wait asynchronously for something before we can process it. Unless I'm misunderstanding what is happening?

Owner

erikjohnston commented Jan 12, 2015

Also: Invites should probably include some information about the room other than the room id and who invited you. What should these things be? Name, topic, member lists? Should it also include a "reason" field so that people can give a reason for the invite?

Kegsay added some commits Jan 12, 2015

Initial Sync: Ongoing > Draft with additional work
Descoped some points to v2.1 as discussed irl.
Add session API. Add server-generated events.
Moved 'Inviting' API from Final to ONGOING in light of issues brought up
from PR comments and irl discussions.
Contributor

Kegsay commented Jan 13, 2015

Why do clients need to store session ids? Can that not be done purely server side?

This is an interesting point. The client needs to know its session so it can reset its action ID when it starts a new session. It may be easier to say "if you get back a 'reset action ID' flag on your request, then reset your action IDs". The concept of a session would still be present, but the boilerplate of having to send it on every request would be avoided. This needs some input from Mark.
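
A small sketch of the flag-based alternative, seen from the client. The `reset_action_id` response key and the `ActionIdCounter` class are hypothetical names standing in for the 'reset action ID' flag; nothing here is specified.

```python
class ActionIdCounter:
    """Client-side counter for action IDs, reset when the server asks."""

    def __init__(self):
        self._next_id = 0

    def next_id(self):
        """Return the action ID to use for the next request."""
        action_id = self._next_id
        self._next_id += 1
        return action_id

    def handle_response(self, response):
        # "reset_action_id" is a hypothetical response key; when it is set,
        # the server has started a new session and numbering restarts at 0.
        if response.get("reset_action_id", False):
            self._next_id = 0


counter = ActionIdCounter()
```

The client never stores or sends a session ID here; it only reacts to the flag when it appears.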

Stick an extra flag on a state event to indicate whether it should clobber existing state or not?

That, combined with being an "out-of-order" event (so it can be inserted into the right place in the message history screen) could possibly work. It's an annoying extra check which needs to be done for every state event ever though, but I suppose if it only comes down with the "out of order" flag set then this could actually work.
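
The extra check could look roughly like this on the client. The `clobber` key is a hypothetical name for the suggested flag, and the sketch only covers the current-state update; placing the out-of-order event at the right point in message history is not modelled.

```python
def apply_state_event(room_state, event):
    """Apply a state event to a client's view of current room state.

    Returns True if current state changed. The "clobber" key is hypothetical:
    False marks an event that belongs in history but is already superseded.
    """
    if not event.get("clobber", True):
        return False
    key = (event["type"], event.get("state_key", ""))
    room_state[key] = event
    return True


room_state = {}
current = {"type": "m.room.topic", "content": {"topic": "new"}}
late = {"type": "m.room.topic", "content": {"topic": "old"}, "clobber": False}
apply_state_event(room_state, current)
apply_state_event(room_state, late)  # shown in history, state untouched
```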

My worry here is that a client will have to wait between a "join" event and another server generated event. It's always annoying from a client and what goes wrong perspective if we have to wait asynchronously for something before we can process it. Unless I'm misunderstanding what is happening?

You have the right idea. My counter to that was simply along the lines that "given this device was not the one that triggered the action, it is unlikely that they will require that data immediately, so relying on the async response is acceptable." As I said before, this does not apply to the device which clicked through to the room, since they got all the data they needed in response to the request.

Also: Invites should probably include some information about the room other than the room id and who invited you. What should these things be? Name, topic, member lists? Should it also include a "reason" field so that people can give a reason for the invite?

I've marked these notes on the Invite API and brought it down to ONGOING as a result.

Contributor

Kegsay commented Jan 13, 2015

Erik, Mark, Dave and I had a discussion on the concept of Sessions/Action IDs to try to unblock some of the problems raised. General notes from this are below:

Sending messages

Problems trying to solve:

  • Need to get ordering right (aka sequence numbers)
  • Need idempotency

Original proposal:

  • Use action IDs which handle both cases.
  • These IDs reset to 0 when a new session is started.

Problems raised with original proposal:

  • Linking action IDs to sessions sucks because if you get an expired session the server has to reject the action.
  • Combining ordering and idempotency isn't great when you're a bot who wants to be idempotent but doesn't give a damn about ordering. This also has client implementation ramifications. Some simple scripts for example may be able to only store state scoped to a request. You can do idempotency like this, but can't do sequence numbers since you need global state for all requests.
  • Ordering has huge amounts of edge case woes, whilst being deceptively simple to implement. This is not good design.

Why ordering is Hard:

  • The server needs to send requests in the right order, meaning it needs previous requests before it can send the current request.
  • This dependency link creates a house of cards scenario where if 1 request is lost, then every request after that gets wedged. If "wedged" is a literal "the connection is held open", this creates concurrent connection problems since some clients may have a limit to the number of concurrent connections they can have open. This can create an infinite wedge if, when the connections time out, you do not retry the lowest sequence number first. This is a subtle bug which can leave the client unable to send any requests at all.
  • If you instead close the connection, it's misleading that the 200 OK to send a message isn't actually that you've sent the message, because you're waiting on earlier sequence numbers. This forces you to run some async method (e.g. an event stream) to know if the request you sent was "actually" sent.
  • Ordering also adds client complexity, because there are edge cases where you MUST re-sequence the pending requests. Imagine sending 5 messages. A naive client will mark the first message with a seqnum of 1, the second as 2, etc. Now imagine that the first request ALWAYS fails (e.g. it contains blacklisted words). The client needs to have coping strategies to decrement the sequence numbers of all the pending requests, else the server will not send them as it will be waiting for the first request.
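
To make the wedge concrete, here is a rough sketch of the 5-message example above. The names (SequencedInbox, resequence) are made up for illustration and are not part of any proposal; the inbox stands in for a server that only releases requests in strict seqnum order.

```python
class SequencedInbox:
    """Hypothetical server-side buffer that releases requests strictly in seqnum order."""

    def __init__(self):
        self.next_seq = 1      # lowest sequence number not yet processed
        self.pending = {}      # seqnum -> payload, held until its turn comes
        self.processed = []    # payloads released, in order

    def receive(self, seq, payload):
        self.pending[seq] = payload
        # Release every request whose predecessors have all arrived.
        while self.next_seq in self.pending:
            self.processed.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def resequence(pending, failed_seq):
    """Client-side coping strategy: drop the failed request and close the gap."""
    return {
        (seq - 1 if seq > failed_seq else seq): payload
        for seq, payload in sorted(pending.items())
        if seq != failed_seq
    }


inbox = SequencedInbox()
client_pending = {1: "blacklisted words", 2: "b", 3: "c", 4: "d", 5: "e"}

# Request 1 is always rejected, so only 2..5 ever reach the inbox.
for seq in (2, 3, 4, 5):
    inbox.receive(seq, client_pending[seq])
wedged = list(inbox.processed)   # nothing released: everything waits on seqnum 1

# The client gives up on request 1 and renumbers 2..5 as 1..4. A fresh inbox
# stands in for the server discarding the stale numbering.
retry_inbox = SequencedInbox()
for seq, payload in resequence(client_pending, failed_seq=1).items():
    retry_inbox.receive(seq, payload)
```

Until the client renumbers, all four later messages sit in the pending map indefinitely, which is the "wedged" state described above.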

Proposed solution:

  • A lot of these problems are protocol specific. They go away if you use something like websockets.
  • Accept that HTTP needs to be synchronous if you want the ordering (that is, waiting for the first request to 200, then send the next one).
  • Use batching to mitigate the delays (e.g. send first request, then wait for response. If whilst waiting there are 3 additional messages to send, then blob them into an array to send as the second request when the first 200s).
  • Action IDs no longer linked to a session (there is no need to do this if you don't do ordering, since the whole point was to reset the counter to 0 when a new session started).
  • Action IDs now revert to their v1 purpose of being transaction IDs, purely for idempotency.
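
A minimal sketch of how batching and transaction IDs could fit together, assuming the server deduplicates purely on transaction ID. FakeServer and BatchingSender are hypothetical stand-ins, not real endpoints or client classes, and the use of a random hex string as the transaction ID is also an assumption.

```python
import uuid


class FakeServer:
    """Stand-in for a homeserver that deduplicates on transaction ID (assumed behaviour)."""

    def __init__(self):
        self.seen = {}       # txn_id -> response already returned for that ID
        self.timeline = []   # messages in the order they were accepted

    def send(self, txn_id, messages):
        if txn_id in self.seen:          # retry of a request we already handled
            return self.seen[txn_id]
        self.timeline.extend(messages)
        self.seen[txn_id] = {"status": 200, "accepted": len(messages)}
        return self.seen[txn_id]


class BatchingSender:
    """Keeps at most one request in flight; everything else is queued."""

    def __init__(self, server):
        self.server = server
        self.queue = []          # messages submitted while a request is in flight
        self.in_flight = None    # (txn_id, messages) awaiting a 200, or None

    def submit(self, message):
        self.queue.append(message)
        if self.in_flight is None:
            self._start_next()

    def _start_next(self):
        if self.queue:
            # Everything queued so far goes out as a single array.
            self.in_flight = (uuid.uuid4().hex, self.queue)
            self.queue = []

    def complete(self):
        """Simulate the in-flight request reaching the server and returning 200."""
        txn_id, messages = self.in_flight
        response = self.server.send(txn_id, messages)
        self.in_flight = None
        self._start_next()
        return txn_id, messages, response


server = FakeServer()
sender = BatchingSender(server)
sender.submit("one")                       # goes out immediately as its own request
for text in ("two", "three", "four"):
    sender.submit(text)                    # queued while "one" is in flight
first = sender.complete()                  # "one" 200s; the other three become one batch
second_batch = list(sender.in_flight[1])
sender.complete()
replay = server.send(first[0], first[1])   # duplicate retry: returns the stored response
```

Ordering falls out of only ever having one request in flight, while the transaction ID is needed only so that a retried request is not applied twice.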

Sessions

The session concept was designed to combine APIs which could "time out" through network connectivity losses. The idea was that there would be a single timer which could detect if they were connected or not. In practice, this combined the typing timeout and the presence timeout (which in v1 is implicitly linked to the event stream).

Problems:

  • Expiring random bits of EDU-like state when the server decides you've timed out is confusing. Communicating this to the client is even more so. In the current proposal, this meant rejecting the next action and telling the client to start a new session.
  • Determining the liveness of a session relies on making requests. This effectively means polling the event stream, but we skirt around this by implicitly making sessions on every request.

Typing

We propose splitting out the typing part of Sessions to be as v1 does it. The assumption here is that there won't be lots of client timers repoking the server to say they are still typing, given that you'll on average have 1 room set to typing per user. This makes the timers for typing easier to understand.

Presence

We would prefer to not have to keep repoking the server to say the client is alive, when we are already implicitly doing this via the event stream. We also want a way to "appear offline" whilst still making requests.

We propose swapping the concept of a "session" which is determined by the server with a concept of a "launch-id" which is determined by the client. A "launch-id" of a client is the period in which a client will be persisting EDU-like data (e.g. presence, typing). For mobile devices, this is set when the app launches (hence the term), and a new launch ID is generated whenever the app is relaunched from cold.

With this new concept, we propose coupling more explicitly the relationship between launch-ids, presence and the event stream. This involves adding 2 new flags to the event stream: new (indicating whether this is a new launch or an existing launch ID) and set_presence. The simple use case where a client is going online then offline:

  Client => boots up and generates launch ID "Q".
  Client => initial sync
  Client => hits the event stream with:
                 launch-id=Q&new=true&set_presence="ONLINE"
  Client <= Receives normal event stream response.
  Client => constantly repolls with:
                 launch-id=Q

If the client goes into a long tunnel, the server may time out the launch ID. A timed out launch ID results in presence going to offline. When the client exits the tunnel, the repoll will either fail outright (e.g. 4xx), or return a response with something in the body to say "hey, I no longer recognise this launch ID". When the client receives this, it needs to regenerate the launch ID: launch-id=R&new=true&set_presence="ONLINE". NB: It may be possible to keep the launch ID the same, since you are explicitly stating that you're creating a new launch already. If you do not specify set_presence, then presence will default to whatever it was for your previous active launch ID (if you had an existing launch ID) or "offline" if you didn't have any previous launch ID.

Launch IDs are scoped to a device ID / user ID combo (e.g. access_token). Later launch IDs created via new=true clobber older launch IDs, so there is only ever One True Launch ID for the device ID / user ID combo.
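
As a rough illustration of the server side of this, the sketch below tracks one launch ID per device ID / user ID combo. LaunchRegistry, the 30s timeout, the lowercase presence strings and the response shape are all assumptions made for the example; the proposal does not specify them. For simplicity set_presence is only honoured together with new=true.

```python
import time


class LaunchRegistry:
    """Tracks the single active launch ID per (user_id, device_id) pair."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed value; the proposal does not fix one
        self.clock = clock
        self.launches = {}           # (user_id, device_id) -> launch record

    def poll(self, user_id, device_id, launch_id, new=False, set_presence=None):
        key = (user_id, device_id)
        now = self.clock()
        current = self.launches.get(key)
        if current and now - current["last_seen"] > self.timeout_s:
            current["presence"] = "offline"   # a timed-out launch goes offline
            current["expired"] = True

        if new:
            # Without set_presence, inherit from the previous launch, else "offline".
            inherited = current["presence"] if current else "offline"
            record = {
                "id": launch_id,
                "last_seen": now,
                "presence": set_presence or inherited,
                "expired": False,
            }
            self.launches[key] = record   # clobbers any older launch ID
            return {"ok": True, "presence": record["presence"]}

        if current is None or current["id"] != launch_id or current["expired"]:
            return {"ok": False, "error": "unrecognised launch-id"}
        current["last_seen"] = now
        return {"ok": True, "presence": current["presence"]}


now = [0.0]                                   # fake clock so the example is deterministic
registry = LaunchRegistry(timeout_s=30.0, clock=lambda: now[0])
user = "@alice:example.org"

r1 = registry.poll(user, "phone", "Q", new=True, set_presence="online")
now[0] = 10.0
r2 = registry.poll(user, "phone", "Q")        # normal repoll
now[0] = 100.0                                # the long tunnel
r3 = registry.poll(user, "phone", "Q")        # launch ID has timed out
r4 = registry.poll(user, "phone", "R", new=True, set_presence="online")
r5 = registry.poll(user, "phone", "Q")        # clobbered by "R"
r6 = registry.poll(user, "laptop", "A", new=True)   # "appear offline": no set_presence
```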

This API covers the "appear offline" use case (simply don't set_presence), and it can be expanded to allow more complicated status updates on startup by hitting another endpoint (though this isn't a key part to understand the mechanism).


Summary

  • Revert typing API to function as v1 typing does.
  • Revert action IDs to function as transaction IDs as v1 does.
  • Do not attempt to handle ordering messages when sending, but support and encourage batching to mitigate delays.
  • Scrap the broad notion of session IDs which time out all sorts of stuff and replace with a "Launch ID" which is more closely tied to the event stream and specifically presence.

ara4n and others added some commits Jan 14, 2015

Typing API, Action ID, Event ordering changes
Apply most changes discussed in #4 (comment)
- Revert typing API to be like v1
- Revert Action IDs to be like Transaction IDs are for v1. Keep the echo down the event stream though.
- Add batching notes to some Action APIs and remove ordering by action ID.
Sessions still need to be scrapped.
Remove session section
This is no longer required.
Add profile propagation notes
After much IRL discussion
Contributor

Kegsay commented Jan 21, 2015

Revised notes on Presence:

  • There is no concept of sessions, launch IDs, etc.
  • By default, a device is seen as online whenever it is polling the eventStream. It can provide a presence param (offline/idle/online) whilst polling to override this behaviour.
  • We can also override the presence state at any time (e.g. for setting the device as idle, or as offline, depending on what the client thinks the user is up to) by PUTing to /users/{userId}/presence/m.presence or similar. The server times out the client and sets its presence to offline after 30s (or eventStream poll timeout + 5s or something).
  • When the client starts polling the eventstream again, it can specify the presence to avoid 'flickering' or defaulting to being online when it's really idle.
  • User may set status at any point, applies to the whole user, and is not affected by device or user activity.
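
A small sketch of the per-device rules above, under a few assumptions: a 25s eventStream poll timeout (so the device goes offline after 30s without a poll), a poll without a presence param resetting the device to online, and hypothetical method names (on_poll, on_put). The user-level status is left out since it is independent of device activity.

```python
import time

POLL_TIMEOUT_S = 25.0                   # assumed eventStream long-poll timeout
OFFLINE_AFTER_S = POLL_TIMEOUT_S + 5    # "eventStream poll timeout + 5s or something"
VALID_STATES = {"offline", "idle", "online"}


class DevicePresence:
    """Presence for one device, driven by eventStream polls and explicit PUTs."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.state = "offline"
        self.last_poll = None    # time of the most recent eventStream poll

    def on_poll(self, presence=None):
        """Each eventStream poll; `presence` is the optional override param."""
        if presence is not None and presence not in VALID_STATES:
            raise ValueError(f"unknown presence state: {presence}")
        # Assumption: a poll with no param means the device is seen as online.
        self.state = presence or "online"
        self.last_poll = self.clock()

    def on_put(self, presence):
        """An explicit PUT to the presence endpoint overrides the state at any time."""
        if presence not in VALID_STATES:
            raise ValueError(f"unknown presence state: {presence}")
        self.state = presence

    def current(self):
        # The server times the device out if it has not polled recently enough.
        if self.last_poll is None or self.clock() - self.last_poll > OFFLINE_AFTER_S:
            return "offline"
        return self.state


now = [0.0]                               # fake clock for a deterministic example
device = DevicePresence(clock=lambda: now[0])

device.on_poll()
after_poll = device.current()             # polling with no param
device.on_put("idle")
after_put = device.current()              # client decided the user is idle
now[0] = 31.0
after_timeout = device.current()          # no poll for more than 30s
device.on_poll(presence="idle")
after_resume = device.current()           # resuming as idle avoids flickering to online
```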

Kegsay added a commit that referenced this pull request Jan 21, 2015

Merge pull request #4 from matrix-org/use-cases
Add Client-Server v2 General API Design

@Kegsay Kegsay merged commit 51a7681 into master Jan 21, 2015
