Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the CHATHISTORY extension specification - refactored #393

Merged
merged 27 commits into from
Nov 9, 2020

Conversation

prawnsalad
Copy link
Contributor

Overhaul from previous CHATHISTORY spec #349
Creating a new PR since the majority of it has now changed and most previous comments are no longer relevant with these changes.

Changes:

  • Explanation of each subcommand
  • Unified and simpler syntax for each subcommand
  • Removal of many ambiguities throughout the spec
  • Removal of conflicting command parameters
  • Removal of unrelated specs being mentioned in examples and notes
  • Simplified and explicitly defined errors

extensions/chathistory.md Outdated Show resolved Hide resolved
extensions/chathistory.md Outdated Show resolved Hide resolved
extensions/chathistory.md Outdated Show resolved Hide resolved
@edk0
Copy link
Contributor

edk0 commented Jun 26, 2019

I think "The returned messages MUST be in ascending time order" (especially, but not only, when combined with the following SHOULD) is wrong for the same line of reasoning expressed here. The spec should not require servers to break the causal order of messages.

It also feels like this section is specifying the batch type, not the command.

@slingamn
Copy link
Contributor

Reraising the two issues I brought up on #349:

  • It's still unclear how targets work with direct messages. (What exactly is a "query name"? If it's a nickname, does it include messages sent to the client under a different nickname? Does it include messages sent from the client?) I think we should specify:
    • that the user's nickname, as a target, includes messages sent by the user
    • that it includes messages sent to and by the user under different nicknames
    • possibly add a special-purpose target like *self for retrieving these messages
  • I think we should either deprecate or remove the use of msgid in the order queries (BEFORE, AFTER, LATEST, and BETWEEN); given the weakness of the guarantees on the relationship between msgid and timestamp, I don't see a good way to use msgid correctly in those contexts

@slingamn
Copy link
Contributor

I got confused about the distinction between AFTER and LATEST despite having actually implemented the previous version of the spec. I think a minor language tweak would help: describing LATEST as "Request the most recent messages that have been sent, excluding anything sent before the given timestamp or msgid".

@prawnsalad
Copy link
Contributor Author

@slingamn

It's still unclear how targets work with direct messages

Now specified. Though I'm not sure in what case your *self suggestion would be used?

@prawnsalad
Copy link
Contributor Author

I think "The returned messages MUST be in ascending time order" (especially, but not only, when combined with the following SHOULD) is wrong for the same line of reasoning expressed here. The spec should not require servers to break the causal order of messages.

It also feels like this section is specifying the batch type, not the command.

After some discussion re. this comment, the working phrase at the moment is:

The returned messages MUST be in chronological order

If no other input comes up I'll update the spec to use this.

@slingamn
Copy link
Contributor

Questions about the BETWEEN semantics with msgids as the query parameters:

  1. BETWEEN says "The returned messages MUST start from the inclusive first message selector", which seems to potentially contradict the general instruction that "The returned messages MUST be in ascending time order", given that the msgid order need not agree with the timestamp order
  2. What should happen if one of the msgids is not found in the message buffer / datastore?
  3. Is it illegal to send a timestamp as one endpoint of the BETWEEN interval and a msgid as the other?

@prawnsalad
Copy link
Contributor Author

  1. BETWEEN says "The returned messages MUST start from the inclusive first message selector", which seems to potentially contradict the general instruction that "The returned messages MUST be in ascending time order", given that the msgid order need not agree with the timestamp order

Message IDs are not reliable for any type of ordering as they are simply an identifier, so I'm not sure in what case you would attempt to order with them?

  1. What should happen if one of the msgids is not found in the message buffer / datastore?

Good point... I've just been returning the empty batch and leaving it at that. I'll add this as a note.

  1. Is it illegal to send a timestamp as one endpoint of the BETWEEN interval and a msgid as the other?

I don't see why it should be limited in this way. If I have a msgid and want to find the next 5minutes worth of conversation this would allow that.

@slingamn
Copy link
Contributor

Message IDs are not reliable for any type of ordering as they are simply an identifier, so I'm not sure in what case you would attempt to order with them?

I actually agree, but if this is so, what are the semantics of BETWEEN msgid=x msgid=y anyway? Basically it seems like we have to decide that the messages are in some linear order, then find where x and y appear in the linear order, then decide which one is first, then return messages from the first (inclusive) to the last (exclusive). But there are potential problems with this:

  1. There's no single linear order of all the messages across the entire network (even if there is on each individual server)
  2. In a scenario where multiple servers are inserting history items into a distributed datastore, the datastore's view of the ordering may not coincide with the order of the messages seen by any one server, or with the delivery order seen by any one client; it may even violate "causal order" constraints

I'm still bearish about the use of msgids in AFTER/BEFORE/BETWEEN/LATEST queries. I think time-based queries have more predictable behavior, since everyone (the client, each server, and a shared datastore) can agree on whether timestamp A precedes timestamp B.

@slingamn
Copy link
Contributor

I'd like to propose removing BETWEEN, since we haven't really been able to determine what it's useful for. (Infinite scroll can use LATEST and then iterated BEFORE; as a sidenote, it might actually be helpful to provide pseudocode for this. If a client really wants the functionality of BETWEEN, they can implement it using AFTER.)

@slingamn
Copy link
Contributor

slingamn commented Sep 19, 2019

What's the rationale for BEFORE being inclusive of the parameter, but AFTER being exclusive? Seems like it would make more sense for them both to be inclusive.

If we also made LATEST inclusive and removed BETWEEN entirely, then all the subcommands would be inclusive, which is relatively straightforward.

@slingamn
Copy link
Contributor

I think input from client developers would be helpful on both of these questions (removing BETWEEN and inclusive vs. exclusive). The impression I'm getting is that both BETWEEN and exclusive queries are more useful if clients plan to use the CHATHISTORY command without support for message deduplication. But even so, the alternative of inclusive BEFORE and AFTER queries is viable without full support for deduplication, so I continue to think that we should remove BETWEEN and make all the queries inclusive.

@slingamn
Copy link
Contributor

I've addressed all the outstanding issues I'm aware of. Let's take comments for a week or so and then merge as a draft?

@slingamn
Copy link
Contributor

Last call for comments!

@slingamn
Copy link
Contributor

Let's merge this on Monday.

@slingamn
Copy link
Contributor

Concerns were raised in #ircv3 about replay of self-messages across nickname changes:

As it currently stands, it seems there is no reliable way of determining whether a message in the chat history was sent by the user itself.

The person raising those concerns suggested adding a self tag to these messages. My own preference is for a split approach:

  1. For channel message history, do nothing
  2. For direct message history, servers MAY (SHOULD?) rewrite self-messages such that the source is the client's current nickmask (i.e., nickmask at the time of the replay)

the motivation being that rewriting is at once less important in the channel setting, and more disruptive (since nickname changes are often "part of the conversation" there). (In the channel setting, this is also potentially mitigated by account-tag, although I don't know of anyone who is currently using account-tag to identify the sources of messages to users.)

* clarify that only successful CHATHISTORY commands get a BATCH
* correct some references to `event-playback` to read `draft/event-playback`
@slingamn
Copy link
Contributor

Added a very hesitant line about rewriting:

With direct message history, servers may wish to rewrite the sources or targets of messages to correspond to the current nicknames of the two users.

@slingamn
Copy link
Contributor

slingamn commented Nov 3, 2020

I think we should merge this.

@jwheare
Copy link
Member

jwheare commented Nov 3, 2020

Happy to merge this if all contributors are happy. A few non-blocking thoughts:

  • Is there a reason TAGMSG is only included with event-playback?
  • Would e.g. chathistory-events be a better name than event-playback?
  • Is there scope for a way to indicate that you've reached the end of history either forward or backwards, or are we content for clients to just keep requesting more until they stop seeing new messages?

@slingamn
Copy link
Contributor

slingamn commented Nov 3, 2020

Is there a reason TAGMSG is only included with event-playback?

I'm not sure. #362 and #293 assume that by default, only PRIVMSG and NOTICE are suitable for inclusion in a chathistory batch.

It's possible that the default/baseline assumption about TAGMSG is that they may produce a state change in the client, hence this caution?

Is there scope for a way to indicate that you've reached the end of history either forward or backwards, or are we content for clients to just keep requesting more until they stop seeing new messages?

Right now you get an empty batch. This seems fine to me on its own terms, but it's the same response you get for a permissions error or a nonexistent target, which might be the result of an excess of caution on security/privacy issues.

@jwheare
Copy link
Member

jwheare commented Nov 3, 2020

There's no risk of TAGMSG causing a state related desync, they are by design lossy, so can't communicate important consistent state. #293 looks too old to consider TAGMSG, and no one in #362 never provided a rationale.

Re: empty batch, I'm thinking more if the end comes half way through a reply, you have to make another request to confirm it's the end (with an empty batch). Seems minorly wasteful, but might not be a significant issue.

@emersion
Copy link
Contributor

emersion commented Nov 3, 2020

Re: empty batch, I'm thinking more if the end comes half way through a reply, you have to make another request to confirm it's the end (with an empty batch). Seems minorly wasteful, but might not be a significant issue.

You can check if the number of returned messages == the limit the client sent, and only request more if that's the case. So the empty batch should only happen when there are as many remaining messages as the limit.

@jwheare
Copy link
Member

jwheare commented Nov 3, 2020

Yes, it's just that's a heuristic rather than something explicit. I think I'd prefer the latter, but it's not a blocker for merge.

@jwheare
Copy link
Member

jwheare commented Nov 3, 2020

To give a concrete hypothetical of what something explicit might be, a tag on the BATCH might work:

@chathistory-end :irc.host BATCH +ID chathistory #channel

I prefer to avoid positional params, for extensibility purposes, but a param is another option that would support clients without tags.

e.g.

:irc.host BATCH +ID chathistory #channel end

@jwheare
Copy link
Member

jwheare commented Nov 3, 2020

An additional comment – probably more of a blocker in the absence of any rationale – I'd upgrade these SHOULDs to MUSTs:

The type of the batch SHOULD be chathistory, and it SHOULD take a single additional parameter, the canonical name of the target being queried

@slingamn
Copy link
Contributor

slingamn commented Nov 3, 2020

I'm not that enthusiastic about adding a "no more messages" indicator.

I think I might like to add a new FAIL code (perhaps INVALID_TARGET) covering all the cases where either the target does not exist or the client doesn't have permissions to query it. I don't think this would add any security concerns?

@emersion
Copy link
Contributor

emersion commented Nov 3, 2020

I think I might like to add a new FAIL code (perhaps INVALID_TARGET) covering all the cases where either the target does not exist or the client doesn't have permissions to query it. I don't think this would add any security concerns?

+1

@slingamn
Copy link
Contributor

slingamn commented Nov 8, 2020

Let's merge this?

@jwheare
Copy link
Member

jwheare commented Nov 8, 2020

Sure, can do tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants