Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SEARCH #496

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add SEARCH #496

wants to merge 1 commit into from

Conversation

delthas
Copy link
Contributor

@delthas delthas commented Apr 15, 2022

Overview

This spec enables server-side search of PRIVMSG/NOTICE from clients to (generally) bouncers.

Typical use case from my IRC client:

  • I am in channel #alice
  • I can't quite remember Alice's birthday, but I know we talked about that at some point, perhaps weeks or months ago
  • Instead of SSH-ing to ZNC/soju, and grepping logs, I simply do, from my client:
  • /SEARCH birthday
  • A list of messages matching birthday appears. At a glance, I quickly find <bob> isn't alice's birthday on May 5th?

For modern GUI clients, this could typically be used by a search field (see eg Discord, Telegram).

It reuses the existing logic of CHATHISTORY and message-tags etc and is designed to do one thing well: server-side search and that's it. If you want additional message context, you could do CHATHISTORY BETWEEN on the message results.

Known implementations

Servers

  • soju

Clients

  • senpai

Copy link
Contributor

@progval progval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

Lots of comments below


The order of returned messages within the batch is implementation-defined, but SHOULD be ascending time order or some approximation thereof, regardless of the subcommand used. The server-time tag on each message SHOULD be the time at which the message was received by the IRC server. When provided, the msgid tag that identifies each individual message in a response MUST be the msgid tag as originally sent by the IRC server.

Servers SHOULD provide clients with a consistent message order that is valid across the lifetime of a single connection, and which determinately orders any two messages (even if they share a timestamp). This order SHOULD coincide with the order in which messages are returned within a response batch. It need not coincide with the delivery order of messages when they were relayed on any particular server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention three orders here: 1. ???, 2. in response batch 3. delivery; and say 1 and 2 should coincide. But I don't understand what this first order is. Is this something internal to the IRCd?


SEARCH <attributes>

The `attributes` parameter is a dictionary of attributes keys and values, formatted according to the [`message-tags`](message-tags.md) specification (without the leading `@`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/according to/like message tags, as defined in/

I would have gone with the ISUPPORT format instead of message-tags format; because the ISUPPORT format is already used in parameters; but ok.


This specification introduces the `SEARCH` commmand.

## Descriptiob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo


If the batch capability was negotiated, the server MUST reply to a successful SEARCH command using a batch with batch type `search`. If no content exists to return, the server SHOULD return an empty batch in order to avoid the client waiting for a reply.

The server then replies with a batch of batch type `search` containing messages matching all the specified match attributes. These messages MUST be `PRIVMSG` or `NOTICE` messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a MUST instead of a SHOULD? No TAGMSG?


Additionally, the following attributes MUST be recognized:
* `limit`: a number representing an upper bound on the count of messages to return. The server MAY return fewer messages than this number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should allow vendors to define their own vendor-prefixed extensions; as well as a mechanism to list supported attributes.

* `from`: the message was sent with this nick.
* `after`: the message was sent at or after this time (same format as the [`server-time`](server-time.md) specification).
* `before`: the message was sent at or before this time (same format as the [`server-time`](server-time.md) specification).
* `text`: the message text matches the specified text. The actual algorithm used for matching the text is implementation defined.
Copy link
Contributor

@progval progval Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual algorithm used for matching the text is implementation defined

IMO, not defining it at all makes this spec unusable by some clients. It should at least define whether:

  1. it matches full lines or just substrings by default
  2. .* vs * vs %

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible solution: standardize two different attributes:

  1. one with generic IRC matching (just * and ?) as a standard which is easy to translate to any backend
  2. one with advanced implementation-defined behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand, I like the potential of specifying multiple sets of matching behavior that could be used - you could potentially imagine even more advanced pattern matching. However, in practice, would clients care about this format, or just end users? Additionally, I think it's worth considering that "viable" pattern types/matching algorithms may be highly dependent on backend (eg if you're using Postgres with a GIN index, MySQL FULLTEXT index, elastisearch, ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why the matching algo is undefined is indeed that each message store implementation may have different capabilities and syntax. Users would be expected to type the search query, hence it's not a very big deal to leave it implementation-defined.

* `before`: the message was sent at or before this time (same format as the [`server-time`](server-time.md) specification).
* `text`: the message text matches the specified text. The actual algorithm used for matching the text is implementation defined.

If `after` is specified, messages SHOULD be searched from that time. Otherwise, messages SHOULD be searched from the `before` time, which defaults to the current server time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this sentence mean? That before should be ignored if after is given?

Comment on lines +100 to +122
~~~~
[c] SEARCH from=jackie;in=#chan
[s] :irc.host BATCH +ID search
[s] @batch=ID;msgid=1234;time=2019-01-04T14:33:26.123Z :jackie!indent@host PRIVMSG #chan :Be what you want
[s] @batch=ID;msgid=1234;time=2019-01-04T14:35:26.123Z :jackie!indent@host PRIVMSG #chan :Want what you be
[s] :irc.host BATCH -ID
~~~~

Searching messages matching the text `fast` in `#chan`, returning up to 2 messages
~~~~
[c] SEARCH text=fast;in=#chan;limit=2
[s] :irc.host BATCH +ID search
[s] @batch=ID;msgid=1234;time=2019-01-04T14:33:26.123Z :bill!indent@host PRIVMSG #chan :That was fast!
[s] @batch=ID;msgid=1234;time=2019-01-04T14:35:26.123Z :jackie!indent@host PRIVMSG #chan :Fasting is hard.
[s] :irc.host BATCH -ID
~~~~

Searching messages when none match
~~~~
[c] SEARCH before=2010-01-01T00:00:00.000Z;in=#chan
[s] :irc.host BATCH +ID search
[s] :irc.host BATCH -ID
~~~~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the same format for example blocks as other specs; and replace irc.host with an invalid or documentation domain.

You should also fix the duplicate msgids.

@slingamn slingamn mentioned this pull request Nov 21, 2022
8 tasks

### Returned message notes

The order of returned messages within the batch is implementation-defined, but SHOULD be ascending time order or some approximation thereof, regardless of the subcommand used. The server-time tag on each message SHOULD be the time at which the message was received by the IRC server. When provided, the msgid tag that identifies each individual message in a response MUST be the msgid tag as originally sent by the IRC server.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about supporting specification of a specific ordering to return? (Oldest, newest, most relevant, etc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants