Add SEARCH #496
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
--- | ||
title: Search | ||
layout: spec | ||
work-in-progress: true | ||
copyrights: | ||
- | ||
name: "delthas" | ||
email: "delthas@dille.cc" | ||
period: "2022" | ||
--- | ||
|
||
## Notes for implementing work-in-progress version | ||
|
||
This is a work-in-progress specification. | ||
|
||
Software implementing this work-in-progress specification MUST NOT use the | ||
unprefixed `search` capability name. Instead, implementations SHOULD | ||
use the `draft/search` capability name to be interoperable with other | ||
software implementing a compatible work-in-progress version. | ||
|
||
The final version of the specification will use an unprefixed capability name. | ||
|
||
## Introduction | ||
|
||
This feature is intended to allow clients to search server-side through messages that were previously sent. Server-side search enables clients to quickly find messages rather than downloading all the history from their server. | ||
|
||
It is *not* a goal of this specification to offer context around matching messages, as this is covered by the [`chathistory`](chathistory.md) specification. | ||
|
||
The server as mentioned in this document may refer to either an IRC server or an IRC bouncer. | ||
|
||
Full support for this extension requires support for the batch, server-time and message-tags capabilities. However, limited functionality is available to clients without support for these CAPs. Servers SHOULD NOT enforce that clients support all related capabilities before using the search extension. | ||
|
||
## Architecture | ||
|
||
### Capabilities | ||
|
||
This specification introduces the `search` capability. This capability advertises to clients that the `SEARCH` command is available. | ||
|
||
The `search` capability MUST be negotiated. | ||
|
||
### Batch type | ||
|
||
This specification introduces the `search` batch type. This batch type is used when listing matching messages of a `SEARCH` command. | ||
|
||
### Messages | ||
|
||
This specification introduces the `SEARCH` command. | ||
|
||
## Description | ||
|
||
Clients can request a message search by sending the `SEARCH` command to the server. | ||
|
||
This command has the following general syntax: | ||
|
||
SEARCH <attributes> | ||
|
||
The `attributes` parameter is a dictionary of attribute keys and values, formatted according to the [`message-tags`](message-tags.md) specification (without the leading `@`). | ||
Review comment: s/according to/like message tags, as defined in/ I would have gone with the ISUPPORT format instead of message-tags format; because the ISUPPORT format is already used in parameters; but ok. |
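As an illustration of the attribute format described above, here is a minimal Python sketch of how an implementation might parse the `attributes` parameter. The function names `unescape_value` and `parse_search_attributes` are hypothetical and not part of the draft; the escape table follows the message-tags specification, and the handling of a key without `=` (empty value) is an assumption.

```python
# Escape sequences defined by the message-tags specification.
_UNESCAPES = {":": ";", "s": " ", "\\": "\\", "r": "\r", "n": "\n"}


def unescape_value(raw: str) -> str:
    """Undo message-tags escaping; an unknown escape just drops the backslash."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\":
            i += 1
            if i < len(raw):
                out.append(_UNESCAPES.get(raw[i], raw[i]))
            # A lone trailing backslash is dropped.
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_search_attributes(param: str) -> dict[str, str]:
    """Parse 'key=value;key=value' into a dict (hypothetical helper)."""
    attrs = {}
    for item in param.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Assumption: a key without '=' is stored with an empty value.
        attrs[key] = unescape_value(value)
    return attrs
```

For example, `parse_search_attributes("text=fast;in=#chan;limit=2")` yields a dictionary with the keys `text`, `in` and `limit`, all with string values.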
||
|
||
If the batch capability was negotiated, the server MUST reply to a successful SEARCH command using a batch with batch type `search`. If no content exists to return, the server SHOULD return an empty batch in order to avoid the client waiting for a reply. | ||
|
||
The server then replies with a batch of batch type `search` containing messages matching all the specified match attributes. These messages MUST be `PRIVMSG` or `NOTICE` messages. | ||
Review comment: Why is it a MUST instead of a SHOULD? No TAGMSG? |
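On the client side, the reply described above could be consumed roughly as follows. This is a minimal sketch, assuming the lines have already been read from the connection and split on CRLF; `collect_search_batch` is a hypothetical helper, and the parsing is deliberately simplified (it ignores nested batches and only looks at the first `search` batch).

```python
def collect_search_batch(lines):
    """Return the messages (tags stripped) of the first 'search' batch."""
    batch_id = None
    results = []
    for line in lines:
        tags, rest = "", line
        if line.startswith("@"):
            tags, _, rest = line[1:].partition(" ")
        words = rest.split(" ")
        # Skip the optional ':source' prefix.
        if words and words[0].startswith(":"):
            words = words[1:]
        if not words:
            continue
        if words[0] == "BATCH" and len(words) >= 2:
            ref = words[1]
            is_search_start = (
                ref.startswith("+") and len(words) >= 3 and words[2] == "search"
            )
            if batch_id is None and is_search_start:
                batch_id = ref[1:]
            elif batch_id is not None and ref == "-" + batch_id:
                return results
            continue
        # Keep only messages tagged as belonging to the open batch.
        if batch_id is not None and "batch=" + batch_id in tags.split(";"):
            results.append(rest)
    # Batch never closed: return what was seen so far.
    return results
```

An empty batch produces an empty list, which lets a caller distinguish "no matches" from "no reply yet".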
||
|
||
### Returned message notes | ||
|
||
The order of returned messages within the batch is implementation-defined, but SHOULD be ascending time order or some approximation thereof, regardless of the subcommand used. The server-time tag on each message SHOULD be the time at which the message was received by the IRC server. When provided, the msgid tag that identifies each individual message in a response MUST be the msgid tag as originally sent by the IRC server. | ||
Review comment: What about supporting specification of a specific ordering to return? (Oldest, newest, most relevant, etc) |
||
|
||
Servers SHOULD provide clients with a consistent message order that is valid across the lifetime of a single connection, and which determinately orders any two messages (even if they share a timestamp). This order SHOULD coincide with the order in which messages are returned within a response batch. It need not coincide with the delivery order of messages when they were relayed on any particular server. | ||
Review comment: You mention three orders here: 1. ???, 2. in response batch 3. delivery; and say 1 and 2 should coincide. But I don't understand what this first order is. Is this something internal to the IRCd? |
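One possible reading of the "consistent message order" requirement is a total order that sorts by timestamp and breaks ties with a second, stable field. The sketch below uses the `msgid` as the tie-breaker; that choice, the `order_key` name and the dictionary layout are assumptions for illustration and are not mandated by the draft. It relies on all timestamps using the same fixed-width UTC format, so that string comparison agrees with chronological order.

```python
def order_key(message: dict) -> tuple:
    """Sort key: timestamp first, then msgid to break ties (an assumption)."""
    return (message["time"], message["msgid"])


def in_consistent_order(messages: list) -> list:
    """Return the messages sorted by the key above."""
    return sorted(messages, key=order_key)
```

Because the key is derived only from stored fields, two messages sharing a timestamp compare the same way in every response on the connection.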
||
|
||
### Errors and Warnings | ||
|
||
Errors are returned using the standard replies syntax. | ||
|
||
If the selectors were invalid, the `INVALID_PARAMS` error code SHOULD be returned. | ||
|
||
FAIL SEARCH INVALID_PARAMS [invalid_parameters] :Invalid parameters | ||
|
||
If the search cannot be run due to an internal error, the `INTERNAL_ERROR` error code SHOULD be returned. | ||
|
||
FAIL SEARCH INTERNAL_ERROR [extra_context] :The search could not be run | ||
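The two replies above could be produced by a small helper such as the following Python sketch. `format_fail` and `check_limit` are hypothetical names; placing the offending attribute name in the context position is an assumption about how `[invalid_parameters]` is meant to be filled in.

```python
from typing import List, Optional


def format_fail(command: str, code: str, context: List[str], description: str) -> str:
    """Build a standard-replies FAIL line with optional context parameters."""
    return " ".join(["FAIL", command, code, *context, ":" + description])


def check_limit(attrs: dict) -> Optional[str]:
    """Return a FAIL line if `limit` is not a non-negative integer, else None."""
    limit = attrs.get("limit")
    if limit is None or (limit.isascii() and limit.isdigit()):
        return None
    # Assumption: the invalid attribute name is reported as context.
    return format_fail("SEARCH", "INVALID_PARAMS", ["limit"], "Invalid parameters")
```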
|
||
### Standard search attributes | ||
|
||
Servers MUST recognise the following attributes. | ||
|
||
Each of the following match attributes is considered a match when: | ||
* `in`: the message was sent to this target (channel or user). | ||
* `from`: the message was sent with this nick. | ||
* `after`: the message was sent at or after this time (same format as the [`server-time`](server-time.md) specification). | ||
* `before`: the message was sent at or before this time (same format as the [`server-time`](server-time.md) specification). | ||
* `text`: the message text matches the specified text. The actual algorithm used for matching the text is implementation defined. | ||
Review comment: IMO, not defining it at all makes this spec unusable by some clients. It should at least define whether:
Review comment: A possible solution: standardize two different attributes:
Review comment: On one hand, I like the potential of specifying multiple sets of matching behavior that could be used - you could potentially imagine even more advanced pattern matching. However, in practice, would clients care about this format, or just end users? Additionally, I think it's worth considering that "viable" pattern types/matching algorithms may be highly dependent on backend (e.g. if you're using Postgres with a GIN index, MySQL FULLTEXT index, Elasticsearch, ...)
Review comment: The reason why the matching algo is undefined is indeed that each message store implementation may have different capabilities and syntax. Users would be expected to type the search query, hence it's not a very big deal to leave it implementation-defined. |
||
|
||
If `after` is specified, messages SHOULD be searched from that time. Otherwise, messages SHOULD be searched from the `before` time, which defaults to the current server time. | ||
Review comment: What does this sentence mean? That |
||
|
||
Additionally, the following attributes MUST be recognized: | ||
* `limit`: a number representing an upper bound on the count of messages to return. The server MAY return fewer messages than this number. | ||
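To make the attribute semantics concrete, here is a minimal in-memory sketch in Python. Everything about it is illustrative: the message dictionaries (`time`, `nick`, `target`, `text`), the function names, the use of `str.lower()` in place of real IRC casemapping, and the case-insensitive substring match for `text` are all assumptions. The treatment of `limit` follows one reading of the draft: with `after`, keep the earliest matches; otherwise keep the latest ones before `before`.

```python
from datetime import datetime, timezone


def parse_time(value: str) -> datetime:
    """Parse a server-time style timestamp such as 2019-01-04T14:33:26.123Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def matches(message: dict, attrs: dict) -> bool:
    """True when the message satisfies every match attribute present."""
    sent = parse_time(message["time"])
    # str.lower() stands in for the server's real casemapping (simplification).
    if "in" in attrs and message["target"].lower() != attrs["in"].lower():
        return False
    if "from" in attrs and message["nick"].lower() != attrs["from"].lower():
        return False
    if "after" in attrs and sent < parse_time(attrs["after"]):
        return False
    if "before" in attrs and sent > parse_time(attrs["before"]):
        return False
    # Case-insensitive substring match: one of many permitted algorithms.
    if "text" in attrs and attrs["text"].lower() not in message["text"].lower():
        return False
    return True


def search(messages: list, attrs: dict) -> list:
    """Return matching messages in ascending time order, honouring `limit`."""
    hits = sorted(
        (m for m in messages if matches(m, attrs)),
        key=lambda m: parse_time(m["time"]),
    )
    if "limit" not in attrs:
        return hits
    limit = int(attrs["limit"])
    if limit <= 0:
        return []
    # Assumed reading: search forward from `after`, else backward from `before`.
    return hits[:limit] if "after" in attrs else hits[-limit:]
```

A real server would push these filters down into its message store rather than scanning a list, but the matching rules would be the same.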
|
||
Review comment: We should allow vendors to define their own vendor-prefixed extensions; as well as a mechanism to list supported attributes. |
||
### Examples | ||
|
||
Searching messages sent by `jackie` in `#chan` | ||
~~~~ | ||
[c] SEARCH from=jackie;in=#chan | ||
[s] :irc.host BATCH +ID search | ||
[s] @batch=ID;msgid=1234;time=2019-01-04T14:33:26.123Z :jackie!indent@host PRIVMSG #chan :Be what you want | ||
[s] @batch=ID;msgid=1234;time=2019-01-04T14:35:26.123Z :jackie!indent@host PRIVMSG #chan :Want what you be | ||
[s] :irc.host BATCH -ID | ||
~~~~ | ||
|
||
Searching messages matching the text `fast` in `#chan`, returning up to 2 messages | ||
~~~~ | ||
[c] SEARCH text=fast;in=#chan;limit=2 | ||
[s] :irc.host BATCH +ID search | ||
[s] @batch=ID;msgid=1234;time=2019-01-04T14:33:26.123Z :bill!indent@host PRIVMSG #chan :That was fast! | ||
[s] @batch=ID;msgid=1234;time=2019-01-04T14:35:26.123Z :jackie!indent@host PRIVMSG #chan :Fasting is hard. | ||
[s] :irc.host BATCH -ID | ||
~~~~ | ||
|
||
Searching messages when none match | ||
~~~~ | ||
[c] SEARCH before=2010-01-01T00:00:00.000Z;in=#chan | ||
[s] :irc.host BATCH +ID search | ||
[s] :irc.host BATCH -ID | ||
~~~~ | ||
Comment on lines +100 to +122
Review comment: Please use the same format for example blocks as other specs; and replace You should also fix the duplicate msgids. |
||
|
||
## Implementation Considerations | ||
|
||
Server implementations may use various algorithms for matching messages against the specified `text`. Some implementations may choose to match by substrings, by whole words, or by other algorithms such as what is offered by their database (e.g. SQLite full-text search). The comparison may be case-insensitive or case-sensitive. | ||
|
||
## Security Considerations | ||
|
||
Processing logs can be slow. Servers offering this feature should implement a timeout on their total request time. |
Review comment: typo
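The timeout recommended in the Security Considerations could be enforced with a deadline checked while scanning, as in this minimal Python sketch. `SearchTimeout`, `search_with_deadline` and the injectable `clock` parameter are hypothetical; a server backed by a database would more likely rely on that database's own statement timeout.

```python
import time


class SearchTimeout(Exception):
    """Raised when a search exceeds its time budget."""


def search_with_deadline(messages, predicate, timeout_seconds, clock=time.monotonic):
    """Scan messages, aborting once the total request time is exceeded."""
    deadline = clock() + timeout_seconds
    hits = []
    for message in messages:
        # Check the budget before each message so a slow scan cannot run unbounded.
        if clock() > deadline:
            raise SearchTimeout()
        if predicate(message):
            hits.append(message)
    return hits
```

On `SearchTimeout`, a server could answer with the `INTERNAL_ERROR` reply described in the draft, though the draft does not say which code applies to timeouts.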