Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Aggregations (experimental) #1633

Closed
wants to merge 3 commits into from
Closed

Aggregations (experimental) #1633

wants to merge 3 commits into from

Conversation

pik
Copy link
Contributor

@pik pik commented Nov 19, 2016

Matrix is currently missing an implementation/spec for processing Aggregation type events.

This includes some functionality necessary and expected by users from a chat client, such as being able to edit a message or add a reaction a to a message sent by another user.

While it's possible to implement said functionality as direct and hard-coded endpoints, this would have a number of drawbacks:

  • Specs are hard to alter without breaking things, the more exact the standard, the less maleable to future needs
  • Specing hard-coded suppport for a function such as 'message edit' does not provide anything for users wishing to have richer client experiences (emoji reactions)
  • Specing hard-coded support for a particular kind of aggregation is restrictive and still provides nothing for users of Matrix/Synapse as a federate-distributed message log rather than as a chat client.

This is a stab at an open-ended aggregation approach:

  • Experimental for soliciting feedback (not intended to be merged in current state).
  • Only works with postgresql backend atm because of different semantics between sqlite3 and postgresql JSON operations.

Currently the following is functional:

  • A room creator can POST an aggregation_spec to /room/(?P<room_id>[^/]+)/aggregation$
  • Synapse will process aggregation type events 'm.room._aggregation' in the background and fill in information for the target_id in the aggregation_entries table.
  • append and replace type operations are supported e.g. emoticon support to a room could be added by
    the room creator POSTing the following:
emoticon_aggregation_spec = {
    'constraints': [],
    'aggregation_event_name': 'm.room._aggregation.emoticon',
    'aggregation_field_names': ['emoticon'],
    'aggregation_type': 'append',
    'aggregation_event_schema': {
        'type': 'object',
        'properties': {
            'emoticon': { 'type': 'string' },
            'msgtype': { 'type': 'string' },
            'target_id':  { 'type': 'string' }
        },
        'required': ['emoticon', 'msgtype', 'target_id'],
        'additionalProperties': False
    }
}

This in turn would cause Synapse to run background_updates which will process 'm.room._aggregation.emoticon' type events and append them to an array in the aggregate_entries table e.g.:

target_id        | $14794019940aHDNb:pik-test
room_id          | !QKscJgkpWveOZhvGME:pik-test
event_name       | m.room._aggregation.emoticon
latest_event_id  | $14794147870yHlCA:pik-test
aggregation_data | ["{\"emoticon\": \"::smile::\", \"event_id\": \"$14794020481Zmmku:pik-test\", \"sender\": \"@pik:pik-test\"}", "{\"emoticon\": \"::flowers::\", \"event_id\": \"$14794022480fdPPD:pik-test\", \"sender\": \"@pik:pik-test\"}"]
  • When retrieving events for a client Synapse LEFT JOINS on the aggregation_entries table and send a single bulk entry for each target_id e.g.
In [354]: cli.api._send("GET", "/events/%s" % '$14796538688CLcYU:pik-test')
Out[354]: 
{'age': 6149350,
 'aggregation_data': {'m.room._aggregation.emoticon': {'aggregation_data': ['{"emoticon": "::smile::", "event_id": "$14796538949JTYis:pik-test", "sender": "@pik:pik-test"}'],
   'latest_event_id': '$14796538949JTYis:pik-test'}},
 'content': {'body': 'hello world', 'msgtype': 'm.text'},
 'event_id': '$14796538688CLcYU:pik-test',
 'origin_server_ts': 1479653868980,
 'room_id': '!DNrCCfWAYShGxMhnzw:pik-test',
 'sender': '@pik:pik-test',
 'type': 'm.room.message',
 'unsigned': {'age': 6149350},
 'user_id': '@pik:pik-test'}

TODO

Clean Synapse API Implementation

There are a number of approaches to returning 'aggregated' events depending on whether they are pruned from the events table or not in the database. While the client when receiving a set of bulk events for a target_id will always have a latest_event_id used in the aggregation and would know to ignore events prior to this, it would be preferable to not send those events at all to the client after they have been aggregated. This should mean either filtering the retrieved event array (e.g comparing to a cache of already known aggregated values) or pruning the aggregated events from the events table in the db entirely.

Emoticon / Edit support in vector-web (riot) client.

This is a little bit complicated because the client should support both receiving bulked (aggregated entries) and singular events (for singular events it should imitate the server aggregation strategy).

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

4 similar comments
@matrixbot
Copy link
Member

Can one of the admins verify this patch?

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

 * API now returns aggregation_data
 * removed unused code
 * Note that aggregation_entries is now one per (event_type, target_id) rather than
   one per (target_id) - so they are manually merged server-side to yield a single
   aggregation_data blob on the event.
 * TODO: make aggregation code less invasive on Synapse structure
@Half-Shot
Copy link
Collaborator

Half-Shot commented Nov 21, 2016

I'm quite excited for a feature like this, as it also fits my needs for counting various properties of messages.

However, could you make this into a google doc or something as a spec doc. It feels too cramped to be stuck on a PR. Especially since this is a spec change and really needs to be reviewed in matrix-doc first.

@AndrewJDR
Copy link

Very excited for this as well. Also referencing in the riot issue for editable messages:
element-hq/element-web#1862

@ara4n
Copy link
Member

ara4n commented Dec 5, 2016

(just wanted to confirm that properly digesting this is on our radar - @pik: thanks for the significant work here!)

@AndrewJDR
Copy link

Some other example usages of this when bridges are involved are below. Please correct me if these would be inappropriate usages of the aggregations api... I'm trying to understand the different ways they can be used:

With bridges, when a real matrix user sends a message into a bridged room, it would be nice to be able to specify to the user if a specific message failed to be delivered to the bridged network with a red X icon next to the message in the client.

Another example -- We have an iMessage bridge. When an outside iMessage user sends a message and it passes to me through the matrix bridge, it would be nice to know whether the sender had used SMS or iMessage to send the message.

@ara4n
Copy link
Member

ara4n commented Dec 13, 2016

hi @pik - this continues to be on our radar. I think the best way to proceed is going to be to work on the proposal change to the spec first, and then evolve the implementation as required. The workflow for this is to write a google doc proposal as per the ones hiding in https://drive.google.com/drive/u/0/folders/0B4wHq8qP86r2ck15MHEwMmlNVUk, and then we'll be able to review it properly there. We use google rather than github for this as the review semantics (both commenting and 'suggesting') are quite a lot richer.

@ara4n
Copy link
Member

ara4n commented Dec 20, 2016

@pik: i've finally had a chance to properly read through this (the requirement for being able to retrospectively attach metadata to events is getting ever more urgent, especially for being able to upvote/downvote messages for moderation purposes).

Initial thoughts are:

  • We should still move it to a googledoc if you're okay doing so. I'll comment here until then however.
  • The idea of being able to define a freeform schema for the aggregation metadata is interesting. We don't let users define schemas for events anywhere else; they're always defined out-of-band (i.e. in the swagger spec). What is the advantage of defining a schema here?
  • How does it actually sum the aggregates? O:-) In other words, if 100K users click 'like' on an event, do we then get 100K blobs in the aggregation_data field? I'd expect to receive the sum instead.
  • How does it handle federation?

The approach of using separate tables to track the aggregations feels right overall however, and this is definitely a step in the right direction. I'd be happy to help try to steer this through with you to get it merged!

@pik
Copy link
Contributor Author

pik commented Dec 25, 2016

We should still move it to a googledoc if you're okay doing so.

Apologies for the delay on the spec part. Here is a link to a preliminary Google Doc: https://docs.google.com/document/d/1CnNbYSSea0KcyhEI6-rB8R8u6DCZyZv-Pv4hhoXJHSE/edit?usp=sharing

Since a lot of choices made in this PR (implementation wise) depend on some specification choices I didn't go into exact detail of implementation pending agreement on the Spec more generally (they can be fleshed out later).

How does it actually sum the aggregates? O:-) In other words, if 100K users click 'like' on an event, do we then get 100K blobs in the aggregation_data field? I'd expect to receive the sum instead.

If Slack style emoji's are desired e.g. Matthew, Pik reacted with 🎆 -- than I guess append would be the only way to do it, depending on the application they might not care for those details though, in which case client-side aggregation or support for more conditional operators (I added the discussion to the google doc under Conditional Logic & Aggregation Logic).

The idea of being able to define a freeform schema for the aggregation metadata is interesting. We don't let users define schemas for events anywhere else; they're always defined out-of-band (i.e. in the swagger spec). What is the advantage of defining a schema here?

How does it handle federation?

Let's pick these up in Google Doc discussion/comments?

@AndrewJDR
Copy link

Just curious if there's been any forward progress on this?

@markwooff
Copy link

Just curious if there's been any progress or updates with this?

@richvdh
Copy link
Member

richvdh commented Mar 23, 2018

For the record: complicated things like aggregations are currently shelved in favour of getting stability on the features we have, rather than adding new ones.

[Sadly I gather pik is no longer with us. I very much hope that one day we can pick this up again and give his work the attention it deserved.]

@ara4n
Copy link
Member

ara4n commented May 29, 2019

we now have aggregations via #5186, which took some inspiration from @pik's work here (as per the notes in https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md), so i'm going to close this. pik: thanks again for setting the ball rolling on this, and I'm really sorry that we didn't get to work on this sooner, and that you're not here to see the conclusion :(

@ara4n ara4n closed this May 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants