Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Project: Chunking #3785

Closed
hawkowl opened this issue Sep 3, 2018 · 4 comments
Closed

Project: Chunking #3785

hawkowl opened this issue Sep 3, 2018 · 4 comments
Labels
A-Performance Performance, both client-facing and admin-facing

Comments

@ara4n
Copy link
Member

ara4n commented Mar 3, 2019

this would/will hopefully fix #1760 (i think), as there will be fewer extremities to resolve if they are hidden inside chunks?

@ShadowJonathan
Copy link
Contributor

@anoadragon453 Amber hasn't worked on python for about a year (burnout), the related issues are closed, and this meta-issue doesn't have any other context as far as i can see, can this be closed?

@anoadragon453
Copy link
Member

The historical state is useful, but indeed I don't think we benefit from having this open currently.

If we decide to work on this again, a new issue with more information about what chunking actually is (tracking connected chunks of the DAG) and what needs to be done would be useful.

@ShadowJonathan
Copy link
Contributor

Ahhh, so thats what it is! I was confused as to what "chunking" in this context actually meant, if it's compression of long stretches of DAG into just a few database entries, then thats indeed an interesting venue.

I don't have any interest in this at the moment, but i'll remember this idea for other setups and optimisations, possibly.

MadLittleMods added a commit that referenced this issue Feb 7, 2022
…vers (MSC2716) (#11114)

Fix #11091
Fix #10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`)

 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`)
    - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)).
    - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort.
       - This will be even more the case when we add the [MSC3030 jump to date API endpoint](matrix-org/matrix-spec-proposals#3030) so the static archives can navigate and jump to a certain date.
       - We're solving this in the future by switching to [online topological ordering](matrix-org/gomatrixserverlib#187) and [chunking](#3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking
 2. As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order.
 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes #11091 and #10764
 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which when processed, add the insertion event as an extremity and the federating homeserver can ask about it when time calls.
    - Related discussion, #11114 (comment)


Before | After
--- | ---
![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png)



#### Why aren't we sorting topologically when receiving backfill events?

> The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](matrix-org/matrix-spec-proposals#2716) which could change whatever implementation anyway.
> 
> As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering.
>
> -- #11114 (comment)

See #11114 (comment) for the technical difficulties
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing
Projects
None yet
Development

No branches or pull requests

5 participants