
Event Cache API 🕸 #3058

Open · 2 of 10 tasks
bnjbvr opened this issue Jan 26, 2024 · 5 comments

bnjbvr commented Jan 26, 2024

Context

After working for a bit on the Rust SDK, I've observed that there seems to be a missing layer between the high-level Timeline in the UI crate and the Rust SDK itself. In addition to rendering the events themselves, the Timeline is involved in a lot of functionality:

  • it computes the read receipts displayed on each event in a room,
  • it runs backpagination, keeping track of the prev_batch tokens, re-spawning requests to /messages until it reaches the desired number of messages, or the end of the timeline,
  • it retries decrypting events after new keys have been received, be it from the encryption sync loop or from the backup storage.

These extra responsibilities make the Timeline more of a view-and-controller bundle than a pure view over the events it receives. This tight coupling has an advantage (the code freely does what it needs, when it needs to), but it also comes with a few drawbacks: it's harder to test and fix a logic bug in isolation without rendering a timeline, and none of that functionality is reusable by other clients that don't want to render a timeline.

Since rendering a timeline requires subscribing to the room it belongs to, and we have requirements for other features that involve not a specific room but the entire room list, some of that functionality is re-implemented elsewhere:

  • renewed attempts to decrypt still-encrypted events happen when computing the latest_event for a room, so that we have something to display in the room list,
  • the room unread markers (namely the unread counts: numbers of new messages/notifications/mentions) are also needed for all the rooms, not only the room we'd be visiting (or the whole point of the unread marker would be lost),
  • in addition to that, while the latest events and unread counts are technically information derived (computed) from other events, they are serialized and stored in the database as part of the RoomInfo struct, in the state store,
  • the scattering of unread counts and read receipts has caused some impedance mismatch, exemplified by read receipts: handle implicit receipts for unread counts #3054 (the unread counts computation expected read markers even for our own events, while the timeline code handling read receipts took implicit read receipts into account)

The Proposal

Overall, my take is that all of this functionality is really nice to have when writing any Matrix client. While we could keep going as we have so far, I think that a new abstraction providing all this functionality, independently of a given observed room, would be super nice and useful. I call this new abstraction the Event Graph (API, if you insist); a rough sketch of what its surface could look like follows the goals list below.

The goals are the following:

  • make the Timeline (mostly) a view over the events observed via the Event Graph, for a clearer and cleaner separation of responsibilities
  • make the code more modular and easier to test in isolation, to improve maintainability
  • unify some functionality that's scattered in multiple places (notably automatic re-decryption, unread markers with read receipts)
  • make the Event Graph persistent (circumventing the need for a specific Timeline API cache, as raised in Tracking issue for timeline API #1103)
    • if a Timeline is just a view of the event graph, then a Timeline can easily be re-rendered from the cache
    • the storage of the event graph could be considered an on-disk cache, that we could remove/empty at will (the only consequence would be degrading the user experience by causing new backpagination for events we had before emptying the cache)
    • we could make it work across processes right from the beginning, allowing us to reuse events received from a notification sync into a main timeline
  • pave the way for future features, like full-text search
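
To make the shape concrete, here is a rough sketch of what such an API could look like. Everything below (`EventGraph`, `room_events`, `paginate_backwards`, the stand-in type aliases) is hypothetical and for illustration only:

```rust
// Hypothetical API surface; none of these types or method names exist in
// the SDK yet, and the type aliases are placeholders.
type RoomId = String;        // stand-in for ruma's `RoomId`
type TimelineEvent = String; // stand-in for a raw or decrypted room event

struct EventGraph; // backed by on-disk storage, shared across all rooms

impl EventGraph {
    /// Return the cached events for a room, so that a Timeline can be
    /// rendered as a pure view over them (and re-rendered from the cache).
    fn room_events(&self, _room_id: &RoomId) -> Vec<TimelineEvent> {
        Vec::new()
    }

    /// Run back-pagination for a room, keeping track of `prev_batch`
    /// tokens internally, until `count` extra events are cached or the
    /// end of the timeline is reached.
    fn paginate_backwards(&self, _room_id: &RoomId, _count: usize) {
        // would call `/messages` and reconcile the response into the cache
    }
}
```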

Unknowns

  • One thing to investigate is whether we could make the Event Graph have no in-memory caches of the on-disk data. This would remove the need for the cross-process lock altogether by considering the database as "the single source of truth", but it might have a bad impact on performance.
  • Tracking issue for timeline API #1103 contains a few interesting remarks and caveats about memory usage and having a timeline cache.
  • it's a bit of work to disentangle all the code in the timeline, and it's hard to estimate the time it'll take.

Drawbacks

  • it may cause a bit of churn by introducing new bugs, while we try to add new features at the same time
  • some problems may be challenging to solve, e.g. reconciling the cache with new sync events / backpagination. (@Hywan and I dare to call this "fun".)

TL;DR

This is a proposal to move some functionality out of the timeline and the matrix-sdk-base client into a new Event Graph API, making it simpler to add new features in the future, to the benefit of all users of the Rust SDK: Fractal, all the Element(X) apps, and any other consumers of the SDK.

Subprojects

  • initial implementation to get a sense how components would fit together
  • move backpagination support into the event graph (EG), so EG is aware of all events within a room
    • implement a basic reconciliation of events; don't handle separate connected components of events yet
  • (independently of the above) add a cache for room events, and get rid of the SlidingSyncRoom::timeline_queue
  • move read receipts support into EG
  • introduce a new data structure RoomDerivedInfo (or any better name, really), that when updated notifies the RoomList to refresh the corresponding room item (we should probably wait on sdk: notify the room list of a room info change due to late decryption #3018 which introduces such a mechanism)
  • move unread counts into EG
    • expose those through RoomDerivedInfo
  • move attempts to re-decrypt into EG
    • move latest_event into RoomDerivedInfo
  • [meta] EventCache storage, story of a plan #3280

bnjbvr commented Jan 30, 2024

(moved to OP)

ara4n commented Feb 9, 2024

Finally read this through fully. FWIW, from my side, the overall idea is great. However (as per our chat on Monday):

  • Please don't call it an 'event graph' API - we use this exclusively in Matrix to refer to the federation DAG (e.g. grep https://spec.matrix.org/unstable/client-server-api/ and friends for event graph)
  • A better name might be Event Store (or Event Cache, although cache overemphasizes transience?)
  • The idea of pushing the timeline API room-pagination logic down into a general-purpose "please do stuff with events from this room" seems very sane

bnjbvr changed the title from Event Graph API 🕸 to Event Cache API 🕸 on Feb 12, 2024

bnjbvr commented Feb 12, 2024

Event Cache is nice: it had been suggested to me privately by someone else, and it also makes it clear that the cache can be cleared at any time, "just" resulting in a lesser user experience. Also, I haven't had any other suggestions, and I'd like to move on with the renaming, so let's settle on EventCache for the moment 😁

Hywan added a commit to Hywan/matrix-rust-sdk that referenced this issue Mar 14, 2024
This patch is a work-in-progress. It explores an experimental data
structure to store events in an efficient way.

Note: in this comment, I will use the term _store_ to mean _database_
or _storage_.

The biggest constraint is the following: events can be ordered in
multiple ways, either topological order, or sync order. The problem is
that, when syncing events (with `/sync`), or when fetching events (with
`/messages`), we **don't know** how to order the newly received events
compared to the already downloaded events. A reconciliation algorithm
must be written (see matrix-org#3058). However, from the “storage” point of view,
events must be read, written and re-ordered efficiently.

The simplest approach would be to use an `order_index`, for example:
every time a new event is inserted, it takes the position of the last
event, increments it by one, and done.

However, inserting a new event in _the middle_ of existing events would
shift all events on one side of the insertion point: given `a`, `b`,
`c`, `d`, `e`, `f` with `f` being the most recent event, if `g` needs
to be inserted between `b` and `c`, then `c`, `d`, `e`, `f`'s ordering
positions need to be shifted. That's not optimal at all as it would
imply a lot of updates in the store.

Example of a relational database:

| ordering_index | event |
|----------------|-------|
| 0              | `a`   |
| 1              | `b`   |
| 2              | `g`   |
| 3              | `c`   |
| …              | …     |
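
To make the shifting concrete, here is an illustrative analogy (not SDK code): an order-indexed store behaves like `Vec::insert`, which must move every later element.

```rust
// Illustrative analogy only: inserting in the middle of an order-indexed
// sequence shifts every later entry, i.e. O(n) updates in the store.
fn main() {
    let mut events = vec!["a", "b", "c", "d", "e", "f"];
    events.insert(2, "g"); // `c`, `d`, `e` and `f` all shift by one position
    assert_eq!(events, ["a", "b", "g", "c", "d", "e", "f"]);
}
```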

An insertion can be O(n), and it can happen more frequently than one
might think. Let's imagine a permalink to an old message: the user
opens it, a couple of events are fetched (with `/messages`), and these
events must be inserted in the store, thus potentially shifting a lot of
existing events. Another example: imagine the SDK has a search API for
events; as long as no search result is found, the SDK will back-paginate
until reaching the beginning of the room; every time there is a
back-pagination, a block of events is inserted, and there are more and
more events to shift at each back-pagination.

OK, let's forget the `order_index`. Let's use a linked list then? Each
event has a _link_ to the _previous_ and to the _next_ event.

Inserting an event would be at worst O(3) in this case: if the previous
event exists, it must be updated, if the next event exists, it must be
updated, finally, insert the new event.

Example with a relational database:

| previous | id      | event | next    |
|----------|---------|-------|---------|
| null     | `id(a)` | `a`   | `id(b)` |
| `id(a)`  | `id(b)` | `b`   | `id(c)` |
| `id(b)`  | `id(c)` | `c`   | null    |
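
In store terms, the three writes look like the following illustrative in-memory stand-in (not SDK code): inserting `g` between `b` and `c` touches at most three rows, no matter how many events the room holds.

```rust
// Illustrative stand-in for the store: a doubly linked list where each
// event id maps to its `previous`/`next` links.
use std::collections::HashMap;

struct Links {
    previous: Option<String>,
    next: Option<String>,
}

fn insert_between(store: &mut HashMap<String, Links>, prev: &str, next: &str, new: &str) {
    // Write 1: the previous event now points forward to the new one.
    store.get_mut(prev).unwrap().next = Some(new.to_owned());
    // Write 2: the next event now points backward to the new one.
    store.get_mut(next).unwrap().previous = Some(new.to_owned());
    // Write 3: insert the new event itself, linked on both sides.
    store.insert(
        new.to_owned(),
        Links { previous: Some(prev.to_owned()), next: Some(next.to_owned()) },
    );
}
```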

This approach ensures fast _writing_, but terribly slow _reading_.
Indeed, reading N events requires N queries in the store. Events aren't
contiguous in the store, and cannot be ordered by the database engine
(e.g. with `ORDER BY` for SQL-based databases), so it really requires
one query per event. That's a no-go.

In the two scenarios above, another problem arises. How to represent
a gap? Indeed, when new events are synced (via `/sync`), sometimes the
response contains a `limited` flag, which means that the results are
_partial_.

Let's take the following example: the store contains `a`, `b`, `c`.
After a long offline period (during which the room has been pretty
active), a sync is started, which provides the following events: `x`,
`y`, `z` + the _limited_ flag. The app is killed and reopened later.
The event cache store will contain `a`, `b`, `c`, `x`, `y`, `z`. How
do we know that there is a hole/a gap between `c` and `x`? This is
important information! When `z`, `y` and `x` are displayed, and the user
wants to scroll up, the SDK must know that it must back-paginate
before providing `c`, `b` and `a`.

So the data structure we use must also represent gaps. This information
is also crucial for the events reconciliation algorithm.

What about a mix of the two? Enter the _Linked Chunk_.

A _linked chunk_ is like a linked list, except that each node is either
a _Gap_ or an _Items_. A _Gap_ contains nothing, it's just a gap. An
_Items_ contains _several_ events. A node is called a _Chunk_. A _chunk_
has a maximum size, called its _capacity_. When a chunk is full,
a new chunk is created and linked appropriately. Inside a chunk, an
ordering index is used to order events. At this point, it becomes a
trade-off to find the appropriate chunk size to balance the performance
between reading and writing. Nonetheless, if the chunk size is 50, then
reading events is 50 times more efficient with a linked chunk than with
a linked list, and writing events is at worst O(49), compared to the
O(n - 1) of the ordering index.
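
In Rust terms, the structure could look roughly like the following simplified sketch (the draft implementation in this patch differs in its details; all names here are illustrative):

```rust
// Simplified, illustrative sketch of a linked chunk; not the actual
// implementation contained in this patch.
const CHUNK_CAPACITY: usize = 50;

type ChunkId = u64;

enum ChunkContent<Event> {
    /// A hole in the timeline: it contains nothing, it's just a gap.
    Gap,
    /// Several events (at most `CHUNK_CAPACITY`), ordered by their index
    /// within the chunk.
    Items(Vec<Event>),
}

struct Chunk<Event> {
    previous: Option<ChunkId>,
    next: Option<ChunkId>,
    content: ChunkContent<Event>,
}
```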

Example with a relational database. First table is `events`, second
table is `chunks`.

| chunk id | index | event |
|----------|-------|-------|
| `$0`     | 0     | `a`   |
| `$0`     | 1     | `b`   |
| `$0`     | 2     | `c`   |
| `$0`     | 3     | `d`   |
| `$2`     | 0     | `e`   |
| `$2`     | 1     | `f`   |
| `$2`     | 2     | `g`   |
| `$2`     | 3     | `h`   |

| chunk id | type  | previous | next |
|----------|-------|----------|------|
| `$0`     | items | null     | `$1` |
| `$1`     | gap   | `$0`     | `$2` |
| `$2`     | items | `$1`     | null |

Reading the last chunk consists of reading all events where the
`chunk_id` is `$2`, for example; it contains events `e`, `f`, `g` and
`h`. We can sort them easily by using the `event_index` column. The
previous chunk is a gap, and the chunk before it contains events `a`,
`b`, `c` and `d`.
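
As an illustration, with a hypothetical SQLite-backed store using the two tables above, reading one chunk is a single query. This is a sketch assuming the `rusqlite` crate and the column names described above, not this patch's actual code:

```rust
// Sketch only: assumes the `rusqlite` crate and the `events` table layout
// described above; not the actual code of this patch.
use rusqlite::{Connection, Result};

/// Read all events of one chunk with a single query; the database engine
/// sorts them for us via the per-chunk `event_index` column.
fn read_chunk_events(conn: &Connection, chunk_id: &str) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT event FROM events WHERE chunk_id = ?1 ORDER BY event_index",
    )?;
    stmt.query_map([chunk_id], |row| row.get::<_, String>(0))?
        .collect()
}
```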

Being able to read events by chunk clearly limits the amount of reading
and writing in the store. It is also close to how the store will really
be used in practice. It also allows gaps to be represented. We can
replace a gap with a new chunk pretty easily, with few writes.

A summary:

| Data structure | Reading           | Writing         |
|----------------|-------------------|-----------------|
| Ordering index | “O(1)”[^1] (fast) | O(n - 1) (slow) |
| Linked list    | O(n) (slow)       | O(3) (fast)     |
| Linked chunk   | O(n / capacity)   | O(capacity - 1) |

This patch contains a draft implementation of a linked chunk. It
strictly contains only the API required for the `EventCache`; understand
that it _is not_ designed as a generic data structure type.

[^1]: O(1) because it's simply one query to run; the database engine
does the sorting for us in a very efficient way, particularly if the
`ordering_index` is an unsigned integer.

bnjbvr commented Apr 15, 2024

Something we want to be sure not to reproduce in the Rust SDK: element-hq/element-web#18325 (comment)
