This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Refactor create_new_client_event to use a new parameter, state_event_ids, which accurately describes the usage with MSC2716 instead of abusing auth_event_ids #12083

Merged
1 change: 1 addition & 0 deletions changelog.d/12083.misc
@@ -0,0 +1 @@
+Refactor `create_new_client_event` to use a new parameter, `state_event_ids`, which accurately describes the usage with [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) instead of abusing `auth_event_ids`.
2 changes: 1 addition & 1 deletion scripts-dev/complement.sh
@@ -71,4 +71,4 @@ fi

Contributor Author (@MadLittleMods):

It looks like batch sending is completely broken in 1.53, so would be nice to get this merged quickly :3

-- @tulir, #12083 (comment)

What's broken about it?

This PR doesn't fix anything (it just refactors a parameter differently).

Here are the Complement tests passing on the v1.53.0 Synapse tag. The only thing broken is Complement itself, which is being fixed in matrix-org/complement#324.

$ cd complement
# checkout https://github.com/matrix-org/complement/pull/324
$ git checkout madlittlemods/fix-unrecognised-appservice-access-token

$ cd synapse
$ git checkout v1.53.0
$ COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh TestImportHistoricalMessages

--- PASS: TestImportHistoricalMessages (19.46s)
    --- PASS: TestImportHistoricalMessages/parallel (6.28s)
        --- SKIP: TestImportHistoricalMessages/parallel/TODO:_Trying_to_send_insertion_event_with_same_`next_batch_id`_will_reject (0.00s)
        --- SKIP: TestImportHistoricalMessages/parallel/TODO:_What_happens_when_you_point_multiple_batches_at_the_same_insertion_event? (0.00s)
        --- PASS: TestImportHistoricalMessages/parallel/Federation (0.00s)
            --- PASS: TestImportHistoricalMessages/parallel/Federation/Historical_messages_are_visible_when_joining_on_federated_server_-_auto-generated_base_insertion_event (4.12s)
            --- PASS: TestImportHistoricalMessages/parallel/Federation/Historical_messages_are_visible_when_joining_on_federated_server_-_pre-made_insertion_event (4.67s)
            --- PASS: TestImportHistoricalMessages/parallel/Federation/Historical_messages_are_visible_when_already_joined_on_federated_server (5.33s)
            --- PASS: TestImportHistoricalMessages/parallel/Federation/When_messages_have_already_been_scrolled_back_through,_new_historical_messages_are_visible_in_next_scroll_back_on_federated_server (5.38s)
        --- PASS: TestImportHistoricalMessages/parallel/Existing_room_versions (0.00s)
            --- PASS: TestImportHistoricalMessages/parallel/Existing_room_versions/Not_allowed_to_redact_MSC2716_insertion,_batch,_marker_events (0.82s)
            --- PASS: TestImportHistoricalMessages/parallel/Existing_room_versions/Room_creator_can_send_MSC2716_events (0.90s)
        --- PASS: TestImportHistoricalMessages/parallel/Unrecognised_prev_event_ID_will_throw_an_error (1.34s)
        --- PASS: TestImportHistoricalMessages/parallel/Duplicate_next_batch_id_on_insertion_event_will_be_rejected (1.94s)
        --- PASS: TestImportHistoricalMessages/parallel/Normal_users_aren't_allowed_to_batch_send_historical_messages (2.05s)
        --- PASS: TestImportHistoricalMessages/parallel/Unrecognised_batch_id_will_throw_an_error (2.06s)
        --- PASS: TestImportHistoricalMessages/parallel/Batch_send_endpoint_only_returns_state_events_that_we_passed_in_via_state_events_at_start (2.52s)
        --- PASS: TestImportHistoricalMessages/parallel/Should_be_able_to_send_a_batch_without_any_state_events_at_start_-_user_already_joined_in_the_current_room_state (2.79s)
        --- PASS: TestImportHistoricalMessages/parallel/Should_be_able_to_batch_send_historical_messages_into_private_room (3.07s)
        --- PASS: TestImportHistoricalMessages/parallel/Historical_events_from_multiple_users_in_the_same_batch (3.11s)
        --- PASS: TestImportHistoricalMessages/parallel/Historical_events_resolve_in_the_correct_order (3.57s)
        --- PASS: TestImportHistoricalMessages/parallel/should_resolve_member_state_events_for_historical_events (3.62s)
        --- PASS: TestImportHistoricalMessages/parallel/Historical_events_from_batch_send_do_not_come_down_in_an_incremental_sync (3.72s)
PASS
ok  	github.com/matrix-org/complement/tests	21.795s

@tulir (Member), Feb 28, 2022:

It seems to always throw:

AssertionError: Attempting to create a non-m.room.create event with no prev_events

It happened on 1.53, so I tested this branch and it worked; then I tried plain develop and it failed again.

Maybe it's specific to event creation workers again? Did Complement ever get worker support?

Member:

Stack trace (I think this was from develop):

2022-02-28 22:09:26,425 - synapse.http.server - 100 - ERROR - POST-702 - Failed handle request via 'RoomBatchSendEventRestServlet': <XForwardedForRequest at 0x7f1cec068910 method='POST' uri='/_matrix/client/unstable/org.matrix.msc2716/rooms/%21luiLbOOYPPuupTkaHC:pc.mau.dev/batch_send?prev_event_id=$sOyl_xFS22jWSD0Tczo_fQ1c4aFc0vB809kYQj5tJ2c&user_id=@_tulir_whatsapp_<redacted>:pc.mau.dev&com.beeper.also_allow_user=@tulir:pc.mau.dev' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "synapse/http/server.py", line 269, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "synapse/http/server.py", line 471, in _async_render
    callback_return = await raw_callback_return
  File "synapse/rest/client/room_batch.py", line 148, in on_POST
    await self.room_batch_handler.persist_state_events_at_start(
  File "synapse/handlers/room_batch.py", line 240, in persist_state_events_at_start
    ) = await self.event_creation_handler.create_and_send_nonmember_event(
  File "synapse/handlers/message.py", line 857, in create_and_send_nonmember_event
    event, context = await self.create_event(
  File "synapse/handlers/message.py", line 611, in create_event
    event, context = await self.create_new_client_event(
  File "synapse/util/metrics.py", line 106, in measured_func
    r = await func(self, *args, **kwargs)
  File "synapse/handlers/message.py", line 977, in create_new_client_event
    assert (
AssertionError: Attempting to create a non-m.room.create event with no prev_events
2022-02-28 22:09:26,429 - synapse.access.http.8008 - 427 - INFO - POST-702 - 127.0.0.1 - 8008 - {@_tulir_whatsapp_<redacted>:pc.mau.dev} Processed request: 0.004sec/0.003sec (0.001sec, 0.000sec) (0.000sec/0.001sec/2) 55B 500 "POST /_matrix/client/unstable/org.matrix.msc2716/rooms/%21luiLbOOYPPuupTkaHC:pc.mau.dev/batch_send?prev_event_id=$sOyl_xFS22jWSD0Tczo_fQ1c4aFc0vB809kYQj5tJ2c&user_id=@_tulir_whatsapp_<redacted>:pc.mau.dev&com.beeper.also_allow_user=@tulir:pc.mau.dev HTTP/1.1" "mautrix-whatsapp/0.2.4+dev.3479e1cc mautrix-go/v0.10.11" [0 dbevts]

Member:

And of course a simple batch send with curl doesn't reproduce it, while the bridge does consistently on every attempt 🙃

@tulir (Member), Feb 28, 2022:

Ah, I was able to reproduce it with curl by sending a non-member event in state_events_at_start. It happens on develop and 1.53.0, doesn't happen after merging this PR on top of develop, and doesn't happen on 1.52.0.

Are non-member state events included in Complement tests? I've stopped sending them from the bridge now (it only sent a dummy event as a workaround for #11188), but this should probably be tested (or rejected with a proper error, if it's not supposed to be allowed in state_events_at_start).
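
For reference, here is a rough sketch of the kind of request that reproduces this, expressed with Python's requests library for readability. The homeserver URL, room ID, prev_event_id, token, and user IDs are placeholders, and the body shape follows MSC2716's batch_send endpoint:

import requests

resp = requests.post(
    "http://localhost:8008/_matrix/client/unstable/org.matrix.msc2716"
    "/rooms/!room:example.org/batch_send",
    params={"prev_event_id": "$some_prev_event"},
    headers={"Authorization": "Bearer <as_token>"},
    json={
        "state_events_at_start": [
            {
                # The non-member state event that trips the assertion
                "type": "m.room.topic",
                "state_key": "",
                "sender": "@_bridge_user:example.org",
                "origin_server_ts": 1646000000000,
                "content": {"topic": "dummy"},
            }
        ],
        "events": [
            {
                "type": "m.room.message",
                "sender": "@_bridge_user:example.org",
                "origin_server_ts": 1646000000001,
                "content": {"msgtype": "m.text", "body": "historical message"},
            }
        ],
    },
)
print(resp.status_code, resp.text)  # 500 with the AssertionError on develop/v1.53.0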

@MadLittleMods (Contributor Author), Feb 28, 2022:

non-member event in state_events_at_start

😫 Ayee, that's one area I haven't really used yet.

Relevant code: synapse/handlers/room_batch.py#L232-L253

Feel free to create a separate issue to track this, and I will create a fix shortly.

Member:

Well, the specific issue seems to go away with this PR. No idea why, but that's why I thought this was the fix. Anyway, I made #12110 for either adding tests or removing the branch.

# Run the tests!
echo "Images built; running complement"
-go test -v -tags synapse_blacklist,msc2403 -count=1 $EXTRA_COMPLEMENT_ARGS ./tests/...
+go test -v -tags synapse_blacklist,msc2403,msc2716 -count=1 $EXTRA_COMPLEMENT_ARGS ./tests/...
64 changes: 48 additions & 16 deletions synapse/handlers/message.py
@@ -493,6 +493,7 @@ async def create_event(
allow_no_prev_events: bool = False,
prev_event_ids: Optional[List[str]] = None,
auth_event_ids: Optional[List[str]] = None,
+state_event_ids: Optional[List[str]] = None,
require_consent: bool = True,
outlier: bool = False,
historical: bool = False,
@@ -527,6 +528,16 @@ async def create_event(

If non-None, prev_event_ids must also be provided.

+state_event_ids:
+The full state at a given event. This is used particularly by the MSC2716
+/batch_send endpoint which shares the same state across the whole batch.
+The state events will be stripped down to only what's necessary to auth
+a given event and set as the auth_event_ids. For insertion events, we will
+add all of these state events as the explicit state so the rest of the
+historical batch can inherit the same state and state_group. This should
+normally be left as None, which will cause the auth_event_ids to be
+calculated based on the room state at the prev_events.

require_consent: Whether to check if the requester has
consented to the privacy policy.

@@ -612,6 +623,7 @@ async def create_event(
allow_no_prev_events=allow_no_prev_events,
prev_event_ids=prev_event_ids,
auth_event_ids=auth_event_ids,
+state_event_ids=state_event_ids,
depth=depth,
)

@@ -772,6 +784,7 @@ async def create_and_send_nonmember_event(
allow_no_prev_events: bool = False,
prev_event_ids: Optional[List[str]] = None,
auth_event_ids: Optional[List[str]] = None,
+state_event_ids: Optional[List[str]] = None,
ratelimit: bool = True,
txn_id: Optional[str] = None,
ignore_shadow_ban: bool = False,
@@ -801,6 +814,15 @@ async def create_and_send_nonmember_event(
based on the room state at the prev_events.

If non-None, prev_event_ids must also be provided.
+state_event_ids:
+The full state at a given event. This is used particularly by the MSC2716
+/batch_send endpoint which shares the same state across the whole batch.
+The state events will be stripped down to only what's necessary to auth
+a given event and set as the auth_event_ids. For insertion events, we will
+add all of these state events as the explicit state so the rest of the
+historical batch can inherit the same state and state_group. This should
+normally be left as None, which will cause the auth_event_ids to be
+calculated based on the room state at the prev_events.
ratelimit: Whether to rate limit this send.
txn_id: The transaction ID.
ignore_shadow_ban: True if shadow-banned users should be allowed to
@@ -856,8 +878,10 @@ async def create_and_send_nonmember_event(
requester,
event_dict,
txn_id=txn_id,
+allow_no_prev_events=allow_no_prev_events,
Member:

was the previous omission of this a bug?

Member:

(could we get rid of allow_no_prev_events and just allow there to be no prev events when state_event_ids is given?)

@MadLittleMods (Contributor Author), Mar 1, 2022:

was the previous omission of this a bug?

Yes, and it seems to be why this PR fixes @tulir's issue: #12083 (comment)

(could we get rid of allow_no_prev_events and just allow there to be no prev events when state_event_ids is given?)

I'm leaning towards no, although technically it's probably possible.

The assertion-like nature of allow_no_prev_events, disallowing the case for the majority of normal event-sending paths, is nice.

Also, in our case we only want the first event to have no prev_event_ids; the rest of the chain using state_event_ids should have prev_event_ids pointing at the event before it in the chain.

state_event_ids is really just used to derive auth_event_ids. I do see how they're related, though: auth_event_ids is either set explicitly from state_event_ids, or derived from prev_event_ids in the normal case.
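
As a rough illustration of that relationship (simplified, hypothetical code, not Synapse's actual compute_auth_events), the strip-down step keeps only the state entries that can auth an event from the given sender:

from typing import Dict, List, Tuple

StateMap = Dict[Tuple[str, str], str]  # (type, state_key) -> event_id

def strip_state_to_auth_event_ids(sender: str, state_map: StateMap) -> List[str]:
    wanted = [
        ("m.room.create", ""),
        ("m.room.power_levels", ""),
        ("m.room.join_rules", ""),
        ("m.room.member", sender),  # only the sender's own membership
    ]
    return [state_map[key] for key in wanted if key in state_map]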

@erikjohnston (Member), Mar 3, 2022:

For context: we see a fair few "can't create event with no prev events" errors already, so having an allow_no_prev_events flag should hopefully make it easier to reason about what is going on when investigating that issue.

prev_event_ids=prev_event_ids,
auth_event_ids=auth_event_ids,
+state_event_ids=state_event_ids,
outlier=outlier,
historical=historical,
depth=depth,
@@ -893,6 +917,7 @@ async def create_new_client_event(
allow_no_prev_events: bool = False,
prev_event_ids: Optional[List[str]] = None,
auth_event_ids: Optional[List[str]] = None,
+state_event_ids: Optional[List[str]] = None,
depth: Optional[int] = None,
) -> Tuple[EventBase, EventContext]:
"""Create a new event for a local client
@@ -915,38 +940,43 @@ async def create_new_client_event(
Should normally be left as None, which will cause them to be calculated
based on the room state at the prev_events.

+state_event_ids:
+The full state at a given event. This is used particularly by the MSC2716
+/batch_send endpoint which shares the same state across the whole batch.
+The state events will be stripped down to only what's necessary to auth
+a given event and set as the auth_event_ids. For insertion events, we will
+add all of these state events as the explicit state so the rest of the
+historical batch can inherit the same state and state_group. This should
+normally be left as None, which will cause the auth_event_ids to be
+calculated based on the room state at the prev_events.

depth: Override the depth used to order the event in the DAG.
Should normally be set to None, which will cause the depth to be calculated
based on the prev_events.

Returns:
Tuple of created event, context
"""
-# Strip down the auth_event_ids to only what we need to auth the event.
+# Strip down the state_event_ids to only what we need to auth the event.
# For example, we don't need extra m.room.member that don't match event.sender
-full_state_ids_at_event = None
-if auth_event_ids is not None:
-# If auth events are provided, prev events must be also.
+if state_event_ids is not None:
+# Do a quick check to make sure that prev_event_ids is present to
+# make the type-checking around `builder.build` happy.
+# prev_event_ids could be an empty array though.
assert prev_event_ids is not None

-# Copy the full auth state before it stripped down
-full_state_ids_at_event = auth_event_ids.copy()

temp_event = await builder.build(
prev_event_ids=prev_event_ids,
-auth_event_ids=auth_event_ids,
+auth_event_ids=state_event_ids,
depth=depth,
)
-auth_events = await self.store.get_events_as_list(auth_event_ids)
+state_events = await self.store.get_events_as_list(state_event_ids)
# Create a StateMap[str]
-auth_event_state_map = {
-(e.type, e.state_key): e.event_id for e in auth_events
-}
-# Actually strip down and use the necessary auth events
+state_map = {(e.type, e.state_key): e.event_id for e in state_events}
+# Actually strip down and only use the necessary auth events
auth_event_ids = self._event_auth_handler.compute_auth_events(
event=temp_event,
-current_state_ids=auth_event_state_map,
+current_state_ids=state_map,
for_verification=False,
)

@@ -989,10 +1019,12 @@ async def create_new_client_event(
context = EventContext.for_outlier()
elif (
event.type == EventTypes.MSC2716_INSERTION
-and full_state_ids_at_event
+and state_event_ids
and builder.internal_metadata.is_historical()
):
-old_state = await self.store.get_events_as_list(full_state_ids_at_event)
+# Add explicit state to the insertion event so the rest of the batch
+# can inherit the same state and state_group
+old_state = await self.store.get_events_as_list(state_event_ids)
Contributor Author (@MadLittleMods):

Added some useful context for why we're doing this, which previously existed only as a PR review comment: #10975 (comment)

context = await self.state.compute_event_context(event, old_state=old_state)
else:
context = await self.state.compute_event_context(event)
71 changes: 41 additions & 30 deletions synapse/handlers/room_batch.py
@@ -121,12 +121,11 @@ async def create_requester_for_user_id_from_app_service(

return create_requester(user_id, app_service=app_service)

-async def get_most_recent_auth_event_ids_from_event_id_list(
+async def get_most_recent_full_state_ids_from_event_id_list(
self, event_ids: List[str]
) -> List[str]:
-"""Find the most recent auth event ids (derived from state events) that
-allowed that message to be sent. We will use this as a base
-to auth our historical messages against.
+"""Find the most recent event_id and grab the full state at that event.
+We will use this as a base to auth our historical messages against.
Contributor Author (@MadLittleMods):

Making this comment more accurate


Args:
event_ids: List of event ID's to look at
@@ -136,24 +135,23 @@ async def get_most_recent_auth_event_ids_from_event_id_list(
"""

(
-most_recent_prev_event_id,
+most_recent_event_id,
_,
) = await self.store.get_max_depth_of(event_ids)
# mapping from (type, state_key) -> state_event_id
prev_state_map = await self.state_store.get_state_ids_for_event(
-most_recent_prev_event_id
+most_recent_event_id
)
# List of state event ID's
-prev_state_ids = list(prev_state_map.values())
-auth_event_ids = prev_state_ids
+full_state_ids = list(prev_state_map.values())

-return auth_event_ids
+return full_state_ids

async def persist_state_events_at_start(
self,
state_events_at_start: List[JsonDict],
room_id: str,
-initial_auth_event_ids: List[str],
+initial_state_ids_at_event: List[str],
app_service_requester: Requester,
) -> List[str]:
"""Takes all `state_events_at_start` event dictionaries and creates/persists
@@ -164,10 +162,13 @@ async def persist_state_events_at_start(
Args:
state_events_at_start:
room_id: Room where you want the events persisted in.
-initial_auth_event_ids: These will be the auth_events for the first
-state event created. Each event created afterwards will be
-added to the list of auth events for the next state event
-created.
+initial_state_ids_at_event:
+The base set of state of the historical batch. When persisting each of
+the events in state_events_at_start, the state will be stripped down to
+only what's necessary to auth the given event and set as the
+auth_event_ids. After each state event from state_events_at_start is
+persisted, it will be added to the ongoing list of state_event_ids for
+the next state event to be persisted with.
app_service_requester: The requester of an application service.

Returns:
@@ -176,7 +177,7 @@ async def persist_state_events_at_start(
assert app_service_requester.app_service

state_event_ids_at_start = []
-auth_event_ids = initial_auth_event_ids.copy()
+state_event_ids = initial_state_ids_at_event.copy()
Member:

(Side note: all these copies will be expensive for rooms with a lot of state, which will affect how quickly we can batch insert history.)

Contributor Author (@MadLittleMods):

I'm not sure exactly how to side-step this.

state_event_ids is an ongoing list that we add to as we persist state_events_at_start, and we return the full list at the end.

And we state_event_ids.copy() to snapshot the state in the loop and pass it to the event creation functions. Otherwise, the signature integrity checks go out of whack because we modified the list in place.
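
A minimal, self-contained sketch of the aliasing hazard (illustrative names, not Synapse code):

state_event_ids = ["$create", "$power_levels"]

aliased = []
snapshots = []
for new_id in ["$member_a", "$member_b"]:
    aliased.append(state_event_ids)           # same list object every time
    snapshots.append(state_event_ids.copy())  # independent snapshot
    state_event_ids.append(new_id)

print(aliased[0])    # ['$create', '$power_levels', '$member_a', '$member_b']
print(snapshots[0])  # ['$create', '$power_levels'] -- the state as it was then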

Contributor Author (@MadLittleMods):

For reference on why we have to define state_event_ids here:

# Since each state event is marked as an outlier, the
# `EventContext.for_outlier()` won't have any `state_ids`
# set and therefore can't derive any state even though the
# prev_events are set. Also since the first event in the
# state chain is floating with no `prev_events`, it can't
# derive state from anywhere automatically. So we need to
# set some state explicitly.


# Make the state events float off on their own by specifying no
# prev_events for the first one in the chain so we don't have a bunch of
@@ -189,9 +190,9 @@ async def persist_state_events_at_start(
)

logger.debug(
-"RoomBatchSendEventRestServlet inserting state_event=%s, auth_event_ids=%s",
+"RoomBatchSendEventRestServlet inserting state_event=%s, state_event_ids=%s",
state_event,
-auth_event_ids,
+state_event_ids,
)

event_dict = {
@@ -226,7 +227,7 @@ async def persist_state_events_at_start(
# Make sure to use a copy of this list because we modify it
# later in the loop here. Otherwise it will be the same
# reference and also update in the event when we append later.
-auth_event_ids=auth_event_ids.copy(),
+state_event_ids=state_event_ids.copy(),
)
else:
# TODO: Add some complement tests that adds state that is not member joins
@@ -249,12 +250,12 @@ async def persist_state_events_at_start(
# Make sure to use a copy of this list because we modify it
# later in the loop here. Otherwise it will be the same
# reference and also update in the event when we append later.
-auth_event_ids=auth_event_ids.copy(),
+state_event_ids=state_event_ids.copy(),
)
event_id = event.event_id

state_event_ids_at_start.append(event_id)
-auth_event_ids.append(event_id)
+state_event_ids.append(event_id)
# Connect all the state in a floating chain
prev_event_ids_for_state_chain = [event_id]

@@ -265,7 +266,7 @@ async def persist_historical_events(
events_to_create: List[JsonDict],
room_id: str,
inherited_depth: int,
-auth_event_ids: List[str],
+state_event_ids: List[str],
app_service_requester: Requester,
) -> List[str]:
"""Create and persists all events provided sequentially. Handles the
@@ -281,8 +282,13 @@ async def persist_historical_events(
room_id: Room where you want the events persisted in.
inherited_depth: The depth to create the events at (you will
probably by calling inherit_depth_from_prev_ids(...)).
-auth_event_ids: Define which events allow you to create the given
-event in the room.
+state_event_ids:
+The full state at a given event. We share the same state across the whole
+historical batch. The state events will be stripped down to only what's
+necessary to auth a given event and set as the auth_event_ids. For
+insertion events, we will add all of these state events as the explicit
+state so the rest of the historical batch can inherit the same state and
+state_group.
Member:

Ah. We probably don't actually want this behaviour; instead we want to only set the state for the insertion event and allow subsequent events to inherit the state from their prev event. When just sending messages this doesn't matter, but if we send state events in the batch, then the current behaviour means that the state event won't become part of the state.

For example, if a user joins halfway through the batch, they won't be able to send their message (assuming they weren't added as part of state_events_at_start).

I think it's trivial to change so that we only set state_event_ids for the initial insertion event, i.e. when the prev_events are empty.

Or am I forgetting how this works yet again?

@MadLittleMods (Contributor Author), Mar 3, 2022:

instead we want to only set the state for the insertion event and allow subsequent events to inherit the state from their prev event.

This is what is currently happening, and as far as I can tell, the comment explains it that way too.

For non-insertion events, we only use state_event_ids to derive auth_event_ids to set.

For insertion events, we set the state as state_event_ids

For example, if a user joins half way through the batch and they won't be able to send the message (assuming they weren't added as part of state_events_at_start).

It's not expected that there are any state events in the batch of events (or at least none that are processed to auth further messages). Any messages being sent should have an m.room.member event already in the state at the point in time we are inserting at, or added via state_events_at_start.
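
As a sketch of the two cases (hypothetical pseudocode, not the real function signatures):

async def compute_context_for_batch_event(event, state_event_ids, store, state):
    if event.type == "org.matrix.msc2716.insertion" and state_event_ids:
        # Insertion event: attach the full state explicitly so the rest
        # of the batch can inherit the same state and state_group.
        old_state = await store.get_events_as_list(state_event_ids)
        return await state.compute_event_context(event, old_state=old_state)
    # Everything else: state resolves from the prev_events as usual;
    # state_event_ids was only used earlier to derive auth_event_ids.
    return await state.compute_event_context(event)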

Member:

I really don't know why I get confused about this every time. Ah well.

From the doc-comment:

We share the same state across the whole historical batch.

Ok, so I think this is what is getting me. I think what you mean is that the state for the historical batch is all derived from the state at the insertion event? My reading of it was "all events in the batch have the same state", which is not true?

For non-insertion events, we only use state_event_ids to derive auth_event_ids to set.

Why do we need this for non-insertion events? Can't we just let the code derive it as normal from the state? That way the only special case is handling the insertion event.

It's not expected that there are any state events in the batch of events (or at least none that are processed to auth further messages). Any messages being sent should have an m.room.member event already in the state at the point in time we are inserting at, or added via state_events_at_start.

This means that the timeline will never see when a user joins/leaves, which may be fine for the Gitter use case, but we probably want to support it more generally for other use cases. I think we get this for free, though, if we stop specifying the auth events directly.

Contributor Author (@MadLittleMods):

Why do we need this for non-insertion events? Can't we just let the code derive it as normal from the state? That way the only special case is handling the insertion event.

Ahh, this does work and is a good simplification! Updated comments to be more nuanced here too ⏩

I also tried doing something similar for the state_events_at_start chain to avoid the copies, but since each state event is an outlier that uses EventContext.for_outlier() and has no state_ids to derive from, we have to set the state explicitly for each one and can't avoid it, as far as I can tell. I also commented the code with these details.

Overall, this prompt to change gave me more understanding of why things are the way they are 👍

This means that the timeline will never see when a user joins/leaves, which may be fine for the Gitter use case, but we probably want to support it more generally for other use cases. I think we get this for free, though, if we stop specifying the auth events directly.

The problem is that the events are persisted in reverse-chronological order, so that they have the correct stream_ordering (which decrements) while being backfilled.

As an example of why this fails: if we /batch_send -> events: [messageFromA, memberEventB, messageFromB], it will first try to persist messageFromB, which will fail because memberEventB hasn't been processed yet.
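
A toy illustration of that failure mode (hypothetical event names, not real code):

batch = ["messageFromA", "memberEventB", "messageFromB"]
known_memberships = {"A"}  # B is not yet in the persisted room state

# Backfilled events are persisted newest-first (stream_ordering decrements),
# so messageFromB is attempted before memberEventB exists.
for event in reversed(batch):
    if event == "messageFromB" and "B" not in known_memberships:
        raise RuntimeError("auth failure: memberEventB not persisted yet")
    if event == "memberEventB":
        known_memberships.add("B")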

Member:

The problem is that the events are persisted in reverse-chronological order, so that they have the correct stream_ordering (which decrements) while being backfilled.

As an example of why this fails: if we /batch_send -> events: [messageFromA, memberEventB, messageFromB], it will first try to persist messageFromB, which will fail because memberEventB hasn't been processed yet.

Oh yes, nyargh. I think we want to fix that, but not in this PR I guess. My hunch is that we want to try and persist the entire batch of events all in one go, rather than doing them one-by-one in reverse order.

Contributor Author (@MadLittleMods):

For my own reference, here is a Complement test around adding state in the middle of the events batch (it fails for the reason outlined above).

Complement test for "State in the middle of the events batch resolves for subsequent events"
t.Run("State in the middle of the events batch resolves for subsequent events", func(t *testing.T) {
	t.Parallel()

	roomID := as.CreateRoom(t, createPublicRoomOpts)
	alice.JoinRoom(t, roomID, nil)

	// Create the "live" event we are going to import our historical events next to
	eventIDsBefore := createMessagesInRoom(t, alice, roomID, 1, "eventIDsBefore")
	eventIdBefore := eventIDsBefore[0]
	timeAfterEventBefore := time.Now()

	events := createMessageEventsForBatchSendRequest([]string{virtualUserID}, timeAfterEventBefore, 1)

	// Add some events from Ricky who joined in the middle of the events batch
	virtualUserID2 := "@ricky:hs1"
	ensureVirtualUserRegistered(t, as, "ricky")
	events = append(
		events,
		createJoinStateEventsForBatchSendRequest([]string{virtualUserID2}, timeAfterEventBefore)...,
	)
	events = append(
		events,
		createMessageEventsForBatchSendRequest([]string{virtualUserID2}, timeAfterEventBefore, 1)...,
	)

	// Import a historical event
	batchSendHistoricalMessages(
		t,
		as,
		roomID,
		eventIdBefore,
		"",
		createJoinStateEventsForBatchSendRequest([]string{virtualUserID}, timeAfterEventBefore),
		events,
		// Status
		200,
	)
})

app_service_requester: The requester of an application service.

Returns:
@@ -325,7 +331,7 @@ async def persist_historical_events(
# The rest should hang off each other in a chain.
allow_no_prev_events=index == 0,
prev_event_ids=event_dict.get("prev_events"),
-auth_event_ids=auth_event_ids,
+state_event_ids=state_event_ids,
historical=True,
depth=inherited_depth,
)
@@ -343,10 +349,10 @@ async def persist_historical_events(
)

logger.debug(
-"RoomBatchSendEventRestServlet inserting event=%s, prev_event_ids=%s, auth_event_ids=%s",
+"RoomBatchSendEventRestServlet inserting event=%s, prev_event_ids=%s, state_event_ids=%s",
event,
prev_event_ids,
-auth_event_ids,
+state_event_ids,
)

events_to_persist.append((event, context))
@@ -376,7 +382,7 @@ async def handle_batch_of_events(
room_id: str,
batch_id_to_connect_to: str,
inherited_depth: int,
-auth_event_ids: List[str],
+state_event_ids: List[str],
app_service_requester: Requester,
) -> Tuple[List[str], str]:
"""
@@ -391,8 +397,13 @@ async def handle_batch_of_events(
want this batch to connect to.
inherited_depth: The depth to create the events at (you will
probably by calling inherit_depth_from_prev_ids(...)).
-auth_event_ids: Define which events allow you to create the given
-event in the room.
+state_event_ids:
+The full state at a given event. We share the same state across the whole
+historical batch. The state events will be stripped down to only what's
+necessary to auth a given event and set as the auth_event_ids. For
+insertion events, we will add all of these state events as the explicit
+state so the rest of the historical batch can inherit the same state and
+state_group.
app_service_requester: The requester of an application service.

Returns:
@@ -438,7 +449,7 @@ async def handle_batch_of_events(
events_to_create=events_to_create,
room_id=room_id,
inherited_depth=inherited_depth,
-auth_event_ids=auth_event_ids,
+state_event_ids=state_event_ids,
app_service_requester=app_service_requester,
)
