Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Error: 404 - Could not find event - on incoming federation transaction PUT queries #10589

Closed
MurzNN opened this issue Aug 12, 2021 · 10 comments
Closed

Comments

@MurzNN
Copy link

MurzNN commented Aug 12, 2021

Our public ru-matrix.org homeserver have a lot of 404 - Could not find event errors in logs (about 7000-8000 per day), that pointing to one same event with id $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4. This event is from MatrixHQ room (room_id = !OGEhHVWSdvArJzumhm:matrix.org). #irc:matrix.org (!BAXLHOFjvDKUeLafmO:matrix.org).

And I see that 99.99% of all "Could not find event" 404 errors are with exactly this event id, same error with other event ids happens very rarely (0-7 per day, average from last 7 days is about 2.73 per day).

Here is example of matched log lines with incoming federation transaction from matrix.org server:

2021-08-12 12:00:57,937 - synapse.federation.transport.server - 445 - INFO - PUT-1922933- Received txn 1628611386500 from matrix.org. (PDUs: 1, EDUs: 0)
2021-08-12 12:00:57,947 - synapse.http.server - 88 - INFO - PUT-1922933- <SynapseRequest at 0x7f2bf47735b0 method='PUT' uri='/_matrix/federation/v1/send/1628611386500' clientproto='HTTP/1.0' site=12011> SynapseError: 404 - Could not find event $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4
2021-08-12 12:00:57,948 - synapse.access.http.12011 - 389 - INFO - PUT-1922933- ::ffff:127.0.0.1 - 12011 - {matrix.org} Processed request: 0.014sec/-0.000sec (0.001sec, 0.000sec) (0.002sec/0.002sec/1) 101B 404 "PUT /_matrix/federation/v1/send/1628611386500 HTTP/1.0" "Synapse/1.40.0rc2 (b=matrix-org-hotfixes,53d7d9251)" [1 dbevts]

And here - from tchncs.de server:

2021-08-12 14:39:45,287 - synapse.access.http.12011 - 389 - INFO - PUT-1970495- ::ffff:127.0.0.1 - 12011 - {tchncs.de} Processed request: 0.356sec/-0.000sec (0.006sec, 0.000sec) (0.031sec/0.003sec/1) 101B 404 "PUT /_matrix/federation/v1/send/1628602889429 HTTP/1.0" "Synapse/1.40.0" [1 dbevts]
2021-08-12 14:39:45,286 - synapse.http.server - 88 - INFO - PUT-1970495- <SynapseRequest at 0x7f2cd7797d60 method='PUT' uri='/_matrix/federation/v1/send/1628602889429' clientproto='HTTP/1.0' site=12011> SynapseError: 404 - Could not find event $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4
2021-08-12 14:39:45,082 - synapse.federation.transport.server - 445 - INFO - PUT-1970495- Received txn 1628602889429 from tchncs.de. (PDUs: 1, EDUs: 0)

The interesting thing that this event is present in local Synapse database, here is row from events SQL table:

# SELECT * FROM events WHERE event_id = '$Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4';
-[ RECORD 1 ]--------+---------------------------------------------
topological_ordering | 13615
event_id             | $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4
type                 | m.room.server_acl
room_id              | !BAXLHOFjvDKUeLafmO:matrix.org
content              | 
unrecognized_keys    | 
processed            | t
outlier              | f
depth                | 13615
origin_server_ts     | 1572457978078
received_ts          | 1572467943643
sender               | @abuse:matrix.org
contains_url         | f
instance_name        | 
stream_ordering      | 14534747

And here is from event_json table:

# SELECT * FROM event_json WHERE event_id = '$Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4';
event_id          | $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4
room_id           | !BAXLHOFjvDKUeLafmO:matrix.org
internal_metadata | {"outlier": false, "stream_ordering": -109786}
json              | {"auth_events": ["$fgAiL9F7RRbilPAfTVYhaOPl7sunZZVhtUCPvphGDL4", "$S38/aRG4Qz8eIdrTyzJuQdH3Ieus1Y1dhczoFKnOB7M", "$AQfAndymjyEmF3BjyHxwK9wIBZxMq0i8APoWoPIubNM"], "content": {"allow": ["*"], "allow_ip_literals": false, "deny": ["*.200acres.org", "*.glowers.club", "*.gossip.love", "*.matrix.kiwifarms.net", "*.nerdsin.space", "*.ordoevangelistarum.com", "*.zemos.net", "200acres.org", "ardaxi.com", "c-24-11-108-182.hsd1.ut.comcast.net", "glowers.club", "gossip.love", "matrix.kiwifarms.net", "nerdsin.space", "ordoevangelistarum.com", "zemos.net"]}, "depth": 13615, "hashes": {"sha256": "YcmM3gaai6K4TsO0n91HmuEYnDglDWoYt6hBXwvy/JA"}, "origin": "matrix.org", "origin_server_ts": 1572457978078, "prev_events": ["$V3WojV7PVU7ZlgbI+evDtSgajnyTwV4ipaMzIUTbXNw"], "prev_state": [], "room_id": "!BAXLHOFjvDKUeLafmO:matrix.org", "sender": "@abuse:matrix.org", "state_key": "", "type": "m.room.server_acl", "signatures": {"matrix.org": {"ed25519:auto": "EKlEGFXWsQdsov4OUlvdzh3wMsv69zmY9cehiZ0KmwW2WE6HMNXkxcW2aB07yHv+WCYaAreox72jxKXBrhy7CQ"}}, "unsigned": {"replaces_state": "$h4lw0pzb9dclPUxxQIT3frAA/EzDOgPXCM38ZxAvY9U", "age": 9965389}}
format_version    | 2

I see no errors about this in Synapse main process and workers, so can't understand why this event couldn't be found.

What I can try more to investigate this problem deeper?

@MurzNN
Copy link
Author

MurzNN commented Aug 12, 2021

Can anybody post the template of SQL query, that Synapse uses to find event by id in federation transaction?
Maybe some inner join or subquery results in an empty result...
I want to reproduce it manually to understand, is this problem on Synapse side, or maybe on database level?

@MurzNN
Copy link
Author

MurzNN commented Aug 12, 2021

Or maybe other way is to delete this event from database, to resync it from other homeservers via federation? Or this action will totally broke all the thing?

@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

Can anybody post the template of SQL query, that Synapse uses to find event by id in federation transaction?

it's really not that simple.

might be worth checking that the rooms table has an entry for !OGEhHVWSdvArJzumhm:matrix.org ?

@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

Or maybe other way is to delete this event from database, to resync it from other homeservers via federation? Or this action will totally broke all the thing?

Don't do that. You'll almost certainly make things worse.

@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

The only other suggestion I have right now is to turn logging up to DEBUG and share the logs for the federation worker. (If you can share the raw logs rather than grepping for individual transactions, it will probably save some back-and-forth).

@MurzNN
Copy link
Author

MurzNN commented Aug 13, 2021

might be worth checking that the rooms table has an entry for !OGEhHVWSdvArJzumhm:matrix.org ?

# select * FROM rooms WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org';
            room_id             | is_public | creator | room_version | has_auth_chain_index 
--------------------------------+-----------+---------+--------------+----------------------
 !OGEhHVWSdvArJzumhm:matrix.org | f         |         | 5            | t
(1 row)

Thanks for suggestion, I'll turn on the DEBUG logs and share the results. @richvdh I even can give you private access to Grafana Loki with that logs, to easier grepping and search, if you allow!

@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

Investigation on @MurzNN's server shows that the problem is caused by an event $poNy2GJameXQnLYxnaEvrR14UNin/D/XHf3Q+maRreA in !BAXLHOFjvDKUeLafmO:matrix.org (#irc:matrix.org) which refers to $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4. In short, #irc:matrix.org is totally broken.

That bad event was generated in October 2019 by matrix.org. I think this is an old bug which has since been fixed, but I am going to investigate more.

Edit: the above is completely wrong: $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4 is in !BAXLHOFjvDKUeLafmO:matrix.org, so it's ok for other events in that room to reference it.

@MurzNN
Copy link
Author

MurzNN commented Aug 13, 2021

Sorry, I mixed up the id's of event in first post, so I replaced now them with correct id, and figured out that this event_id is present in rejections table:

# SELECT COUNT(*) FROM public.rejections WHERE event_id = '$Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4'
1

So seems here is the main source of problem.

@richvdh
Copy link
Member

richvdh commented Aug 13, 2021

There may be more than one problem here.

$Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4 is a rejected event, and in current_state_events, so this is a duplicate of #6867.

@MurzNN
Copy link
Author

MurzNN commented Aug 13, 2021

Huge thanks to the @richvdh for investigating the source of this federation problem, this issue is solved now!

Tight solution is purging the problematic room, which contains the event, via those steps:

  1. Find the event in rooms table by event_id, mentioned in "Could not find event" log line. In my case it was $Dv3wmat7XbtBm4O37eT+Oa2kxHKUfCj36nCMiHyj5F4
  2. Find the room_id from that event, in my case it was !BAXLHOFjvDKUeLafmO:matrix.org
  3. Use https://matrix-org.github.io/synapse/develop/admin_api/rooms.html#delete-room-api to purge that room (kicking all local users from it and deleting from local database).
  4. Rejoin to that room users, who must be a member of it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants