Outgoing federation is mostly broken after restoring from backup #16025
Comments
Some more info:
A couple of hours ago we deleted the signing key and let Synapse generate a new one. Now we get messages from matrix.org, but matrix.org only receives old messages, and only sometimes. An admin of another server sent us these logs:
Example Room:
log from a different server and room where my messages don't arrive (two messages to the other server)
also some log from right before on the other server
meanwhile the tchncs server, grep for the other server
working test message other server -> tchncs – grep on the tchncs side (odd? am I missing logs?)
Update: at least in the PQO.. tchncs community room, federation from matrix.org -> tchncs.de has now suddenly been interrupted as well ... still nothing obvious to spot in the logs, tho I am filtering in the dark and don't have clear knowledge of what to search for.
Update on update: this was incorrect – those messages only mysteriously vanished in element-desktop.
@verymilan Is there someone among the contributors whom we could tag directly for this issue? Our internal Kielux event organization communication is currently partially interrupted, which affects us quite a bit.
At the moment, we see that some servers have no problem federating with both matrix.org and tchncs.de.
@Moini hi! Did you mean to mention me? I personally do not know :)
Just generally anyone who reads this :) Looks like there has been some progress: one of us reports that people can now read his old messages. Only in our unencrypted chat room, the tchncs accounts can post, but their posts cannot be read by others with accounts on different servers.
Can confirm that I can't read DMs by tchncs.de people (decryption error), though they can apparently read my messages from matrix.org.
Looks like you will have to ask people about somehow exchanging / accepting that new key. I can definitely see this as a required mechanism to prevent taking over servers via DNS hacks. (Even though they can still read my messages..)
@verymilan information on the endpoint is here. I'm trying to see whether exponential back-off is playing a role, since I added tchncs.de to my trusted key servers and killed the keys, and as far as I can tell I am able to send from my server -> tchncs.de. From my logs, it doesn't even look like tchncs.de is attempting to send anything to me, I believe.
@timaeos will try to reset :o
Drat, was hoping it would show a failure. I'll keep looking through my logs to see if there is an a-ha moment.
The situation seems to have changed over the last 24 hours, but it's unclear whether this is because a number of homeservers (like mine) are in the room for tchncs.de to bounce messages off to reach the homeservers that it can't talk to directly. The main issue users are currently experiencing is many "unable to decrypt" errors, which would also be occurring if the messages are bouncing via another homeserver, so federation definitely looks like the issue. The keys look absolutely fine on matrix.org, so we're running out of ideas why it's failing!
@tcpipuk Only array entry number 1 is correct, 0 and 2 are wrong.
I don't think that endpoint array order is static. Which keys are you saying are incorrect? Based on https://matrix.tchncs.de/_matrix/key/v2/server I'm guessing that you're saying that only
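For anyone comparing keys by hand: referring to key IDs rather than array positions avoids the ordering question. Below is a rough Python sketch that lists the current and expired key IDs from a saved copy of a `/_matrix/key/v2/server` response. The field names (`verify_keys`, `old_verify_keys`, `valid_until_ts`) come from the Matrix server-server spec; the function name and the sample body are made up for illustration and are not the real tchncs.de keys.

```python
import json


def summarize_server_keys(response_text):
    """List current and expired key IDs from a /_matrix/key/v2/server body."""
    data = json.loads(response_text)
    return {
        "server_name": data.get("server_name"),
        "valid_until_ts": data.get("valid_until_ts"),
        # Sort by key ID so the result does not depend on response ordering.
        "current": sorted(data.get("verify_keys", {})),
        "expired": sorted(data.get("old_verify_keys", {})),
    }


# Made-up sample body: key IDs and values are placeholders.
SAMPLE_BODY = json.dumps({
    "server_name": "example.org",
    "valid_until_ts": 1700000000000,
    "verify_keys": {"ed25519:new1": {"key": "PLACEHOLDER"}},
    "old_verify_keys": {
        "ed25519:old1": {"key": "PLACEHOLDER", "expired_ts": 1690000000000}
    },
})

summary = summarize_server_keys(SAMPLE_BODY)
print(summary["current"], summary["expired"])
```

Running the same function over the body returned by the server itself and over what another server has cached should make it easier to say which key ID differs.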
Something changed, was able to receive some messages on matrix.org which - according to their timestamps - were sent at 11:00 and 15:00 yesterday.
We believe this is now resolved, the issue appears to have been a faulty federation_sender on the homeserver, so hopefully everything's working fine again now! 🙂
@verymilan was able to find an error in one of his federation senders with the help of the people in the Synapse Admins room. That was resolved and now the server is federating correctly. It's going to take a while for it to churn through the backlog. They'll probably close this issue tomorrow after they get some much needed sleep.
Yay, thanks a lot everyone!!!
Thank you everybody who has patiently tried to investigate this problem here and in various Matrix rooms. Means a lot 🤗
Hi there, I have already tried to ask in the Synapse Matrix room without success and am starting to wonder if it's a priority issue of some sort.
I was forced to restore Synapse after both drives failed upon drive replacement. I admit that I was also forced to recreate configs and to put a new signing key (with a new name of course) in place.
The server has now been running without high load for over a day.
Symptoms on the matrix.org homeserver:
!PQOOuZmmjzGoXWzKuZ:tchncs.de: every few hours one or two messages arrive from tchncs.de
!gcAYQIzQPMgGUSaAua:tchncs.de: this was a test DM room with my accounts; initial messages couldn't be decrypted, after that, one way
Client symptoms:
If I receive an encrypted remote message, Element Desktop throws a decryption error even after clearing the cache and logging out and in again – however – element-x-ios still shows that message just fine.
Logs:
(INFO) synapse.handlers.sync - 741 - INFO - GET-2953895 - Failed to find any events in room !gcAYQIzQPMgGUSaAua:tchncs.de at RoomStreamToken(topological=None, stream=337273433, instance_map=immutabledict({}))
tho I could not confirm on a different host:
matrix-matrix-synapse-1 | 2023-07-28 20:30:04,453 - synapse.federation.federation_client - 752 - WARNING - sync_partial_state_room-1-$rvTcqV4DvQssCHO0ZELo8zvcr2Lyw1NR8u5J5GNti0U-$46nRykeEPJVC2K1roiwCcgYLXpxTceTI9blNIQ6_Y70 - Signature on retrieved event $e4xQAons8TGPgR4iy4RhGRX_0_dfCZmRTrhdL9MoypM was invalid (unable to verify signature for sender domain tchncs.de: 401: Failed to find any key to satisfy: _FetchKeyRequest(server_name='tchncs.de', minimum_valid_until_ts=1690533075296, key_ids=['ed25519:a_hyvD'])). Checking local store/origin server
Unable to get hierarchy of !jXJzclMjZYfMiParuQ:tchncs.de via federation: 404: Unknown room: !jXJzclMjZYfMiParuQ:tchncs.de
on a different server I host
WARNING - POST-5204440 - Inserting into a wheel timer that hasn't been read from recently.
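As far as I understand the "Failed to find any key to satisfy" warning, the remote server wants a key with the given ID that is valid at least until `minimum_valid_until_ts`. The Python sketch below is my own approximation of that check against a `/_matrix/key/v2/server` style response, not Synapse's actual code. Treating `expired_ts` of an old key as its validity limit is an assumption on my side, and the function name and sample data are made up.

```python
def key_satisfies_request(key_response, key_id, minimum_valid_until_ts):
    """Rough check: does key_response offer key_id with enough validity?"""
    # Current keys are covered by the response-wide valid_until_ts.
    if key_id in key_response.get("verify_keys", {}):
        return key_response.get("valid_until_ts", 0) >= minimum_valid_until_ts
    # Assumption: an old key counts as valid up to its expired_ts.
    old_key = key_response.get("old_verify_keys", {}).get(key_id)
    if old_key is not None:
        return old_key.get("expired_ts", 0) >= minimum_valid_until_ts
    # The key ID is not published at all.
    return False


# Made-up response: only a new key is published, the old key ID is missing.
sample_response = {
    "server_name": "example.org",
    "valid_until_ts": 1700000000000,
    "verify_keys": {"ed25519:new1": {"key": "PLACEHOLDER"}},
    "old_verify_keys": {},
}

print(key_satisfies_request(sample_response, "ed25519:old1", 1690533075296))
print(key_satisfies_request(sample_response, "ed25519:new1", 1690533075296))
```

If this reading is right, events signed with the lost key can only be verified by others when that key ID is still published somewhere with a late enough validity.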
Messing with configs
Ricardo in the Synapse Admins room was, it seems, able to query the old key from his server (I didn't find out how to do it properly/unhashed). Since Synapse had been running for a while at this point, instead of trying to put the old key back in place, I have set it to expired in the Synapse settings.
Possibly related issues (that sadly didn't help me yet)