This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

Silent message drop since the last maintenance #6

Open
airone506 opened this issue May 16, 2023 · 41 comments

@airone506

Messages have been dropped silently since the Libera.chat bridge maintenance and upgrade on May 10th, 2023. The estimated ratio of dropped messages is ~20% (based on observations; I might add exact numbers later). The affected Libera.chat messages are not displayed to Matrix users in the room.

I did some checks in the past, some of them very diligent. I never observed such a massive drop. Actually, I did observe 100% message delivery even on the very diligent checks.

@airone506
Author

I checked quite a busy Libera.Chat channel bridged to Matrix. Over a period of six days, the delivery success rate was only 84.08%, i.e. almost 16% of messages were lost.

@Half-Shot
Collaborator

Hiya, do you know which channel(s) you're seeing this on? We deployed a patch last week to hopefully improve the issue we were seeing, so it sounds like we missed something.

@airone506
Author

The issue applies to every bridged channel I'm in. The figures I posted above are from #python. ##English is another Libera channel with significant traffic where the issue can be clearly seen. It also applies to low-traffic channels like #libera-matrix and #matrix-irc, where I definitely saw the issue as well. The Matrix room #plasma:kde.org, bridged to the Libera channel #plasma, also drops IRC messages...

@Cydox

Cydox commented May 21, 2023

Also seeing this on #fedora (on libera.chat; Matrix: #fedora:fedoraproject.org).

A seemingly random sample of messages from people on IRC don't appear on Matrix.

@airone506
Author

airone506 commented May 22, 2023

I see the issue in every Libera channel where I am connected both via IRC client and Matrix. I did the last check today for all channels and I can confirm it really is in every channel I'm in.

It seems that only very few of the users who also use Matrix to access Libera are bothered by this issue. My guess is that the others are not aware of it.

@Kleidukos

Kleidukos commented May 22, 2023

I can confirm that this is currently affecting the #hackage channel on Libera, if you want to see it for yourself, @Half-Shot :)

@progval

progval commented May 22, 2023

This also happens in channels with "allowUnconnectedMatrixUsers": true.

@alkisg

alkisg commented May 22, 2023

I'm also affected by this, e.g. in #debian, #ubuntu, #debian-next, #ubuntu-server etc (of course all in libera).

@srett

srett commented May 25, 2023

Still happening as of today.

@thecb1

thecb1 commented May 26, 2023

I saw messages getting lost in both ways btw.

@airone506
Author

> I saw messages getting lost in both ways btw.

This does not match my experience. Could these lost Matrix messages (invisible to Libera) have come from users who did not perform !reconnect after May 10th? That would make perfect sense.

For reference. matrix-org/matrix-appservice-irc#1712

@thecb1

thecb1 commented May 27, 2023

Well, it won't reconnect for me ...

@i-c-o-n

i-c-o-n commented May 30, 2023

Same in #coreboot, virtually nothing from IRC gets through to Matrix.

@airone506
Author

> ... virtually nothing from IRC gets through to Matrix.

Yes, this is the current status of the bridge. It is very different from the status when this issue was opened.

@ApostolosB

It seems to be working now. Is this considered fixed?

@airone506
Author

airone506 commented Jun 1, 2023

> ... Is this considered fixed?

Definitely not fixed. The issue still appears as it did when this ticket was opened.

@progval

progval commented Jun 1, 2023

Indeed, I just had a message dropped in #swh-team:matrix.org (bridged to #swh-team on Libera) at 08:41:42 UTC, in the Libera->Matrix direction. "allowUnconnectedMatrixUsers": true is set in this room.

@AndrewFerr
Member

Has this improved at all? Some fixes were deployed ~10h ago and should have settled down by now.

@progval

progval commented Jun 1, 2023

I see another loss at 17:42:49 (2.5h ago) on #swh-sysadm:matrix.org.

@progval

progval commented Jun 1, 2023

Also at 20:21:54 on #libera:libera.chat

@airone506
Author

airone506 commented Jun 1, 2023

> Has this improved at all? Some fixes were deployed...

Thank you for deploying the fixes. I'm sorry to report that the issue is still there.

I just did a quick check in the #python Libera channel for the time frame around 18:10 - 18:40 UTC and saw multiple messages being (seemingly randomly) dropped.

EDIT 1: Just saw multiple messages being dropped in some small Libera channel, time around 21:04 UTC, June 1st.

EDIT 2: Issue seen multiple times in multiple Libera channels between 07:00 and 12:00 UTC, June 2nd.

@Half-Shot self-assigned this on Jun 2, 2023
@progval

progval commented Jun 6, 2023

another one today at 16:21:02, again in #swh-team:matrix.org (and it still has "allowUnconnectedMatrixUsers": true). right after the dropped message, a puppet connected to IRC.

@etameta

etameta commented Jun 6, 2023

Four drops out of 20 messages (so 20%) in #linux-it:libera.chat in the last few hours, at 2023-06-06 15:26:08 +0000, 2023-06-06 18:42:55 +0000, 2023-06-06 19:41:54 +0000 and 2023-06-06 19:49:59 +0000.
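A sample of 20 messages is small, so the 20% figure carries a wide margin of error. As a rough illustration (not something the bridge or the reporters computed), the sketch below applies a standard Wilson score interval to the counts quoted above. The function name `wilson_interval` and the 95% level (`z=1.96`) are choices made here for illustration.

```python
import math


def wilson_interval(drops, total, z=1.96):
    """Wilson score interval for a drop rate observed as drops/total.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = drops / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts taken from the report above: 4 drops in 20 messages.
low, high = wilson_interval(4, 20)
print(f"observed 20%, plausible range {low:.1%} to {high:.1%}")
```

With these counts the interval spans roughly 8% to 42%, which is compatible with both the ~16% and ~20% figures reported earlier in this thread.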

@Half-Shot
Collaborator

I believe this situation has now improved, but please keep letting me know if it's gotten worse.

@Half-Shot transferred this issue from matrix-org/matrix-appservice-irc on Jun 8, 2023
@airone506
Author

I just did a quick ad-hoc "human-powered" scan of #python at Libera.Chat, going backwards in the timeline.

I saw one message in #python dropped at 06:01:29 UTC (June 8th). Text of the message was "scipy.signal", if that helps for searching.

Another message in #python was dropped at 04:45:34 UTC (June 8th). The text of the message was "(or this has been my experience when I've grappled with the same questions, anyway)". I did not continue checking after this one.

No message drop seen after that one at 06:01:29 UTC and I can tell the situation has dramatically improved indeed.

@Half-Shot: Did these two drops occur before a fix was deployed?

@inglor

inglor commented Jun 8, 2023

> I believe this situation has now improved, but please keep letting me know if it's gotten worse.

Lost messages again this morning 08 Jun 2023 11:22:29 (according to IRC client) GMT+1 in #archlinux-aurweb at Libera.Chat. The next message appearing is 20 seconds later in both IRC and Matrix - screenshots attached.

IRC log: [image: irc-log]

Matrix log: [image: matrix-log]

This is not a very active channel to have multiple messages coming through.

@airone506
Author

More drops spotted in #python at:
11:36:26 UTC, June 8th,
12:36:45 UTC, June 8th,
13:31:26 UTC, June 8th,
13:39:21 UTC, June 8th,
14:00:35 UTC, June 8th,
14:00:36 UTC, June 8th,
14:19:41 UTC, June 8th,
...

@airone506
Author

airone506 commented Jun 9, 2023

I saw multiple messages being dropped in multiple rooms. Just letting you know that the issue still exists.

@srett

srett commented Jun 12, 2023

Still dropping messages as of today

@airone506
Author

Still dropping messages as of today.

@airone506
Author

Still dropping messages as of today. I believe a lot of users who rely on the bridge as their daily driver are still unaware of this issue.

@srett

srett commented Jun 21, 2023

Indeed, I'm starting to feel this state is worse than having no bridge at all. Even if you know about the problem, it still leads to confusion and misunderstandings when just the wrong message gets dropped.

@simonmichael

Does the bridge or any external service log these events yet, to quantify the problem and automate the tedious human reporting?
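For anyone who keeps both an IRC client log and a Matrix export of the same room, the manual comparison described in this thread could be automated along these lines. This is only a sketch under stated assumptions: it does not use any bridge API, the `(timestamp, nick, text)` tuple format is hypothetical, and the names `find_dropped` and `delivery_rate` and the 30-second matching window are invented for illustration.

```python
from datetime import datetime, timedelta


def find_dropped(irc_log, matrix_log, window=timedelta(seconds=30)):
    """Return the IRC messages that have no counterpart in the Matrix log.

    Both logs are lists of (timestamp, nick, text) tuples (hypothetical format).
    An IRC message counts as delivered if a not-yet-matched Matrix message with
    the same nick and text exists within `window` of its timestamp.
    """
    unmatched = list(matrix_log)
    dropped = []
    for ts, nick, text in irc_log:
        match = next(
            (m for m in unmatched
             if m[1] == nick and m[2] == text and abs(m[0] - ts) <= window),
            None,
        )
        if match is None:
            dropped.append((ts, nick, text))
        else:
            # Consume the match so repeated identical lines are counted once each.
            unmatched.remove(match)
    return dropped


def delivery_rate(irc_log, dropped):
    """Fraction of IRC messages that reached Matrix (1.0 for an empty log)."""
    if not irc_log:
        return 1.0
    return 1 - len(dropped) / len(irc_log)


# Made-up example data, not taken from any real channel.
t0 = datetime(2023, 1, 1, 12, 0, 0)
irc = [(t0 + timedelta(minutes=i), "alice", f"message {i}") for i in range(5)]
matrix = [m for m in irc if m[2] != "message 3"]  # simulate one silent drop

lost = find_dropped(irc, matrix)
for ts, nick, text in lost:
    print(f"dropped: {ts.isoformat()} <{nick}> {text}")
print(f"delivery rate: {delivery_rate(irc, lost):.2%}")
```

Matching on nick and text within a time window is a deliberate simplification: the bridge may rewrite nicknames or formatting, so a real comparison would need a normalisation step before matching.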

@progval

progval commented Jun 22, 2023

$1LVFwOd4_H-bpmQGdRsWLMo-t7gLHOTrXw6Wfw2k-C4 and $lSDhR2hpjPnlgEcKyMVDcdb-C-iRbDgOikadks3Pu2A in #irc:matrix.org, sent today at 19:00 UTC, were not sent to Libera.

@srett

srett commented Jun 23, 2023

If fixing this is too hard, maybe make the bridge just forward each message to Matrix twice. This should lower dropped messages from 10% to 1%.
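The 10% to 1% figure holds only if each copy is dropped independently of the other: the chance of losing both is then 0.1 × 0.1 = 0.01. The sketch below spells out that arithmetic and adds a hypothetical `shared` parameter for the fraction of cases in which both copies share the same fate (for example, a disconnected puppet affecting every message). The function name and the mixture model are illustrative assumptions, not a description of how the bridge behaves.

```python
def loss_after_duplication(p_drop, copies=2, shared=0.0):
    """Probability that every copy of a message is lost.

    p_drop: chance that a single copy is dropped.
    copies: how many times each message is forwarded.
    shared: fraction of cases where all copies succeed or fail together
            (0.0 = fully independent drops, 1.0 = fully correlated).
    """
    if not 0.0 <= p_drop <= 1.0 or not 0.0 <= shared <= 1.0:
        raise ValueError("p_drop and shared must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    independent_loss = p_drop ** copies
    return shared * p_drop + (1 - shared) * independent_loss


print(loss_after_duplication(0.10))              # independent drops
print(loss_after_duplication(0.10, shared=0.5))  # half of the drops are correlated
print(loss_after_duplication(0.10, shared=1.0))  # fully correlated: no benefit
```

If the drops come from a cause that persists across both copies, duplication helps far less than the independent case suggests, and every message that is delivered twice would also need to be de-duplicated on the Matrix side.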

@progval

progval commented Jun 25, 2023

the two messages tulir posted today on #matrix-dev:matrix.org: $cFi1JlMOJaUjPjEiO_NSw0PLRgAiXgcHejQnqh7A1NU and $HxK2VTDpIS9iiS8BcEFSBpYL3dQrGTVlBjtM8ymCU_8, because his puppet disconnected yesterday at 16:02:59Z (that's probably the same issue as #12)

@BrenBarn

BrenBarn commented Jul 1, 2023

The issue still seems to be happening. Is there any progress? All I see in recent messages on this ticket is people saying "the problem still exists". . .

@srett

srett commented Jul 3, 2023

@BrenBarn Libera staff made an announcement a month ago that they'd take some action as of July 1st, but nothing seems to have happened. That would mean they implicitly picked option 1: https://libera.chat/news/matrix-irc-bridge-updates

@BrenBarn

BrenBarn commented Jul 3, 2023

@srett: I saw that, but as far as I can see that doesn't really have anything to do with whether this bug is being fixed. That's just libera trying to decide how to work around the problem.

@progval

progval commented Jul 21, 2023

This is still happening at the same frequency as last month. Is it useful if we keep reporting them?

@ara4n
Member

ara4n commented Jul 30, 2023

We haven't been updating this issue as much as we could or should; apologies. We've been dealing with the various complications around deportalling while trying to fix the bug, and comms with the libera team has taken priority over comms with the community. Good news(?) is that we have a plausible root cause for the bug and will try to ship a fix on Monday.

I've written up a bit more context on https://news.ycombinator.com/item?id=36923504. Apologies that things got so unreliable for so long.
