Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending two messages at the same and having the first fail still sends the second #22885

Closed
turt2live opened this issue Jul 18, 2022 · 9 comments · Fixed by matrix-org/matrix-js-sdk#3131
Assignees
Labels
A-Timeline O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Team: App Z-Chronic

Comments

@turt2live
Copy link
Member

Steps to reproduce

  1. Send two messages
  2. Ensure your server fails to send the first
  3. Watch the second get sent (it should be blocked due to the error on the first)
  4. Send another message
  5. That message gets appropriately gated by the failed message

Outcome

What did you expect?

3 of my messages to fail to send, not 2.

What happened instead?

image

The first message sent okay. After it was sent, a new batch of messages was started: the second and third were sent at roughly the same time and due to a server crashing slowly ended up blocked for long enough to work. Once the second failed to send and the third sent, I started a new message (fourth) which appropriately got marked as failed to send.

Resolved by manually re-queuing the messages in the order I wanted them, and screenshotting the messages so the receipients knew what was going on :)

(this is what makes this bug nasty: we have a lot of code which tries to make sure order is preserved, and it broke that guarantee here, making the app lose a bit of user trust that it can safely send messages)

Operating system

Windows 10

Application version

No response

How did you install the app?

No response

Homeserver

t2l.io

Will you send logs?

Yes

@MadLittleMods MadLittleMods changed the title Sending two messages at the same and having the first fail still seconds the second Sending two messages at the same and having the first fail still sends the second Jul 19, 2022
@MadLittleMods MadLittleMods added S-Minor Impairs non-critical functionality or suitable workarounds exist A-Timeline O-Uncommon Most users are unlikely to come across this or unexpected workflow labels Jul 19, 2022
@Johennes
Copy link
Contributor

Johennes commented Jan 20, 2023

I'm upgrading this to S-Major. For one thing, this is part of what we've identified as "Chronic Issues". For another, sending messages appears to be critical functionality to me and manually re-queuing them in the correct order doesn't feel like a satisfactory workaround to me (especially since by the time you realize they were send in the wrong order, the damage is already done).

@Johennes Johennes added S-Major Severely degrades major functionality or product features, with no satisfactory workaround and removed S-Minor Impairs non-critical functionality or suitable workarounds exist labels Jan 20, 2023
@ara4n
Copy link
Member

ara4n commented Jan 22, 2023

(i don't think we expose UX to let the user manually re-queue them as a workaround, either? and certainly not reliably)

@ara4n
Copy link
Member

ara4n commented Feb 14, 2023

I just reproduced this again on Element Nightly version: 2023021301 - have rageshaked.

@turt2live
Copy link
Member Author

confirmed: in a room where encryption is slow, I sent a message which happened to be far too large to send as an event. While that message was encrypting, I sent a second message. The first message (the large one) failed for hopefully obvious reasons, but the second was sent anyways.

After the failure (and second message sent), I tried sending a third which was appropriately prevented.

Feels like the bug is in the loop not breaking out when an error happens given the symptoms.

Currently on Element Nightly version: 0.0.1-nightly.2023021401.

su-ex added a commit to SchildiChat/matrix-js-sdk that referenced this issue Feb 28, 2023
* Element-R: implement encryption of outgoing events ([\matrix-org#3122](matrix-org#3122)).
* Poll model - page /relations results ([\matrix-org#3073](matrix-org#3073)). Contributed by @kerryarchibald.
* Poll model - validate end events ([\matrix-org#3072](matrix-org#3072)). Contributed by @kerryarchibald.
* Handle optional last_known_event_id property in m.predecessor ([\matrix-org#3119](matrix-org#3119)). Contributed by @andybalaam.
* Add support for stable identifier for fixed MAC in SAS verification ([\matrix-org#3101](matrix-org#3101)).
* Provide eventId as well as roomId from Room.findPredecessor ([\matrix-org#3095](matrix-org#3095)). Contributed by @andybalaam.
* MSC3946 Dynamic room predecessors ([\matrix-org#3042](matrix-org#3042)). Contributed by @andybalaam.
* Poll model ([\matrix-org#3036](matrix-org#3036)). Contributed by @kerryarchibald.
* Remove video tracks on video mute without renegotiating ([\matrix-org#3091](matrix-org#3091)).
* Introduces a backwards-compatible API change. `MegolmEncrypter#prepareToEncrypt`'s return type has changed from `void` to `() => void`. ([\matrix-org#3035](matrix-org#3035)). Contributed by @clarkf.
* Stop the ICE disconnected timer on call terminate ([\matrix-org#3147](matrix-org#3147)).
* Clear notifications when we can infer read status from receipts ([\matrix-org#3139](matrix-org#3139)). Fixes element-hq/element-web#23991.
* Messages sent out of order after one message fails ([\matrix-org#3131](matrix-org#3131)). Fixes element-hq/element-web#22885 and element-hq/element-web#18942. Contributed by @justjanne.
* Element-R: fix a bug which prevented encryption working after a reload ([\matrix-org#3126](matrix-org#3126)).
* Element-R: Fix invite processing ([\matrix-org#3121](matrix-org#3121)).
* Don't throw with no `opponentDeviceInfo` ([\matrix-org#3107](matrix-org#3107)).
* Remove flaky megolm test ([\matrix-org#3098](matrix-org#3098)). Contributed by @clarkf.
* Fix "verifyLinks" functionality of getRoomUpgradeHistory ([\matrix-org#3089](matrix-org#3089)). Contributed by @andybalaam.
su-ex added a commit to SchildiChat/matrix-js-sdk that referenced this issue Mar 15, 2023
* Add easy way to determine if the decryption failure is due to "DecryptionError: The sender has disabled encrypting to unverified devices." ([\matrix-org#3167](matrix-org#3167)). Contributed by @florianduros.
* Polls: expose end event id on poll model ([\matrix-org#3160](matrix-org#3160)). Contributed by @kerryarchibald.
* Polls: count undecryptable poll relations ([\matrix-org#3163](matrix-org#3163)). Contributed by @kerryarchibald.
* Fix spec compliance issue around encrypted `m.relates_to` ([\matrix-org#3178](matrix-org#3178)).
* Fix reactions in threads sometimes causing stuck notifications ([\matrix-org#3146](matrix-org#3146)). Fixes element-hq/element-web#24000. Contributed by @justjanne.
* Better type guard parseTopicContent ([\matrix-org#3165](matrix-org#3165)). Fixes matrix-org/element-web-rageshakes#20177 and matrix-org/element-web-rageshakes#20178.
* Fix a bug where events in encrypted rooms would sometimes erroneously increment the total unread counter after being processed locally. ([\matrix-org#3130](matrix-org#3130)). Fixes element-hq/element-web#24448. Contributed by @Half-Shot.
* Stop the ICE disconnected timer on call terminate ([\matrix-org#3147](matrix-org#3147)).
* Clear notifications when we can infer read status from receipts ([\matrix-org#3139](matrix-org#3139)). Fixes element-hq/element-web#23991.
* Messages sent out of order after one message fails ([\matrix-org#3131](matrix-org#3131)). Fixes element-hq/element-web#22885 and element-hq/element-web#18942. Contributed by @justjanne.
@justjanne
Copy link
Contributor

@ara4n @turt2live We've increased the logging for this now. If you still experience this issue, please submit rageshakes so we can get more information on how and why this is happening.

@turt2live
Copy link
Member Author

The original steps to reproduce have always triggered it for me. Not sure my logs will be any more revealing than local debugging, tbh.

@Johennes
Copy link
Contributor

@turt2live we're not able to reproduce this, sadly, so we can't debug it locally. If you can easily reproduce it, we'd very much appreciate another rageshake so that we can try and investigate it based on the additional logging.

@Johennes
Copy link
Contributor

Johennes commented Jun 8, 2023

I'm closing this for now since we're unable to action it without further input. Logging has been extended though, as mentioned above. So happy to reopen if anyone manages to reproduce this.

@Johennes Johennes closed this as completed Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Timeline O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Team: App Z-Chronic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants