http - long-poll - EventQueueGet capability should not perform exponential backoff on 'failed' operations #4685

@monty-linden

Environment

7.2.1.17108480561/win64

Description

Discovered while reviewing disconnect logs from Strawberry:

https://lindenlab.slack.com/archives/C02R43FJP/p1757949291763609?thread_ts=1757688772.468139&cid=C02R43FJP

The viewer's EventQueueGet endpoint currently uses the request retry logic in llcorehttp to resend operations after receiving 499, 502, and other errors. This is good practice for normal requests but not for these long-poll operations. Timeout errors are expected here, yet the retry logic inserts a delay of 16s or more between HTTP requests. This artificially delays the intake and processing of events sent on the event queue.

Simply disabling the retry feature on this class of traffic may be sufficient to improve the situation. Some detection and accommodation of runaway failures from a bad peer might be useful.
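
A minimal sketch of the idea, not the viewer's actual code: `LongPollRetryPolicy`, its constants, and the "fast failure" heuristic are all hypothetical and are not part of llcorehttp. It assumes the caller can measure how long each request stayed open. A request that ends after a long wait is treated as an ordinary long-poll expiry and is reissued immediately; only a run of failures that return almost instantly, as a bad peer might produce, earns a short fixed pause instead of exponential backoff.

```cpp
#include <chrono>

using std::chrono::milliseconds;

// All thresholds below are assumed values chosen for illustration only.
constexpr milliseconds kFastFailure{1000};   // failures quicker than this look suspicious
constexpr int          kMaxFastFailures = 5; // tolerated consecutive fast failures
constexpr milliseconds kRunawayPause{2000};  // fixed pause once the limit is exceeded

// Hypothetical helper: decides how long to wait before the next long-poll request.
class LongPollRetryPolicy {
public:
    // http_status: status of the request that just ended (e.g. 200, 499, 502).
    // elapsed:     how long that request was outstanding.
    milliseconds nextDelay(int http_status, milliseconds elapsed) {
        const bool success = http_status >= 200 && http_status < 300;

        // Success, or a failure after a long wait (an expected long-poll
        // timeout): clear the counter and poll again with no delay.
        if (success || elapsed >= kFastFailure) {
            mFastFailures = 0;
            return milliseconds(0);
        }

        // The request failed almost immediately. Tolerate a few of these,
        // then throttle with a fixed pause so a bad peer cannot cause a
        // tight request loop.
        ++mFastFailures;
        if (mFastFailures <= kMaxFastFailures) {
            return milliseconds(0);
        }
        return kRunawayPause;
    }

    int fastFailures() const { return mFastFailures; }

private:
    int mFastFailures = 0;
};
```

Under these assumptions, a 502 that arrives after a 30s wait yields a zero delay, so the next poll starts at once, while the sixth consecutive 502 that arrives within a second yields the 2s pause.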

Reproduction steps

Review the log interpretation section of the Slack discussion linked above.



Labels

bug, stale
