Conversation
> For tasks that are designed to run immediately, but could be deferred to a task worker, always choose to defer to the worker. This is helpful for systems with slow databases that could cause the immediate task to time out after its 5 second limit.
|
|
Should we add that it can severely impact performance?
Also, this reminds me that we would not need to apply that limit if we could run the tasks in a proper async fashion alongside the (sadly not yet) async application.
For the time being, I agree that this is the best option we can give admins. +1 to backport.
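As a concrete illustration of how a deployment might opt in, here is a minimal settings sketch. The setting name below is hypothetical, used only for illustration; check the PR's diff or docs for the actual name it introduces:

```python
# /etc/pulp/settings.py -- Pulp loads its Django settings through dynaconf.
# NOTE: DEFER_IMMEDIATE_TASKS is a hypothetical name used for illustration;
# substitute the setting this PR actually adds.

# When enabled, tasks that support both immediate and deferred execution
# are always dispatched to a task worker instead of running inline under
# the 5 second immediate-task limit.
DEFER_IMMEDIATE_TASKS = True
```

With dynaconf, the same toggle could also be supplied as an environment variable (e.g. `PULP_DEFER_IMMEDIATE_TASKS=true`).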
pedro-psb left a comment:
I guess you've asked about this before, maybe it's already sorted out, but I believe it will also time out on the worker. Could you have a look into that?
Yes, an immediate task that is deferred to the worker would also be constrained to the 5 second timeout. The tasks in the JIRA were ones that ran on the worker. They do a bunch of distribution updates at once, so most of them get deferred to the worker since they share the same global distribution lock. That is why the new setting overwrites the task's immediate field if possible, so that when deferred it doesn't get run as an immediate task.
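To make that mechanism concrete, here is a rough sketch of what such an override could look like at dispatch time. This is not pulpcore's actual code, and the setting name is the same hypothetical one as above:

```python
# Sketch only -- not pulpcore's real dispatch() implementation.
from django.conf import settings

def dispatch(func, *, immediate=False, deferred=True, exclusive_resources=None):
    # If the (hypothetical) setting is enabled and the task supports
    # deferred execution, drop the immediate flag up front. Then, when
    # the task is handed to a worker -- e.g. because it is queued behind
    # the shared global distribution lock -- it runs as an ordinary
    # deferred task, free of the 5 second immediate limit.
    if deferred and getattr(settings, "DEFER_IMMEDIATE_TASKS", False):
        immediate = False
    ...  # create the Task record and notify workers as usual
```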
force-pushed from 97eed15 to 8d36674
force-pushed from 8d36674 to 122dbc6
Backport to 3.85: 💔 cherry-picking failed — conflicts found. ❌ Failed to cleanly apply 71c6e2c on top of patchback/backports/3.85/71c6e2c0cb5d79db06e1e4cfdc634d9c15246afa/pr-7587 (backporting merged PR #7587 into main). 🤖 @patchback
Backport to 3.105: 💔 cherry-picking failed — conflicts found. ❌ Failed to cleanly apply 71c6e2c on top of patchback/backports/3.105/71c6e2c0cb5d79db06e1e4cfdc634d9c15246afa/pr-7587 (backporting merged PR #7587 into main). 🤖 @patchback
Backport to 3.108: 💚 backport PR created. ✅ Backported as #7596. 🤖 @patchback
We can backport new settings, right?
I looked at the JIRA, and in this case there really isn't any expensive operation going on for RPM or Container distribution updates. There are two synchronous operations: 1. the database update call, 2. the cache invalidation hook. These should both be fairly fast and happen under a second, so I'm not sure how Satellite is hitting the 5 second timeout. The only conclusion I can draw is that their third-party systems are really slow.
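(For context, the synchronous portion of such an update is roughly the following shape. This is an illustrative sketch, not the actual RPM/Container plugin code, and the cache helper name is made up:)

```python
# Illustrative sketch of the synchronous work in a distribution update.
def update_distribution(distribution, data):
    # 1. The database update call: a single-row UPDATE, normally fast.
    for field, value in data.items():
        setattr(distribution, field, value)
    distribution.save()

    # 2. The cache invalidation hook, so the content app stops serving
    #    stale data for this distribution. Also expected to be sub-second.
    invalidate_distribution_cache(distribution)  # hypothetical helper
```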
I really don't think we should update the RPM & Container tasks to be deferred, since Services never runs into this problem and doing so would significantly slow down their system, given the thousands of updates they handle daily. I also didn't think we should make the immediate timeout configurable via a setting, since the user would be unaware that it affects the task worker's heartbeat if raised too high, and they could end up with more cancelled tasks. Open to suggestions, but I think this setting is the simplest solution that can satisfy everyone's needs.
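To spell out the heartbeat concern: immediate tasks run inline in the worker's loop, so a minimal sketch of the failure mode (assumed structure, not pulpcore's actual worker code) looks like this:

```python
import asyncio

IMMEDIATE_TASK_TIMEOUT = 5  # the current hard limit, in seconds

async def run_immediate(task_coro):
    # While this await is in flight, the worker's loop is busy and is not
    # recording heartbeats. If an admin could raise this timeout past the
    # point where a silent worker is declared missing, a slow-but-healthy
    # immediate task would get the worker -- and its tasks -- cancelled.
    try:
        await asyncio.wait_for(task_coro, timeout=IMMEDIATE_TASK_TIMEOUT)
    except asyncio.TimeoutError:
        pass  # the task is marked failed/cancelled
```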
📜 Checklist
See: Pull Request Walkthrough