Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Implements a task scheduler for resumable potentially long running tasks #15891

Merged
merged 38 commits into from Aug 21, 2023

Conversation

MatMaul
Copy link
Contributor

@MatMaul MatMaul commented Jul 7, 2023

The main purpose is to have resumable tasks, with state stored in the DB and the tasks being resumed after a synapse restart.

This is intended to run potentially long running tasks, and be able to schedule them roughly in the future.
This is NOT intended to precisely schedule tasks. The loop is currently running every minute so this is the max granularity we can have, and tasks can be postpone if the running queue is already too big.

This is kind of a spinoff of #15488 following the reviews, which currently implements something similar in a non generic way. I plan to rebase it on top of this framework.

Pull Request Checklist

@MatMaul MatMaul force-pushed the mv/task-scheduler branch 2 times, most recently from 1c980d4 to 7c49b15 Compare July 7, 2023 16:05
@MatMaul MatMaul force-pushed the mv/task-scheduler branch 11 times, most recently from 7a2a044 to 1d29dcd Compare July 10, 2023 13:29
@MatMaul MatMaul added the Z-Time-Tracked Element employees should track their time spent on this issue/PR. label Jul 10, 2023
@MatMaul MatMaul force-pushed the mv/task-scheduler branch 2 times, most recently from d37e2ec to 527d442 Compare July 11, 2023 13:46
@MatMaul MatMaul marked this pull request as ready for review July 11, 2023 13:46
@MatMaul MatMaul requested a review from a team as a code owner July 11, 2023 13:46
@MatMaul MatMaul changed the title Implements a task scheduler Implements a task scheduler for resumable potentially long running tasks Jul 11, 2023
@MatMaul MatMaul marked this pull request as draft July 11, 2023 14:18
@MatMaul MatMaul marked this pull request as ready for review July 11, 2023 14:55
@clokep
Copy link
Contributor

clokep commented Jul 12, 2023

Out of curiosity -- is there a hope / plan that it might be useful for other tasks in the future? Should we be moving other things to this?

clokep
clokep previously requested changes Jul 14, 2023
@@ -0,0 +1 @@
Implements a task scheduler for resumable potentially long running tasks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be a feature -- it is something internally used by Synapse; admins and end-users don't care.

We could probably just combine this with the other PR?

synapse/types/__init__.py Show resolved Hide resolved
synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
synapse/util/task_scheduler.py Outdated Show resolved Hide resolved
synapse/util/task_scheduler.py Outdated Show resolved Hide resolved
synapse/util/task_scheduler.py Show resolved Hide resolved
synapse/util/task_scheduler.py Outdated Show resolved Hide resolved
synapse/util/task_scheduler.py Show resolved Hide resolved
@clokep
Copy link
Contributor

clokep commented Jul 14, 2023

I'd find it very helpful to see how this would be used in the other PR!

@MatMaul
Copy link
Contributor Author

MatMaul commented Jul 17, 2023

Out of curiosity -- is there a hope / plan that it might be useful for other tasks in the future? Should we be moving other things to this?

Background DB updates could (should ?) probably be port to this later on. I haven't checked the details of that however.

I'd find it very helpful to see how this would be used in the other PR!

I am working on this right now since it is also a nice way to validate this works and is enough at least for this use case. This PR will probably get a follow-up update following me trying a real use case.

@clokep
Copy link
Contributor

clokep commented Aug 8, 2023

I'm putting this back in the overall review queue as it is a hefty chunk of code that I think could use a secondary review.

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of this seems relatively sane, but is missing a big docstring somewhere about how it all works? It might also help if you could point to a usage of this to see how it is to use?

params TEXT,
result TEXT,
error TEXT
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add some indices for the various fields here? Or is the assumption that this is going to be low volume?

Copy link
Contributor Author

@MatMaul MatMaul Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be low volume. But then an indice doesn't cost much in space so perhaps we should still do it ?
I am going to add one at least on status, since this one is used in the reconcile loop and run every 5mn.

resource_id and action would help for get_tasks but I feel they are perhaps less important.

@MatMaul
Copy link
Contributor Author

MatMaul commented Aug 11, 2023

I think most of this seems relatively sane, but is missing a big docstring somewhere about how it all works?

Big docstring is for today ;)

It might also help if you could point to a usage of this to see how it is to use?

My purge PR has been updated a while ago to use it.

@MatMaul
Copy link
Contributor Author

MatMaul commented Aug 14, 2023

The non-resolved comments are waiting for ACK/feedback, I think we should be really close to merging :)

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, just have a query around worker support and a suggestion for improving the DB API a bit

synapse/util/task_scheduler.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
synapse/util/task_scheduler.py Show resolved Hide resolved
synapse/util/task_scheduler.py Show resolved Hide resolved
synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo!

synapse/storage/databases/main/task_scheduler.py Outdated Show resolved Hide resolved
@MatMaul MatMaul dismissed clokep’s stale review August 21, 2023 12:11

asked by the author :)

@MatMaul MatMaul merged commit 358896e into develop Aug 21, 2023
37 checks passed
@MatMaul MatMaul deleted the mv/task-scheduler branch August 21, 2023 12:17
erikjohnston added a commit that referenced this pull request Sep 1, 2023
I don't think has caused any actual issues.

Introduced in #15891
erikjohnston added a commit that referenced this pull request Sep 1, 2023
I don't think has caused any actual issues.

Introduced in #15891
DMRobertson pushed a commit that referenced this pull request Sep 1, 2023
I don't think has caused any actual issues.

Introduced in #15891
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Z-Time-Tracked Element employees should track their time spent on this issue/PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants