Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a third-party library for a queuing tasks #4622

Closed
NateWr opened this issue Mar 21, 2019 · 42 comments
Closed

Implement a third-party library for a queuing tasks #4622

NateWr opened this issue Mar 21, 2019 · 42 comments
Assignees
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.
Milestone

Comments

@NateWr
Copy link
Contributor

NateWr commented Mar 21, 2019

We could use a way to queue tasks, run them in order, and track their status. This is most needed for the document conversion toolset. But we could also use wherever we have one or more lengthy tasks and want to queue a follow-up when it's done.

Some use cases:

  • Document conversion to JATS XML
  • Compiling stats
  • Large exports
  • Building temporary cache for intensive requests (eg - stats)
  • Crossref and other deposits which require interaction with a third-part API
  • Bulk emails
  • Update search index

Our ScheduledTask approach does some of this already for regularly scheduled tasks. It seems that most libraries separate queuing from scheduling, so we'll have to do some thinking about how to refactor those tasks. "Queuing" means putting a job into a queue and it will be run when all previous jobs are done. "Scheduling" is about when to add a job to the queue.

Our priorities:

  • An expressive syntax that is easy to use, well-documented and quick to learn.
  • A widely-used library that is well maintained.
  • The ability to track the progress of jobs and chain jobs together.
  • The ability to schedule a job to be run at a future date.
  • The ability to add, view and cancel jobs from the API and command line.

I did a quick search and didn't find many options. I may be looking under the wrong terminology. However Laravel Queues v5.5 is compatible with PHP 7.0. It handles job queuing but not scheduling. Here is a demo of how to use it stand-alone. Laravel's scheduling component does not appear to have a stand-alone example, so we may have to roll that part ourselves.

On the plus side, it supports a lot of third-party queue systems (Beanstalk, Amazon SQS, Redis) in addition to our db, so if high-volume users really want to offload some of these tasks they can do so

Related issues: #2550, #4517.

@NateWr NateWr added the Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day. label Mar 21, 2019
@defstat
Copy link
Contributor

defstat commented Nov 16, 2020

I am adding this as a preliminary work on Queues (and some support for Artisan)

https://github.com/defstat/pkp-lib/tree/laravel-queue-integration

@NateWr
Copy link
Contributor Author

NateWr commented Nov 17, 2020

Thanks @defstat! Looks like a lot of tough integration work here. The queuing looks solid and will be a big improvement for us. It seems like the last big challenge here is figuring out how to process jobs in the queue.

If I understand correctly, Laravel's workers will work through the jobs as quickly as possible whenever they detect jobs in the queue. I'm having trouble thinking about how we can do this in a way that works for the existing cron and acron setups.

For those running a cron setup, it should be possible to burn through the queue whenever the cron job is fired. However, our own documentation recommends a cron job to be run daily. For the kind of tasks that we want to queue, we really need the jobs to be executed within a minute or ideally within seconds. Jobs may be delayed when the stack grows large. But we don't want things like emails, XML conversions, etc, to be run once a day. And if a cron job tries to burn through all of the jobs once a day, we're likely to run into the performance problems that the queue is meant to solve.

I guess the solution is to have the cron job initialize Laravel's workers and let them burn through the queue. But I'm still not clear on how this works. If we run the cron job regularly, are we going to end up with a lot of long-running workers that never get shut off? Laravel itself suggests that for the workers to run well, a separate process is needed to monitor the workers and restart them if they stop.

The acron setup seems to have the same difficulties. Either we end up hammering one request with the need to process all jobs, or we use the request to start a worker and end up with lots of workers running.

Maybe all of this is a solved problem. I'm definitely out of my depth on this one. But just trying to figure out how we get there.

cc @asmecher

@diegoabadan
Copy link
Contributor

diegoabadan commented Nov 18, 2020

Wouldn't it be the case for a single frequent cron job and using Scheduling for large tasks that can run infrequently?
https://laravel.com/docs/8.x/scheduling

In this way the frequency settings for these major tasks would leave cron and go to OJS.

@NateWr
Copy link
Contributor Author

NateWr commented Nov 19, 2020

We can recommend people change their cron to be more frequent. However, in practice we are likely to find that a lot of instances are upgraded without making this change.

@diegoabadan
Copy link
Contributor

In this case, something similar to the new version notice may be interesting when OJS realizes that it has spent more time than recommended to run cron.

So OJS itself alerts its users, when there is this forgetfulness in the upgrade or any problem in Cron.

asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 23, 2020
- Configure global queue manager
- Require minimal Composer packages
- Move __(...) earlier in bootstrap to get ahead of Laravel equivalent
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 23, 2020
- Configure global queue manager
- Require minimal Composer packages
- Move __(...) earlier in bootstrap to get ahead of Laravel equivalent
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 23, 2020
- Configure global queue manager
- Require minimal Composer packages
- Move __(...) earlier in bootstrap to get ahead of Laravel equivalent
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 23, 2020
- Configure global queue manager
- Require minimal Composer packages
- Move __(...) earlier in bootstrap to get ahead of Laravel equivalent
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 23, 2020
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 24, 2020
asmecher added a commit to asmecher/pkp-lib that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/pkp-lib that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/ui-library that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/pkp-lib that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/pkp-lib that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/ojs that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/ojs that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/pkp-lib that referenced this issue Nov 24, 2020
NateWr added a commit to NateWr/ui-library that referenced this issue Nov 24, 2020
@NateWr
Copy link
Contributor Author

NateWr commented Mar 8, 2021

While running an audit on a WordPress site, I learned about this library: https://github.com/woocommerce/action-scheduler/. I don't think we'd opt for it in place of Laravel's task queue, but it would be interesting to study it. It looks like it can handle really large queues and if it is running in WooCommerce, that means it is running in lots of the kinds of limited hosting environments that will make processing the queue tricky for us too (when workers can't be run).

@henriqueramos henriqueramos self-assigned this Mar 22, 2021
@henriqueramos
Copy link
Contributor

Have found this post telling us about how to run the Queue on shared hosts, it's pretty similar to our edge case scenario when someone doesn't have access to the crontab.

https://orobogenius.medium.com/laravel-queue-processing-on-shared-hosting-dedd82d0267a

@asmecher
Copy link
Member

@henriqueramos, that's useful for hosts that can't run worker threads, but does still depend on cron access.

@henriqueramos
Copy link
Contributor

It's a WIP yet, but the PR for pkp-lib it's located here https://github.com/pkp/pkp-lib/pull/7049/files

@henriqueramos
Copy link
Contributor

henriqueramos commented May 17, 2021

Hello @NateWr and @asmecher.

PR's created for pkp-lib, ojs and pkp-docs.

Please, take a look on those. Once approved, I will created the versions for omp and ops.

@henriqueramos
Copy link
Contributor

I've implemented a Job to run the Application::getSubmissionSearchIndex processes. Also, have fixed the failed_jobs implementation.

Could you please take a look on the PR? @asmecher and @NateWr.

henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 17, 2022
@henriqueramos
Copy link
Contributor

@asmecher After few days studing, I got a draft PR for the Laravel scheduling tooling.

OJS
#7710

henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 17, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 17, 2022
@asmecher
Copy link
Member

Thanks, @henriqueramos, I've had a quick look. A couple of pieces of feedback...

  • This introduces the Commands directory, which has a lot of overlapping purpose with the existing tools directory. Introducing new patterns without larger-scale consideration of existing code leaves a lot of competing patterns and clean-up work for later; you could either
    1. stick with existing patterns for now, or
    2. make a proposal for larger-scale change for consideration as part of this issue
  • The scheduled task toolset needs to be extensible, and right now it's just a hard-coded list of tasks. Consider for example plugins that extend the list, such as the PLN plugin; this needs to be accommodated too.

henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 22, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 22, 2022
henriqueramos added a commit to henriqueramos/pkp-lib that referenced this issue Feb 22, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 22, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 22, 2022
henriqueramos added a commit to henriqueramos/ojs that referenced this issue Feb 22, 2022
…runScheduledTasks to use the SchedulerBag from ServiceProvider
@henriqueramos
Copy link
Contributor

@asmecher Currently the tools folder isn't properly loaded by Composer, so we don't have a Fully Qualified Namespace for those commands.

With this migration to the Commands folder, I can call them using the APP\Commands or PKP\Commands fully qualified namespaces.

Also, I've implemented a new Service Provider to be extended, giving us more flexibility to handle new Scheduled Tasks.

@NateWr
Copy link
Contributor Author

NateWr commented Feb 22, 2022

I'll defer to @asmecher on this. But in my opinion it's too late in 3.4's dev cycle to be introducing new architectural patterns. At the very least, a new architectural pattern should be accompanied by a clear strategy to refactor a whole module or subsystem. Even if we can't refactor the whole thing in one version, adopting a new architectural pattern needs to be more than a one-off.

If these patterns are good to use, I'd prefer to see a comprehensive proposal made for how we convert all of our CLI tools to the new pattern. Then we can discuss and schedule this work against 3.5 or another version where we have the time to fully consider and adopt the new pattern. Otherwise, we end up with a codebase full of one-off architectural ideas that were never rolled out across the system. It becomes very hard to understand and maintain them. Each new architectural pattern has its own learning curve, so if we're going to adopt a new one, it needs to replace an old pattern so that we aren't burdening devs with multiple patterns for the same task.

@asmecher
Copy link
Member

@henriqueramos, I agree with @NateWr. Every new pattern needs to be learned and that represents a cognitive load for the dev team; when a new pattern coexists alongside an older pattern, developers need to understand both patterns, the way they relate to each other, and what the plans are to replace the older one. There are a lot of cases like this in the codebase and we need to be very careful about adding more. So it's incumbent on the developer introducing new patterns to do the homework rather than letting it accumulate as technical debt.

In my opinion tying tasks into the tools/ directory and CliTool class doesn't feel like a natural fit. Avoiding that will resolve the problem with tools not having FQCNs.

I'd recommend keeping your scope focused on the problem at hand -- replacing the scheduler toolset with a 3rd-party implementation -- rather than adding rearchitecting to it. You'll find there are enough complications to consider, e.g. the way plugins register tasks, and keeping it focused will be friendlier to the other devs who are working on major forks at the moment.

@henriqueramos
Copy link
Contributor

@asmecher Having a SchedulerServiceProvider will allow us to inject scheduled tasks using a Fluent Interface.

This is the implementation on pkp-lib and this is the implementation on OJS.

And in a plugin, on the plugin's register method, we could do something like:

$app = PKPContainer::getInstance();
$app->register(new PluginServiceRegister($this));

Also we could remove the legacy ScheduledTask stuff (like XML Parsing) from the entire codebase.

@jonasraoni
Copy link
Contributor

jonasraoni commented Jun 18, 2022

Out of curiosity, what's missing to close this issue?

By skimming the code, I'd say it's missing:

  • Sunsetting the scheduledTasks.xml and its related code
  • Deprecating the Acron plugin in a way that keeps backwards compatibility: probably just needs to support the hook AcronPlugin::parseCronTab and ensure entries in the scheduled_tasks table are migrated

@NateWr
Copy link
Contributor Author

NateWr commented Jun 20, 2022

Yeah this should probably be closed in favor or smaller issues for the remaining tasks.

  • Allow queued jobs to be processed by a cron job #7105 Allow queued jobs to be processed by cron job
  • [no issue] Allow jobs to be processed by workers
  • [no issue] Support progress tracking for a group of jobs (ie - if a bulk email creates 100 jobs, show "25/100 jobs complete")

@NateWr
Copy link
Contributor Author

NateWr commented Jun 20, 2022

Anyone willing to file the other issues?

@NateWr
Copy link
Contributor Author

NateWr commented Jun 20, 2022

It looks like #7171 added a UI to view jobs, but it hasn't yet captured the use case described here: #7171 (comment)

@NateWr
Copy link
Contributor Author

NateWr commented Jun 28, 2022

Ok, I've created a couple of new issues:

I decided not to create an issue for progress tracking, because we don't yet have a need for it. I'm going to close this issue as done and we can continue working on the other issues.

@NateWr NateWr closed this as completed Jun 28, 2022
@asmecher asmecher changed the title Consider implementing a third-party library for a queuing tasks Implement a third-party library for a queuing tasks Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day.
Projects
Status: No status
Development

No branches or pull requests

6 participants