Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cron-style, recurring tasks #104

Closed
rosa opened this issue Dec 30, 2023 · 14 comments · Fixed by #155
Closed

Implement cron-style, recurring tasks #104

rosa opened this issue Dec 30, 2023 · 14 comments · Fixed by #155
Assignees

Comments

@rosa
Copy link
Member

rosa commented Dec 30, 2023

A dispatcher would handle these, and the configuration will be included in solid_queue.yml.

@rosa rosa self-assigned this Dec 30, 2023
@rmontgomery429
Copy link

rmontgomery429 commented Jan 21, 2024

I am excited about this project and would love to contribute. Before I get too far down the wrong path, I'd like to clarify a few aspects to ensure alignment with the project's vision.

Proposed Format for Cron-like Tasks:

I suggest the following YAML structure for defining recurring tasks:

recurring_tasks:
  - task: "DailyReport.generate"
    schedule: "every day at noon"
  - task: "DatabaseCleanup.run"
    schedule: "every 10 minutes"
  - task: "SendBroadcast.start"
    schedule: "0 4 * * 0"

This format aims to provide flexibility in scheduling any arbitrary class method. It supports both a human-readable syntax and traditional cron syntax for broader flexibility.

Questions for Clarification:

Dependency on Fugit: Is the project open to depending on external gems like Fugit for parsing the cron syntax? Or is there a preference for a simpler, in-house solution to avoid additional dependencies?

Task Execution Mechanism: Are there any specific considerations or constraints regarding how these tasks should be executed within the project's existing architecture? It seems like spinning the idea would be to spin up a dispatcher for each scheduled task but maybe you have already thought through this and another suggestion.

Your guidance on these points will help ensure that my contributions align well with the project's roadmap and existing design principles.

Looking forward to your feedback and excited to contribute to this feature.

@ianterrell
Copy link

ianterrell commented Jan 21, 2024

Just 2¢ as a user of GoodJob looking to migrate to SolidQueue when the time comes, I really like GoodJob's cron-style support: https://github.com/bensheldon/good_job#cron-style-repeatingrecurring-jobs

I most like:

  • it only schedules jobs, not arbitrary code (e.g. class: "DailyReportJob" not task: "DailyReport.run")
  • it allows setting args/kwargs for the jobs (due to above)
  • setting priority

I mainly like that it feels like a lower mental overhead in that only jobs get queued, whether now or on a schedule.

But there are arguably some objective measurements around that, like I can customize my retry_on and concurrency controls for scheduled jobs using the same mechanisms as any other job, etc, since they're all plain jobs.

@rmontgomery429
Copy link

rmontgomery429 commented Jan 23, 2024

@ianterrell Really appreciate you sharing your views on GoodJob's cron-style support. It's awesome to get insights from someone familiar with GoodJob looking at the transition to SolidQueue. I've used GoodJob in the past but never for cron tasks.

My Thoughts:

Testability: I see the appeal in GoodJob's approach of only scheduling jobs. My idea of using class methods like DailyReport.generate would add a layer where we can easily see and test what's happening.

class DailyReport
  def self.generate
    # ...
    DailyReportJob.perform_later(Time.now, answer: 42)
  end
end

This keeps the YAML config straightforward while packing the logic neatly in the method.

Simplicity: The YAML stays simple and intuitive:

recurring_tasks:
  - task: "DailyReport.generate"
    schedule: "every day at noon"

We maintain a clean setup, but with the power of a being able to put any complex task calling logic in the class method backing it.

This also allows calling jobs with no arguments and doesn't require the user to add a class method if none is required.

recurring_tasks:
  - task: "DailyReport.perform_later"
    schedule: "every day at noon"

Safety with Input Validation: Your point about calling "arbitrary" code is a valid one. We could validate the input to ensure it's just a class name and a method. This would prevent weird errors or the risk of evaluating arbitrary code.

While I totally get the straightforwardness of the GoodJob setup and putting all the configuration for scheduled jobs in one place, I think the class method approach could give us an just as much if not more flexibility and simplicity while making it possible to ensure those aspects of the job logic remain testable.

@rmontgomery429
Copy link

rmontgomery429 commented Jan 23, 2024

I've been giving this more thought and realized there's another snag with the class method approach. Imagine if someone calls a method that unexpectedly bogs down the system or even grinds it to a halt. Given this, it seems like it's important that the scheduler-dispatcher only backgrounds jobs. Calling a class method wouldn't allow it to know if it's safe to call so...

Thinking of a Fix:

Maybe the key is to simplify things. How about we put some constraints in place? Like, the scheduler could always fire off a job with no arguments, as in DailyReport.perform_later. This way, we're only queuing jobs at the scheduled times without the complexities of additional arguments or the risk of lengthy class method executions.

recurring_tasks:
  - job: "DailyReport"
    schedule: "every day at noon"

In this scenario, the scheduler automatically calls DailyReport.perform_later, keeping it neat and safe.

But Here's the Catch:

This solution definitely makes things safer and simpler. But I'm wondering, are we straying from the original goal of this feature? Is this sacrificing too much flexibility for the sake of simplicity and safety?

@ianterrell
Copy link

ianterrell commented Jan 23, 2024

Thanks for your careful thought on this. A few notes that come to mind.

Testability ... My idea of using class methods like DailyReport.generate would add a layer where we can easily see and test what's happening.

I am probably misunderstanding what you mean, but I think job testability is excellent in Rails already and I'm not sure what adding a layer on top would bring to that.

the scheduler could always fire off a job with no arguments

The use case I have for arguments is in re-use of jobs. Consider:

recurring_tasks:
  - job: ReportEmailer
    kwargs:
      report: daily-usage
      to: me@company.com
    schedule: "every day at 8am"
  - job: ReportEmailer
    kwargs:
      report: weekly-usage
      to: 
        - me@company.com
        - myboss@company.com
    schedule: "every Monday at 8am"

We do this both within applications for some jobs that are very similar and just need parameterized across schedulings, as well as with library code that is written and just needs slightly customized per application.

The args and kwargs are all purely optional. There's no additional mental overhead if you write jobs that do not need parameterized. Personally I think this hits safe, simple, and flexible.

Thanks again for considering the issue!

@rmontgomery429
Copy link

rmontgomery429 commented Jan 23, 2024

@ianterrell Thanks for your feedback and for bringing up some really valid points. Let me address them:

On Testability:

You're right that job testability in Rails is already top-notch. My thought with adding a class method layer, like DailyReport.generate, was more about centralizing and possibly adding custom logic before the job gets queued. It’s less about testing the job itself and more about testing the custom logic that decides when and how a job is queued. But I see your point — if we're mostly queuing straightforward jobs, this extra layer might be overkill.

Regarding Arguments and Keyword Arguments:

Your use case for reusing jobs with different parameters makes a lot of sense. Being able to specify arguments or kwargs in the config does add a lot of flexibility, allowing the same job to be tailored for different scenarios. That's a nice feature and it would be really handy to see the jobs like that in one place.

Striking a Balance:

I think my initial reaction was to the cron configuration example in the GoodJob docs. It struck me as maybe a little too feature rich. Seeing procs and lambdas alongside both and args and kwargs in a yaml file struck me as a bit of a foot-gun.

Perhaps we can have a system that defaults to queuing jobs without arguments for straightforward tasks, while still offering the ability to specify args/kwargs for more complex scheduling needs.

recurring_tasks:
  - job: SimpleJob
    schedule: every day at 9am
  - job: ReportEmailerJob
    kwargs:
      report: daily-usage
      to: me@company.com
    schedule: every day at 8am
  - job: ReportEmailerJob
    kwargs:
      report: weekly-usage
      to: 
        - me@company.com
        - myboss@company.com
    schedule: every Monday at 8am

@ianterrell
Copy link

Perhaps we can have a system that defaults to queuing jobs without arguments for straightforward tasks, while still offering the ability to specify args/kwargs for more complex scheduling needs.

I like this, yes. This is high priority for me.

(I think that's what GoodJob is aiming for for the average case. The args/kwargs are totally optional and the doc shows an example omitting them with another_task. )

The set key is likely useful also, mirroring the method on jobs. This is lower priority to me, but I think is useful as a robust solution integrating with ActiveJob.

description I could live without, but if a web interface is added could be handy. This is lowest priority for me.

I don't personally have a use case for the lambda-implemented cron schedule they show in their complex_task example in the doc. This isn't a priority to me at all.

@Caleb-T-Owens
Copy link

Has there been any progress on this? I was thinking I might at least try and spike the problem on the weekend if not

@rmontgomery429
Copy link

rmontgomery429 commented Feb 3, 2024

Has there been any progress on this? I was thinking I might at least try and spike the problem on the weekend if not

Not from me. I had spiked on a concept but got busy with work and that hasn't let up. I also think there is something to the suggestion that the work on unique jobs may be required first for this to work properly.

@Caleb-T-Owens
Copy link

I also think there is something to the suggestion that the work on unique jobs may be required for this to work properly.

Interesting. In most setups I've seen where there has been CRON scheduling jobs, there has been just one instance doing that scheduling so there has been no need for uniqueness. It might be a nice additional assurance, but shouldn't be critical for introducing this feature?

@Caleb-T-Owens
Copy link

Caleb-T-Owens commented Feb 5, 2024

I've made an initial implementation (not code tested or ready for review) of this which is very similar to how GoodJob handles it https://github.com/calebowens/solid_queue/tree/add-cron-tasks. There are some potential flaws with how this works in terms of cases when a thread may unexpectedly die, so I'm going to move to a polling-based system which will be more reliable and fit nicer into the run loop paradigm.

@rosa
Copy link
Member Author

rosa commented Feb 6, 2024

Hey all! Thanks a lot for all the input here. I'm on-call this week and I've been working on other stuff the previous couple of weeks, so haven't had time to make any progress on this one, but I plan to come back to this next week. I'll review your approach then, @Caleb-T-Owens 🙏

About uniqueness, we want to allow people to run more than one instance of the dispatcher/scheduler process that handles this, so there must be a way to guarantee that either there's a single process in charge of it, or that the jobs are themselves unique.

@rosa
Copy link
Member Author

rosa commented Feb 15, 2024

Hey @Caleb-T-Owens, thanks so much for putting that together, I've reviewed it, and it's pretty much the general idea I had about this. It's quite similar to my own PoC that I made back in December, I put it on pause to focus on publishing the rest and it's now quite outdated.

I think it could make sense to have the dispatcher take care of dispatching the recurring tasks 🤔 Since it's possible to configure multiple dispatchers with different settings, perhaps one of them could have the recurring tasks if someone doesn't want the same dispatcher to do everything.

I think I would also keep the task definition a bit simpler, just job_class, arguments and schedule. I think we don't need to distinguish between args and kwargs, but I might be missing something here.

Another thing that I still haven't decided how to do but I'd like to have is a way to track which jobs correspond to a given cron entry/recurring task, for debugging purposes mostly, and to show it in Mission Control 🤔

@rosa
Copy link
Member Author

rosa commented Feb 20, 2024

Finally rescued my last attempt and completed it on #155. Going to test this in production for a bit before merging.

@rosa rosa closed this as completed in #155 Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants