[BUG] - Singleton jobs can fail to get rescheduled in LimitModeReschedule on any overrun due to a channel error. #683

Closed
kj87au opened this issue Mar 4, 2024 · 4 comments · Fixed by #686
Labels
bug Something isn't working

Comments

kj87au commented Mar 4, 2024

Describe the bug

Singleton jobs can fail to get rescheduled due to a channel error, causing the executor to never rerun the job and to raise no error.

There is a race condition in the executor which causes a singleton job not to be re-scheduled properly.

The error is in the executor pushing a job to the jobIDsOut channel (executor.go, line 180).

(screenshot: the send to jobIDsOut in executor.go around line 180)

There seems to be an issue where, if the channel already holds a value, the send falls through to the "default" case and the job is never placed in the channel.

Removing the default case fixes the issue.
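
For illustration, here is a minimal, self-contained sketch of the pattern described above. The channel name jobIDsOut comes from the report; the buffer size and everything around it are assumptions, not the library's actual source:

package main

import "fmt"

func main() {
    // Stand-in for the executor's jobIDsOut channel; the buffer size of 1 is an assumption.
    jobIDsOut := make(chan string, 1)
    jobIDsOut <- "job-A" // something is already in the channel

    // Non-blocking send with a default case: the buffer is full, so the
    // default branch runs and "job-B" is silently dropped instead of being
    // queued for rescheduling.
    select {
    case jobIDsOut <- "job-B":
        fmt.Println("job-B queued for rescheduling")
    default:
        fmt.Println("job-B dropped")
    }
}

Removing the default case turns this into a blocking send, so the job ID always reaches the channel.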

To Reproduce

Steps to reproduce the behavior:

  1. Run an overlapping gocron job in singleton mode with LimitModeReschedule. Example:
package main

import (
    "runtime"
    "time"
 
    gocron "github.com/go-co-op/gocron/v2"
    "go.uber.org/zap"
)

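// Each run sleeps far longer than the 1-second cron schedule, guaranteeing overlapping runs.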
const sleepTime = 30 * time.Second

func main() {
    zlog := zap.NewExample().Sugar()

    zlog.Info("Starting")

    s, _ := gocron.NewScheduler(
        gocron.WithLocation(time.UTC),
        gocron.WithLogger(gocron.NewLogger(gocron.LogLevelDebug)),
    )

    defer func() { _ = s.Shutdown() }()

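    // Singleton job with LimitModeReschedule: an overlapping run is skipped and
    // should be rescheduled for the next available run time rather than queued.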
    j, err := s.NewJob(
        gocron.CronJob(
            "*/1 * * * * *",
            true,
        ),
        gocron.NewTask(
            taskToRun,
            zlog,
        ),
        gocron.WithName("Job1"),
        gocron.WithSingletonMode(gocron.LimitModeReschedule),
    )

    if err != nil {
        zlog.Errorw("Error creating job", "error", err)
    }

 

    s.Start()
    // Do something else
    var count = 0
    for {
        time.Sleep(1 * time.Second)
        nextRun, err := j.NextRun()

        if err != nil {
            zlog.Errorw("Error getting next run", "error", err)
        } else {
            // If we overrun the next run, we will raise a Fatal error
            if time.Now().After(nextRun.Add(30 * time.Second)) {
                zlog.Fatal("Overrun")
            }
        }

        count += 1
        zlog.Infow("Main Loop",
            "time", time.Now(),
            "jobs", len(s.Jobs()),
            "nextRun", nextRun,
            "count", count,
            "goroutines", runtime.NumGoroutine(),
        )
    }
}

 

func taskToRun(log *zap.SugaredLogger) {
    log.Infow(
        "Job Running",
        "time", time.Now(),
        "sleepTime", sleepTime,
    )
    time.Sleep(sleepTime)
}
  2. Run this with the latest release; the fatal "Overrun" error should be raised.
  3. Comment out executor.go, line 181.
  4. Rerun main; the problem no longer occurs.

Version

Latest Version (v2.2.4)

Expected behaviour

Job should be re-scheduled

Additional context

kj87au added the bug label Mar 4, 2024

cpj555 commented Mar 5, 2024

This looks like the same problem: my task exceeds its execution time, and after a while it is no longer executed.

@JohnRoesler
Contributor

Would you be willing to validate that the release candidate v2.2.5-rc1 solves the issue?


kj87au commented Mar 6, 2024

@JohnRoesler Release candidate v2.2.5-rc1 solves the issue, great work.

@JohnRoesler
Contributor

Thank you for confirming!!
