[BUG] - Singleton jobs can fail to get rescheduled in LimitModeReschedule on any overrun due to a channel error. #683

Closed
kj87au opened this issue Mar 4, 2024 · 4 comments · Fixed by #686
Labels
bug Something isn't working

Comments

kj87au commented Mar 4, 2024

Describe the bug

Singleton jobs can fail to get rescheduled due to a channel error, causing the executor to never rerun the job and to raise no error.

There is a race condition in the executor which causes a singleton job not to be re-scheduled properly.

The error is in the executor pushing a job to the jobIDsOut channel (executor.go, line 180).

(screenshot: the send to jobIDsOut in executor.go around line 180)

There seems to be an issue where, if the channel already holds a value, the send falls through to the "default" case and the job is never placed in the channel.

Removing the default case fixes the issue.
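
For illustration, here is a minimal, self-contained sketch of the pattern described above. The channel name jobIDsOut comes from the report; the buffer size and everything around it are assumptions, not the library's actual source:

package main

import "fmt"

func main() {
    // Stand-in for the executor's jobIDsOut channel; the buffer size of 1 is an assumption.
    jobIDsOut := make(chan string, 1)
    jobIDsOut <- "job-A" // something is already in the channel

    // Non-blocking send with a default case: the buffer is full, so the
    // default branch runs and "job-B" is silently dropped instead of being
    // queued for rescheduling.
    select {
    case jobIDsOut <- "job-B":
        fmt.Println("job-B queued for rescheduling")
    default:
        fmt.Println("job-B dropped")
    }
}

Removing the default case turns this into a blocking send, so the job ID always reaches the channel.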

To Reproduce

Steps to reproduce the behavior:

  1. Run an overlapping gocron job in singleton mode with LimitModeReschedule. Example:
package main

import (
    "runtime"
    "time"
 
    gocron "github.com/go-co-op/gocron/v2"
    "go.uber.org/zap"
)

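// Each run sleeps far longer than the 1-second cron schedule, guaranteeing overlapping runs.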
const sleepTime = 30 * time.Second

func main() {
    zlog := zap.NewExample().Sugar()

    zlog.Info("Starting")

    s, _ := gocron.NewScheduler(
        gocron.WithLocation(time.UTC),
        gocron.WithLogger(gocron.NewLogger(gocron.LogLevelDebug)),
    )

    defer func() { _ = s.Shutdown() }()

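    // Singleton job with LimitModeReschedule: an overlapping run is skipped and
    // should be rescheduled for the next available run time rather than queued.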
    j, err := s.NewJob(
        gocron.CronJob(
            "*/1 * * * * *",
            true,
        ),
        gocron.NewTask(
            taskToRun,
            zlog,
        ),
        gocron.WithName("Job1"),
        gocron.WithSingletonMode(gocron.LimitModeReschedule),
    )

    if err != nil {
        zlog.Errorw("Error creating job", "error", err)
    }

 

    s.Start()
    // Do something else
    var count = 0
    for {
        time.Sleep(1 * time.Second)
        nextRun, err := j.NextRun()

        if err != nil {
            zlog.Errorw("Error getting next run", "error", err)
        } else {
            // If we overrun the next run, we will raise a Fatal error
            if time.Now().After(nextRun.Add(30 * time.Second)) {
                zlog.Fatal("Overrun")
            }
        }

        count += 1
        zlog.Infow("Main Loop",
            "time", time.Now(),
            "jobs", len(s.Jobs()),
            "nextRun", nextRun,
            "count", count,
            "goroutines", runtime.NumGoroutine(),
        )
    }
}

 

func taskToRun(log *zap.SugaredLogger) {
    log.Infow(
        "Job Running",
        "time", time.Now(),
        "sleepTime", sleepTime,
    )
    time.Sleep(sleepTime)
}
  2. Run this with the latest release; the fatal "Overrun" error should be raised.
  3. Comment out executor.go, line 181.
  4. Rerun main; the problem no longer occurs.

Version

Latest Version (v2.2.4)

Expected behaviour

Job should be re-scheduled

Additional context

kj87au added the bug label Mar 4, 2024

cpj555 commented Mar 5, 2024

This looks like the same problem: my task exceeds its execution time, and after a while it is no longer executed.

@JohnRoesler
Contributor

Would you be willing to validate that the release candidate v2.2.5-rc1 solves the issue?


kj87au commented Mar 6, 2024

@JohnRoesler Release candidate v2.2.5-rc1 solves the issue, great work.

@JohnRoesler
Contributor

Thank you for confirming!!
