Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Panic when scheduler is about to reschedule a daily job #696

Closed
mhyllander opened this issue Mar 19, 2024 · 7 comments
Closed

[BUG] - Panic when scheduler is about to reschedule a daily job #696

mhyllander opened this issue Mar 19, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@mhyllander
Copy link

Describe the bug

Two times now a job that runs once daily has caused a panic. When this happens it seems that the job has not been executed. It should update a database and print some logs, but this does not happen. However, the panic occurs on a line that indicates the job has returned from the executor and is about to be rescheduled.

To Reproduce

I don't have a reliable way of reproducing the problem. Most of the time the job runs as expected, but two times a panic has occurred in the gocron code. The panics have occurred two times with 9 days in between, so they happen only sporadically.

Basically, a job is scheduled by this function:

func scheduleTimerJob(jobName, jobsetGen string, power uint, t codec.Time) (gocron.Job, error) {
	j, err := scheduler.NewJob(
		gocron.DailyJob(1, gocron.NewAtTimes(gocron.NewAtTime(t.Hour(), t.Minute(), 0))),
		gocron.NewTask(
			RunTimerJob,
			power,
			jobName,
		),
		gocron.WithName(jobName),
		gocron.WithTags(timerJobCategory, jobsetGen),
	)
	if err != nil {
		return nil, err
	}
	return j, nil
}

The scheduler is started like this:

	// create a scheduler
	scheduler, err = gocron.NewScheduler(
		gocron.WithLogger(slog.Default()),
		gocron.WithLimitConcurrentJobs(1, gocron.LimitModeWait),
	)
	if err != nil {
		return err
	}

	// start the scheduler
	scheduler.Start()

Version

2.2.5

I haven't tried the new 2.2.6, but it can take some time to see if there's any improvement.

Expected behavior

No panic.

Additional context

Stack traces:

Mar 09 23:00:00 piir paninv_controller[3149]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 09 23:00:00 piir paninv_controller[3149]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6af984]
Mar 09 23:00:00 piir paninv_controller[3149]: goroutine 23 [running]:
Mar 09 23:00:00 piir paninv_controller[3149]: github.com/go-co-op/gocron/v2.(*scheduler).selectExecJobIDsOut(0x400032c000, {0xf5, 0x78, 0xd9, 0x4d, 0x5f, 0xbf, 0x42, 0xe2, 0x99, ...})
Mar 09 23:00:00 piir paninv_controller[3149]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:311 +0x154
Mar 09 23:00:00 piir paninv_controller[3149]: github.com/go-co-op/gocron/v2.NewScheduler.func1()
Mar 09 23:00:00 piir paninv_controller[3149]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:151 +0x250
Mar 09 23:00:00 piir paninv_controller[3149]: created by github.com/go-co-op/gocron/v2.NewScheduler in goroutine 1
Mar 09 23:00:00 piir paninv_controller[3149]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:146 +0x48c
Mar 18 22:00:00 piir paninv_controller[11965]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 18 22:00:00 piir paninv_controller[11965]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6afcb4]
Mar 18 22:00:00 piir paninv_controller[11965]: goroutine 23 [running]:
Mar 18 22:00:00 piir paninv_controller[11965]: github.com/go-co-op/gocron/v2.(*scheduler).selectExecJobIDsOut(0x40001c0160, {0xbe, 0xe4, 0x35, 0xb3, 0x4e, 0x60, 0x48, 0x89, 0xa4, ...})
Mar 18 22:00:00 piir paninv_controller[11965]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:311 +0x154
Mar 18 22:00:00 piir paninv_controller[11965]: github.com/go-co-op/gocron/v2.NewScheduler.func1()
Mar 18 22:00:00 piir paninv_controller[11965]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:151 +0x250
Mar 18 22:00:00 piir paninv_controller[11965]: created by github.com/go-co-op/gocron/v2.NewScheduler in goroutine 1
Mar 18 22:00:00 piir paninv_controller[11965]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.5/scheduler.go:146 +0x48c
@mhyllander mhyllander added the bug Something isn't working label Mar 19, 2024
@chih-ming-lin
Copy link

chih-ming-lin commented Mar 22, 2024

I have the same problem in 2.2.6.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc6ecae]

goroutine 814426 [running]:
github.com/go-co-op/gocron/v2.(*scheduler).selectExecJobIDsOut(0xc00f528580, {0xb, 0x2f, 0x84, 0x8, 0xf6, 0xdb, 0x42, 0x9d, 0x82, ...})
	github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:311 +0x16e
github.com/go-co-op/gocron/v2.NewScheduler.func1()
	github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:151 +0x3a5
created by github.com/go-co-op/gocron/v2.NewScheduler in goroutine 814419
	github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:146 +0x598

I wrote a case to reproduce it.

func TestScheduler_RemoveJobs(t *testing.T) {
	s := newTestScheduler(t)
	s.Start()

	s.NewJob(
		DurationJob(500*time.Millisecond),
		NewTask(func() {}),
		WithEventListeners(
			BeforeJobRuns(
				func(jobID uuid.UUID, jobName string) {
					s.RemoveByTags("tag1")
				},
			),
		),
		WithTags("tag1"),
	)

	// wait job running
	time.Sleep(10 * time.Second)
}

@JohnRoesler
Copy link
Contributor

Hey @mhyllander thanks for reporting. Are you doing any job removal like the example @chih-ming-lin posted? The text does reliably reproduce the issue and I can patch the bug it highlights.

I'm just trying to understand if that is the same issue you are having or not? The issue there is that the job is being removed during the job run itself and i had a map lookup that didn't check if the value was present first.

@mhyllander
Copy link
Author

@JohnRoesler Thanks for looking into this. The timer job does not remove any jobs itself, but I have other jobs that can remove and add timer jobs. Existing timer jobs are removed using RemoveByTags. However, that happens a few hours before the timer job runs, so when the timer job runs, no other jobs are running and no jobs are being removed.

@mhyllander
Copy link
Author

You can see my code to reschedule timer jobs here. Due to the way jobs are removed asynchronously, I generate a new unique tag for each "generation" of the timer jobs, so that I can remove the old generation and schedule a new generation. Before using these unique tags, I had problems where the new timer jobs would be removed right after they were created.

@mhyllander
Copy link
Author

@JohnRoesler It happened again yesterday, using 2.2.6:

Mar 24 20:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T20:00:00.000632435+01:00","level":"INFO","msg":"running settings job","jobName":"arbetsveckor_184 0 20 * * 0-4"}
Mar 24 20:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T20:00:00.002759507+01:00","level":"INFO","msg":"sending config","power":"on","mode":"heat","powerful":"off","quiet":"off","temp":"22","fan":"high","vert":"high","horiz":"auto","ton":"on","tont":"06:20","toff":"on","tofft":"22:00>
Mar 24 20:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T20:00:00.041236965+01:00","level":"INFO","msg":"scheduled timer_on_1 job","jobsetGen":"//timer/timer##3","at":"05:35"}
Mar 24 20:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T20:00:00.0424205+01:00","level":"INFO","msg":"scheduled timer_on_2 job","jobsetGen":"//timer/timer##3","at":"06:20"}
Mar 24 20:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T20:00:00.042693624+01:00","level":"INFO","msg":"scheduled timer_off job","jobsetGen":"//timer/timer##3","at":"22:00"}
Mar 24 22:00:00 piir paninv_controller[22377]: {"time":"2024-03-24T22:00:00.114151786+01:00","level":"INFO","msg":"running timer job","jobName":"timer_off","power":0}
Mar 24 23:00:00 piir paninv_controller[22377]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 24 23:00:00 piir paninv_controller[22377]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6b0394]
Mar 24 23:00:00 piir paninv_controller[22377]: goroutine 23 [running]:
Mar 24 23:00:00 piir paninv_controller[22377]: github.com/go-co-op/gocron/v2.(*scheduler).selectExecJobIDsOut(0x40001c8000, {0x64, 0x9e, 0xe1, 0xcd, 0xd, 0xb9, 0x41, 0x63, 0xb5, ...})
Mar 24 23:00:00 piir paninv_controller[22377]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:311 +0x154
Mar 24 23:00:00 piir paninv_controller[22377]: github.com/go-co-op/gocron/v2.NewScheduler.func1()
Mar 24 23:00:00 piir paninv_controller[22377]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:151 +0x250
Mar 24 23:00:00 piir paninv_controller[22377]: created by github.com/go-co-op/gocron/v2.NewScheduler in goroutine 1
Mar 24 23:00:00 piir paninv_controller[22377]:         /home/mhy/go/pkg/mod/github.com/go-co-op/gocron/v2@v2.2.6/scheduler.go:146 +0x48c
Mar 24 23:00:00 piir systemd[1]: paninv_controller.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 24 23:00:00 piir systemd[1]: paninv_controller.service: Failed with result 'exit-code'.

The interesting thing is that at 20:00, the timer_off job at 23:00 was removed, and a new timer_off job was scheduled at 22:00. The new timer_off job ran as it should at 22:00, but then at 23:00, when the timer_off job that was removed would have run, a panic occurred. It seems that the job that was removed was still scheduled to run.

@JohnRoesler
Copy link
Contributor

Thanks for confirming. I found and fixed that bug. I'll get it released.

@JohnRoesler
Copy link
Contributor

v2.2.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants