Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes semaphore when 2 tasks has same time #1211

Open
fatihusta opened this issue Apr 12, 2023 · 7 comments
Open

Crashes semaphore when 2 tasks has same time #1211

fatihusta opened this issue Apr 12, 2023 · 7 comments
Labels

Comments

@fatihusta
Copy link

Task1: 0 0,8,16 * * *
Task2: 0 0,8,16 * * *

Apr 12 00:00:00 semaphore semaphore[368518]: panic: Connection schedule already exists
Apr 12 00:00:00 semaphore semaphore[368518]: goroutine 23 [running]:
Apr 12 00:00:00 semaphore semaphore[368518]: github.com/ansible-semaphore/semaphore/db/bolt.(*BoltDb).Connect(0xc0001918c0, {0xb9cbfe, 0x8})
Apr 12 00:00:00 semaphore semaphore[368518]:         /home/runner/work/semaphore/semaphore/db/bolt/BoltDb.go:85 +0x3c5
Apr 12 00:00:00 semaphore semaphore[368518]: github.com/ansible-semaphore/semaphore/services/schedules.ScheduleRunner.Run({0x0?, 0x0?, 0xc0000974a0?})
Apr 12 00:00:00 semaphore semaphore[368518]:         /home/runner/work/semaphore/semaphore/services/schedules/pool.go:56 +0xaa
Apr 12 00:00:00 semaphore semaphore[368518]: github.com/robfig/cron/v3.(*Cron).startJob.func1()
Apr 12 00:00:00 semaphore semaphore[368518]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:312 +0x6a
Apr 12 00:00:00 semaphore semaphore[368518]: created by github.com/robfig/cron/v3.(*Cron).startJob
Apr 12 00:00:00 semaphore semaphore[368518]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:310 +0xad
Apr 12 00:00:00 semaphore systemd[1]: semaphore.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 12 00:00:00 semaphore systemd[1]: semaphore.service: Failed with result 'exit-code'.
Apr 12 00:00:00 semaphore systemd[1]: semaphore.service: Consumed 1.733s CPU time.
Apr 12 00:00:03 semaphore systemd[1]: semaphore.service: Scheduled restart job, restart counter is at 22.
Apr 12 00:00:03 semaphore systemd[1]: Stopped Ansible Semaphore.
Apr 12 00:00:03 semaphore systemd[1]: semaphore.service: Consumed 1.733s CPU time.
Apr 12 00:00:03 semaphore systemd[1]: Started Ansible Semaphore.
Apr 12 00:00:03 semaphore semaphore[371629]: BoltDB /var/lib/semaphore/database.boltdb
Apr 12 00:00:03 semaphore semaphore[371629]: Tmp Path (projects home) /tmp/semaphore
Apr 12 00:00:03 semaphore semaphore[371629]: Semaphore v2.8.89
Apr 12 00:00:03 semaphore semaphore[371629]: Interface 127.0.0.1
Apr 12 00:00:03 semaphore semaphore[371629]: Port :3000
Apr 12 00:00:03 semaphore semaphore[371629]: Server is running
Apr 12 08:00:00 semaphore semaphore[371629]: panic: Connection schedule already exists
Apr 12 08:00:00 semaphore semaphore[371629]: goroutine 49 [running]:
Apr 12 08:00:00 semaphore semaphore[371629]: github.com/ansible-semaphore/semaphore/db/bolt.(*BoltDb).Connect(0xc0001918c0, {0xb9cbfe, 0x8})
Apr 12 08:00:00 semaphore semaphore[371629]:         /home/runner/work/semaphore/semaphore/db/bolt/BoltDb.go:85 +0x3c5
Apr 12 08:00:00 semaphore semaphore[371629]: github.com/ansible-semaphore/semaphore/services/schedules.ScheduleRunner.Run({0x0?, 0x0?, 0xc0000974a0?})
Apr 12 08:00:00 semaphore semaphore[371629]:         /home/runner/work/semaphore/semaphore/services/schedules/pool.go:56 +0xaa
Apr 12 08:00:00 semaphore semaphore[371629]: github.com/robfig/cron/v3.(*Cron).startJob.func1()
Apr 12 08:00:00 semaphore semaphore[371629]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:312 +0x6a
Apr 12 08:00:00 semaphore semaphore[371629]: created by github.com/robfig/cron/v3.(*Cron).startJob
Apr 12 08:00:00 semaphore semaphore[371629]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:310 +0xad
Apr 12 08:00:00 semaphore systemd[1]: semaphore.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 12 08:00:00 semaphore systemd[1]: semaphore.service: Failed with result 'exit-code'.
Apr 12 08:00:00 semaphore systemd[1]: semaphore.service: Consumed 1.390s CPU time.
Apr 12 08:00:03 semaphore systemd[1]: semaphore.service: Scheduled restart job, restart counter is at 23.
Apr 12 08:00:03 semaphore systemd[1]: Stopped Ansible Semaphore.
Apr 12 08:00:03 semaphore systemd[1]: semaphore.service: Consumed 1.390s CPU time.
@nliechti
Copy link

nliechti commented May 8, 2023

We are also having trouble with the embeded BoltDB with the same error. But not only limited to parallel execution, it happens unpredictable with cron jobs and manually triggered jobs.

@fiftin fiftin added the bug label Jul 9, 2023
@fiftin
Copy link
Collaborator

fiftin commented Jul 9, 2023

Hi @fatihusta , @nliechti thank for the reporting. I will try to reproduce.

@myownhost
Copy link

@fiftin, I can reproduce this in my environment. When I schedule tasks to start at the same time, I get the following error in the log and the second task does not start:

panic: Connection schedule already exists
time="2023-08-16T08:45:36+02:00" level=info msg="Release resource locker with TaskRunner 2147483062"
goroutine 54 [running]:
github.com/ansible-semaphore/semaphore/db/bolt.(*BoltDb).Connect(0xc000103920, {0xb9cbfe, 0x8})
#011/home/runner/work/semaphore/semaphore/db/bolt/BoltDb.go:85 +0x3c5
github.com/ansible-semaphore/semaphore/services/schedules.ScheduleRunner.Run({0x0?, 0x0?, 0xc000031a70?})
#011/home/runner/work/semaphore/semaphore/services/schedules/pool.go:56 +0xaa
github.com/robfig/cron/v3.(*Cron).startJob.func1()
#011/home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:312 +0x6a
created by github.com/robfig/cron/v3.(*Cron).startJob
#011/home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:310 +0xad

@ziouf
Copy link
Contributor

ziouf commented Nov 5, 2023

This is still an issue for me

Nov 05 00:00:00 ansible semaphore[38994]: panic: Connection schedule already exists
Nov 05 00:00:00 ansible semaphore[38994]: goroutine 395 [running]:
Nov 05 00:00:00 ansible semaphore[38994]: github.com/ansible-semaphore/semaphore/db/bolt.(*BoltDb).Connect(0xc0000348d0, {0xbfba9c, 0x8})
Nov 05 00:00:00 ansible semaphore[38994]:         /home/runner/work/semaphore/semaphore/db/bolt/BoltDb.go:86 +0x3f3
Nov 05 00:00:00 ansible semaphore[38994]: github.com/ansible-semaphore/semaphore/services/schedules.ScheduleRunner.Run({0xc0000f6580?, 0xc0000f6570?, 0xc0000>
Nov 05 00:00:00 ansible semaphore[38994]:         /home/runner/work/semaphore/semaphore/services/schedules/SchedulePool.go:56 +0xaa
Nov 05 00:00:00 ansible semaphore[38994]: github.com/robfig/cron/v3.(*Cron).startJob.func1()
Nov 05 00:00:00 ansible semaphore[38994]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:312 +0x6a
Nov 05 00:00:00 ansible semaphore[38994]: created by github.com/robfig/cron/v3.(*Cron).startJob
Nov 05 00:00:00 ansible semaphore[38994]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:310 +0xad

v2.9.37 with BoltDB

@fiftin

@canlot
Copy link

canlot commented Jan 30, 2024

I managed to find the issue, if 2 or more jobs are scheduled at the same time the cron scheduler executes ScheduleRunner 2 times one after the other and if Bolt database beeing used, second db "schedule" connection fails beeing created because only 1 connection allowed at the same time. I managed to sove the issue by locking a variable at start of the schedule at function Run that is called from cron and unlock it at the end:

package schedules

import (
	"sync"

	log "github.com/Sirupsen/logrus"
	"github.com/ansible-semaphore/semaphore/db"
	"github.com/ansible-semaphore/semaphore/lib"
	"github.com/ansible-semaphore/semaphore/services/tasks"
	"github.com/robfig/cron/v3"
)

type ScheduleRunner struct {
	projectID  int
	scheduleID int
	pool       *SchedulePool
}

var singleConnectionStoreLocker sync.Locker

func (r ScheduleRunner) tryUpdateScheduleCommitHash(schedule db.Schedule) (updated bool, err error) {
	repo, err := r.pool.store.GetRepository(schedule.ProjectID, *schedule.RepositoryID)
	if err != nil {
		return
	}

	err = repo.SSHKey.DeserializeSecret()
	if err != nil {
		return
	}

	remoteHash, err := lib.GitRepository{
		Logger:     nil,
		TemplateID: schedule.TemplateID,
		Repository: repo,
		Client:     lib.CreateDefaultGitClient(),
	}.GetLastRemoteCommitHash()

	if err != nil {
		return
	}

	if schedule.LastCommitHash != nil && remoteHash == *schedule.LastCommitHash {
		return
	}

	err = r.pool.store.SetScheduleCommitHash(schedule.ProjectID, schedule.ID, remoteHash)
	if err != nil {
		return
	}

	updated = true
	return
}

func (r ScheduleRunner) Run() {
	if !r.pool.store.PermanentConnection() {
		singleConnectionStoreLocker.Lock()
		log.Print("singleConnectStoreLocker locked")
		r.pool.store.Connect("schedule")
		log.Print("SCHEDULERUNNER: Store connected: schedule")
		defer log.Print("singleConnectStoreLocker unlocked")
		defer singleConnectionStoreLocker.Unlock()
		defer log.Print("SCHEDULERUNNER: Store closed: schedule")
		defer r.pool.store.Close("schedule")
	}

	schedule, err := r.pool.store.GetSchedule(r.projectID, r.scheduleID)
	if err != nil {
		log.Error(err)
		return
	}

	if schedule.RepositoryID != nil {
		var updated bool
		updated, err = r.tryUpdateScheduleCommitHash(schedule)
		if err != nil {
			log.Error(err)
			return
		}
		if !updated {
			return
		}
	}

	_, err = r.pool.taskPool.AddTask(db.Task{
		TemplateID: schedule.TemplateID,
		ProjectID:  schedule.ProjectID,
	}, nil, schedule.ProjectID)

	if err != nil {
		log.Error(err)
	}
}

type SchedulePool struct {
	cron     *cron.Cron
	locker   sync.Locker
	store    db.Store
	taskPool *tasks.TaskPool
}

func (p *SchedulePool) init() {
	p.cron = cron.New()
	p.locker = &sync.Mutex{}
}

func (p *SchedulePool) Refresh() {
	defer p.locker.Unlock()

	schedules, err := p.store.GetSchedules()

	if err != nil {
		log.Error(err)
		return
	}

	p.locker.Lock()
	p.clear()
	for _, schedule := range schedules {
		_, err := p.addRunner(ScheduleRunner{
			projectID:  schedule.ProjectID,
			scheduleID: schedule.ID,
			pool:       p,
		}, schedule.CronFormat)
		if err != nil {
			log.Error(err)
		}
	}
}

func (p *SchedulePool) addRunner(runner ScheduleRunner, cronFormat string) (int, error) {
	id, err := p.cron.AddJob(cronFormat, runner)

	if err != nil {
		return 0, err
	}

	return int(id), nil
}

func (p *SchedulePool) Run() {
	p.cron.Run()
}

func (p *SchedulePool) clear() {
	runners := p.cron.Entries()
	for _, r := range runners {
		p.cron.Remove(r.ID)
	}
}

func (p *SchedulePool) Destroy() {
	defer p.locker.Unlock()
	p.locker.Lock()
	p.cron.Stop()
	p.clear()
	p.cron = nil
}

func CreateSchedulePool(store db.Store, taskPool *tasks.TaskPool) SchedulePool {
	pool := SchedulePool{
		store:    store,
		taskPool: taskPool,
	}
	singleConnectionStoreLocker = &sync.Mutex{}
	pool.init()
	pool.Refresh()
	return pool
}

func ValidateCronFormat(cronFormat string) error {
	_, err := cron.ParseStandard(cronFormat)
	return err
}

@fiftin there are propably better solutions for this but it's a start and as far i can tell not disturbing things.

@jojoob
Copy link

jojoob commented May 8, 2024

This bug is fixed since Feb 10. See the referenced commit above.
And I just verified it on my own instance. After I upgraded to v2.9.75 I'm not able to reproduce the bug any more!

@fatihusta
Copy link
Author

I still have this issue.

semaphore version
v2.9.64

May 21 12:00:00 semaphore semaphore[3287470]: panic: Connection schedule 6 already exists
May 21 12:00:00 semaphore semaphore[3287470]: goroutine 60699 [running]:
May 21 12:00:00 semaphore semaphore[3287470]: github.com/ansible-semaphore/semaphore/db/bolt.(*BoltDb).Connect(0xc0001c1b30, {0xc009e8a0d0, 0xa})
May 21 12:00:00 semaphore semaphore[3287470]:         /home/runner/work/semaphore/semaphore/db/bolt/BoltDb.go:86 +0x3b4
May 21 12:00:00 semaphore semaphore[3287470]: github.com/ansible-semaphore/semaphore/services/schedules.ScheduleRunner.Run({0x1e67a40?, 0xc0048e0dc0?, 0xc0064b6f90?})
May 21 12:00:00 semaphore semaphore[3287470]:         /home/runner/work/semaphore/semaphore/services/schedules/SchedulePool.go:57 +0xc3
May 21 12:00:00 semaphore semaphore[3287470]: github.com/robfig/cron/v3.(*Cron).startJob.func1()
May 21 12:00:00 semaphore semaphore[3287470]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:312 +0x5b
May 21 12:00:00 semaphore semaphore[3287470]: created by github.com/robfig/cron/v3.(*Cron).startJob in goroutine 7
May 21 12:00:00 semaphore semaphore[3287470]:         /home/runner/work/semaphore/semaphore/vendor/github.com/robfig/cron/v3/cron.go:310 +0xa5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants