Fix monitor/scheduler race condition. #1186

baldwinn860 · 2017-10-11T18:21:47Z

There was a race condition that caused the scheduler to update robot
tasks with incomplete information. Managers would monitor for changes
inside a goroutine that competed with the update logic of the data
owner. All initial data was read in at once, but still in another
thread, so the scheduler could update without full knowledge of
entity state. This caused extraneous tasks to be spawned every time
robot was restarted.

The fix is to initialize the owner's data with a non-monitoring search
in the main thread before spawning goroutines to continue monitoring
updates. This ensures we have the complete initial information prior to
the first update. The other changes ensure that the monitoring updates
do not append extraneous entries.
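
The ordering described above can be illustrated with a minimal,
self-contained sketch. It is not the code from this change: `source`,
`owner`, `start`, and `runDemo` are hypothetical stand-ins for the
manager search API and the data owner, and the boolean `monitor` flag is
an assumed simplification of the search mode. It shows a blocking
initial search on the caller's goroutine, followed by a background
monitoring search whose replayed entries are de-duplicated rather than
appended.

```go
package main

import (
	"context"
	"sync"
)

// source is a hypothetical stand-in for a manager's search API.
type source struct {
	existing []string    // entities that exist before startup
	updates  chan string // entities reported while monitoring
}

// Search replays the existing entities to handle. When monitor is true it
// then keeps delivering updates until ctx is cancelled or updates is closed.
func (s *source) Search(ctx context.Context, monitor bool, handle func(id string)) error {
	for _, id := range s.existing {
		handle(id)
	}
	if !monitor {
		return nil
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case id, ok := <-s.updates:
			if !ok {
				return nil
			}
			handle(id)
		}
	}
}

// owner is a hypothetical data owner that the scheduler would read from.
type owner struct {
	mu      sync.Mutex
	devices []string
}

// updateDevice records id once; a replayed entry is not appended again.
func (o *owner) updateDevice(id string) {
	o.mu.Lock()
	defer o.mu.Unlock()
	for _, d := range o.devices {
		if d == id {
			return
		}
	}
	o.devices = append(o.devices, id)
}

// count returns the number of known devices.
func (o *owner) count() int {
	o.mu.Lock()
	defer o.mu.Unlock()
	return len(o.devices)
}

// start performs the initial search synchronously, then monitors in the
// background. The returned channel is closed when monitoring stops.
func start(ctx context.Context, src *source, o *owner) (<-chan struct{}, error) {
	// Step 1: blocking, non-monitoring search on the caller's goroutine,
	// so the owner is fully populated before anything else can run.
	if err := src.Search(ctx, false, o.updateDevice); err != nil {
		return nil, err
	}
	// Step 2: only now hand monitoring over to a goroutine.
	done := make(chan struct{})
	go func() {
		defer close(done)
		_ = src.Search(ctx, true, o.updateDevice)
	}()
	return done, nil
}

// runDemo returns the device count right after start returns and again
// after one monitored update has been delivered.
func runDemo() (before, after int, err error) {
	src := &source{existing: []string{"pixel", "nexus"}, updates: make(chan string)}
	o := &owner{}
	done, err := start(context.Background(), src, o)
	if err != nil {
		return 0, 0, err
	}
	before = o.count()
	src.updates <- "shield"
	close(src.updates)
	<-done
	return before, o.count(), nil
}
```

In this sketch, calling `runDemo` from a `main` function would report
both pre-existing devices as soon as `start` returns, and the replay
performed by the monitoring search does not add them a second time.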

ben-clayton · 2017-10-18T15:28:35Z

test/robot/monitor/monitor.go

 	if managers.Job != nil {
-		crash.Go(func() { managers.Job.SearchDevices(ctx, all, owner.updateDevice) })
-		crash.Go(func() { managers.Job.SearchWorkers(ctx, all, owner.updateWorker) })
+		if err := managers.Job.SearchDevices(ctx, initial, owner.updateDevice); err != nil {


I'm a little upset that I can't find a way to refactor this into something more compact without resorting to reflection. :(

Yeah, me too. I had written about three different solutions, but this was by far the most succinct; all the rest kind of exploded into lots of code changes.

baldwinn860 requested review from ben-clayton and pmuetschard October 11, 2017 18:21

ben-clayton reviewed Oct 18, 2017

ben-clayton approved these changes Oct 18, 2017

baldwinn860 merged commit db7975c into google:master Oct 18, 2017

baldwinn860 deleted the fix_scheduler branch October 18, 2017 17:30

purvisa-at-google-com pushed a commit that referenced this pull request Sep 29, 2022

fix lint errors (#1186)

35f18e8
