Skip to content

Conversation

@mshaverdo
Copy link
Contributor

Sometimes, when reading of cgroups tree is too slow, manager.getContainersDiff during global housekeeping could block any other actions due to acquiring mmanager.containersLock for a long time.

To fix it, we could release lock while list cgroups tree and then acquire it again to get diff.

@dashpole
Copy link
Collaborator

/ok-to-test

@dashpole
Copy link
Collaborator

The following files are not properly formatted:
manager/manager.go

@mshaverdo mshaverdo force-pushed the get_containers_diff_lock branch from a1c0974 to 1fdff28 Compare August 27, 2018 15:11
@mshaverdo
Copy link
Contributor Author

@dashpole
Fixed. Sorry.

Copy link
Collaborator

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Looks good with one nit.

}
allContainers, err := cont.handler.ListContainers(container.ListRecursive)
contHandler := cont.handler
m.containersLock.RUnlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rather than locking for all but the ListContainers call, can we instead lock around just the critical sections? I.E. lock and unlock around the := m.containers call above, and then Lock, and defer Unlock before the Determine which were added... section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's looks much better

return nil, nil, fmt.Errorf("failed to find container %q while checking for new containers", containerName)
}
allContainers, err := cont.handler.ListContainers(container.ListRecursive)
contHandler := cont.handler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need to assign cont.handler to a new variable here?

Copy link
Contributor Author

@mshaverdo mshaverdo Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it's better to leave assignation to a new var, but move it under the lock, to avoid possible data race.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With assignation placed under the lock, code looks ugly without defer, but with defer we returning to the first variant. Maybe, it's better to use initial variant?

	m.containersLock.RLock()
	cont, ok := m.containers[namespacedContainerName{
		Name: containerName,
	}]
	if !ok {
		m.containersLock.RUnlock()
		return nil, nil, fmt.Errorf("failed to find container %q while checking for new containers", containerName)
	}
	contHandler := cont.handler
	m.containersLock.RUnlock()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced there is a data race in reading the handler attribute of cont. It is not modified once it is initialized. The containersLock is meant to prevent concurrent access to the containers map, which the current design does. I don't think you need to add a new contHandler variable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are no data races in the current codebase, but i have one concern:
When m.containersLock locked, we expect exclusive access to the m.containers and it's members' fields. It's unexpected, that cont.handler could be read by anyone else during lock, so it could lead to tricky data race in future.
What do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need to reassess that when the time comes. Other functions, including getContainer and getSubcontainers, getAllDockerContainers, etc. return references to containerData structs, and could allow concurrent modification of containerData objects, if used in the way you are suggesting it could be. We would need much more than a change here to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Fixed.

Copy link
Collaborator

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dashpole dashpole merged commit fcc77c6 into google:master Aug 29, 2018
@mshaverdo mshaverdo deleted the get_containers_diff_lock branch November 22, 2018 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants