Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform reads and modifications to Shoot.Info in a concurrency safe way #4459

Merged
merged 8 commits into from
Aug 5, 2021

Conversation

stoyanr
Copy link
Contributor

@stoyanr stoyanr commented Aug 2, 2021

How to categorize this PR?

/area control-plane
/kind bug

What this PR does / why we need it:

  • Ensures that patch operations on b.Shoot.Info (or o.Shoot.Info) are always performed on a copy, and the result is written back to b.Shoot.Info via a single pointer assignment. This is important to ensure that data races due to other goroutines reading from b.Shoot.Info while it's being patched are avoided as much as possible.
  • Replaces usages of o.Shoot.Info in shoot_control.go with the shoot object that was used to initialize the operation. This is a cosmetic change that doesn't have a real functional impact (the 2 pointers are equal), but is important for consistency. The original pointer can be written to safely in this file without copying since at this point there are no concurrent goroutines reading from it.
  • Replaces all naked usages of b.Shoot.Info in production code that could be executed concurrently with the newly introduced methods GetInfo(), UpdateInfo(), and UpdateInfoStatus() that perform reading and modification in a concurrency safe way.
  • Introduces another new method SetInfo() and replaces all naked assignments to b.Shoot.Info in test code with this method.
  • Renames Info to info, ensuring that it can only be read from and written to using the above methods.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
There are several issues with the current patching approach that are hopefully addressed by this change:

  • Most importantly, it is dangerous to pass an object to Patch functions that is concurrently read from (or even written to) by other goroutines. There is a lot happening during patching (decoding, conversion, etc.) and a possibility that the object is invalid at some point during this operation. This wouldn't matter in single-threaded code, but in multi-threaded code this could lead to issues similar to what we observed in Several bugs after migrating azure seed/shoot to v1.21 gardener-extension-provider-azure#328 (comment), due to the read taking place exactly at the point when the object is invalid (e.g status is empty).
  • If the patch operation failed, we would have a mismatch between b.Shoot.Info (which was already updated) and the actual resource (which was not updated due to the error). This could lead to other unforseen issues, especially if the reconciliation is allowed to continue after such an error.

All places where o.Shoot.Info is modified and the associated patch operations are therefore refactored to follow this pattern:

	shoot = o.Shoot.Info.DeepCopy()
	patch := client.MergeFrom(shoot.DeepCopy())
	// Modify shoot ...
	if err := o.K8sGardenClient.Client().Patch(ctx, shoot, patch); err != nil {
		return err
	}
	o.Shoot.Info = shoot

Note that this doesn't completely address all potential issues:

  • The pointer o.Shoot.Info is still unprotected and therefore data races are still possible because assignment to a pointer is also not inherently an atomic operation (although it's much faster than Patch and therefore the probability for a data race is much lower).
  • If multiple modifications and patches happen concurrently to each other (assuming this is actually possible), at the end one would overwrite the result of the other (the shoot resource itself would be updated but the change to o.Shoot.Info would be lost).

To address the above:

  • The Info field is renamed to info of type atomic.Value.
  • All usages of naked Info are replaced by appropriate methods. There are four such methods: GetInfo, SetInfo, UpdateInfo, and UpdateInfoStatus. They all perform appropriate atomic loading and storing of the info value.
  • All updates to info (UpdateInfo, UpdateInfoStatus) are protected with a sync.Mutex to ensure that there is only one concurrent update, and performed on a copy of the value stored in info, in order to protect readers (GetInfo itself is not protected apart from loading the value atomically).
  • All methods are appropriately documented how to use them correctly.

The above approach seems to be a pattern, see the ReadMostly example in https://pkg.go.dev/sync/atomic#Value.

Note that it's still possible to cause data races by using GetInfo and SetInfo in production code inappropriately. I couldn't think of a way to prevent this aside from doing a copy inside these methods, which would result in massive unneeded copying. I think it's still much better than before since it would be hard not to notice the explicit documentation, and if this happens errors could still be prevented by proper reviewing.

Release note:

Improved handling of the shoot resource in the shoot controller to ensure that data races are avoided as much as possible.

@stoyanr stoyanr requested a review from a team as a code owner August 2, 2021 09:56
@gardener-robot gardener-robot added area/control-plane Control plane related kind/bug Bug needs/review size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 2, 2021
@stoyanr
Copy link
Contributor Author

stoyanr commented Aug 2, 2021

/invite @timuthy @plkokanov @vpnachev @voelzmo

@gardener-robot gardener-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs/second-opinion and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 2, 2021
@stoyanr stoyanr marked this pull request as draft August 2, 2021 14:46
@timuthy
Copy link
Member

timuthy commented Aug 2, 2021

/assign

@stoyanr stoyanr changed the title Ensure modifications and patch operations on b.Shoot.Info are performed on a copy Perform reads and modifications to b.Shoot.Info in a concurrency safe way Aug 2, 2021
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the efforts @stoyanr. Apart from what we discussed in person, I left one more comment.

pkg/operation/botanist/secrets.go Outdated Show resolved Hide resolved
@gardener-robot
Copy link

@plkokanov, @voelzmo You have pull request review open invite, please check

Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @stoyanr. It looks good to me, I only have one suggestion.

pkg/operation/shoot/shoot.go Outdated Show resolved Hide resolved
@timuthy timuthy added this to the v1.29 milestone Aug 5, 2021
Copy link
Member

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@timuthy timuthy merged commit 543217d into gardener:master Aug 5, 2021
@stoyanr stoyanr deleted the fix-shoot-info-patching branch August 5, 2021 13:04
krgostev pushed a commit to krgostev/gardener that referenced this pull request Apr 21, 2022
…ay (gardener#4459)

* Ensure patch operations on b.Shoot.Info are performed on a copy

* Introduce GetInfo and UpdateInfo methods and replace naked usages of Info in production code

* Introduce SetInfo method and replace naked usages of Info in test code

* Reintroduce strategic merge where it was used previously

* Minor comment updates

* Use atomic.Value for info and remove read locking

* Address code review comments

* Avoid multiple GetInfo calls in helper methods.
krgostev pushed a commit to krgostev/gardener that referenced this pull request Jul 5, 2022
…ay (gardener#4459)

* Ensure patch operations on b.Shoot.Info are performed on a copy

* Introduce GetInfo and UpdateInfo methods and replace naked usages of Info in production code

* Introduce SetInfo method and replace naked usages of Info in test code

* Reintroduce strategic merge where it was used previously

* Minor comment updates

* Use atomic.Value for info and remove read locking

* Address code review comments

* Avoid multiple GetInfo calls in helper methods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane Control plane related kind/bug Bug size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants