Perform reads and modifications to Shoot.Info in a concurrency safe way #4459
/invite @timuthy @plkokanov @vpnachev @voelzmo
/assign
…Info in production code
Thanks for all the efforts @stoyanr. Apart from what we discussed in person, I left one more comment.
@plkokanov, @voelzmo You have pull request review open invite, please check
Thanks for the update @stoyanr. It looks good to me, I only have one suggestion.
/lgtm
…ay (gardener#4459)
* Ensure patch operations on b.Shoot.Info are performed on a copy
* Introduce GetInfo and UpdateInfo methods and replace naked usages of Info in production code
* Introduce SetInfo method and replace naked usages of Info in test code
* Reintroduce strategic merge where it was used previously
* Minor comment updates
* Use atomic.Value for info and remove read locking
* Address code review comments
* Avoid multiple GetInfo calls in helper methods.
How to categorize this PR?
/area control-plane
/kind bug
What this PR does / why we need it:
* Ensures that patch operations on `b.Shoot.Info` (or `o.Shoot.Info`) are always performed on a copy, and the result is written back to `b.Shoot.Info` via a single pointer assignment. This is important to ensure that data races due to other goroutines reading from `b.Shoot.Info` while it's being patched are avoided as much as possible.
* Replaces `o.Shoot.Info` in `shoot_control.go` with the shoot object that was used to initialize the operation. This is a cosmetic change that doesn't have a real functional impact (the two pointers are equal), but is important for consistency. The original pointer can be written to safely in this file without copying since at this point there are no concurrent goroutines reading from it.
* Replaces all naked usages of `b.Shoot.Info` in production code that could be executed concurrently with the newly introduced methods `GetInfo()`, `UpdateInfo()`, and `UpdateInfoStatus()`, which perform reading and modification in a concurrency safe way.
* Introduces `SetInfo()` and replaces all naked assignments to `b.Shoot.Info` in test code with this method.
* Renames `Info` to `info`, ensuring that it can only be read from and written to using the above methods.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
There are several issues with the current patching approach that are hopefully addressed by this change:
* The object passed to the `Patch` functions is concurrently read from (or even written to) by other goroutines. There is a lot happening during patching (decoding, conversion, etc.) and a possibility that the object is invalid at some point during this operation. This wouldn't matter in single-threaded code, but in multi-threaded code this could lead to issues similar to what we observed in "Several bugs after migrating azure seed/shoot to v1.21" (gardener-extension-provider-azure#328 (comment)), due to the read taking place exactly at the point when the object is invalid (e.g. status is empty).
* If a patch operation fails, there is an inconsistency between `b.Shoot.Info` (which was already updated) and the actual resource (which was not updated due to the error). This could lead to other unforeseen issues, especially if the reconciliation is allowed to continue after such an error.

All places where `o.Shoot.Info` is modified and the associated patch operations are therefore refactored to follow the same pattern: the modification and the patch are performed on a copy, and the result is written back to `o.Shoot.Info` via a single pointer assignment.

Note that this doesn't completely address all potential issues:

* Access to `o.Shoot.Info` is still unprotected and therefore data races are still possible, because assignment to a pointer is also not inherently an atomic operation (although it's much faster than `Patch` and therefore the probability of a data race is much lower).
* […] `o.Shoot.Info` would be lost).

To address the above:

* The `Info` field is renamed to `info`, of type `atomic.Value`.
* All naked usages of `Info` are replaced by appropriate methods. There are four such methods: `GetInfo`, `SetInfo`, `UpdateInfo`, and `UpdateInfoStatus`. They all perform appropriate atomic loading and storing of the `info` value.
* Updates of `info` (`UpdateInfo`, `UpdateInfoStatus`) are protected with a `sync.Mutex` to ensure that only one update runs at a time, and are performed on a copy of the value stored in `info`, in order to protect readers (`GetInfo` itself is not protected apart from loading the value atomically).

The above approach seems to be a pattern, see the `ReadMostly` example in https://pkg.go.dev/sync/atomic#Value.

Note that it's still possible to cause data races by using `GetInfo` and `SetInfo` in production code inappropriately. I couldn't think of a way to prevent this aside from doing a copy inside these methods, which would result in massive unneeded copying. I think it's still much better than before, since it would be hard not to notice the explicit documentation, and if this happens, errors could still be prevented by proper reviewing.

Release note: