-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix two different causes of infinite reconciliation loop #1473
Conversation
@@ -285,6 +285,10 @@ func (r *IstioReconciler) updateStatus(ctx context.Context, log logr.Logger, ist | |||
} | |||
status.AppliedValues = appliedValues | |||
|
|||
if reflect.DeepEqual(istio.Status, *status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth adding a simple comment reminding readers why this is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pretty clear from the code itself. Why update the status if it hasn't changed? Additionally, the commit message fully explains why this is needed. I wouldn't want to add that long explanation here.
@@ -158,6 +158,11 @@ check_ready() { | |||
${COMMAND} wait deployment "${DEPLOYMENT_NAME}" -n "${NS}" --for condition=Available=True --timeout=${TIMEOUT} | |||
} | |||
|
|||
fail() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: When I read a function called fail
I expect it to fail (i.e., exit 1
) which is not the case here. It's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to logFailure
@@ -172,13 +177,33 @@ main_test() { | |||
${COMMAND} get ns "${CONTROL_PLANE_NS}" >/dev/null 2>&1 || ${COMMAND} create namespace "${CONTROL_PLANE_NS}" | |||
sed -e "s/version:.*/version: ${ver}/g" "${ISTIO_MANIFEST}" | ${COMMAND} apply -f - -n "${CONTROL_PLANE_NS}" | |||
|
|||
echo "Wait for Istio to be Reconciled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? Or just wait for Ready
is enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although Ready
should be enough, it makes sense to check Reconciled
. Take this as an additional test assertion.
echo "Wait for Istio to be Ready" | ||
${COMMAND} wait istio/istio-sample -n "${CONTROL_PLANE_NS}" --for condition=Ready=True --timeout=${TIMEOUT} | ||
|
||
echo "Give the operator some time to settle down" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/some time/30s/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
echo "Give the operator some time to settle down" | ||
sleep 30 | ||
|
||
echo "Check that the operator has stopped reconciling the resource" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: append sth like "Waiting 30s"
just so that people running this test manually knows what to expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests waits in a few other places as well, and it doesn't say how long... but I've made the change as suggested
…of operator and istiod Istiod updates the `failurePolicy` in the istio-validator-<rev>-<ns> ValidatingWebhookConfiguration when the webhook endpoint becomes ready. The operator should ignore this change in order to prevent an infinite reconciliation loop caused by the operator and istiod reverting each other's changes to the `failurePolicy` field.
…ache Before updating the status, the operator should check whether the status has changed. If it doesn't do this, it may cause the lastTransitionTimestamp in the status conditions to flip between two values. This happens in the following scenario: 1. operator deploys istiod Deployment and other resources 2. istiod Deployment becomes ready; this triggers a reconciliation of the Istio resource 3. the operator changes the state of the Ready condition to True and updates the lastTransitionTimestamp; this change is posted to the API server, but as per normal controller behavior, the local cache isn't yet updated 4. if something else now triggers another reconciliation before the controller's local cache is synced (i.e. if the reconciliation is triggered before the status update makes it into the local cache), the operator will now again set the Ready condition to True and update the lastTransitionTimestamp; again, it posts this change to the API server (NOTE: because a patch operation is used to update the status, this change is not treated as a conflict) 5. during this previous reconciliation, the operator is notified of the status changed it made before; this causes another reconciliation of the locally-cached Istio resource 6. the object's status doesn't change, because the Ready condition is already true, however the lastTransitionTimestamp in the locally cached copy is different from the one in the API server; the operator updates the status again, overwriting the correct timestamp with the old one. 7. this process is then repeated continuously; the operator keeps switching the lastTransitionTimestamp between two values, since it keeps processing two different states of the Istio resource. By performing the status update only if the status in the locally cached copy has changed during reconciliation, we break this infinite loop.
No description provided.