Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition between operator's watch and e2eutil.WaitForDeployment. #549

Closed
pratikjagrut opened this issue Jul 10, 2020 · 3 comments
Closed

Comments

@pratikjagrut
Copy link
Contributor

I've noticed that if we create SBR before the application and then the operator is watching for the application deployment to be ready and at the same time e2eutil.WaitForDeployment(t, f.KubeClient, ns, appName, 1, retryInterval, timeout) is also watching the same deployment. So here these two watches are in deadlock or sort of deadlock condition if the operator watch detects the deployment is ready before e2eutil watch then it will redeploy it with the secrets hence the e2eutil watch might miss the first deployment readiness and will still wait for deployment this time deployment will take some more time to reach 1 replica ensuing e2eutil watch timeout. Hence failing the e2e tests.

@isutton
Copy link
Contributor

isutton commented Jul 13, 2020

I've noticed that if we create SBR before the application and then the operator is watching for the application deployment to be ready and at the same time e2eutil.WaitForDeployment(t, f.KubeClient, ns, appName, 1, retryInterval, timeout) is also watching the same deployment. So here these two watches are in deadlock or sort of deadlock condition if the operator watch detects the deployment is ready before e2eutil watch then it will redeploy it with the secrets hence the e2eutil watch might miss the first deployment readiness and will still wait for deployment this time deployment will take some more time to reach 1 replica ensuing e2eutil watch timeout. Hence failing the e2e tests.

A couple of points to take in consideration when evaluating this information below.

Application resources without SBO annotations can't be reconciled

Currently when the application is created after the SBR, even if the GVK of the application is being observed (corev1.Deployment for example), there is no way to correlate the application resource and the SBR to be reconciled since there are no annotations in the application resource yet (the annotation is created after the first reconciliation).

This can be seen in the code below, where the incoming resource is evaluated and ignored in the case the relevant annotation is not present:

// Map execute the mapping of a resource with the requests it would produce. Here we inspect the
// given object trying to identify if this object is part of a SBR, or a actual SBR resource.
func (m *sbrRequestMapper) Map(obj handler.MapObject) []reconcile.Request {
log := mapperLog.WithValues(
"Object.Namespace", obj.Meta.GetNamespace(),
"Object.Name", obj.Meta.GetName(),
)
toReconcile := []reconcile.Request{}
sbrNamespacedName, err := getSBRNamespacedNameFromObject(obj.Object)
if err != nil {
log.Error(err, "on inspecting object for annotations for SBR object")
return toReconcile
}
if isNamespacedNameEmpty(sbrNamespacedName) {
log.Debug("not able to extract SBR namespaced-name")
return toReconcile
}
return append(toReconcile, reconcile.Request{NamespacedName: sbrNamespacedName})
}

Updates not retrying on error

There are some situations SBO doesn't reconcile on error, as captured on the search results shown below:

pkg/controller/servicebindingrequest/common.go:
  25  	if errors.IsNotFound(err) {
  26: 		return requeue(nil, requeueAfter)
  27  	}
  28: 	return noRequeue(err)
  29  }

  33  	if errors.IsConflict(err) {
  34: 		return requeueError(err)
  35  	}

  61  	}
  62: 	return noRequeue(err)
  63  }

pkg/controller/servicebindingrequest/reconciler.go:
  148  		// being worked in https://github.com/redhat-developer/service-binding-operator/pull/442.
  149: 		return requeueError(errEmptyBackingServiceSelectors)
  150  	}

  159  	if err != nil {
  160: 		return requeueError(err)
  161  	}

  169  	if err != nil {
  170: 		return requeueError(err)
  171  	}

  187  		logger.Error(err, "Building ServiceBinder")
  188: 		return noRequeue(err)
  189  	}

pkg/controller/servicebindingrequest/servicebinder.go:
  146  		logger.Error(err, "On removing annotations from related objects.")
  147: 		return requeueError(err)
  148  	}

  151  		logger.Error(err, "On unbinding related objects")
  152: 		return requeueError(err)
  153  	}

  157  	if _, err := b.updateServiceBindingRequest(b.sbr); err != nil {
  158: 		return noRequeue(err)
  159  	}

  231  	if errStatus != nil {
  232: 		return requeueError(errStatus)
  233  	}

  235  
  236: 	return requeueOnNotFound(err, requeueAfter)
  237  }

  273  	if err != nil {
  274: 		return requeueOnConflict(err)
  275  	}

  279  	if _, err = b.updateServiceBindingRequest(sbr); err != nil {
  280: 		return noRequeue(err)
  281  	}

It might be required to review the noRequeue usage, to make sure the SBR is re-queued on errors that can be indeed retried (mostly conflict errors and, some circumstances, not found errors); for example, there are no re-queues on SBR status update error.

@sbose78
Copy link
Member

sbose78 commented Jul 13, 2020

Taking discussion to the PR
#556 (comment)

@pratikjagrut
Copy link
Contributor Author

Closing this issue as we moved to acceptance tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants