New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Services are not re-created when deleted manually #61

Closed
alexellis opened this Issue Dec 2, 2018 · 4 comments

Comments

Projects
None yet
1 participant
@alexellis
Member

alexellis commented Dec 2, 2018

Expected Behaviour

If someone deletes a resource which belongs to the operator, then the operator should re-create it - whether that be the deployment or the service entry.

Current Behaviour

It doesn't - it won't re-create the ClusterIP service.

Possible Solution

Steps to Reproduce (for bugs)

alex@nuc7:~/go/src/github.com/openfaas/certifier$ faas-cli store deploy figlet -g 127.0.0.1:31112

Deployed. 202 Accepted.
URL: http://127.0.0.1:31112/function/figlet

alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl get svc,deploy -n openfaas-fn
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
figlet    ClusterIP   10.96.217.119   <none>        8080/TCP   6s

NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
figlet    1         1         1            0           6s
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl delete svc/figlet -n openfaas-fn
service "figlet" deleted
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl get svc,deploy -n openfaas-fn
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
figlet    1         1         1            1           17s
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl get svc,deploy -n openfaas-fn
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
figlet    1         1         1            1           19s
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl get svc,deploy -n openfaas-fn
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
figlet    1         1         1            1           22s
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl get svc,deploy -n openfaas-fn
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
figlet    1         1         1            1           23s
alex@nuc7:~/go/src/github.com/openfaas/certifier$ 

These are the logs that we get:

I1202 20:42:50.747457       1 apply.go:65] Function figlet created
I1202 20:42:50.752514       1 controller.go:300] Creating ClusterIP service for 'figlet'
I1202 20:42:50.760943       1 controller.go:315] Creating deployment for 'figlet'
E1202 20:42:50.764466       1 controller.go:257] error syncing 'openfaas-fn/figlet': deployment.apps "figlet" not found

The error 'openfaas-fn/figlet': deployment.apps "figlet" not found seems to be a false error since the CRD entry still exists and the deployment.

If I delete the Deployment then the Deployment and the Service are both re-created.

Context

This seems like a fringe-case, but one we'd need to cover for automation.

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):
 Version:	18.04.0-ce
  • What version and distriubtion of Kubernetes are you using? kubectl version
alex@nuc7:~/go/src/github.com/openfaas/certifier$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:55:54Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.8", GitCommit:"7eab6a49736cc7b01869a15f9f05dc5b49efb9fc", GitTreeState:"clean", BuildDate:"2018-09-14T15:54:20Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Operating System and version (e.g. Linux, Windows, MacOS):

Ubuntu Linux 16.04

  • What network driver are you using and what CIDR? i.e. Weave net / Flannel

Weave

@alexellis

This comment has been minimized.

Member

alexellis commented Dec 2, 2018

Found by @LucasRoesler via #60

@alexellis

This comment has been minimized.

Member

alexellis commented Dec 2, 2018

Looking into this I'm seeing something like the following:

is there a deployment object?

no -> create the ClusterIP Service, then the Deployment
yes -> carry on

https://github.com/openfaas-incubator/openfaas-operator/blob/master/pkg/controller/controller.go#L299

So if that's the case, it may be why we're not getting the Service created.

@alexellis

This comment has been minimized.

Member

alexellis commented Dec 2, 2018

We may be looking at a patch that looks something like this:

diff --git a/pkg/controller/controller.go b/pkg/controller/controller.go
index 7ce4731..80988b8 100644
--- a/pkg/controller/controller.go
+++ b/pkg/controller/controller.go
@@ -297,16 +297,6 @@ func (c *Controller) syncHandler(key string) error {
 	deployment, err := c.deploymentsLister.Deployments(function.Namespace).Get(deploymentName)
 	// If the resource doesn't exist, we'll create it
 	if errors.IsNotFound(err) {
-		glog.Infof("Creating ClusterIP service for '%s'", function.Spec.Name)
-		if _, err := c.kubeclientset.CoreV1().Services(function.Namespace).Create(newService(function)); err != nil {
-			// If an error occurs during Service Create, we'll requeue the item
-			if errors.IsAlreadyExists(err) {
-				glog.V(2).Infof("ClusterIP service '%s' already exists. Skipping creation.", function.Spec.Name)
-			} else {
-				return err
-			}
-		}
-
 		existingSecrets, err := c.getSecrets(function.Namespace, function.Spec.Secrets)
 		if err != nil {
 			return err
@@ -318,6 +308,20 @@ func (c *Controller) syncHandler(key string) error {
 		)
 	}
 
+	svcGetOptions := metav1.GetOptions{}
+	_, getSvcErr := c.kubeclientset.CoreV1().Services(function.Namespace).Get(deploymentName, svcGetOptions)
+	if errors.IsNotFound(getSvcErr) {
+		glog.Infof("Creating ClusterIP service for '%s'", function.Spec.Name)
+		if _, err := c.kubeclientset.CoreV1().Services(function.Namespace).Create(newService(function)); err != nil {
+			// If an error occurs during Service Create, we'll requeue the item
+			if errors.IsAlreadyExists(err) {
+				glog.V(2).Infof("ClusterIP service '%s' already exists. Skipping creation.", function.Spec.Name)
+			} else {
+				return err
+			}
+		}
+	}
+
 	// If an error occurs during Get/Create, we'll requeue the item so we can
 	// attempt processing again later. This could have been caused by a
 	// temporary network failure, or any other transient reason.

I tested it on my cluster but it only "works" if the syncHandler runs.

On deployment it remediated by creating the Service, but not after that, so we may need some more tweaking:

I1202 21:09:11.654006       1 controller.go:314] Creating ClusterIP service for 'figlet'
@alexellis

This comment has been minimized.

Member

alexellis commented Dec 2, 2018

Seems like we needed an informer for the Service definitions in the namespace too ->

diff --git a/pkg/controller/controller.go b/pkg/controller/controller.go
index 7ce4731..4687635 100644
--- a/pkg/controller/controller.go
+++ b/pkg/controller/controller.go
@@ -92,6 +92,8 @@ func NewController(
 
 	// obtain references to shared index informers for the Deployment and Function types
 	deploymentInformer := kubeInformerFactory.Apps().V1beta2().Deployments()
+	serviceInformer := kubeInformerFactory.Core().V1().Services()
+
 	faasInformer := faasInformerFactory.Openfaas().V1alpha2().Functions()
 
 	// Create event broadcaster
@@ -133,6 +135,7 @@ func NewController(
 			}
 		},
 	})
+
 	// Set up an event handler for when Deployment resources change. This
 	// handler will lookup the owner of the given Deployment, and if it is
 	// owned by a Function resource will enqueue that Function resource for
@@ -154,6 +157,21 @@ func NewController(
 		DeleteFunc: controller.handleObject,
 	})
 
+	serviceInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
+		AddFunc: controller.handleObject,
+		UpdateFunc: func(old, new interface{}) {
+			newDepl := new.(*corev1.Service)
+			oldDepl := old.(*corev1.Service)
+			if newDepl.ResourceVersion == oldDepl.ResourceVersion {
+				// Periodic resync will send update events for all known Services.
+				// Two different versions of the same Deployment will always have different RVs.
+				return
+			}
+			controller.handleObject(new)
+		},
+		DeleteFunc: controller.handleObject,
+	})
+
 	// Set up an event handler for when functions related resources like pods, deployments, replica sets
 	// can't be materialized. This logs abnormal events like ImagePullBackOff, back-off restarting failed container,
 	// failed to start container, oci runtime errors, etc
@@ -297,16 +315,6 @@ func (c *Controller) syncHandler(key string) error {
 	deployment, err := c.deploymentsLister.Deployments(function.Namespace).Get(deploymentName)
 	// If the resource doesn't exist, we'll create it
 	if errors.IsNotFound(err) {
-		glog.Infof("Creating ClusterIP service for '%s'", function.Spec.Name)
-		if _, err := c.kubeclientset.CoreV1().Services(function.Namespace).Create(newService(function)); err != nil {
-			// If an error occurs during Service Create, we'll requeue the item
-			if errors.IsAlreadyExists(err) {
-				glog.V(2).Infof("ClusterIP service '%s' already exists. Skipping creation.", function.Spec.Name)
-			} else {
-				return err
-			}
-		}
-
 		existingSecrets, err := c.getSecrets(function.Namespace, function.Spec.Secrets)
 		if err != nil {
 			return err
@@ -318,6 +326,20 @@ func (c *Controller) syncHandler(key string) error {
 		)
 	}
 
+	svcGetOptions := metav1.GetOptions{}
+	_, getSvcErr := c.kubeclientset.CoreV1().Services(function.Namespace).Get(deploymentName, svcGetOptions)
+	if errors.IsNotFound(getSvcErr) {
+		glog.Infof("Creating ClusterIP service for '%s'", function.Spec.Name)
+		if _, err := c.kubeclientset.CoreV1().Services(function.Namespace).Create(newService(function)); err != nil {
+			// If an error occurs during Service Create, we'll requeue the item
+			if errors.IsAlreadyExists(err) {
+				glog.V(2).Infof("ClusterIP service '%s' already exists. Skipping creation.", function.Spec.Name)
+			} else {
+				return err
+			}
+		}
+	}
+
 	// If an error occurs during Get/Create, we'll requeue the item so we can
 	// attempt processing again later. This could have been caused by a
 	// temporary network failure, or any other transient reason.

alexellis added a commit that referenced this issue Dec 2, 2018

Add informer for Service definitions
- adds informer for Service definitions to fix #61 and #60 -
since it seemed that deleting Service definitions was not tracked
or synchronized and the code was depending on the Deployment
from being deleted or updated.

Tested on K8s 1.10 on single node Linux cluster by deleting
Service and seeing it re-created in the logs. Happy path of
deploying from store also works unaffected.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>

@alexellis alexellis referenced this issue Dec 2, 2018

Merged

Add informer for Service definitions #62

5 of 12 tasks complete

@alexellis alexellis closed this in #62 Dec 3, 2018

alexellis added a commit that referenced this issue Dec 3, 2018

Add informer for Service definitions
- adds informer for Service definitions to fix #61 and #60 -
since it seemed that deleting Service definitions was not tracked
or synchronized and the code was depending on the Deployment
from being deleted or updated.

Tested on K8s 1.10 on single node Linux cluster by deleting
Service and seeing it re-created in the logs. Happy path of
deploying from store also works unaffected.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment