
Creation of ES instance fails on OCP 4.1.2 #166

Closed
kevinearls opened this issue Jun 27, 2019 · 6 comments
@kevinearls

When I try to create an Elasticsearch-backed Jaeger instance using one of the example YAML files (https://github.com/jaegertracing/jaeger-operator/blob/master/deploy/examples/simple-prod-deploy-es.yaml), the Jaeger instance never starts because the Elasticsearch pod never gets out of the Pending state.
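For reference, the example CR looks roughly like this (abridged sketch; the exact contents are in the linked file), and it is created with oc create -f deploy/examples/simple-prod-deploy-es.yaml:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    elasticsearch:
      nodeCount: 1
      resources: {}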

There is nothing in the log, but I get the following output from oc get elasticsearch -o yaml:

ovpn-118-43:jaeger-operator kearls$ oc get elasticsearch -o yaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: Elasticsearch
  metadata:
    creationTimestamp: "2019-06-27T12:56:31Z"
    generation: 8
    labels:
      app: jaeger
      app.kubernetes.io/component: elasticsearch
      app.kubernetes.io/instance: simple-prod
      app.kubernetes.io/name: elasticsearch
      app.kubernetes.io/part-of: jaeger
    name: elasticsearch
    namespace: fud
    ownerReferences:
    - apiVersion: jaegertracing.io/v1
      controller: true
      kind: Jaeger
      name: simple-prod
      uid: fbd9d6b0-98da-11e9-8a21-fa163e292f36
    resourceVersion: "412161"
    selfLink: /apis/logging.openshift.io/v1/namespaces/fud/elasticsearches/elasticsearch
    uid: fc4ed617-98da-11e9-8a21-fa163e292f36
  spec:
    managementState: Managed
    nodeSpec:
      resources: {}
    nodes:
    - nodeCount: 1
      resources: {}
      roles:
      - client
      - data
      - master
      storage: {}
    redundancyPolicy: ""
  status:
    cluster:
      activePrimaryShards: 0
      activeShards: 0
      initializingShards: 0
      numDataNodes: 0
      numNodes: 0
      pendingTasks: 0
      relocatingShards: 0
      status: ""
      unassignedShards: 0
    clusterHealth: ""
    conditions:
    - lastTransitionTime: "2019-06-27T12:58:33Z"
      message: Previously used GenUUID "x1s6chde" is no longer found in Spec.Nodes
      reason: Invalid Spec
      status: "True"
      type: InvalidUUID
    nodes:
    - conditions:
      - lastTransitionTime: "2019-06-27T12:56:31Z"
        message: '0/5 nodes are available: 5 node(s) didn''t match node selector.'
        reason: Unschedulable
        status: "True"
        type: Unschedulable
      deploymentName: elasticsearch-cdm-x1s6chde-1
      upgradeStatus: {}
    pods:
      client:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
      data:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
      master:
        failed: []
        notReady:
        - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
        ready: []
    shardAllocationEnabled: shard allocation unknown
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
@richm
Contributor

richm commented Jun 27, 2019

@ewolinetz does this have anything to do with openshift/cluster-logging-operator#205?

@kevinearls
Author

FYI, this worked previously in an OCP 4.1 cluster on AWS.

@pavolloffay
Member

I get the same error when deploying locally on OCP 3.11:

Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  8s (x10 over 1m)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.
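To see which label the scheduler is rejecting, one way (illustrative commands; take the deployment name from the status output or from oc get deployments) is to compare the labels actually present on the nodes with the nodeSelector that ends up on the ES deployment:

# which os-related labels do the nodes actually carry?
oc get nodes -L kubernetes.io/os -L beta.kubernetes.io/os

# which nodeSelector did the ES operator put on the deployment?
oc get deployment <elasticsearch-cdm-...> -o jsonpath='{.spec.template.spec.nodeSelector}'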

The ES operator shows an error; not sure if that is related though:

oc logs elasticsearch-operator-6f94cb54bb-dnwp5 -n openshift-logging
{"level":"info","ts":1561645015.1384616,"logger":"cmd","msg":"Go Version: go1.10.8"}
{"level":"info","ts":1561645015.1385818,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1561645015.1385922,"logger":"cmd","msg":"Version of operator-sdk: v0.7.0"}
{"level":"info","ts":1561645015.1387944,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1561645015.210691,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1561645015.2138357,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1561645015.2449596,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1561645015.245202,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1561645015.3064365,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>"}
{"level":"info","ts":1561645015.3065236,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1561645015.4067228,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1561645015.5070202,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"error","ts":1561645302.627982,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"myproject/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: Unsupported change to UUIDs made: Previously used GenUUID \"5ojat11f\" is no longer found in Spec.Nodes","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"

@objectiser

@richm @ewolinetz As a potential workaround for the UUID issue, it seems that the CR can specify the GenUUID value to be used, instead of a random value being created by the es-operator. Do you see any issues with this approach?

We only expect to have a single ES cluster (per tenant) so hopefully using some stable UUID value would not be a problem. Although I guess each node in the CR should have a different value?

@ewolinetz
Contributor

Do you see any issues with this approach?

No, that should work fine so long as you don't attempt to change it after the initial creation.

Although I guess each node in the CR should have a different value?

Yes, each node's value should be unique within a CR so that the operator can correctly check whether that node's configuration has changed.
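For illustration, a minimal sketch of what that could look like in the Elasticsearch CR, assuming the node entries accept a genUUID field as discussed above (the field name and value here are illustrative, not verified against the CRD):

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  managementState: Managed
  nodes:
  # each node entry gets its own stable identifier, chosen by whoever creates
  # the CR instead of being randomly generated by the es-operator
  - nodeCount: 1
    genUUID: node0001   # illustrative; must differ per node and never change after creation
    roles:
    - client
    - data
    - master
    resources: {}
    storage: {}
  redundancyPolicy: ""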

@pavolloffay
Member

I think this issue can be closed.

The main reason it was opened is that a change in master broke our test pipeline. We have switched our tests to use the release-4.1 branch instead of master to avoid further breakage. We will bump the es-operator version once a new stable version is out, e.g. for OCP 4.2.

The check on kubernetes.io/os=linux could also accept the older beta.kubernetes.io/os=linux label to support older OCP versions; that way it would not break us. A sketch of the idea is below.
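For example, instead of a hard nodeSelector on the new label only, the pod spec could use node affinity terms that accept either label (a sketch of the idea, not the es-operator's actual implementation):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:   # terms are ORed, so matching either label is enough
      - matchExpressions:
        - key: kubernetes.io/os
          operator: In
          values: [linux]
      - matchExpressions:
        - key: beta.kubernetes.io/os
          operator: In
          values: [linux]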
